EMC GREENPLUM MANAGEMENT ENABLED BY AGINITY WORKBENCH


White Paper

EMC GREENPLUM MANAGEMENT ENABLED BY AGINITY WORKBENCH
A Detailed Review

EMC SOLUTIONS GROUP

Abstract

This white paper discusses the features, benefits, and use of Aginity Workbench for EMC Greenplum, a comprehensive management and development tool specially tailored for the features and architecture of the EMC Greenplum Database.

August 2011

Copyright 2011 EMC Corporation. All Rights Reserved.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

The information in this publication is provided "as is." EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.

All trademarks used herein are the property of their respective owners.

Part Number: H8762

Table of contents

Executive summary
    Business case
    Solution overview
    Key benefits
Introduction
    Purpose
    Scope
    Audience
    Terminology
Technology overview
    Overview
    Aginity Workbench
    EMC Greenplum Database
Configuration
    Overview
    Environment diagram
    Greenplum environment description
    EMC Greenplum Master Server
    EMC Greenplum Segment Servers
Operational scenarios
    Overview
    List of scenarios
    Scenario 1: Browse objects in the Greenplum Database
    Scenario 2: Examine data distribution in the Greenplum Database
    Scenario 3: Identify poorly performing queries and optimize performance
    Scenario 4: Examine the status of Greenplum segments
    Scenario 5: Optimize space usage in a Greenplum Database
    Scenario 6: Examine roles and resource queues
    Scenario 7: Import or export data into or out of a database
Conclusion
    Summary
    Findings
References
    White papers
    Product documentation
    Other information

Executive summary

Business case

The EMC Greenplum Database is a high-performance data warehouse system that employs a massively parallel processing (MPP) architecture, in which many servers work in parallel on database tasks. While the details of the architecture and operation are largely hidden from database users, database administrators (DBAs) and developers often need access to these details to check system health, ensure optimal performance, and develop business analytics quickly and easily to derive value from the data in the warehouse. Standard query and DBA tools fall short of providing visibility into the features of parallel-processing architecture in general, and the unique features of the Greenplum Database in particular.

Solution overview

Aginity Workbench for EMC Greenplum (Aginity Workbench) offers a simple and efficient method of managing a Greenplum Database. Aginity Workbench gives you a single point of access to manage, monitor, and develop a Greenplum Database by offering a range of tools and functions that look deep into the Greenplum architecture. With Aginity Workbench, you can:

- Examine the operational status of all segments
- Browse all objects in the Greenplum Database and make modifications
- Run multiple queries and export results to common file formats, including Microsoft Excel
- Generate SQL and DDL with drag-and-drop ease
- Analyze query plans
- Quickly find tables that should be vacuumed to free up database resources
- See how primary and mirror Segment Instances are distributed across the Segment Servers
- Graphically view table distribution and easily spot distribution skew
- Easily redistribute data

Key benefits

Aginity Workbench brings a new level of insight into the Greenplum Database that no other graphical user interface (GUI) tool can provide. Benefits of using Aginity Workbench include:

- Ease of use: With a single access point from a user-friendly GUI, you need less time and effort to accomplish daily tasks with the Greenplum Database.
- Access to individual components for detailed diagnostics: You can analyze, test, and reset the database servers more quickly, which reduces downtime.
- Optimization of database performance: You can adjust the database settings to maximize performance.
- Reduction of user errors: Developers can use the built-in functions instead of user-written scripts, which reduces errors and time spent on scripting.

Introduction

Purpose

The purpose of this white paper is to examine the functionality of Aginity Workbench and demonstrate the benefits of using it to access, manipulate, and monitor a Greenplum Database.

Scope

This white paper describes the features and benefits of using Aginity Workbench in a Greenplum Database environment and describes the functionality of the main features of the product. This white paper does not provide configuration information for installing Aginity Workbench into a Greenplum environment.

Audience

This white paper is intended for EMC employees, partners, customers, and anyone interested in using Aginity Workbench to manage a Greenplum Database.

Terminology

This white paper includes the following terminology.

Table 1. Terminology

Analytics
    Analytics is the study of operational data using statistical analysis with a goal of identifying and using patterns to optimize business performance.

Business intelligence
    Business intelligence is the effective use of information assets to improve the profitability, productivity, or efficiency of a business. Frequently, IT professionals use this term to refer to the business applications and tools that enable such information usage.

DDL
    Data Definition Language is the syntax that is used to define and create objects in a relational database.

Master Server
    In an EMC Greenplum Database, the Master Server or Host controls the operation of the entire system and is the main connection point for external clients accessing the database. The Master Server distributes incoming queries to the Segment Servers, gathers the results, and returns them to the client.

Massively parallel processing (MPP)
    MPP is the coordinated processing of data by multiple machines that work together on a task. In a shared-nothing MPP architecture, such as EMC Greenplum, each machine has its own memory and storage and is not choked by negotiation of shared resources.

Segment Server
    In an EMC Greenplum Database, a Segment Server is one of the worker nodes/servers that is used to do the work in the MPP deployment.

Shared-nothing architecture
    Shared-nothing is a distributed computing architecture made up of a collection of independent, self-sufficient servers. This is in contrast to a traditional central computer that hosts all information and processing in a single location.

SQL
    Structured Query Language is the syntax that is used to access data from a relational database.

Technology overview

Overview

The primary components used in this environment are:

- Aginity Workbench
- EMC Greenplum Database

Aginity Workbench

Aginity Workbench makes developers and DBAs more productive by providing tools that give new access and insight into the Greenplum Database and Greenplum Data Computing Appliance. Created by and for Aginity's own developers, Aginity Workbench is a client-based application that communicates with the Greenplum Database and has a deep understanding of the Greenplum internal architecture.

For developers, Aginity Workbench has an intuitive interface for creating, managing, and tracking both individual SQL queries and entire databases. Sophisticated tools help developers analyze and tune queries for maximum performance. Results can be easily viewed or exported to other formats, such as Microsoft Excel, for further use.

For DBAs, Aginity Workbench provides graphical information on important properties such as node status, database size and bloat, and table distribution and skew. Built-in functions assist with generating the commands used to maintain and optimize the database operation and health.

EMC Greenplum Database

EMC Greenplum Database is a shared-nothing, MPP architecture that has been designed for business intelligence and analytical processing. In this architecture, each server node acts as a self-contained database management system that owns and manages a distinct portion of the overall data. The system automatically distributes data and parallelizes query workloads across all available hardware.

The core shared-nothing MPP architecture enables massive data storage, loading, and processing with linear scalability. Adaptive services provide worldwide businesses with high availability, workload management, and online expansion of capacity.

Key product features enable petabyte-scale loading, hybrid storage (row or column) to best fit the unique needs of each analytical use case, and embedded support for SQL, MapReduce, and programmable analytics. In addition, all major third-party analytic and administration tools are supported through standard client interfaces.

The core principle of the EMC Greenplum Database is to move the processing dramatically closer to the data and its users. This enables the computational resources to process every query in a fully parallel manner, use all storage connections simultaneously, and flow data efficiently between resources as the query plan dictates. As a result, complex processing can be pushed down close to the data for maximum efficiency and performance.

Configuration

Overview

Aginity Workbench is a Microsoft Windows-based tool and can attach to any Greenplum Database. Aginity Workbench uses a native EMC Greenplum connection from the Microsoft Windows client to the Greenplum Database.

Aginity Workbench is a .NET application and is currently supported on the following platforms:

- Windows XP (32-bit)
- Windows 7 (32-bit and 64-bit)
- Windows Server 2003 (32-bit and 64-bit)
- Windows Server 2008 (32-bit and 64-bit)

Environment diagram

In this white paper, several operational scenarios are described to show how Aginity Workbench integrates with the Greenplum Database and makes it easier for you to manage the system. Figure 1 shows a generic Greenplum environment being managed by Aginity Workbench.

Figure 1. Aginity Workbench in a generic Greenplum environment

Greenplum environment description

Aginity Workbench runs on a Windows client that has a connection to the Greenplum Master Server through the data center network. You can use Aginity Workbench to develop and analyze queries, as well as maintain and optimize the database.

EMC Greenplum Master Server

The Greenplum Master Server is the access point for all user requests to the Greenplum Database and it also handles all coordination of the Segment Servers.

EMC Greenplum Segment Servers

The Greenplum Segment Servers are the workers of the Greenplum Database and perform all MPP tasks.

Operational scenarios

Overview

This section details some common operational scenarios in which you can use Aginity Workbench to manage the Greenplum Database.

List of scenarios

Aginity Workbench was exercised in the following scenarios:

- Scenario 1: Browse objects in the Greenplum Database
- Scenario 2: Examine data distribution in the Greenplum Database
- Scenario 3: Identify poorly performing queries and optimize performance
- Scenario 4: Examine the status of Greenplum segments
- Scenario 5: Optimize space usage in a Greenplum Database
- Scenario 6: Examine roles and resource queues
- Scenario 7: Import or export data into or out of a database

Scenario 1: Browse objects in the Greenplum Database

The purpose of this scenario is to expand schemas to view tables, columns, views, stored procedures, and other database objects.

A key function of any database tool is to allow browsing and examination of database objects. Aginity Workbench uses a familiar tree structure to walk the hierarchy of the database. Figure 2 shows the top-level view of a Greenplum Database, showing the databases, and their sizes, in the system.

Figure 2. Aginity Workbench tree structure

Figure 3 shows a database expanded to display database objects. The view displays Greenplum-specific objects and information such as Partitions and the Distributed By clause in a table definition. This information is typically missed by tools that do not understand the Greenplum architecture.

Figure 3. Expanded database showing database objects

Each of the objects has a robust context menu that provides many useful functions that DBAs and developers can use to work more efficiently. Figure 4 shows the ability to quickly construct a Select statement for a particular table.

Figure 4. Select statement script

The resulting Select statement can be edited as desired and then executed. Additional menu selections build Insert, Update, and Delete statements as well as the DDL commands to create the table. These commands can be sent to the workbench query window as well as to the clipboard for pasting into other programs. These shortcut functions are handy both for initial design and for reverse engineering of existing designs.

Note: Commands are only shown in the menu if they are relevant to the object.

Scenario 2: Examine data distribution in the Greenplum Database

The purpose of this scenario is to:

- Check the data distribution of tables to determine how well the data is balanced across all the Segment Servers
- Identify a poorly distributed table and redistribute the data for better query performance

Figure 5 shows a poor table distribution.

Figure 5. Query results showing poor table distribution
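To make "poor distribution" concrete, the following minimal Python sketch computes one possible skew measure from per-segment row counts. It is an illustration only, not how Aginity Workbench calculates its display: the function name `skew_report` and the metric (how far the average segment falls below the fullest one) are assumptions. The input counts could come from a query that groups a table's rows by Greenplum's `gp_segment_id` system column.

```python
def skew_report(rows_per_segment):
    """Summarize how evenly rows are spread across segments.

    rows_per_segment maps a segment id to the number of rows stored there.
    The skew percentage used here is an illustrative metric (an assumption):
    0 means perfectly balanced, values near 100 mean one segment holds
    almost everything.
    """
    if not rows_per_segment:
        raise ValueError("at least one segment is required")
    counts = list(rows_per_segment.values())
    average = sum(counts) / len(counts)
    largest = max(counts)
    skew_pct = 0.0 if largest == 0 else (largest - average) / largest * 100
    return {
        "avg_rows": average,
        "max_rows": largest,
        # Segment id holding the most rows
        "busiest_segment": max(rows_per_segment, key=rows_per_segment.get),
        "skew_pct": round(skew_pct, 1),
    }


balanced = skew_report({0: 100, 1: 100, 2: 100, 3: 100})
skewed = skew_report({0: 970, 1: 10, 2: 10, 3: 10})
print(balanced["skew_pct"], skewed["skew_pct"], skewed["busiest_segment"])
```

A table whose rows pile up on one segment forces that segment to do most of the work for every scan, which is why a high value here usually motivates choosing a different distribution key.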

To change the table distribution, choose the Change distribution option under Advanced, as shown in Figure 6.

Figure 6. Select Change distribution menu option

As shown in Figure 7, you can choose one or more of the Available Columns by which to redistribute the table. In this example, proc_id was selected. While Aginity Workbench makes it easy to change the distribution key, it is up to you to choose the column (or columns) that will actually result in a better distribution of the data. Selecting multiple columns for a distribution key makes a composite key from those columns.

Figure 7. Select redistribution criteria and execute command
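The white paper does not reproduce the commands that Aginity Workbench generates. As a rough sketch of the kind of statement involved, the Python helper below builds Greenplum's `ALTER TABLE ... SET DISTRIBUTED BY` command for a single or composite key. The helper name `redistribute_sql` and the table name `proc_events` are hypothetical; only the column `proc_id` comes from the example above.

```python
import re

# Accept only simple unquoted identifiers so the generated SQL stays safe.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def redistribute_sql(table, columns):
    """Build an ALTER TABLE statement that changes the distribution key.

    Passing more than one column yields a composite distribution key.
    """
    if not columns:
        raise ValueError("at least one distribution column is required")
    for name in [table, *columns]:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    return f"ALTER TABLE {table} SET DISTRIBUTED BY ({', '.join(columns)});"


# 'proc_events' is a made-up table name used only for illustration.
print(redistribute_sql("proc_events", ["proc_id"]))
print(redistribute_sql("proc_events", ["proc_id", "event_date"]))
```

Because the statement rewrites every row of the table, generating it for review first and executing it deliberately afterward mirrors the manual confirmation step described in this scenario.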

After you click OK, Aginity Workbench provides you with the commands that perform the redistribution. Because redistribution is a significant activity that touches all the data in a table, you must manually verify and start the execution of the command. Choosing Show Distribution again now shows the results of this redistribution activity. Figure 8 shows the successful completion of the table redistribution.

Figure 8. Successful completion of redistribution showing good table distribution

Scenario 3: Identify poorly performing queries and optimize performance

The purpose of this scenario is to:

- Identify poorly performing queries
- Examine the Explain Plan for the query and determine the reason for the poor performance
- Optimize the query and verify that it performs better

To identify poorly performing queries, open the Object menu and, under Database, choose Show Query History. Figure 9 shows the Query History window. It provides several filters to narrow down the list. The Duration column visualizes query duration for ease of interpretation.

Figure 9. Query History

After a query is selected, the context menu enables you to choose Explain SQL Statement, which shows the full query and the query plan. It also provides the output of an Explain Analysis of the query. Figure 10 shows the Explain Plan for the selected query. For larger and more complex Explain Plans, however, it may be difficult to read through all the output.

Figure 10. Explain Plan for the selected query

As shown in Figure 11, Aginity Workbench helps by providing iterator output for the query. This option is available in the context menu of the query.

Figure 11. Explain Plan

The iterators give much more detailed information for the steps of the Explain Plan. Iterators are available for queries that have been executed and captured in the Greenplum Performance Monitor Database. Figure 12 shows the Query Plan window with the query plan as a navigation tree in the left pane, and summary and detail information in the right panes. Steps that are possible causes of slow performance are color-highlighted, so you can see them immediately.

Figure 12. Query Plan showing iterator details

Without this easy-to-navigate, interactive view, narrowing down the pain points in a problematic query would take considerably more time and effort.

Scenario 4: Examine the status of Greenplum segments

The purpose of this scenario is to:

- Determine the operational status of Greenplum segments
- Determine the location of primary segments and their corresponding mirror segments
- Identify primary segments that have failed over to their mirror segments
- Observe the failback of mirror segments to the primary server when the Segment Server is restored to operation

Managing a Greenplum Database means managing multiple database instances on multiple servers. Aginity Workbench supports this by providing Server Explorer, which gives a detailed view of the inner workings of the Greenplum architecture and allows DBAs to easily visualize the system status. Server Explorer can be accessed from the Server Node in the navigation tree, as shown in Figure 13.

Figure 13. Server explorer

Figure 14 shows the server in a healthy running state.

Figure 14. Server explorer showing a healthy status

The left pane shows the Segment Servers in the cluster. The right pane shows the configuration of each Segment Instance on each Segment Server. Columns can be sorted by clicking the title of a column. Color-highlighting is used to visualize the placement of the primary-mirror pairs. For each primary-mirror pair, there is one row that shows all the configuration details, for example, role, mode, status, host, and so on. The colors show how the primary Segment Instances of a server are spread over different Segment Servers. This overview immediately informs you that there are no failed segments and that each Segment Server has six primary and six mirror Segment Instances.
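The checks that Server Explorer presents visually can be approximated in a few lines of Python. This is a sketch under stated assumptions, not Aginity's implementation: the field names and single-letter codes ('u' for up, 's' for synchronized, 'p' and 'm' for primary and mirror) follow the `gp_segment_configuration` catalog as it is commonly documented for Greenplum 4.x and should be verified against the Administrator Guide. The host names `sdw1` and `sdw2` are made up.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    content: int         # identifies a primary-mirror pair
    role: str            # current role: 'p' (primary) or 'm' (mirror)
    preferred_role: str  # role assigned when the system was initialized
    mode: str            # 's' is assumed to mean synchronized
    status: str          # 'u' is assumed to mean up
    hostname: str


def find_problem_segments(segments):
    """Return (segment, reasons) pairs for instances that need attention."""
    problems = []
    for seg in segments:
        reasons = []
        if seg.status != "u":
            reasons.append("down")
        if seg.mode != "s":
            reasons.append("not synchronized")
        if seg.role != seg.preferred_role:
            reasons.append("role switched")
        if reasons:
            problems.append((seg, reasons))
    return problems


healthy = [
    Segment(0, "p", "p", "s", "u", "sdw1"),
    Segment(0, "m", "m", "s", "u", "sdw2"),
]
# The primary on sdw1 has failed and its mirror on sdw2 has taken over.
failed_over = [
    Segment(0, "m", "p", "s", "d", "sdw1"),
    Segment(0, "p", "m", "c", "u", "sdw2"),
]
print(find_problem_segments(healthy))
for seg, reasons in find_problem_segments(failed_over):
    print(seg.hostname, reasons)
```

Comparing the current role with the preferred role is what distinguishes a fully healthy cluster from one that is running but still operating on mirrors after a failover.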

If any Segment Instances are in a mode other than Synchronized or a status other than Up, this is highlighted as shown in Figure 15 and Figure 16.

Figure 15. Server Explorer showing a failover

Figure 16. Server Explorer showing resynchronization

When you want to focus on a particular Segment Server, clicking the node name in the left pane filters the list to show only the segments on that server.

Scenario 5: Optimize space usage in a Greenplum Database

The purpose of this scenario is to:

- Determine space utilization of tables in the database
- Find tables that have bloat caused by deletes that have not been vacuumed
- Reduce system resource usage by easily executing vacuum statements on the database

Periodic vacuuming of database tables helps ensure that the space occupied by deleted items is reclaimed and made available for new data in the database.

Aginity Workbench makes it easy to find the space used by the tables in the database. When you right-click the database, you can choose the Database Maintenance option, as shown in Figure 17.

Figure 17. Database Maintenance

This brings up a display of all the tables in that database and includes columns that show the Expected Bytes used, Actual Bytes used, Expired Bytes, and the Percent Unused. As shown in Figure 18, the Diagnostics Message column gives an indication of the amount of bloat in the table. Tables with high bloat (deleted objects whose space can be reclaimed) can be vacuumed directly from the menu.

Figure 18. Diagnostics Message showing bloat
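The white paper does not state how the Percent Unused column is derived. The Python sketch below assumes one plausible definition, the share of a table's actual bytes that exceeds its expected bytes, and uses it to pick tables worth vacuuming. The function names, the table names, and the 20 percent threshold are all illustrative assumptions.

```python
def percent_unused(expected_bytes, actual_bytes):
    """Share of the space on disk that exceeds the expected size.

    This formula is an assumption made for illustration; it is not taken
    from Aginity Workbench.
    """
    if actual_bytes <= 0:
        return 0.0
    unused = max(actual_bytes - expected_bytes, 0)
    return round(unused / actual_bytes * 100, 1)


def vacuum_candidates(tables, threshold_pct=20.0):
    """Return VACUUM statements for tables at or above the bloat threshold.

    tables is an iterable of (table_name, expected_bytes, actual_bytes);
    table names are assumed to be trusted identifiers.
    """
    statements = []
    for name, expected_bytes, actual_bytes in tables:
        if percent_unused(expected_bytes, actual_bytes) >= threshold_pct:
            statements.append(f"VACUUM {name};")
    return statements


# Hypothetical sizes: 'sales' carries 40% unused space, 'dim_date' only 1%.
sizes = [("sales", 600, 1000), ("dim_date", 990, 1000)]
print(vacuum_candidates(sizes))
```

Filtering on a threshold keeps routine maintenance focused on the tables where vacuuming reclaims a meaningful amount of space.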

Scenario 6: Examine roles and resource queues

The purpose of this scenario is to:

- Examine the properties of resource queues
- Identify the resource queues to which roles are assigned

An important aspect of Greenplum performance management is the notion of roles and resource queues. Roles roughly correspond to database users, and each user or role is assigned to a particular resource queue. Resource queues have associated properties that determine how much of the Greenplum system resources are applied to queries that run in those queues. Aginity Workbench can display the properties of resource queues as shown in Figure 19.

Figure 19. Resource queues and user roles

Aginity Workbench understands the difference between resource queues with active statement limits and resource queues that have maximum query cost limits. It also understands the different priorities that resource queues can have. Aginity Workbench also displays properties of user roles, and can show the resource queue to which each role or user is assigned, as shown in Figure 19. This easy access to workload management information helps DBAs allocate system resources so that database jobs run efficiently.

Scenario 7: Import or export data into or out of a database

The purpose of this scenario is to:

- Import data from a disk file to the database
- Export data from the database to a disk file

Moving data into a database from a flat file (TXT or CSV), and exporting data from a table into a flat file, are common actions for developers as well as DBAs. Greenplum provides the SQL COPY command, which can load an entire file into the database. It is considerably more efficient than executing INSERT statements and much easier than writing a script to load data. Unfortunately, the syntax of the SQL COPY command is a little tricky and, unless you use it every day, easy to forget or enter incorrectly.

Aginity Workbench provides an easy way of importing data into the database from flat files and of exporting data from a table back to a disk file. To import data from a CSV file, right-click the table into which you want to load the data and choose Import Data. In Import Data, as shown in Figure 20, you can specify the location of the file and the format. You can also specify the encoding, delimiters, escape characters, whether the input file has a header row, and the Segment reject limit. The reject limit sets the number of errors in the input file that you are willing to accept before aborting the load.
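To show how the import options described above map onto a COPY statement, here is a minimal Python sketch that assembles one. It is not the command Aginity Workbench generates, which the paper shows only as a screenshot. The helper name `build_copy_from`, the table `sales`, and the file path are hypothetical, and a server-side file path is assumed for simplicity, whereas a client tool would more likely stream the file through STDIN. The option keywords follow the Greenplum COPY syntax as commonly documented and should be checked against the Administrator Guide.

```python
def build_copy_from(table, path, delimiter=",", header=True, reject_limit=None):
    """Assemble a Greenplum COPY ... FROM statement for a CSV file.

    reject_limit, when given, is the number of malformed rows tolerated
    before the load is aborted (single row error isolation).
    """
    def quote(value):
        # SQL string literal: double any embedded single quotes.
        return "'" + value.replace("'", "''") + "'"

    options = ["CSV"]
    if header:
        options.append("HEADER")
    options.append(f"DELIMITER {quote(delimiter)}")
    if reject_limit is not None:
        options.append(f"SEGMENT REJECT LIMIT {int(reject_limit)} ROWS")
    return f"COPY {table} FROM {quote(path)} WITH {' '.join(options)};"


# Hypothetical table and path, used only for illustration.
print(build_copy_from("sales", "/data/sales.csv", reject_limit=10))
```

Generating the statement as text first and reviewing it before execution parallels the editable SQL tab in the Import Data window.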

Figure 20. Import Data

As shown in Figure 21, the SQL tab shows the corresponding SQL COPY command that is generated, which can be edited further.

Figure 21. SQL tab in Import Data window

Getting data out of the database and into flat files is just as easy: right-click the table and choose Export Data.

In Export Data, as shown in Figure 22, on the Parameters tab, you can specify many of the same kinds of properties as for importing data. The Selection tab allows you to specify the columns you want to export as well as an order-by clause for your desired sorting order.

Figure 22. Export Data

While the import and export functions do not use the Greenplum gpload/gpfdist programs for parallel bulk loading of extremely large amounts of data, these functions are very handy for quickly getting smaller amounts of data into and out of the database.

Conclusion

Summary

Aginity Workbench integrates easily with EMC Greenplum Database and allows you to quickly and efficiently manage, monitor, and access large-scale enterprise data warehouses.

Findings

Aginity Workbench features and functionality provide many benefits, including:

- Ease of use, reduction of overhead, and improved return on investment
- Access to individual components in the database, which allows for detailed diagnostics and fine tuning
- Optimization of database performance
- Reduction of errors and downtime

Aginity Workbench is unmatched in its ability to expose the internals of the Greenplum Database and optimize the database with ease.

References

White papers

For additional information, see the white papers listed below.

- EMC Greenplum Data Computing Appliance: High Performance for Data Warehousing and Business Intelligence An Architectural Overview
- EMC Greenplum Database 4.0 Critical Mass Innovation

Product documentation

For additional information, see the product document listed below.

- Greenplum Database 4.1 Administrator Guide

Other information

For additional information and to download the software, see the websites listed below.

- Aginity.com
- Greenplum.com