InfoSphere Warehouse with Power Systems and EMC CLARiiON Storage: Reference Architecture Summary


v1.0, January 8, 2010

Introduction

This guide describes the highlights of a data warehouse reference architecture developed jointly by the IBM DB2 development lab and the EMC CLARiiON engineering lab. The project brings together IBM and EMC best practices for deploying a warehouse and develops a prescriptive approach to integrating the following components:
- IBM Power 570 and Power 550 servers
- EMC CLARiiON CX4 storage
- AIX operating system
- IBM InfoSphere Warehouse software

A full reference guide is available that describes the architecture in detail and provides a racking scheme to help standardize deployments. In addition, software tools for use solely by IBM and EMC field teams have been created to assist in deploying this architecture.

A standardized testing approach was used, consisting of performance and stability phases as well as validation of the deployment tools.

The intent of this project is to offer prescriptive guidance to customers who wish to deploy a warehouse with IBM servers and software plus EMC storage. This guidance simplifies the deployment process and gives customers quicker time to value by eliminating the need to design a customized solution.

It is important to note that this architecture is a joint reference architecture only and is not offered as a product by either company. IBM and EMC product support for the individual server, storage, and software components still applies. Customers assume ultimate responsibility for integrating a warehouse solution built using this guide.

Technical Details

Background

This architecture is designed specifically to support the IBM DB2 Database Partitioning Feature (DPF), which provides shared-nothing parallel processing.
When DPF is used:
- A system consists of separate processing segments known as database partitions.
- Each database partition has its own private memory, logging, and locking facility.
- Memory and processes are not shared, providing independence and linear scalability.
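The common hash partitioning scheme mentioned in the data unit description can be illustrated with a small Python sketch: a row's distribution key is hashed to a bucket in a distribution map, and the map names the database partition that owns the row. This is a sketch under stated assumptions, not DB2's implementation: it assumes 8 partitions (one data unit), a hypothetical 4096-entry map filled round-robin, and uses `zlib.crc32` purely as a stand-in for DB2's internal hash function, so the partition numbers will not match a real database.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 8   # one data unit hosts 8 database partitions
MAP_SIZE = 4096      # hypothetical number of buckets in the distribution map

# Distribution map: hash bucket -> database partition number, assigned round-robin.
DIST_MAP = [bucket % NUM_PARTITIONS for bucket in range(MAP_SIZE)]


def partition_for(distribution_key: str) -> int:
    """Return the database partition that owns a row with this key.

    CRC32 is only a stand-in for DB2's own hash function (illustration only).
    """
    bucket = zlib.crc32(distribution_key.encode("utf-8")) % MAP_SIZE
    return DIST_MAP[bucket]


if __name__ == "__main__":
    # Distribute 10,000 hypothetical customer keys and count rows per partition.
    counts = Counter(partition_for(f"cust-{i}") for i in range(10_000))
    for partition in sorted(counts):
        print(f"partition {partition}: {counts[partition]} rows")
```

Because the owning partition depends only on the key, every server computes the same answer, and rows spread roughly evenly when keys are well distributed. This is also why the split is invisible to the query submitter: the mapping happens inside the database, not in the application.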

The IBM best practices extend this implementation into the physical infrastructure by providing separate servers and dedicated disk, which meets the requirements of a shared-nothing parallel environment.

Architecture Component Highlights

The following diagram illustrates the architecture. Detailed descriptions of each unit follow.

Administration unit

The administration unit houses the database catalog and small tables that are not large enough to be partitioned across the data nodes. It also serves as the connection and coordination point for users of the data warehouse. The administration unit consists of a Power 550 server logical partition (LPAR) and an EMC CLARiiON CX4-240 storage device with the following specifications:
- 1 x administration database partition for the database catalog, small tables, and query coordination
- 1 x POWER6 dual-core processor, 3.5GHz or faster
- 30GB memory
- 2 x internal 146GB, 15K RPM disks
- 2 x 2-port Gigabit Ethernet cards
- 1 x 2-port 4Gbps Fibre Channel host bus adapter (HBA) card
- An external remote I/O drawer to provide expanded slot capacity
- 1 x CX4-240 CLARiiON storage device with the following drives:
  - 5 x 300GB, 15K RPM disks for the internal CLARiiON vault and the operating system
  - 1 x 300GB, 15K RPM hot spare
  - 9 x 300GB, 15K RPM disks for the database, allocated as follows:
    - 1 x 2+P RAID5 storage device for the database home file system
    - 2 x 2+P RAID5 storage devices for database data, temporary space, small tables, and logging

Management unit

The management unit provides a central server to administer the cluster of servers that make up the data warehouse. With the addition of external storage, this unit can also host the IBM Performance Expert software and database. Performance Expert software allows database administrators and system administrators to track system and database performance. The management unit consists of a Power 550 LPAR on the same server as the administration unit and an optional expansion of the EMC CLARiiON CX4-240 storage device. The unit has the following specifications:
- Optional Performance Expert repository database
- 1 x POWER6 dual-core processor, 3.5GHz or faster
- 2GB memory
- 2 x internal 146GB, 15K RPM disks
- 2 x 2-port Gigabit Ethernet cards
- 1 x 2-port 4Gbps Fibre Channel HBA card
- External remote I/O drawer to provide expanded slot capacity, shared with the administration unit

If Performance Expert will be installed, the following optional drives are added to the external CX4-240 storage device:
- 1 x 300GB, 15K RPM hot spare
- 9 x 300GB, 15K RPM disks for the database, allocated as follows:
  - 1 x 2+P RAID5 storage device for the database home file system
  - 2 x 2+P RAID5 storage devices for database data, temporary space, and logging

Data unit

The data unit houses one or more shared-nothing database partitions. Testing determined that the recommended number of database partitions per server is 8 or fewer. For this reference architecture, the most economical use of the Power 570 server is to host 8 database partitions. Individual data rows in tables are distributed across database partitions using a common hash partitioning scheme. The fact that the database tables have been split into multiple database partitions to facilitate parallel processing is invisible to the application developer or query submitter. The data unit consists of a Power 570 server and an EMC CLARiiON CX4-960 storage device with the following specifications:
- 8 x database partitions for data
- 2 x dual-core POWER6 processors, 5.0GHz or faster
- 32GB memory
- 2 x internal 146GB, 15K RPM disks
- 2 x 2-port Gigabit Ethernet cards
- 2 x 2-port 4Gbps Fibre Channel HBA cards
- 1 x CX4-960 storage device with the following drives:
  - 5 x 300GB, 15K RPM disks for the internal CLARiiON vault and the operating system
  - 4 x 300GB, 15K RPM hot spares
  - 96 x 300GB, 15K RPM disk drives for data, allocated to the 8 database partitions as follows: 12 drives for each database partition, configured as 4 x 2+P RAID5 storage devices for the database path and data

Add-on units

The architecture allows additional optional units to be added to the solution. These units include:
- ETL (extract, transform, and load) unit: This unit can function as an additional coordinator connection node for ETL processing on established ETL servers, or it can host ETL software such as IBM Information Server.
- Application unit: This unit can host a variety of application software, such as the IBM InfoSphere Warehouse Enterprise Edition suite of products or the InfoSphere Warehouse Cubing Services engine.
- TSM unit: This unit hosts IBM Tivoli Storage Manager software for dedicated backup capability. It can be configured to support network-based backups by attaching tape drives directly, or LAN-free backups by attaching tape drives on the SAN.

SAN Fabric

Dedicated, redundant SAN switches are a required part of the solution. A 4Gbps SAN and Fibre Channel drives are specified throughout. The intent is to deliver the SAN as part of the storage subsystem. Because of the unique processing requirements of shared-nothing database parallelism, customers planning to implement this reference architecture are strongly advised to use the dedicated, private SAN that is delivered as part of the storage subsystem.

Ethernet Networking

This architecture uses 3 physical networks:
- FCM network (DB2 Fast Communication Manager): This network supports inter-partition communication within the parallel database and has the most stringent specifications. It must be hosted on a non-blocking switch so that no latency is introduced, because latency would significantly reduce the benefit of parallel processing. It is also highly recommended that this network support jumbo frames, which allow large amounts of data to be transferred more efficiently, and that it be a private network on dedicated switches. This network is link aggregated for performance and redundancy.
- Corporate network: The corporate network (sometimes called the external network) provides connectivity to the company's extended network. It too can carry significant amounts of data, from ETL feeds to outbound query results to backup flows. It is recommended that this network also be on a non-blocking switch and be link aggregated for performance and redundancy, giving high-performance connectivity to any server in the solution that needs to exchange significant amounts of data with the database. A patch panel connection can be used to connect to the regular company network for user and administrator access.
- HMC network: This network is the standard Power Series Hardware Management Console (HMC) network used to command and control the servers.

High Availability

High availability (HA) capability is available as an option. IBM Tivoli System Automation software is used to support high availability clustering. Note that the base configuration for each server is implemented with no single point of failure, and redundant network and SAN switching is also specified. Two types of HA configurations are supported and can be used throughout the solution to support failover in the event of a server failure:
1. Mutual Failover: In this configuration, each pair of data and/or administration units is grouped into a high availability cluster in which each server can fail over to the other. The following requirements must be met:
   a. IBM Tivoli System Automation for Multiplatforms
   b. Two Fibre Channel SAN switches
   c. An even number of servers to be clustered
2. HA Group: In this configuration, the servers are grouped into high availability groups (HA groups) of up to 10 servers (9 administration and/or data units and 1 standby unit). This configuration has the following requirements:
   a. IBM Tivoli System Automation for Multiplatforms
   b. Fibre Channel SAN switches: 2 switches for each cluster of 10 or fewer servers in HA group(s)
   c. One standby server per HA group, configured identically to the server component in a data unit
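The sizing rules for the two HA options can be captured in a short Python sketch. The helper names are hypothetical; the code only encodes the limits stated above (up to 9 active units plus 1 standby per HA group, 2 SAN switches per cluster of 10 or fewer servers, and an even server count for mutual failover) and assumes each HA group is cabled as its own switched cluster.

```python
import math
from dataclasses import dataclass

MAX_ACTIVE_PER_GROUP = 9    # administration and/or data units per HA group
STANDBY_PER_GROUP = 1       # one standby server per HA group
SWITCHES_PER_CLUSTER = 2    # redundant Fibre Channel SAN switches


@dataclass(frozen=True)
class HAGroupPlan:
    groups: int
    standby_servers: int
    total_servers: int
    san_switches: int


def plan_ha_groups(active_servers: int) -> HAGroupPlan:
    """Size the HA Group option for a given number of active units (hypothetical helper)."""
    if active_servers < 1:
        raise ValueError("at least one administration or data unit is required")
    groups = math.ceil(active_servers / MAX_ACTIVE_PER_GROUP)
    standby = groups * STANDBY_PER_GROUP
    # Assumption: each HA group (at most 10 servers) forms its own switched cluster.
    return HAGroupPlan(groups, standby, active_servers + standby,
                       groups * SWITCHES_PER_CLUSTER)


def plan_mutual_failover(servers: int) -> int:
    """Return the number of failover pairs; the text requires an even server count."""
    if servers < 2 or servers % 2:
        raise ValueError("mutual failover needs a positive, even number of servers")
    return servers // 2


if __name__ == "__main__":
    # One administration unit plus four data units, protected by an HA group.
    print(plan_ha_groups(5))
    # Four data units protected as two mutual-failover pairs.
    print(plan_mutual_failover(4))
```

With 10 active units, for example, the sketch yields two HA groups, two standby servers, and four SAN switches, because a single group cannot exceed 10 servers.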

Storage Design

Why CLARiiON and Narrow RAID

IBM experience with customers has shown that modular, mid-range storage is the most effective storage class to support the DB2 parallel processing model, and that it does so efficiently and cost-effectively. EMC also targets the CLARiiON storage device for data warehousing and continues to develop the CLARiiON product to support the requisite I/O workloads. This reference architecture makes use of the latest EMC technology by incorporating products from the new CX4 storage line.

The CLARiiON CX4-240 and CX4-960 configurations were thoroughly tested to verify that they satisfy the database's MB/s requirements for each database partition, simultaneously across multiple database partitions, while also meeting IOPS requirements. The configurations also provide enough usable space to store enough database objects that the resulting database workload fully utilizes both the storage subsystem and the server resources.

A narrow (small) RAID5 device size of 2 data disks + 1 parity disk was chosen because, in the EMC CLARiiON architecture, it best supports DB2 I/O requirements for processing data warehousing and business intelligence workloads. Because of DB2 data retrieval techniques, the I/O subsystem must support not only synchronous page reads and high IOPS rates for index processing, but also asynchronous read ahead for DB2 list prefetch, sequential detection, and table space scans. This RAID device design on CLARiiON storage also provides the right balance of loop and controller loading and an economical number of disk drives.

By employing the narrow RAID design, the patented EMC CLARiiON parity rotation scheme was leveraged to keep synchronous page reads small while also allowing asynchronous read ahead to be processed without additional head seek time. The EMC CLARiiON CX4 line of storage devices uses a fixed-size 64KB stripe element. However, the system uses a unique scheme of rotating the parity between the drives after multiple full stripes. Host data reads that involve multiple full stripes' worth of data are detected and coalesced efficiently by the storage system into reads that pull multiple consecutive 64KB pieces from each drive as a single request to the drive. Therefore, unlike with other storage systems, there is no need to adjust the stripe element size when creating a striped LUN, because both large and small reads are efficiently supported. After the head seek is complete, this architecture can read as much data as needed to satisfy the host request, which maximizes the sustained MB/s read rate from the drives.

This 2+P configuration supports the DB2 I/O configuration parameters commonly used for data warehousing, including a 16KB page size, a 256KB extent size, and a DB2_PARALLEL_IO setting of 4. Additional information is available on the calculations used to determine these settings.

Denser 300GB Fibre Channel drives were chosen to enhance performance and to decrease the number of drives required. Packing the data tightly onto bigger drives enhances performance by improving the MB/s read rate from each drive, because more bits per second rotate under the read head. Additionally, power consumption is reduced by using fewer, denser drives.
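Why rotating parity only after several full stripes helps large reads can be shown with a simplified Python model of a 2+P RAID5 device with 64KB stripe elements. The rotation interval (`rotate_every`) and the drive ordering are assumptions for illustration, and the model is not EMC's patented layout. It maps a host read onto per-drive requests and merges pieces that are contiguous on the same drive, which mirrors the coalescing described above.

```python
KB = 1024
ELEMENT = 64 * KB                    # fixed CX4 stripe element size
DATA_DISKS = 2                       # 2+P RAID5: two data drives ...
DRIVES = DATA_DISKS + 1              # ... plus one parity drive per stripe
FULL_STRIPE = ELEMENT * DATA_DISKS   # 128KB of user data per full stripe


def locate(offset: int, rotate_every: int) -> tuple[int, int]:
    """Map a LUN byte offset to (drive, byte offset on that drive).

    Simplified model: parity moves to the next drive every `rotate_every`
    full stripes; the data drives of a stripe are the other two, in order.
    """
    stripe, within = divmod(offset, FULL_STRIPE)
    element, element_offset = divmod(within, ELEMENT)
    parity_drive = (stripe // rotate_every) % DRIVES
    data_drives = [d for d in range(DRIVES) if d != parity_drive]
    return data_drives[element], stripe * ELEMENT + element_offset


def drive_requests(offset: int, length: int, rotate_every: int) -> dict[int, list[list[int]]]:
    """Split a host read into per-drive [start, end) ranges, merging
    pieces that are contiguous on the same drive."""
    requests: dict[int, list[list[int]]] = {}
    pos, end = offset, offset + length
    while pos < end:
        chunk = min(ELEMENT - pos % ELEMENT, end - pos)  # stop at an element boundary
        drive, drive_offset = locate(pos, rotate_every)
        ranges = requests.setdefault(drive, [])
        if ranges and ranges[-1][1] == drive_offset:
            ranges[-1][1] += chunk          # contiguous on this drive: coalesce
        else:
            ranges.append([drive_offset, drive_offset + chunk])
        pos += chunk
    return requests


if __name__ == "__main__":
    extent = 256 * KB   # one DB2 extent with the settings in the text
    for rotate_every in (1, 8):
        reqs = drive_requests(0, extent, rotate_every)
        count = sum(len(r) for r in reqs.values())
        print(f"parity rotates every {rotate_every} stripe(s): {count} drive requests")
```

With the settings in the text, a 256KB extent holds 16 pages of 16KB and spans exactly two 128KB full stripes. When parity stays on the same drive across those stripes, the model issues one 128KB request to each data drive (2 requests); rotating parity on every stripe fragments the same read into 3 smaller requests.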

Storage Array Layouts

The following diagrams illustrate the storage layouts used.

CX4-240 configuration supporting only the administration unit: In this configuration, only the administration unit is supported, in a single disk-array enclosure (DAE) tray.

CX4-240 configuration supporting the administration unit and one additional DAE tray: This configuration supports the administration unit plus one additional tray for the Performance Expert software.

CX4-960 configuration supporting the data unit: This configuration supplies storage for a single data unit and is repeated for every data unit in the data warehouse solution. Ninety-six data drives arranged in thirty-two 2+P RAID5 devices, each with 530GB of usable space, provide the data unit with sufficient I/O for warehouse and business intelligence workloads and 16,960GB of usable space.
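The drive counts and usable capacity quoted for the three array layouts can be cross-checked with a few lines of Python. The 15-drive capacity per DAE tray is an assumption about CX4 enclosures, not a figure from this guide; the per-device usable space (530GB) and all drive counts come from the unit specifications above.

```python
import math
from dataclasses import dataclass

RAID_WIDTH = 3               # 2+P RAID5: 2 data drives + 1 parity drive
DAE_SLOTS = 15               # assumption: drive slots per CX4 disk-array enclosure
USABLE_GB_PER_DEVICE = 530   # usable space per 2+P device, from the text
PARTITIONS_PER_DATA_UNIT = 8


@dataclass(frozen=True)
class ArrayLayout:
    name: str
    vault_drives: int    # CLARiiON vault and operating system
    hot_spares: int
    raid5_devices: int   # number of 2+P RAID5 storage devices

    @property
    def drives(self) -> int:
        return self.vault_drives + self.hot_spares + self.raid5_devices * RAID_WIDTH

    @property
    def dae_trays(self) -> int:
        return math.ceil(self.drives / DAE_SLOTS)


ADMIN = ArrayLayout("CX4-240, administration unit only", 5, 1, 3)
ADMIN_PLUS_MGMT = ArrayLayout("CX4-240, administration + management", 5, 2, 6)
DATA_UNIT = ArrayLayout("CX4-960, one data unit", 5, 4, 32)


def usable_gb(data_units: int) -> int:
    """Usable space for a warehouse with the given number of data units."""
    return data_units * DATA_UNIT.raid5_devices * USABLE_GB_PER_DEVICE


if __name__ == "__main__":
    for layout in (ADMIN, ADMIN_PLUS_MGMT, DATA_UNIT):
        print(f"{layout.name}: {layout.drives} drives, {layout.dae_trays} DAE tray(s)")
    per_partition = DATA_UNIT.raid5_devices // PARTITIONS_PER_DATA_UNIT
    print(f"RAID5 devices per database partition: {per_partition}")
    print(f"Usable space, 2 data units: {usable_gb(2):,} GB")
```

Under the 15-slot assumption, the administration layout fills exactly one tray (15 drives) and the Performance Expert drives need a second one (25 drives total), consistent with the two CX4-240 layouts. The data unit works out to 4 devices (12 drives) per database partition and 32 x 530GB = 16,960GB per data unit.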

Racking

The architecture calls for an integrated racking design to reduce floor space, cabling, and complexity. It also simplifies expansion of the data warehouse solution by adding data units, each of which is contained in a single EMC 40U rack. The racking design also helps to simplify the delivery of the components from both companies. IBM Power servers and components are mounted in standard EMC CLARiiON racks using a defined railing scheme.

The following diagram represents a configuration consisting of an administration unit, a management unit, two data units, and drive space for optional CX4-240 storage for two other optional units, including the Performance Expert software on the management unit.

Capacities

The InfoSphere Warehouse with Power Systems and EMC CLARiiON has the following capacities:
- Usable space per data unit: 16,960GB
- Approximate raw user data per data unit: 2TB or greater, depending on user workload

Additional Resources

To learn more, follow these links or contact these individuals.

For IBM components:
- Power 570: http://www-03.ibm.com/systems/power/hardware/570/index.html
- Power 550: http://www-03.ibm.com/systems/power/hardware/550/index.html
- InfoSphere Warehouse: http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/index.jsp

For EMC components:
- CX4-240: http://www.emc.com/products/detail/hardware/clariion-cx4-model-240.htm
- CX4-960: http://www.emc.com/products/detail/hardware/clariion-cx4-model-960.htm
- EMC Navisphere Management Suite: http://www.emc.com/products/detail/software/navisphere-management-suite.htm
- EMC PowerPath: http://www.emc.com/products/detail/software/powerpath-multipathing.htm

Who to contact to learn more:

EMC (508-435-1000)
- Sales: Bruce Brinson, IBM Technology Alliances Manager, brinson_bruce@emc.com
- Services / Delivery: Bruce Kreis, Global Practice Manager, CLARiiON, kreis_bruce@emc.com

IBM
- Sales: Contact your IBM InfoSphere sales representative.

Notices

IBM Corporation and EMC Corporation 2010.

The information contained in this publication is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this publication, THIS PUBLICATION IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. This information could include technical inaccuracies or typographical errors. The information and the products or programs described in this publication are subject to change without notice. Neither IBM nor EMC shall be responsible for any damages arising out of the use of, or otherwise related to, this publication or any other materials. Nothing contained in this publication is intended to, nor shall have the effect of, creating any warranties or representations from the Authors or their suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.

References in this publication to IBM or EMC products, programs, or services do not imply that they will be available in all countries in which IBM or EMC operate. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM or EMC's sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth, savings or other results.

Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment.

Trademarks

IBM, the IBM logo, ibm.com, AIX, DB2, InfoSphere, Power Series, POWER6, and Tivoli are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.

EMC, CLARiiON, Navisphere, and PowerPath are either registered trademarks or trademarks of EMC Corporation in the United States and/or other countries. Other company, product, or service names may be trademarks or service marks of others.

Part number h6918