Software 2011
Netezza: The Analytics Appliance
Michael Eden, Information Management Brand Executive, Central & Eastern Europe
Vilnius, 18 October 2011
Information Management
2011 IBM Corporation

Thought for the Day
Would you still use Google if it took 3 days and 7 people to get a search result?

- Days for a single query
- Constant tuning
- Specialized resources required
- Months to deploy
"Nearly 70% of data warehouses experience performance-constrained issues of various types." - Gartner 2010 Magic Quadrant

Traditional data warehouses are just too complex
They are based on databases optimized for transaction processing, NOT to meet the demands of advanced analytics on big data.
- Too complex an infrastructure
- Too inefficient at analytics
- Too complicated to deploy
- Too many people needed to maintain
- Too much tuning required
- Too costly to operate
- Too long to get answers

IBM Netezza's revolutionary approach: The Appliance
Simpler, faster, more accessible analytics
"This is what IBM Netezza has done in the data warehousing market: It has totally changed the way we think about data warehousing." - Philip Howard, Bloor Research

TwinFin: The true data warehousing appliance
- Purpose-built analytics engine
- Integrated database, server and storage
- Standard interfaces
- Low total cost of ownership
Speed: 10-100x faster than traditional systems
Simplicity: Minimal administration and tuning
Scalability: Peta-scale user data capacity
Smart: High-performance advanced analytics

Traditional Data Warehouse Complexity
Data Warehousing Simplified
Inside the TwinFin
- Optimized Hardware + Software: Purpose-built for high performance analytics; requires no tuning
- True MPP: All processors fully utilized for maximum speed and efficiency
- Streaming Data: Hardware-based query acceleration for blistering-fast results
- Deep Analytics: Complex analytics executed in-database for deeper insights

Legacy Solution: Move Data to Query
Resulting in Significant I/O Bottlenecks
[Diagram: clients (AIX, Solaris, HP-UX, Windows, Linux), source systems, ETL server, DBA CLI and 3rd-party apps send SQL to a database server; data travels from storage to the server through repeated I/O and cache stages; a high-performance loader feeds data in.]

The Netezza Approach: Asymmetric Massively Parallel Processing
Move the Query to the Data to eliminate I/O limitations
[Diagram: Netezza TwinFin Appliance]
- Clients: AIX, Solaris, HP-UX, Windows, Linux; source systems; ETL server; DBA CLI; 3rd-party apps; high-performance loader
- Interfaces: ODBC 3.X, JDBC Type 4, OLE-DB, SQL/92
- Front End (SMP Host, DBOS): SQL Compiler, Query Plan, Optimize, Admin, Execution Engine, High-Speed Loader/Unloader
- Network Fabric
- Massively Parallel Intelligent Storage: S-Blades (labeled 1, 2, 3 ... 960), each with a processor and streaming DB logic
- High-Performance Database Engine: streaming joins, aggregations, sorts

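The "move the query to the data" idea can be illustrated with a small sketch. This is not Netezza code: the function names `slice_partial_sum` and `host_merge`, the sample rows, and the filter are all hypothetical. It only shows the general pattern in which each data slice filters and pre-aggregates its own rows, so that the host combines small partial results instead of pulling every row across the network.

```python
from collections import Counter

def slice_partial_sum(rows, predicate):
    """Hypothetical per-slice step: filter and pre-aggregate locally."""
    partial = Counter()
    for region, amount in rows:
        if predicate(region, amount):
            partial[region] += amount
    return partial

def host_merge(partials):
    """Hypothetical host step: combine the small per-slice results."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds the values per key
    return total

# Three made-up data slices, each holding (region, amount) rows.
slices = [
    [("north", 10), ("south", 5)],
    [("north", 7), ("east", 3)],
    [("south", 1), ("east", 20)],
]

# Every slice runs the same query fragment on its own data.
partials = [slice_partial_sum(rows, lambda region, amount: amount >= 3)
            for rows in slices]
result = host_merge(partials)
print(dict(result))
```

Only one small dictionary per slice reaches the host, regardless of how many rows each slice scanned.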
S-Blade Data Stream Processing
Move the Query to the Data
FPGA Core: Compression Engine; Project; Restrict, Visibility
CPU Core: Complex Joins, Aggs, etc.
- 96 S-Blade data processing streams per cabinet
- IBM Netezza processes DB functions in hardware, with compression boosting performance up to 4x
- Data I/O is reduced by 95%-98%

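A toy software analogy of the stream stages named on this slide (decompress, restrict, project) may help show why filtering early cuts the volume passed downstream. Everything here is an assumption made for illustration: the helper names `store_block` and `stream_block`, the use of `zlib` and JSON, and the sample table. The real work happens in FPGA hardware, and the toy numbers are arbitrary and say nothing about the 95%-98% figure above.

```python
import json
import zlib

def store_block(rows):
    """Hypothetical storage step: keep the rows compressed on 'disk'."""
    return zlib.compress(json.dumps(rows).encode("utf-8"))

def stream_block(block, columns, keep):
    """Decompress a block, then restrict rows and project columns."""
    rows = json.loads(zlib.decompress(block).decode("utf-8"))
    for row in rows:
        if keep(row):                                    # restrict
            yield {name: row[name] for name in columns}  # project

# Made-up table: 100 rows with 5 columns each.
table = [
    {"id": i, "region": "north" if i % 10 == 0 else "south",
     "amount": i * 2, "note": "n/a", "flag": False}
    for i in range(100)
]

block = store_block(table)
out = list(stream_block(block, ["id", "amount"],
                        lambda row: row["region"] == "north"))

values_in = len(table) * len(table[0])
values_out = sum(len(row) for row in out)
print(f"passed downstream: {values_out} of {values_in} values")
```

In this sketch the restrict step runs before the projection so that the filter can still see columns that are not returned.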
Advanced Analytics the Traditional Way
[Diagram: data is moved by ETL and SQL from the Data Warehouse to separate analytics environments: SAS; Analytics Grid; R, S+; C/C++, Java, Python, Fortran. Example workloads: Demand Forecasting, Fraud Detection.]

Advanced Analytics with TwinFin
[Diagram, transition view: the traditional layout (Data Warehouse, ETL and SQL flows, Analytics Grid; SAS; R, S+; C/C++, Java, Python, Fortran; Demand Forecasting, Fraud Detection) before consolidation.]

Advanced Analytics with TwinFin
[Diagram: SAS; SQL; R, S+; C/C++, Java, Python, Fortran all run against TwinFin. Example workloads: Demand Forecasting, Fraud Detection.]

In-Database Analytics: Moving application work In-Database
1. Replace a traditional database with Netezza for serving up SAS datasets
   o 50 times faster at Endo Pharma
2. Recode portions of SAS Procedures to Netezza SQL
   o 22 hours on Oracle/IBM RS6000 -> 9 minutes on Netezza
3. Fully embedded partner applications
   SAS Scoring Accelerator for Netezza
   o 270 times faster at Catalina
   o http://www.sas.com/news/preleases/catalinanetezza.html
   Fuzzy Logix C++ Regression Models & Algorithm Library
   o Sears Market Basket Analysis 5 minutes at Catalina
   o Provider Scoring at Humana
   o Six weeks (25 SAS jobs on Oracle) -> 28 minutes (Fuzzy Logix/Netezza)
4. Extension of code support beyond C/C++: Java, Python, Fortran, R, MapReduce, Hadoop

The IBM Netezza TwinFin Appliance
- Disk Enclosures: slice of user data; swap and mirror partitions; high-speed data streaming
- SMP Hosts: SQL Compiler, Query Plan, Optimize, Admin
- S-Blades (with FPGA-based Database Accelerator): processor & streaming DB logic; high-performance database engine for streaming joins, aggregations, sorts, etc.

IBM Netezza S-Blade
Operates as a logical unit of 1:
- 1 Disk
- 1 CPU Core
- 1 Field Programmable Gate Array (FPGA) Core

IBM Netezza TwinFin Appliance Scalability

Model                       TF3   TF6  TF12  TF18  TF24  TF36  TF48  TF72  TF96 TF120
Cabinets                    1/4   1/2     1   1.5     2     3     4     6     8    10
Processing Units             24    48    96   144   192   288   384   576   768   960
Capacity (TB)                 8    16    32    48    64    96   128   192   256   320
Effective Capacity (TB)*     32    64   128   192   256   384   512   768  1024  1280

Predictable, Linear Scalability throughout entire family
Capacity = User Data space
Effective Capacity = User Data space with compression
*: 4X compression assumed

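The linear relationship in the scalability figures can be checked with a short sketch. The per-cabinet constants below are read off the full-cabinet TF12 column (96 processing units, 32 TB) and the stated 4X compression assumption; the function name `twinfin_sizing` is hypothetical and exists only for this illustration.

```python
from fractions import Fraction

# Values taken from the one-cabinet (TF12) column of the slide.
UNITS_PER_CABINET = 96
CAPACITY_TB_PER_CABINET = 32
ASSUMED_COMPRESSION = 4  # the slide's "4X compression assumed"

def twinfin_sizing(cabinets):
    """Return (processing units, capacity TB, effective capacity TB)."""
    cabinets = Fraction(cabinets)  # accepts 10, "1/4", "3/2", ...
    units = UNITS_PER_CABINET * cabinets
    capacity = CAPACITY_TB_PER_CABINET * cabinets
    effective = capacity * ASSUMED_COMPRESSION
    return int(units), int(capacity), int(effective)

models = {"TF3": "1/4", "TF6": "1/2", "TF12": 1, "TF18": "3/2", "TF24": 2,
          "TF36": 3, "TF48": 4, "TF72": 6, "TF96": 8, "TF120": 10}

for name, cabinets in models.items():
    print(name, twinfin_sizing(cabinets))
```

Each printed tuple should match the corresponding column of the slide, which is what "linear scalability throughout the family" amounts to numerically.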
IBM Netezza = Simplicity
- NO INDEXES: saves disk space, allows maximum flexibility
- NO dbspace/tablespace sizing and configuration
- NO redo/physical log sizing and configuration
- NO journaling/logical log sizing and configuration
- NO page/block sizing and configuration for tables
- NO extent sizing and configuration for tables
- NO temp space allocation and monitoring
- NO RAID level decisions for dbspaces
- NO logical volume creations of files
- NO integration of OS kernel recommendations
- NO maintenance of OS recommended patch levels
- NO JAD sessions to configure host/network/storage
- Simple partitioning strategies: HASH or ROUND ROBIN
Benefits
Instead of spending time and effort on tedious DBA tasks, use the time for higher BUSINESS VALUE tasks:
- Bring on new applications and groups
- Quickly build out new data marts
- Provide more functionality to your end users

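The two partitioning strategies named above, HASH and ROUND ROBIN, can be sketched in a few lines. This is a generic illustration rather than the appliance's implementation: the function names, the use of `zlib.crc32` as the hash, and the sample call records are all assumptions. Round robin spreads rows evenly regardless of content, while hashing on a key keeps all rows with the same key value on the same slice.

```python
import zlib

def distribute_round_robin(rows, n_slices):
    """Deal rows out to slices in turn, like cards."""
    slices = [[] for _ in range(n_slices)]
    for i, row in enumerate(rows):
        slices[i % n_slices].append(row)
    return slices

def distribute_hash(rows, key, n_slices):
    """Place each row on the slice chosen by a hash of its key column.

    crc32 is an arbitrary stand-in; the real hash function is not
    described in the slides.
    """
    slices = [[] for _ in range(n_slices)]
    for row in rows:
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        slices[digest % n_slices].append(row)
    return slices

# Made-up call records for five customers.
calls = [{"customer": i % 5, "call": i} for i in range(12)]

rr = distribute_round_robin(calls, 4)
hashed = distribute_hash(calls, "customer", 4)

print([len(s) for s in rr])      # even spread
print([len(s) for s in hashed])  # spread depends on the key values
```

Hash distribution helps when rows are later joined or grouped on the same key, because matching rows are already co-located; round robin is the simpler choice when no such key stands out.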
IBM Netezza Simplicity and Effects on TCO
Telco Service Provider Deployment
Telecom Call Detail Record FACT (6 billion rows)

Object                          Oracle Object Count*   IBM Netezza Object Count
Tables                                             1                          1
Indexes                                           12                          -
Table Partitions                                  47                          -
Index Partitions                                 564                          -
Table Partitions tablespaces                      47                          -
Index Partitions tablespaces                      47                          -
Table Data Files                                 170                          -
Index Data Files                                 122                          -
TOTAL                                          1,010                          1

Look at all the weeks/months' worth of effort, DBA design and maintenance that we don't have with IBM Netezza. The appliance claims are true.
*: Oracle data does not account for ADDITIONAL effort required in configuring and engineering the file system design to accommodate this index management scheme.

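As a quick arithmetic check, the Oracle object counts listed on this slide can be summed to confirm the stated total. The dictionary below simply transcribes the slide's figures; the variable names are illustrative.

```python
# Object counts as listed on the slide for the Oracle deployment.
oracle_objects = {
    "Tables": 1,
    "Indexes": 12,
    "Table Partitions": 47,
    "Index Partitions": 564,
    "Table Partitions tablespaces": 47,
    "Index Partitions tablespaces": 47,
    "Table Data Files": 170,
    "Index Data Files": 122,
}

# The slide lists a single object (one table) for IBM Netezza.
netezza_objects = {"Tables": 1}

oracle_total = sum(oracle_objects.values())
netezza_total = sum(netezza_objects.values())
print(oracle_total, netezza_total)
```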
Time to Deployment
Time to Deployment Comparison
- Traditional DW Approach (Traditional Analytics Solution): Months
- IBM Netezza Approach (IBM Netezza TwinFin): Days or Weeks
Phases shown: Planning & Installation; Preparation & Initialization; Final Testing & Production

Seamless Integration with Enterprise Ecosystem
Certified interoperability with leading applications and tools
- Data Integration
- Business Intelligence and Analytics
- Advanced Analytics and Data Mining

IBM Netezza Analytics Appliance = Value
Network Scale Performance
- Predictable, Linear Scalability to meet growing network and data analytics demands
- 10-100x faster than other solutions; enables speed-of-thought analysis
- Extreme Performance for any Ad Hoc query, Predictive Modeling, What-if Analysis
Appliance Simplicity
- Simple Installation and Operation
- No Configuration, No Tuning, No Indexing
- Perfect Fit as Embedded Technology Component within Partner Solution
Architectural Flexibility
- Performance adapts to changing business models: ad hoc ability to question everything
- Avoids brittle architecture that comes with physical database design
- Flexible, Extensive Workload Management Capabilities

Retail Success: Merchandising & Planning
Telco Success: Subscriber Data Mgmt
Digital Media Success: Consumer Insight
Financial Services Success: Risk & Credit Management
Health & Life Sciences Success: Customer Intelligence for Providers
Government Success: Smart Grid
IBM Netezza is part of IBM Information Management
[Diagram labels]
- Sources: Transactional & Collaborative Applications; Business Analytic Applications; External Information Sources; www; Big Data; Structured Data; Streams
- Capabilities: Integrate; Analyze; Manage; Govern
- Manage: Master Data; Data; Content; Data Warehouses; Data Warehouse Appliances; Streaming Information
- Govern: Quality; Lifecycle Management; Security & Privacy