SAP IQ - Business Intelligence and vertical data processing with 8 GB RAM or less


SAP IQ - Business Intelligence and vertical data processing with 8 GB RAM or less
Dipl.-Inform. Volker Stöffler, Volker.Stoeffler@DB-TecKnowledgy.info
Public

Agenda
- Introduction: What is SAP IQ - in a nutshell; Architecture, Idea, Background
- Exercise: Create a database and database objects
- What makes SAP IQ eligible for Big Data Scenarios; (Un-)Limits, Scalability Aspects
- Exercise: Populate the database using bulk load
- Ad-hoc Queries: What IQ is good at
- Exercise: Run predefined or own queries against your database

Learning Objective
After completing this session, you will be able to:
- Recognize the benefits of data compression mechanisms in Big Data scenarios
- Describe how ad-hoc queries against raw fact data give you the flexibility to evaluate the data along exactly the dimensions you want, now
- Match evaluation patterns against the data structures offered by SAP IQ

What is SAP IQ - in a nutshell
Architecture, Idea, Background

Real-Time Evaluation on Very Large Tables with SAP IQ
SAP IQ is a pure-bred Data Warehouse engine designed for Very Large Databases.
- Like SAP HANA, it uses a columnar data store. Unlike SAP HANA, it stores data on disk and uses RAM to cache parts of it.
- Data compression multiplies the range of storage resources: dictionary compression for repeating column values, storage compression for all data structures. Storage required for data can be 30%-80% less than in a traditional RDBMS.
- SAP IQ integrates seamlessly with core components of the Big Data ecosystem: SAP HANA via Smart Data Access / Extended Storage; Hadoop via Component Integration Service or Table User Defined Functions.

SAP IQ Terms
- Columnar Data Store: in a traditional (OLTP-style) RDBMS, the various column values of a data row are stored together, making access to a single complete row very efficient. In a columnar data store, the column values of many rows are stored together; a row is distributed over various column vectors.
- Row ID: since a row does not exist as a contiguous memory entity, it exists as a Row ID indicating its position in the various column vectors.
- Cardinality: the number of unique / distinct values in a column.
- Optimized Fast Projection: the SAP IQ term for dictionary compression.
- Bitmap Index: since a row exists as a Row ID only, columns of low cardinality can be reflected as (usually sparsely populated) bitmaps where each bit represents one row. There is one bitmap per unique value; a set bit indicates a row with that value.
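The bitmap-index idea can be sketched in a few lines of Python. This is a toy model for illustration only (SAP IQ's on-disk structures are compressed and far more elaborate): one bitmap per distinct value, with bit i standing for Row ID i, so a value predicate becomes a single bitwise operation.

```python
from collections import defaultdict

def build_bitmap_index(column_values):
    """Return {value: bitmap}; bit i of a bitmap is set when row i holds that value."""
    index = defaultdict(int)
    for row_id, value in enumerate(column_values):
        index[value] |= 1 << row_id
    return dict(index)

status = ["current", "historic", "pending", "current", "current", "historic"]
idx = build_bitmap_index(status)

# Rows with status 'current' OR 'historic': a single bitwise OR of two bitmaps.
mask = idx["current"] | idx["historic"]
matching_rows = [i for i in range(len(status)) if mask >> i & 1]
print(matching_rows)  # [0, 1, 3, 4, 5]
```

Note how the OR of two value predicates never touches the row data at all; this is the property the showcase later in the deck exploits.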

SAP IQ for Big Data Scenarios
What makes SAP IQ eligible for Big Data Scenarios; (Un-)Limits, Scalability Aspects

Data Acquisition - Big Data
Data is acquired through bulk mechanisms:
- fast: SAP IQ holds the Guinness World Record of 34.3 TB / hour (2014)
- scalable: parallel processing of load data streams
- cost efficient: runs on standard hardware
- versatile: IQ can load from a wide variety of data sources, including leading RDBMSs and Hadoop

Procedure: SAP IQ Data Acquisition
Incoming data (row oriented: tabular result set or data file / data pipe) passes through:
- Transformation to vertical (columnar) layout
- Dictionary compression (where applicable)
- Storage compression as data is written to disk
- Maintenance of auxiliary indexes (incremental or non-incremental)
(In the original slide, green blocks mark the steps eligible for massively parallel execution.)
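The first step, transformation to vertical, is plain transposition: row-oriented input becomes one vector per column. A minimal, illustrative Python sketch (not IQ's implementation; the sample rows are made up):

```python
def to_columnar(rows, column_names):
    """Transpose row-oriented tuples into one vector per column."""
    vectors = {name: [] for name in column_names}
    for row in rows:
        for name, value in zip(column_names, row):
            vectors[name].append(value)
    return vectors

rows = [("IQ", "Steve", "TX"), ("ASE", "Bill", "OK"), ("ESP", "Tom", "MA")]
cols = to_columnar(rows, ["Product", "Rep", "State"])
print(cols["State"])  # ['TX', 'OK', 'MA']
```

Each column vector can be built and compressed independently of the others, which is why these steps lend themselves to massively parallel execution.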

Optimized Fast Projection - Dictionary Compression
- Eligible columns have a metadata lookup table: each distinct value is represented once in the lookup table, and each column value is stored as its position in the lookup table.
- Lookup table size depends on the column data type and cardinality: the number of rows in the lookup table equals the cardinality, and the lookup table row size is calculated from the column data type. Works up to a cardinality of 2^31 - 1 (2,147,483,647).
- Column vector size depends on the number of rows and the column cardinality: each column value is represented by as few bits as required to store the cardinality in binary. E.g. a column with a cardinality of 9..16 requires 4 bits / row; a column with a cardinality of 513..1024 requires 10 bits / row.
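The bit-width arithmetic above can be checked with a small sketch of dictionary encoding. All names here are illustrative; SAP IQ's actual Optimized Fast Projection structures are internal:

```python
import math

def dictionary_encode(values):
    """Store each distinct value once in a lookup table; the column vector
    holds only positions, using as few bits per row as the cardinality requires."""
    lookup = sorted(set(values))                      # one row per distinct value
    position = {v: i for i, v in enumerate(lookup)}
    vector = [position[v] for v in values]            # positions, not values
    bits = max(1, math.ceil(math.log2(len(lookup))))  # bits per row
    return lookup, vector, bits

lookup, vector, bits = dictionary_encode(["DE", "FR", "DE", "AT", "DE", "FR"])
print(lookup, vector, bits)  # ['AT', 'DE', 'FR'] [1, 2, 1, 0, 1, 2] 2
```

This reproduces the figures in the text: a cardinality of 16 yields 4 bits per row, and a cardinality of 1024 yields 10.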

Data Storage - Big Data
- SAP IQ can maintain as many containers (files / raw devices) as the OS allows, each up to 4 TB in size.
- Life Cycle: SAP IQ can organize the database for different kinds of storage, reflecting data life cycle or temperature.
- Compression: raw data size is typically reduced by 30-70%.
- cost efficient: data compression reduces the disk footprint.
- integrated: SAP IQ can integrate with HANA, to hold data no longer hot enough for in-memory, and with Hadoop, to age out data that is even colder.

SAP IQ Storage - (Un-)Limitations
- Maximum database size: number of files times the maximum file size the OS allows; the maximum file size supported by IQ is 4 TB.
- Organized as DBSpaces consisting of up to 2000 files.
- Up to 2^48 - 1 rows per table (a 15-digit decimal number); table size is only limited by database size. Special declarations are required to extend a table beyond the size of a DBSpace.
- Up to 2^32 indexes per table.
- Up to 45000 columns per table (recommended limit: 10000).

Big Data Specific Features - Very Large Database Option
- Semantic Partitioning: can control data location by data values
- Read-Only DBSpaces: when fully populated with archive data, DBSpaces can be declared read-only and excluded from full backups
- I/O Striping: tables can be distributed over multiple devices by column and / or by partition
- I/O Striping: auxiliary indexes can be separated from raw data
- Data Aging: tables or partitions (through semantic partitioning) with cold data can be assigned to cheaper storage

Background - What are we doing: Storage Containers
(Diagram: Catalog Store, Temp. Store, System Main Store, and two User Data Stores.)

Background - What are we doing: System Storage Containers
First, we create the Catalog Store, System Main Store and Temporary Store (0CreateDB.SQL).

Background - What are we doing: System Storage Containers
First, we create the Catalog Store, System Main Store and Temporary Store (0CreateDB.SQL).
- Catalog Store: database '...\FlightStats.db' - the database handle; one file system file (accompanied by a .log); holds system tables; grows on demand.
- System Main Store: IQ path '...\FlightStatsMain.IQ' - one or multiple file system files or raw devices; holds system data; current size and optionally reserved size for later extension are specified.
- Temp Store: temporary path '...\FlightStatsTemp_00.IQ' - one or multiple file system files or raw devices; holds temporary data (work tables, temporary tables, processing data); current size and optionally reserved size for later extension are specified.

Background - What are we doing: User Storage Containers
Next, we create a User Data Store (1AdjustExtendDB.SQL).

Background - What are we doing: Create Tables and Indexes
Then, we create Tables and Indexes (2TablesIndexes.SQL).
- Table: create table FlightsOnTime - standard SQL, except for the iq unique clause (used here to bypass dictionary compression).
- Indexes: various index types; several can apply to the same column:
  - LF (Low Fast) for low-cardinality columns
  - HNG (High Non Group) for parallel calculation of totals and averages
  - DATE for the low-cardinality elements of date values
  - more types and details to follow

Ad-hoc Queries: What IQ is good at

Real-Time Evaluation on Very Large Tables with SAP IQ

Product  Acct. Rep  State  Year  Quarter  Revenue
IQ       Steve      TX     2013  1        600
ASE      Bill       OK     2013  1        515
ESP      Tom        MA     2013  1        780
HANA     Steve      AZ     2013  1        340
HANA     Tom        NJ     2013  1        375
IQ       Tom        PH     2013  1        410
ESP      Greg       CA     2013  1        875
HANA     Steve      TX     2013  1        724
IQ       Bill       CO     2013  2        415
ESP      Steve      TX     2013  2        655
HANA     Bill       UT     2013  2        820
HANA     Tom        NH     2013  2        570


Data Processing - Big Data
- The Columnar Data Store allows evaluation of very large numbers of rows; irrelevant columns have no impact on query performance.
- scalable: server or query workload can be distributed across multiple machines (Multiplex / PlexQ).
- scalable: I/O striping across all eligible disk containers.
- efficient: bitmap indexes allow complex aggregations through elementary binary operators.
- Pipeline Processing: subsequent query operators can start before completion of previous operators.

Showcase: Grouped Average Calculation in 2 Dimensions
We have a numeric fact value (like number or value of items sold) for which we want to calculate total or average values. Assumptions:
- Every fact row has one of 23 status values. We're only interested in status "current" or "historic"; these two make up ~98% of the stored data.
- Every fact row is assigned to a geography. The geography dimension has a cardinality of ~100, but we're only interested in 8 of them (e.g. AT, BE, CH, DE, FR, IE, NL, UK).
- Every fact row is assigned to a product line. There are 43 of them, and we'll evaluate them all.

Showcase: Sample Data Excerpt - Low Fast (LF) Index

Status    Geo  PL
current   ES    3
current   DK    5
pending   UK    9
current   UK   16
historic  DE   29
current   NL    2
historic  FR    4
historic  GA    5
current   DE   16
current   AT   31
current   IT   24

(The slide also shows the corresponding LF index bitmaps: one bit column per distinct value - current / historic / pending for Status, DE / DK / ES / UK for Geo, PL2-PL5 for PL - with one bit per row.)

Procedure: Showcase - initial process steps
- Filter: create a combined bitmap "current OR historic".
- Permutation 1: combine this bitmap (AND) with each of AT, BE, CH, DE, FR, IE, NL, UK. Threads: 8.
- Permutation 2: create an AND combination of each resulting bitmap with each product line. Threads: 8 * 43.
- Intermediate result: 8 * 43 bitmaps, each indicating the row set for one combination of Geo and PL.
All steps run in pipeline execution.
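The two permutation steps reduce to nothing but bitwise ANDs over bitmaps. A sketch in Python, modeling bitmaps as arbitrary-precision integers (bit i = Row ID i); all names and sample data are made up for illustration, not IQ's internal API:

```python
def grouped_bitmaps(status_filter, geo_bitmaps, pl_bitmaps):
    """Permutation 1: AND the status filter with each geography bitmap.
    Permutation 2: AND each result with each product-line bitmap."""
    result = {}
    for geo, g_bm in geo_bitmaps.items():
        narrowed = status_filter & g_bm            # permutation 1 (8 combinations)
        for pl, p_bm in pl_bitmaps.items():
            result[(geo, pl)] = narrowed & p_bm    # permutation 2 (8 * 43)
    return result

# Tiny example: 4 rows; the status filter selects rows 0-2,
# DE covers rows 0 and 2, PL5 covers row 2 only.
groups = grouped_bitmaps(0b0111, {"DE": 0b0101}, {"PL5": 0b0100})
print(bin(groups[("DE", "PL5")]))  # 0b100 -> only row 2 remains
```

Since every (Geo, PL) combination is computed from independent inputs, each inner AND can run as its own thread, exactly as the slide's thread counts suggest.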

Showcase: Bit Slice Sum Calculation with HNG Index
As an auxiliary index structure, numeric values can be stored in bit slices; this is called a High Non Group (HNG) index. Every bit position is represented by its own bitmap. E.g. for an unsigned smallint (2 bytes; 0..65535), 16 bitmaps are stored, each representing a power of 2 (1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768).

Value  16  8  4  2  1
  23    1  0  1  1  1
  11    0  1  0  1  1
  17    1  0  0  0  1
   5    0  0  1  0  1
  15    0  1  1  1  1
  24    1  1  0  0  0
  16    1  0  0  0  0
   7    0  0  1  1  1
  12    0  1  1  0  0
  11    0  1  0  1  1
  25    1  1  0  0  1
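A small Python sketch of the bit-slice idea (illustrative only, not IQ's internals): one bitmap per power of two, and the sum over any selected row set becomes a handful of ANDs and popcounts.

```python
def build_slices(values, width=16):
    """{weight: bitmap}: bit i of slices[2**b] is set when bit b of values[i] is set."""
    slices = {1 << b: 0 for b in range(width)}
    for row_id, v in enumerate(values):
        for b in range(width):
            if v >> b & 1:
                slices[1 << b] |= 1 << row_id
    return slices

def bit_slice_sum(row_set, slices):
    """Sum of the values of all rows in row_set: weight * popcount per slice."""
    return sum(w * bin(bm & row_set).count("1") for w, bm in slices.items())

values = [23, 11, 17, 5, 15, 24, 16, 7, 12, 11, 25]  # the slide's sample values
slices = build_slices(values)
everything = (1 << len(values)) - 1                   # bitmap selecting all rows
print(bit_slice_sum(everything, slices))  # 166
```

Because row_set can be any of the (Geo, PL) bitmaps from the permutation steps, the same 16 slice bitmaps serve every group's total at once.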

Procedure: Showcase - final process steps
- Intermediate result: 8 * 43 bitmaps, each indicating the row set for one combination of Geo and PL.
- Permutation 3: AND-combine each bitmap with each HNG bit slice and count the resulting set bits. Threads: 8 * 43 * 16.
- Accumulation: multiply the number of set bits by the weight of the bit and add up per Geo / PL. Threads: 8 * 43.
- Result: (up to) 8 * 43 result rows.
All steps run in pipeline execution.

Showcase: Summary - Why this is efficient
- We're utilizing a very high number of threads. These can be executed in parallel if sufficient cores are available, but they don't have to be; they introduce no overhead and are completely independent of each other. They could even be executed on different nodes in a PlexQ setup.
- The operations executed are technically trivial and highly efficient on any hardware; the intermediate results fit into hardware registers.
- The persistent input bitmaps can be distributed over multiple disks for I/O striping.
- The intermediate bitmaps can be expected to fit in the cache: 128 MB per bitmap for 1G rows (uncompressed).

Scalability Aspects
- Load time vs. number of existing rows: incremental indexes (for low-cardinality data) are insensitive to the number of existing rows. Non-incremental (B-tree) indexes are in principle sensitive to the number of existing rows; this impact is minimized using tiered B-trees.
- Query execution time vs. number of cores: most analytics-style queries can efficiently scale out to a high number of CPU cores. Increasing processing power can be expected to produce an adequate gain in response time.
- Query execution time vs. number of rows: typically, query execution time rises linearly with the number of rows, or slower (due to pipeline execution).
- Multinode setup (Multiplex / PlexQ): processing power and RAM are not restricted to the capabilities of a single box.

Using SAP IQ
- Standard SQL: SAP IQ is addressed using standard SQL - easy to use for developers familiar with other RDBMSs.
- OLAP: the SQL dialect is enhanced by OLAP extensions, bringing analytics into the database server.
- Standard APIs: ODBC, JDBC, OLE-DB, OpenClient - simply use your preferred client (unless it's proprietary).
- Reporting Tools: all reporting tools supporting at least one of the standard APIs can retrieve data from SAP IQ.
- Import / Export: ASCII files are the most versatile data exchange format; SAP IQ reads from and writes to these.

Consistency - Concurrency
- Snapshot Isolation: SAP IQ uses snapshot isolation; read operations never run into lock conflicts.
- No Blocks: this minimizes the impact of data provisioning.
- Full Consistency: data visible to a reader is always consistent - nothing like dirty reads, non-repeatable reads or phantom rows.
- Parallel: if CPU cores are available, typical analytics operations can massively utilize them.

Integration into the SAP Big Data Landscape
- HANA integration: Near-Line Storage for SAP BW systems; Smart Data Access / HANA Extended Storage.
- Hadoop integration: user-defined functions in IQ to access Hadoop data, and Table Parametrized Functions (TPF).
- Event Stream Processing: SAP ESP comes with a native adapter for SAP IQ.
- Reporting / Predictive Data Analysis: standard APIs (ODBC / JDBC / ...) available for SAP and third-party products; OLAP in the database removes workload from the reporting systems.

Thank you!
Contact information:
Volker Stöffler
DB-TecKnowledgy, Independent Consultant
70771 Leinfelden-Echterdingen, Germany
mailto:Volker.Stoeffler@DB-TecKnowledgy.info
http://scn.sap.com/people/volker.stoeffler