Building PetaByte Warehouses with Unmodified PostgreSQL. Emmanuel Cecchet Member of Research Staff May 21 st, 2009

Size: px
Start display at page:

Download "Building PetaByte Warehouses with Unmodified PostgreSQL. Emmanuel Cecchet Member of Research Staff May 21 st, 2009"

Transcription

1 Building PetaByte Warehouses with Unmodified PostgreSQL Emmanuel Cecchet Member of Research Staff May 21 st, 2009 Topics Introduction to frontline data warehouses PetaByte warehouse design with PostgreSQL Aster contributions to PostgreSQL Q&A 2 PGCon 2009, Ottawa 2009 Aster Data Systems

2 Enterprise Data Warehouse Under Stress Enterprise Data Warehouse 3 PGCon 2009, Ottawa 2009 Aster Data Systems Offloading The Enterprise Data Warehouse Frontline Data Warehouse Enterprise Data Warehouse Archival Data Warehouse Petabytes source data 24 x7 Availability TB/Day Load capacity In-database transforms Rapid data access Petabytes detailed data Low cost/tb Flexible compression On-line access Aging out to off-line 4 PGCon 2009, Ottawa 2009 Aster Data Systems

3 Requirements for Frontline Data Warehouses! "! #$ % & ' & ($ ) * +, $ + -. / PGCon 2009, Ottawa 2009 Aster Data Systems Who is Aster Data Systems? Aster ncluster is a software-only RDBMS for large-scale frontline data warehousing High performance High availability High value analytics Low cost Always Parallel MPP architecture Always On on-line operations In-Database MapReduce Petabytes on commodity HW 6 PGCon 2009, Ottawa 2009 Aster Data Systems

4 MySpace Frontline Data Warehouse 0 -./ %% 0 1 * 2%3 1!+', # D<E FGH IJKLMKNJNO PJQR ST UVWXYZ[!"#! $%&' ()*+, \] ^_ `a^ bc _c dc e] f ghhi jklmno pqhh rs tuvutwrx 7 PGCon 2009, Ottawa 2009 Aster Data Systems Topics Introduction to frontline data warehouses PetaByte warehouse design with PostgreSQL Aster contributions to PostgreSQL Q&A 8 PGCon 2009, Ottawa 2009 Aster Data Systems

5 Petabyte Datawarehouse Design PostgreSQL as a building block Do not hack PostgreSQL to serve the distributed database Build on top of mainline PostgreSQL Use standard Postgres APIs Service Oriented Architecture Hierarchical query planning and optimization Shared nothing replication treating Postgres and Linux as a service Compression at the OS level transparently to PostgreSQL In-database Map-Reduce out-of-process 9 PGCon 2009, Ottawa 2009 Aster Data Systems Aster ncluster Database Queries/Answers Queen Server Group Queries Worker Server Group Data Loader/Exporter Server Group Aster ncluster Database Queen Node QKQ O O OL JKM N QLMJKO O Q QKR J QKRMK IJJ RMKQL O N M O Worker Node! " # $ % % &' Loader Node ( ) $ ) *+, - '. / % % 01 &' 10 PGCon 2009, Ottawa 2009 Aster Data Systems

6 Query Processing: How It Works C F5 F5 #, 32 5 C887> %& & %& & %& & %& &. A: Users send queries to the Queen (via ACT, ODBC/JDBC, BI tools, etc.) B: The Queen parses the query, creates an optimal distributed plan, sends subqueries to the Workers, and supervises the processing C: Workers execute their subqueries (locally or via distributed communication) D: The Queen aggregates Worker results and returns the final result to the user 11 PGCon 2009, Ottawa 2009 Aster Data Systems Table Compression Architecture Query RS TUVWXX?@ABCDEEF GDHIJA?@ABCDEEF GDHIJA?@ABCDEEF GDHIJA Logical Database (eg. Schema)?@ABCDEEF F K ILM?@ABCDEEF KILM K?@ABCDEEFF K ILM?@ABCDEEF KILM?@ABCDEEF F K ILM?@ABCDEEF KILM?@ABCDEEF KILM?@ABCDEEF N@O?@ABCDEEF N@O How It Works!"##$ % $#!"##$ &#'( ')' $&# *"#$ + $,,#"#+ &'#!# $#!#+$+) +!"#+ '#-#' Architecture Benefits:.)/!"#+ " 0&))#" &'1 2# 3#'$ "# -+)4 &# "+!"#+ 0##,,55"# 6)"# 5!)"$#4 7+5""#+3 0/)/8!#"," +# 5'8&'#!"#+4?@ABCDEEF GDHIJA?@ABCDEEF GDHIJA 6#"," +#9 :;#(8"(< =5#"# $+> +##$,5'' &'# $#!"#+?@ABCDEEF F K ILM?@ABCDEEF?@ABCDEEF KILM K F K ILM?@ABCDEEF?@ABCDEEF KILM F K ILM Physical Tablespaces YWZRS TUVWXX 12 PGCon 2009, Ottawa 2009 Aster Data Systems

7 Table Compression Enables Powerful Archival Compressed - High Compressed - Medium Compressed - Low Not Compressed Older data accessed less frequently Compress to save space and cost Oldest data is compressed the most, recent is compressed the least Compressed tables are fully available for queries (true online archival) 13 PGCon 2009, Ottawa 2009 Aster Data Systems What is MapReduce and Why Should I Care? What is MapReduce? Popularized by Google Processes data in parallel across distributed cluster Why is MapReduce significant? Empowers ordinary developers Write application logic, not debug cluster communication code Why is In-Database MapReduce significant? Unites MapReduce with SQL: power invoked from SQL Develop SQL/MR functions with common languages 14 PGCon 2009, Ottawa 2009 Aster Data Systems

8 Aster In-Database MapReduce Users, analysts, applications SQL/MR Functions In-Database MapReduce Data Store Engine Aster ncluster 15 PGCon 2009, Ottawa 2009 Aster Data Systems Slide 15 In-Database MapReduce Extensible framework (MapReduce + SQL) Flexible: Map-Reduce expressiveness, languages, polymorphism Performance: Massively parallel, computational push-down Availability: Fault isolation, resource management Out-of-process executables Does not use PL/* for custom code execution Can execute Map and Reduce functions in any language that has a runtime on Linux (e.g. Java, Python, C#, C/C++, Ruby, Perl, R, etc ) Standard PostgreSQL APIs to send/receive data to executables Fault isolation, security and resource management for arbitrary user code 16 PGCon 2009, Ottawa 2009 Aster Data Systems

9 Always On: Minimize Planned Downtime Rebalance Data Live Queries Add Capacity Data Backup Load & Export Backup & Restore 17 PGCon 2009, Ottawa 2009 Aster Data Systems Precision Scaling W V [ [S [ [ [ [. [ [ When more CPU/memory/capacity are needed, new nodes can be added for scale-out. Precision Scaling uses standard PostgreSQL APIs to migrate vworker partitions to new nodes either for load balancing (more compute power) or capacity expansion Example: Assume Workers 1/2/3 are 100% CPU-bottlenecked. Incorporation adds a new Worker4 node and migrates over vworker partitions D/H/L. As a result of loadbalancing, CPU-utilization drops to 75% per node, eliminating hotspots. 18 PGCon 2009, Ottawa 2009 Aster Data Systems

10 Replication 0 " 0" 1 Q Q LMLMJK J O MO M QL R QL QOL JK JK QKJL KJR K L KL J KJR QMN M QLMJK JL LO Q QMKOL RQLQ JOO QKR O OL JNLQ O 0 QKOQ KL LJ JOL O M J P QKR QLQ O MMK PMKN Q MMLM O LJ Q LN RQLQ QK O 19 PGCon 2009, Ottawa 2009 Aster Data Systems Fault Tolerance & Automatic Online Failover '(##) *+, -./012!"#!$!"#!%!"#!& !"#!4!"#! Replication Failover Automatic, non-disruptive, graceful performance impact Replication Restoration Delta-based (fast) and online (non-disruptive) 20 PGCon 2009, Ottawa 2009 Aster Data Systems

11 Using Commodity Hardware V [ M Q ML L K L IMO J NKM L... V [ [X NLMIJ I O NLM I O O MOL M NL R J MO M O Q MK Q Q ML NK L W X [ VW [ [ X [ [ X [ WX [ W VX QQ J QK M QKQ Q M QMQ PJ OL I 21 PGCon 2009, Ottawa 2009 Aster Data Systems Scaling On-Demand to a PetaByte Commodity Hardware More Blocks = More Power 2 TB Building Block Dell, HP, IBM, Intel x86 16 GB Memory 2.4TB of Storage 8 Disks $5k to $10k Node 7!>=<E "8 # < Massive Power Per Rack 160 Cores 640 GB RAM 48 TB SAS 22 PGCon 2009, Ottawa 2009 Aster Data Systems

12 Heterogeneous Hardware Support W V 8E <E CPU Memory Disk [ [ [ [ [ [ [ [ Heterogeneous HW support enables customers to add cheaper/faster nodes to existing systems for investment protection Mix-n-match different servers as you grow (faster CPUs, more memory, bigger disk drives, different vendors, etc) 23 PGCon 2009, Ottawa 2009 Aster Data Systems Topics Introduction to frontline data warehouses PetaByte warehouse design with PostgreSQL Aster contributions to PostgreSQL Q&A 24 PGCon 2009, Ottawa 2009 Aster Data Systems

13 Error Logging in COPY Separate good from malformed tuples Per tuple error information Errors to capture Type mismatch (e.g. text vs. int) Check constraints (e.g. int < 0) Malformed chars (e.g. invalid UTF-8 seq.) Missing / extra column Low-performance overhead Activated on-demand using environment variables 25 PGCon 2009, Ottawa 2009 Aster Data Systems Error Logging in COPY Detailed error context is logged along with tuple content 26 PGCon 2009, Ottawa 2009 Aster Data Systems

14 Error Logging Performance 1 million tuples COPY performance 27 PGCon 2009, Ottawa 2009 Aster Data Systems Auto-partitioning in COPY COPY into a parent table route tuples directly in the child table with matching constraints COPY y2008 FROM data.txt 28 PGCon 2009, Ottawa 2009 Aster Data Systems

15 Auto-partitioning in loading Activated on-demand set tuple_routing_in_copy = 0/1; Configurable LRU cache size set tuple_routing_cache_size = 3; COPY performance into parent table similar to direct COPY in child table if data is sorted Will leverage partitioning information in the future (WIP for 8.5) 29 PGCon 2009, Ottawa 2009 Aster Data Systems Other contributions Temporary tables and 2PC transactions [Auto-]Partitioning infrastructure Regression test suite LFI ( Fault injection at the library level or below out-of-memory conditions, network connection errors, interrupted system calls, data corruption, hardware failures, etc Lightning talk on Friday! 30 PGCon 2009, Ottawa 2009 Aster Data Systems

16 Topics Introduction to Aster and data warehousing PetaByte warehouse design with PostgreSQL Aster contributions to PostgreSQL Q&A 31 PGCon 2009, Ottawa 2009 Aster Data Systems PetaByte Warehouses with PostgreSQL Unmodified PostgreSQL Always Parallel MPP architecture Always On on-line operations In-Database MapReduce PetaByte on commodity Hardware 32 PGCon 2009, Ottawa 2009 Aster Data Systems

17 Aster Data Systems Learn more Free TDWI report on advanced analytics: asterdata.com/mapreduce Free Gartner webcast on mission-critical DW: asterdata.com/gartner Contact us 33 PGCon 2009, Ottawa 2009 Aster Data Systems Bonus slides 34 PGCon 2009, Ottawa 2009 Aster Data Systems

18 ncluster Components 5 C887> FG=G 6969 DE8EG> 1 / &' $ % & ' (!+, + "&)+&+, #%! # )%&, / ncluster 35 PGCon 2009, Ottawa 2009 Aster Data Systems Aster Loader ) 8G# 5 G7G9*69 +,-../01/ :79: ; <:=75 >75=?5@4<;7 A?99B7<7;CD EF.GHIF02 JKB9L>B7 3?4875: 9? MN?5C75:?< N?5C75 <?87:D O=975 >459L9L?<L<6P 3?4875 QLBB >454BB7B B?48 K<LRK L<9? MN?5C75:D S/0/TIH2 3L<745BU :;4B4AB7 B?48 >75=?5@4<;7 V=5?@ 3?4875: 9? N?5C75:W Scalable Partitioning +,-../01/2 A7 :;4B4AB7 4<8 <?9 L@>787 9Y7 B?48L<6 >5?;7:: EF.GHIF02 Z4;Y 3?4875 ;?<94L<: 4 [X459L9L?<75\P QYL;Y K:7: 4< 4B6?5L9Y@ 9? 4::L6< 8494 L<9? [AK;C79:\D Z4;Y AK;C79 L: 9? QL9YL< 9Y7 ;BK:975D S/0/TIH2 ]4:9P L<97BBL67<9 >459L9L?<L< B?48: _ G! 7= `G9 #76 9 +,-../01/2 ]4LB78 B?:7 8494?5 :L6<L=L;4<9BU 85?> B?48 >75=?5@4<;7D EF.GHIF02 a= 4 <?87 =4LB:P?9Y75 :9574@: ;?<9L<K7 9? B?48 Vbcd >75=?5@4<;7 YL9WD O8@L<: ;4< 74:LBU 57;?M75 B?: AU 57^B?48L<6 ekbc ] D S/0/TIH2 f494 B?:: >5?97;9L?< g >75=?5@4<;7 ;?<:L:97<;U 8K5L<6 <?87 =4LBK57D 36 PGCon 2009, Ottawa 2009 Aster Data Systems

white paper Aster Data ncluster In - database Analytics with R

white paper Aster Data ncluster In - database Analytics with R white paper Aster Data ncluster In - database Analytics with R Contents Introduction to Aster Data ncluster and SQL-MapReduce... 3 R in Aster Data ncluster... 3 Proprietary Scoring using R without In-database

More information

Oracle #1 RDBMS Vendor

Oracle #1 RDBMS Vendor Oracle #1 RDBMS Vendor IBM 20.7% Microsoft 18.1% Other 12.6% Oracle 48.6% Source: Gartner DataQuest July 2008, based on Total Software Revenue Oracle 2 Continuous Innovation Oracle 11g Exadata Storage

More information

Agenda. AWS Database Services Traditional vs AWS Data services model Amazon RDS Redshift DynamoDB ElastiCache

Agenda. AWS Database Services Traditional vs AWS Data services model Amazon RDS Redshift DynamoDB ElastiCache Databases on AWS 2017 Amazon Web Services, Inc. and its affiliates. All rights served. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon Web Services,

More information

Netezza The Analytics Appliance

Netezza The Analytics Appliance Software 2011 Netezza The Analytics Appliance Michael Eden Information Management Brand Executive Central & Eastern Europe Vilnius 18 October 2011 Information Management 2011IBM Corporation Thought for

More information

Pervasive Insight. Mission Critical Platform

Pervasive Insight. Mission Critical Platform Empowered IT Pervasive Insight Mission Critical Platform Dynamic Development Desktop & Mobile Server & Datacenter Cloud Over 7 Million Downloads of SQL Server 2008 Over 30,000 partners are offering solutions

More information

5 Fundamental Strategies for Building a Data-centered Data Center

5 Fundamental Strategies for Building a Data-centered Data Center 5 Fundamental Strategies for Building a Data-centered Data Center June 3, 2014 Ken Krupa, Chief Field Architect Gary Vidal, Solutions Specialist Last generation Reference Data Unstructured OLTP Warehouse

More information

Introduction to Database Services

Introduction to Database Services Introduction to Database Services Shaun Pearce AWS Solutions Architect 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved Today s agenda Why managed database services? A non-relational

More information

Evolving To The Big Data Warehouse

Evolving To The Big Data Warehouse Evolving To The Big Data Warehouse Kevin Lancaster 1 Copyright Director, 2012, Oracle and/or its Engineered affiliates. All rights Insert Systems, Information Protection Policy Oracle Classification from

More information

Appliances and DW Architecture. John O Brien President and Executive Architect Zukeran Technologies 1

Appliances and DW Architecture. John O Brien President and Executive Architect Zukeran Technologies 1 Appliances and DW Architecture John O Brien President and Executive Architect Zukeran Technologies 1 OBJECTIVES To define an appliance Understand critical components of a DW appliance Learn how DW appliances

More information

SAP IQ Software16, Edge Edition. The Affordable High Performance Analytical Database Engine

SAP IQ Software16, Edge Edition. The Affordable High Performance Analytical Database Engine SAP IQ Software16, Edge Edition The Affordable High Performance Analytical Database Engine Agenda Agenda Introduction to Dobler Consulting Today s Data Challenges Overview of SAP IQ 16, Edge Edition SAP

More information

Cloud Computing & Visualization

Cloud Computing & Visualization Cloud Computing & Visualization Workflows Distributed Computation with Spark Data Warehousing with Redshift Visualization with Tableau #FIUSCIS School of Computing & Information Sciences, Florida International

More information

Improving Blade Economics with Virtualization

Improving Blade Economics with Virtualization Improving Blade Economics with Virtualization John Kennedy Senior Systems Engineer VMware, Inc. jkennedy@vmware.com The agenda Description of Virtualization VMware Products Benefits of virtualization Overview

More information

IBM Db2 Analytics Accelerator Version 7.1

IBM Db2 Analytics Accelerator Version 7.1 IBM Db2 Analytics Accelerator Version 7.1 Delivering new flexible, integrated deployment options Overview Ute Baumbach (bmb@de.ibm.com) 1 IBM Z Analytics Keep your data in place a different approach to

More information

What is wrong with PostgreSQL? OR What does Oracle have that PostgreSQL should? Richard Stephan

What is wrong with PostgreSQL? OR What does Oracle have that PostgreSQL should? Richard Stephan What is wrong with PostgreSQL? OR What does Oracle have that PostgreSQL should? Richard Stephan PostgreSQL is an Enterprise RDBMS Schemas, Roles, Accounts Tablespace Management Table Partitioning Write-Ahead

More information

Embedded Technosolutions

Embedded Technosolutions Hadoop Big Data An Important technology in IT Sector Hadoop - Big Data Oerie 90% of the worlds data was generated in the last few years. Due to the advent of new technologies, devices, and communication

More information

Storage Optimization with Oracle Database 11g

Storage Optimization with Oracle Database 11g Storage Optimization with Oracle Database 11g Terabytes of Data Reduce Storage Costs by Factor of 10x Data Growth Continues to Outpace Budget Growth Rate of Database Growth 1000 800 600 400 200 1998 2000

More information

TITLE: PRE-REQUISITE THEORY. 1. Introduction to Hadoop. 2. Cluster. Implement sort algorithm and run it using HADOOP

TITLE: PRE-REQUISITE THEORY. 1. Introduction to Hadoop. 2. Cluster. Implement sort algorithm and run it using HADOOP TITLE: Implement sort algorithm and run it using HADOOP PRE-REQUISITE Preliminary knowledge of clusters and overview of Hadoop and its basic functionality. THEORY 1. Introduction to Hadoop The Apache Hadoop

More information

Microsoft Analytics Platform System (APS)

Microsoft Analytics Platform System (APS) Microsoft Analytics Platform System (APS) The turnkey modern data warehouse appliance Matt Usher, Senior Program Manager @ Microsoft About.me @two_under Senior Program Manager 9 years at Microsoft Visual

More information

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples Hadoop Introduction 1 Topics Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples 2 Big Data Analytics What is Big Data?

More information

Mellanox InfiniBand Solutions Accelerate Oracle s Data Center and Cloud Solutions

Mellanox InfiniBand Solutions Accelerate Oracle s Data Center and Cloud Solutions Mellanox InfiniBand Solutions Accelerate Oracle s Data Center and Cloud Solutions Providing Superior Server and Storage Performance, Efficiency and Return on Investment As Announced and Demonstrated at

More information

Oracle 10g on Solaris to Oracle RAC 11gR2 on Linux Upgrade

Oracle 10g on Solaris to Oracle RAC 11gR2 on Linux Upgrade Image courtesy of Mammoth-WEBCO, Inc. Image courtesy of ADEPT Airmotive (Pty) Ltd. Courtesy of KlingStubbins Oracle 10g on Solaris to Oracle RAC 11gR2 on Linux Upgrade Alan Williams Database Administrator

More information

BIG DATA TESTING: A UNIFIED VIEW

BIG DATA TESTING: A UNIFIED VIEW http://core.ecu.edu/strg BIG DATA TESTING: A UNIFIED VIEW BY NAM THAI ECU, Computer Science Department, March 16, 2016 2/30 PRESENTATION CONTENT 1. Overview of Big Data A. 5 V s of Big Data B. Data generation

More information

SAP HANA. Jake Klein/ SVP SAP HANA June, 2013

SAP HANA. Jake Klein/ SVP SAP HANA June, 2013 SAP HANA Jake Klein/ SVP SAP HANA June, 2013 SAP 3 YEARS AGO Middleware BI / Analytics Core ERP + Suite 2013 WHERE ARE WE NOW? Cloud Mobile Applications SAP HANA Analytics D&T Changed Reality Disruptive

More information

VOLTDB + HP VERTICA. page

VOLTDB + HP VERTICA. page VOLTDB + HP VERTICA ARCHITECTURE FOR FAST AND BIG DATA ARCHITECTURE FOR FAST + BIG DATA FAST DATA Fast Serve Analytics BIG DATA BI Reporting Fast Operational Database Streaming Analytics Columnar Analytics

More information

<Insert Picture Here> MySQL Web Reference Architectures Building Massively Scalable Web Infrastructure

<Insert Picture Here> MySQL Web Reference Architectures Building Massively Scalable Web Infrastructure MySQL Web Reference Architectures Building Massively Scalable Web Infrastructure Mario Beck (mario.beck@oracle.com) Principal Sales Consultant MySQL Session Agenda Requirements for

More information

Achieving Horizontal Scalability. Alain Houf Sales Engineer

Achieving Horizontal Scalability. Alain Houf Sales Engineer Achieving Horizontal Scalability Alain Houf Sales Engineer Scale Matters InterSystems IRIS Database Platform lets you: Scale up and scale out Scale users and scale data Mix and match a variety of approaches

More information

Planning the Installation and Installing SQL Server

Planning the Installation and Installing SQL Server Chapter 2 Planning the Installation and Installing SQL Server In This Chapter c SQL Server Editions c Planning Phase c Installing SQL Server 22 Microsoft SQL Server 2012: A Beginner s Guide This chapter

More information

Large-Scale GPU programming

Large-Scale GPU programming Large-Scale GPU programming Tim Kaldewey Research Staff Member Database Technologies IBM Almaden Research Center tkaldew@us.ibm.com Assistant Adjunct Professor Computer and Information Science Dept. University

More information

1 Copyright 2012, Oracle and/or its affiliates. All rights reserved.

1 Copyright 2012, Oracle and/or its affiliates. All rights reserved. 1 Engineered Systems - Exadata Juan Loaiza Senior Vice President Systems Technology October 4, 2012 2 Safe Harbor Statement "Safe Harbor Statement: Statements in this presentation relating to Oracle's

More information

Parallelizing Multiple Group by Query in Shared-nothing Environment: A MapReduce Study Case

Parallelizing Multiple Group by Query in Shared-nothing Environment: A MapReduce Study Case 1 / 39 Parallelizing Multiple Group by Query in Shared-nothing Environment: A MapReduce Study Case PAN Jie 1 Yann LE BIANNIC 2 Frédéric MAGOULES 1 1 Ecole Centrale Paris-Applied Mathematics and Systems

More information

Cloud Analytics and Business Intelligence on AWS

Cloud Analytics and Business Intelligence on AWS Cloud Analytics and Business Intelligence on AWS Enterprise Applications Virtual Desktops Sharing & Collaboration Platform Services Analytics Hadoop Real-time Streaming Data Machine Learning Data Warehouse

More information

Where We Are. Review: Parallel DBMS. Parallel DBMS. Introduction to Data Management CSE 344

Where We Are. Review: Parallel DBMS. Parallel DBMS. Introduction to Data Management CSE 344 Where We Are Introduction to Data Management CSE 344 Lecture 22: MapReduce We are talking about parallel query processing There exist two main types of engines: Parallel DBMSs (last lecture + quick review)

More information

Was ist dran an einer spezialisierten Data Warehousing platform?

Was ist dran an einer spezialisierten Data Warehousing platform? Was ist dran an einer spezialisierten Data Warehousing platform? Hermann Bär Oracle USA Redwood Shores, CA Schlüsselworte Data warehousing, Exadata, specialized hardware proprietary hardware Introduction

More information

DELL POWERVAULT MD FAMILY MODULAR STORAGE THE DELL POWERVAULT MD STORAGE FAMILY

DELL POWERVAULT MD FAMILY MODULAR STORAGE THE DELL POWERVAULT MD STORAGE FAMILY DELL MD FAMILY MODULAR STORAGE THE DELL MD STORAGE FAMILY Simplifying IT The Dell PowerVault MD family can simplify IT by optimizing your data storage architecture and ensuring the availability of your

More information

Top Trends in DBMS & DW

Top Trends in DBMS & DW Oracle Top Trends in DBMS & DW Noel Yuhanna Principal Analyst Forrester Research Trend #1: Proliferation of data Data doubles every 18-24 months for critical Apps, for some its every 6 months Terabyte

More information

Map-Reduce. Marco Mura 2010 March, 31th

Map-Reduce. Marco Mura 2010 March, 31th Map-Reduce Marco Mura (mura@di.unipi.it) 2010 March, 31th This paper is a note from the 2009-2010 course Strumenti di programmazione per sistemi paralleli e distribuiti and it s based by the lessons of

More information

EMC GREENPLUM DATA COMPUTING APPLIANCE: PERFORMANCE AND CAPACITY FOR DATA WAREHOUSING AND BUSINESS INTELLIGENCE

EMC GREENPLUM DATA COMPUTING APPLIANCE: PERFORMANCE AND CAPACITY FOR DATA WAREHOUSING AND BUSINESS INTELLIGENCE White Paper EMC GREENPLUM DATA COMPUTING APPLIANCE: PERFORMANCE AND CAPACITY FOR DATA WAREHOUSING AND BUSINESS INTELLIGENCE A Detailed Review EMC GLOBAL SOLUTIONS Abstract This white paper provides readers

More information

e-warehousing with SPD

e-warehousing with SPD e-warehousing with SPD Server Leigh Bates SAS UK Introduction!e -Warehousing! Web-enabled SAS Technology! Portals! Infrastructure! Storage End to End Data Warehouse Web Enabled Data Warehouse Web Enabled

More information

Focus On: Oracle Database 11g Release 2

Focus On: Oracle Database 11g Release 2 Focus On: Oracle Database 11g Release 2 Focus on: Oracle Database 11g Release 2 Oracle s most recent database version, Oracle Database 11g Release 2 [11g R2] is focused on cost saving, high availability

More information

Distributed File Systems II

Distributed File Systems II Distributed File Systems II To do q Very-large scale: Google FS, Hadoop FS, BigTable q Next time: Naming things GFS A radically new environment NFS, etc. Independence Small Scale Variety of workloads Cooperation

More information

IBM s Data Warehouse Appliance Offerings

IBM s Data Warehouse Appliance Offerings IBM s Data Warehouse Appliance Offerings RChaitanya IBM India Software Labs Agenda 1 IBM Smart Analytics System (D5600) System Overview Technical Architecture Software / Hardware stack details 2 Netezza

More information

IBM Storage Software Strategy

IBM Storage Software Strategy IBM Storage Software Strategy Hakan Turgut 1 Just how fast is the data growing? 128 GB/ person The World s total data per person No Problem I I think I can do this We have a problem 24 GB/ person 0.8 GB/

More information

In-Memory Computing EXASOL Evaluation

In-Memory Computing EXASOL Evaluation In-Memory Computing EXASOL Evaluation 1. Purpose EXASOL (http://www.exasol.com/en/) provides an in-memory computing solution for data analytics. It combines inmemory, columnar storage and massively parallel

More information

SQL Server 2014 Column Store Indexes. Vivek Sanil Microsoft Sr. Premier Field Engineer

SQL Server 2014 Column Store Indexes. Vivek Sanil Microsoft Sr. Premier Field Engineer SQL Server 2014 Column Store Indexes Vivek Sanil Microsoft Vivek.sanil@microsoft.com Sr. Premier Field Engineer Trends in the Data Warehousing Space Approximate data volume managed by DW Less than 1TB

More information

Postgres Plus and JBoss

Postgres Plus and JBoss Postgres Plus and JBoss A New Division of Labor for New Enterprise Applications An EnterpriseDB White Paper for DBAs, Application Developers, and Enterprise Architects October 2008 Postgres Plus and JBoss:

More information

Oracle Exadata. Smart Database Platforms - Dramatic Performance and Cost Advantages. Juan Loaiza Senior Vice President Oracle Database Systems

Oracle Exadata. Smart Database Platforms - Dramatic Performance and Cost Advantages. Juan Loaiza Senior Vice President Oracle Database Systems Oracle Exadata Smart Database Platforms - Dramatic Performance and Cost Advantages Juan Loaiza Senior Vice President Oracle Database Systems Exadata X5-2 Exadata X5-8 SuperCluster M7-8 Exadata Vision Dramatically

More information

Oracle 1Z0-515 Exam Questions & Answers

Oracle 1Z0-515 Exam Questions & Answers Oracle 1Z0-515 Exam Questions & Answers Number: 1Z0-515 Passing Score: 800 Time Limit: 120 min File Version: 38.7 http://www.gratisexam.com/ Oracle 1Z0-515 Exam Questions & Answers Exam Name: Data Warehousing

More information

Resiliency at Scale in the Distributed Storage Cloud

Resiliency at Scale in the Distributed Storage Cloud Resiliency at Scale in the Distributed Storage Cloud Alma Riska Advanced Storage Division EMC Corporation In collaboration with many at Cloud Infrastructure Group Outline Wi topic but this talk will focus

More information

map(k1, v1) list(k2, v2) reduce(k2, Union_List(k2,v2)) List (v3)

map(k1, v1) list(k2, v2) reduce(k2, Union_List(k2,v2)) List (v3) A lack of expressive techniques to transform and analyze big data volumes has led to costly workarounds in data warehousing and analytics. Google, one of the first to face the challenge of analyzing petabytescale

More information

BUILD BETTER MICROSOFT SQL SERVER SOLUTIONS Sales Conversation Card

BUILD BETTER MICROSOFT SQL SERVER SOLUTIONS Sales Conversation Card OVERVIEW SALES OPPORTUNITY Lenovo Database Solutions for Microsoft SQL Server bring together the right mix of hardware infrastructure, software, and services to optimize a wide range of data warehouse

More information

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved Hadoop 2.x Core: YARN, Tez, and Spark YARN Hadoop Machine Types top-of-rack switches core switch client machines have client-side software used to access a cluster to process data master nodes run Hadoop

More information

Oracle #1 for Data Warehousing. Data Warehouses Growing Rapidly Tripling In Size Every Two Years

Oracle #1 for Data Warehousing. Data Warehouses Growing Rapidly Tripling In Size Every Two Years Extreme Performance HP Oracle Machine & Exadata Storage Server October 16, 2008 Robert Stackowiak Vice President, EPM & Data Warehousing Solutions, Oracle ESG Oracle #1 for Data Warehousing Microsoft 14.8%

More information

VEXATA FOR ORACLE. Digital Business Demands Performance and Scale. Solution Brief

VEXATA FOR ORACLE. Digital Business Demands Performance and Scale. Solution Brief Digital Business Demands Performance and Scale As enterprises shift to online and softwaredriven business models, Oracle infrastructure is being pushed to run at exponentially higher scale and performance.

More information

An Introduction to GPFS

An Introduction to GPFS IBM High Performance Computing July 2006 An Introduction to GPFS gpfsintro072506.doc Page 2 Contents Overview 2 What is GPFS? 3 The file system 3 Application interfaces 4 Performance and scalability 4

More information

Oracle made it easy: Cloud DB Vergleich

Oracle made it easy: Cloud DB Vergleich Oracle made it easy: Cloud DB Vergleich MATTHIAS FUCHS, ESENTRI BORYS NESELOVSKYI, OPITZ CONSULTING DOAG 2018 KONFERENZ, NÜRNBERG Cloud Angebote für Oracle Datenbank ORACLE CLOUD Oracle Datenbank Microsoft

More information

The Oracle Database Appliance I/O and Performance Architecture

The Oracle Database Appliance I/O and Performance Architecture Simple Reliable Affordable The Oracle Database Appliance I/O and Performance Architecture Tammy Bednar, Sr. Principal Product Manager, ODA 1 Copyright 2012, Oracle and/or its affiliates. All rights reserved.

More information

Data Warehousing 11g Essentials

Data Warehousing 11g Essentials Oracle 1z0-515 Data Warehousing 11g Essentials Version: 6.0 QUESTION NO: 1 Indentify the true statement about REF partitions. A. REF partitions have no impact on partition-wise joins. B. Changes to partitioning

More information

2 de Diciembre #SqlSaturdayMontevideo

2 de Diciembre #SqlSaturdayMontevideo 2 de Diciembre 2017 #SqlSaturdayMontevideo Patrocinadores del SQL Saturday 3 02/12/2017 SQL Saturday #691 Montevideo, Uruguay Javier Villegas DBA Manager at Mediterranean Shipping Company (MSC) Microsoft

More information

Eliminating Downtime When Migrating or Upgrading to Oracle 10g

Eliminating Downtime When Migrating or Upgrading to Oracle 10g Transactional Data Management Solutions December 13, 2005 NYOUG Eliminating Downtime When Migrating or Upgrading to Oracle 10g Agenda GoldenGate Overview What is Transactional Data Management? Why Migrate/Upgrade

More information

Oracle NoSQL Database Enterprise Edition, Version 18.1

Oracle NoSQL Database Enterprise Edition, Version 18.1 Oracle NoSQL Database Enterprise Edition, Version 18.1 Oracle NoSQL Database is a scalable, distributed NoSQL database, designed to provide highly reliable, flexible and available data management across

More information

TS7700 Technical Update What s that I hear about R3.2?

TS7700 Technical Update What s that I hear about R3.2? TS7700 Technical Update What s that I hear about R3.2? Ralph Beeston TS7700 Architecture IBM Session objectives Brief Overview TS7700 TS7700 Release 3.2 TS7720T TS7720 Tape Attach The Basics Partitions

More information

Oracle Database 18c and Autonomous Database

Oracle Database 18c and Autonomous Database Oracle Database 18c and Autonomous Database Maria Colgan Oracle Database Product Management March 2018 @SQLMaria Safe Harbor Statement The following is intended to outline our general product direction.

More information

The vsphere 6.0 Advantages Over Hyper- V

The vsphere 6.0 Advantages Over Hyper- V The Advantages Over Hyper- V The most trusted and complete virtualization platform SDDC Competitive Marketing 2015 Q2 VMware.com/go/PartnerCompete 2015 VMware Inc. All rights reserved. v3b The Most Trusted

More information

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 10 Parallel Programming Models: Map Reduce and Spark

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 10 Parallel Programming Models: Map Reduce and Spark CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 10 Parallel Programming Models: Map Reduce and Spark Announcements HW2 due this Thursday AWS accounts Any success? Feel

More information

Advanced Database Technologies NoSQL: Not only SQL

Advanced Database Technologies NoSQL: Not only SQL Advanced Database Technologies NoSQL: Not only SQL Christian Grün Database & Information Systems Group NoSQL Introduction 30, 40 years history of well-established database technology all in vain? Not at

More information

Step into the future. HP Storage Summit Converged storage for the next era of IT

Step into the future. HP Storage Summit Converged storage for the next era of IT HP Storage Summit 2013 Step into the future Converged storage for the next era of IT 1 HP Storage Summit 2013 Step into the future Converged storage for the next era of IT Karen van Warmerdam HP XP Product

More information

Virtualizing Oracle on VMware

Virtualizing Oracle on VMware Virtualizing Oracle on VMware Sudhansu Pati, VCP Certified 4/20/2012 2011 VMware Inc. All rights reserved Agenda Introduction Oracle Databases on VMware Key Benefits Performance, Support, and Licensing

More information

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( )

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( ) Guide: CIS 601 Graduate Seminar Presented By: Dr. Sunnie S. Chung Dhruv Patel (2652790) Kalpesh Sharma (2660576) Introduction Background Parallel Data Warehouse (PDW) Hive MongoDB Client-side Shared SQL

More information

Announcements. Parallel Data Processing in the 20 th Century. Parallel Join Illustration. Introduction to Database Systems CSE 414

Announcements. Parallel Data Processing in the 20 th Century. Parallel Join Illustration. Introduction to Database Systems CSE 414 Introduction to Database Systems CSE 414 Lecture 17: MapReduce and Spark Announcements Midterm this Friday in class! Review session tonight See course website for OHs Includes everything up to Monday s

More information

Real Time for Big Data: The Next Age of Data Management. Talksum, Inc. Talksum, Inc. 582 Market Street, Suite 1902, San Francisco, CA 94104

Real Time for Big Data: The Next Age of Data Management. Talksum, Inc. Talksum, Inc. 582 Market Street, Suite 1902, San Francisco, CA 94104 Real Time for Big Data: The Next Age of Data Management Talksum, Inc. Talksum, Inc. 582 Market Street, Suite 1902, San Francisco, CA 94104 Real Time for Big Data The Next Age of Data Management Introduction

More information

PeerStorage Arrays Unequalled Storage Solutions

PeerStorage Arrays Unequalled Storage Solutions Simplifying Networked Storage PeerStorage Arrays Unequalled Storage Solutions John Joseph, VP of Marketing EqualLogic,, 9 Townsend West, Nashua NH 03063 Phone: +1-603 603-249-7772, FAX: +1-603 603-579-6910

More information

Stages of Data Processing

Stages of Data Processing Data processing can be understood as the conversion of raw data into a meaningful and desired form. Basically, producing information that can be understood by the end user. So then, the question arises,

More information

Adaptive Cluster Computing using JavaSpaces

Adaptive Cluster Computing using JavaSpaces Adaptive Cluster Computing using JavaSpaces Jyoti Batheja and Manish Parashar The Applied Software Systems Lab. ECE Department, Rutgers University Outline Background Introduction Related Work Summary of

More information

Chapter 5. The MapReduce Programming Model and Implementation

Chapter 5. The MapReduce Programming Model and Implementation Chapter 5. The MapReduce Programming Model and Implementation - Traditional computing: data-to-computing (send data to computing) * Data stored in separate repository * Data brought into system for computing

More information

Architecture of a Real-Time Operational DBMS

Architecture of a Real-Time Operational DBMS Architecture of a Real-Time Operational DBMS Srini V. Srinivasan Founder, Chief Development Officer Aerospike CMG India Keynote Thane December 3, 2016 [ CMGI Keynote, Thane, India. 2016 Aerospike Inc.

More information

System Requirements. v7.5. May 10, For the most recent version of this document, visit kcura's Documentation Site.

System Requirements. v7.5. May 10, For the most recent version of this document, visit kcura's Documentation Site. System Requirements v7.5 May 10, 2013 For the most recent version of this document, visit kcura's Documentation Site. Table of Contents 1 System requirements overview 3 1.1 Scalable infrastructure example

More information

Packet-Level Network Analytics without Compromises NANOG 73, June 26th 2018, Denver, CO. Oliver Michel

Packet-Level Network Analytics without Compromises NANOG 73, June 26th 2018, Denver, CO. Oliver Michel Packet-Level Network Analytics without Compromises NANOG 73, June 26th 2018, Denver, CO Oliver Michel Network monitoring is important Security issues Performance issues Equipment failure Analytics Platform

More information

Apache Ignite TM - In- Memory Data Fabric Fast Data Meets Open Source

Apache Ignite TM - In- Memory Data Fabric Fast Data Meets Open Source Apache Ignite TM - In- Memory Data Fabric Fast Data Meets Open Source DMITRIY SETRAKYAN Founder, PPMC https://ignite.apache.org @apacheignite @dsetrakyan Agenda About In- Memory Computing Apache Ignite

More information

Dell EMC ScaleIO Ready Node

Dell EMC ScaleIO Ready Node Essentials Pre-validated, tested and optimized servers to provide the best performance possible Single vendor for the purchase and support of your SDS software and hardware All-Flash configurations provide

More information

IBM SmartCloud Resilience offers cloud-based services to support a more rapid, reliable and cost-effective enterprise-wide resiliency.

IBM SmartCloud Resilience offers cloud-based services to support a more rapid, reliable and cost-effective enterprise-wide resiliency. Arjan Mooldijk 27 September 2012 Choice and control developing resilient cloud strategies IBM SmartCloud Resilience offers cloud-based services to support a more rapid, reliable and cost-effective enterprise-wide

More information

HANA Performance. Efficient Speed and Scale-out for Real-time BI

HANA Performance. Efficient Speed and Scale-out for Real-time BI HANA Performance Efficient Speed and Scale-out for Real-time BI 1 HANA Performance: Efficient Speed and Scale-out for Real-time BI Introduction SAP HANA enables organizations to optimize their business

More information

Oracle Database 10G. Lindsey M. Pickle, Jr. Senior Solution Specialist Database Technologies Oracle Corporation

Oracle Database 10G. Lindsey M. Pickle, Jr. Senior Solution Specialist Database Technologies Oracle Corporation Oracle 10G Lindsey M. Pickle, Jr. Senior Solution Specialist Technologies Oracle Corporation Oracle 10g Goals Highest Availability, Reliability, Security Highest Performance, Scalability Problem: Islands

More information

IBM Storwize V7000 Unified

IBM Storwize V7000 Unified IBM Storwize V7000 Unified Pavel Müller IBM Systems and Technology Group Storwize V7000 Position Enterprise Block DS8000 For clients requiring: Advanced disaster recovery with 3-way mirroring and System

More information

TS7700 Technical Update TS7720 Tape Attach Deep Dive

TS7700 Technical Update TS7720 Tape Attach Deep Dive TS7700 Technical Update TS7720 Tape Attach Deep Dive Ralph Beeston TS7700 Architecture IBM Session objectives Brief Overview TS7700 Quick background of TS7700 TS7720T Overview TS7720T Deep Dive TS7720T

More information

Introduction to Data Management CSE 344

Introduction to Data Management CSE 344 Introduction to Data Management CSE 344 Lecture 26: Parallel Databases and MapReduce CSE 344 - Winter 2013 1 HW8 MapReduce (Hadoop) w/ declarative language (Pig) Cluster will run in Amazon s cloud (AWS)

More information

Lenovo Database Configuration

Lenovo Database Configuration Lenovo Database Configuration for Microsoft SQL Server Standard Edition DWFT 9TB Reduce time to value with pretested hardware configurations Data Warehouse problem and a solution The rapid growth of technology

More information

Next-Generation Cloud Platform

Next-Generation Cloud Platform Next-Generation Cloud Platform Jangwoo Kim Jun 24, 2013 E-mail: jangwoo@postech.ac.kr High Performance Computing Lab Department of Computer Science & Engineering Pohang University of Science and Technology

More information

Take Back Lost Revenue by Activating Virtuozzo Storage Today

Take Back Lost Revenue by Activating Virtuozzo Storage Today Take Back Lost Revenue by Activating Virtuozzo Storage Today JUNE, 2017 2017 Virtuozzo. All rights reserved. 1 Introduction New software-defined storage (SDS) solutions are enabling hosting companies to

More information

IDS V11.50 and Informix Warehouse Feature V11.50 Offerings Packaging

IDS V11.50 and Informix Warehouse Feature V11.50 Offerings Packaging IBM Dynamic Server IDS V11.50 and Feature V11.50 Offerings Packaging Cindy Fung IDS Product Manager IBM Dynamic Server IDS V11.50 Edition Packaging Changes Licensing Limits AU= authorized user, CS = concurrent

More information

REGULATORY REPORTING FOR FINANCIAL SERVICES

REGULATORY REPORTING FOR FINANCIAL SERVICES REGULATORY REPORTING FOR FINANCIAL SERVICES Gordon Hughes, Global Sales Director, Intel Corporation Sinan Baskan, Solutions Director, Financial Services, MarkLogic Corporation Many regulators and regulations

More information

Understanding the latent value in all content

Understanding the latent value in all content Understanding the latent value in all content John F. Kennedy (JFK) November 22, 1963 INGEST ENRICH EXPLORE Cognitive skills Data in any format, any Azure store Search Annotations Data Cloud Intelligence

More information

CSE 344 MAY 2 ND MAP/REDUCE

CSE 344 MAY 2 ND MAP/REDUCE CSE 344 MAY 2 ND MAP/REDUCE ADMINISTRIVIA HW5 Due Tonight Practice midterm Section tomorrow Exam review PERFORMANCE METRICS FOR PARALLEL DBMSS Nodes = processors, computers Speedup: More nodes, same data

More information

Oracle Big Data Connectors

Oracle Big Data Connectors Oracle Big Data Connectors Oracle Big Data Connectors is a software suite that integrates processing in Apache Hadoop distributions with operations in Oracle Database. It enables the use of Hadoop to process

More information

TECHED USER CONFERENCE MAY 3-4, 2016

TECHED USER CONFERENCE MAY 3-4, 2016 TECHED USER CONFERENCE MAY 3-4, 2016 Bob Jeffcott Software AG Big Data Adabas In Memory Data Management with Terracotta 2016 Software AG. All rights reserved. For internal use only AGENDA 1. ADABAS/NATURAL

More information

Hacking PostgreSQL Internals to Solve Data Access Problems

Hacking PostgreSQL Internals to Solve Data Access Problems Hacking PostgreSQL Internals to Solve Data Access Problems Sadayuki Furuhashi Treasure Data, Inc. Founder & Software Architect A little about me... > Sadayuki Furuhashi > github/twitter: @frsyuki > Treasure

More information

Efficient On-Demand Operations in Distributed Infrastructures

Efficient On-Demand Operations in Distributed Infrastructures Efficient On-Demand Operations in Distributed Infrastructures Steve Ko and Indranil Gupta Distributed Protocols Research Group University of Illinois at Urbana-Champaign 2 One-Line Summary We need to design

More information

Your New Autonomous Data Warehouse

Your New Autonomous Data Warehouse AUTONOMOUS DATA WAREHOUSE CLOUD Your New Autonomous Data Warehouse What is Autonomous Data Warehouse Autonomous Data Warehouse is a fully managed database tuned and optimized for data warehouse workloads

More information

Outrun Your Competition With SAS In-Memory Analytics Sascha Schubert Global Technology Practice, SAS

Outrun Your Competition With SAS In-Memory Analytics Sascha Schubert Global Technology Practice, SAS Outrun Your Competition With SAS In-Memory Analytics Sascha Schubert Global Technology Practice, SAS Topics AGENDA Challenges with Big Data Analytics How SAS can help you to minimize time to value with

More information

New Informix Packaging July 20, Steve Stroisi Rajesh Nair

New Informix Packaging July 20, Steve Stroisi Rajesh Nair New Informix Packaging July 20, Steve Stroisi Rajesh Nair New Informix Editions Creating Growth Opportunities for Clients and Partners New free edition to drive new solutions adoption Informix Innovator-C

More information

Directions in Data Centre Virtualization and Management

Directions in Data Centre Virtualization and Management Directions in Data Centre Virtualization and Management Peter West Product Marketing Manager, Product Marketing EMEA, VMware, Inc. New Approach To Data Centre Scalability Simplify the containers Add Hardware

More information