OLTP on Hadoop: Reviewing the first Hadoop-based TPC-C benchmarks


1 OLTP on Hadoop: Reviewing the first Hadoop-based TPC-C benchmarks
Monte Zweben, Co-Founder and Chief Executive Officer
John Leach, Co-Founder and Chief Technology Officer
September 30, 2015

2 The Traditional Database Market
- Operational: $24B
- Analytical: $11B

3 Origins of Hadoop
Big Batch Processing Jobs to Batch Analytics
- Batch Processing for Web Search Index
- Batch Analytics
- Google File System (GFS) paper published
- Google MapReduce paper published
- Hadoop created based on GFS and MapReduce
- Hive created by Facebook for analytics

4 Hadoop: Not Just for Data Scientists Anymore
Moving Hadoop Beyond Batch Analytics to Power Real-Time Apps
- Distributed File System
- Java MapReduce Programs
- Read-Only Batch Analytics
- Distributed RDBMS
- SQL-99 Queries
- Real-Time Updates with ACID Transactions
- Real-Time Concurrent Apps and Analytics

5 SQL-on-Hadoop Market
- OLTP
- OLAP

6 OLTP Requirements
- ACID transactions
- High concurrency
- Secondary indexes
- Joins
- Stored procedures
- Triggers
- Constraints
- Foreign keys
- Sub-queries
- Views

7 ACID: What is it? Why is it important?
Reliable updates across multiple rows and tables
- Atomicity: Transactions are all or nothing
- Consistency: Only valid data is saved
- Isolation: Transactions do not affect each other
- Durability: Written data will not be lost
Use Cases
- Update a value and a secondary index
- Transfer $ between bank accounts
- Recover from batch update error without reload
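The bank-transfer use case can be sketched in a few lines. This is a minimal illustration of atomicity using Python's built-in `sqlite3` module, not Splice Machine; the `accounts` table, the account names, and the `transfer` helper are all hypothetical. The credit is applied first so that a failed debit has something to roll back.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction (hypothetical helper)."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back too.
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # overdraft: nothing is applied
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(ok, failed, balances)
```

After the failed transfer, both balances are unchanged from the state left by the successful one, which is the "all or nothing" property named on the slide.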

8 Doesn't Hive Have ACID Transactions?
Hive Can't Power an App: High Concurrency vs Batch Transactions
Hive:
- ~10 concurrent users
- Batch updates
- Table locking
Operational RDBMSs:
- Up to 100,000 concurrent users
- Multi-Version Concurrency Control
- Power applications
From the Hortonworks blog, "Apache Hive ACID Transactions in HDP 2.2":
- designed for an analytic workload
- low concurrency users updating
- Multiple writers wait behind each other
- Hive is not meant for low latency updates and deletes

9 Snapshot Isolation: High-Concurrency MVCC Transactions
- Leverages Multi-Version Concurrency Control (MVCC)
- Each update creates a new version with a new timestamp
- Each transaction can see its own virtual snapshot of the database
- Writers don't block readers
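The versioning idea on this slide can be illustrated with a toy in-memory store. This is only a sketch of the general MVCC technique, not Splice Machine's implementation: the `MVCCStore` class and its methods are hypothetical, every write commits immediately, and there is no write-write conflict detection.

```python
import itertools

class MVCCStore:
    """Toy multi-version store: each write appends a (timestamp, value) version."""

    def __init__(self):
        self._clock = itertools.count(1)  # monotonically increasing timestamps
        self._versions = {}               # key -> list of (ts, value), ascending by ts

    def write(self, key, value):
        """Create a new version with a new timestamp; old versions are kept."""
        ts = next(self._clock)
        self._versions.setdefault(key, []).append((ts, value))
        return ts

    def snapshot(self):
        """Return a read timestamp; reads at it never see later versions."""
        return next(self._clock)

    def read(self, key, snapshot_ts):
        """Return the newest value written at or before snapshot_ts, else None."""
        visible = [v for ts, v in self._versions.get(key, []) if ts <= snapshot_ts]
        return visible[-1] if visible else None

store = MVCCStore()
store.write("balance", 100)
snap = store.snapshot()        # a reader takes its snapshot here
store.write("balance", 250)    # a later writer does not block or disturb the reader
old = store.read("balance", snap)
latest = store.read("balance", store.snapshot())
print(old, latest)
```

The reader holding `snap` keeps seeing the older value while the writer proceeds, which is the sense in which writers do not block readers.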

10 Splice Machine: The RDBMS on Hadoop
Replace Oracle to scale out your applications
- Affordable, Scale-Out: Commodity hardware
- Elastic: Easy to expand or scale back
- Transactional: Real-time updates & ACID transactions
- ANSI SQL: Leverage existing SQL code, tools, & skills
- Flexible: Support operational and analytical workloads

11 Compelling TCO: Sample Oracle Replacement

Oracle RAC Costs                                   List Price   Unit   3 Year Cost
Oracle Database Enterprise Edition with RAC        $37,                $2,416,000
3 years DB Maintenance (22% list price/yr)         $24,                $1,594,560
3 years Operating System Support (Oracle Linux)    $6,897       4      $27,588
Server Costs (mid-range, Intel Xeon-based)         $16,000      4      $64,000
Primary Storage                                    $143,360     1      $143,360
TOTAL                                              $228,922            $4,245,508

Splice Machine Costs                               List Price   Unit   3 Year Cost
Splice Machine Annual Subscription                 $10,000      7      $210,000
Cloudera Enterprise Edition Annual Subscription    $7,500       8      $180,000
3 years Operating System Support (Oracle Linux)    $6,897       4      $27,588
Server Costs with Storage                          $5,000       8      $40,000
TOTAL                                              $22,500             $457,588

Assumes Oracle Enterprise Edition ($47.5K/CPU) and RAC ($23K/CPU)
90% TCO Reduction ($3.8M)
3-7x faster
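As a quick check of the headline numbers, the 3-year costs on this slide can be summed directly. The assignment of figures to line items below is a reading of the flattened table, so treat the labels as an assumption; only the 3-year cost values come from the slide.

```python
# 3-year costs as read from the slide (row labels are a reconstruction).
oracle_3yr = {
    "Oracle Database Enterprise Edition with RAC": 2_416_000,
    "3 years DB maintenance": 1_594_560,
    "3 years OS support (Oracle Linux)": 27_588,
    "Servers": 64_000,
    "Primary storage": 143_360,
}
splice_3yr = {
    "Splice Machine subscription": 210_000,
    "Cloudera Enterprise subscription": 180_000,
    "3 years OS support (Oracle Linux)": 27_588,
    "Servers with storage": 40_000,
}

oracle_total = sum(oracle_3yr.values())
splice_total = sum(splice_3yr.values())
savings = oracle_total - splice_total
reduction = savings / oracle_total

print(f"Oracle: ${oracle_total:,}  Splice: ${splice_total:,}")
print(f"Savings: ${savings:,} ({reduction:.1%})")
```

Under this reading the totals match the slide's $4,245,508 and $457,588, and the difference is about $3.79M, or roughly 89%, which the slide rounds to "$3.8M" and "90%".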

12 TPC-C Benchmark
- Gold standard for OLTP
- Requires high concurrency transactions
- 5 very complex queries
- Models ERP order-entry:
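For readers unfamiliar with the metric used in the results that follow: the TPC-C specification defines five transaction types (New-Order, Payment, Order-Status, Delivery, Stock-Level), and tpmC counts completed New-Order transactions per minute while the full mix runs. The sketch below shows that calculation on made-up data; the `tpmc` function, the type names as strings, and the sample counts are illustrative assumptions, not output from the benchmark runs in this deck.

```python
from collections import Counter

def tpmc(completed, interval_minutes):
    """New-Order transactions completed per minute of the measurement interval."""
    counts = Counter(completed)
    return counts["new_order"] / interval_minutes

# Hypothetical log of completed transactions over a 10-minute interval.
sample = (
    ["new_order"] * 450
    + ["payment"] * 430
    + ["order_status"] * 40
    + ["delivery"] * 40
    + ["stock_level"] * 40
)
rate = tpmc(sample, interval_minutes=10)
print(rate)
```

Only the New-Order count enters the metric, even though the other four transaction types must run concurrently for a result to be meaningful.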

13 Initial TPC-C Results* on Commodity Hardware
Linear scalability for transactional workload on Hadoop
* Unaudited
[Chart: transactions per minute (tpmC) vs. number of nodes]

14 Experimental TPC-C Results: Splice Machine HBase Fork
Linear scalability for transactional workload on Hadoop
* Unaudited
[Chart: transactions per minute (tpmC) vs. number of nodes; series: tpmC and tpmC (HBase Patch)]

15 3rd-Party Applications vs Ad-Hoc Queries
Far more difficult to accommodate than analytic queries
- Dynamically Generated SQL
  - No workarounds since generated code
  - Not human-friendly
  - Machine-generated symbols not meant to be interpreted
  - Code not indented or styled
  - No comments
- Complex Sub-Queries
  - Object Relational Mappings create many levels of sub-queries
  - Applications: object oriented to achieve code efficiency, reuse, and understandability
  - Databases: relational to achieve performance, ACID properties, and minimal storage
- High Concurrency
  - Must support 1,000s to 10,000s of concurrent users

16 OLTP Application: Unica
Splice Machine powers the Unica Application
Campaign Definition
- Customer Segmentation by Household
- Output of 8 Segments
  - By Country
  - By Previous 12 Months
- Use of Select, Merge and Audience Processes
Data Flow Process
- Select, Merge & Audience Processes
Selections Rules
- Households that Have Opted in
- Grouped by Loyalty and Non Loyalty
SQL for Complex Selection Rule

    INSERT INTO UAC_639_1c
    SELECT A.CUSTOMER_MASTER_ID
    FROM UAC_639_v A
    WHERE A.CUSTOMER_MASTER_ID NOT IN (
        SELECT UAC_639_14.CUSTOMER_MASTER_ID FROM UAC_639_14
        UNION SELECT UAC_639_11.CUSTOMER_MASTER_ID FROM UAC_639_11
        UNION SELECT UAC_639_12.CUSTOMER_MASTER_ID FROM UAC_639_12
        UNION SELECT UAC_639_13.CUSTOMER_MASTER_ID FROM UAC_639_13);
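To see what this selection rule does, the statement can be run against toy data. The sketch below uses Python's built-in `sqlite3` module rather than Splice Machine; the table names come from the slide, but the single-column schema and the sample IDs are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schema: each table holds only CUSTOMER_MASTER_ID.
for name in ("UAC_639_v", "UAC_639_11", "UAC_639_12",
             "UAC_639_13", "UAC_639_14", "UAC_639_1c"):
    conn.execute(f"CREATE TABLE {name} (CUSTOMER_MASTER_ID INTEGER)")

# Hypothetical data: six candidates, three of them already in other segments.
conn.executemany("INSERT INTO UAC_639_v VALUES (?)", [(i,) for i in range(1, 7)])
conn.execute("INSERT INTO UAC_639_11 VALUES (1)")
conn.execute("INSERT INTO UAC_639_12 VALUES (2)")
conn.execute("INSERT INTO UAC_639_13 VALUES (3)")
conn.execute("INSERT INTO UAC_639_14 VALUES (3)")

# The generated statement from the slide: keep IDs not in any of the four segments.
conn.execute("""
    INSERT INTO UAC_639_1c
    SELECT A.CUSTOMER_MASTER_ID
    FROM UAC_639_v A
    WHERE A.CUSTOMER_MASTER_ID NOT IN (
        SELECT UAC_639_14.CUSTOMER_MASTER_ID FROM UAC_639_14
        UNION SELECT UAC_639_11.CUSTOMER_MASTER_ID FROM UAC_639_11
        UNION SELECT UAC_639_12.CUSTOMER_MASTER_ID FROM UAC_639_12
        UNION SELECT UAC_639_13.CUSTOMER_MASTER_ID FROM UAC_639_13)
""")

remaining = [row[0] for row in conn.execute(
    "SELECT CUSTOMER_MASTER_ID FROM UAC_639_1c ORDER BY 1")]
print(remaining)
```

The rule is effectively a set difference: candidates minus the union of four exclusion lists, written into a new working table.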

17 OLTP Application: Unica
Splice Machine powers the Unica Application
Cross Channel Campaigns
[Architecture diagram: Real-Time Personalization, Consumers, Real-Time Actions]
Initial Results vs. Oracle RAC
- 3-7x faster through parallelized queries
- ¼ cost with commodity scale out

18 OLTP Application: RedPoint
Splice Machine powers the RedPoint Convergent Marketing Application
Campaign Definition
- Audience selection
- Dataflow
- Offers
Data Flow Process
- Suppression
- Split rules

19 OLTP Application: RedPoint
Splice Machine powers the RedPoint Convergent Marketing Application
SQL for Complex Selection Rule

    INSERT INTO "AMEX"."RP_BC_69_2"
    SELECT a4."pid" AS "PID",
           a4."hhid" AS "HHID",
           (CASE WHEN a5."pid" IS NULL THEN 'N' ELSE 'Y' END) AS "Standard_Suppression_2044",
           (CASE WHEN a10."pid" IS NULL THEN 'N' ELSE 'Y' END) AS "Low_Value_Customer_1729",
           (CASE WHEN a13."pid" IS NULL THEN 'N' ELSE 'Y' END) AS "Not_Mailable_1128"
    FROM "AMEX"."RP_BC_69_1" a4
    LEFT OUTER JOIN (
        SELECT a6."pid"
        FROM "AMEX"."RP_BC_69_1" a6
        INNER JOIN "AMEX"."PERSON" a7 ON a6."pid" = a7."pid"
        WHERE EXISTS (
            SELECT a8."pid"
            FROM "AMEX"."PERSON_ACCOUNT_MASTER_FINANCE" a8
            INNER JOIN "AMEX"."UD_ACCOUNT_DETAILS_FINANCE" a9
                ON a8."rp_account_id" = a9."rp_account_id"
            WHERE a7."pid" = a8."pid" AND a9."account_status" = 'O')
    ) AS a5 ON a4."pid" = a5."pid"
    LEFT OUTER JOIN (
        SELECT a11."pid"
        FROM "AMEX"."RP_BC_69_1" a11
        INNER JOIN "AMEX"."PERSON" a12 ON a11."pid" = a12."pid"
        WHERE a12."customer_segment" = 1
    ) AS a10 ON a4."pid" = a10."pid"
    LEFT OUTER JOIN (
        SELECT a14."pid"
        FROM "AMEX"."RP_BC_69_1" a14
        INNER JOIN "AMEX"."PERSON" a15 ON a14."pid" = a15."pid"
        WHERE EXISTS (
            SELECT a16."pid"
            FROM "AMEX"."PERSON_ADDRESS" a16
            INNER JOIN "AMEX"."ADDRESS" a17 ON a16."addr_id" = a17."addr_id"
            WHERE a15."pid" = a16."pid" AND a17."std_status_code" IN ('M', 'X', '7'))
    ) AS a13 ON a4."pid" = a13."pid";

20 OLTP Application: RedPoint
Splice Machine powers the RedPoint Convergent Marketing Application
[Architecture diagram: Stream or Batch Updates, Real-Time Offers, Consumers, Real-Time Data]
Initial Results vs. Oracle RAC
- 5-10x faster through parallelized queries
- ¼ cost with commodity scale out

21 Use Cases
- Digital Marketing
- ETL/Operational Data Lake
- Precision Medicine
- Fraud Detection
- Internet of Things

22 Sneak Peek: Splice Machine 2.0
First Hybrid, In-Memory RDBMS Powered by Hadoop and Spark
Advantages
- OLAP + OLTP
- Massive scalability
- Spark in-memory computing engine
- High-concurrency ACID transactions
- ANSI SQL
- Seamless integration
- Isolated resource management
Benchmarks
- Simultaneous TPC-C and TPC-DS
- Never done before

23 Summary
Power OLTP Apps on Hadoop
- First TPC-C benchmark run on Hadoop
- Leverage Hadoop for both OLTP and OLAP
Stay Tuned!
- Hybrid, in-memory RDBMS powered by Hadoop and Spark
- Support mixed OLTP & OLAP workloads
- Look for simultaneous TPC-C & TPC-DS benchmark results

24 Questions?
Monte Zweben, CEO, Splice Machine
John Leach, CTO, Splice Machine


26 Appendix

27 Unica Customer Marketing Service Provider Pilot
Original number of records in each table:
- Household Master million
- Customer Preference million
- Household Computed Value million
- Customer Address Quality - 71 million
Unica timings:
- Strategic Segments EM (36 processes): Oracle 5 hours, Splice 2 hours 10 minutes
- Strategic Segments DM (23 processes): Oracle 3 hours, Splice 2 hours 15 minutes
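The speedup implied by each timing is a simple ratio of durations. The short calculation below uses only the four durations reported on this slide; the dictionary layout and variable names are illustrative.

```python
from datetime import timedelta

# (Oracle duration, Splice duration) per workload, as reported on the slide.
timings = {
    "Strategic Segments EM (36 processes)": (
        timedelta(hours=5), timedelta(hours=2, minutes=10)),
    "Strategic Segments DM (23 processes)": (
        timedelta(hours=3), timedelta(hours=2, minutes=15)),
}

# Dividing one timedelta by another yields a plain float ratio.
speedups = {name: oracle / splice for name, (oracle, splice) in timings.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x faster")
```

For this pilot the ratios come out to roughly 2.3x and 1.3x, so these two timings sit below the 3-7x range quoted for the Unica application earlier in the deck.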

28 Unica Demo Campaign Flowchart 1
- 8 segments of household IDs created
- Preference indicated to receive direct mail
- Identified as valid (vs. ghost) household in advance
- Segments used as inputs for other Flowcharts
- Valid name and address
- Loyalty customers converted to unique list
- Household ID identifier by audience
- Additional extract of Canadian household details
- Real-time updates to Splice DB table

29 Unica Demo Campaign (cont.)
Household Segments:
- a list of US-based households (US DM HH)
- the same US-based households, restricted to those with at least one transaction in the last 12 months (US DM HH 12M)
- a list of US-based loyalty households (US PP DM HH)
- a list of US-based loyalty households with at least one transaction in the last 12 months (US PP DM HH 12M)
- a list of Canadian households (Canada DM)
- a list of Canadian households with at least one transaction in the last 12 months (Canada DM PP)
- a list of Puerto Rico-based households (Puerto Rico DM)
- a list of Puerto Rico-based loyalty households with at least one transaction in the last 12 months (PR PP DM)


More information

Integrating Oracle Databases with NoSQL Databases for Linux on IBM LinuxONE and z System Servers

Integrating Oracle Databases with NoSQL Databases for Linux on IBM LinuxONE and z System Servers Oracle zsig Conference IBM LinuxONE and z System Servers Integrating Oracle Databases with NoSQL Databases for Linux on IBM LinuxONE and z System Servers Sam Amsavelu Oracle on z Architect IBM Washington

More information

Hortonworks and The Internet of Things

Hortonworks and The Internet of Things Hortonworks and The Internet of Things Dr. Bernhard Walter Solutions Engineer About Hortonworks Customer Momentum ~700 customers (as of November 4, 2015) 152 customers added in Q3 2015 Publicly traded

More information

Oracle Data Integrator 12c: Integration and Administration

Oracle Data Integrator 12c: Integration and Administration Oracle University Contact Us: +34916267792 Oracle Data Integrator 12c: Integration and Administration Duration: 5 Days What you will learn Oracle Data Integrator is a comprehensive data integration platform

More information

YARN: A Resource Manager for Analytic Platform Tsuyoshi Ozawa

YARN: A Resource Manager for Analytic Platform Tsuyoshi Ozawa YARN: A Resource Manager for Analytic Platform Tsuyoshi Ozawa ozawa.tsuyoshi@lab.ntt.co.jp ozawa@apache.org About me Tsuyoshi Ozawa Research Engineer @ NTT Twitter: @oza_x86_64 Over 150 reviews in 2015

More information

Oracle Big Data Connectors

Oracle Big Data Connectors Oracle Big Data Connectors Oracle Big Data Connectors is a software suite that integrates processing in Apache Hadoop distributions with operations in Oracle Database. It enables the use of Hadoop to process

More information

TiDB: NewSQL over HBase.

TiDB: NewSQL over HBase. TiDB: NewSQL over HBase liuqi@pingcap.com https://github.com/pingcap/tidb weibo: @goroutine Agenda HBase introduction TiDB features Internals of TiDB over HBase Features of HBase Linear and modular scalability.

More information

Submitted to: Dr. Sunnie Chung. Presented by: Sonal Deshmukh Jay Upadhyay

Submitted to: Dr. Sunnie Chung. Presented by: Sonal Deshmukh Jay Upadhyay Submitted to: Dr. Sunnie Chung Presented by: Sonal Deshmukh Jay Upadhyay Submitted to: Dr. Sunny Chung Presented by: Sonal Deshmukh Jay Upadhyay What is Apache Survey shows huge popularity spike for Apache

More information

There is a tempta7on to say it is really used, it must be good

There is a tempta7on to say it is really used, it must be good Notes from reviews Dynamo Evalua7on doesn t cover all design goals (e.g. incremental scalability, heterogeneity) Is it research? Complexity? How general? Dynamo Mo7va7on Normal database not the right fit

More information

Migrating Oracle Databases To Cassandra

Migrating Oracle Databases To Cassandra BY UMAIR MANSOOB Why Cassandra Lower Cost of ownership makes it #1 choice for Big Data OLTP Applications. Unlike Oracle, Cassandra can store structured, semi-structured, and unstructured data. Cassandra

More information

Analyze Big Data Faster and Store it Cheaper. Dominick Huang CenterPoint Energy Henry Le - Utegra8on Russell Hull - SAP

Analyze Big Data Faster and Store it Cheaper. Dominick Huang CenterPoint Energy Henry Le - Utegra8on Russell Hull - SAP Analyze Big Data Faster and Store it Cheaper Dominick Huang CenterPoint Energy Henry Le - Utegra8on Russell Hull - SAP ABOUT CENTERPOINT ENERGY, INC. Ø Ø Ø Ø Ø Ø Publicly traded on New York Stock Exchange

More information

Swimming in the Data Lake. Presented by Warner Chaves Moderated by Sander Stad

Swimming in the Data Lake. Presented by Warner Chaves Moderated by Sander Stad Swimming in the Data Lake Presented by Warner Chaves Moderated by Sander Stad Thank You microsoft.com hortonworks.com aws.amazon.com red-gate.com Empower users with new insights through familiar tools

More information

Hadoop & Big Data Analytics Complete Practical & Real-time Training

Hadoop & Big Data Analytics Complete Practical & Real-time Training An ISO Certified Training Institute A Unit of Sequelgate Innovative Technologies Pvt. Ltd. www.sqlschool.com Hadoop & Big Data Analytics Complete Practical & Real-time Training Mode : Instructor Led LIVE

More information

MySQL for Developers Ed 3

MySQL for Developers Ed 3 Oracle University Contact Us: 1.800.529.0165 MySQL for Developers Ed 3 Duration: 5 Days What you will learn This MySQL for Developers training teaches developers how to plan, design and implement applications

More information

SQL Server 2017 Power your entire data estate from on-premises to cloud

SQL Server 2017 Power your entire data estate from on-premises to cloud SQL Server 2017 Power your entire data estate from on-premises to cloud PREMIER SPONSOR GOLD SPONSORS SILVER SPONSORS BRONZE SPONSORS SUPPORTERS Vulnerabilities (2010-2016) Power your entire data estate

More information

Introduction to Hadoop. High Availability Scaling Advantages and Challenges. Introduction to Big Data

Introduction to Hadoop. High Availability Scaling Advantages and Challenges. Introduction to Big Data Introduction to Hadoop High Availability Scaling Advantages and Challenges Introduction to Big Data What is Big data Big Data opportunities Big Data Challenges Characteristics of Big data Introduction

More information

Next-Generation Cloud Platform

Next-Generation Cloud Platform Next-Generation Cloud Platform Jangwoo Kim Jun 24, 2013 E-mail: jangwoo@postech.ac.kr High Performance Computing Lab Department of Computer Science & Engineering Pohang University of Science and Technology

More information

Oracle NoSQL Database Enterprise Edition, Version 18.1

Oracle NoSQL Database Enterprise Edition, Version 18.1 Oracle NoSQL Database Enterprise Edition, Version 18.1 Oracle NoSQL Database is a scalable, distributed NoSQL database, designed to provide highly reliable, flexible and available data management across

More information

MySQL for Developers Ed 3

MySQL for Developers Ed 3 Oracle University Contact Us: 0845 777 7711 MySQL for Developers Ed 3 Duration: 5 Days What you will learn This MySQL for Developers training teaches developers how to plan, design and implement applications

More information

Distributed Systems 16. Distributed File Systems II

Distributed Systems 16. Distributed File Systems II Distributed Systems 16. Distributed File Systems II Paul Krzyzanowski pxk@cs.rutgers.edu 1 Review NFS RPC-based access AFS Long-term caching CODA Read/write replication & disconnected operation DFS AFS

More information

Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem. Zohar Elkayam

Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem. Zohar Elkayam Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem Zohar Elkayam www.realdbamagic.com Twitter: @realmgic Who am I? Zohar Elkayam, CTO at Brillix Programmer, DBA, team leader, database trainer,

More information

S-Store: Streaming Meets Transaction Processing

S-Store: Streaming Meets Transaction Processing S-Store: Streaming Meets Transaction Processing H-Store is an experimental database management system (DBMS) designed for online transaction processing applications Manasa Vallamkondu Motivation Reducing

More information

Introduction to NoSQL by William McKnight

Introduction to NoSQL by William McKnight Introduction to NoSQL by William McKnight All rights reserved. Reproduction in whole or part prohibited except by written permission. Product and company names mentioned herein may be trademarks of their

More information

Oracle NoSQL Database Enterprise Edition, Version 18.1

Oracle NoSQL Database Enterprise Edition, Version 18.1 Oracle NoSQL Database Enterprise Edition, Version 18.1 Oracle NoSQL Database is a scalable, distributed NoSQL database, designed to provide highly reliable, flexible and available data management across

More information

THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES

THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES 1 THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES Vincent Garonne, Mario Lassnig, Martin Barisits, Thomas Beermann, Ralph Vigne, Cedric Serfon Vincent.Garonne@cern.ch ph-adp-ddm-lab@cern.ch XLDB

More information

SQL Server Internals: The Practical Angle Sneak Peek. Dmitri Korotkevitch Moderated by Roberto Fonseca

SQL Server Internals: The Practical Angle Sneak Peek. Dmitri Korotkevitch Moderated by Roberto Fonseca SQL Server Internals: The Practical Angle Sneak Peek Dmitri Korotkevitch Moderated by Roberto Fonseca Technical Assistance Maximize your screen with the zoom button on the top of the presentation window

More information

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda Agenda Oracle9i Warehouse Review Dulcian, Inc. Oracle9i Server OLAP Server Analytical SQL Mining ETL Infrastructure 9i Warehouse Builder Oracle 9i Server Overview E-Business Intelligence Platform 9i Server:

More information

MapR Enterprise Hadoop

MapR Enterprise Hadoop 2014 MapR Technologies 2014 MapR Technologies 1 MapR Enterprise Hadoop Top Ranked Cloud Leaders 500+ Customers 2014 MapR Technologies 2 Key MapR Advantage Partners Business Services APPLICATIONS & OS ANALYTICS

More information

Oracle Data Integrator 12c: Integration and Administration

Oracle Data Integrator 12c: Integration and Administration Oracle University Contact Us: Local: 1800 103 4775 Intl: +91 80 67863102 Oracle Data Integrator 12c: Integration and Administration Duration: 5 Days What you will learn Oracle Data Integrator is a comprehensive

More information

CS 445 Introduction to Database Systems

CS 445 Introduction to Database Systems CS 445 Introduction to Database Systems TTh 2:45-4:20pm Chadd Williams Pacific University 1 Overview Practical introduction to databases theory + hands on projects Topics Relational Model Relational Algebra/Calculus/

More information