OLTP on Hadoop: Reviewing the first Hadoop-based TPC-C benchmarks
1 OLTP on Hadoop: Reviewing the first Hadoop-based TPC-C benchmarks. Monte Zweben, Co-Founder and Chief Executive Officer; John Leach, Co-Founder and Chief Technology Officer. September 30, 2015
2 The Traditional Database Market: Operational - $24B; Analytical - $11B
3 Origins of Hadoop: Big Batch Processing Jobs to Batch Analytics. Batch Processing for Web Search Index; Batch Analytics. Timeline: Google File System (GFS) paper published; Google MapReduce paper published; Hadoop created based on GFS and MapReduce; Hive created by Facebook for Analytics
4 Hadoop: Not Just for Data Scientists Anymore. Moving Hadoop Beyond Batch Analytics to Power Real-Time Apps. Distributed File System; Java MapReduce Programs; Read-Only Batch Analytics. Distributed RDBMS; SQL-99 Queries; Real-Time Updates with ACID Transactions; Real-Time Concurrent Apps and Analytics
5 SQL-on-Hadoop Market: OLTP; OLAP
6 OLTP Requirements: ACID Transactions; High concurrency; Secondary indexes; Joins; Stored procedures; Triggers; Constraints; Foreign keys; Sub-queries; Views
7 ACID: What is it? Why is it important? Reliable updates across multiple rows and tables. Atomicity: Transactions are all or nothing. Consistency: Only valid data is saved. Isolation: Transactions do not affect each other. Durability: Written data will not be lost. Use Cases: Update a value and a secondary index; Transfer $ between bank accounts; Recover from batch update error without reload
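The bank-transfer use case on the ACID slide can be sketched with Python's built-in sqlite3 module. This is only an illustration of atomicity and consistency on a single-node embedded database, not Splice Machine's API; the `accounts` table, the account names, and the `transfer` and `balances` helpers are hypothetical.

```python
import sqlite3

# In-memory database with a CHECK constraint so that only valid
# (non-negative) balances can ever be saved: the "C" in ACID.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst))
            # If this debit violates the CHECK constraint, the credit
            # above is rolled back together with it.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM accounts").fetchall())


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # overdraws alice, rolled back
```

After the failed transfer neither account has changed, which is the "all or nothing" behavior the slide describes.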
8 Doesn't Hive Have ACID Transactions? Hive Can't Power an App: High Concurrency vs Batch Transactions. Hive: ~10 concurrent users; Batch updates; Table locking; designed for an analytic workload; low concurrency of users updating; Multiple writers wait behind each other. Operational RDBMSs: Up to 100,000 concurrent users; Multi-Version Concurrency Control; Power applications. Hive is not meant for low latency updates and deletes (Hortonworks blog, Apache Hive ACID Transactions in HDP 2.2)
9 Snapshot Isolation: High-Concurrency MVCC Transactions. Leverages Multi-Version Concurrency Control (MVCC). Each update creates a new version with a new timestamp. Each transaction can see its own virtual snapshot of the database. Writers don't block readers
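The three MVCC points on the snapshot isolation slide can be illustrated with a toy in-memory store. This is a minimal sketch under strong assumptions: every write commits immediately, there is no write-write conflict detection, and the `MVCCStore` and `Snapshot` classes are hypothetical names, not part of Splice Machine or HBase.

```python
import itertools


class MVCCStore:
    """Toy multi-version store: each write appends a new timestamped version."""

    def __init__(self):
        self._versions = {}               # key -> [(commit_ts, value), ...] ascending
        self._clock = itertools.count(1)  # monotonically increasing timestamps
        self._last_commit = 0

    def write(self, key, value):
        """Create a new version with a new timestamp; old versions are kept."""
        ts = next(self._clock)
        self._versions.setdefault(key, []).append((ts, value))
        self._last_commit = ts
        return ts

    def begin(self):
        """Start a reader pinned to the latest committed timestamp."""
        return Snapshot(self, self._last_commit)

    def read_as_of(self, key, ts):
        """Return the newest version of key committed at or before ts."""
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= ts:
                return value
        return None


class Snapshot:
    """A reader's virtual snapshot: it never sees versions newer than its ts."""

    def __init__(self, store, ts):
        self._store = store
        self.ts = ts

    def read(self, key):
        return self._store.read_as_of(key, self.ts)


store = MVCCStore()
store.write("x", 1)
snap = store.begin()      # snapshot taken after the first write
store.write("x", 2)       # the writer proceeds without waiting for the reader
old_view = snap.read("x")
new_view = store.begin().read("x")
```

Because the writer only appends versions, the earlier snapshot keeps reading the value it started with while a new snapshot sees the update.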
10 Splice Machine: The RDBMS on Hadoop
Replace Oracle to scale out your applications
- Affordable, Scale-Out: Commodity hardware
- Elastic: Easy to expand or scale back
- Transactional: Real-time updates & ACID Transactions
- ANSI SQL: Leverage existing SQL code, tools, & skills
- Flexible: Support operational and analytical workloads
11 Compelling TCO: Sample Oracle Replacement
Oracle RAC Costs
- Oracle Database Enterprise Edition with RAC: list price $37,…; 3-year cost $2,416,000
- 3 years DB Maintenance (22% list price/yr): list price $24,…; 3-year cost $1,594,560
- 3 years Operating System Support (Oracle Linux): list price $6,897; units 4; 3-year cost $27,588
- Server Costs (mid-range, Intel Xeon-based): list price $16,000; units 4; 3-year cost $64,000
- Primary Storage: list price $143,360; units 1; 3-year cost $143,360
- TOTAL: list price $228,922; 3-year cost $4,245,508
Splice Machine Costs
- Splice Machine Annual Subscription: list price $10,000; units 7; 3-year cost $210,000
- Cloudera Enterprise Edition Annual Subscription: list price $7,500; units 8; 3-year cost $180,000
- 3 years Operating System Support (Oracle Linux): list price $6,897; units 4; 3-year cost $27,588
- Server Costs with Storage: list price $5,000; units 8; 3-year cost $40,000
- TOTAL: list price $22,500; 3-year cost $457,588
Assumes Oracle Enterprise Edition ($47.5K/CPU) and RAC ($23K/CPU)
90% TCO Reduction ($3.8M); 3-7x faster
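As a quick check of the arithmetic on the TCO slide, the 3-year cost figures can be summed directly. The sketch below uses only the 3-year cost numbers from the slide; the dictionary keys are shortened labels chosen for illustration.

```python
# 3-year cost figures as listed on the TCO slide (US dollars).
oracle_costs = {
    "database_ee_with_rac": 2_416_000,
    "db_maintenance_3yr": 1_594_560,
    "os_support_3yr": 27_588,
    "servers": 64_000,
    "primary_storage": 143_360,
}
splice_costs = {
    "splice_subscription": 210_000,
    "cloudera_subscription": 180_000,
    "os_support_3yr": 27_588,
    "servers_with_storage": 40_000,
}

oracle_total = sum(oracle_costs.values())
splice_total = sum(splice_costs.values())
savings = oracle_total - splice_total
reduction = savings / oracle_total  # fraction of the Oracle 3-year cost saved

print(f"Oracle RAC total:     ${oracle_total:,}")
print(f"Splice Machine total: ${splice_total:,}")
print(f"Savings:              ${savings:,} ({reduction:.1%})")
```

The totals match the slide, and the computed savings of $3,787,920 (about 89.2%) are what the slide rounds to "$3.8M" and "90%".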
12 TPC-C Benchmark: Gold standard for OLTP; Requires high concurrency transactions; 5 very complex queries; Models ERP order-entry
13 Initial TPC-C Results* on Commodity Hardware. Linear scalability for transactional workload on Hadoop. * Unaudited. [Chart: Transactions per Minute (tpmC) vs. Nodes; y-axis up to 70,000]
14 Experimental TPC-C Results: Splice Machine HBase Fork. Linear scalability for transactional workload on Hadoop. * Unaudited. [Chart: Transactions per Minute (tpmC) vs. Nodes; series: tpmC and tpmC (HBase Patch); y-axis up to 160,000]
15 3rd-Party Applications vs Ad-Hoc Queries: Far more difficult to accommodate than analytic queries. Dynamically Generated SQL: No workarounds since generated code. Not human-friendly: Machine-generated symbols not meant to be interpreted; Code not indented or styled; No comments. Complex Sub-Queries: Object Relational Mappings create many levels of sub-queries; Applications: object oriented to achieve code efficiency, reuse, and understandability; Databases: relational to achieve performance, ACID properties, and minimal storage. High Concurrency: Must support 1,000s to 10,000s of concurrent users
16 OLTP Application: Unica. Splice Machine powers the Unica Application. Campaign Definition: Customer Segmentation by Household; Output of 8 Segments, By Country, By Previous 12 Months; Use of Select, Merge and Audience Processes. Data Flow Process: Select, Merge & Audience Processes. Selection Rules: Households That Have Opted In; Grouped by Loyalty and Non Loyalty. SQL for Complex Selection Rule:
INSERT INTO UAC_639_1c
SELECT A.CUSTOMER_MASTER_ID
FROM UAC_639_v A
WHERE A.CUSTOMER_MASTER_ID NOT IN (
    SELECT UAC_639_14.CUSTOMER_MASTER_ID FROM UAC_639_14
    UNION
    SELECT UAC_639_11.CUSTOMER_MASTER_ID FROM UAC_639_11
    UNION
    SELECT UAC_639_12.CUSTOMER_MASTER_ID FROM UAC_639_12
    UNION
    SELECT UAC_639_13.CUSTOMER_MASTER_ID FROM UAC_639_13);
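To see what the generated Unica selection rule does, the same statement can be run against toy tables. The sketch below uses Python's sqlite3 module rather than Splice Machine, and the single-column table layout and the sample IDs are assumptions made purely for illustration; only the table names and the statement come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical single-column versions of the tables named on the slide.
tables = ["UAC_639_v", "UAC_639_11", "UAC_639_12",
          "UAC_639_13", "UAC_639_14", "UAC_639_1c"]
for name in tables:
    conn.execute(f"CREATE TABLE {name} (CUSTOMER_MASTER_ID INTEGER)")

# Made-up sample data: six candidates, four of them already selected elsewhere.
sample = {
    "UAC_639_v": [1, 2, 3, 4, 5, 6],
    "UAC_639_11": [1],
    "UAC_639_12": [2],
    "UAC_639_13": [3, 1],
    "UAC_639_14": [4],
}
for name, ids in sample.items():
    conn.executemany(f"INSERT INTO {name} VALUES (?)", [(i,) for i in ids])

# The selection rule from the slide: keep candidates not present in any
# of the four other segment tables.
conn.execute("""
    INSERT INTO UAC_639_1c
    SELECT A.CUSTOMER_MASTER_ID
    FROM UAC_639_v A
    WHERE A.CUSTOMER_MASTER_ID NOT IN (
        SELECT UAC_639_14.CUSTOMER_MASTER_ID FROM UAC_639_14
        UNION
        SELECT UAC_639_11.CUSTOMER_MASTER_ID FROM UAC_639_11
        UNION
        SELECT UAC_639_12.CUSTOMER_MASTER_ID FROM UAC_639_12
        UNION
        SELECT UAC_639_13.CUSTOMER_MASTER_ID FROM UAC_639_13)
""")
conn.commit()

remaining = [row[0] for row in conn.execute(
    "SELECT CUSTOMER_MASTER_ID FROM UAC_639_1c ORDER BY CUSTOMER_MASTER_ID")]
```

With this sample data only the two customers that appear in none of the other segment tables are inserted.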
17 OLTP Application: Unica. Splice Machine powers the Unica Application. Cross Channel Campaigns. Architecture: Real-Time Personalization; Consumers; Real-Time Actions. Initial Results vs. Oracle RAC: 3-7x faster through parallelized queries; ¼ cost with commodity scale out
18 OLTP Application: RedPoint. Splice Machine powers the RedPoint Convergent Marketing Application. Campaign Definition: Audience selection; Dataflow; Offers. Data Flow Process: Suppression; Split rules
19 OLTP Application: RedPoint. Splice Machine powers the RedPoint Convergent Marketing Application. SQL for Complex Selection Rule:
INSERT INTO "AMEX"."RP_BC_69_2"
SELECT a4."pid" AS "PID",
       a4."hhid" AS "HHID",
       (CASE WHEN a5."pid" IS NULL THEN 'N' ELSE 'Y' END) AS "Standard_Suppression_2044",
       (CASE WHEN a10."pid" IS NULL THEN 'N' ELSE 'Y' END) AS "Low_Value_Customer_1729",
       (CASE WHEN a13."pid" IS NULL THEN 'N' ELSE 'Y' END) AS "Not_Mailable_1128"
FROM "AMEX"."RP_BC_69_1" a4
LEFT OUTER JOIN (
    SELECT a6."pid"
    FROM "AMEX"."RP_BC_69_1" a6
    INNER JOIN "AMEX"."PERSON" a7 ON a6."pid" = a7."pid"
    WHERE EXISTS (
        SELECT a8."pid"
        FROM "AMEX"."PERSON_ACCOUNT_MASTER_FINANCE" a8
        INNER JOIN "AMEX"."UD_ACCOUNT_DETAILS_FINANCE" a9 ON a8."rp_account_id" = a9."rp_account_id"
        WHERE a7."pid" = a8."pid" AND a9."account_status" = 'O')
) AS a5 ON a4."pid" = a5."pid"
LEFT OUTER JOIN (
    SELECT a11."pid"
    FROM "AMEX"."RP_BC_69_1" a11
    INNER JOIN "AMEX"."PERSON" a12 ON a11."pid" = a12."pid"
    WHERE a12."customer_segment" = 1
) AS a10 ON a4."pid" = a10."pid"
LEFT OUTER JOIN (
    SELECT a14."pid"
    FROM "AMEX"."RP_BC_69_1" a14
    INNER JOIN "AMEX"."PERSON" a15 ON a14."pid" = a15."pid"
    WHERE EXISTS (
        SELECT a16."pid"
        FROM "AMEX"."PERSON_ADDRESS" a16
        INNER JOIN "AMEX"."ADDRESS" a17 ON a16."addr_id" = a17."addr_id"
        WHERE a15."pid" = a16."pid" AND a17."std_status_code" IN ('M', 'X', '7'))
) AS a13 ON a4."pid" = a13."pid";
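The RedPoint statement repeats one pattern three times: LEFT OUTER JOIN the audience to a filtered subquery, then turn "no match" into 'N' and "match" into 'Y' with a CASE expression. A reduced sketch of a single flag, run with Python's sqlite3 module, may make the pattern easier to follow. The `audience` and `person` tables, their rows, and the `low_value_customer` column name are hypothetical stand-ins for the schema on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE audience (pid INTEGER PRIMARY KEY, hhid INTEGER);
    CREATE TABLE person (pid INTEGER PRIMARY KEY, customer_segment INTEGER);
    INSERT INTO audience VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO person VALUES (1, 1), (2, 2), (3, 1);
""")

# One flag column built the same way as on the slide: the subquery lists
# the pids that satisfy the rule, and the outer join leaves NULL for the rest.
FLAG_QUERY = """
    SELECT a.pid,
           a.hhid,
           (CASE WHEN low.pid IS NULL THEN 'N' ELSE 'Y' END) AS low_value_customer
    FROM audience a
    LEFT OUTER JOIN (
        SELECT au.pid
        FROM audience au
        INNER JOIN person p ON au.pid = p.pid
        WHERE p.customer_segment = 1
    ) AS low ON a.pid = low.pid
    ORDER BY a.pid
"""
flags = conn.execute(FLAG_QUERY).fetchall()
```

Every audience row survives the join, so the output has one row per person with a 'Y' or 'N' flag; the slide's statement simply stacks three such joins on the same audience table.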
20 OLTP Application: RedPoint. Splice Machine powers the RedPoint Convergent Marketing Application. Architecture: Stream or Batch Updates; Real-Time Offers; Consumers; Real-Time Data. Initial Results vs. Oracle RAC: 5-10x faster through parallelized queries; ¼ cost with commodity scale out
21 Use Cases: Digital Marketing; ETL/Operational Data Lake; Precision Medicine; Fraud Detection; Internet of Things
22 Sneak Peek: Splice Machine 2.0. First Hybrid, In-Memory RDBMS Powered by Hadoop and Spark. Advantages: OLAP + OLTP; Massive scalability; Spark in-memory computing engine; High-concurrency ACID transactions; ANSI SQL; Seamless integration; Isolated resource management. Benchmarks: Simultaneous TPC-C and TPC-DS; Never done before
23 Summary. Power OLTP Apps on Hadoop: First TPC-C benchmark run on Hadoop; Leverage Hadoop for both OLTP and OLAP. Stay Tuned! Hybrid, in-memory RDBMS powered by Hadoop and Spark; Support mixed OLTP & OLAP workloads; Look for simultaneous TPC-C & TPC-DS benchmark results
24 Questions? Monte Zweben, CEO, Splice Machine; John Leach, CTO, Splice Machine
25 OLTP on Hadoop: Reviewing the first Hadoop-based TPC-C benchmarks. Monte Zweben, Co-Founder and Chief Executive Officer; John Leach, Co-Founder and Chief Technology Officer. September 30, 2015
26 OLTP on Hadoop: Reviewing the first Hadoop-based TPC-C benchmarks. Appendix
27 Unica Customer: Marketing Service Provider Pilot. Original number of records in each table: Household Master - … million; Customer Preference - … million; Household Computed Value - … million; Customer Address Quality - 71 million. Unica timings: Strategic Segments EM (36 processes): Oracle 5 hours, Splice 2 hours 10 minutes; Strategic Segments DM (23 processes): Oracle 3 hours, Splice 2 hours 15 minutes
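The two pilot timings above can be turned into speedup ratios with a few lines of arithmetic. The sketch below uses only the durations listed on the slide; the `speedup` helper and the dictionary layout are illustrative.

```python
from datetime import timedelta

# Durations from the pilot slide: (Oracle, Splice Machine).
timings = {
    "Strategic Segments EM (36 processes)": (
        timedelta(hours=5), timedelta(hours=2, minutes=10)),
    "Strategic Segments DM (23 processes)": (
        timedelta(hours=3), timedelta(hours=2, minutes=15)),
}


def speedup(oracle, splice):
    """Ratio of Oracle run time to Splice Machine run time."""
    return oracle / splice  # dividing two timedeltas yields a float


speedups = {name: speedup(*pair) for name, pair in timings.items()}
for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```

For these two runs the ratios come out to roughly 2.3x and 1.3x.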
28 Unica Demo Campaign: Flowchart 1
- 8 segments of household IDs created
- Preference indicated to receive direct mail
- Identified as valid (vs. ghost) household in advance
- Segments used as inputs for other Flowcharts
- Valid name and address
- Loyalty customers converted to unique list
- Household ID identifier by audience
- Additional Extract of Canadian household details
- Real-time updates to Splice DB table
29 Unica Demo Campaign (cont.): Household Segments
- a list of US-based households (US DM HH)
- the same US-based households, restricted to those with at least one transaction in the last 12 months (US DM HH 12M)
- a list of US-based loyalty households (US PP DM HH)
- a list of US-based loyalty households with at least one transaction in the last 12 months (US PP DM HH 12M)
- a list of Canadian households (Canada DM)
- a list of Canadian households with at least one transaction in the last 12 months (Canada DM PP)
- a list of Puerto Rico-based households (Puerto Rico DM)
- a list of Puerto Rico-based loyalty households with at least one transaction in the last 12 months (PR PP DM)
More informationOracle NoSQL Database Enterprise Edition, Version 18.1
Oracle NoSQL Database Enterprise Edition, Version 18.1 Oracle NoSQL Database is a scalable, distributed NoSQL database, designed to provide highly reliable, flexible and available data management across
More informationMySQL for Developers Ed 3
Oracle University Contact Us: 0845 777 7711 MySQL for Developers Ed 3 Duration: 5 Days What you will learn This MySQL for Developers training teaches developers how to plan, design and implement applications
More informationDistributed Systems 16. Distributed File Systems II
Distributed Systems 16. Distributed File Systems II Paul Krzyzanowski pxk@cs.rutgers.edu 1 Review NFS RPC-based access AFS Long-term caching CODA Read/write replication & disconnected operation DFS AFS
More informationThings Every Oracle DBA Needs to Know about the Hadoop Ecosystem. Zohar Elkayam
Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem Zohar Elkayam www.realdbamagic.com Twitter: @realmgic Who am I? Zohar Elkayam, CTO at Brillix Programmer, DBA, team leader, database trainer,
More informationS-Store: Streaming Meets Transaction Processing
S-Store: Streaming Meets Transaction Processing H-Store is an experimental database management system (DBMS) designed for online transaction processing applications Manasa Vallamkondu Motivation Reducing
More informationIntroduction to NoSQL by William McKnight
Introduction to NoSQL by William McKnight All rights reserved. Reproduction in whole or part prohibited except by written permission. Product and company names mentioned herein may be trademarks of their
More informationOracle NoSQL Database Enterprise Edition, Version 18.1
Oracle NoSQL Database Enterprise Edition, Version 18.1 Oracle NoSQL Database is a scalable, distributed NoSQL database, designed to provide highly reliable, flexible and available data management across
More informationTHE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES
1 THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES Vincent Garonne, Mario Lassnig, Martin Barisits, Thomas Beermann, Ralph Vigne, Cedric Serfon Vincent.Garonne@cern.ch ph-adp-ddm-lab@cern.ch XLDB
More informationSQL Server Internals: The Practical Angle Sneak Peek. Dmitri Korotkevitch Moderated by Roberto Fonseca
SQL Server Internals: The Practical Angle Sneak Peek Dmitri Korotkevitch Moderated by Roberto Fonseca Technical Assistance Maximize your screen with the zoom button on the top of the presentation window
More information1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda
Agenda Oracle9i Warehouse Review Dulcian, Inc. Oracle9i Server OLAP Server Analytical SQL Mining ETL Infrastructure 9i Warehouse Builder Oracle 9i Server Overview E-Business Intelligence Platform 9i Server:
More informationMapR Enterprise Hadoop
2014 MapR Technologies 2014 MapR Technologies 1 MapR Enterprise Hadoop Top Ranked Cloud Leaders 500+ Customers 2014 MapR Technologies 2 Key MapR Advantage Partners Business Services APPLICATIONS & OS ANALYTICS
More informationOracle Data Integrator 12c: Integration and Administration
Oracle University Contact Us: Local: 1800 103 4775 Intl: +91 80 67863102 Oracle Data Integrator 12c: Integration and Administration Duration: 5 Days What you will learn Oracle Data Integrator is a comprehensive
More informationCS 445 Introduction to Database Systems
CS 445 Introduction to Database Systems TTh 2:45-4:20pm Chadd Williams Pacific University 1 Overview Practical introduction to databases theory + hands on projects Topics Relational Model Relational Algebra/Calculus/
More information