Splout SQL: When Big Data Output is also Big Data

Iván de Prado Alonso, CEO

Big Data consulting & training

Full SQL* (unlike NoSQL)
For Big Data (unlike RDBMS)
Web latency & throughput (unlike Impala, Apache Drill, etc.)
* Within each partition

How does it work?
- Isolation between generation and serving

Generation
Generate tablespace CLIENTS_INFO with 2 partitions for:
- table CLIENTS, partitioned by CID
- table SALES, partitioned by CID

Table CLIENTS
CID   Name
U20   Doug
U21   Ted
U40   John

Table SALES
SID    CID   Amount
S100   U
S101   U20   60
S223   U40   99

Tablespace CLIENTS_INFO
Partition U10 to U35
  CLIENTS (CID, Name): U20 Doug; U21 Ted
  SALES (SID, CID, Amount): S100 U; S101 U20 60
Partition U36 to U60
  CLIENTS (CID, Name): U40 John
  SALES (SID, CID, Amount): S223 U40 99
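The generation step above can be sketched locally with Python's standard `sqlite3` module, one in-memory database per partition. This is a minimal sketch, not Splout SQL's Hadoop-based generator: the range bounds come from the slide, while `UPPER_BOUNDS`, `partition_for` and `generate_tablespace` are hypothetical names, and row S100 is left out because its CID and amount are garbled.

```python
import sqlite3
from bisect import bisect_left

# Assumed partition map taken from the slide: partition 0 holds keys up to "U35",
# partition 1 holds keys above that, up to "U60". Keys below "U10" are not checked.
UPPER_BOUNDS = ["U35", "U60"]

CLIENTS = [("U20", "Doug"), ("U21", "Ted"), ("U40", "John")]
# Row S100 is left out because its CID and Amount are garbled on the slide.
SALES = [("S101", "U20", 60), ("S223", "U40", 99)]


def partition_for(key):
    """Index of the first range whose upper bound is >= key (string order)."""
    idx = bisect_left(UPPER_BOUNDS, key)
    if idx == len(UPPER_BOUNDS):
        raise KeyError(f"key {key!r} is above the last partition bound")
    return idx


def generate_tablespace():
    """Build one SQLite database per partition; both tables are split by CID."""
    partitions = [sqlite3.connect(":memory:") for _ in UPPER_BOUNDS]
    for db in partitions:
        db.execute("CREATE TABLE CLIENTS (CID TEXT PRIMARY KEY, Name TEXT)")
        db.execute("CREATE TABLE SALES (SID TEXT PRIMARY KEY, CID TEXT, Amount INTEGER)")
        # Indexes are built at generation time, before the partition is deployed.
        db.execute("CREATE INDEX sales_by_cid ON SALES (CID)")
    for cid, name in CLIENTS:
        partitions[partition_for(cid)].execute(
            "INSERT INTO CLIENTS VALUES (?, ?)", (cid, name))
    for sid, cid, amount in SALES:
        partitions[partition_for(cid)].execute(
            "INSERT INTO SALES VALUES (?, ?, ?)", (sid, cid, amount))
    for db in partitions:
        db.commit()
    return partitions
```

Because CLIENTS and SALES are split by the same function on the same column, every sale lands in the same partition as its client, which is what lets the CLIENTS/SALES join run inside one partition.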

Serving
Tablespace CLIENTS_INFO
Partition U10 to U35
  CLIENTS (CID, Name): U20 Doug; U21 Ted
  SALES (SID, CID, Amount): S100 U; S101 U20 60
Partition U36 to U60
  CLIENTS (CID, Name): U40 John
  SALES (SID, CID, Amount): S223 U40 99

For key = U20, tablespace = CLIENTS_INFO:
SELECT c.Name, SUM(s.Amount) FROM CLIENTS c, SALES s WHERE c.CID = s.CID AND c.CID = 'U20' GROUP BY c.Name;
U20 falls in the range U10 to U35, so only that partition runs the query.

For key = U40, tablespace = CLIENTS_INFO:
SELECT c.Name, SUM(s.Amount) FROM CLIENTS c, SALES s WHERE c.CID = s.CID AND c.CID = 'U40' GROUP BY c.Name;
U40 falls in the range U36 to U60, so only that partition runs the query.
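Serving can be sketched the same way: the client supplies the key next to the SQL, and a router sends the statement to the one partition whose key range contains that key. The partition contents below are copied from the slide (again without the garbled S100 row); `open_partitions` and `query` are hypothetical names, not Splout SQL's API.

```python
import sqlite3
from bisect import bisect_left

# Partition contents copied from the slide; S100 is left out (garbled).
UPPER_BOUNDS = ["U35", "U60"]  # partition 0: U10 to U35, partition 1: U36 to U60
PARTITION_ROWS = [
    {"CLIENTS": [("U20", "Doug"), ("U21", "Ted")], "SALES": [("S101", "U20", 60)]},
    {"CLIENTS": [("U40", "John")], "SALES": [("S223", "U40", 99)]},
]

SQL = ("SELECT c.Name, SUM(s.Amount) FROM CLIENTS c, SALES s "
       "WHERE c.CID = s.CID AND c.CID = ? GROUP BY c.Name")


def open_partitions():
    """Load each partition into its own in-memory SQLite database."""
    dbs = []
    for rows in PARTITION_ROWS:
        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE CLIENTS (CID TEXT PRIMARY KEY, Name TEXT)")
        db.execute("CREATE TABLE SALES (SID TEXT PRIMARY KEY, CID TEXT, Amount INTEGER)")
        db.executemany("INSERT INTO CLIENTS VALUES (?, ?)", rows["CLIENTS"])
        db.executemany("INSERT INTO SALES VALUES (?, ?, ?)", rows["SALES"])
        dbs.append(db)
    return dbs


def query(partitions, key, sql, params=()):
    """Run sql on the single partition whose key range contains key."""
    idx = bisect_left(UPPER_BOUNDS, key)
    if idx == len(UPPER_BOUNDS):
        raise KeyError(f"no partition holds key {key!r}")
    return partitions[idx].execute(sql, params).fetchall()
```

With `parts = open_partitions()`, `query(parts, "U20", SQL, ("U20",))` returns `[('Doug', 60)]` and `query(parts, "U40", SQL, ("U40",))` returns `[('John', 99)]`. Passing the key separately means the router never has to parse the SQL; the price is that each query only sees the rows of one partition.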

Why does it scale?
- Data is partitioned
- Partitions are distributed across nodes
- Adding more nodes increases capacity
- Queries are restricted to a single partition
- Generation does not impact serving
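The slides do not describe how partitions are assigned to nodes. The sketch below assumes a simple round-robin placement with a configurable replication factor (both assumptions; `place_partitions` is a hypothetical name) and shows the capacity argument: with the same number of partitions, doubling the nodes halves what each node has to hold and serve.

```python
def place_partitions(num_partitions, nodes, replication=2):
    """Assign every partition to `replication` distinct nodes, round-robin.

    Returns {node: [partition ids]}.
    """
    if not 1 <= replication <= len(nodes):
        raise ValueError("replication must be between 1 and the number of nodes")
    placement = {node: [] for node in nodes}
    for partition in range(num_partitions):
        for replica in range(replication):
            # Consecutive nodes hold the replicas, so no node gets the same partition twice.
            placement[nodes[(partition + replica) % len(nodes)]].append(partition)
    return placement
```

With 8 partitions and a replication factor of 2, each of 2 nodes holds all 8 partitions, while each of 4 nodes holds only 4 of them.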

OK, so what is Splout SQL useful for?

Big Data Analytics
- Manageable output

Big Data Analytics
- Sometimes Big Data output is also Big Data

Splout SQL lets you serve Big Data results

Let's see an example

Building a Google Analytics
Imagine that one crazy day you decide to build some kind of Google Analytics:
- Zillions of events
- Millions of domains
- An individual panel per domain

Requirements
- Time-based charts (day/hour aggregations)
- Flexible dimension breakdown
  - Per page, per browser
  - Per country, per language
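For this analytics example, one plausible layout (an assumption, not stated on the slides) is a tablespace partitioned by domain, so each dashboard's queries hit a single partition. The sketch uses a hypothetical `EVENTS` table inside one SQLite partition and covers both requirements: hour or day buckets for time-based charts, and a breakdown by any whitelisted dimension. All table, column and function names are illustrative.

```python
import sqlite3

# Hypothetical schema: one row per page view.
# ts is an ISO-8601 text timestamp such as "2013-06-01 10:05:00".
SCHEMA = """
CREATE TABLE EVENTS (domain TEXT, ts TEXT, page TEXT,
                     browser TEXT, country TEXT, language TEXT);
CREATE INDEX events_by_domain_ts ON EVENTS (domain, ts);
"""
DIMENSIONS = {"page", "browser", "country", "language"}
BUCKETS = {"hour": "%Y-%m-%d %H:00", "day": "%Y-%m-%d"}


def build_partition(rows):
    """Create one partition; rows are inserted sorted by (domain, ts)."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO EVENTS VALUES (?, ?, ?, ?, ?, ?)", sorted(rows))
    db.commit()
    return db


def time_series(db, domain, granularity="hour"):
    """Visits per hour or per day for one domain, for time-based charts."""
    return db.execute(
        "SELECT strftime(?, ts) AS bucket, COUNT(*) FROM EVENTS "
        "WHERE domain = ? GROUP BY bucket ORDER BY bucket",
        (BUCKETS[granularity], domain),
    ).fetchall()


def breakdown(db, domain, dimension):
    """Visits per value of one dimension (page, browser, country or language)."""
    if dimension not in DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    # Column names cannot be bound as ? parameters, hence the whitelist above.
    return db.execute(
        f"SELECT {dimension}, COUNT(*) AS visits FROM EVENTS WHERE domain = ? "
        f"GROUP BY {dimension} ORDER BY visits DESC, {dimension}",
        (domain,),
    ).fetchall()
```

With made-up events for `example.com` at 10:05, 10:40 and 11:10 on the same day, `time_series` returns two hourly buckets (2 and 1 visits) and `time_series(..., "day")` a single bucket of 3. Sorting the rows before insertion follows the slides' advice on data collocation: one domain's events end up stored close together.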

With Splout SQL

Splout SQL provides consolidated SQL views for Hadoop data

Let's see more details about Splout SQL

Splout SQL Architecture

Each partition is:
- Backed by SQLite or MySQL
- Generated on Hadoop
  - Including any indexes needed
  - Data can be sorted before insertion to minimize disk seeks at query time
  - Pre-sampling for balancing partition size
- Distributed on the Splout SQL cluster
  - With replication for failover
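Pre-sampling for balanced partitions can be illustrated like this: sort a random sample of the partitioning keys and take evenly spaced sample values as range bounds, so each partition receives roughly the same number of rows even when the keys are skewed. This is a generic sampling sketch under that assumption, not Splout SQL's actual sampler; `sample_boundaries` and `partition_sizes` are hypothetical names.

```python
import random
from bisect import bisect_left


def sample_boundaries(keys, num_partitions, sample_size=1000, seed=0):
    """Choose num_partitions - 1 range upper bounds from a random sample of keys.

    Partition i holds keys <= bounds[i]; the last partition holds keys above bounds[-1].
    """
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    step = len(sample) / num_partitions
    # The i-th bound is (approximately) the sample's i/num_partitions quantile.
    return [sample[max(int(step * i) - 1, 0)] for i in range(1, num_partitions)]


def partition_sizes(keys, bounds):
    """Count how many keys would land in each range partition."""
    sizes = [0] * (len(bounds) + 1)
    for key in keys:
        sizes[bisect_left(bounds, key)] += 1
    return sizes
```

For skewed keys such as `[i * i for i in range(10000)]`, splitting the value range into four equal-width intervals puts 5,000 keys (half of them) into the first partition, whereas `sample_boundaries(keys, 4)` yields four partitions of roughly 2,500 keys each.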

Atomicity
- A tablespace is a set of tables that share the same partitioning schema
- Tablespaces are versioned
- Several tablespaces can be deployed at once
- Only one version is served at a time
- All-or-nothing semantics (atomicity)
- Rollback support
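These atomicity rules can be modeled as a registry that keeps every deployed version of each tablespace, exposes exactly one served version per tablespace, switches several tablespaces in a single step, and remembers the previous state for rollback. This toy in-process model is an assumption about the semantics listed on the slide, not Splout SQL's implementation; `TablespaceRegistry` and the `MERCHANTS_INFO` tablespace name are hypothetical.

```python
import threading


class TablespaceRegistry:
    """Toy model of versioned tablespaces with all-or-nothing deploys and rollback."""

    def __init__(self):
        self._lock = threading.Lock()
        self._versions = {}   # tablespace -> {version: partition list}
        self._serving = {}    # tablespace -> version currently visible to queries
        self._history = []    # earlier `_serving` maps, newest last

    def deploy(self, new_versions):
        """Switch several tablespaces at once; new_versions = {name: (version, partitions)}."""
        # Validate everything before touching shared state: all-or-nothing.
        for name, (version, partitions) in new_versions.items():
            if not partitions:
                raise ValueError(f"{name} v{version} has no partitions; nothing deployed")
        with self._lock:
            for name, (version, partitions) in new_versions.items():
                self._versions.setdefault(name, {})[version] = partitions
            next_serving = dict(self._serving)
            next_serving.update({name: version for name, (version, _) in new_versions.items()})
            self._history.append(self._serving)
            # Single reference swap: readers see either the old map or the new one.
            self._serving = next_serving

    def rollback(self):
        """Restore the previously served set of versions."""
        with self._lock:
            if not self._history:
                raise RuntimeError("no earlier deployment to roll back to")
            self._serving = self._history.pop()

    def serving_version(self, name):
        return self._serving[name]
```

A failed multi-tablespace deploy leaves every tablespace on its old version, and `rollback()` moves all of them back together, mirroring the all-or-nothing and rollback bullets above.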

Characteristics
- Ensured millisecond latencies
  - Even when queries hit disk
  - Controlled by the developer, who selects the proper:
    - Cluster topology
    - Partitioning
    - Indexes
    - Data collocation (insertion order)

Characteristics (II)
- 100% SQL
  - But restricted to a single partition
- Real-time aggregations
- Joins
- Scalability
  - In data capacity
  - In performance

Characteristics (III)
- Atomicity
  - New data replaces old data all at once
- High availability
  - Through the use of replication
- Open Source

Characteristics (IV)
- Easy to manage
  - The cluster can be resized without any downtime
- Read-only
  - Data is updated in batches
  - Updates come from new tablespace deployments

Characteristics (V)
- Native connectors
  - Hive
  - Pig
  - Cascading

API - Generation
- Command line
  - Loading CSV files
    $ hadoop jar splout-*-hadoop.jar generate
- Java API
- HCatalog
  - Hive
  - Pig

API - Service
- REST API
  - JSON response

API - Console

Joins
- Between co-partitioned tables
  - e.g. Clients and Sales by CID
- With omnipresent tables
  - Full data present in every partition
  - Useful for dimension tables in star schemas, e.g. a countries table
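An omnipresent table is simply written in full into every partition, so a join against it never needs rows from another partition. The sketch reuses the CLIENTS/SALES example but adds a hypothetical `Country` column to CLIENTS and a hypothetical `COUNTRIES` dimension table; those columns, values and function names are assumptions for illustration.

```python
import sqlite3
from bisect import bisect_left

UPPER_BOUNDS = ["U35", "U60"]
# Hypothetical omnipresent dimension table, copied in full into every partition.
COUNTRIES = [("ES", "Spain"), ("US", "United States")]
# Co-partitioned tables; the Country column on CLIENTS is a hypothetical addition.
PARTITION_ROWS = [
    {"CLIENTS": [("U20", "Doug", "US"), ("U21", "Ted", "ES")], "SALES": [("S101", "U20", 60)]},
    {"CLIENTS": [("U40", "John", "ES")], "SALES": [("S223", "U40", 99)]},
]
SALES_BY_COUNTRY = (
    "SELECT co.Name, SUM(s.Amount) FROM SALES s "
    "JOIN CLIENTS c ON s.CID = c.CID "
    "JOIN COUNTRIES co ON c.Country = co.Code "
    "WHERE c.CID = ? GROUP BY co.Name")


def build_partitions():
    dbs = []
    for rows in PARTITION_ROWS:
        db = sqlite3.connect(":memory:")
        db.executescript(
            "CREATE TABLE COUNTRIES (Code TEXT PRIMARY KEY, Name TEXT);"
            "CREATE TABLE CLIENTS (CID TEXT PRIMARY KEY, Name TEXT, Country TEXT);"
            "CREATE TABLE SALES (SID TEXT PRIMARY KEY, CID TEXT, Amount INTEGER);")
        db.executemany("INSERT INTO COUNTRIES VALUES (?, ?)", COUNTRIES)  # every partition
        db.executemany("INSERT INTO CLIENTS VALUES (?, ?, ?)", rows["CLIENTS"])
        db.executemany("INSERT INTO SALES VALUES (?, ?, ?)", rows["SALES"])
        dbs.append(db)
    return dbs


def sales_by_country(dbs, cid):
    """Route by CID, then join locally against the omnipresent COUNTRIES table."""
    idx = bisect_left(UPPER_BOUNDS, cid)
    if idx == len(UPPER_BOUNDS):
        raise KeyError(f"no partition holds key {cid!r}")
    return dbs[idx].execute(SALES_BY_COUNTRY, (cid,)).fetchall()
```

The trade-off is storage: the dimension table is duplicated once per partition, which is why the slide recommends it for small dimension tables such as countries.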

What if I need different partitioning?
- Example: queries by Merchant cannot be answered by a tablespace partitioned by Client
- Just create more tablespaces
  - The first partitioned by Client
  - The second partitioned by Merchant
  - Deploy both atomically
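Serving queries by two different keys amounts to storing the same rows twice, once per partitioning column. The sketch adds a hypothetical merchant column `MID` and a made-up third sale, and swaps in CRC32 hash partitioning for brevity instead of the key ranges used in the earlier example; atomic deployment of the two tablespaces is not modeled here. All names are illustrative.

```python
from zlib import crc32

# Hypothetical SALES rows: (SID, CID, MID, Amount). MID is not on the slides.
SALES = [
    ("S101", "U20", "M1", 60),
    ("S223", "U40", "M1", 99),
    ("S300", "U21", "M2", 15),
]
NUM_PARTITIONS = 4


def partition_of(key):
    """Stable hash partitioning (used here instead of range partitioning)."""
    return crc32(key.encode("utf-8")) % NUM_PARTITIONS


def build_tablespace(rows, key_index):
    """Split rows into partitions by the column at key_index."""
    partitions = [[] for _ in range(NUM_PARTITIONS)]
    for row in rows:
        partitions[partition_of(row[key_index])].append(row)
    return partitions


# Two tablespaces over the same data, each partitioned by a different column.
BY_CLIENT = build_tablespace(SALES, key_index=1)
BY_MERCHANT = build_tablespace(SALES, key_index=2)


def total_for_merchant(mid):
    """Only the merchant-partitioned tablespace answers this from one partition."""
    return sum(row[3] for row in BY_MERCHANT[partition_of(mid)] if row[2] == mid)
```

In `BY_CLIENT`, merchant M1's two sales are placed by their clients' keys (U20 and U40), so in general they are spread over several partitions; in `BY_MERCHANT` they are always together, and `total_for_merchant("M1")` reads a single partition.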

Benchmark
- 350 GB of Wikipedia logs
- Aggregation queries touching 15 rows on average
- 2-machine cluster
- 900 queries/second, 80 ms/query, 80 threads

Benchmark (II)
- 4-machine cluster
- 3150 queries/second, 40 ms/query, 160 threads
- More info: http://sploutsql.com/performance.html
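Little's law gives a quick consistency check on the three benchmark figures. Here L is the average number of queries in flight, λ the throughput and W the average latency; the check assumes each client thread has at most one query outstanding:

```latex
\begin{aligned}
L &= \lambda W \\
L_{\text{2 machines}} &= 900\,\text{queries/s} \times 0.080\,\text{s} = 72 \\
L_{\text{4 machines}} &= 3150\,\text{queries/s} \times 0.040\,\text{s} = 126
\end{aligned}
```

That is roughly 72 of the 80 threads and 126 of the 160 threads waiting on a query at any moment; the remaining thread time is presumably client-side overhead, which the slides do not break down.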

- Web latency
- SQL
- Consolidated views
- For Hadoop
A good candidate for the serving layer of a lambda architecture

Future work
- Growing the community
  - Do you want to collaborate?
- More engines
  - SQLite, MySQL and Redis already done
  - Columnar formats
- Rack awareness
- Multi-tenancy
- Test at scale
  - Test Splout on bigger clusters

Iván de Prado Alonso, CEO
http://sploutsql.com
Questions?


More information

A scalability comparison study of data management approaches for smart metering systems

A scalability comparison study of data management approaches for smart metering systems A scalability comparison study of data management approaches for smart metering systems Houssem Chihoub, Chris.ne Collet Grenoble INP houssem.chihoub@imag.fr Journées Plateformes Clermont Ferrand 6-7 octobre

More information

How Apache Hadoop Complements Existing BI Systems. Dr. Amr Awadallah Founder, CTO Cloudera,

How Apache Hadoop Complements Existing BI Systems. Dr. Amr Awadallah Founder, CTO Cloudera, How Apache Hadoop Complements Existing BI Systems Dr. Amr Awadallah Founder, CTO Cloudera, Inc. Twitter: @awadallah, @cloudera 2 The Problems with Current Data Systems BI Reports + Interactive Apps RDBMS

More information

Introduction to Hadoop. High Availability Scaling Advantages and Challenges. Introduction to Big Data

Introduction to Hadoop. High Availability Scaling Advantages and Challenges. Introduction to Big Data Introduction to Hadoop High Availability Scaling Advantages and Challenges Introduction to Big Data What is Big data Big Data opportunities Big Data Challenges Characteristics of Big data Introduction

More information

Delving Deep into Hadoop Course Contents Introduction to Hadoop and Architecture

Delving Deep into Hadoop Course Contents Introduction to Hadoop and Architecture Delving Deep into Hadoop Course Contents Introduction to Hadoop and Architecture Hadoop 1.0 Architecture Introduction to Hadoop & Big Data Hadoop Evolution Hadoop Architecture Networking Concepts Use cases

More information

MySQL & NoSQL: The Best of Both Worlds

MySQL & NoSQL: The Best of Both Worlds MySQL & NoSQL: The Best of Both Worlds Mario Beck Principal Sales Consultant MySQL mario.beck@oracle.com 1 Copyright 2012, Oracle and/or its affiliates. All rights Safe Harbour Statement The following

More information

State of the Dolphin Developing new Apps in MySQL 8

State of the Dolphin Developing new Apps in MySQL 8 State of the Dolphin Developing new Apps in MySQL 8 Highlights of MySQL 8.0 technology updates Mark Swarbrick MySQL Principle Presales Consultant Jill Anolik MySQL Global Business Unit Israel Copyright

More information

A Fast and High Throughput SQL Query System for Big Data

A Fast and High Throughput SQL Query System for Big Data A Fast and High Throughput SQL Query System for Big Data Feng Zhu, Jie Liu, and Lijie Xu Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing, China 100190

More information

Big Data Analytics using Apache Hadoop and Spark with Scala

Big Data Analytics using Apache Hadoop and Spark with Scala Big Data Analytics using Apache Hadoop and Spark with Scala Training Highlights : 80% of the training is with Practical Demo (On Custom Cloudera and Ubuntu Machines) 20% Theory Portion will be important

More information

Question: 1 You need to place the results of a PigLatin script into an HDFS output directory. What is the correct syntax in Apache Pig?

Question: 1 You need to place the results of a PigLatin script into an HDFS output directory. What is the correct syntax in Apache Pig? Volume: 72 Questions Question: 1 You need to place the results of a PigLatin script into an HDFS output directory. What is the correct syntax in Apache Pig? A. update hdfs set D as./output ; B. store D

More information

Top 10 SQL- on- Hadoop Pi1alls Monte Zweben

Top 10 SQL- on- Hadoop Pi1alls Monte Zweben Top 10 SQL- on- Hadoop Pi1alls Monte Zweben CEO, Splice Machine SQL- on- Hadoop Landscape A crowded, confusing landscape, full of poten4al and pi5alls Pi1all #1: Individual Lookups and Range Queries Issues!

More information

Big Data Architect.

Big Data Architect. Big Data Architect www.austech.edu.au WHAT IS BIG DATA ARCHITECT? A big data architecture is designed to handle the ingestion, processing, and analysis of data that is too large or complex for traditional

More information

Actual4Test. Actual4test - actual test exam dumps-pass for IT exams

Actual4Test.   Actual4test - actual test exam dumps-pass for IT exams Actual4Test http://www.actual4test.com Actual4test - actual test exam dumps-pass for IT exams Exam : 1z1-449 Title : Oracle Big Data 2017 Implementation Essentials Vendor : Oracle Version : DEMO Get Latest

More information

Databricks, an Introduction

Databricks, an Introduction Databricks, an Introduction Chuck Connell, Insight Digital Innovation Insight Presentation Speaker Bio Senior Data Architect at Insight Digital Innovation Focus on Azure big data services HDInsight/Hadoop,

More information

Hadoop is supplemented by an ecosystem of open source projects IBM Corporation. How to Analyze Large Data Sets in Hadoop

Hadoop is supplemented by an ecosystem of open source projects IBM Corporation. How to Analyze Large Data Sets in Hadoop Hadoop Open Source Projects Hadoop is supplemented by an ecosystem of open source projects Oozie 25 How to Analyze Large Data Sets in Hadoop Although the Hadoop framework is implemented in Java, MapReduce

More information

Oracle NoSQL Database Enterprise Edition, Version 18.1

Oracle NoSQL Database Enterprise Edition, Version 18.1 Oracle NoSQL Database Enterprise Edition, Version 18.1 Oracle NoSQL Database is a scalable, distributed NoSQL database, designed to provide highly reliable, flexible and available data management across

More information

Big Data Infrastructures & Technologies

Big Data Infrastructures & Technologies Big Data Infrastructures & Technologies Spark and MLLIB OVERVIEW OF SPARK What is Spark? Fast and expressive cluster computing system interoperable with Apache Hadoop Improves efficiency through: In-memory

More information

Stages of Data Processing

Stages of Data Processing Data processing can be understood as the conversion of raw data into a meaningful and desired form. Basically, producing information that can be understood by the end user. So then, the question arises,

More information

Talend Big Data Sandbox. Big Data Insights Cookbook

Talend Big Data Sandbox. Big Data Insights Cookbook Overview Pre-requisites Setup & Configuration Hadoop Distribution Download Demo (Scenario) Overview Pre-requisites Setup & Configuration Hadoop Distribution Demo (Scenario) About this cookbook What is

More information

Cloud Analytics and Business Intelligence on AWS

Cloud Analytics and Business Intelligence on AWS Cloud Analytics and Business Intelligence on AWS Enterprise Applications Virtual Desktops Sharing & Collaboration Platform Services Analytics Hadoop Real-time Streaming Data Machine Learning Data Warehouse

More information

Evolving To The Big Data Warehouse

Evolving To The Big Data Warehouse Evolving To The Big Data Warehouse Kevin Lancaster 1 Copyright Director, 2012, Oracle and/or its Engineered affiliates. All rights Insert Systems, Information Protection Policy Oracle Classification from

More information

Top 7 Data API Headaches (and How to Handle Them) Jeff Reser Data Connectivity & Integration Progress Software

Top 7 Data API Headaches (and How to Handle Them) Jeff Reser Data Connectivity & Integration Progress Software Top 7 Data API Headaches (and How to Handle Them) Jeff Reser Data Connectivity & Integration Progress Software jreser@progress.com Agenda Data Variety (Cloud and Enterprise) ABL ODBC Bridge Using Progress

More information

NoSQL Databases MongoDB vs Cassandra. Kenny Huynh, Andre Chik, Kevin Vu

NoSQL Databases MongoDB vs Cassandra. Kenny Huynh, Andre Chik, Kevin Vu NoSQL Databases MongoDB vs Cassandra Kenny Huynh, Andre Chik, Kevin Vu Introduction - Relational database model - Concept developed in 1970 - Inefficient - NoSQL - Concept introduced in 1980 - Related

More information

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development:: Title Duration : Apache Spark Development : 4 days Overview Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized

More information

1Z Oracle Big Data 2017 Implementation Essentials Exam Summary Syllabus Questions

1Z Oracle Big Data 2017 Implementation Essentials Exam Summary Syllabus Questions 1Z0-449 Oracle Big Data 2017 Implementation Essentials Exam Summary Syllabus Questions Table of Contents Introduction to 1Z0-449 Exam on Oracle Big Data 2017 Implementation Essentials... 2 Oracle 1Z0-449

More information

Introduc)on to Apache Ka1a. Jun Rao Co- founder of Confluent

Introduc)on to Apache Ka1a. Jun Rao Co- founder of Confluent Introduc)on to Apache Ka1a Jun Rao Co- founder of Confluent Agenda Why people use Ka1a Technical overview of Ka1a What s coming What s Apache Ka1a Distributed, high throughput pub/sub system Ka1a Usage

More information

Ghislain Fourny. Big Data 5. Wide column stores

Ghislain Fourny. Big Data 5. Wide column stores Ghislain Fourny Big Data 5. Wide column stores Data Technology Stack User interfaces Querying Data stores Indexing Processing Validation Data models Syntax Encoding Storage 2 Where we are User interfaces

More information

Data warehousing on Hadoop. Marek Grzenkowicz Roche Polska

Data warehousing on Hadoop. Marek Grzenkowicz Roche Polska Data warehousing on Hadoop Marek Grzenkowicz Roche Polska Agenda Introduction Case study: StraDa project Source data Data model Data flow and processing Reporting Lessons learnt Ideas for the future Q&A

More information

Asanka Padmakumara. ETL 2.0: Data Engineering with Azure Databricks

Asanka Padmakumara. ETL 2.0: Data Engineering with Azure Databricks Asanka Padmakumara ETL 2.0: Data Engineering with Azure Databricks Who am I? Asanka Padmakumara Business Intelligence Consultant, More than 8 years in BI and Data Warehousing A regular speaker in data

More information

Scaling DreamFactory

Scaling DreamFactory Scaling DreamFactory This white paper is designed to provide information to enterprise customers about how to scale a DreamFactory Instance. The sections below talk about horizontal, vertical, and cloud

More information

Approaching the Petabyte Analytic Database: What I learned

Approaching the Petabyte Analytic Database: What I learned Disclaimer This document is for informational purposes only and is subject to change at any time without notice. The information in this document is proprietary to Actian and no part of this document may

More information

Innovatus Technologies

Innovatus Technologies HADOOP 2.X BIGDATA ANALYTICS 1. Java Overview of Java Classes and Objects Garbage Collection and Modifiers Inheritance, Aggregation, Polymorphism Command line argument Abstract class and Interfaces String

More information

Start Working with Parquet!!!!

Start Working with Parquet!!!! My Goal Tonight. Start Working with Parquet!!!! Parquet Query Performance Origin of Parquet Parquet Storage Query Request Usage with Hadoop Tools Customer Examples Topics Parquet Defined Storage & Encoding

More information

DISTRIBUTED DATABASE OPTIMIZATIONS WITH NoSQL MEMBERS

DISTRIBUTED DATABASE OPTIMIZATIONS WITH NoSQL MEMBERS U.P.B. Sci. Bull., Series C, Vol. 77, Iss. 2, 2015 ISSN 2286-3540 DISTRIBUTED DATABASE OPTIMIZATIONS WITH NoSQL MEMBERS George Dan POPA 1 Distributed database complexity, as well as wide usability area,

More information

Data Informatics. Seon Ho Kim, Ph.D.

Data Informatics. Seon Ho Kim, Ph.D. Data Informatics Seon Ho Kim, Ph.D. seonkim@usc.edu HBase HBase is.. A distributed data store that can scale horizontally to 1,000s of commodity servers and petabytes of indexed storage. Designed to operate

More information

OLAP Introduction and Overview

OLAP Introduction and Overview 1 CHAPTER 1 OLAP Introduction and Overview What Is OLAP? 1 Data Storage and Access 1 Benefits of OLAP 2 What Is a Cube? 2 Understanding the Cube Structure 3 What Is SAS OLAP Server? 3 About Cube Metadata

More information

HBase... And Lewis Carroll! Twi:er,

HBase... And Lewis Carroll! Twi:er, HBase... And Lewis Carroll! jw4ean@cloudera.com Twi:er, LinkedIn: @jw4ean 1 Introduc@on 2010: Cloudera Solu@ons Architect 2011: Cloudera TAM/DSE 2012-2013: Cloudera Training focusing on Partners and Newbies

More information

QLIK INTEGRATION WITH AMAZON REDSHIFT

QLIK INTEGRATION WITH AMAZON REDSHIFT QLIK INTEGRATION WITH AMAZON REDSHIFT Qlik Partner Engineering Created August 2016, last updated March 2017 Contents Introduction... 2 About Amazon Web Services (AWS)... 2 About Amazon Redshift... 2 Qlik

More information

Cloud Computing 3. CSCI 4850/5850 High-Performance Computing Spring 2018

Cloud Computing 3. CSCI 4850/5850 High-Performance Computing Spring 2018 Cloud Computing 3 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning

More information

The Hadoop Ecosystem. EECS 4415 Big Data Systems. Tilemachos Pechlivanoglou

The Hadoop Ecosystem. EECS 4415 Big Data Systems. Tilemachos Pechlivanoglou The Hadoop Ecosystem EECS 4415 Big Data Systems Tilemachos Pechlivanoglou tipech@eecs.yorku.ca A lot of tools designed to work with Hadoop 2 HDFS, MapReduce Hadoop Distributed File System Core Hadoop component

More information

Maximizing Fraud Prevention Through Disruptive Architectures Delivering speed at scale.

Maximizing Fraud Prevention Through Disruptive Architectures Delivering speed at scale. Maximizing Fraud Prevention Through Disruptive Architectures Delivering speed at scale. January 2016 Credit Card Fraud prevention is among the most time-sensitive and high-value of IT tasks. The databases

More information

Processing big data with modern applications: Hadoop as DWH backend at Pro7. Dr. Kathrin Spreyer Big data engineer

Processing big data with modern applications: Hadoop as DWH backend at Pro7. Dr. Kathrin Spreyer Big data engineer Processing big data with modern applications: Hadoop as DWH backend at Pro7 Dr. Kathrin Spreyer Big data engineer GridKa School Karlsruhe, 02.09.2014 Outline 1. Relational DWH 2. Data integration with

More information

Intro Cassandra. Adelaide Big Data Meetup.

Intro Cassandra. Adelaide Big Data Meetup. Intro Cassandra Adelaide Big Data Meetup instaclustr.com @Instaclustr Who am I and what do I do? Alex Lourie Worked at Red Hat, Datastax and now Instaclustr We currently manage x10s nodes for various customers,

More information

Study of NoSQL Database Along With Security Comparison

Study of NoSQL Database Along With Security Comparison Study of NoSQL Database Along With Security Comparison Ankita A. Mall [1], Jwalant B. Baria [2] [1] Student, Computer Engineering Department, Government Engineering College, Modasa, Gujarat, India ank.fetr@gmail.com

More information

MariaDB MaxScale 2.0 and ColumnStore 1.0 for the Boston MySQL Meetup Group Jon Day, Solution Architect - MariaDB

MariaDB MaxScale 2.0 and ColumnStore 1.0 for the Boston MySQL Meetup Group Jon Day, Solution Architect - MariaDB MariaDB MaxScale 2.0 and ColumnStore 1.0 for the Boston MySQL Meetup Group Jon Day, Solution Architect - MariaDB 2016 MariaDB Corporation Ab 1 Tonight s Topics: MariaDB MaxScale 2.0 Currently in Beta MariaDB

More information

Aerospike Scales with Google Cloud Platform

Aerospike Scales with Google Cloud Platform Aerospike Scales with Google Cloud Platform PERFORMANCE TEST SHOW AEROSPIKE SCALES ON GOOGLE CLOUD Aerospike is an In-Memory NoSQL database and a fast Key Value Store commonly used for caching and by real-time

More information

Apache Kudu. Zbigniew Baranowski

Apache Kudu. Zbigniew Baranowski Apache Kudu Zbigniew Baranowski Intro What is KUDU? New storage engine for structured data (tables) does not use HDFS! Columnar store Mutable (insert, update, delete) Written in C++ Apache-licensed open

More information

Scalable Web Programming. CS193S - Jan Jannink - 2/25/10

Scalable Web Programming. CS193S - Jan Jannink - 2/25/10 Scalable Web Programming CS193S - Jan Jannink - 2/25/10 Weekly Syllabus 1.Scalability: (Jan.) 2.Agile Practices 3.Ecology/Mashups 4.Browser/Client 7.Analytics 8.Cloud/Map-Reduce 9.Published APIs: (Mar.)*

More information

CSE 190D Spring 2017 Final Exam Answers

CSE 190D Spring 2017 Final Exam Answers CSE 190D Spring 2017 Final Exam Answers Q 1. [20pts] For the following questions, clearly circle True or False. 1. The hash join algorithm always has fewer page I/Os compared to the block nested loop join

More information

Oracle Big Data. A NA LYT ICS A ND MA NAG E MENT.

Oracle Big Data. A NA LYT ICS A ND MA NAG E MENT. Oracle Big Data. A NALYTICS A ND MANAG E MENT. Oracle Big Data: Redundância. Compatível com ecossistema Hadoop, HIVE, HBASE, SPARK. Integração com Cloudera Manager. Possibilidade de Utilização da Linguagem

More information