Orchestration of Data Lakes BigData Analytics and Integration. Sarma Sishta Brice Lambelet

1 Orchestration of Data Lakes BigData Analytics and Integration Sarma Sishta Brice Lambelet

2 Introduction

3 The Five Megatrends Driving Our Digitized World And Their Implications for Distributed Big Data Management
Hyper Connectivity: Everybody has access
Super Computing: Super computers power everywhere
Cloud Computing: The cloud is where we compute
Smart World: Your fridge knows what you want for dinner
Cyber-Security: High-powered security is now the norm
2016 SAP SE or an SAP affiliate company. All rights reserved. Customer

4 Common Types of Big Data: New Types of Data

5 How Big is the Need for Digital Enterprise?

6 Transforming Businesses of Any Kind into Digital Enterprise
Digital Core
Employees: Workforce Engagement
Customer: Omni-Channel Customer Experience
Assets & Internet of Things
Partners: Supplier Collaboration
IT Budget: NEW, TRADITIONAL

7 The Challenges with Distributed Big Data in Our Digital Economy
With all this new data connecting us, we should be sailing smoothly. Unfortunately, we're drowning in our own data.
Inefficient Data Processing: Real-time drill-down interaction is impossible when data is distributed across thousands of nodes and processed in batches.
Lack of Business Alignment: Business decisions need to be aligned to changing external market conditions by processing data in business systems together with Hadoop data lakes.
Costly Management of Big Data: Extensive amounts of data clog business systems, although much of that data can be archived more efficiently to less expensive systems.

8 Businesses turn to Hadoop & other Big Data platforms for:
Low Cost & Large Scale
Schema Flexibility
All Data
IT Budget: NEW, TRADITIONAL

9 Background and Motivation: Existing Solutions

10 BigData Architecture

11 Introduction: Technical Architecture for Big Data
In recent years, more and more business scenarios (e.g. web applications, utilities) collect or generate large amounts of data, so scalable and reliable storage is needed. The data volume can reach several terabytes per day and several hundred petabytes in total. The Hadoop framework offers scalable and reliable storage. It is open source and has a large community that supports the various projects. SAP is embracing Hadoop as part of the HANA platform.
Data Lakes
Data Volume: 1 TB per day of data generated by 1 productive well; 11,475 production wells in Canada (*example from the Oil & Gas industry)
Data Variety and Data Velocity: Social Media, Sensor Data, GPS, Meter Data, Real-time Event Streaming, IT Devices, Clickstream

12 Introduction
Hadoop is based on work done by Google in the early 2000s, specifically on papers describing the Google File System (GFS), published in 2003, and MapReduce, published in 2004.
Hadoop is open source (Apache license), with hundreds of contributors writing features and fixing bugs. Many related projects, applications, and tools enhance the Hadoop ecosystem.
Data distribution and parallel processing framework: The Hadoop framework is based on a shared-nothing architecture. Computation happens where the data is stored, wherever possible. The goal is for nodes to talk to each other as little as possible.
Reliability and fault tolerance: Data is replicated multiple times across the system for increased availability and reliability. If a node fails, the master detects the failure and reassigns the work to a different node in the system.
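The replication and fail-over behavior described on this slide can be illustrated with a small Python sketch. It is not Hadoop code: the round-robin placement, the node names, and the functions `place_replicas` and `assign_tasks` are assumptions made for illustration (real HDFS placement is rack-aware and more involved).

```python
def place_replicas(blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (simplified round-robin)."""
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = [nodes[(i + k) % len(nodes)] for k in range(replication)]
    return placement


def assign_tasks(placement, live_nodes):
    """For each block, run the task on the first live node that holds a replica."""
    assignment = {}
    for block, holders in placement.items():
        candidates = [node for node in holders if node in live_nodes]
        if not candidates:
            raise RuntimeError(f"all replicas of {block} are lost")
        assignment[block] = candidates[0]
    return assignment


# Hypothetical four-node cluster with four data blocks.
nodes = ["node1", "node2", "node3", "node4"]
placement = place_replicas(["b0", "b1", "b2", "b3"], nodes)

before = assign_tasks(placement, set(nodes))
# Simulate the master noticing that node2 has failed and reassigning its work.
after = assign_tasks(placement, set(nodes) - {"node2"})
```

Because every block has replicas on other nodes, the task for block `b1` moves from `node2` to another replica holder without any data being copied first, which is the "computation happens where the data is stored" idea in miniature.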

13 Introduction: Find Balance between Performance and Storage Costs
HANA Data Management Platform
Data sources: Data Replication; Streaming Data; Documents, Logs, ...
SAP HANA Processing Engines (e.g. Spatial, R, Graph)
Hadoop Processing Engines (e.g. Spark, Hive, ...); HDFS
HANA native BigData: Smart Data Streaming, NoSQL, Graph, Time Series, Dynamic Tiering
HANA & Hadoop: SDA, MapReduce, Admin & Monitoring
Hadoop Extension: SAP HANA Vora

14 Introduction: Different Scenarios for HANA Data Management Platform
Scenario groups: Part of SAP ILM; Data Science
Store historic data in Hadoop, Archive
Integration with Information Lifecycle Management (ILM)
Store volume data (e.g. streaming data, logs) in Hadoop
ETL (Extract, Transform, Load) in Hadoop
Diagram labels: S/4HANA, BWoH; Dynamic Tiering; Data Aging; SAP HANA Processing Engines (e.g. Spatial, R, Graph); Hadoop Processing Engines; Hive; HDFS; Streaming Data; Documents, Logs, ...

15 Introduction Excursus: Lambda Architecture
New Data
Batch Layer: append to Master Data Set; Batch Processing; overwrite Precomputed Batch Views
Speed Layer: Realtime Stream Processing; increment Realtime View
Serving Layer: Query and Merge
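The append / overwrite / increment / merge flow of the Lambda Architecture can be sketched in a few lines of Python. This is a toy, single-process illustration under simplifying assumptions: the class name `LambdaCounter` and its methods are hypothetical, the "views" are plain event counts, and the batch run is treated as instantaneous (a real system would only discard the realtime increments already covered by the finished batch).

```python
from collections import Counter


class LambdaCounter:
    """Toy Lambda Architecture that counts events per key."""

    def __init__(self):
        self.master = []                 # batch layer: append-only master data set
        self.batch_view = Counter()      # serving layer: precomputed batch view
        self.realtime_view = Counter()   # speed layer: incrementally updated view

    def ingest(self, key):
        """New data is appended to the master data set and increments the realtime view."""
        self.master.append(key)
        self.realtime_view[key] += 1

    def run_batch(self):
        """Recompute the batch view from all master data and overwrite the old one."""
        self.batch_view = Counter(self.master)
        self.realtime_view = Counter()   # these events are now covered by the batch view

    def query(self, key):
        """Answer a query by merging the batch view with the realtime view."""
        return self.batch_view[key] + self.realtime_view[key]


counter = LambdaCounter()
for event in ["a", "a", "b"]:
    counter.ingest(event)
count_before_batch = counter.query("a")   # served from the realtime view only

counter.run_batch()
counter.ingest("a")
count_after_batch = counter.query("a")    # batch view plus one realtime increment
```

The point of the design is that the batch view can always be rebuilt from the immutable master data set, while the speed layer only has to cover the events that arrived since the last batch run.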

16 Introduction Excursus: Lambda Architecture with SAP
New Data
Speed Layer: Realtime Stream Processing with SAP HANA Smart Data Streaming; increment Realtime View in the SAP HANA In-Memory Store
Batch Layer: append to Master Data Set and Batch Processing in Hadoop; overwrite Precomputed Batch Views in SAP HANA Vora
Serving Layer: Query and Merge with SAP HANA Processing Engines

17 HANA Data Platform Components Overview (1)
Databases and Extensions (Component: Explanation)
SAP HANA: In-memory database and central component of the data platform
SAP IQ: A fast disk-based column-store database scaling up to petabytes of data
SAP HANA Dynamic Tiering: Access to other data tiers within one SQL, using an Extended Store concept
SAP HANA Vora: In-memory relational mapper for Hadoop and Spark
HANA Interfacing and Integration Tools (Component: Explanation)
Smart Data Access (SDA): Access to other databases and relational mappers from HANA
Smart Data Integration (SDI): HANA-integrated ETL/replication solution (created from Data Services and SRS)
Data Lifecycle Manager: Automatic data movement to other tiers

18 HANA Data Platform Components Overview (2)
Streaming (Component: Explanation)
Smart Data Streaming (SDS): Integrated HANA streaming solution
Event Streaming Processor: Standalone streaming solution
Event Streaming Processor Lite: Solution for decentralized streaming based on SQL Anywhere
Standalone Replication Components (Component: Explanation)
Data Services: Standalone replication solution for ETL
SAP Landscape Transformation: Real-time replication solution based on SAP NetWeaver ABAP
SAP Replication Server: Real-time replication solution based on log-level database access
Remote Data Sync: Remote databases that are synced with a backend synchronously or asynchronously

19 SAP Vora & BigData Analysis

20 What's Stopping Us? The Digital Divide between Enterprise and Big Data
ENTERPRISE and BIG DATA: Too Complex, Too Slow, Unable to Work Together

21 Background and Motivation: Existing Solutions

22 2014 SAP AG or an SAP affiliate company. All rights reserved. 22

23 In-Memory Data Fabric for Enterprise + Distributed Compute
ALL IN-MEMORY: CONSUME, COMPUTE, STORE
SAP HANA + Tiering: HANA OLTP + OLAP; Scale Up + Scale Out; Appliance, TDI
Federated Queries & Programming Model
SAP Vora: Distributed compute, Massive Scale Out; Distributed File System, Network Storage, Cloud Persistence; Any Hardware
SAP Vora Benefits:
Add functionality for enterprise applications (Hierarchies, OLAP modeling)
Boost SQL performance
Federate access across SAP HANA and Hadoop
Integrate tooling

24 Hadoop Background

25 Hadoop Background: Cluster Overview

26 Required Components
Software requirements for installing Vora 1.4:
Hadoop distributions: HDP 2.5+, CDH 5.8+, MapR 5.1+
Spark 1.6
Java JDK 8.0
Supported OS: Red Hat Linux, SUSE Linux, CentOS
SAP GCC compatibility package ver 4.8
SAP HANA Spark Controller 2.0 (optional)
Zeppelin (optional)
Refer to the Vora Admin Guide at help.sap.com/hana_vora and SAP Note Prerequisites for Installing SAP Vora for the latest information.
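As a rough illustration of how the minimum distribution versions on this slide could be checked before an installation, here is a minimal Python sketch. The table `MIN_VERSIONS` simply restates the three "N+" entries from the slide; the helper names `parse_version` and `meets_minimum` are hypothetical and are not part of any SAP or Hadoop tool.

```python
# Minimum Hadoop distribution versions as listed on the slide for Vora 1.4.
MIN_VERSIONS = {"HDP": (2, 5), "CDH": (5, 8), "MapR": (5, 1)}


def parse_version(text):
    """Turn a dotted version string such as '2.5.3' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(distribution, version):
    """Return True if the distribution is listed and its version is at least the minimum."""
    minimum = MIN_VERSIONS.get(distribution)
    if minimum is None:
        return False
    return parse_version(version) >= minimum


hdp_ok = meets_minimum("HDP", "2.5.3")
cdh_ok = meets_minimum("CDH", "5.7.1")
```

Comparing integer tuples rather than strings avoids the classic mistake of ranking "2.10" below "2.5".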

27 Vora Use Cases

28 SAP HANA Vora - Use Case Scenarios: Standalone, Outside In, Inside Out

29 Customer Use Case Scenarios

30 HANA Connectivity

31 SDA and HANA Connectivity: SAP HANA and Hadoop Integration

32 SAP HANA integration - voraodbc
SAP HANA delivers the voraodbc adapter from SPS12 onwards.
Introduced support for Vora in-memory relational tables.
Kerberos support when connecting to SAP HANA.
Vora views can be consumed through HANA using voraodbc.
Once HANA Wire is enabled, you can use HANA Studio or Web IDE to create virtual tables on Vora and join them with SAP HANA data.

33 SDA and HANA Connectivity: Bi-directional integration of SAP HANA and Vora (voraodbc)

34 SDA and HANA Connectivity: Benefits of SAP Vora connectivity vs. past SDA

35 Data Lifecycle Manager (DLM) for Hadoop as a tier
Define a data aging strategy with DLM.
In SAP HANA native use cases, leverage SAP HANA Dynamic Tiering (warm store), Hadoop, or SAP Sybase IQ with a tool-based approach: model aging rules on tables to displace aged data to HANA extended tables and thereby optimize the memory footprint of data in SAP HANA.
Diagram: HOT-STORE (Column Table), WARM-STORE (Extended Table), DATA MOVEMENT (Data Lifecycle Manager), SAP HANA
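The idea of an aging rule that displaces old rows from the hot store to a warm store can be sketched generically in Python. This is not DLM itself: the function `apply_aging_rule`, the `posting_date` column, and the list-based "stores" are assumptions made purely for illustration.

```python
from datetime import date, timedelta


def apply_aging_rule(hot_store, warm_store, today, max_age_days):
    """Move rows older than `max_age_days` from the hot store to the warm store.

    Returns the number of rows that were displaced.
    """
    cutoff = today - timedelta(days=max_age_days)
    still_hot = []
    moved = 0
    for row in hot_store:
        if row["posting_date"] < cutoff:
            warm_store.append(row)   # aged data goes to the cheaper tier
            moved += 1
        else:
            still_hot.append(row)    # recent data stays in memory
    hot_store[:] = still_hot         # shrink the hot store in place
    return moved


# Hypothetical rows; only the 2015 row is older than one year on 2016-12-31.
hot = [
    {"id": 1, "posting_date": date(2016, 12, 1)},
    {"id": 2, "posting_date": date(2015, 1, 15)},
    {"id": 3, "posting_date": date(2016, 6, 30)},
]
warm = []
moved = apply_aging_rule(hot, warm, today=date(2016, 12, 31), max_age_days=365)
```

After the rule runs, the hot store holds only the two recent rows, which is the memory-footprint effect the slide describes.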

36 Vora Key Features

37 Key Features: SAP Vora SQL Engine

38 Key Features: SAP Vora SQL Engine

39 Key Features: Distributed in-memory computing architecture

40 Key Features: Hierarchy Processing
A hierarchy is a Vora SQL extension that is applied at the time the data is queried.
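To make "applied at query time" concrete, the following Python sketch derives hierarchy information (descendants and level) on demand from plain parent-child rows. It does not show Vora's SQL syntax; the sample regions and the functions `build_children`, `descendants`, and `level` are hypothetical.

```python
def build_children(edges):
    """Build a parent -> children lookup from (child, parent) rows."""
    children = {}
    for child, parent in edges:
        children.setdefault(parent, []).append(child)
    return children


def descendants(children, node):
    """Collect all nodes below `node` by walking the hierarchy at query time."""
    result = []
    stack = list(children.get(node, []))
    while stack:
        current = stack.pop()
        result.append(current)
        stack.extend(children.get(current, []))
    return result


def level(parents, node):
    """Return the depth of `node`, counting the root as level 1."""
    depth = 1
    while parents.get(node) is not None:
        node = parents[node]
        depth += 1
    return depth


# Hypothetical parent-child table: (child, parent).
edges = [("emea", "world"), ("apj", "world"), ("germany", "emea"), ("france", "emea")]
children = build_children(edges)
parents = dict(edges)

below_emea = sorted(descendants(children, "emea"))
germany_level = level(parents, "germany")
```

The stored data stays a flat table of edges; the tree structure exists only while a query is being answered.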

41 Key Features: SAP Vora Latest Innovations

42 Key Features: Key features as of 1.4
Installed natively as a query engine on Hadoop/Spark nodes through Hadoop admin tools (Ambari, Cloudera Manager, MapR CS) and Vora Manager
Support for most recent version of Apache Spark (v )
Leverage Spark core and Spark SQL APIs to deliver distributed queries
Extend Spark to build enterprise analytics features like hierarchies
Build SAP Vora tables on HDFS, MapR FS, Parquet, S3, and local file systems
Bidirectional data virtualization between SAP HANA and Hadoop/Spark using SAP HANA Spark controller through SDA and Apache Spark Data Source API framework (data scientist workflows)
SAP Vora Modeler to build OLAP cubes on Vora tables with facts, dimensions, and annotations
Introduce new services like Dlog, Discovery (Consul)
Improved performance through partitioned tables and co-located joins
Extend Notebook (IDE) support to Jupyter and Zeppelin
New processing engines like Time Series, Graph, DocStore, Diskstore
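The benefit of partitioned tables and co-located joins can be shown with a generic Python sketch: if both tables are partitioned on the join key with the same function, each partition pair can be joined locally without moving rows between nodes. This illustrates the general technique only, not Vora's implementation; the modulo partitioning, the sample tables, and the function names are assumptions.

```python
def partition(rows, key, num_partitions):
    """Distribute rows into partitions by the (integer) join key."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[row[key] % num_partitions].append(row)
    return parts


def colocated_join(left_parts, right_parts, key):
    """Join partition i of the left table only with partition i of the right table."""
    joined = []
    for left, right in zip(left_parts, right_parts):
        index = {}
        for r in right:                      # build a local hash index per partition
            index.setdefault(r[key], []).append(r)
        for l in left:                       # probe it with the local left rows
            for r in index.get(l[key], []):
                joined.append({**l, **r})
    return joined


# Hypothetical tables, both partitioned on "cust" across two "nodes".
orders = [{"cust": 1, "amount": 10}, {"cust": 2, "amount": 20}, {"cust": 1, "amount": 5}]
customers = [{"cust": 1, "name": "A"}, {"cust": 2, "name": "B"}, {"cust": 3, "name": "C"}]

joined = colocated_join(partition(orders, "cust", 2), partition(customers, "cust", 2), "cust")
total_amount = sum(row["amount"] for row in joined)
```

Since matching keys always land in the same partition number, no partition ever needs rows from another one, which is where the performance gain comes from.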

43 Customer Use Case Examples

44 Customer #1 Data Tiering Use Case
HANA: Index Server; Dynamic Tiering (Extended Storage); XS Engine with Data Lifecycle Manager (DLM); SDA (Virtual Table)
Hadoop: HANA Spark Controller; Spark Processing Engines, Spark SQL, In-Memory Stores; Vora; HDFS (Files)
Data flow: DLM reads data from HANA; DLM writes data; DLM writes data to ORC file; upload table into Vora

45 Customer #1 Test Results (Hadoop / HANA / DT / Spark / Vora / DLM)
Customer Bill Store and Retrieval Store: 40 ms response time to read PDF
Move data from HANA to DT: records per minute
Load data via Sqoop into Hadoop: 4 min 24 s to load 2.5 million records (single thread)
Move data from HANA to Hadoop via Vora into HDFS: 22 million records per minute, 1 min 10 s (10 threads)
Total of 6.2 GB of ORC files stored in HDFS against an original size of 172 GB. Compression rate: 9 (3 copies in HDFS)
Chart: Run aggregation query (~4 billion records); axis: Query Response Time [s] (tick at 360); series: HANA Dynamic Tiering, Hadoop - Spark, Hadoop - Vora
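The figures on this slide can be cross-checked with a few lines of Python. One reading that is consistent with the arithmetic, though the slide does not state it explicitly, is that the compression rate of 9 already accounts for the 3 HDFS copies; the variable names below are illustrative.

```python
# Figures taken from the slide.
original_gb = 172.0          # original data size
orc_gb = 6.2                 # size of the ORC files in HDFS
hdfs_copies = 3              # HDFS replication factor

# Compression ratio of a single copy, and effective ratio after replication.
single_copy_ratio = original_gb / orc_gb
effective_ratio = original_gb / (orc_gb * hdfs_copies)

# Sqoop load: 2.5 million records in 4 min 24 s (single thread).
sqoop_seconds = 4 * 60 + 24
sqoop_records_per_min = 2_500_000 / sqoop_seconds * 60

# Vora path: 22 million records per minute (10 threads), as stated on the slide.
vora_records_per_min = 22_000_000
speedup = vora_records_per_min / sqoop_records_per_min

print(f"single-copy compression ratio: {single_copy_ratio:.1f}")
print(f"effective ratio with {hdfs_copies} copies: {effective_ratio:.1f}")
print(f"Sqoop throughput: {sqoop_records_per_min:,.0f} records/min")
print(f"Vora vs. Sqoop throughput factor: {speedup:.1f}")
```

Note that the throughput comparison is not like-for-like, since the Sqoop load ran single-threaded and the Vora path used 10 threads.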

46 Customer #2 Marketshare Comparison
HANA View; HDFS data stored in Hive; CSV Mapping Data (HDFS)

47 Customer #2 Marketshare Comparison (Lumira)

48 Customer #2 Performance Benchmarks
Columns: Query, Vora (s), Hive/MR (s), Impala (s)
SELECT SUM(DOLLAR), MASTER_SUBCATEGORY FROM SKYNET_MAPPED_COMPILED GROUP BY MASTER_SUBCATEGORY
SELECT distinct(master_subcategory) FROM SKYNET_COMPILED_MAPPED
SELECT count(*) from SKYNET_COMPILED_MAPPED
select sum(dollar) from SKYNET_COMPILED_MAPPED where UPC = 'xyz'
select * from SKYNET_COMPILED_MAPPED where UPC = 'xyz'

49 Vora Tools Overview Appendix

50 HANA Vora: Hadoop Management Tools

51 HANA Vora: Working with Tables and Views

52 HANA Vora: Spark Shell with SAP Vora

53 Vora Manager UI

54 Vora Modeling Tool
Vora Tools use the Thriftserver to provide access to the Modeler.
Perspectives: Data Browser, SQL Editor, Modeler

58 SAP Lumira: Data Discovery on HANA and HANA Vora

59 Roles and Tools
HANA Developer: Application Development
Business Analyst: Data Discovery
Data Engineer: OLAP Modeling
Data Scientist: Statistical Modeling
Platform: SAP HANA, OLAP Modeler, SAP Vora & Spark, Hadoop

60 Thank You
Contact information: Sarma Sishta, Support Architect
Contact information: Brice Lambelet, Senior Support Engineer
2015 SAP SE or an SAP affiliate company. All rights reserved.

Analyze Big Data Faster and Store It Cheaper

Analyze Big Data Faster and Store It Cheaper Analyze Big Data Faster and Store It Cheaper Dr. Steve Pratt, CenterPoint Russell Hull, SAP Public About CenterPoint Energy, Inc. Publicly traded on New York Stock Exchange Headquartered in Houston, Texas

More information

Capture Business Opportunities from Systems of Record and Systems of Innovation

Capture Business Opportunities from Systems of Record and Systems of Innovation Capture Business Opportunities from Systems of Record and Systems of Innovation Amit Satoor, SAP March Hartz, SAP PUBLIC Big Data transformation powers digital innovation system Relevant nuggets of information

More information

Combine Native SQL Flexibility with SAP HANA Platform Performance and Tools

Combine Native SQL Flexibility with SAP HANA Platform Performance and Tools SAP Technical Brief Data Warehousing SAP HANA Data Warehousing Combine Native SQL Flexibility with SAP HANA Platform Performance and Tools A data warehouse for the modern age Data warehouses have been

More information

SQT03 Big Data and Hadoop with Azure HDInsight Andrew Brust. Senior Director, Technical Product Marketing and Evangelism

SQT03 Big Data and Hadoop with Azure HDInsight Andrew Brust. Senior Director, Technical Product Marketing and Evangelism Big Data and Hadoop with Azure HDInsight Andrew Brust Senior Director, Technical Product Marketing and Evangelism Datameer Level: Intermediate Meet Andrew Senior Director, Technical Product Marketing and

More information

SAP NLS Update Roland Kramer, SAP EDW (BW/HANA), SAP SE PBS Customer Information Day, July 1st, 2016

SAP NLS Update Roland Kramer, SAP EDW (BW/HANA), SAP SE PBS Customer Information Day, July 1st, 2016 SAP NLS Update 2016 Roland Kramer, SAP EDW (BW/HANA), SAP SE PBS Customer Information Day, July 1st, 2016 Why SAP BW? It is all about three things to know SAPPHIRE 2016 - Quote from Hasso is there anything

More information

Oracle Big Data SQL. Release 3.2. Rich SQL Processing on All Data

Oracle Big Data SQL. Release 3.2. Rich SQL Processing on All Data Oracle Big Data SQL Release 3.2 The unprecedented explosion in data that can be made useful to enterprises from the Internet of Things, to the social streams of global customer bases has created a tremendous

More information

Innovatus Technologies

Innovatus Technologies HADOOP 2.X BIGDATA ANALYTICS 1. Java Overview of Java Classes and Objects Garbage Collection and Modifiers Inheritance, Aggregation, Polymorphism Command line argument Abstract class and Interfaces String

More information

Big Data com Hadoop. VIII Sessão - SQL Bahia. Impala, Hive e Spark. Diógenes Pires 03/03/2018

Big Data com Hadoop. VIII Sessão - SQL Bahia. Impala, Hive e Spark. Diógenes Pires 03/03/2018 Big Data com Hadoop Impala, Hive e Spark VIII Sessão - SQL Bahia 03/03/2018 Diógenes Pires Connect with PASS Sign up for a free membership today at: pass.org #sqlpass Internet Live http://www.internetlivestats.com/

More information

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case

More information

MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS

MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS SUJEE MANIYAM FOUNDER / PRINCIPAL @ ELEPHANT SCALE www.elephantscale.com sujee@elephantscale.com HI, I M SUJEE MANIYAM Founder / Principal @ ElephantScale

More information

Blended Learning Outline: Cloudera Data Analyst Training (171219a)

Blended Learning Outline: Cloudera Data Analyst Training (171219a) Blended Learning Outline: Cloudera Data Analyst Training (171219a) Cloudera Univeristy s data analyst training course will teach you to apply traditional data analytics and business intelligence skills

More information

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development:: Title Duration : Apache Spark Development : 4 days Overview Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized

More information

Big Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours

Big Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours Big Data Hadoop Developer Course Content Who is the target audience? Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours Complete beginners who want to learn Big Data Hadoop Professionals

More information

Optimizing and Modeling SAP Business Analytics for SAP HANA. Iver van de Zand, Business Analytics

Optimizing and Modeling SAP Business Analytics for SAP HANA. Iver van de Zand, Business Analytics Optimizing and Modeling SAP Business Analytics for SAP HANA Iver van de Zand, Business Analytics Early data warehouse projects LIMITATIONS ISSUES RAISED Data driven by acquisition, not architecture Too

More information

Big Data with Hadoop Ecosystem

Big Data with Hadoop Ecosystem Diógenes Pires Big Data with Hadoop Ecosystem Hands-on (HBase, MySql and Hive + Power BI) Internet Live http://www.internetlivestats.com/ Introduction Business Intelligence Business Intelligence Process

More information

HANA & Hadoop SAP FORUM. Javier Fernandez Leon February 2016

HANA & Hadoop SAP FORUM. Javier Fernandez Leon February 2016 Rumbo 2020 SAP FORUM HANA & Hadoop Javier Fernandez Leon February 2016 FTS INTERNAL Rumbo 2020 HANA & HADOOP Intro INDICE Challenges of distributed Big Data What is Apache Hadoop? Features Comparison HANA

More information

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015 Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document

More information

Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context

Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context 1 Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context Generality: diverse workloads, operators, job sizes

More information

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a) Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a) Cloudera s Developer Training for Apache Spark and Hadoop delivers the key concepts and expertise need to develop high-performance

More information

data tiering in BW/4HANA and SAP BW on HANA Update 2017

data tiering in BW/4HANA and SAP BW on HANA Update 2017 data tiering in BW/4HANA and SAP BW on HANA Update 2017 Roland Kramer, PM EDW, SAP SE June 2017 Disclaimer This presentation outlines our general product direction and should not be relied on in making

More information

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect Big Data Big Data Analyst INTRODUCTION TO BIG DATA ANALYTICS ANALYTICS PROCESSING TECHNIQUES DATA TRANSFORMATION & BATCH PROCESSING REAL TIME (STREAM) DATA PROCESSING Big Data Engineer BIG DATA FOUNDATION

More information

What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed?

What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed? Simple to start What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed? What is the maximum download speed you get? Simple computation

More information

Digital Enterprise Platform for Live Business. Kevin Liu SAP Greater China, Vice President General Manager of Big Data and Platform BU

Digital Enterprise Platform for Live Business. Kevin Liu SAP Greater China, Vice President General Manager of Big Data and Platform BU Digital Enterprise Platform for Live Business Kevin Liu SAP Greater China, Vice President General Manager of Big Data and Platform BU Rethinking the Future Competing in today s marketplace means leveraging

More information

Customer SAP BW/4HANA. Salvador Gimeno 7 December SAP SE or an SAP affiliate company. All rights reserved. Customer

Customer SAP BW/4HANA. Salvador Gimeno 7 December SAP SE or an SAP affiliate company. All rights reserved. Customer SAP BW/4HANA Customer Salvador Gimeno 7 December 2016 2016 SAP SE or an SAP affiliate company. All rights reserved. Customer 1 DISCLAIMER This presentation is not subject to your license agreement or any

More information

SAP BW/4HANA the next generation Data Warehouse

SAP BW/4HANA the next generation Data Warehouse SAP BW/4HANA the next generation Data Warehouse Lothar Henkes, VP Product Management SAP EDW (BW/HANA) July 25 th, 2017 Disclaimer This presentation is not subject to your license agreement or any other

More information

Oracle Big Data Fundamentals Ed 2

Oracle Big Data Fundamentals Ed 2 Oracle University Contact Us: 1.800.529.0165 Oracle Big Data Fundamentals Ed 2 Duration: 5 Days What you will learn In the Oracle Big Data Fundamentals course, you learn about big data, the technologies

More information

PUBLIC SAP Vora Sizing Guide

PUBLIC SAP Vora Sizing Guide SAP Vora 2.0 Document Version: 1.1 2017-11-14 PUBLIC Content 1 Introduction to SAP Vora....3 1.1 System Architecture....5 2 Factors That Influence Performance....6 3 Sizing Fundamentals and Terminology....7

More information

From Single Purpose to Multi Purpose Data Lakes. Thomas Niewel Technical Sales Director DACH Denodo Technologies March, 2019

From Single Purpose to Multi Purpose Data Lakes. Thomas Niewel Technical Sales Director DACH Denodo Technologies March, 2019 From Single Purpose to Multi Purpose Data Lakes Thomas Niewel Technical Sales Director DACH Denodo Technologies March, 2019 Agenda Data Lakes Multiple Purpose Data Lakes Customer Example Demo Takeaways

More information

Data Warehousing in the Age of In-Memory Computing and Real-Time Analytics. Erich Schneider, Daniel Rutschmann June 2014

Data Warehousing in the Age of In-Memory Computing and Real-Time Analytics. Erich Schneider, Daniel Rutschmann June 2014 Data Warehousing in the Age of In-Memory Computing and Real-Time Analytics Erich Schneider, Daniel Rutschmann June 2014 Disclaimer This presentation outlines our general product direction and should not

More information

From the Source to the Dashboard: SAP Agile Data Warehousing for Self-Service BI

From the Source to the Dashboard: SAP Agile Data Warehousing for Self-Service BI From the Source to the Dashboard: SAP Agile Data Warehousing for Self-Service BI Michael D Rutland, Sr SE, SAP / @TDWI, 9 October 2017, Savannah Disclaimer The information in this presentation is confidential

More information

Modern Data Warehouse The New Approach to Azure BI

Modern Data Warehouse The New Approach to Azure BI Modern Data Warehouse The New Approach to Azure BI History On-Premise SQL Server Big Data Solutions Technical Barriers Modern Analytics Platform On-Premise SQL Server Big Data Solutions Modern Analytics

More information

Overview. : Cloudera Data Analyst Training. Course Outline :: Cloudera Data Analyst Training::

Overview. : Cloudera Data Analyst Training. Course Outline :: Cloudera Data Analyst Training:: Module Title Duration : Cloudera Data Analyst Training : 4 days Overview Take your knowledge to the next level Cloudera University s four-day data analyst training course will teach you to apply traditional

More information

Security and Performance advances with Oracle Big Data SQL

Security and Performance advances with Oracle Big Data SQL Security and Performance advances with Oracle Big Data SQL Jean-Pierre Dijcks Oracle Redwood Shores, CA, USA Key Words SQL, Oracle, Database, Analytics, Object Store, Files, Big Data, Big Data SQL, Hadoop,

More information

Hadoop Overview. Lars George Director EMEA Services

Hadoop Overview. Lars George Director EMEA Services Hadoop Overview Lars George Director EMEA Services 1 About Me Director EMEA Services @ Cloudera Consulting on Hadoop projects (everywhere) Apache Committer HBase and Whirr O Reilly Author HBase The Definitive

More information

Big Data and Object Storage

Big Data and Object Storage Big Data and Object Storage or where to store the cold and small data? Sven Bauernfeind Computacenter AG & Co. ohg, Consultancy Germany 28.02.2018 Munich Volume, Variety & Velocity + Analytics Velocity

More information

Hadoop An Overview. - Socrates CCDH

Hadoop An Overview. - Socrates CCDH Hadoop An Overview - Socrates CCDH What is Big Data? Volume Not Gigabyte. Terabyte, Petabyte, Exabyte, Zettabyte - Due to handheld gadgets,and HD format images and videos - In total data, 90% of them collected

More information

<Insert Picture Here> Introduction to Big Data Technology

<Insert Picture Here> Introduction to Big Data Technology Introduction to Big Data Technology The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into

More information

Lambda Architecture for Batch and Stream Processing. October 2018

Lambda Architecture for Batch and Stream Processing. October 2018 Lambda Architecture for Batch and Stream Processing October 2018 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document is provided for informational purposes only.

More information

Oracle Big Data. A NA LYT ICS A ND MA NAG E MENT.

Oracle Big Data. A NA LYT ICS A ND MA NAG E MENT. Oracle Big Data. A NALYTICS A ND MANAG E MENT. Oracle Big Data: Redundância. Compatível com ecossistema Hadoop, HIVE, HBASE, SPARK. Integração com Cloudera Manager. Possibilidade de Utilização da Linguagem

More information

Oracle Big Data Connectors

Oracle Big Data Connectors Oracle Big Data Connectors Oracle Big Data Connectors is a software suite that integrates processing in Apache Hadoop distributions with operations in Oracle Database. It enables the use of Hadoop to process

More information

Big Data on AWS. Big Data Agility and Performance Delivered in the Cloud. 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Big Data on AWS. Big Data Agility and Performance Delivered in the Cloud. 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Big Data on AWS Big Data Agility and Performance Delivered in the Cloud 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Big Data Technologies and techniques for working productively

More information

Big Data Hadoop Course Content

Big Data Hadoop Course Content Big Data Hadoop Course Content Topics covered in the training Introduction to Linux and Big Data Virtual Machine ( VM) Introduction/ Installation of VirtualBox and the Big Data VM Introduction to Linux

More information

Approaching the Petabyte Analytic Database: What I learned

Disclaimer: This document is for informational purposes only and is subject to change at any time without notice. The information in this document is proprietary to Actian and no part of this document may

Scalable Tools - Part I Introduction to Scalable Tools

Adisak Sukul, Ph.D., Lecturer, Department of Computer Science, adisak@iastate.edu http://web.cs.iastate.edu/~adisak/mbds2018/ Scalable Tools session

We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info

START DATE : TIMINGS : DURATION : TYPE OF BATCH : FEE : FACULTY NAME : LAB TIMINGS : PH NO: 9963799240, 040-40025423

INNOVATION CAMP July 18 & 19, 2018 SAP HQ

SAP Digital Business Services. Next Generation Data Management, Digital Platform Track. Mrinal Sarkar, Support Architect, Global CoE NA, SAP Digital Business Services

Hortonworks Data Platform

Hortonworks Data Platform: Workflow Management (August 31, 2017). docs.hortonworks.com. Copyright 2012-2017 Hortonworks, Inc. Some rights reserved. The Hortonworks

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples

Hadoop Introduction. Big Data Analytics: What is Big Data?

WHITEPAPER. MemSQL Enterprise Feature List

2017. DEPLOYMENT: Provision and deploy MemSQL anywhere according to your desired cluster configuration. On-Premises: Maximize infrastructure

The age of Big Data Big Data for Oracle Database Professionals

Oracle OpenWorld 2017, #OOW17, SessionID: SUN5698. Tom S. Reddy, tom.reddy@datareddy.com. About the Speaker: COLLABORATE & OpenWorld Speaker, IOUG

2/26/2017. Originally developed at the University of California - Berkeley's AMPLab

Apache Spark is a fast and general engine for large-scale data processing. It aims at achieving the following goals in the Big Data context. Generality: diverse workloads, operators, job sizes. Low latency: sub-second

Data Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros

Overview: The purpose of this thesis is to implement and benchmark the performance of a parallel K-means clustering algorithm on

Analyze Big Data Faster and Store it Cheaper. Dominick Huang - CenterPoint Energy, Henry Le - Utegration, Russell Hull - SAP

ABOUT CENTERPOINT ENERGY, INC. Publicly traded on New York Stock Exchange

Big Data Architect.

www.austech.edu.au WHAT IS BIG DATA ARCHITECT? A big data architecture is designed to handle the ingestion, processing, and analysis of data that is too large or complex for traditional

Hadoop. Introduction / Overview

Preface: We will use these PowerPoint slides to guide us through our topic. Expect 15 minute segments of lecture. Expect 1-4 hour lab segments. Expect minimal pretty pictures

Cloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018

Cluster Computing Motivations: Large-scale data processing on clusters. Scan 000 TB on node @ 00 MB/s = days Scan on 000-node cluster

DATA SCIENCE USING SPARK: AN INTRODUCTION

TOPICS COVERED: Introduction to Spark; Getting Started with Spark; Programming in Spark; Data Science with Spark; What next? DATA SCIENCE PROCESS: Exploratory Data

Hadoop Development Introduction

What is Bigdata? Evolution of Bigdata Types of Data and their Significance Need for Bigdata Analytics Why Bigdata with Hadoop? History of Hadoop Why Hadoop is in demand

microsoft

70-775.microsoft Number: 70-775 Passing Score: 800 Time Limit: 120 min Exam A QUESTION 1 Note: This question is part of a series of questions that present the same scenario. Each question in the series

Data in the Cloud and Analytics in the Lake

Introduction: Working in Analytics for over 5 years. Part of the digital team at BNZ for 3 years. Based in the Auckland office. Preferred Languages: SQL, Python (PySpark)

SAP HANA SAP HANA Introduction Description:

SAP HANA is a flexible, data-source-agnostic appliance that enables customers to analyze large volumes of SAP ERP data in real-time, avoiding the need to materialize

Introduction to SAP HANA and what you can build on it. Jan 2013 Balaji Krishna Product Management, SAP HANA Platform

Safe Harbor Statement: The information in this presentation is confidential and proprietary

Cloud Computing 2. CSCI 4850/5850 High-Performance Computing Spring 2018

Tae-Hyuk (Ted) Ahn, Department of Computer Science, Program of Bioinformatics and Computational Biology, Saint Louis University. Learning

Big Data Analytics using Apache Hadoop and Spark with Scala

Training Highlights: 80% of the training is with Practical Demo (On Custom Cloudera and Ubuntu Machines). 20% Theory Portion will be important

MAPR DATA GOVERNANCE WITHOUT COMPROMISE

MAPR TECHNOLOGIES, INC. WHITE PAPER, JANUARY 2018. TABLE OF CONTENTS: EXECUTIVE SUMMARY, BACKGROUND, MAPR DATA GOVERNANCE, CONCLUSION. EXECUTIVE SUMMARY: The MapR DataOps Governance

Data Platforms and Pattern Mining

Morteza Zihayat, IBM Corporation. About Myself: IBM Software Group. Big Data Scientist, Platform Computing, IBM (2014 - Now). PhD Candidate (2011 - Now), Lassonde School of Engineering,

Introducing SUSE Enterprise Storage 5

SUSE Enterprise Storage 5 is the ideal solution for Compliance, Archive, Backup and Large Data. Customers can simplify and scale the storage

What is Gluent? The Gluent Data Platform

The Gluent Data Platform provides a transparent data virtualization layer between traditional databases and modern data storage platforms, such as Hadoop, in the

Certified Big Data Hadoop and Spark Scala Course Curriculum

The Certified Big Data Hadoop and Spark Scala course by DataFlair is a perfect blend of in-depth theoretical knowledge and strong practical skills

Big Data on AWS. Peter-Mark Verwoerd Solutions Architect

What to get out of this talk. Non-technical: Big Data processing stages: ingest, store, process, visualize. Hot vs. Cold data. Low latency processing

INITIAL EVALUATION BIGSQL FOR HORTONWORKS (Homerun or merely a major bluff?)

PER STRICKER, THOMAS KALB. 07.02.2017, HEART OF TEXAS DB2 USER GROUP, AUSTIN. 08.02.2017, DB2 FORUM USER GROUP, DALLAS. Copyright

Big Data Hadoop Stack

Lecture #1: Hadoop Beginnings. What is Hadoop? Apache Hadoop is an open source software framework for storage and large scale processing of data-sets on clusters of commodity hardware

VOLTDB + HP VERTICA

ARCHITECTURE FOR FAST + BIG DATA. FAST DATA, Fast Serve Analytics, BIG DATA, BI Reporting, Fast Operational Database, Streaming Analytics, Columnar Analytics

SAP HANA as an Accelerator for PLM Processes HANA Basics and Scenarios

Michael Dietz, Principal Solution Architect HANA. Public. Agenda: SAP HANA Platform, Usage Scenarios, Potentials in Product Lifecycle Management

Spotfire Data Science with Hadoop Using Spotfire Data Science to Operationalize Data Science in the Age of Big Data

THE RISE OF BIG DATA. BIG DATA: A REVOLUTION IN ACCESS. Large-scale data sets are nothing

SAP VORA 1.4 on AWS - MARKETPLACE EDITION FREQUENTLY ASKED QUESTIONS

1. What is SAP Vora? SAP Vora is an in-memory, distributed computing solution that helps organizations uncover actionable business insights

Databricks, an Introduction

Chuck Connell, Insight Digital Innovation. Insight Presentation. Speaker Bio: Senior Data Architect at Insight Digital Innovation. Focus on Azure big data services: HDInsight/Hadoop,

Hadoop. Course Duration: 25 days (60 hours duration). Bigdata Fundamentals. Day1: (2hours)

1. Understanding BigData. a. What is Big Data? b. Big-Data characteristics. c. Challenges with the traditional Data Base Systems and Distributed Systems. 2. Distributions:

Data contains value and knowledge

What is the purpose of big data systems? To support analysis and knowledge discovery from very large amounts of data. But to extract the knowledge, data needs to be: Stored

Asanka Padmakumara. ETL 2.0: Data Engineering with Azure Databricks

Who am I? Asanka Padmakumara, Business Intelligence Consultant. More than 8 years in BI and Data Warehousing. A regular speaker in data

SAP Vora - AWS Marketplace Production Edition Reference Guide

1. Introduction; 1.1. SAP Vora; 1.2. SAP Vora Production Edition in Amazon Web Services; 1.2.1. Vora Cluster Composition; 1.2.2. Ambari

Technical Sheet NITRODB Time-Series Database

10X Performance, 1/10th the Cost. INTRODUCTION: NITRODB is an Apache Spark Based Time Series Database built to store and analyze 100s of terabytes

What's New in SAS Data Management

Paper SAS1390-2015. Nancy Rausch, SAS Institute Inc., Cary, NC. ABSTRACT: The latest releases of SAS Data Integration Studio and DataFlux Data Management Platform provide

MapR Enterprise Hadoop

2014 MapR Technologies. Top Ranked, Cloud Leaders, 500+ Customers. Key MapR Advantage Partners. Business Services, APPLICATIONS & OS, ANALYTICS

Apache Ignite TM - In-Memory Data Fabric Fast Data Meets Open Source

DMITRIY SETRAKYAN, Founder, PPMC. https://ignite.apache.org @apacheignite @dsetrakyan. Agenda: About In-Memory Computing, Apache Ignite

2013 SAP AG or an SAP affiliate company. All rights reserved. CIO Guide. SAP Solutions. How to Use Hadoop with Your SAP Software Landscape

February 2013. Table of Contents: Executive Summary; Introduction and Scope; Big Data: A Definition; A Conventional Disk-Based RDBMs

Hadoop & Big Data Analytics Complete Practical & Real-time Training

An ISO Certified Training Institute. A Unit of Sequelgate Innovative Technologies Pvt. Ltd. www.sqlschool.com. Mode: Instructor Led LIVE

Exam Questions

70-775 Perform Data Engineering on Microsoft Azure HDInsight (beta). https://www.2passeasy.com/dumps/70-775/ NEW QUESTION 1: You are implementing a batch processing solution by using Azure

The Technology of the Business Data Lake. Appendix

Pivotal data products. Terms: Greenplum Database, GemFire, Pivotal HD, Spring XD, Pivotal Data Dispatch, Pivotal Analytics. Description: A massively parallel platform

SAP HANA ONLINE TRAINING. Modelling. Abstract This Course deals with SAP HANA Introduction, Advanced Modelling, and Data provision with SAP HANA

Arani Consulting. Email: Info@araniconsulting.com

Cloud Analytics and Business Intelligence on AWS

Enterprise Applications Virtual Desktops Sharing & Collaboration Platform Services Analytics Hadoop Real-time Streaming Data Machine Learning Data Warehouse

The Reality of Qlik and Big Data. Chris Larsen Q3 2016

Introduction: Chris Larsen, Sr Solutions Architect, Partner Engineering @Qlik. Based in Lund, Sweden. Primary Responsibility: Advanced Analytics (and formerly

How Apache Hadoop Complements Existing BI Systems. Dr. Amr Awadallah Founder, CTO Cloudera,

Dr. Amr Awadallah, Founder, CTO, Cloudera, Inc. Twitter: @awadallah, @cloudera. The Problems with Current Data Systems: BI Reports + Interactive Apps, RDBMS

Column Stores and HBase. Rui LIU, Maksim Hrytsenia

December 2017. Contents: 1 Hadoop; 1.1 Creation; 2 HBase; 2.1 Column Store Database; 2.2 HBase

Taming Structured And Unstructured Data With SAP HANA Running On VCE Vblock Systems

The Defacto Choice For Convergence. ABSTRACT & SPEAKER BIO: Dealing with enormous data growth is a key challenge for

Flash Storage Complementing a Data Lake for Real-Time Insight

Dr. Sanhita Sarkar, Global Director, Analytics Software Development. August 7, 2018. Agenda: Delivering insight along the entire spectrum

Sub-Second Response Times with New In-Memory Analytics in MicroStrategy 10. Onur Kahraman

High Performance Is No Longer A Nice To Have In Analytical Applications. Users expect Google Like performance from

Fusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic

WHITE PAPER. Western Digital Technologies, Inc. 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com. Table of Contents: Executive

IBM Db2 Event Store Simplifying and Accelerating Storage and Analysis of Fast Data. IBM Db2 Event Store

Disclaimer: The information contained in this presentation is provided for informational purposes only.

An Introduction to Big Data Formats

Understanding Avro, Parquet, and ORC. WHITE PAPER. TABLE OF CONTENTS: INTRODUCTION