Orchestration of Data Lakes BigData Analytics and Integration. Sarma Sishta Brice Lambelet
Transcription
1 Orchestration of Data Lakes BigData Analytics and Integration Sarma Sishta Brice Lambelet
2 Introduction
3 The Five Megatrends Driving Our Digitized World and Their Implications for Distributed Big Data Management
- Hyper Connectivity: Everybody has access
- Super Computing: Super computers power everywhere
- Cloud Computing: The cloud is where we compute
- Smart World: Your fridge knows what you want for dinner
- Cyber-Security: High-powered security is now the norm
2016 SAP SE or an SAP affiliate company. All rights reserved. Customer 3
4 Common Types of Big Data New Types of Data:
5 How Big is the Need for Digital Enterprise?
6 Transforming Businesses of Any Kind into Digital Enterprise
- Digital Core
- Employees: Workforce Engagement
- Customer: Omni-Channel Customer Experience
- Assets & Internet of Things
- Partners: Supplier Collaboration
IT Budget: NEW vs. TRADITIONAL
7 The Challenges with Distributed Big Data in Our Digital Economy
With all this new data connecting us, we should be sailing smoothly. Unfortunately, we're drowning in our own data.
- Inefficient Data Processing: Real-time drill-down interaction is impossible when data is distributed across thousands of nodes and processed in batches.
- Lack of Business Alignment: Business decisions need to be aligned with changing external market conditions by processing data in business systems together with Hadoop Data Lakes.
- Costly Managing of Big Data: Extensive amounts of data start clogging business systems with data that can be archived more efficiently to less expensive systems.
8 Businesses turn to Hadoop & other Big Data platforms for
- Low Cost & Large Scale
- Schema Flexibility
- All Data
IT Budget: NEW vs. TRADITIONAL
9 Background and Motivation Existing Solutions
10 BigData Architecture
11 Introduction: Technical Architecture for Big Data
In recent years, more and more business scenarios (e.g. web applications, utilities) have been collecting and/or generating large amounts of data, so scalable and reliable storage is needed. The data volume can reach several terabytes per day and several hundred petabytes in total. The Hadoop framework offers scalable and reliable storage. It is open source and has a large community that supports its various projects. SAP is embracing Hadoop as part of the HANA platform.
- Data Volume (Data Lakes): 1 TB per day of data generated by 1 productive well; 11,475 production wells in Canada (*example from the Oil & Gas industry)
- Data Variety and Data Velocity: Social Media, Sensor Data, GPS, Meter Data, Real-time Event Streaming, IT Devices, Clickstream
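The volume figures on this slide can be combined into a rough estimate. The sketch below assumes, which the slide does not state, that every one of the 11,475 wells produces the quoted 1 TB per day, and it uses decimal units (1 PB = 1,000 TB). The function names are illustrative only.

```python
# Rough volume estimate from the slide's Oil & Gas example.
# Assumption (not from the slide): every well generates 1 TB per day.
TB_PER_WELL_PER_DAY = 1.0
PRODUCTION_WELLS = 11_475
TB_PER_PB = 1_000  # decimal units


def daily_volume_pb(wells: int = PRODUCTION_WELLS,
                    tb_per_well: float = TB_PER_WELL_PER_DAY) -> float:
    """Total data generated per day across all wells, in petabytes."""
    return wells * tb_per_well / TB_PER_PB


def days_to_reach(total_pb: float, wells: int = PRODUCTION_WELLS) -> float:
    """Number of days until the given total volume (in PB) has accumulated."""
    return total_pb / daily_volume_pb(wells)


if __name__ == "__main__":
    print(f"{daily_volume_pb():.3f} PB per day")
    print(f"{days_to_reach(100):.1f} days to accumulate 100 PB")
```

Under these assumptions the daily volume is already in the petabyte range, which is the scale the slide refers to when it mentions several hundred petabytes in total.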
12 Introduction
- Hadoop is based on work done by Google in the early 2000s, specifically on papers describing the Google File System (GFS), published in 2003, and MapReduce.
- Hadoop is open source (using the Apache license), with hundreds of contributors writing features and fixing bugs. Many related projects, applications and tools enhance the Hadoop ecosystem.
- Data distribution and parallel processing framework: The Hadoop framework is based on a shared-nothing architecture. Computation happens where the data is stored, wherever possible. The goal is that nodes talk to each other as little as possible.
- Reliability and fault tolerance: Data is replicated multiple times on the system for increased availability and reliability. If a node fails, the master detects the failure and re-assigns the work to a different node on the system.
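The replication and re-assignment behaviour described on this slide can be illustrated with a toy model. This is not Hadoop code: the round-robin placement, the node names, and both helper functions are assumptions made for the example. It only shows the idea that a task runs on a node holding the data, and moves to another replica holder when that node fails.

```python
def place_replicas(blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Requires replication <= len(nodes) so that the copies land on
    different nodes.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    n = len(nodes)
    return {
        block: [nodes[(i + k) % n] for k in range(replication)]
        for i, block in enumerate(blocks)
    }


def assign_tasks(placement, live_nodes):
    """Run each block's task on the first live node that holds a replica."""
    assignment = {}
    for block, holders in placement.items():
        candidates = [node for node in holders if node in live_nodes]
        if not candidates:
            raise RuntimeError(f"all replicas of {block} are lost")
        assignment[block] = candidates[0]
    return assignment


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4"]
    placement = place_replicas(["b0", "b1", "b2", "b3"], nodes)
    print(assign_tasks(placement, set(nodes)))
    # Simulate a failure of n2: its work moves to another replica holder.
    print(assign_tasks(placement, set(nodes) - {"n2"}))
```

Because every block exists on three nodes, losing one node never makes data unavailable in this model; only the task placement changes.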
13 Introduction: Find Balance between Performance and Storage Costs
[Diagram labels: HANA Data Management Platform; Data Replication; Streaming Data; Documents, Logs; SAP HANA Processing Engines (e.g. Spatial, R, Graph); Hadoop Processing Engines (e.g. Spark, Hive); HDFS; HANA native; BigData; Smart Data Streaming; NoSQL; Graph; Time Series; Dynamic Tiering; HANA & Hadoop; SDA; MapReduce; Admin & Monitoring; Hadoop Extension; SAP HANA Vora]
14 Introduction: Different Scenarios for HANA Data Management Platform
- Store historic data in Hadoop; archive integration with Information Lifecycle Management (ILM)
- Store volume data (e.g. streaming data, logs) in Hadoop
- ETL (Extract, Transform, Load) in Hadoop
[Diagram labels: Part of SAP ILM; Data Science; S/4HANA, BWoH; Dynamic Tiering; Data Aging; SAP HANA Processing Engines (e.g. Spatial, R, Graph); Hadoop Processing Engines; Hive; HDFS; Streaming Data; Documents, Logs]
15 Introduction Excursus: Lambda Architecture
[Diagram: New Data is appended to the Master Data Set in the Batch Layer, where Batch Processing overwrites the Precomputed Batch Views. In the Speed Layer, Realtime Stream Processing increments the Realtime View. The Serving Layer answers requests with Query and Merge across both views.]
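The append, overwrite, increment, and merge steps named in the Lambda Architecture diagram can be sketched in a few lines. This is a single-process toy, not a distributed implementation: the class name `LambdaPipeline` and the choice of counting events per key are assumptions for illustration, and the batch run is treated as instantaneous.

```python
from collections import Counter


class LambdaPipeline:
    """Toy Lambda Architecture that counts events per key."""

    def __init__(self):
        self.master = []                # batch layer: append-only master data set
        self.batch_view = Counter()     # precomputed view, overwritten per batch run
        self.realtime_view = Counter()  # speed layer: incremented per event

    def ingest(self, key):
        """New data is appended to the master data set and counted in real time."""
        self.master.append(key)
        self.realtime_view[key] += 1

    def run_batch(self):
        """Recompute the batch view from scratch and overwrite the old one.

        The realtime view is cleared because, in this toy, the batch run
        is instantaneous and covers every event seen so far.
        """
        self.batch_view = Counter(self.master)
        self.realtime_view = Counter()

    def query(self, key):
        """Serving layer: merge the batch view with the realtime view."""
        return self.batch_view[key] + self.realtime_view[key]


if __name__ == "__main__":
    pipeline = LambdaPipeline()
    for event in ["a", "a", "b"]:
        pipeline.ingest(event)
    pipeline.run_batch()
    pipeline.ingest("a")
    print(pipeline.query("a"), pipeline.query("b"))
```

The point of the design is that queries stay correct between batch runs: events that the batch view has not yet absorbed are still visible through the realtime view.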
16 Introduction Excursus: Lambda Architecture with SAP
[Diagram: the Lambda Architecture mapped to SAP components]
- Realtime Stream Processing: SAP HANA Smart Data Streaming
- Realtime View (Speed Layer): SAP HANA In-Memory Store
- Query and Merge: SAP HANA Processing Engines
- Precomputed Batch Views: SAP HANA Vora
- Batch Layer (Master Data Set, Batch Processing): Hadoop
17 HANA Data Platform Components Overview (1)
Databases and Extensions (Component: Explanation)
- SAP HANA: In-memory database and central component of the data platform
- SAP IQ: A fast disk-based column-store database scaling up to petabytes of data
- SAP HANA Dynamic Tiering: Accessing other data tiers within one SQL, utilizing an Extended Store concept
- SAP HANA Vora: In-memory relational mapper for Hadoop and Spark
HANA Interfacing and Integration Tools (Component: Explanation)
- Smart Data Access (SDA): Accessing other databases and relational mappers from HANA
- Smart Data Integration (SDI): HANA-integrated ETL/replication solution (created from Data Services and SRS)
- Data Lifecycle Manager: Automatic data movement to other tiers
18 HANA Data Platform Components Overview (2)
Streaming (Component: Explanation)
- Smart Data Streaming (SDS): Integrated HANA streaming solution
- Event Streaming Processor: Standalone streaming solution
- Event Streaming Processor Lite: Solution for decentralized streaming based on SQL Anywhere
Standalone Replication Components (Component: Explanation)
- Data Services: Standalone replication solution for ETL
- SAP Landscape Transformation: Real-time replication solution based on SAP NetWeaver ABAP
- SAP Replication Server: Real-time replication solution based on log-level database access
- Remote Data Sync: Remote databases that are synced with a backend synchronously or asynchronously
19 SAP Vora & BigData Analysis
20 What's Stopping Us? The Digital Divide between Enterprise and Big Data
ENTERPRISE / BIG DATA: Too Complex, Too Slow, Unable to Work Together
21 Background and Motivation Existing Solutions
23 In-Memory Data Fabric for Enterprise + Distributed Compute
[Diagram: ALL IN-MEMORY, with layers CONSUME, COMPUTE, STORE. SAP HANA (OLTP + OLAP; Scale Up + Scale Out; SAP HANA + Tiering; Appliance, TDI) is linked through Federated Queries & Programming Model to many SAP Vora nodes (Distributed compute, Massive Scale Out) on Distributed File System, Network Storage, Cloud Persistence, Any Hardware.]
SAP Vora Benefits
- Add functionality for enterprise applications
- Hierarchies
- OLAP modeling
- Boost SQL performance
- Federate access across SAP HANA and Hadoop
- Integrate tooling
24 Hadoop Background
25 Hadoop Background Cluster Overview
26 Required Components
Software requirements for installing Vora 1.4:
- Hadoop distributions: HDP 2.5+, CDH 5.8+, MapR 5.1+
- Spark 1.6
- Java JDK 8.0
- Supported OS: Red Hat Linux, SUSE Linux, CentOS
- SAP GCC compatibility package ver 4.8
- SAP HANA Spark Controller 2.0 (optional)
- Zeppelin (optional)
Refer to the Vora Admin Guide at help.sap.com/hana_vora and the SAP Note "Prerequisites for Installing SAP Vora" for the latest information.
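The minimum distribution versions listed above lend themselves to a simple pre-installation check. The sketch below is a hypothetical helper, not part of any SAP or Hadoop tool: the table `MIN_VERSIONS` copies the slide's minimums, and the functions `parse_version` and `meets_minimum` are invented for this example. It compares versions numerically so that, for instance, 5.10 counts as newer than 5.8.

```python
# Minimum Hadoop distribution versions for Vora 1.4, as listed on the slide.
MIN_VERSIONS = {"HDP": (2, 5), "CDH": (5, 8), "MapR": (5, 1)}


def parse_version(text):
    """Turn '5.10.1' into (5, 10); a missing minor version counts as 0."""
    parts = text.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor)


def meets_minimum(distribution, version):
    """True if the distribution is listed and its version is high enough."""
    minimum = MIN_VERSIONS.get(distribution)
    if minimum is None:
        return False  # distribution not listed on the slide
    return parse_version(version) >= minimum


if __name__ == "__main__":
    for dist, ver in [("HDP", "2.5"), ("CDH", "5.10.1"), ("HDP", "2.4.3")]:
        print(dist, ver, meets_minimum(dist, ver))
```

Comparing tuples of integers rather than strings avoids the common mistake of ranking "5.10" below "5.8".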
27 Vora Use Cases
28 SAP HANA Vora - Use Case Scenarios: Standalone, Outside In, Inside Out
29 Customer Use Case Scenarios
30 HANA Connectivity
31 SDA and HANA Connectivity SAP HANA and Hadoop Integration
32 SAP HANA integration - voraodbc
- SAP HANA delivers the voraodbc adapter from SPS12 onwards
- Introduced support for Vora in-memory relational tables
- Kerberos support when connecting to SAP HANA
- Vora views can be consumed through HANA using voraodbc
- Once HANA Wire is enabled, you can use HANA Studio or Web IDE to create virtual tables on Vora and join them with SAP HANA data
33 SDA and HANA Connectivity Bi-directional integration of SAP HANA and Vora voraodbc
34 SDA and HANA Connectivity Benefits of SAP Vora connectivity vs past SDA
35 Data Lifecycle Manager (DLM) for Hadoop as a tier
Define a data aging strategy with DLM. In SAP HANA native use cases, leverage SAP HANA Dynamic Tiering (warm store), Hadoop, or SAP Sybase IQ through a tool-based approach: model aging rules on tables so that aged data is displaced to HANA extended tables, which optimizes the memory footprint of data in SAP HANA.
[Diagram: within SAP HANA, the Data Lifecycle Manager controls DATA MOVEMENT from the HOT-STORE (Column Table) to the WARM-STORE (Extended Table).]
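The idea of an aging rule that displaces old rows from the hot store to the warm store can be sketched as follows. This is not how DLM is implemented or configured: the function `apply_aging_rule`, the `posting_date` field, and the age threshold are all assumptions chosen for illustration.

```python
from datetime import date, timedelta


def apply_aging_rule(rows, today, max_age_days, date_field="posting_date"):
    """Split rows into (hot, warm) according to a simple age threshold.

    Rows whose date is older than `max_age_days` before `today` are
    treated as aged and go to the warm store; all others stay hot.
    """
    cutoff = today - timedelta(days=max_age_days)
    hot, warm = [], []
    for row in rows:
        if row[date_field] < cutoff:
            warm.append(row)
        else:
            hot.append(row)
    return hot, warm


if __name__ == "__main__":
    rows = [
        {"id": 1, "posting_date": date(2014, 1, 15)},
        {"id": 2, "posting_date": date(2016, 5, 1)},
        {"id": 3, "posting_date": date(2015, 3, 10)},
    ]
    hot, warm = apply_aging_rule(rows, today=date(2016, 6, 30), max_age_days=365)
    print("hot:", [r["id"] for r in hot])
    print("warm:", [r["id"] for r in warm])
```

In a real landscape the warm rows would be moved to an extended table or to Hadoop; here they are only returned as a separate list so that the rule itself is easy to inspect.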
36 Vora Key Features
37 Key Features SAP Vora SQL Engine
38 Key Features SAP Vora SQL Engine
39 Key Features Distributed in-memory computing architecture
40 Key Features Hierarchy Processing
A hierarchy is a Vora SQL extension that is applied at the time the data is queried.
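To illustrate what evaluating a hierarchy at query time means, the sketch below derives levels and descendants from a flat parent-child table on demand instead of storing them. It does not show Vora's SQL syntax; the parent-child representation, the function names, and the sample data are assumptions, and the input is assumed to be acyclic with every parent also present as a node.

```python
def hierarchy_levels(edges):
    """Return {node: level} for (child, parent) pairs; the root has level 1."""
    parent = dict(edges)
    levels = {}

    def level(node):
        # Compute each node's level once, walking up towards the root.
        if node not in levels:
            p = parent[node]
            levels[node] = 1 if p is None else level(p) + 1
        return levels[node]

    for node in parent:
        level(node)
    return levels


def descendants(edges, ancestor):
    """Return all nodes that have `ancestor` somewhere above them."""
    parent = dict(edges)
    result = []
    for node in parent:
        current = parent[node]
        while current is not None:
            if current == ancestor:
                result.append(node)
                break
            current = parent[current]
    return result


if __name__ == "__main__":
    org = [("CEO", None), ("CFO", "CEO"), ("CTO", "CEO"), ("Dev", "CTO")]
    print(hierarchy_levels(org))
    print(descendants(org, "CEO"))
```

Because nothing is precomputed, a change to one parent-child row is reflected in the next query without rebuilding any stored structure.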
41 Key Features SAP Vora Latest Innovations
42 Key Features: Key features as of 1.4
- Installed natively as a query engine on Hadoop/Spark nodes through Hadoop admin tools (Ambari, Cloudera Manager, MapR CS) and Vora Manager
- Support for most recent version of Apache Spark (v )
- Leverage Spark core and Spark SQL APIs to deliver distributed queries
- Extend Spark to build enterprise analytics features like hierarchies
- Build SAP Vora tables on HDFS, MapR FS, Parquet, S3, and local file systems
- Bidirectional data virtualization between SAP HANA and Hadoop/Spark using SAP HANA Spark controller through SDA and the Apache Spark Data Source API framework (data scientist workflows)
- SAP Vora Modeler to build OLAP cubes on Vora tables with facts, dimensions, and annotations
- Introduce new services like Dlog, Discovery (Consul)
- Improved performance through partitioned tables and co-located joins
- Extend Notebook (IDE) support to Jupyter and Zeppelin
- New processing engines like Time Series, Graph, DocStore, Diskstore
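The performance item on partitioned tables and co-located joins rests on a general idea: if two tables are partitioned on the join key with the same function, matching rows always land in the same partition, so each partition can be joined locally without moving data. The sketch below shows that idea in one process; it is not Vora's implementation, and the function names and the use of Python's built-in `hash` are assumptions.

```python
def partition(rows, key, num_partitions):
    """Distribute rows into partitions by hashing the join key."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[hash(row[key]) % num_partitions].append(row)
    return parts


def colocated_join(left, right, key, num_partitions=4):
    """Join two row lists partition by partition.

    Both sides use the same partitioning function, so rows with equal
    keys are co-located and each partition pair is joined independently.
    """
    left_parts = partition(left, key, num_partitions)
    right_parts = partition(right, key, num_partitions)
    joined = []
    for left_part, right_part in zip(left_parts, right_parts):
        # Build a local hash index on the right side of this partition.
        index = {}
        for r in right_part:
            index.setdefault(r[key], []).append(r)
        for l in left_part:
            for r in index.get(l[key], []):
                joined.append({**l, **r})
    return joined


if __name__ == "__main__":
    customers = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
    orders = [{"id": 1, "amt": 10}, {"id": 3, "amt": 30}, {"id": 3, "amt": 5}]
    for row in colocated_join(customers, orders, "id"):
        print(row)
```

In a cluster each partition pair would sit on one node, so the loop over partitions corresponds to work that can run in parallel with no network shuffle.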
43 Customer Use Case Examples
44 Customer #1 Data Tiering Use Case
[Diagram labels: HANA (Index Server, Dynamic Tiering, XS Engine, Data Lifecycle Manager (DLM), Extended Storage, SDA (Virtual Table)); Hadoop (Spark Processing Engines, Spark SQL, In-Memory Stores, HANA Spark Controller, Vora, HDFS, Files). Flows: DLM Reads Data from HANA; DLM Writes Data; DLM Writes Data to ORC File; Upload Table into Vora]
45 Customer #1 Test Results (Hadoop, HANA / DT / Spark / Vora, DLM)
- Customer Bill Store and Retrieval Store: 40ms response time to read PDF
- Move data from HANA to DT: records per minute
- Load data via Sqoop into Hadoop: 4min 24s to load 2.5 million records (single thread)
- Move data from HANA to Hadoop via VORA into HDFS: 22 million records per minute, 1min 10s (10 threads)
- Total of 6.2GB ORC files stored in HDFS against original size of 172GB. Compression Rate: 9 (3 copies in HDFS)
[Chart: Run aggregation query (~4 billion records). Axis: Query Response Time [s], up to 360. Series: HANA DynamicTiering, Hadoop - Spark, Hadoop - Vora]
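The compression and throughput figures on this slide can be checked with simple arithmetic. The sketch below rests on one interpretation that the slide does not spell out: that the 6.2 GB is the size of a single copy of the ORC files, so three HDFS copies occupy three times that. The function names are illustrative.

```python
ORIGINAL_GB = 172   # original data size from the slide
ORC_GB = 6.2        # ORC file size from the slide (assumed: one copy)
HDFS_COPIES = 3     # replication factor from the slide


def compression_ratio(original_gb, stored_gb, copies=1):
    """Original size divided by the space actually occupied."""
    return original_gb / (stored_gb * copies)


def records_per_minute(records, seconds):
    """Throughput normalised to records per minute."""
    return records / seconds * 60


# Sqoop load from the slide: 2.5 million records in 4 min 24 s.
sqoop_rate = records_per_minute(2_500_000, 4 * 60 + 24)

if __name__ == "__main__":
    print(f"ratio, single copy:  {compression_ratio(ORIGINAL_GB, ORC_GB):.1f}")
    print(f"ratio, three copies: {compression_ratio(ORIGINAL_GB, ORC_GB, HDFS_COPIES):.1f}")
    print(f"Sqoop throughput:    {sqoop_rate:,.0f} records per minute")
```

Under that interpretation the three-copy ratio rounds to the slide's "Compression Rate: 9", and the single-threaded Sqoop load works out to well under one million records per minute, compared with the 22 million records per minute quoted for the Vora path with 10 threads.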
46 Customer #2 Marketshare Comparison
[Diagram labels: HANA View; HDFS data stored in Hive; CSV Mapping Data (HDFS)]
47 Customer #2 Marketshare Comparison (Lumira)
48 Customer #2 Performance Benchmarks
Columns: Query, Vora (s), Hive/MR (s), Impala (s)
Queries:
- SELECT SUM(DOLLAR), MASTER_SUBCATEGORY FROM SKYNET_MAPPED_COMPILED GROUP BY MASTER_SUBCATEGORY
- SELECT distinct(master_subcategory) FROM SKYNET_COMPILED_MAPPED
- SELECT count(*) from SKYNET_COMPILED_MAPPED
- select sum(dollar) from SKYNET_COMPILED_MAPPED where UPC = 'xyz'
- select * from SKYNET_COMPILED_MAPPED where UPC = 'xyz'
49 Vora Tools Overview Appendix
50 HANA Vora Hadoop Management Tools
51 HANA Vora Working with Tables and Views
52 HANA Vora Spark Shell with SAP Vora
53 Vora Manager UI
54 Vora Modeling Tool
Vora Tools use the Thriftserver to provide access to the Modeler under Perspectives:
- Data Browser
- SQL Editor
- Modeler
58 SAP Lumira Data Discovery on HANA and HANA Vora
59 Roles and tools
- HANA Developer: Application Development
- Business Analyst: Data Discovery
- Data Engineer: OLAP Modeling
- Data Scientist: Statistical Modeling
[Diagram labels: SAP HANA; OLAP Modeler; SAP Vora & Spark; Hadoop]
60 Thank You
Contact information: Sarma Sishta, Support Architect
Contact information: Brice Lambelet, Senior Support Engineer
Analyze Big Data Faster and Store It Cheaper
Analyze Big Data Faster and Store It Cheaper Dr. Steve Pratt, CenterPoint Russell Hull, SAP Public About CenterPoint Energy, Inc. Publicly traded on New York Stock Exchange Headquartered in Houston, Texas
More informationCapture Business Opportunities from Systems of Record and Systems of Innovation
Capture Business Opportunities from Systems of Record and Systems of Innovation Amit Satoor, SAP March Hartz, SAP PUBLIC Big Data transformation powers digital innovation system Relevant nuggets of information
More informationCombine Native SQL Flexibility with SAP HANA Platform Performance and Tools
SAP Technical Brief Data Warehousing SAP HANA Data Warehousing Combine Native SQL Flexibility with SAP HANA Platform Performance and Tools A data warehouse for the modern age Data warehouses have been
More informationSQT03 Big Data and Hadoop with Azure HDInsight Andrew Brust. Senior Director, Technical Product Marketing and Evangelism
Big Data and Hadoop with Azure HDInsight Andrew Brust Senior Director, Technical Product Marketing and Evangelism Datameer Level: Intermediate Meet Andrew Senior Director, Technical Product Marketing and
More informationSAP NLS Update Roland Kramer, SAP EDW (BW/HANA), SAP SE PBS Customer Information Day, July 1st, 2016
SAP NLS Update 2016 Roland Kramer, SAP EDW (BW/HANA), SAP SE PBS Customer Information Day, July 1st, 2016 Why SAP BW? It is all about three things to know SAPPHIRE 2016 - Quote from Hasso is there anything
More informationOracle Big Data SQL. Release 3.2. Rich SQL Processing on All Data
Oracle Big Data SQL Release 3.2 The unprecedented explosion in data that can be made useful to enterprises from the Internet of Things, to the social streams of global customer bases has created a tremendous
More informationInnovatus Technologies
HADOOP 2.X BIGDATA ANALYTICS 1. Java Overview of Java Classes and Objects Garbage Collection and Modifiers Inheritance, Aggregation, Polymorphism Command line argument Abstract class and Interfaces String
More informationBig Data com Hadoop. VIII Sessão - SQL Bahia. Impala, Hive e Spark. Diógenes Pires 03/03/2018
Big Data com Hadoop Impala, Hive e Spark VIII Sessão - SQL Bahia 03/03/2018 Diógenes Pires Connect with PASS Sign up for a free membership today at: pass.org #sqlpass Internet Live http://www.internetlivestats.com/
More informationBig Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara
Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case
More informationMODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS
MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS SUJEE MANIYAM FOUNDER / PRINCIPAL @ ELEPHANT SCALE www.elephantscale.com sujee@elephantscale.com HI, I M SUJEE MANIYAM Founder / Principal @ ElephantScale
More informationBlended Learning Outline: Cloudera Data Analyst Training (171219a)
Blended Learning Outline: Cloudera Data Analyst Training (171219a) Cloudera Univeristy s data analyst training course will teach you to apply traditional data analytics and business intelligence skills
More informationOverview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::
Title Duration : Apache Spark Development : 4 days Overview Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized
More informationBig Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours
Big Data Hadoop Developer Course Content Who is the target audience? Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours Complete beginners who want to learn Big Data Hadoop Professionals
More informationOptimizing and Modeling SAP Business Analytics for SAP HANA. Iver van de Zand, Business Analytics
Optimizing and Modeling SAP Business Analytics for SAP HANA Iver van de Zand, Business Analytics Early data warehouse projects LIMITATIONS ISSUES RAISED Data driven by acquisition, not architecture Too
More informationBig Data with Hadoop Ecosystem
Diógenes Pires Big Data with Hadoop Ecosystem Hands-on (HBase, MySql and Hive + Power BI) Internet Live http://www.internetlivestats.com/ Introduction Business Intelligence Business Intelligence Process
More informationHANA & Hadoop SAP FORUM. Javier Fernandez Leon February 2016
Rumbo 2020 SAP FORUM HANA & Hadoop Javier Fernandez Leon February 2016 FTS INTERNAL Rumbo 2020 HANA & HADOOP Intro INDICE Challenges of distributed Big Data What is Apache Hadoop? Features Comparison HANA
More informationLambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015
Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document
More informationApache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context
1 Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context Generality: diverse workloads, operators, job sizes
More informationBlended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)
Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a) Cloudera s Developer Training for Apache Spark and Hadoop delivers the key concepts and expertise need to develop high-performance
More informationdata tiering in BW/4HANA and SAP BW on HANA Update 2017
data tiering in BW/4HANA and SAP BW on HANA Update 2017 Roland Kramer, PM EDW, SAP SE June 2017 Disclaimer This presentation outlines our general product direction and should not be relied on in making
More informationBig Data. Big Data Analyst. Big Data Engineer. Big Data Architect
Big Data Big Data Analyst INTRODUCTION TO BIG DATA ANALYTICS ANALYTICS PROCESSING TECHNIQUES DATA TRANSFORMATION & BATCH PROCESSING REAL TIME (STREAM) DATA PROCESSING Big Data Engineer BIG DATA FOUNDATION
More informationWhat is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed?
Simple to start What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed? What is the maximum download speed you get? Simple computation
More informationDigital Enterprise Platform for Live Business. Kevin Liu SAP Greater China, Vice President General Manager of Big Data and Platform BU
Digital Enterprise Platform for Live Business Kevin Liu SAP Greater China, Vice President General Manager of Big Data and Platform BU Rethinking the Future Competing in today s marketplace means leveraging
More informationCustomer SAP BW/4HANA. Salvador Gimeno 7 December SAP SE or an SAP affiliate company. All rights reserved. Customer
SAP BW/4HANA Customer Salvador Gimeno 7 December 2016 2016 SAP SE or an SAP affiliate company. All rights reserved. Customer 1 DISCLAIMER This presentation is not subject to your license agreement or any
More informationSAP BW/4HANA the next generation Data Warehouse
SAP BW/4HANA the next generation Data Warehouse Lothar Henkes, VP Product Management SAP EDW (BW/HANA) July 25 th, 2017 Disclaimer This presentation is not subject to your license agreement or any other
More informationOracle Big Data Fundamentals Ed 2
Oracle University Contact Us: 1.800.529.0165 Oracle Big Data Fundamentals Ed 2 Duration: 5 Days What you will learn In the Oracle Big Data Fundamentals course, you learn about big data, the technologies
More informationPUBLIC SAP Vora Sizing Guide
SAP Vora 2.0 Document Version: 1.1 2017-11-14 PUBLIC Content 1 Introduction to SAP Vora....3 1.1 System Architecture....5 2 Factors That Influence Performance....6 3 Sizing Fundamentals and Terminology....7
More informationFrom Single Purpose to Multi Purpose Data Lakes. Thomas Niewel Technical Sales Director DACH Denodo Technologies March, 2019
From Single Purpose to Multi Purpose Data Lakes Thomas Niewel Technical Sales Director DACH Denodo Technologies March, 2019 Agenda Data Lakes Multiple Purpose Data Lakes Customer Example Demo Takeaways
More informationData Warehousing in the Age of In-Memory Computing and Real-Time Analytics. Erich Schneider, Daniel Rutschmann June 2014
Data Warehousing in the Age of In-Memory Computing and Real-Time Analytics Erich Schneider, Daniel Rutschmann June 2014 Disclaimer This presentation outlines our general product direction and should not
More informationFrom the Source to the Dashboard: SAP Agile Data Warehousing for Self-Service BI
From the Source to the Dashboard: SAP Agile Data Warehousing for Self-Service BI Michael D Rutland, Sr SE, SAP / @TDWI, 9 October 2017, Savannah Disclaimer The information in this presentation is confidential
More informationModern Data Warehouse The New Approach to Azure BI
Modern Data Warehouse The New Approach to Azure BI History On-Premise SQL Server Big Data Solutions Technical Barriers Modern Analytics Platform On-Premise SQL Server Big Data Solutions Modern Analytics
More informationOverview. : Cloudera Data Analyst Training. Course Outline :: Cloudera Data Analyst Training::
Module Title Duration : Cloudera Data Analyst Training : 4 days Overview Take your knowledge to the next level Cloudera University s four-day data analyst training course will teach you to apply traditional
More informationSecurity and Performance advances with Oracle Big Data SQL
Security and Performance advances with Oracle Big Data SQL Jean-Pierre Dijcks Oracle Redwood Shores, CA, USA Key Words SQL, Oracle, Database, Analytics, Object Store, Files, Big Data, Big Data SQL, Hadoop,
More informationHadoop Overview. Lars George Director EMEA Services
Hadoop Overview Lars George Director EMEA Services 1 About Me Director EMEA Services @ Cloudera Consulting on Hadoop projects (everywhere) Apache Committer HBase and Whirr O Reilly Author HBase The Definitive
More informationBig Data and Object Storage
Big Data and Object Storage or where to store the cold and small data? Sven Bauernfeind Computacenter AG & Co. ohg, Consultancy Germany 28.02.2018 Munich Volume, Variety & Velocity + Analytics Velocity
More informationHadoop An Overview. - Socrates CCDH
Hadoop An Overview - Socrates CCDH What is Big Data? Volume Not Gigabyte. Terabyte, Petabyte, Exabyte, Zettabyte - Due to handheld gadgets,and HD format images and videos - In total data, 90% of them collected
More information<Insert Picture Here> Introduction to Big Data Technology
Introduction to Big Data Technology The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into
More informationLambda Architecture for Batch and Stream Processing. October 2018
Lambda Architecture for Batch and Stream Processing October 2018 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document is provided for informational purposes only.
More informationOracle Big Data. A NA LYT ICS A ND MA NAG E MENT.
Oracle Big Data. A NALYTICS A ND MANAG E MENT. Oracle Big Data: Redundância. Compatível com ecossistema Hadoop, HIVE, HBASE, SPARK. Integração com Cloudera Manager. Possibilidade de Utilização da Linguagem
More informationOracle Big Data Connectors
Oracle Big Data Connectors Oracle Big Data Connectors is a software suite that integrates processing in Apache Hadoop distributions with operations in Oracle Database. It enables the use of Hadoop to process
More informationBig Data on AWS. Big Data Agility and Performance Delivered in the Cloud. 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Big Data on AWS Big Data Agility and Performance Delivered in the Cloud 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Big Data Technologies and techniques for working productively
More informationBig Data Hadoop Course Content
Big Data Hadoop Course Content Topics covered in the training Introduction to Linux and Big Data Virtual Machine ( VM) Introduction/ Installation of VirtualBox and the Big Data VM Introduction to Linux
More informationApproaching the Petabyte Analytic Database: What I learned
Disclaimer This document is for informational purposes only and is subject to change at any time without notice. The information in this document is proprietary to Actian and no part of this document may
More informationScalable Tools - Part I Introduction to Scalable Tools
Scalable Tools - Part I Introduction to Scalable Tools Adisak Sukul, Ph.D., Lecturer, Department of Computer Science, adisak@iastate.edu http://web.cs.iastate.edu/~adisak/mbds2018/ Scalable Tools session
More informationWe are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info
We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info START DATE : TIMINGS : DURATION : TYPE OF BATCH : FEE : FACULTY NAME : LAB TIMINGS : PH NO: 9963799240, 040-40025423
More informationINNOVATION CAMP July 18 & 19, 2018 SAP HQ
SAP Digital Business Services INNOVATION CAMP July 18 & 19, 2018 SAP HQ Next Generation Data Management Digital Platform Track Mrinal Sarkar Support Architect Global CoE NA SAP Digital Business Services
More informationHortonworks Data Platform
Hortonworks Data Platform Workflow Management (August 31, 2017) docs.hortonworks.com Hortonworks Data Platform: Workflow Management Copyright 2012-2017 Hortonworks, Inc. Some rights reserved. The Hortonworks
More informationTopics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples
Hadoop Introduction 1 Topics Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples 2 Big Data Analytics What is Big Data?
More informationWHITEPAPER. MemSQL Enterprise Feature List
WHITEPAPER MemSQL Enterprise Feature List 2017 MemSQL Enterprise Feature List DEPLOYMENT Provision and deploy MemSQL anywhere according to your desired cluster configuration. On-Premises: Maximize infrastructure
More informationThe age of Big Data Big Data for Oracle Database Professionals
The age of Big Data Big Data for Oracle Database Professionals Oracle OpenWorld 2017 #OOW17 SessionID: SUN5698 Tom S. Reddy tom.reddy@datareddy.com About the Speaker COLLABORATE & OpenWorld Speaker IOUG
More information2/26/2017. Originally developed at the University of California - Berkeley's AMPLab
Apache is a fast and general engine for large-scale data processing aims at achieving the following goals in the Big data context Generality: diverse workloads, operators, job sizes Low latency: sub-second
More informationData Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros
Data Clustering on the Parallel Hadoop MapReduce Model Dimitrios Verraros Overview The purpose of this thesis is to implement and benchmark the performance of a parallel K- means clustering algorithm on
More informationAnalyze Big Data Faster and Store it Cheaper. Dominick Huang CenterPoint Energy Henry Le - Utegra8on Russell Hull - SAP
Analyze Big Data Faster and Store it Cheaper Dominick Huang CenterPoint Energy Henry Le - Utegra8on Russell Hull - SAP ABOUT CENTERPOINT ENERGY, INC. Ø Ø Ø Ø Ø Ø Publicly traded on New York Stock Exchange
More informationBig Data Architect.
Big Data Architect www.austech.edu.au WHAT IS BIG DATA ARCHITECT? A big data architecture is designed to handle the ingestion, processing, and analysis of data that is too large or complex for traditional
More informationHadoop. Introduction / Overview
Hadoop Introduction / Overview Preface We will use these PowerPoint slides to guide us through our topic. Expect 15 minute segments of lecture Expect 1-4 hour lab segments Expect minimal pretty pictures
Cloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018
Cloud Computing and Hadoop Distributed File System UCSB CS170, Spring 2018 Cluster Computing Motivations Large-scale data processing on clusters Scan 000 TB on node @ 00 MB/s = days Scan on 000-node cluster
DATA SCIENCE USING SPARK: AN INTRODUCTION
DATA SCIENCE USING SPARK: AN INTRODUCTION TOPICS COVERED Introduction to Spark Getting Started with Spark Programming in Spark Data Science with Spark What next? 2 DATA SCIENCE PROCESS Exploratory Data
Hadoop Development Introduction
Hadoop Development Introduction What is Bigdata? Evolution of Bigdata Types of Data and their Significance Need for Bigdata Analytics Why Bigdata with Hadoop? History of Hadoop Why Hadoop is in demand
microsoft
70-775.microsoft Number: 70-775 Passing Score: 800 Time Limit: 120 min Exam A QUESTION 1 Note: This question is part of a series of questions that present the same scenario. Each question in the series
Data in the Cloud and Analytics in the Lake
Data in the Cloud and Analytics in the Lake Introduction Working in Analytics for over 5 years Part of the digital team at BNZ for 3 years Based in the Auckland office Preferred Languages SQL Python (PySpark)
SAP HANA SAP HANA Introduction Description:
SAP HANA SAP HANA Introduction Description: SAP HANA is a flexible, data-source-agnostic appliance that enables customers to analyze large volumes of SAP ERP data in real-time, avoiding the need to materialize
Introduction to SAP HANA and what you can build on it. Jan 2013 Balaji Krishna Product Management, SAP HANA Platform
Introduction to SAP HANA and what you can build on it Jan 2013 Balaji Krishna Product Management, SAP HANA Platform Safe Harbor Statement The information in this presentation is confidential and proprietary
Cloud Computing 2. CSCI 4850/5850 High-Performance Computing Spring 2018
Cloud Computing 2 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning
Big Data Analytics using Apache Hadoop and Spark with Scala
Big Data Analytics using Apache Hadoop and Spark with Scala Training Highlights : 80% of the training is with Practical Demo (On Custom Cloudera and Ubuntu Machines) 20% Theory Portion will be important
MAPR DATA GOVERNANCE WITHOUT COMPROMISE
MAPR TECHNOLOGIES, INC. WHITE PAPER JANUARY 2018 MAPR DATA GOVERNANCE TABLE OF CONTENTS EXECUTIVE SUMMARY BACKGROUND MAPR DATA GOVERNANCE CONCLUSION EXECUTIVE SUMMARY The MapR DataOps Governance
Data Platforms and Pattern Mining
Morteza Zihayat Data Platforms and Pattern Mining IBM Corporation About Myself IBM Software Group Big Data Scientist, Platform Computing, IBM (2014-Now) PhD Candidate (2011-Now), Lassonde School of Engineering,
Introducing SUSE Enterprise Storage 5
Introducing SUSE Enterprise Storage 5 1 SUSE Enterprise Storage 5 SUSE Enterprise Storage 5 is the ideal solution for Compliance, Archive, Backup and Large Data. Customers can simplify and scale the storage
What is Gluent? The Gluent Data Platform
What is Gluent? The Gluent Data Platform The Gluent Data Platform provides a transparent data virtualization layer between traditional databases and modern data storage platforms, such as Hadoop, in the
Certified Big Data Hadoop and Spark Scala Course Curriculum
Certified Big Data Hadoop and Spark Scala Course Curriculum The Certified Big Data Hadoop and Spark Scala course by DataFlair is a perfect blend of in-depth theoretical knowledge and strong practical skills
Big Data on AWS. Peter-Mark Verwoerd Solutions Architect
Big Data on AWS Peter-Mark Verwoerd Solutions Architect What to get out of this talk Non-technical: Big Data processing stages: ingest, store, process, visualize Hot vs. Cold data Low latency processing
INITIAL EVALUATION BIGSQL FOR HORTONWORKS (Homerun or merely a major bluff?)
PER STRICKER, THOMAS KALB 07.02.2017, HEART OF TEXAS DB2 USER GROUP, AUSTIN 08.02.2017, DB2 FORUM USER GROUP, DALLAS INITIAL EVALUATION BIGSQL FOR HORTONWORKS (Homerun or merely a major bluff?) Copyright
Big Data Hadoop Stack
Big Data Hadoop Stack Lecture #1 Hadoop Beginnings What is Hadoop? Apache Hadoop is an open source software framework for storage and large scale processing of data-sets on clusters of commodity hardware
VOLTDB + HP VERTICA. page
VOLTDB + HP VERTICA ARCHITECTURE FOR FAST AND BIG DATA ARCHITECTURE FOR FAST + BIG DATA FAST DATA Fast Serve Analytics BIG DATA BI Reporting Fast Operational Database Streaming Analytics Columnar Analytics
SAP HANA as an Accelerator for PLM Processes HANA Basics and Scenarios
SAP HANA as an Accelerator for PLM Processes HANA Basics and Scenarios Michael Dietz, Principal Solution Architect HANA Public Agenda SAP HANA Platform Usage Scenarios Potentials in Product Lifecycle Management
Spotfire Data Science with Hadoop Using Spotfire Data Science to Operationalize Data Science in the Age of Big Data
Spotfire Data Science with Hadoop Using Spotfire Data Science to Operationalize Data Science in the Age of Big Data THE RISE OF BIG DATA BIG DATA: A REVOLUTION IN ACCESS Large-scale data sets are nothing
SAP VORA 1.4 on AWS - MARKETPLACE EDITION FREQUENTLY ASKED QUESTIONS
SAP VORA 1.4 on AWS - MARKETPLACE EDITION FREQUENTLY ASKED QUESTIONS 1. What is SAP Vora? SAP Vora is an in-memory, distributed computing solution that helps organizations uncover actionable business insights
Databricks, an Introduction
Databricks, an Introduction Chuck Connell, Insight Digital Innovation Insight Presentation Speaker Bio Senior Data Architect at Insight Digital Innovation Focus on Azure big data services HDInsight/Hadoop,
Hadoop. Course Duration: 25 days (60 hours duration). Bigdata Fundamentals. Day1: (2hours)
Bigdata Fundamentals Day1: (2hours) 1. Understanding BigData. a. What is Big Data? b. Big-Data characteristics. c. Challenges with the traditional Data Base Systems and Distributed Systems. 2. Distributions:
Data contains value and knowledge
Data contains value and knowledge What is the purpose of big data systems? To support analysis and knowledge discovery from very large amounts of data But to extract the knowledge data needs to be Stored
Asanka Padmakumara. ETL 2.0: Data Engineering with Azure Databricks
Asanka Padmakumara ETL 2.0: Data Engineering with Azure Databricks Who am I? Asanka Padmakumara Business Intelligence Consultant, More than 8 years in BI and Data Warehousing A regular speaker in data
SAP Vora - AWS Marketplace Production Edition Reference Guide
SAP Vora - AWS Marketplace Production Edition Reference Guide 1. Introduction 1.1. SAP Vora 1.2. SAP Vora Production Edition in Amazon Web Services 1.2.1. Vora Cluster Composition 1.2.2. Ambari
Technical Sheet NITRODB Time-Series Database
Technical Sheet NITRODB Time-Series Database 10X Performance, 1/10th the Cost INTRODUCTION NITRODB is an Apache Spark Based Time Series Database built to store and analyze 100s of terabytes
What's New in SAS Data Management
Paper SAS1390-2015 What's New in SAS Data Management Nancy Rausch, SAS Institute Inc., Cary, NC ABSTRACT The latest releases of SAS Data Integration Studio and DataFlux Data Management Platform provide
MapR Enterprise Hadoop
2014 MapR Technologies MapR Enterprise Hadoop Top Ranked Cloud Leaders 500+ Customers Key MapR Advantage Partners Business Services APPLICATIONS & OS ANALYTICS
Apache Ignite TM - In-Memory Data Fabric Fast Data Meets Open Source
Apache Ignite TM - In-Memory Data Fabric Fast Data Meets Open Source DMITRIY SETRAKYAN Founder, PPMC https://ignite.apache.org @apacheignite @dsetrakyan Agenda About In-Memory Computing Apache Ignite
2013 SAP AG or an SAP affiliate company. All rights reserved. CIO Guide. SAP Solutions. How to Use Hadoop with Your SAP Software Landscape
SAP Solutions CIO Guide How to Use Hadoop with Your SAP Software Landscape February 2013 Table of Contents Executive Summary Introduction and Scope Big Data: A Definition A Conventional Disk-Based RDBMS
Hadoop & Big Data Analytics Complete Practical & Real-time Training
An ISO Certified Training Institute A Unit of Sequelgate Innovative Technologies Pvt. Ltd. www.sqlschool.com Hadoop & Big Data Analytics Complete Practical & Real-time Training Mode : Instructor Led LIVE
Exam Questions
Exam Questions 70-775 Perform Data Engineering on Microsoft Azure HDInsight (beta) https://www.2passeasy.com/dumps/70-775/ NEW QUESTION 1 You are implementing a batch processing solution by using Azure
The Technology of the Business Data Lake. Appendix
The Technology of the Business Data Lake Appendix Pivotal data products Term Greenplum Database GemFire Pivotal HD Spring XD Pivotal Data Dispatch Pivotal Analytics Description A massively parallel platform
SAP HANA ONLINE TRAINING. Modelling. Abstract This Course deals with SAP HANA Introduction, Advanced Modelling, and Data provision with SAP HANA
SAP HANA ONLINE TRAINING Modelling Abstract This Course deals with SAP HANA Introduction, Advanced Modelling, and Data provision with SAP HANA Arani Consulting Arani Consulting Email: Info@araniconsulting.com
Cloud Analytics and Business Intelligence on AWS
Cloud Analytics and Business Intelligence on AWS Enterprise Applications Virtual Desktops Sharing & Collaboration Platform Services Analytics Hadoop Real-time Streaming Data Machine Learning Data Warehouse
The Reality of Qlik and Big Data. Chris Larsen Q3 2016
The Reality of Qlik and Big Data Chris Larsen Q3 2016 Introduction Chris Larsen Sr Solutions Architect, Partner Engineering @Qlik Based in Lund, Sweden Primary Responsibility Advanced Analytics (and formerly
How Apache Hadoop Complements Existing BI Systems. Dr. Amr Awadallah Founder, CTO Cloudera,
How Apache Hadoop Complements Existing BI Systems Dr. Amr Awadallah Founder, CTO Cloudera, Inc. Twitter: @awadallah, @cloudera 2 The Problems with Current Data Systems BI Reports + Interactive Apps RDBMS
Column Stores and HBase. Rui LIU, Maksim Hrytsenia
Column Stores and HBase Rui LIU, Maksim Hrytsenia December 2017 Contents 1 Hadoop 1.1 Creation 2 HBase 2.1 Column Store Database 2.2 HBase
Taming Structured And Unstructured Data With SAP HANA Running On VCE Vblock Systems
1 Taming Structured And Unstructured Data With SAP HANA Running On VCE Vblock Systems The Defacto Choice For Convergence 2 ABSTRACT & SPEAKER BIO Dealing with enormous data growth is a key challenge for
Flash Storage Complementing a Data Lake for Real-Time Insight
Flash Storage Complementing a Data Lake for Real-Time Insight Dr. Sanhita Sarkar Global Director, Analytics Software Development August 7, 2018 Agenda 1 2 3 4 5 Delivering insight along the entire spectrum
Sub-Second Response Times with New In-Memory Analytics in MicroStrategy 10. Onur Kahraman
Sub-Second Response Times with New In-Memory Analytics in MicroStrategy 10 Onur Kahraman High Performance Is No Longer A Nice To Have In Analytical Applications Users expect Google Like performance from
Fusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic
WHITE PAPER Fusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic Western Digital Technologies, Inc. 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents Executive
IBM Db2 Event Store Simplifying and Accelerating Storage and Analysis of Fast Data. IBM Db2 Event Store
IBM Db2 Event Store Simplifying and Accelerating Storage and Analysis of Fast Data IBM Db2 Event Store Disclaimer The information contained in this presentation is provided for informational purposes only.
An Introduction to Big Data Formats
An Introduction to Big Data Formats Understanding Avro, Parquet, and ORC WHITE PAPER TABLE OF CONTENTS INTRODUCTION