HOW TO ACHIEVE REAL-TIME ANALYTICS ON A DATA LAKE USING GPUS. Mark Brooks - Principal System Kinetica May 09, 2017
|
|
- Geraldine Bridges
- 6 years ago
- Views:
Transcription
1 HOW TO ACHIEVE REAL-TIME ANALYTICS ON A DATA LAKE USING GPUS Mark Brooks - Principal System Kinetica May 09, 2017
2 The Challenge: How to maintain analytic performance while dealing with: Larger data volumes Streaming data with minimal end-to-end latency Ad-hoc drill down (you can t pre-aggregate everything) 2
3 Architectural and Design Approaches 1. One database to rule them all 2. SQL on Hadoop (or directly on the Data Lake) 3. Data Lake + NoSQL + Spark + Search + Cache + 4. Lambda Architecture 5. Kappa Architecture 6. Next generation hardware acceleration 3
4 One Database To Rule Them All 4
5 SQL on a Data Lake Credit: 5
6 Hadoop + NoSQL + Search + Memory Cache + Credit: Matt Turck - 6
7 Lambda Architecture Credit: Nathan Marz James Kinley 7
8 Lambda Architecture Credit: James Kinley 7
9 Kappa Architecture Credit: Jay Kreps 8
10 Kappa Architecture Stream processing systems already have a notion of parallelism; why not just handle reprocessing by increasing the parallelism and replaying history very, very fast? Credit: Jay Kreps 8
11 Next Generation Hardware Acceleration Consider a system with these characteristics: Horizontally Scalable Low end-to-end latency Powerful enough to not require pre-aggregation This is now possible Credit: Jay Kreps 8
12 GPU Accelerated Compute s AT SCALE PROCESSING BECOMES THE BOTTLENECK DATA WAREHOUSE DISTRIBUTED STORAGE AFFORDABLE MEMORY GPU ACCELERATED COMPUTE RDBMS & Data Warehouse technologies enable organizations to store and analyze growing volumes of data Hadoop and MapReduce enables distributed storage and processing across multiple machines. Affordable memory allows for faster data read and write. HANA, MemSQL, & Exadata provide faster analytics. GPU cores bulk process tasks in parallel - far more efficient for many data-intensive tasks than CPUs which process those tasks linearly. on high performance machines, but at high cost. Storing massive volumes of data becomes more affordable, but performance is slow 12
13 Kinetica: Core ANALYTICS DATABASE ACCELERATED BY GPUs Columnar in-memory database Data available much like a traditional RDBMS rows, columns Data held in-memory; persisted to disk Interact with Kinetica through its native REST API, Java, Python, JavaScript, NodeJS, C++, SQL, etc as well as with various connectors HTTP Head Node GPU Accelerated Columnar In-memory Database A1 B1 C1 A2 B2 C2 A3 B3 C3 A4 B4 C4 Disk Commodity Hardware w/ GPUs KINETICA Native GIS & IP address object support VERY FAST: Ideal for OLAP workloads Typical hardware setup: 256GB - 1TB memory with 2-4 GPUs per node. 13
14 Multi-Head Ingest and Scale-Out Architecture ON-DEMAND SCALE OUT HTTP Head Node HTTP Head Node HTTP Head Node Columnar In-memory Columnar In-memory Columnar In-memory A1 B1 C1 A2 B2 C2 A3 B3 C3 A1 B1 C1 A2 B2 C2 A3 B3 C3 A1 B1 C1 A2 B2 C2 A3 B3 C3 + A4 B4 C4 A4 B4 C4 A4 B4 C4 Disk Disk Disk Commodity Hardware w/ GPUs Commodity Hardware w/ GPUs Commodity Hardware w/ GPUs MULTI-HEAD INGEST 19
15 Real-Time Data Handlers for Structured & Unstructured Data Java API APIs C++ API VISUALIZATION via ODBC/JDBC GEOSPATIAL CAPABILITIES Geometric Objects WMS JavaScript API Node.js API Tracks WKT REST API Python API Geospatial Endpoints OPEN SOURCE INTEGRATION HTTP Head Node HTTP Head Node HTTP Head Node HTTP Head Node Apache NiFi Apache Kafka Apache Spark Apache Storm Columnar In-memory A1 B1 C1 A2 B2 C2 A3 B3 C3 A4 B4 C4 Columnar In-memory A1 B1 C1 A2 B2 C2 A3 B3 C3 A4 B4 C4 Columnar In-memory A1 B1 C1 A2 B2 C2 A3 B3 C3 A4 B4 C4 Columnar In-memory A1 B1 C1 A2 B2 C2 A3 B3 C3 A4 B4 C4 OTHER INTEGRATION Message Queues ETL Tools Disk Commodity Hardware w/ GPUs Disk Commodity Hardware w/ GPUs Disk Commodity Hardware w/ GPUs Disk Commodity Hardware w/ GPUs Streaming Tools KINETICA CLUSTER On-Demand Scale 20
16 Parallel Ingest Provides High Performance Streaming PARALLEL INGEST 1 NODE (1TB/2GPU) 1 NODE (1TB/2GPU) 1 NODE (1TB/2GPU) Each node of the system can share the task of data ingest, provides more and faster throughput. It can be made faster simply by adding more nodes. No compute is used on ingest! 16
17 Speed Layer for the Data Lake Parallel ingestion of events Kinetica is speed layer with realtime analytic capabilities Amazon Kinesis Put, get, scan ANALYSTS HDFS for archival store Much looser coupling than traditional lambda architecture Batch mode Spark or MR jobs can push data to Kinetica as needed for fast query on data loaded from the data lake EVENTS MESSAGE BROKERS Kinetica Connectors STREAM PROCESSING Parallel Ingestion Execute complex analytics on the fly MOBILE USERS DASHBOARDS & APPLICATIONS ALERTING SYSTEMS HDFS / AWS S3 / GCS / Azure Data Lake 17
18 Real-Time, Advanced Analytics, Speed Layer for Teradata or Oracle Parallel ingestion of events Lambda-type architecture for Teradata or Oracle Kinetica is speed layer with near-real-time analytic capabilities Converge Machine Learning, streaming and location analytics and fast Query and Analytics with Kinetica and RDBMS DATA IN MOTION AND REST Amazon Kinesis Kinetica Connectors STREAM / ETL PROCESSING Fast GPU accelerated, in- Memory Database Converge ML, AI, Streaming MOBILE USERS ANALYSTS DASHBOARDS & APPLICATIONS ALERTING SYSTEMS DATA WAREHOUSE / TRANSACTIONAL 18
19 Advanced In-Database Analytics ORCHESTRATION LAYER WITH USER-DEFINED FUNCTIONS (UDFs) 1. User-defined functions (UDFs) can receive table data, do arbitrary computations, and save output to a separate table in a distributed manner. 2. UDFs have direct access to CUDA APIs enables compute-to-grid analytics for logic deployed within Kinetica. 3. Works with custom code, or packaged code. Opens the way for machine learning/artificial intelligence libraries such as TensorFlow, BIDMach, Caffe and Torch to work on data directly within Kinetica. 4. Available now with C++ & Java bindings. PHYSICAL / VIRTUAL SERVER Table A Table B Table n Table C Proc Server UDF_A UDF_B UDF_n CUDA Libraries GPU n number of Kinetica servers Data returned to output table for further analysis /exec/proc/udf_a/ UDFs exposed from RESTful endpoint 19
20 Kinetica Architecture STREAMING DATA ETL / STREAM PROCESSING PARALLEL INGEST ON DEMAND SCALE OUT + 1TB MEM / 2 GPU CARDS Native APIs SQL Geospatial WMS Custom Connectors BI / GIS / APPS KINETICA REVEAL BI DASHBOARDS UDFs ERP / CRM / TRANSACTIONAL DATA In-Database Processing CUSTOM LOGIC BIDMach ML Libs CUSTOM APPS & GEOSPATIAL 20
21 AI & BI on One GPU-Accelerated Database BUSINESS INTELLIGENCE CUSTOM APPLICATIONS HIGH FIDELITY GEOSPATIAL PIPELINE SQL ODBC / JDBC Native REST API WMS HIGH PERFORMANCE ANALYTICS DATABASE BUSINESS USERS BIDMach UDF UDF UDF DATA SCIENTISTS / DEVELOPERS MACHINE LEARNING & DEEP LEARNING GPU-ACCELERATED DATA SCIENCE PREDICTIVE MODELS e.g. Risk Management, Sales Volume, Fraud. 21
22 50-100x Faster on Queries with Large Datasets WHEN COMPARED TO LEADING IN-MEMORY ALTERNATIVES Large retailer tested complex SQL queries on 3 years of retail data (150bn rows) 10 node Kinetica cluster against 30TB+ cluster from next best alternative GPU is able to perform many instructions in parallel. Huge performance gains on aggregations, group bys, joins, etc. Kinetica sustained ingest of 1.3bn objects/minute with 70 attributes per row SELECT (Q10) GROUP BY (Q5) SUM (Q1) Kinetica Leading In-Memory DB More Details 22
23 Distributed Geospatial Pipeline NATIVE VISUALIZATION IS DESIGNED FOR FAST MOVING, LOCATION-BASED DATA Native Geospatial Object Types Points, Shapes, Tracks, Labels Native Geospatial Functions Filters (by area, by series, by geometry, etc.) Aggregation (histograms) Geofencing - triggers Video generation (based on dates/times) Generate Map Overlay Imagery (via WMS) Rasterize points Style based on attributes (class-break) Heat maps 23
24 Full-Text Search Kinetica includes powerful text search functionality, including : Rain Tire ~5 "Union Tranquility"~10 Exact Phrases Boolean AND / OR Wildcards Grouping Fuzzy Search (Damerau-Levenshtein optimal string alignment algorithm) N-Gram Term Proximity Search Term Boosting Relevance Prioritization [100 TO 200] 22
25 CASE STUDY : LOCATION BASED ANALYTICS INTELLIGENCE: US Army - INSCOM US Army s in-memory computational engine for any data with a geospatial or temporal attribute for a major joint cloud initiative within the Intelligence Community (IC ITE). U.S Army INSCOM Shift from Oracle to GPUdb Intel analysts are able to conduct near real-time analytics and fuse SIGINT, ISR, and GEOINT streaming big data feeds and visualize in a web browser. First time in history military analysts are able to query and visualize billions to trillions of near realtime objects in a production environment. Major executive military and congressional visibility. GPUdb (20ms) 42x Lower Space 28x Lower Cost 38x Lower Power Cost 1 GPUdb server vs 42 servers with Oracle 10gR2 (2011) Oracle Spatial (92 Minutes) 24
26 CASE STUDY : LOCATION BASED ANALYTICS LOGISTICS: Workforce optimization USPS is the single largest logistic entity in the country, moving more individual items in four hours than the combination of UPS, FedEx, and DHL move all year. DISTRIBUTED ANALYSIS USPS parallel cluster is able to serve up to 15,000 simultaneous sessions, providing the service s managers and analysts with the capability to instantly analyze their areas of responsibility via dashboards. AT SCALE With 200,000 USPS devices emitting location once every minute, that amounts to more than a quarter billion events captured and analyzed daily tracked on 10 nodes. 25
27 CASE STUDY : LOCATION BASED ANALYTICS LOGISTICS & FLEET MANAGEMENT LARGE RETAILER Kinetica enables agile tracking of shipments to assist store managers for tracking of inventory and arrival times. Visibility and tracking of deliveries & trucks for store managers ETA & Notifications Provide estimated time of delivery, notifications and custom location based alerting Route Optimization based on truck size, and if cargo is perishable or contains hazardous materials. 27
28 CASE STUDY : ADVANCED IN-DATABASE ANALYTICS RISK MANAGEMENT MULTINATIONAL BANK Large financial institution moves counterparty risk analysis from overnight to real-time. Data collected by XVA library which computes risk metrics for each trade Risk computations are becoming more complex and computationally heavy. xva analysis needs to project years into the future. Kinetica enables banks to move from batch/overnight analysis to a streaming/real-time system for flexible real-time monitoring by traders, auditors and management. 28
29 Scale Out on Industry Standard Hardware Kinetica typically results in 1 10 hardware costs of standard in-memory databases. Runs on industry standard servers, 512GB memory with GPUs (ex. NVIDIA K80) IN THE CLOUD WITH: COMING SOON: CERTIFIED ON PREMISE WITH: 29
30 Stop by Booth #431 and Get Your Free T-shirt
Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara
Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case
More informationVOLTDB + HP VERTICA. page
VOLTDB + HP VERTICA ARCHITECTURE FOR FAST AND BIG DATA ARCHITECTURE FOR FAST + BIG DATA FAST DATA Fast Serve Analytics BIG DATA BI Reporting Fast Operational Database Streaming Analytics Columnar Analytics
More informationOPERATIONALIZING MACHINE LEARNING USING GPU ACCELERATED, IN-DATABASE ANALYTICS
OPERATIONALIZING MACHINE LEARNING USING GPU ACCELERATED, IN-DATABASE ANALYTICS 1 Why GPUs? A Tale of Numbers 100x Performance Increase Infrastructure Cost Savings Performance 100x gains over traditional
More informationIncrease Value from Big Data with Real-Time Data Integration and Streaming Analytics
Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics Cy Erbay Senior Director Striim Executive Summary Striim is Uniquely Qualified to Solve the Challenges of Real-Time
More informationCloud Analytics and Business Intelligence on AWS
Cloud Analytics and Business Intelligence on AWS Enterprise Applications Virtual Desktops Sharing & Collaboration Platform Services Analytics Hadoop Real-time Streaming Data Machine Learning Data Warehouse
More informationMODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS
MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS SUJEE MANIYAM FOUNDER / PRINCIPAL @ ELEPHANT SCALE www.elephantscale.com sujee@elephantscale.com HI, I M SUJEE MANIYAM Founder / Principal @ ElephantScale
More informationBIG DATA COURSE CONTENT
BIG DATA COURSE CONTENT [I] Get Started with Big Data Microsoft Professional Orientation: Big Data Duration: 12 hrs Course Content: Introduction Course Introduction Data Fundamentals Introduction to Data
More informationmicrosoft
70-775.microsoft Number: 70-775 Passing Score: 800 Time Limit: 120 min Exam A QUESTION 1 Note: This question is part of a series of questions that present the same scenario. Each question in the series
More informationHadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved
Hadoop 2.x Core: YARN, Tez, and Spark YARN Hadoop Machine Types top-of-rack switches core switch client machines have client-side software used to access a cluster to process data master nodes run Hadoop
More informationData 101 Which DB, When. Joe Yong Azure SQL Data Warehouse, Program Management Microsoft Corp.
Data 101 Which DB, When Joe Yong (joeyong@microsoft.com) Azure SQL Data Warehouse, Program Management Microsoft Corp. The world is changing AI increased by 300% in 2017 Data will grow to 44 ZB in 2020
More informationTop Five Reasons for Data Warehouse Modernization Philip Russom
Top Five Reasons for Data Warehouse Modernization Philip Russom TDWI Research Director for Data Management May 28, 2014 Sponsor Speakers Philip Russom TDWI Research Director, Data Management Steve Sarsfield
More informationData 101 Which DB, When Joe Yong Sr. Program Manager Microsoft Corp.
17-18 March, 2018 Beijing Data 101 Which DB, When Joe Yong Sr. Program Manager Microsoft Corp. The world is changing AI increased by 300% in 2017 Data will grow to 44 ZB in 2020 Today, 80% of organizations
More informationModern Data Warehouse The New Approach to Azure BI
Modern Data Warehouse The New Approach to Azure BI History On-Premise SQL Server Big Data Solutions Technical Barriers Modern Analytics Platform On-Premise SQL Server Big Data Solutions Modern Analytics
More informationThe Evolution of Big Data Platforms and Data Science
IBM Analytics The Evolution of Big Data Platforms and Data Science ECC Conference 2016 Brandon MacKenzie June 13, 2016 2016 IBM Corporation Hello, I m Brandon MacKenzie. I work at IBM. Data Science - Offering
More informationUnderstanding the latent value in all content
Understanding the latent value in all content John F. Kennedy (JFK) November 22, 1963 INGEST ENRICH EXPLORE Cognitive skills Data in any format, any Azure store Search Annotations Data Cloud Intelligence
More informationWHITEPAPER. MemSQL Enterprise Feature List
WHITEPAPER MemSQL Enterprise Feature List 2017 MemSQL Enterprise Feature List DEPLOYMENT Provision and deploy MemSQL anywhere according to your desired cluster configuration. On-Premises: Maximize infrastructure
More informationAccelerate your SAS analytics to take the gold
Accelerate your SAS analytics to take the gold A White Paper by Fuzzy Logix Whatever the nature of your business s analytics environment we are sure you are under increasing pressure to deliver more: more
More informationExam Questions
Exam Questions 70-775 Perform Data Engineering on Microsoft Azure HDInsight (beta) https://www.2passeasy.com/dumps/70-775/ NEW QUESTION 1 You are implementing a batch processing solution by using Azure
More informationPřehled novinek v SQL Server 2016
Přehled novinek v SQL Server 2016 Martin Rys, BI Competency Leader martin.rys@adastragrp.com https://www.linkedin.com/in/martinrys 20.4.2016 1 BI Competency development 2 Trends, modern data warehousing
More informationStreaming Integration and Intelligence For Automating Time Sensitive Events
Streaming Integration and Intelligence For Automating Time Sensitive Events Ted Fish Director Sales, Midwest ted@striim.com 312-330-4929 Striim Executive Summary Delivering Data for Time Sensitive Processes
More informationMicrosoft Azure Databricks for data engineering. Building production data pipelines with Apache Spark in the cloud
Microsoft Azure Databricks for data engineering Building production data pipelines with Apache Spark in the cloud Azure Databricks As companies continue to set their sights on making data-driven decisions
More informationUnifying Big Data Workloads in Apache Spark
Unifying Big Data Workloads in Apache Spark Hossein Falaki @mhfalaki Outline What s Apache Spark Why Unification Evolution of Unification Apache Spark + Databricks Q & A What s Apache Spark What is Apache
More informationStages of Data Processing
Data processing can be understood as the conversion of raw data into a meaningful and desired form. Basically, producing information that can be understood by the end user. So then, the question arises,
More informationCONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED DATA PLATFORM
CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED PLATFORM Executive Summary Financial institutions have implemented and continue to implement many disparate applications
More informationMicrosoft Exam
Volume: 42 Questions Case Study: 1 Relecloud General Overview Relecloud is a social media company that processes hundreds of millions of social media posts per day and sells advertisements to several hundred
More informationBringing Data to Life
Bringing Data to Life Data management and Visualization Techniques Benika Hall Rob Harrison Corporate Model Risk March 16, 2018 Introduction Benika Hall Analytic Consultant Wells Fargo - Corporate Model
More informationBig Data on AWS. Big Data Agility and Performance Delivered in the Cloud. 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Big Data on AWS Big Data Agility and Performance Delivered in the Cloud 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Big Data Technologies and techniques for working productively
More informationActivator Library. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success.
Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success. ACTIVATORS Designed to give your team assistance when you need it most without
More informationApril Copyright 2013 Cloudera Inc. All rights reserved.
Hadoop Beyond Batch: Real-time Workloads, SQL-on- Hadoop, and the Virtual EDW Headline Goes Here Marcel Kornacker marcel@cloudera.com Speaker Name or Subhead Goes Here April 2014 Analytic Workloads on
More informationMicrosoft Perform Data Engineering on Microsoft Azure HDInsight.
Microsoft 70-775 Perform Data Engineering on Microsoft Azure HDInsight http://killexams.com/pass4sure/exam-detail/70-775 QUESTION: 30 You are building a security tracking solution in Apache Kafka to parse
More informationNetezza The Analytics Appliance
Software 2011 Netezza The Analytics Appliance Michael Eden Information Management Brand Executive Central & Eastern Europe Vilnius 18 October 2011 Information Management 2011IBM Corporation Thought for
More informationEvolving To The Big Data Warehouse
Evolving To The Big Data Warehouse Kevin Lancaster 1 Copyright Director, 2012, Oracle and/or its Engineered affiliates. All rights Insert Systems, Information Protection Policy Oracle Classification from
More informationIn-Memory Computing EXASOL Evaluation
In-Memory Computing EXASOL Evaluation 1. Purpose EXASOL (http://www.exasol.com/en/) provides an in-memory computing solution for data analytics. It combines inmemory, columnar storage and massively parallel
More information8/24/2017 Week 1-B Instructor: Sangmi Lee Pallickara
Week 1-B-0 Week 1-B-1 CS535 BIG DATA FAQs Slides are available on the course web Wait list Term project topics PART 0. INTRODUCTION 2. DATA PROCESSING PARADIGMS FOR BIG DATA Sangmi Lee Pallickara Computer
More informationReal-time Streaming Applications on AWS Patterns and Use Cases
Real-time Streaming Applications on AWS Patterns and Use Cases Paul Armstrong - Solutions Architect (AWS) Tom Seddon - Data Engineering Tech Lead (Deliveroo) 28 th June 2017 2016, Amazon Web Services,
More informationOverview of Data Services and Streaming Data Solution with Azure
Overview of Data Services and Streaming Data Solution with Azure Tara Mason Senior Consultant tmason@impactmakers.com Platform as a Service Offerings SQL Server On Premises vs. Azure SQL Server SQL Server
More informationLambda Architecture for Batch and Stream Processing. October 2018
Lambda Architecture for Batch and Stream Processing October 2018 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document is provided for informational purposes only.
More informationREGULATORY REPORTING FOR FINANCIAL SERVICES
REGULATORY REPORTING FOR FINANCIAL SERVICES Gordon Hughes, Global Sales Director, Intel Corporation Sinan Baskan, Solutions Director, Financial Services, MarkLogic Corporation Many regulators and regulations
More informationAgenda. AWS Database Services Traditional vs AWS Data services model Amazon RDS Redshift DynamoDB ElastiCache
Databases on AWS 2017 Amazon Web Services, Inc. and its affiliates. All rights served. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon Web Services,
More informationBig Data with Hadoop Ecosystem
Diógenes Pires Big Data with Hadoop Ecosystem Hands-on (HBase, MySql and Hive + Power BI) Internet Live http://www.internetlivestats.com/ Introduction Business Intelligence Business Intelligence Process
More informationWHITEPAPER. The Lambda Architecture Simplified
WHITEPAPER The Lambda Architecture Simplified DATE: April 2016 A Brief History of the Lambda Architecture The surest sign you have invented something worthwhile is when several other people invent it too.
More informationFlash Storage Complementing a Data Lake for Real-Time Insight
Flash Storage Complementing a Data Lake for Real-Time Insight Dr. Sanhita Sarkar Global Director, Analytics Software Development August 7, 2018 Agenda 1 2 3 4 5 Delivering insight along the entire spectrum
More informationCapture Business Opportunities from Systems of Record and Systems of Innovation
Capture Business Opportunities from Systems of Record and Systems of Innovation Amit Satoor, SAP March Hartz, SAP PUBLIC Big Data transformation powers digital innovation system Relevant nuggets of information
More informationUSERS CONFERENCE Copyright 2016 OSIsoft, LLC
Bridge IT and OT with a process data warehouse Presented by Matt Ziegler, OSIsoft Complexity Problem Complexity Drives the Need for Integrators Disparate assets or interacting one-by-one Monitoring Real-time
More informationOracle Exadata: Strategy and Roadmap
Oracle Exadata: Strategy and Roadmap - New Technologies, Cloud, and On-Premises Juan Loaiza Senior Vice President, Database Systems Technologies, Oracle Safe Harbor Statement The following is intended
More informationOverview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::
Title Duration : Apache Spark Development : 4 days Overview Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized
More informationData Lake Best Practices
Data Lake Best Practices Agenda Why Data Lake Key Components of a Data Lake Modern Data Architecture Some Best Practices Case Study Summary Takeaways What is a Data Lake? What, why etc. What is a data
More informationAWS Serverless Architecture Think Big
MAKING BIG DATA COME ALIVE AWS Serverless Architecture Think Big Garrett Holbrook, Data Engineer Feb 1 st, 2017 Agenda What is Think Big? Example Project Walkthrough AWS Serverless 2 Think Big, a Teradata
More informationOracle Big Data Connectors
Oracle Big Data Connectors Oracle Big Data Connectors is a software suite that integrates processing in Apache Hadoop distributions with operations in Oracle Database. It enables the use of Hadoop to process
More informationData-Intensive Distributed Computing
Data-Intensive Distributed Computing CS 451/651 431/631 (Winter 2018) Part 5: Analyzing Relational Data (1/3) February 8, 2018 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo
More informationThe Reality of Qlik and Big Data. Chris Larsen Q3 2016
The Reality of Qlik and Big Data Chris Larsen Q3 2016 Introduction Chris Larsen Sr Solutions Architect, Partner Engineering @Qlik Based in Lund, Sweden Primary Responsibility Advanced Analytics (and formerly
More informationStream Processing Platforms Storm, Spark,.. Batch Processing Platforms MapReduce, SparkSQL, BigQuery, Hive, Cypher,...
Data Ingestion ETL, Distcp, Kafka, OpenRefine, Query & Exploration SQL, Search, Cypher, Stream Processing Platforms Storm, Spark,.. Batch Processing Platforms MapReduce, SparkSQL, BigQuery, Hive, Cypher,...
More informationIntro to Big Data on AWS Igor Roiter Big Data Cloud Solution Architect
Intro to Big Data on AWS Igor Roiter Big Data Cloud Solution Architect Igor Roiter Big Data Cloud Solution Architect Working as a Data Specialist for the last 11 years 9 of them as a Consultant specializing
More informationInside Kinetica. Inside. 1 / Core Concepts. 2 / Kinetica in your environment. 3 / Core Architecture. 4 / Administering Kinetica.
Inside Kinetica Inside 1 Core Concepts 2 Kinetica in your environment 3 Core Architecture 4 Administering Kinetica 5 Conclusion WHITE PAPER Summary For years, the CPU clock speed race trumped almost every
More informationSaving ETL Costs Through Data Virtualization Across The Enterprise
Saving ETL Costs Through Virtualization Across The Enterprise IBM Virtualization Manager for z/os Marcos Caurim z Analytics Technical Sales Specialist 2017 IBM Corporation What is Wrong with Status Quo?
More informationBig Data. Big Data Analyst. Big Data Engineer. Big Data Architect
Big Data Big Data Analyst INTRODUCTION TO BIG DATA ANALYTICS ANALYTICS PROCESSING TECHNIQUES DATA TRANSFORMATION & BATCH PROCESSING REAL TIME (STREAM) DATA PROCESSING Big Data Engineer BIG DATA FOUNDATION
More informationArchitectural challenges for building a low latency, scalable multi-tenant data warehouse
Architectural challenges for building a low latency, scalable multi-tenant data warehouse Mataprasad Agrawal Solutions Architect, Services CTO 2017 Persistent Systems Ltd. All rights reserved. Our analytics
More informationStream Processing Platforms Storm, Spark,.. Batch Processing Platforms MapReduce, SparkSQL, BigQuery, Hive, Cypher,...
Data Ingestion ETL, Distcp, Kafka, OpenRefine, Query & Exploration SQL, Search, Cypher, Stream Processing Platforms Storm, Spark,.. Batch Processing Platforms MapReduce, SparkSQL, BigQuery, Hive, Cypher,...
More informationLambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015
Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document
More informationMicrosoft. Exam Questions Perform Data Engineering on Microsoft Azure HDInsight (beta) Version:Demo
Microsoft Exam Questions 70-775 Perform Data Engineering on Microsoft Azure HDInsight (beta) Version:Demo NEW QUESTION 1 You have an Azure HDInsight cluster. You need to store data in a file format that
More informationData Analytics at Logitech Snowflake + Tableau = #Winning
Welcome # T C 1 8 Data Analytics at Logitech Snowflake + Tableau = #Winning Avinash Deshpande I am a futurist, scientist, engineer, designer, data evangelist at heart Find me at Avinash Deshpande Chief
More informationIntroduction to Big-Data
Introduction to Big-Data Ms.N.D.Sonwane 1, Mr.S.P.Taley 2 1 Assistant Professor, Computer Science & Engineering, DBACER, Maharashtra, India 2 Assistant Professor, Information Technology, DBACER, Maharashtra,
More informationData pipelines with PostgreSQL & Kafka
Data pipelines with PostgreSQL & Kafka Oskari Saarenmaa PostgresConf US 2018 - Jersey City Agenda 1. Introduction 2. Data pipelines, old and new 3. Apache Kafka 4. Sample data pipeline with Kafka & PostgreSQL
More informationTopics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples
Hadoop Introduction 1 Topics Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples 2 Big Data Analytics What is Big Data?
More informationAn InterSystems Guide to the Data Galaxy. Benjamin De Boe Product Manager
An InterSystems Guide to the Data Galaxy Benjamin De Boe Product Manager Analytics 3 InterSystems Corporation. All rights reserved. 4 InterSystems Corporation. All rights reserved. 5 InterSystems Corporation.
More informationAcquiring Big Data to Realize Business Value
Acquiring Big Data to Realize Business Value Agenda What is Big Data? Common Big Data technologies Use Case Examples Oracle Products in the Big Data space In Summary: Big Data Takeaways
More informationMaking Data Integration Easy For Multiplatform Data Architectures With Diyotta 4.0. WEBINAR MAY 15 th, PM EST 10AM PST
Making Data Integration Easy For Multiplatform Data Architectures With Diyotta 4.0 WEBINAR MAY 15 th, 2018 1PM EST 10AM PST Welcome and Logistics If you have problems with the sound on your computer, switch
More informationDatabricks, an Introduction
Databricks, an Introduction Chuck Connell, Insight Digital Innovation Insight Presentation Speaker Bio Senior Data Architect at Insight Digital Innovation Focus on Azure big data services HDInsight/Hadoop,
More informationHow Apache Hadoop Complements Existing BI Systems. Dr. Amr Awadallah Founder, CTO Cloudera,
How Apache Hadoop Complements Existing BI Systems Dr. Amr Awadallah Founder, CTO Cloudera, Inc. Twitter: @awadallah, @cloudera 2 The Problems with Current Data Systems BI Reports + Interactive Apps RDBMS
More informationTop 7 Data API Headaches (and How to Handle Them) Jeff Reser Data Connectivity & Integration Progress Software
Top 7 Data API Headaches (and How to Handle Them) Jeff Reser Data Connectivity & Integration Progress Software jreser@progress.com Agenda Data Variety (Cloud and Enterprise) ABL ODBC Bridge Using Progress
More informationStream Processing Platforms Storm, Spark,.. Batch Processing Platforms MapReduce, SparkSQL, BigQuery, Hive, Cypher,...
Data Ingestion ETL, Distcp, Kafka, OpenRefine, Query & Exploration SQL, Search, Cypher, Stream Processing Platforms Storm, Spark,.. Batch Processing Platforms MapReduce, SparkSQL, BigQuery, Hive, Cypher,...
More informationOracle GoldenGate for Big Data
Oracle GoldenGate for Big Data The Oracle GoldenGate for Big Data 12c product streams transactional data into big data systems in real time, without impacting the performance of source systems. It streamlines
More informationAchieving Horizontal Scalability. Alain Houf Sales Engineer
Achieving Horizontal Scalability Alain Houf Sales Engineer Scale Matters InterSystems IRIS Database Platform lets you: Scale up and scale out Scale users and scale data Mix and match a variety of approaches
More informationBring Context To Your Machine Data With Hadoop, RDBMS & Splunk
Bring Context To Your Machine Data With Hadoop, RDBMS & Splunk Raanan Dagan and Rohit Pujari September 25, 2017 Washington, DC Forward-Looking Statements During the course of this presentation, we may
More informationSpatial Analytics Built for Big Data Platforms
Spatial Analytics Built for Big Platforms Roberto Infante Software Development Manager, Spatial and Graph 1 Copyright 2011, Oracle and/or its affiliates. All rights Global Digital Growth The Internet of
More informationBig Data on AWS. Peter-Mark Verwoerd Solutions Architect
Big Data on AWS Peter-Mark Verwoerd Solutions Architect What to get out of this talk Non-technical: Big Data processing stages: ingest, store, process, visualize Hot vs. Cold data Low latency processing
More informationAugust 23, 2017 Revision 0.3. Building IoT Applications with GridDB
August 23, 2017 Revision 0.3 Building IoT Applications with GridDB Table of Contents Executive Summary... 2 Introduction... 2 Components of an IoT Application... 2 IoT Models... 3 Edge Computing... 4 Gateway
More informationGabriel Villa. Architecting an Analytics Solution on AWS
Gabriel Villa Architecting an Analytics Solution on AWS Cloud and Data Architect Skilled leader, solution architect, and technical expert focusing primarily on Microsoft technologies and AWS. Passionate
More informationWhat s New at AWS? A selection of some new stuff. Constantin Gonzalez, Principal Solutions Architect, Amazon Web Services
What s New at AWS? A selection of some new stuff Constantin Gonzalez, Principal Solutions Architect, Amazon Web Services Speed of Innovation AWS Pace of Innovation AWS has been continually expanding its
More informationVerarbeitung von Vektor- und Rasterdaten auf der Hadoop Plattform DOAG Spatial and Geodata Day 2016
Verarbeitung von Vektor- und Rasterdaten auf der Hadoop Plattform DOAG Spatial and Geodata Day 2016 Hans Viehmann Product Manager EMEA ORACLE Corporation 12. Mai 2016 Safe Harbor Statement The following
More informationHDInsight > Hadoop. October 12, 2017
HDInsight > Hadoop October 12, 2017 2 Introduction Mark Hudson >20 years mixing technology with data >10 years with CapTech Microsoft Certified IT Professional Business Intelligence Member of the Richmond
More informationMicrosoft. Exam Questions Perform Data Engineering on Microsoft Azure HDInsight (beta) Version:Demo
Microsoft Exam Questions 70-775 Perform Data Engineering on Microsoft Azure HDInsight (beta) Version:Demo NEW QUESTION 1 HOTSPOT You install the Microsoft Hive ODBC Driver on a computer that runs Windows
More informationexam. Microsoft Perform Data Engineering on Microsoft Azure HDInsight. Version 1.0
70-775.exam Number: 70-775 Passing Score: 800 Time Limit: 120 min File Version: 1.0 Microsoft 70-775 Perform Data Engineering on Microsoft Azure HDInsight Version 1.0 Exam A QUESTION 1 You use YARN to
More informationBig Data com Hadoop. VIII Sessão - SQL Bahia. Impala, Hive e Spark. Diógenes Pires 03/03/2018
Big Data com Hadoop Impala, Hive e Spark VIII Sessão - SQL Bahia 03/03/2018 Diógenes Pires Connect with PASS Sign up for a free membership today at: pass.org #sqlpass Internet Live http://www.internetlivestats.com/
More informationEvolution of Capabilities Hunter Downey, Solution Advisor
Evolution of Capabilities Hunter Downey, Solution Advisor What is our suite? Crystal Reports Web Intelligence Dashboards Explorer Mobile Lumira Predictive 2011 SAP. All rights reserved. 2 What is our suite?
More informationAccelerating Digital Transformation with InterSystems IRIS and vsan
HCI2501BU Accelerating Digital Transformation with InterSystems IRIS and vsan Murray Oldfield, InterSystems Andreas Dieckow, InterSystems Christian Rauber, VMware #vmworld #HCI2501BU Disclaimer This presentation
More informationEmbedded Technosolutions
Hadoop Big Data An Important technology in IT Sector Hadoop - Big Data Oerie 90% of the worlds data was generated in the last few years. Due to the advent of new technologies, devices, and communication
More informationIOTA ARCHITECTURE: DATA VIRTUALIZATION AND PROCESSING MEDIUM DR. KONSTANTIN BOUDNIK DR. ALEXANDRE BOUDNIK
IOTA ARCHITECTURE: DATA VIRTUALIZATION AND PROCESSING MEDIUM DR. KONSTANTIN BOUDNIK DR. ALEXANDRE BOUDNIK DR. KONSTANTIN BOUDNIK DR.KONSTANTIN BOUDNIK EPAM SYSTEMS CHIEF TECHNOLOGIST BIGDATA, OPEN SOURCE
More informationAlexander Klein. #SQLSatDenmark. ETL meets Azure
Alexander Klein ETL meets Azure BIG Thanks to SQLSat Denmark sponsors Save the date for exiting upcoming events PASS Camp 2017 Main Camp 05.12. 07.12.2017 (04.12. Kick-Off abends) Lufthansa Training &
More informationThe age of Big Data Big Data for Oracle Database Professionals
The age of Big Data Big Data for Oracle Database Professionals Oracle OpenWorld 2017 #OOW17 SessionID: SUN5698 Tom S. Reddy tom.reddy@datareddy.com About the Speaker COLLABORATE & OpenWorld Speaker IOUG
More informationMassive Scalability With InterSystems IRIS Data Platform
Massive Scalability With InterSystems IRIS Data Platform Introduction Faced with the enormous and ever-growing amounts of data being generated in the world today, software architects need to pay special
More informationOracle Database 11g for Data Warehousing & Big Data: Strategy, Roadmap Jean-Pierre Dijcks, Hermann Baer Oracle Redwood City, CA, USA
Oracle Database 11g for Data Warehousing & Big Data: Strategy, Roadmap Jean-Pierre Dijcks, Hermann Baer Oracle Redwood City, CA, USA Keywords: Big Data, Oracle Big Data Appliance, Hadoop, NoSQL, Oracle
More information28 February 1 March 2018, Trafo Baden. #techsummitch
#techsummitch 28 February 1 March 2018, Trafo Baden #techsummitch Transform your data estate with cloud, data and AI #techsummitch The world is changing Data will grow to 44 ZB in 2020 Today, 80% of organizations
More informationFluentd + MongoDB + Spark = Awesome Sauce
Fluentd + MongoDB + Spark = Awesome Sauce Nishant Sahay, Sr. Architect, Wipro Limited Bhavani Ananth, Tech Manager, Wipro Limited Your company logo here Wipro Open Source Practice: Vision & Mission Vision
More informationHadoop course content
course content COURSE DETAILS 1. In-detail explanation on the concepts of HDFS & MapReduce frameworks 2. What is 2.X Architecture & How to set up Cluster 3. How to write complex MapReduce Programs 4. In-detail
More informationQualys Cloud Platform
18 QUALYS SECURITY CONFERENCE 2018 Qualys Cloud Platform Looking Under the Hood: What Makes Our Cloud Platform so Scalable and Powerful Dilip Bachwani Vice President, Engineering, Qualys, Inc. Cloud Platform
More informationDeep Dive Amazon Kinesis. Ian Meyers, Principal Solution Architect - Amazon Web Services
Deep Dive Amazon Kinesis Ian Meyers, Principal Solution Architect - Amazon Web Services Analytics Deployment & Administration App Services Analytics Compute Storage Database Networking AWS Global Infrastructure
More informationSafe Harbor Statement
Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment
More informationMicrosoft Analytics Platform System (APS)
Microsoft Analytics Platform System (APS) The turnkey modern data warehouse appliance Matt Usher, Senior Program Manager @ Microsoft About.me @two_under Senior Program Manager 9 years at Microsoft Visual
More informationData in the Cloud and Analytics in the Lake
Data in the Cloud and Analytics in the Lake Introduction Working in Analytics for over 5 years Part the digital team at BNZ for 3 years Based in the Auckland office Preferred Languages SQL Python (PySpark)
More information