GPU-Accelerated Analytics on your Data Lake.
- Roxanne Edwards
- 5 years ago
Transcription
1 GPU-Accelerated Analytics on your Data Lake.
2 Data Lake
3 Data Swamp
4 ETL Hell DATA LAKE
5 COMMON DATA LAYER
6 Simplify Data Storage SCHEMA METADATA DATA
7 SQL Warehouse on Data Lake
8 BlazingDB How it works
- Operations: Compression/Decompression, Filtering (Predicate Pushdown), Aggregations, Transformations, Joins, Sorting/Ordering
- DATA LAKE: Local Disk, HDFS, AWS S3
- Caches: RAM Cache (Hot), Disk Cache (Medium) on HDD or SSD
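The slide names predicate pushdown as the filtering technique applied to data-lake files. As a rough illustration of the general idea only (not BlazingDB's implementation), the sketch below stores min/max statistics per chunk of a column and skips any chunk whose statistics show it cannot satisfy the filter. The `Chunk` type and the `scan_less_than` helper are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Chunk:
    """Hypothetical block of column values with precomputed min/max statistics."""
    values: List[int]
    lo: int = field(init=False)
    hi: int = field(init=False)

    def __post_init__(self) -> None:
        # In a real file format these statistics would be stored in metadata,
        # so they can be consulted without reading the values themselves.
        self.lo = min(self.values)
        self.hi = max(self.values)


def scan_less_than(chunks: List[Chunk], bound: int) -> Tuple[List[int], int]:
    """Return (values < bound, number of chunks whose rows were actually read)."""
    matches: List[int] = []
    chunks_read = 0
    for chunk in chunks:
        # Pushdown step: if even the smallest value fails the predicate,
        # no row in the chunk can match, so the chunk is skipped entirely.
        if chunk.lo >= bound:
            continue
        chunks_read += 1
        matches.extend(v for v in chunk.values if v < bound)
    return matches, chunks_read


chunks = [Chunk([1, 5, 9]), Chunk([20, 25, 30]), Chunk([8, 40, 41])]
result, chunks_read = scan_less_than(chunks, 10)
print(result, chunks_read)
```

Only two of the three chunks are read here; the middle chunk is eliminated from its statistics alone, which is where the saving comes from on large files.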
9 BlazingDB Multi-nodal Cluster
10 Shared Data Architecture DATA LAKE
11 The Nays
- No Ingest
- No Duplication
- No BlazingDB Specific ETL
- No Consistency Management
- No Vendor Lock-in
12 The Yays
- Incredibly Fast SQL
- Scalable, On Demand Data Warehouse
- Multi-Terabyte Queries
- Data Sharing (Across Clusters And Other Tools)
- High Concurrency
13 DEMO
14 Demo - Architecture HDFS on Azure Azure GPU Servers NC24 V1 4 Servers
15 BlazingDB 4 Node Query times (Lower is better)
[Chart: seconds per query for Query 1 to Query 5; series: Cold, Medium (Disk cache only), Hot]
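The three series in this chart (cold, medium, hot) line up with the RAM cache and disk cache named on the "How it works" slide. A minimal sketch of such a tiered, read-through lookup follows, assuming the simplest possible policy (fill every tier on a miss, never evict). The `TieredCache` class, its `drop_ram` method, and the `fetch_from_lake` callback are hypothetical and are not taken from BlazingDB.

```python
from typing import Callable, Dict, Tuple


class TieredCache:
    """Toy read-through cache: RAM (hot) -> disk (medium) -> data lake (cold)."""

    def __init__(self, fetch_from_lake: Callable[[str], str]) -> None:
        self.ram: Dict[str, str] = {}
        self.disk: Dict[str, str] = {}  # stands in for a local disk cache
        self.fetch_from_lake = fetch_from_lake

    def read(self, key: str) -> Tuple[str, str]:
        """Return (data, tier) where tier says which level served the read."""
        if key in self.ram:
            return self.ram[key], "hot"
        if key in self.disk:
            data = self.disk[key]
            self.ram[key] = data  # promote to RAM for the next read
            return data, "medium"
        data = self.fetch_from_lake(key)  # slowest path: remote storage
        self.disk[key] = data
        self.ram[key] = data
        return data, "cold"

    def drop_ram(self) -> None:
        """Simulate losing the RAM cache while the disk cache survives."""
        self.ram.clear()


cache = TieredCache(lambda key: f"bytes of {key}")
first = cache.read("lineitem/part-0")[1]   # nothing cached yet
second = cache.read("lineitem/part-0")[1]  # served from RAM
cache.drop_ram()
third = cache.read("lineitem/part-0")[1]   # served from the disk cache
print(first, second, third)
```

The "Medium (Disk cache only)" series corresponds to the third read: the data is no longer in RAM but does not have to be fetched from the data lake again.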
16 Query 1
[Chart: query time in seconds for Cold, Medium (Disk cache only), Hot]
Query:
select l_returnflag, l_linestatus,
  sum(l_quantity) as sum_qty,
  sum(l_extendedprice) as sum_base_price,
  sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
  sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
  avg(l_quantity) as avg_qty,
  avg(l_extendedprice) as avg_price,
  avg(l_discount) as avg_disc,
  count(l_quantity) as count_order
from lineitem
where l_shipdate <=
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus;
Data Points:
- 6 billion row table
- Many aggregations/transformations
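To make the shape of Query 1 concrete, here is a small sketch that runs a trimmed-down form of it with Python's built-in `sqlite3` module on four made-up rows. Nothing here reflects BlazingDB or the 6 billion row table from the demo: the rows are invented, only some of the aggregates are kept, and the `CUTOFF` date is a placeholder chosen for this example because the transcript does not preserve the date used on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    create table lineitem (
        l_returnflag text, l_linestatus text,
        l_quantity real, l_extendedprice real,
        l_discount real, l_tax real, l_shipdate text
    )
    """
)

# Invented sample rows; ISO dates stored as text compare correctly as strings.
rows = [
    ("A", "F", 10, 100.0, 0.10, 0.05, "1994-01-01"),
    ("A", "F", 30, 300.0, 0.00, 0.10, "1994-06-01"),
    ("N", "O", 5, 50.0, 0.20, 0.00, "1997-03-01"),
    ("N", "O", 7, 70.0, 0.00, 0.00, "1999-01-01"),  # after the cutoff, filtered out
]
conn.executemany("insert into lineitem values (?, ?, ?, ?, ?, ?, ?)", rows)

CUTOFF = "1998-01-01"  # placeholder date, an assumption made for this sketch

query = """
select l_returnflag, l_linestatus,
       sum(l_quantity) as sum_qty,
       sum(l_extendedprice) as sum_base_price,
       sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
       avg(l_quantity) as avg_qty,
       count(l_quantity) as count_order
from lineitem
where l_shipdate <= ?
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus
"""
result = conn.execute(query, (CUTOFF,)).fetchall()
for row in result:
    print(row)
```

The query is a single scan with a date filter followed by grouped aggregation, which is the kind of workload the slide's "many aggregations/transformations" data point refers to.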
17 Query 2
[Chart: query time in seconds for Cold, Medium (Disk cache only), Hot]
Query:
select lineitem.l_orderkey,
  sum(lineitem.l_extendedprice * (1 - lineitem.l_discount)) as revenue,
  orders.o_orderdate, orders.o_shippriority
from customer
inner join orders on customer.c_custkey = orders.o_custkey
inner join lineitem on lineitem.l_orderkey = orders.o_orderkey
where customer.c_mktsegment = 'BUILDING'
  and orders.o_orderdate < ' '
  and lineitem.l_shipdate > ' '
group by lineitem.l_orderkey, orders.o_orderdate, orders.o_shippriority
order by revenue desc, orders.o_orderdate;
Data Points:
- Join 6B rows to 1.5B rows to 150M rows
- Many aggregations/transformations
- Order (sorting)
18 Query 3
[Chart: query time in seconds for Cold, Medium (Disk cache only), Hot]
Query:
select nation.name,
  sum(lineitem.l_extendedprice * (1 - lineitem.l_discount)) as revenue
from customer
inner join orders on customer.cust_key = orders.o_custkey
inner join lineitem on lineitem.l_orderkey = orders.o_orderkey
inner join supplier on lineitem.l_suppkey = supplier.s_suppkey
inner join nation on supplier.s_nationkey = nation.nation_key
inner join region on nation.region_key = region.r_regionkey
where supplier.s_nationkey = nation.nation_key
  and region.r_name = 'ASIA'
  and orders.o_orderdate >= ' '
  and orders.o_orderdate < ' '
group by nation.name
order by revenue desc
Data Points:
- Join 6B rows to 1.5B rows to 150M rows (and many small joins)
- Multiple aggregations/transformations
- Order (sorting)
19 Query 4
[Chart: query time in seconds for Cold, Medium (Disk cache only), Hot]
Query:
select sum(l_extendedprice) as sum_exprice,
  sum(l_discount) as sum_discount
from lineitem
where l_shipdate >= ' '
  and l_shipdate < ' '
  and l_discount >= 0.05
  and l_discount <= 0.07
  and l_quantity < 24
Data Points:
- 6B row table
- Multiple aggregations/transformations
20 Query 5
[Chart: query time in seconds for Cold, Medium (Disk cache only), Hot]
Query:
select supplier.s_acctbal, supplier.s_suppkey, nation.name,
  part.p_partkey, part.p_mfgr,
  supplier.s_address, supplier.s_phone, supplier.s_comment
from supplier
inner join partsupp on supplier.s_suppkey = partsupp.ps_suppkey
inner join nation on supplier.s_nationkey = nation.nation_key
inner join region on nation.region_key = region.r_regionkey
inner join part on part.p_partkey = partsupp.ps_partkey
where part.p_size = 15
  and part.p_type in (
    'ECONOMY ANODIZED BRASS', 'ECONOMY BRUSHED BRASS', 'ECONOMY BURNISHED BRASS',
    'ECONOMY PLATED BRASS', 'ECONOMY POLISHED BRASS',
    'LARGE ANODIZED BRASS', 'LARGE BRUSHED BRASS', 'LARGE BURNISHED BRASS',
    'LARGE PLATED BRASS', 'LARGE POLISHED BRASS',
    'SMALL ANODIZED BRASS', 'SMALL BRUSHED BRASS', 'SMALL BURNISHED BRASS',
    'SMALL PLATED BRASS', 'SMALL POLISHED BRASS',
    'STANDARD ANODIZED BRASS', 'STANDARD BRUSHED BRASS', 'STANDARD BURNISHED BRASS',
    'STANDARD PLATED BRASS', 'STANDARD POLISHED BRASS')
  and region.r_name = 'EUROPE'
order by supplier.s_acctbal desc, supplier.s_suppkey, nation.name, part.p_partkey
Data Points:
- Join multiple tables
- Many aggregations/transformations
- String comparisons
21 Data Pipeline Common Data Layer Coming Soon STORAGE (Data Lake) GPU Data Frame Apache Arrow INGEST
22 Questions?