Approaching the Petabyte Analytic Database: What I learned
- Conrad Tyler
- 5 years ago
Transcription
1 Disclaimer This document is for informational purposes only and is subject to change at any time without notice. The information in this document is proprietary to Actian and no part of this document may be reproduced, copied, or transmitted in any form or for any purpose without the express prior written permission of Actian. This document is not intended to be binding upon Actian to any particular course of business, pricing, product strategy, and/or development. Actian assumes no responsibility for errors or omissions in this document. Actian shall have no liability for damages of any kind including without limitation direct, special, indirect, or consequential damages that may result from the use of these materials. Actian does not warrant the accuracy or completeness of the information, text, graphics, links, or other items contained within this material. This document is provided without a warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability, fitness for a particular purpose, or non-infringement.
2 Approaching the Petabyte Analytic Database: What I learned
Keith Bolam, Director of Engineering Projects
November 2018
3 One petabyte of data? Where does the data come from? When do we need to access the data? Who is going to be accessing the data? Flashback to One Billion Rows: what next?
4 After the Terabyte we arrive at the newest data size
- Petabyte is becoming a norm in regular conversation
- The human brain has a capacity of about 2.5 petabytes of memories
- Databases do NOT handle a Petabyte of data with ease
- Databases need you
Actian Corporation
7 Where does the data come from?
- OLTP systems
- Social platforms
- Log and time series
- IoT devices
8 What is one petabyte of data?
- Today's iPhones are 128 GB or more, so 8 of them make a terabyte.
- One petabyte is therefore roughly 8,000 phones.
- Not a lot when we think of there being 73,734,000 iPhones in
- But what can we do with it on a database?
- On the phones we generally store images, so one record may be 4 MB, maybe 12 MB. A video could be 2-4 GB.
- In a database we are more interested in small data, but lots of it. For images we would be interested in the metadata only.
- IoT devices can generate many GB per day.
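The 8,000-phone figure depends on which unit convention is used. The sketch below works through three readings of the same arithmetic; the function and constant names are illustrative and do not come from the slides.

```python
# Illustrative arithmetic only; all names here are assumptions, not from the slides.
GB = 10**9      # decimal gigabyte
PB = 10**15     # decimal petabyte
GIB = 2**30     # binary gigabyte (gibibyte)
PIB = 2**50     # binary petabyte (pebibyte)

PHONE_GB = 128  # phone capacity quoted on the slide


def phones_slide_method() -> int:
    """8 phones hold 1024 GB, counted as one terabyte; 1000 terabytes per petabyte."""
    phones_per_terabyte = 1024 // PHONE_GB
    return phones_per_terabyte * 1000


def phones_decimal() -> float:
    """Strictly decimal units: 10**15 bytes divided by 128 * 10**9 bytes."""
    return PB / (PHONE_GB * GB)


def phones_binary() -> int:
    """Strictly binary units: 2**50 bytes divided by 128 * 2**30 bytes."""
    return PIB // (PHONE_GB * GIB)


if __name__ == "__main__":
    print(phones_slide_method(), phones_decimal(), phones_binary())
```

Whichever convention is chosen, the answer stays within a few percent of 8,000, so the slide's point about scale holds.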
9 When do we need to access data?
- Now
- Frequently
- Ad hoc?
- A year's time or longer
- Rolling window
10 Petabyte implications on database analytic queries
1. Try not to allow users access to the whole dataset: THEY DO NOT NEED IT
2. Bring insight from the queries that have been run: USE MONITORING TOOLS
3. You do not need all the data in one place: PUT IT IN SMALLER CHUNKS
4. Put the data into the database in an appropriate way: USE NATURAL CLUSTERING
5. LET USERS ACCESS DATA EARLY: see point 2 above
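One way to picture "smaller chunks" together with a rolling access window is to bucket rows by calendar month and expose only the most recent buckets. This is a minimal sketch under that assumption; the month-based chunking and all function names are hypothetical and are not described in the slides.

```python
from collections import defaultdict
from datetime import date


def month_key(day: date) -> str:
    """Chunk label for a date, for example '2018-11'."""
    return f"{day.year:04d}-{day.month:02d}"


def chunk_by_month(rows):
    """Group (date, payload) pairs into one chunk per calendar month."""
    chunks = defaultdict(list)
    for day, payload in rows:
        chunks[month_key(day)].append(payload)
    return dict(chunks)


def rolling_window(chunks, today: date, months: int):
    """Return only the chunks for the most recent `months` calendar months."""
    wanted = set()
    year, month = today.year, today.month
    for _ in range(months):
        wanted.add(f"{year:04d}-{month:02d}")
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return {key: value for key, value in chunks.items() if key in wanted}


if __name__ == "__main__":
    rows = [(date(2018, 9, 30), "a"), (date(2018, 10, 1), "b"), (date(2018, 11, 5), "c")]
    print(rolling_window(chunk_by_month(rows), date(2018, 11, 15), 2))
```

Because data that arrives in date order is already grouped this way, chunking on the date follows the natural clustering of the data rather than fighting it.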
11 Who will be accessing the data?
- Data Scientists
- AI applications and automated insights
- BI users: enterprise or ad hoc
12 Data Scientists : Applications : Business User
Data Scientists
- Complex exploratory queries
- Few in number
- Long running
- May generate more data than they consume!
Applications
- Dynamically generated queries
- Potential for poor SQL
- No humans involved to 'tune' SQL
- Rapid request potential
Business User
- Corporate
- On-demand queries
- Organised generally on date, customer, region, product
14 Spreading the effort on more nodes or bigger nodes?
Increasing the node's size and capability (Azure HDInsight):

    Node   Label     vCPU   Memory (GB)
    D12    tiny         4    28
    D13    small        8    56
    D16    starter     16   128

Then they get much bigger and expensive.
8 Exabytes of Storage
The power of MANY: increase
- Nodes
- Cores
- Both cores and nodes
Considerations
- Bigger nodes = higher cost
- More nodes = greater joining cost
- More cores = greater vectorization capability
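The node-versus-core trade-off can be made concrete by asking how many nodes of each size are needed to reach a given core count. The sketch below uses the three node sizes from the slide and assumes the memory figures are in GB. The 64-vCPU target is a hypothetical input, and the function name is illustrative.

```python
import math

# (vCPUs, memory in GB) per node, copied from the slide.
# Treating the unlabelled memory figures as GB is an assumption.
NODE_SIZES = {
    "D12": (4, 28),
    "D13": (8, 56),
    "D16": (16, 128),
}


def cluster_for_vcpus(target_vcpus: int) -> dict:
    """For each node size, the node count needed to reach the target core count."""
    plan = {}
    for name, (vcpus, memory_gb) in NODE_SIZES.items():
        nodes = math.ceil(target_vcpus / vcpus)
        plan[name] = {
            "nodes": nodes,
            "vcpus": nodes * vcpus,
            "memory_gb": nodes * memory_gb,
        }
    return plan


if __name__ == "__main__":
    for name, shape in cluster_for_vcpus(64).items():
        print(name, shape)
```

The same core count can come from 16 small nodes or 4 large ones. The smaller-node cluster has more participants in every distributed join, which is the "greater joining cost" listed under Considerations.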
15 Flashback to One Billion Rows what next...
16 One Billion Rows
- Many devices and systems produce data
- Not all at the same rate
- Our perspective on what is happening is affected by our viewpoint
17 How did it work out?
18 Some Numbers 22, bytes of data needs this number of records to make ONE Petabyte Time to load (63bn) ,855,406 Time to load 10m rows Rows per second loaded Actian Corporation
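The figures on this slide did not survive transcription, but the relationships between them are simple arithmetic. The sketch below shows those relationships. Every input value in it is a hypothetical placeholder rather than a measured number from the talk, and the function names are illustrative.

```python
PETABYTE = 10**15  # decimal bytes


def records_per_petabyte(record_bytes: int) -> int:
    """How many records of a given size are needed to make one petabyte."""
    return PETABYTE // record_bytes


def rows_per_second(rows: int, seconds: float) -> float:
    """Load rate observed on a sample load."""
    return rows / seconds


def projected_load_seconds(total_rows: int, rate: float) -> float:
    """Time to load `total_rows` if the sampled rate holds."""
    return total_rows / rate


if __name__ == "__main__":
    # Hypothetical inputs: 10 million rows loaded in 20 seconds.
    rate = rows_per_second(10_000_000, 20.0)
    # Project that rate onto the 63 billion rows mentioned on the slide.
    hours = projected_load_seconds(63_000_000_000, rate) / 3600
    print(rate, hours)
```

The projection assumes the load rate stays constant as the table grows, which is an assumption worth checking against a real load.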
19
- Consuming data while moving it helps
- Reducing the payload in the first place is even better
- If we eat while we work, it does get easier
- Leave the REALLY difficult tasks to someone else
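"Consuming data while moving it" can be read as filtering and aggregating inside the transfer loop instead of landing everything first. The sketch below illustrates that reading with Python generators. The reading format, the threshold, and the function names are assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

Reading = Tuple[str, float]  # (device_id, value); a hypothetical record shape


def drop_noise(readings: Iterable[Reading], threshold: float) -> Iterator[Reading]:
    """Reduce the payload up front: readings below the threshold are never moved on."""
    for device_id, value in readings:
        if value >= threshold:
            yield device_id, value


def count_per_device(readings: Iterable[Reading]) -> Counter:
    """Consume during movement: aggregate as rows stream past, storing no raw rows."""
    counts = Counter()
    for device_id, _value in readings:
        counts[device_id] += 1
    return counts


if __name__ == "__main__":
    source = [("a", 0.1), ("a", 5.0), ("b", 7.5), ("b", 0.0), ("a", 9.0)]
    print(count_per_device(drop_noise(iter(source), 1.0)))
```

Only one reading is held in memory at a time, so the same pattern applies whether the source is five rows or a continuous feed.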
20 Takeaways from this session
Planning
- Look at the initial payload
- Identify what can be processed up front and never moved
- Consume during data movement
Preparation
- Scale slowly and steadily
- Onboarding is time and cost sensitive
- Use insights to manage growth
- Enable user groups access progressively
Performance
- Users, applications, BI/ELT
- Self-serve can be the most disruptive of queries
- AI applications: pre-defined by data scientists
- Business reporting: known reports that will be run at scheduled times
21 How Actian's products can change your business
Be on the leading edge of Cloud
100GB to 20TB
- Use Actian Vector on-premise today
- Use Actian Vector on Azure
- Use Actian Vector on AWS
- Use Actian Vector as a Service
20TB to 100TB
- Use Actian Vector on Azure
- Use Actian Vector on AWS
- Use Actian Vector as a Service
100TB +
- Use Actian Vector on Azure
- Use Actian Vector on AWS
- Use Actian Vector as a Service
22 Actian Vector & Dataflow November 2018
23 Actian Vector
Delivering fast, open, enterprise-grade analytics to top customers
- Achieve business insights not possible before
- Connect to all your data sources and systems
- Get to mission-critical production faster
24 Performance advantage derived through multiple innovations
1. Vectorized Processing
- Single Instruction Multiple Data
- Maximize throughput
2. Exploiting Chip Cache
- Process data in chip not in RAM
3. Second Gen Columnar
- Limit I/O
- Most efficient real time updates on and off Hadoop
4. Smart Compression
- Vectorized decompression in chip
- Typically 4-6:1 compression ratio
5. Storage Indexes
- Created automatically, simplifies schema
- Quickly identify candidate data blocks for solving queries
- Minimize I/O
6. Multi-core Parallelism
- Maximize concurrency, parallelism and system resource utilization
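The storage-index idea, identifying candidate data blocks to minimize I/O, can be illustrated with a generic min-max index: keep the smallest and largest value of each block and skip blocks whose range cannot match the predicate. This is a sketch of the general technique and not a description of how Actian Vector implements it. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    values: List[int]
    low: int    # smallest value in the block
    high: int   # largest value in the block


def build_blocks(values: List[int], block_size: int) -> List[Block]:
    """Split a column into fixed-size blocks and record each block's min and max."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_range(blocks: List[Block], lo: int, hi: int) -> Tuple[List[int], int]:
    """Return the values in [lo, hi] and the number of blocks actually read."""
    matches: List[int] = []
    blocks_read = 0
    for block in blocks:
        if block.high < lo or block.low > hi:
            continue  # the block's range cannot contain a match, so it is skipped
        blocks_read += 1
        matches.extend(v for v in block.values if lo <= v <= hi)
    return matches, blocks_read


if __name__ == "__main__":
    blocks = build_blocks(list(range(100)), 10)
    print(scan_range(blocks, 42, 47))
```

The index only pays off when similar values sit close together, which is why loading data in its natural order matters for this kind of block skipping.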
25 Actian Vector: The world's fastest analytic database
- Scans, aggregations, and joins over 1TB, 5TB, 10TB databases, single user and 20 concurrent users, on same underlying configurations
- Performance advantage over competition grows as data scales, query complexity increases, and user concurrency increases
- Independently tested by MCG using Berkeley AMPLab Big Data Benchmark
- 10X Faster, 14X Faster, 20X Faster, 100X Faster
Download the reports at
26 Benchmarking VectorH vs SQL in Hadoop competition
How many times faster is VectorH?
[Chart: VectorH compared with HAWQ, SparkSQL, Impala and Hive on queries Q1 to Q22; DNF marks runs that did not finish]
The benchmark includes two refresh streams that delete and insert 1/1000th of the data. Note that only Hive and Vector can complete these tests. The query times reflect the time taken to complete the refresh streams and execute the query set after the refresh streams have been executed.
- Hive: RF1=34s, RF2=112s, GeoDiff=138.2%
- VectorH: RF1=25s, RF2=12.5s, GeoDiff=99.3%
30 Actian Vector for Hadoop: Enterprise class SQL BI & analytics natively in Hadoop
ENTERPRISE GRADE
- Full ANSI SQL 2003 support: leverage existing SQL skills and standard BI tools and apps
- Fully ACID compliant: prevent inaccurate results by bringing transactional integrity to Hadoop
- Update capability: provide ability to update data in Hadoop without impacting query performance
- Native DBMS security: sleep well with enterprise class authentication, user and role-based security, data protection, and encryption
HIGH PERFORMANCE
- Highly performant: run existing apps faster and grow data without sacrificing performance
- High concurrency: allow simultaneous users and tasks to run without long wait times
- Mature, proven planner and fast optimizer: maximize usage of nodes, CPU, memory and cache with highly intelligent query execution plans
- Native in-Hadoop YARN: optimize usage of low-cost Hadoop infrastructure by automatically managing cluster resources across applications
OPEN
- Cloud: get started quickly with flexible deployment options on premise or across multiple cloud infrastructures
- Hadoop distribution agnostic: avoid vendor lock-in and provide customer flexibility across distributions
- Collaborative architecture: minimize risk by leveraging existing tools and benefitting from cross-industry innovations
- Open data formats: query native Hadoop file formats and allow API access to our own block format
31 Actian Vector and DataFlow & Spark
[Diagram labels: Ubiquitous Analytics; Custom Apps; Streaming; ISVs; Remote Data; Traditional ETL; Cloud Data & Applications; Local Data Sources; DataFlow; Spark; SQL; Vector]
- Actian Vector Spark Connector: Vector serves as a data source to Spark apps
- Actian Vector Spark Loader: ingest data from all available Spark sources
- Using the Spark Loader
- Vector external tables using Spark
32 Processing capability and scale required: example

    drop table if exists sort_10t_x100;
    create table sort_10t_x100 (
        ID UUID NOT NULL WITH DEFAULT,
        _c0 varchar(100)
    ) with PARTITION = (HASH on _c0 25 partitions);

    -- Create the EXTERNAL table over a single input file
    drop table if exists sort_10t;
    create external table sort_10t (
        _c0 varchar(100)
    ) USING SPARK WITH
        REFERENCE = 'adl:///user/actian/datasets/sort/10tb/pennyinput_10m one',
        ROWS = ,
        FORMAT = 'CSV',
        options = ('header' = 'false', 'delimiter' = ' ');

    -- Create the EXTERNAL table over the full dataset
    create external table sort_10t_full (
        _c0 varchar(100)
    ) USING SPARK WITH
        REFERENCE = 'adl:///user/actian/datasets/sort/10tb/*.one',
        ROWS = ,
        FORMAT = 'CSV',
        options = ('header' = 'false', 'delimiter' = ' ');

    -- Load through the external tables
    insert into sort_10t_x100 (_c0) select * from sort_10t;
    -- ( rows in secs)
    insert into sort_10t_x100 (_c0) select * from sort_10t_full;
    -- ( rows in secs)

    -- Check the most recently generated IDs
    select first 2 tid, *, length(_c0) len from sort_10t_x100 order by id desc;
    -- 9e87baa2-e5f3-11e8-b d3a0d785a
    -- 9e87bb37-e5f3-11e8-b d3a0d785a
    -- (2 rows in secs)
33 Actian DataFlow
- Single platform for end-to-end data access, transformation, preparation, and predictive analysis
- Combines the KNIME (open source data mining platform) drag and drop visual workflow environment
- Eliminates memory constraints and data movement prior to analytic processing
- Desktop, remote server, or clusters, including Hadoop
- Transform, cleanse and analyze terabytes of data into actionable insights at record-breaking speed on commodity hardware
34 Some of our Vector Technology Partners
[Diagram labels: Data Integration; Actian X; Actian Vector & Vector in Hadoop; JDBC 4.2; ODBC 3.5; Business Intelligence & Analysis]
35 Thank you!
Oracle Big Data Connectors
Oracle Big Data Connectors Oracle Big Data Connectors is a software suite that integrates processing in Apache Hadoop distributions with operations in Oracle Database. It enables the use of Hadoop to process
More informationActian Hybrid Data Conference 2018 London
Disclaimer This document is for informational purposes only and is subject to change at any time without notice. The information in this document is proprietary to Actian and no part of this document may
More informationApril Copyright 2013 Cloudera Inc. All rights reserved.
Hadoop Beyond Batch: Real-time Workloads, SQL-on- Hadoop, and the Virtual EDW Headline Goes Here Marcel Kornacker marcel@cloudera.com Speaker Name or Subhead Goes Here April 2014 Analytic Workloads on
More informationActian Vector Benchmarks. Cloud Benchmarking Summary Report
Actian Vector Benchmarks Cloud Benchmarking Summary Report April 2018 The Cloud Database Performance Benchmark Executive Summary The table below shows Actian Vector as evaluated against Amazon Redshift,
More informationHadoop Beyond Batch: Real-time Workloads, SQL-on- Hadoop, and thevirtual EDW Headline Goes Here
Hadoop Beyond Batch: Real-time Workloads, SQL-on- Hadoop, and thevirtual EDW Headline Goes Here Marcel Kornacker marcel@cloudera.com Speaker Name or Subhead Goes Here 2013-11-12 Copyright 2013 Cloudera
More informationAn Introduction to Big Data Formats
Introduction to Big Data Formats 1 An Introduction to Big Data Formats Understanding Avro, Parquet, and ORC WHITE PAPER Introduction to Big Data Formats 2 TABLE OF TABLE OF CONTENTS CONTENTS INTRODUCTION
More informationActian SQL Analytics in Hadoop
Actian SQL Analytics in Hadoop The Fastest, Most Industrialized SQL in Hadoop A Technical Overview 2015 Actian Corporation. All Rights Reserved. Actian product names are trademarks of Actian Corp. Other
More informationHortonworks and The Internet of Things
Hortonworks and The Internet of Things Dr. Bernhard Walter Solutions Engineer About Hortonworks Customer Momentum ~700 customers (as of November 4, 2015) 152 customers added in Q3 2015 Publicly traded
More informationBig Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara
Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case
More informationAbstract. The Challenges. ESG Lab Review InterSystems IRIS Data Platform: A Unified, Efficient Data Platform for Fast Business Insight
ESG Lab Review InterSystems Data Platform: A Unified, Efficient Data Platform for Fast Business Insight Date: April 218 Author: Kerry Dolan, Senior IT Validation Analyst Abstract Enterprise Strategy Group
More informationActian Hybrid Data Conference 2018 London
Disclaimer This document is for informational purposes only and is subject to change at any time without notice. The information in this document is proprietary to Actian and no part of this document may
More informationModern Data Warehouse The New Approach to Azure BI
Modern Data Warehouse The New Approach to Azure BI History On-Premise SQL Server Big Data Solutions Technical Barriers Modern Analytics Platform On-Premise SQL Server Big Data Solutions Modern Analytics
More informationWhat is Gluent? The Gluent Data Platform
What is Gluent? The Gluent Data Platform The Gluent Data Platform provides a transparent data virtualization layer between traditional databases and modern data storage platforms, such as Hadoop, in the
More informationTechnical Sheet NITRODB Time-Series Database
Technical Sheet NITRODB Time-Series Database 10X Performance, 1/10th the Cost INTRODUCTION "#$#!%&''$!! NITRODB is an Apache Spark Based Time Series Database built to store and analyze 100s of terabytes
More informationMicrosoft Analytics Platform System (APS)
Microsoft Analytics Platform System (APS) The turnkey modern data warehouse appliance Matt Usher, Senior Program Manager @ Microsoft About.me @two_under Senior Program Manager 9 years at Microsoft Visual
More informationActian Hybrid Data Conference 2017 London Actian Corporation
Actian Hybrid Data Conference 2017 London 1 2017 Actian Corporation Disclaimer This document is for informational purposes only and is subject to change at any time without notice. The information in this
More informationWhen, Where & Why to Use NoSQL?
When, Where & Why to Use NoSQL? 1 Big data is becoming a big challenge for enterprises. Many organizations have built environments for transactional data with Relational Database Management Systems (RDBMS),
More informationExam Questions
Exam Questions 70-775 Perform Data Engineering on Microsoft Azure HDInsight (beta) https://www.2passeasy.com/dumps/70-775/ NEW QUESTION 1 You are implementing a batch processing solution by using Azure
More informationHadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved
Hadoop 2.x Core: YARN, Tez, and Spark YARN Hadoop Machine Types top-of-rack switches core switch client machines have client-side software used to access a cluster to process data master nodes run Hadoop
More informationAsanka Padmakumara. ETL 2.0: Data Engineering with Azure Databricks
Asanka Padmakumara ETL 2.0: Data Engineering with Azure Databricks Who am I? Asanka Padmakumara Business Intelligence Consultant, More than 8 years in BI and Data Warehousing A regular speaker in data
More informationMapR Enterprise Hadoop
2014 MapR Technologies 2014 MapR Technologies 1 MapR Enterprise Hadoop Top Ranked Cloud Leaders 500+ Customers 2014 MapR Technologies 2 Key MapR Advantage Partners Business Services APPLICATIONS & OS ANALYTICS
More informationCombine Native SQL Flexibility with SAP HANA Platform Performance and Tools
SAP Technical Brief Data Warehousing SAP HANA Data Warehousing Combine Native SQL Flexibility with SAP HANA Platform Performance and Tools A data warehouse for the modern age Data warehouses have been
More informationOracle Database Exadata Cloud Service Exadata Performance, Cloud Simplicity DATABASE CLOUD SERVICE
Oracle Database Exadata Exadata Performance, Cloud Simplicity DATABASE CLOUD SERVICE Oracle Database Exadata combines the best database with the best cloud platform. Exadata is the culmination of more
More informationAccelerating BI on Hadoop: Full-Scan, Cubes or Indexes?
White Paper Accelerating BI on Hadoop: Full-Scan, Cubes or Indexes? How to Accelerate BI on Hadoop: Cubes or Indexes? Why not both? 1 +1(844)384-3844 INFO@JETHRO.IO Overview Organizations are storing more
More informationWHITEPAPER. MemSQL Enterprise Feature List
WHITEPAPER MemSQL Enterprise Feature List 2017 MemSQL Enterprise Feature List DEPLOYMENT Provision and deploy MemSQL anywhere according to your desired cluster configuration. On-Premises: Maximize infrastructure
More informationBig Data with Hadoop Ecosystem
Diógenes Pires Big Data with Hadoop Ecosystem Hands-on (HBase, MySql and Hive + Power BI) Internet Live http://www.internetlivestats.com/ Introduction Business Intelligence Business Intelligence Process
More informationCloud Analytics and Business Intelligence on AWS
Cloud Analytics and Business Intelligence on AWS Enterprise Applications Virtual Desktops Sharing & Collaboration Platform Services Analytics Hadoop Real-time Streaming Data Machine Learning Data Warehouse
More informationSAP HANA Scalability. SAP HANA Development Team
SAP HANA Scalability Design for scalability is a core SAP HANA principle. This paper explores the principles of SAP HANA s scalability, and its support for the increasing demands of data-intensive workloads.
More informationmicrosoft
70-775.microsoft Number: 70-775 Passing Score: 800 Time Limit: 120 min Exam A QUESTION 1 Note: This question is part of a series of questions that present the same scenario. Each question in the series
More informationAchieving Horizontal Scalability. Alain Houf Sales Engineer
Achieving Horizontal Scalability Alain Houf Sales Engineer Scale Matters InterSystems IRIS Database Platform lets you: Scale up and scale out Scale users and scale data Mix and match a variety of approaches
More informationOracle Big Data SQL. Release 3.2. Rich SQL Processing on All Data
Oracle Big Data SQL Release 3.2 The unprecedented explosion in data that can be made useful to enterprises from the Internet of Things, to the social streams of global customer bases has created a tremendous
More informationOverview of Data Services and Streaming Data Solution with Azure
Overview of Data Services and Streaming Data Solution with Azure Tara Mason Senior Consultant tmason@impactmakers.com Platform as a Service Offerings SQL Server On Premises vs. Azure SQL Server SQL Server
More informationModernizing Business Intelligence and Analytics
Modernizing Business Intelligence and Analytics Justin Erickson Senior Director, Product Management 1 Agenda What benefits can I achieve from modernizing my analytic DB? When and how do I migrate from
More informationBIG DATA COURSE CONTENT
BIG DATA COURSE CONTENT [I] Get Started with Big Data Microsoft Professional Orientation: Big Data Duration: 12 hrs Course Content: Introduction Course Introduction Data Fundamentals Introduction to Data
More informationSpatial Analytics Built for Big Data Platforms
Spatial Analytics Built for Big Platforms Roberto Infante Software Development Manager, Spatial and Graph 1 Copyright 2011, Oracle and/or its affiliates. All rights Global Digital Growth The Internet of
More informationMigrate from Netezza Workload Migration
Migrate from Netezza Automated Big Data Open Netezza Source Workload Migration CASE SOLUTION STUDY BRIEF Automated Netezza Workload Migration To achieve greater scalability and tighter integration with
More informationData Analytics at Logitech Snowflake + Tableau = #Winning
Welcome # T C 1 8 Data Analytics at Logitech Snowflake + Tableau = #Winning Avinash Deshpande I am a futurist, scientist, engineer, designer, data evangelist at heart Find me at Avinash Deshpande Chief
More informationSoftFlash: Programmable Storage in Future Data Centers Jae Do Researcher, Microsoft Research
SoftFlash: Programmable Storage in Future Data Centers Jae Do Researcher, Microsoft Research 1 The world s most valuable resource Data is everywhere! May. 2017 Values from Data! Need infrastructures for
More informationPUBLIC SAP Vora Sizing Guide
SAP Vora 2.0 Document Version: 1.1 2017-11-14 PUBLIC Content 1 Introduction to SAP Vora....3 1.1 System Architecture....5 2 Factors That Influence Performance....6 3 Sizing Fundamentals and Terminology....7
More informationBig Data com Hadoop. VIII Sessão - SQL Bahia. Impala, Hive e Spark. Diógenes Pires 03/03/2018
Big Data com Hadoop Impala, Hive e Spark VIII Sessão - SQL Bahia 03/03/2018 Diógenes Pires Connect with PASS Sign up for a free membership today at: pass.org #sqlpass Internet Live http://www.internetlivestats.com/
More informationBuilding a Data Strategy for a Digital World
Building a Data Strategy for a Digital World Jason Hunter, CTO, APAC Data Challenge: Pushing the Limits of What's Possible The Art of the Possible Multiple Government Agencies Data Hub 100 s of Service
More informationMicrosoft Exam
Volume: 42 Questions Case Study: 1 Relecloud General Overview Relecloud is a social media company that processes hundreds of millions of social media posts per day and sells advertisements to several hundred
More informationSecurity and Performance advances with Oracle Big Data SQL
Security and Performance advances with Oracle Big Data SQL Jean-Pierre Dijcks Oracle Redwood Shores, CA, USA Key Words SQL, Oracle, Database, Analytics, Object Store, Files, Big Data, Big Data SQL, Hadoop,
More informationMicrosoft Big Data and Hadoop
Microsoft Big Data and Hadoop Lara Rubbelke @sqlgal Cindy Gross @sqlcindy 2 The world of data is changing The 4Vs of Big Data http://nosql.mypopescu.com/post/9621746531/a-definition-of-big-data 3 Common
More informationSAP HANA. Jake Klein/ SVP SAP HANA June, 2013
SAP HANA Jake Klein/ SVP SAP HANA June, 2013 SAP 3 YEARS AGO Middleware BI / Analytics Core ERP + Suite 2013 WHERE ARE WE NOW? Cloud Mobile Applications SAP HANA Analytics D&T Changed Reality Disruptive
More informationStages of Data Processing
Data processing can be understood as the conversion of raw data into a meaningful and desired form. Basically, producing information that can be understood by the end user. So then, the question arises,
More informationAccelerate Big Data Insights
Accelerate Big Data Insights Executive Summary An abundance of information isn t always helpful when time is of the essence. In the world of big data, the ability to accelerate time-to-insight can not
More informationMigrate from Netezza Workload Migration
Migrate from Netezza Automated Big Data Open Netezza Source Workload Migration CASE SOLUTION STUDY BRIEF Automated Netezza Workload Migration To achieve greater scalability and tighter integration with
More informationVOLTDB + HP VERTICA. page
VOLTDB + HP VERTICA ARCHITECTURE FOR FAST AND BIG DATA ARCHITECTURE FOR FAST + BIG DATA FAST DATA Fast Serve Analytics BIG DATA BI Reporting Fast Operational Database Streaming Analytics Columnar Analytics
More informationpowered by Cloudian and Veritas
Lenovo Storage DX8200C powered by Cloudian and Veritas On-site data protection for Amazon S3-compliant cloud storage. assistance from Lenovo s world-class support organization, which is rated #1 for overall
More informationUnifying Big Data Workloads in Apache Spark
Unifying Big Data Workloads in Apache Spark Hossein Falaki @mhfalaki Outline What s Apache Spark Why Unification Evolution of Unification Apache Spark + Databricks Q & A What s Apache Spark What is Apache
More informationSub-Second Response Times with New In-Memory Analytics in MicroStrategy 10. Onur Kahraman
Sub-Second Response Times with New In-Memory Analytics in MicroStrategy 10 Onur Kahraman High Performance Is No Longer A Nice To Have In Analytical Applications Users expect Google Like performance from
More informationEvolving To The Big Data Warehouse
Evolving To The Big Data Warehouse Kevin Lancaster 1 Copyright Director, 2012, Oracle and/or its Engineered affiliates. All rights Insert Systems, Information Protection Policy Oracle Classification from
More informationCONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED DATA PLATFORM
CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED PLATFORM Executive Summary Financial institutions have implemented and continue to implement many disparate applications
More informationIBM DB2 Analytics Accelerator Trends and Directions
March, 2017 IBM DB2 Analytics Accelerator Trends and Directions DB2 Analytics Accelerator for z/os on Cloud Namik Hrle IBM Fellow Peter Bendel IBM STSM Disclaimer IBM s statements regarding its plans,
More informationData-Intensive Distributed Computing
Data-Intensive Distributed Computing CS 451/651 431/631 (Winter 2018) Part 5: Analyzing Relational Data (1/3) February 8, 2018 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo
More informationexam. Microsoft Perform Data Engineering on Microsoft Azure HDInsight. Version 1.0
70-775.exam Number: 70-775 Passing Score: 800 Time Limit: 120 min File Version: 1.0 Microsoft 70-775 Perform Data Engineering on Microsoft Azure HDInsight Version 1.0 Exam A QUESTION 1 You use YARN to
More informationQLIK INTEGRATION WITH AMAZON REDSHIFT
QLIK INTEGRATION WITH AMAZON REDSHIFT Qlik Partner Engineering Created August 2016, last updated March 2017 Contents Introduction... 2 About Amazon Web Services (AWS)... 2 About Amazon Redshift... 2 Qlik
More informationProgress DataDirect For Business Intelligence And Analytics Vendors
Progress DataDirect For Business Intelligence And Analytics Vendors DATA SHEET FEATURES: Direction connection to a variety of SaaS and on-premises data sources via Progress DataDirect Hybrid Data Pipeline
More informationOptimizing and Modeling SAP Business Analytics for SAP HANA. Iver van de Zand, Business Analytics
Optimizing and Modeling SAP Business Analytics for SAP HANA Iver van de Zand, Business Analytics Early data warehouse projects LIMITATIONS ISSUES RAISED Data driven by acquisition, not architecture Too
More informationCloud Analytics Database Performance Report
Cloud Analytics Database Performance Report Actian Vector up to 100x faster than Cloudera Impala MCG Global Services Benchmark Results April 2018 Key insights Independent benchmark performed by MCG Global