From Single Purpose to Multi Purpose Data Lakes. Thomas Niewel Technical Sales Director DACH Denodo Technologies March, 2019
2 Agenda
- Data Lakes
- Multiple Purpose Data Lakes
- Customer Example
- Demo
- Takeaways
3 Data Lakes
- A data lake is a storage repository that holds a vast amount of raw data in its native format.
- The data structure and requirements are not defined until the data is needed.
- The current need for sophisticated data-driven intelligence and data science favored this concept for its simplicity and power.
- Hadoop and its ecosystem provided the foundation that data lakes required: vast storage and processing muscle.
- It also favored the concept of ELT vs ETL: load data first, (maybe)
4 Data Lakes - Not a Perfect World
Physical Nature
- Based on replication. Data lakes require data to be copied to their physical storage.
- Replication extends development cycles and costs.
- Not all data is suitable for replication:
  - Real-time needs: Cloud and SaaS APIs
  - Large volumes: existing EDW
  - Laws and restrictions
Single Purpose
- Usage of the data lake is often monopolized by data scientists.
- New data silo. No clear path to share insights with business users.
- Lacks the governance, security and quality that business users are used to (e.g. in the EDW).
5 The Rise of Logical Architectures
Figure: The Evolution of Analytical Architectures
Source: "Adopt the Logical Data Warehouse Architecture to Meet Your Modern Analytical Needs", Gartner, April
6 The Multipurpose Data Lake with Data Virtualization
Logical Nature
- Replication is an option, not a necessity
- Broaden data access, shorten development times, better insights
- Tight integration with big data systems. Fast execution with large data volumes
Multi-purpose
- Curated access for non-technical users
- Better governance and access control
- Better ROI for the investment of the lake
7 The Multipurpose Data Lake with Data Virtualization
"A multi-purpose data lake can become an organization's universal data delivery system"
Source: "Architecting the Multi-Purpose Data Lake with Data Virtualization", Rick van der Lans, April
8 The Virtual Data Lake - Access to all Data Sources
Single access to all data assets, internal and external:
- Physical Data Lake (usually based on SQL-on-Hadoop systems)
- Other databases (EDW, ODS, applications, etc.)
- SaaS APIs (Salesforce, Google, social media, etc.)
- Files (local, S3, Azure, etc.)
9 The Virtual Data Lake - Ingesting and Caching
- The physical Data Lake can also be used as Denodo's cache
- This makes it possible to quickly load any data accessible by Denodo into the Hadoop cluster
- Caching becomes an alternative to ingestion ELT processes, one that preserves lineage and governance
- Load process based on direct load to HDFS:
  1. Creation of the target table in the cache system
  2. Generation of Parquet files (in chunks) with Snappy compression on the local machine
  3. Parallel upload of the Parquet files to HDFS
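The three-step load process can be sketched roughly as follows. This is a minimal, dependency-free illustration and not Denodo's implementation: the function names (`write_chunk_files`, `upload_in_parallel`, `load_demo`) are hypothetical, JSON-lines files stand in for Snappy-compressed Parquet, and a local directory stands in for the HDFS target table.

```python
import json
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def chunked(rows, size):
    """Yield consecutive slices of `rows` with at most `size` items each."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def write_chunk_files(rows, local_dir, chunk_size):
    """Step 2 stand-in: write one local file per chunk.

    A real loader would write Snappy-compressed Parquet here; JSON lines
    are used only to keep this sketch free of third-party dependencies.
    """
    paths = []
    for index, chunk in enumerate(chunked(rows, chunk_size)):
        path = Path(local_dir) / f"part-{index:05d}.jsonl"
        with path.open("w", encoding="utf-8") as handle:
            for row in chunk:
                handle.write(json.dumps(row) + "\n")
        paths.append(path)
    return paths


def upload_in_parallel(paths, target_dir, workers=4):
    """Steps 1 and 3 stand-in: create the target, then copy chunks concurrently."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # hypothetical "target table"

    def copy_one(path):
        return Path(shutil.copy(path, target / path.name))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all copies and re-raises any copy error.
        return list(pool.map(copy_one, paths))


def load_demo():
    """Load ten sample rows in chunks of four and list the uploaded files."""
    rows = [{"customer_id": i, "zip": str(10000 + i % 7)} for i in range(10)]
    with tempfile.TemporaryDirectory() as local_dir, \
            tempfile.TemporaryDirectory() as lake_dir:
        chunk_paths = write_chunk_files(rows, local_dir, chunk_size=4)
        uploaded = upload_in_parallel(chunk_paths, Path(lake_dir) / "cache_table")
        return sorted(path.name for path in uploaded)
```

Writing in chunks keeps local memory bounded, and independent chunk files are what make the parallel upload possible.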
10 The Virtual Data Lake - Using the Lake Processing Engine
- Denodo's optimizer provides native integration with MPP systems to provide one extra key capability: Query Acceleration
- Denodo can move processing to the MPP on demand during the execution of a query
- Parallel power for calculations in the virtual layer
- Avoids slow on-disk processing when processing buffers don't fit into Denodo's memory (swapped data)
11 Example: Scenario
Evolution of sales per ZIP code over the previous years.
Scenario:
- Current data (last 12 months) in EDW
- Historical data offloaded to Hadoop cluster for cheaper storage
- Customer master data is used often, so it is cached in the Hadoop cluster
Very large data volumes (sales tables have hundreds of millions of rows):
- Current Sales: 100 million rows
- Historical Sales: 300 million rows
- Customer: 2 million rows (cached)
Query operations (from the diagram): union, join, group by ZIP
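As a rough illustration of the logical query in this scenario, the sketch below unions current and historical sales, joins the result to the customer table and groups by ZIP. It uses an in-memory SQLite database purely for demonstration; the table and column names (`current_sales`, `historical_sales`, `customer`, `customer_id`, `amount`, `zip`) are assumptions, not names from the slides, and the per-year breakdown implied by "evolution over the previous years" is left out for brevity.

```python
import sqlite3


def sales_per_zip(current_sales, historical_sales, customers):
    """Return (zip, total) rows over the union of both sales tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE current_sales (customer_id INTEGER, amount REAL);
        CREATE TABLE historical_sales (customer_id INTEGER, amount REAL);
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, zip TEXT);
        """
    )
    conn.executemany("INSERT INTO current_sales VALUES (?, ?)", current_sales)
    conn.executemany("INSERT INTO historical_sales VALUES (?, ?)", historical_sales)
    conn.executemany("INSERT INTO customer VALUES (?, ?)", customers)

    # union -> join -> group by ZIP, mirroring the operations on the slide
    query = """
        SELECT c.zip, SUM(s.amount) AS total
        FROM (
            SELECT customer_id, amount FROM current_sales
            UNION ALL
            SELECT customer_id, amount FROM historical_sales
        ) AS s
        JOIN customer AS c ON c.customer_id = s.customer_id
        GROUP BY c.zip
        ORDER BY c.zip
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```

In the scenario each table lives in a different system, so the interesting question is where each of these operations runs, not what the query looks like.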
12 Example: What are the options?
1) Simple Federation in Virtual Layer: moves hundreds of millions of rows for processing in the virtual layer
2) Data Shipping: moves current sales to Hadoop and processes the content in the cluster. Moves 100 million rows
3) Partial Aggregation Pushdown (Denodo 6): modifies the execution tree to split the aggregation in two steps:
   1. by Customer ID for the JOIN (pushed down to source)
   2. by ZIP for the final results (in virtual layer)
   Significantly reduces network traffic, but processing a large amount of data in the virtual layer (aggregation by ZIP) becomes the bottleneck
4) Denodo's MPP Integration (Denodo 7, next slide)
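The two-step split behind partial aggregation pushdown can be illustrated with a small sketch. It only demonstrates why summing per customer at each source and then summing per ZIP after the join gives the same totals as joining every raw row; the function names and the sample data are hypothetical, and this is not Denodo's optimizer logic.

```python
from collections import defaultdict


def aggregate_by_zip_naive(sales, customer_zip):
    """Join every raw sales row to its customer, then sum per ZIP."""
    totals = defaultdict(float)
    for customer_id, amount in sales:
        totals[customer_zip[customer_id]] += amount
    return dict(totals)


def pre_aggregate_by_customer(sales):
    """Step 1 (would run at the source): one partial sum per customer ID."""
    partials = defaultdict(float)
    for customer_id, amount in sales:
        partials[customer_id] += amount
    return dict(partials)


def aggregate_by_zip_pushdown(sources, customer_zip):
    """Step 2 (virtual layer): join the partial sums to customers, sum per ZIP."""
    totals = defaultdict(float)
    for sales in sources:
        for customer_id, partial in pre_aggregate_by_customer(sales).items():
            totals[customer_zip[customer_id]] += partial
    return dict(totals)


# Hypothetical sample data: (customer_id, amount) rows per source.
CURRENT = [(1, 10), (1, 5), (2, 3)]
HISTORICAL = [(1, 20), (3, 7), (3, 1)]
CUSTOMER_ZIP = {1: "10115", 2: "10115", 3: "80331"}
```

With this sample data, each source ships one row per customer (two rows each) instead of three raw rows each. At the scale of the scenario the same idea caps the transfer at one row per customer per source.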
13 The Virtual Data Lake - Putting the Pieces Together
1. Partial aggregation push-down: maximizes source processing and dramatically reduces network traffic
2. Integrated with Cost Based Optimizer: based on data volume estimation and the cost of these particular operations, the CBO can decide to move all or part of the execution tree to the MPP
3. On-demand data transfer: Denodo automatically generates and uploads Parquet files
4. Integration with local data: the engine detects when data is cached or comes from a local table already in the MPP
5. Fast parallel execution: support for Spark, Presto and Impala for fast analytical processing in inexpensive Hadoop-based solutions
Diagram labels: Current Sales 68 M rows; Hist. Sales 220 M rows; Customer 2 M rows (cached); group by Customer ID, 2M rows (sales by customer); join; group by ZIP
System | Execution Time | Optimization Techniques
Others | ~10 min | Simple federation
No MPP | 43 sec | Aggregation push-down
With MPP | 11 sec | Aggregation push-down + MPP integration (Impala, 8 nodes)
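The reported execution times can be put into relative terms with a short calculation. The sketch below treats the approximate "~10 min" figure as exactly 600 seconds, which is an assumption, so the resulting ratios are only indicative; the dictionary keys and the `speedup` helper are names chosen for illustration.

```python
# Execution times from the slide, in seconds.
TIMINGS_SECONDS = {
    "simple federation": 10 * 60,  # slide says "~10 min"; 600 s is assumed
    "aggregation push-down": 43,
    "aggregation push-down + MPP": 11,
}


def speedup(baseline, improved, timings=TIMINGS_SECONDS):
    """Return how many times faster `improved` is than `baseline`."""
    return timings[baseline] / timings[improved]


if __name__ == "__main__":
    for baseline, improved in [
        ("simple federation", "aggregation push-down"),
        ("simple federation", "aggregation push-down + MPP"),
        ("aggregation push-down", "aggregation push-down + MPP"),
    ]:
        print(f"{improved} vs {baseline}: {speedup(baseline, improved):.1f}x")
```

Under that assumption, push-down alone is roughly 14 times faster than simple federation, and adding MPP integration brings a further factor of about 4.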
14 The Virtual Data Lake - Conclusions
A Virtual Data Lake improves decision making and shortens development cycles
- Surfaces all company data from multiple repositories without the need to replicate all data into the lake
- Eliminates data silos: allows for on-demand combination of data from multiple sources
A Virtual Data Lake broadens adoption of the lake and improves its ROI
- Improves governance and metadata management to avoid data swamps
- Allows controlled access to the lake for non-technical users
A Virtual Data Lake offers performance for the Big Data world
- Leverages the processing power of the existing cluster, controlled by Denodo's optimizer
15 Customer Success Story
16 Customer Case Overview
THE CHALLENGE: Find an agile way to integrate data from existing silos, including data warehouse, machine data, and others, that reduces business users' dependence on IT and provides quick turnaround and flexibility.
BUSINESS NEED
- Optimize operational efficiency, automate manufacturing processes, and deliver on-demand services to business consumers
- Find smarter ways to aggregate and analyze data
- An agile solution that enables the monetization of customer-facing data products
- Free business users from IT reliance to become self-sufficient with reporting and analysis
COMPANY PROFILE
- Founded 1925
- Annual revenues (FY 2017): $3.1 B
- Over 20,000 employees
- Headquarters: Germany
- World's leading supplier of automation technology and technical education
17 Customer Case Overview
SOLUTION:
- Festo developed a Big Data Analytics Framework to provide a data marketplace to better support the business
- Using the Denodo Platform to integrate data from numerous on-prem and cloud systems in real time
- A unified layer for consistent data access and governance across different data silos
18 Demo
19 Example
What's the impact of a new marketing campaign for each country?
- Historical sales data offloaded to Hadoop cluster for cheaper storage
- Marketing campaigns managed in an external cloud app
- Country is part of the customer details table, stored in the DW
Diagram labels: join, group by state, join; views: Sales, Campaign, Customer; layers: Consume; Combine, Transform & Integrate; Base View Source Abstraction; Sources
20 Key Takeaways
21 Key Takeaways
- Hadoop-based Data Lakes are the standard approach to modern analytics within most organizations
- Physical Data Lakes introduce many complexities (replication, synchronization, governance, etc.) that restrict their use
- Logical Data Lakes allow users to access data from all sources (internal and external) to grow the value of the Data Lake approach
- Data Virtualization creates multipurpose Data Lakes for all kinds of users (data scientists and business users)
- Data Virtualization introduces governance and access controls to the Data Lake without impeding the power users'
22 Q&A
23 Next steps
- Denodo Express: Accelerate Your Fast Data Strategy with Denodo Express. Try Denodo Express for free
- Test Drive: Test Drive Denodo Platform on AWS for Agile BI and Analytics. Take Denodo for Test Drive
- Questions? Please do reach out for any questions or requests. Send us an
More informationDatabricks, an Introduction
Databricks, an Introduction Chuck Connell, Insight Digital Innovation Insight Presentation Speaker Bio Senior Data Architect at Insight Digital Innovation Focus on Azure big data services HDInsight/Hadoop,
More informationDatacenter replication solution with quasardb
Datacenter replication solution with quasardb Technical positioning paper April 2017 Release v1.3 www.quasardb.net Contact: sales@quasardb.net Quasardb A datacenter survival guide quasardb INTRODUCTION
More informationMagento U. Getting Started with Magento Business Intelligence Essentials
Magento U Getting Started with Magento Business Intelligence Essentials Leah Ard Solutions Architect, Magento Business Intelligence Nate Golubiewski Solutions Consultant, Magento Agenda Overview: Magento
More informationHow to Accelerate Merger and Acquisition Synergies
How to Accelerate Merger and Acquisition Synergies MERGER AND ACQUISITION CHALLENGES Mergers and acquisitions (M&A) occur frequently in today s business environment; $3 trillion in 2017 alone. 1 M&A enables
More informationDesigning a Modern Data Warehouse + Data Lake
Designing a Modern Warehouse + Lake Strategies & architecture options for implementing a modern data warehousing environment Melissa Coates Analytics Architect, SentryOne Blog: sqlchick.com Twitter: @sqlchick
More informationBIG DATA ANALYTICS A PRACTICAL GUIDE
BIG DATA ANALYTICS A PRACTICAL GUIDE STEP 1: GETTING YOUR DATA PLATFORM IN ORDER Big Data Analytics A Practical Guide / Step 1: Getting your Data Platform in Order 1 INTRODUCTION Everybody keeps extolling
More informationLow Friction Data Warehousing WITH PERSPECTIVE ILM DATA GOVERNOR
Low Friction Data Warehousing WITH PERSPECTIVE ILM DATA GOVERNOR Table of Contents Foreword... 2 New Era of Rapid Data Warehousing... 3 Eliminating Slow Reporting and Analytics Pains... 3 Applying 20 Years
More informationIntroduction to Big-Data
Introduction to Big-Data Ms.N.D.Sonwane 1, Mr.S.P.Taley 2 1 Assistant Professor, Computer Science & Engineering, DBACER, Maharashtra, India 2 Assistant Professor, Information Technology, DBACER, Maharashtra,
More informationAzure Data Lake Store
Azure Data Lake Store Analytics 101 Kenneth M. Nielsen Data Solution Architect, MIcrosoft Our Sponsors About me Kenneth M. Nielsen Worked with SQL Server since 1999 Data Solution Architect at Microsoft
More information