Drawing the Big Picture

Drawing the Big Picture: Multi-Platform Data Architectures, Queries, and Analytics
Philip Russom, TDWI Research Director for Data Management
August 26, 2015

Sponsor

Speakers
Philip Russom, TDWI Research Director, Data Management
Imad Birouty, Director, Technical Product Marketing, Teradata

Agenda
The mission: queries, analytics, and other BI that reach multiple warehouse and data platforms simultaneously
Enabling technologies:
  Modern data warehouse environments (DWEs)
  Single-console tools
  Data exploration and discovery
  Standard SQL, but extended
  Grid, fabric, virtualization, logical DW
Benefits of the single big picture:
  New ways to view data and develop queries or analytics
  Simplification for architecture, governance, stewardship, compliance, auditing, security...
Recommendations
Please tweet: @prussom, @Teradata, #TDWI, #Analytics, #BigData

The Mission Redux
Today's BI/DW/analytics demands:
  As much data as possible, from more sources and source types
  In many structures, or structure-free
  Persisted on old and new data platform types
  Virtualized, as appropriate
  All the above, available all the time, for everyone
We've always aspired toward these goals, but success is more likely today, because we have better software, hardware, skills, and best practices.
We also have better executive support: organizations want more business value from big data, new data, analytics, and new data-driven business programs.

Enablers for the Revised Mission
New tool types and functions, plus their disciplines and practices:
  Data exploration and data discovery
  More agile data preparation
  Data visualization: ease of use, analytics, fun and compelling presentations, storytelling
New data platforms:
  Hadoop, whether open source or a vendor distribution
  MPP RDBMSs, appliances, and columnar databases
Old skills and technologies, too:
  SQL and other relational technologies are as important as ever
All the above, integrated and interoperable:
  A single console, or as few tools as possible
  A single access and query method: SQL, but for any data and any platform
  A data architecture to integrate the back end

Definition: Multi-Platform Data Warehouse Environments
Many enterprise data warehouses (EDWs) are evolving into multi-platform data warehouse environments (DWEs). Users continue to add standalone data platforms to their warehouse tool and platform portfolio. The new platforms don't replace the core warehouse, because it is still the best platform for the data that goes into standard reports, dashboards, performance management, and OLAP. Instead, the new platforms complement the warehouse, because they are optimized for workloads that manage, process, and analyze new forms of big data, non-structured data, and real-time data.

Modern DW Architectures Are Complex
The tech stack for DW, BI, DI, and analytics has always been a multi-platform environment.
What's new? The trend toward a portfolio of many physical data platforms has accelerated, so a logical architecture that integrates them is very important.
Why do it? More platform types serve more types of users, data, and workloads.
[Figure: over the passage of time, the DW architecture accumulates data marts, federated data marts, the data warehouse (star or snowflake schema, multidimensional data models), customer mart or ODS, data staging areas, metrics for performance management, real-time ODS, OLAP cubes and OLAP DBMSs, a DW from a merger, detailed source data, an analytic sandbox, data federation and virtualization, columnar DBMSs, DW appliances, MapReduce, the logical data warehouse, cloud-based DBMSs, the Hadoop Distributed File System, NoSQL databases, complex event processing, and streaming data tools.]
The logical data warehouse is a logical and/or virtual layer of the DW architecture that complements the physical layer of architecture under it.

Definitions of the Logical Data Warehouse
TDWI's view: a data warehouse is a user-defined data architecture.
  The architecture and its design components must be populated by data, but the data can be physical, logical/virtual, or both.
  So, most DW architectures have two key layers: physical and logical.
Gartner's view: a logical DW depends on virtual technology,
  from simple federation to object-oriented virtualization, plus virtual views, indices, semantics, and server memory.
Building out the logical layer of your DW is important: the logical layer enables cross-platform integration and interoperability for broad queries, exploration, and analytics.

Definitions of the Logical Data Warehouse (LDW)
The LDW layer provides a unified view (or a collection of views) of data in multiple platforms, plus a simplified (yet diverse and high-performance) collection of interfaces into such sources and targets to achieve interoperability, especially for queries.
The point of the LDW layer is to provide:
  A fairly comprehensive big picture of data in the DWE
  A single layer through which data can be accessed, thereby reducing data redundancy, movement, and processing
  A simplified view and related mechanisms that enable more user types
Similar concepts:
  Virtual DW (the LDW is often partially virtual, but mostly physical)
  Real-time DW, operational DW, active DW, dynamic DW
  Query grid, data grid, data fabric
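To make the idea of a unified view concrete, here is a minimal sketch that uses SQLite as a stand-in for two platforms: the main database plays the core warehouse, and an attached in-memory database named lake plays a second platform. The table names, sample rows, and the use of SQLite itself are assumptions for illustration; a real LDW layer would sit over separate engines, not one SQLite process. The view federates rows at query time instead of copying them.

```python
import sqlite3


def build_logical_view() -> sqlite3.Connection:
    """Expose two physically separate databases through one logical view."""
    conn = sqlite3.connect(":memory:")                  # "main" plays the core warehouse
    conn.execute("ATTACH DATABASE ':memory:' AS lake")  # hypothetical second platform

    conn.execute("CREATE TABLE main.sales (region TEXT, amount REAL)")
    conn.execute("CREATE TABLE lake.web_sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO main.sales VALUES (?, ?)",
                     [("east", 100.0), ("west", 50.0)])
    conn.executemany("INSERT INTO lake.web_sales VALUES (?, ?)",
                     [("east", 25.0), ("north", 10.0)])

    # The view stores no rows; each query reads both underlying tables in place.
    conn.execute("""
        CREATE TEMP VIEW all_sales AS
            SELECT 'dw' AS source, region, amount FROM main.sales
            UNION ALL
            SELECT 'lake' AS source, region, amount FROM lake.web_sales
    """)
    return conn


conn = build_logical_view()
totals = conn.execute(
    "SELECT region, SUM(amount) FROM all_sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('east', 125.0), ('north', 10.0), ('west', 50.0)]
```

A TEMP view is used because SQLite only lets temporary views reference tables in attached databases.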

New Architectures: Hadoop Integrated with a Relational DBMS
The strengths of one balance the weaknesses of the other.
A relational DBMS is good at:
  Metadata management
  Complex query optimization
  Table joins, views, keys, etc.
  Security, including roles and directories
HDFS and other Hadoop tools are good at:
  Massive, linear scalability
  Multi-structured and no-schema data
  Some ETL and ELT functions
  Custom code for algorithmic analytics
Other platforms are also being tightly integrated with the relational DW:
  Analytic DBMSs based on columnar, appliance, MapReduce, and graph technologies
To make this integration of diverse data platforms practical:
  Good design by users for the logical DW architectural layer
  Vendor tools that can reach all the above and more from one query

Importance of Data Exploration
Exploring data is a first step to leveraging new data:
  Never allow new data into a DW without proper vetting.
  Assess the value and use cases of new (big) data via exploration.
Exploring data is a prerequisite to analyzing data:
  By its nature, analysis makes correlations across data of diverse sources, structures, subjects, and vintages.
  Finding just the right combination for successful analysis depends on data exploration as a first step.
High ease of use for user productivity:
  Some users are businesspeople who need a business-friendly view.
  Ease of use accelerates developers' productivity, too.
Support for all data platforms, from relational to Hadoop:
  A modern data exploration tool will merge diverse data via a single complex query.
A data exploration tool must do more than exploration:
  Profile data to understand its content and condition.
  Extract data, model the result set, and index big data.
  Deduce the data's structure and develop metadata.
  Perform tasks as you go, not ahead of time, for greater agility.
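The "profile data" and "deduce the data's structure" tasks can be illustrated with a small sketch. This example assumes raw CSV input and uses a deliberately naive type guess (int, float, or text); the sample columns and the profile and infer_type helpers are hypothetical, and real profiling tools go much further.

```python
import csv
import io
from collections import Counter


def infer_type(value: str) -> str:
    """Naively guess a type for one non-empty raw value."""
    try:
        int(value)
        return "int"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        return "text"


def profile(rows: list[dict[str, str]]) -> dict[str, dict]:
    """Per column: count of empty values, distinct non-empty values, dominant type."""
    report = {}
    for col in (rows[0].keys() if rows else []):
        values = [row.get(col) or "" for row in rows]
        present = [v for v in values if v != ""]
        types = Counter(infer_type(v) for v in present)
        report[col] = {
            "nulls": len(values) - len(present),
            "distinct": len(set(present)),
            "type": types.most_common(1)[0][0] if types else "unknown",
        }
    return report


# Hypothetical raw extract; the empty mileage field shows up as a null.
raw = "vin,mileage,dtc\nV1,12000,P0420\nV2,,P0301\nV3,8000,P0420\n"
rows = list(csv.DictReader(io.StringIO(raw)))
report = profile(rows)
print(report["mileage"])  # {'nulls': 1, 'distinct': 2, 'type': 'int'}
```

Because the profile is computed while reading, it fits the "perform tasks as you go" point: no schema has to be declared before the data is examined.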

Iterative, Four-Step Process for Exploratory Analytics with New (Big) Data
[Figure: an iterative cycle of four steps: Explore, Data Prep, Analyze, Visualize.]

A Few Requirements for Advanced Analytics
Market direction: seamless integration
  In one tool environment: exploration, data prep, analysis, visualization, and more
  The iterative, four-step process of exploratory analytics demands tight tool integration.
Advanced forms of analytics
  Mining, predictive analytics, statistics, NLP (not OLAP)
  Algorithmic, as well as query-based
  Both canned and home-grown algorithms: the tool should include a library of pre-built algorithms and should also help you write your own.
High ease of use for broad collaboration
  Functions for both technical and business users, who both develop analytic apps and consume them
  Assume that many user types will share their work.

SQL Is More Important than Ever
Data professionals want and depend on SQL.
  It must be ANSI standard, high-performance, iterative, and optimized.
  Why? To leverage user skills and SQL-based tool portfolios.
The SQL-on-Hadoop versus SQL-off-Hadoop argument: users interviewed want both!
  In the survey, SQL on Hadoop is a must-have (69%).
  Only 4% don't need SQL on Hadoop.
Source: TDWI survey run in late 2014, based on 99 respondents.

SQL-Based Analytics
Data exploration = ad hoc queries on steroids
  A query grows in size, scope, and complexity with each iteration.
  KLOCs = thousands of lines of [SQL] code, whether tool-generated, hand-written, or both
Complex SQL expresses many things:
  Data access via many interfaces, in near real time
  Data models, even dimensional ones
  Multi-way joins, but also complex transformations
A growing number and diversity of users:
  Data analysts, data scientists, BI/DW pros, business analysts
All the above demand a hefty tool environment, as described on the next slide.
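One way to see how an exploratory query grows with each iteration is to keep each step as a named common table expression and recompose the full statement every time. The sketch below assumes SQLite via Python's sqlite3 module and hypothetical orders and customers tables; the compose helper is illustrative, not a feature of any particular tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 10, 250.0), (2, 11, 40.0), (3, 10, 120.0), (4, 12, 300.0);
    CREATE TABLE customers (customer_id INTEGER, segment TEXT);
    INSERT INTO customers VALUES (10, 'retail'), (11, 'retail'), (12, 'wholesale');
""")


def compose(steps: list[tuple[str, str]], final: str) -> str:
    """Chain named steps into a single WITH ... SELECT statement."""
    ctes = ",\n".join(f"{name} AS ({sql})" for name, sql in steps)
    return f"WITH {ctes}\n{final}"


# Iteration 1: a simple ad hoc filter.
steps = [("big_orders", "SELECT order_id, customer_id, amount FROM orders WHERE amount >= 100")]
q1 = compose(steps, "SELECT COUNT(*) FROM big_orders")

# Iteration 2: keep the first step, add a join, then aggregate.
steps.append(("with_segment",
              "SELECT b.order_id, c.segment, b.amount "
              "FROM big_orders b JOIN customers c ON c.customer_id = b.customer_id"))
q2 = compose(steps,
             "SELECT segment, COUNT(*) AS n, SUM(amount) AS total "
             "FROM with_segment GROUP BY segment ORDER BY total DESC")

print(conn.execute(q1).fetchone()[0])  # 3
print(conn.execute(q2).fetchall())     # [('retail', 2, 370.0), ('wholesale', 1, 300.0)]
print(len(q1), "->", len(q2), "characters")
```

Keeping each step named makes the growing statement easier to review than one deeply nested query, which matters once it reaches thousands of lines.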

Summary & Conclusion: Tools and Requirements for Logical Data Warehousing and Other Complex Data Ecosystems
Look for tools and environments that enable:
  Designing and architecting a big picture
  Interoperability among diverse systems and data types
  Data operations optimized across multiple platforms
  ANSI SQL support, with performance for iterative queries
Features that help with complex data architectures:
  Distributed queries, in the extreme
  High performance, even with multiple platforms
  Metadata management and metadata deduction
  Easy ingestion of new data, whether streaming or static
  Real-time indexing, to keep pace with data ingestion
  Single sign-on security, despite multiple systems

Recommendations
Draw the big picture for its benefits:
  New ways to view data and develop queries and analytics
  Simplification for data architecture, governance, stewardship, compliance, auditing, security...
Revisit your mission as a data professional:
  Tons of data, sources, and source types, in many structures (or structure-free), persisted on old and new data platform types (virtualized, as appropriate)
  All the above, available all the time, for everyone
Satisfy new requirements with tools and platforms that provide a unified view:
  Virtual DW and miscellaneous approaches to real-time DW
  Query grid, data grid, data fabric
  Special functions: Hadoop, exploration, SQL-based analytics

Teradata QueryGrid
Imad Birouty, Director, Teradata Product Marketing

Data mart (1990s): "Just give me some data, and fast!"
EDW/IDW (2000s): "Give me good data, but do it efficiently!"
Logical data warehouse (2010s): "Give me all data, fast, simply, and effectively!"

What's Different Today?
There is no single technology that can do everything.
  New types of data
  New economic models
  New sources of data
  Higher volume of data
  New technologies
  Increased prevalence of analytics

What's the Same Today?
  Users need access to all relevant data to make informed business decisions.
  Users need timely access to data, when they need it.
  User skills and tools

Shift from a Single Platform to an Ecosystem: the "Logical" Data Warehouse
"We will abandon the old models based on the desire to implement for high-value analytic applications."

Not All Data Should Be Treated Equally
Data of different value:
  High value density: ERP, CRM, ...
  Low value density: sensors, weblogs, social, ...
Different processing techniques required:
  Structured data: SQL
  Multi-structured data: SQL, NoSQL
Different integration requirements:
  Schema pre-defined and data integrated upon acquisition (schema-on-write)
  Schema defined during query runtime (schema-on-read)
Regardless, data and analytics should be accessible.
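The contrast between schema-on-write and schema-on-read can be sketched in a few lines of Python. The customers table, the JSON sensor events, and the read_temps helper are hypothetical; SQLite stands in for a warehouse that enforces structure at load time, while a plain list of JSON strings stands in for raw data whose structure is applied only when it is read.

```python
import json
import sqlite3

# Schema-on-write: structure is declared up front and enforced at load time.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
dw.execute("INSERT INTO customers VALUES (1, 'Acme')")
try:
    dw.execute("INSERT INTO customers VALUES (2, NULL)")  # violates NOT NULL
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the malformed record never lands in the table

# Schema-on-read: raw records are kept as-is; structure is applied by the reader.
raw_events = [
    '{"device": "s1", "temp": 21.5}',
    '{"device": "s2", "temp": 19, "humidity": 40}',  # extra field is fine
    '{"device": "s1"}',                              # missing field is tolerated
]


def read_temps(lines):
    """Project each raw record onto (device: str, temp: float), skipping misfits."""
    for line in lines:
        record = json.loads(line)
        if "temp" in record:
            yield str(record["device"]), float(record["temp"])


print(rejected)                      # True
print(list(read_temps(raw_events)))  # [('s1', 21.5), ('s2', 19.0)]
```

The trade-off mirrors the slide: schema-on-write rejects bad data early, while schema-on-read accepts everything and pushes the structural decisions to each query.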

Data Fabric Enabled by QueryGrid
Analytic flexibility to meet your business needs
Pick your best-of-breed technology:
  Data types
  Analytic engines
  Economic options
Run the right analytic on the right platform:
  Minimize data movement; process data where it resides
  Minimize data duplication
  Optimized work distribution through pushdown processing
  Bi-directional data movement
Users direct their queries to a cohesive data fabric using existing SQL skills and tools.
Focus on data and business questions, not on integrating separate systems.
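Pushdown processing means sending a filter to the platform where the data lives so that only matching rows travel. This sketch assumes a single SQLite connection standing in for a remote platform and counts how many rows cross the boundary with and without pushdown; the table, the sample data, and both helper functions are hypothetical.

```python
import sqlite3

# Hypothetical "remote" platform holding detailed sensor rows.
remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE readings (vin TEXT, dtc TEXT)")
remote.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(f"V{i}", "P0420" if i % 10 == 0 else None) for i in range(1000)],
)


def without_pushdown(code: str) -> tuple[int, list[str]]:
    """Ship every row across the boundary, then filter on the caller's side."""
    rows = remote.execute("SELECT vin, dtc FROM readings").fetchall()
    return len(rows), [vin for vin, dtc in rows if dtc == code]


def with_pushdown(code: str) -> tuple[int, list[str]]:
    """Send the predicate to the remote engine; only matching rows come back."""
    rows = remote.execute("SELECT vin, dtc FROM readings WHERE dtc = ?", (code,)).fetchall()
    return len(rows), [vin for vin, _ in rows]


moved_all, hits_all = without_pushdown("P0420")
moved_push, hits_push = with_pushdown("P0420")
print(moved_all, moved_push)  # 1000 100
```

Both functions return the same matches; the difference is that pushdown moves a tenth of the rows in this toy data set.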

Teradata QueryGrid Demo

Metadata
Goal: view the databases in Hadoop
HELP FOREIGN SERVER hdp21;
Teradata Confidential

Metadata
Goal: view the tables in Hadoop
HELP FOREIGN DATABASE "default"@hdp21;

Metadata
Goal: view a specific table in Hadoop
HELP FOREIGN TABLE "default".carpricedata@hdp21;
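The HELP FOREIGN commands above are Teradata QueryGrid syntax. As a loose, runnable analogy only, the sketch below browses the same three levels of metadata (reachable databases, tables in a database, columns of a table) in SQLite, using an attached in-memory database that borrows the name hdp21 purely as a stand-in. None of these calls are QueryGrid APIs, and the carpricedata columns are assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS hdp21")  # stand-in name only, not a real server
conn.execute("CREATE TABLE hdp21.carpricedata (make TEXT, model TEXT, price REAL)")

# Roughly analogous to HELP FOREIGN SERVER: which databases are reachable?
databases = [name for _, name, _ in conn.execute("PRAGMA database_list")]

# Roughly analogous to HELP FOREIGN DATABASE: which tables does one database hold?
tables = [name for (name,) in conn.execute(
    "SELECT name FROM hdp21.sqlite_master WHERE type = 'table'")]

# Roughly analogous to HELP FOREIGN TABLE: which columns does one table have?
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA hdp21.table_info(carpricedata)")]

print(tables)   # ['carpricedata']
print(columns)  # [('make', 'TEXT'), ('model', 'TEXT'), ('price', 'REAL')]
```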

Querying a Hadoop Table
Goal: select a sample of rows from a Hadoop table
SELECT * FROM sample_08@hdp21;

Multi-System Query
For all cars that received warranty repair, find the reported Diagnostic Trouble Code (DTC).
  Requires data from Hadoop and the Teradata data warehouse
  The query is passed through; data is not persisted.
[Figure: Hadoop (raw multi-structured data: massive amounts of detailed sensor data) connected through Teradata QueryGrid to Teradata (production data: VINs, service records, warranty data, DTC descriptions).]
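The multi-system query can be approximated with two independent SQLite connections, one standing in for Hadoop sensor data and one for the warehouse's warranty and DTC tables. This is a hypothetical sketch, not how QueryGrid executes queries: the warehouse supplies the VINs, the VIN filter is pushed to the sensor store, and the join happens in memory, so nothing is persisted on either side. Table names, sample codes, and descriptions are assumed sample data.

```python
import sqlite3

# Hypothetical stand-ins: one connection per platform.
hadoop = sqlite3.connect(":memory:")
hadoop.execute("CREATE TABLE sensor_events (vin TEXT, dtc TEXT, ts INTEGER)")
hadoop.executemany("INSERT INTO sensor_events VALUES (?, ?, ?)", [
    ("V1", "P0420", 1), ("V2", "P0301", 2), ("V3", "P0171", 3), ("V1", "P0420", 4),
])

warehouse = sqlite3.connect(":memory:")
warehouse.executescript("""
    CREATE TABLE warranty_claims (vin TEXT, claim_id INTEGER);
    INSERT INTO warranty_claims VALUES ('V1', 101), ('V2', 102);
    CREATE TABLE dtc_descriptions (dtc TEXT, description TEXT);
    INSERT INTO dtc_descriptions VALUES
        ('P0420', 'Catalyst efficiency below threshold'),
        ('P0301', 'Cylinder 1 misfire detected'),
        ('P0171', 'System too lean');
""")


def warranty_dtcs() -> list[tuple[str, str, str]]:
    """For VINs with warranty claims, return (vin, dtc, description); nothing is persisted."""
    vins = [vin for (vin,) in warehouse.execute("SELECT DISTINCT vin FROM warranty_claims")]
    if not vins:
        return []
    placeholders = ",".join("?" for _ in vins)
    # Push the VIN filter to the sensor store so only relevant rows travel.
    events = hadoop.execute(
        f"SELECT DISTINCT vin, dtc FROM sensor_events WHERE vin IN ({placeholders})",
        vins,
    ).fetchall()
    descriptions = dict(warehouse.execute("SELECT dtc, description FROM dtc_descriptions"))
    return sorted((vin, dtc, descriptions.get(dtc, "unknown")) for vin, dtc in events)


for row in warranty_dtcs():
    print(row)
```

V3 never appears in the result because it has no warranty claim, even though its sensor events exist.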

Questions?

Contact Information
If you have further questions or comments:
Philip Russom, TDWI: prussom@tdwi.org
Imad Birouty, Teradata: imad.birouty@teradata.com