YOU SUN JEONG DATA ANALYTICS WITH DRUID
|
|
- Chester Underwood
- 5 years ago
- Views:
Transcription
1 YOU SUN JEONG DATA ANALYTICS WITH DRUID
2 2 WHO AM I? Senior Software Engineer of SK Telecom Commercial Products Big Data Discovery Solution (~ 16) Hadoop DW (~ 15) PaaS(CloudFoundry) (~ 13) Iaas (OpenStack) (~ 13) Mail to : jerryjung@apache.org
3 3 FOOTPRINTS Big Data Discovery - Streaming Processing Hadoop DW - Realtime NW Analytics 2014
4 4 AGENDA History What is Druid? Druid Architecture Real-Time Ingestion Demo (15m) Cohort Analysis (15m)
5 5 HISTORY Development started at Meta markets in 2011 Apache V2 in early contributors today
6 6 DATA LAKE
7 7 DW VS DATA LAKE
8 8 WHAT IS DRUID Distributed, In-memory Multi-dimensional OLAP store
9 9 PROBLEMS timestamp domain user gender clicked T00:01:35Z bieber.com Female T00:03:03Z bieber.com Female T00:04:51Z ultra.com Male T00:05:33Z ultra.com Male T00:05:53Z ultra.com Female T00:06:17Z ultra.com Female T00:23:15Z bieber.com Female T00:38:51Z ultra.com Male T00:49:33Z bieber.com Female T00:49:53Z ultra.com Female 0 timestamp impressions clicks T00:00:00Z 10 6 timestamp domain user gender clicked T00:01:35Z bieber.com Female T00:03:03Z bieber.com Female T00:04:51Z ultra.com Male T00:05:33Z ultra.com Male T00:05:53Z ultra.com Female T00:06:17Z ultra.com Female T00:23:15Z bieber.com Female T00:38:51Z ultra.com Male T00:49:33Z bieber.com Female T00:49:53Z ultra.com Female 0 timestamp domain gender impressions clicks T00:00:00Z bieber.com Female T00:00:00Z ultra.com Female T00:00:00Z ultra.com Male 3 2
10 10 BIG DATA DISCOVERY Roll-up Summarizing over a dimension Drill-down Focusing (zooming in) Slicing and dicing Reducing dimensions (slice) Picking values of specific dimensions (dice) Pivoting Rotating multi-dimensional cube
11 11 OLAP CUBE Slice and Dice
12 12 IN-MEMORY
13 13 COLUMNAR STORAGE
14 14 DRUID TERMS Data Timestamp Dimension Metric Datasource Segment Granularity
15 15 DRUID ARCHITECTURE REALTIME BROKER HISTORICAL
16 16 ARCHITECTURE - BATCH INGESTION Segments HISTORICAL HDFS HISTORICAL BROKER Queries HISTORICAL
17 17 ARCHITECTURE - STREAMING INGESTION Segments HISTORICAL Streaming REALTIME HISTORICAL BROKER Queries HISTORICAL
18 18 ARCHITECTURE - LAMBDA HDFS Segments HISTORICAL Streaming REALTIME HISTORICAL BROKER Queries HISTORICAL
19 19 GLUE ARCHITECTURE Kafka Indexing Service HISTORICAL Streaming Segments STREAM PROCESSOR (TRANQUILITY) REAL TIME TASK HISTORICAL BROKER Queries HISTORICAL
20 20 REAL WORLD ARCHITECTURE MYSQL COORDI NATOR MEMCACHED #1 HISTORICAL #1 DATA #1 HA PROXY BROKER #1 BROKER #1 MEMCACHED #2 MEMCACHED #3 HISTORICAL #N DATA #N ZK1 ZK2 ZK3 OVERLORD MIDDLE MANAGE #1 MIDDLE MANAGE #N
21 21 DRUID MONITORING
22 22 DRUID DATASOURCE
23 23 RDRUID
24 24 PYDROID
25 25 DEMO Jupyter Notebook(PyDruid) Mobile App User Events for 1 week : 2 billion events Scenario : Unique users Cohort Analysis
26 DEMO
27 27 MAY THE FORCE BE WITH YOU
28 28 REFERENCES Druid : ( : Cohort Analysis : Druid Meetup@Seoul :
29 29 POPIT
30 30 Q&A THANK YOU
Druid Power Interactive Applications at Scale. Jonathan Wei Software Engineer
Druid Power Interactive Applications at Scale Jonathan Wei Software Engineer History & Motivation Demo Overview Storage Internals Druid Architecture Motivation Motivation Visibility and analysis for complex
More informationNew Data Architectures For Netflow Analytics NANOG 74. Fangjin Yang - Imply
New Data Architectures For Netflow Analytics NANOG 74 Fangjin Yang - Cofounder @ Imply The Problem Comparing technologies Overview Operational analytic databases Try this at home The Problem Netflow data
More informationBig Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara
Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case
More informationCSPP 53017: Data Warehousing Winter 2013! Lecture 7! Svetlozar Nestorov! Class News!
CSPP 53017: Data Warehousing Winter 2013! Lecture 7! Svetlozar Nestorov! Class News! Make-up class on Saturday, Mar 9 in Gleacher 203 10:30am 1:30pm.! Last 15 minute in-class quiz (6:30pm) on Mar 5.! Covers
More informationData Mining Concepts & Techniques
Data Mining Concepts & Techniques Lecture No. 01 Databases, Data warehouse Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro
More informationBlended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)
Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a) Cloudera s Developer Training for Apache Spark and Hadoop delivers the key concepts and expertise need to develop high-performance
More informationOverview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::
Title Duration : Apache Spark Development : 4 days Overview Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized
More informationDecision Support Systems aka Analytical Systems
Decision Support Systems aka Analytical Systems Decision Support Systems Systems that are used to transform data into information, to manage the organization: OLAP vs OLTP OLTP vs OLAP Transactions Analysis
More informationUsing Druid and Apache Hive
3 Using Druid and Apache Hive Date of Publish: 2018-07-12 http://docs.hortonworks.com Contents Accelerating Hive queries using Druid... 3 How Druid indexes Hive data... 3 Transform Apache Hive Data to
More informationData contains value and knowledge
Data contains value and knowledge What is the purpose of big data systems? To support analysis and knowledge discovery from very large amounts of data But to extract the knowledge data needs to be Stored
More informationData Warehouse and Data Mining
Data Warehouse and Data Mining Lecture No. 04-06 Data Warehouse Architecture Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology
More informationAn Overview of Data Warehousing and OLAP Technology
An Overview of Data Warehousing and OLAP Technology CMPT 843 Karanjit Singh Tiwana 1 Intro and Architecture 2 What is Data Warehouse? Subject-oriented, integrated, time varying, non-volatile collection
More informationTime Series Storage with Apache Kudu (incubating)
Time Series Storage with Apache Kudu (incubating) Dan Burkert (Committer) dan@cloudera.com @danburkert Tweet about this talk: @getkudu or #kudu 1 Time Series machine metrics event logs sensor telemetry
More informationData Architectures in Azure for Analytics & Big Data
Data Architectures in for Analytics & Big Data October 20, 2018 Melissa Coates Solution Architect, BlueGranite Microsoft Data Platform MVP Blog: www.sqlchick.com Twitter: @sqlchick Data Architecture A
More informationEvolution of Big Data Facebook. Architecture Summit, Shenzhen, August 2012 Ashish Thusoo
Evolution of Big Data Architectures@ Facebook Architecture Summit, Shenzhen, August 2012 Ashish Thusoo About Me Currently Co-founder/CEO of Qubole Ran the Data Infrastructure Team at Facebook till 2011
More informationBIG DATA COURSE CONTENT
BIG DATA COURSE CONTENT [I] Get Started with Big Data Microsoft Professional Orientation: Big Data Duration: 12 hrs Course Content: Introduction Course Introduction Data Fundamentals Introduction to Data
More informationBringing Data to Life
Bringing Data to Life Data management and Visualization Techniques Benika Hall Rob Harrison Corporate Model Risk March 16, 2018 Introduction Benika Hall Analytic Consultant Wells Fargo - Corporate Model
More informationCHAPTER 8 DECISION SUPPORT V2 ADVANCED DATABASE SYSTEMS. Assist. Prof. Dr. Volkan TUNALI
CHAPTER 8 DECISION SUPPORT V2 ADVANCED DATABASE SYSTEMS Assist. Prof. Dr. Volkan TUNALI Topics 2 Business Intelligence (BI) Decision Support System (DSS) Data Warehouse Online Analytical Processing (OLAP)
More informationQUALITY MONITORING AND
BUSINESS INTELLIGENCE FOR CMS DATA QUALITY MONITORING AND DATA CERTIFICATION. Author: Daina Dirmaite Supervisor: Broen van Besien CERN&Vilnius University 2016/08/16 WHAT IS BI? Business intelligence is
More informationDruid Data Ingest. Wayne M Adams Data Science and Business Analy7cs Meetup 26 February 2014
Druid Data Ingest Wayne M Adams Data Science and Business Analy7cs Meetup 26 February 2014 By Druid, we mean The column- oriented, distributed, real- 7me analy7c datastore (hjp://druid.io/ and hjps://github.com/metamx/druid)
More informationJupyter and Spark on Mesos: Best Practices. June 21 st, 2017
Jupyter and Spark on Mesos: Best Practices June 2 st, 207 Agenda About me What is Spark & Jupyter Demo How Spark+Mesos+Jupyter work together Experience Q & A About me Graduated from EE @ Tsinghua Univ.
More informationETL and OLAP Systems
ETL and OLAP Systems Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, first semester
More informationMaking Data Integration Easy For Multiplatform Data Architectures With Diyotta 4.0. WEBINAR MAY 15 th, PM EST 10AM PST
Making Data Integration Easy For Multiplatform Data Architectures With Diyotta 4.0 WEBINAR MAY 15 th, 2018 1PM EST 10AM PST Welcome and Logistics If you have problems with the sound on your computer, switch
More informationIncrease Value from Big Data with Real-Time Data Integration and Streaming Analytics
Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics Cy Erbay Senior Director Striim Executive Summary Striim is Uniquely Qualified to Solve the Challenges of Real-Time
More informationSecurity and Performance advances with Oracle Big Data SQL
Security and Performance advances with Oracle Big Data SQL Jean-Pierre Dijcks Oracle Redwood Shores, CA, USA Key Words SQL, Oracle, Database, Analytics, Object Store, Files, Big Data, Big Data SQL, Hadoop,
More informationInteractive SQL-on-Hadoop from Impala to Hive/Tez to Spark SQL to JethroData
Interactive SQL-on-Hadoop from Impala to Hive/Tez to Spark SQL to JethroData ` Ronen Ovadya, Ofir Manor, JethroData About JethroData Founded 2012 Raised funding from Pitango in 2013 Engineering in Israel,
More informationDATA WAREHOUSE EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY
DATA WAREHOUSE EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY CHARACTERISTICS Data warehouse is a central repository for summarized and integrated data
More informationChapter 18: Data Analysis and Mining
Chapter 18: Data Analysis and Mining Database System Concepts See www.db-book.com for conditions on re-use Chapter 18: Data Analysis and Mining Decision Support Systems Data Analysis and OLAP 18.2 Decision
More informationStreaming analytics better than batch - when and why? _Adam Kawa - Dawid Wysakowicz_
Streaming analytics better than batch - when and why? _Adam Kawa - Dawid Wysakowicz_ About Us At GetInData, we build custom Big Data solutions Hadoop, Flink, Spark, Kafka and more Our team is today represented
More informationElasticsearch. Presented by: Steve Mayzak, Director of Systems Engineering Vince Marino, Account Exec
Elasticsearch Presented by: Steve Mayzak, Director of Systems Engineering Vince Marino, Account Exec What about Elasticsearch the Company?! Support 100s of Companies in Production environments Training
More informationWHITEPAPER. MemSQL Enterprise Feature List
WHITEPAPER MemSQL Enterprise Feature List 2017 MemSQL Enterprise Feature List DEPLOYMENT Provision and deploy MemSQL anywhere according to your desired cluster configuration. On-Premises: Maximize infrastructure
More informationMODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS
MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS SUJEE MANIYAM FOUNDER / PRINCIPAL @ ELEPHANT SCALE www.elephantscale.com sujee@elephantscale.com HI, I M SUJEE MANIYAM Founder / Principal @ ElephantScale
More informationPNDA.io: when BGP meets Big-Data
PNDA.io: when BGP meets Big-Data Let s go back in time 26 th April 2017 The Internet is very much alive Millions of BGP events occurring every day 15 Routers Monitored 410 active peers (both IPv4 and IPv6)
More informationStream Processing Platforms Storm, Spark,.. Batch Processing Platforms MapReduce, SparkSQL, BigQuery, Hive, Cypher,...
Data Ingestion ETL, Distcp, Kafka, OpenRefine, Query & Exploration SQL, Search, Cypher, Stream Processing Platforms Storm, Spark,.. Batch Processing Platforms MapReduce, SparkSQL, BigQuery, Hive, Cypher,...
More informationData Mining & Analytics Data Mining Reference Model Data Warehouse Legal and Ethical Issues. Slides by Michael Hahsler
Data Mining & Analytics Data Mining Reference Model Data Warehouse Legal and Ethical Issues Slides by Michael Hahsler Data Mining & Analytics Analytics is the discovery and communication of meaningful
More informationVOLTDB + HP VERTICA. page
VOLTDB + HP VERTICA ARCHITECTURE FOR FAST AND BIG DATA ARCHITECTURE FOR FAST + BIG DATA FAST DATA Fast Serve Analytics BIG DATA BI Reporting Fast Operational Database Streaming Analytics Columnar Analytics
More informationOrchestration of Data Lakes BigData Analytics and Integration. Sarma Sishta Brice Lambelet
Orchestration of Data Lakes BigData Analytics and Integration Sarma Sishta Brice Lambelet Introduction The Five Megatrends Driving Our Digitized World And Their Implications for Distributed Big Data Management
More informationStep-by-step data transformation
Step-by-step data transformation Explanation of what BI4Dynamics does in a process of delivering business intelligence Contents 1. Introduction... 3 Before we start... 3 1 st. STEP: CREATING A STAGING
More informationHOW TO ACHIEVE REAL-TIME ANALYTICS ON A DATA LAKE USING GPUS. Mark Brooks - Principal System Kinetica May 09, 2017
HOW TO ACHIEVE REAL-TIME ANALYTICS ON A DATA LAKE USING GPUS Mark Brooks - Principal System Engineer @ Kinetica May 09, 2017 The Challenge: How to maintain analytic performance while dealing with: Larger
More informationReal-time Streaming Applications on AWS Patterns and Use Cases
Real-time Streaming Applications on AWS Patterns and Use Cases Paul Armstrong - Solutions Architect (AWS) Tom Seddon - Data Engineering Tech Lead (Deliveroo) 28 th June 2017 2016, Amazon Web Services,
More informationDrawing the Big Picture
Drawing the Big Picture Multi-Platform Data Architectures, Queries, and Analytics Philip Russom TDWI Research Director for Data Management August 26, 2015 Sponsor 2 Speakers Philip Russom TDWI Research
More informationOverview of Data Services and Streaming Data Solution with Azure
Overview of Data Services and Streaming Data Solution with Azure Tara Mason Senior Consultant tmason@impactmakers.com Platform as a Service Offerings SQL Server On Premises vs. Azure SQL Server SQL Server
More informationData Ingestion at Scale. Jeffrey Sica
Data Ingestion at Scale Jeffrey Sica ARC-TS @jeefy Overview What is Data Ingestion? Concepts Use Cases GPS collection with mobile devices Collecting WiFi data from WAPs Sensor data from manufacturing machines
More informationEvolution of Database Systems
Evolution of Database Systems Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Intelligent Decision Support Systems Master studies, second
More informationPřehled novinek v SQL Server 2016
Přehled novinek v SQL Server 2016 Martin Rys, BI Competency Leader martin.rys@adastragrp.com https://www.linkedin.com/in/martinrys 20.4.2016 1 BI Competency development 2 Trends, modern data warehousing
More informationDATA WAREHOUING UNIT I
BHARATHIDASAN ENGINEERING COLLEGE NATTRAMAPALLI DEPARTMENT OF COMPUTER SCIENCE SUB CODE & NAME: IT6702/DWDM DEPT: IT Staff Name : N.RAMESH DATA WAREHOUING UNIT I 1. Define data warehouse? NOV/DEC 2009
More informationMaking the Most of Hadoop with Optimized Data Compression (and Boost Performance) Mark Cusack. Chief Architect RainStor
Making the Most of Hadoop with Optimized Data Compression (and Boost Performance) Mark Cusack Chief Architect RainStor Agenda Importance of Hadoop + data compression Data compression techniques Compression,
More informationData-Intensive Distributed Computing
Data-Intensive Distributed Computing CS 451/651 431/631 (Winter 2018) Part 5: Analyzing Relational Data (1/3) February 8, 2018 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo
More informationBuilding a Scalable Recommender System with Apache Spark, Apache Kafka and Elasticsearch
Nick Pentreath Nov / 14 / 16 Building a Scalable Recommender System with Apache Spark, Apache Kafka and Elasticsearch About @MLnick Principal Engineer, IBM Apache Spark PMC Focused on machine learning
More informationBig Data Facebook
Big Data Architectures@ Facebook QCon London 2012 Ashish Thusoo Outline Big Data @ Facebook - Scope & Scale Evolution of Big Data Architectures @ FB Past, Present and Future Questions Big Data @ FB: Scale
More informationIT DATA WAREHOUSING AND DATA MINING UNIT-2 BUSINESS ANALYSIS
PART A 1. What are production reporting tools? Give examples. (May/June 2013) Production reporting tools will let companies generate regular operational reports or support high-volume batch jobs. Such
More information#mstrworld. Analyzing Multiple Data Sources with Multisource Data Federation and In-Memory Data Blending. Presented by: Trishla Maru.
Analyzing Multiple Data Sources with Multisource Data Federation and In-Memory Data Blending Presented by: Trishla Maru Agenda Overview MultiSource Data Federation Use Cases Design Considerations Data
More informationArchitectural challenges for building a low latency, scalable multi-tenant data warehouse
Architectural challenges for building a low latency, scalable multi-tenant data warehouse Mataprasad Agrawal Solutions Architect, Services CTO 2017 Persistent Systems Ltd. All rights reserved. Our analytics
More informationBig Data Infrastructure at Spotify
Big Data Infrastructure at Spotify Wouter de Bie Team Lead Data Infrastructure September 26, 2013 2 Who am I? According to ZDNet: "The work they have done to improve the Apache Hive data warehouse system
More informationData Lake Best Practices
Data Lake Best Practices Agenda Why Data Lake Key Components of a Data Lake Modern Data Architecture Some Best Practices Case Study Summary Takeaways What is a Data Lake? What, why etc. What is a data
More informationAn Information Asset Hub. How to Effectively Share Your Data
An Information Asset Hub How to Effectively Share Your Data Hello! I am Jack Kennedy Data Architect @ CNO Enterprise Data Management Team Jack.Kennedy@CNOinc.com 1 4 Data Functions Your Data Warehouse
More informationData Warehousing & OLAP
CMPUT 391 Database Management Systems Data Warehousing & OLAP Textbook: 17.1 17.5 (first edition: 19.1 19.5) Based on slides by Lewis, Bernstein and Kifer and other sources University of Alberta 1 Why
More informationCHAPTER 8: ONLINE ANALYTICAL PROCESSING(OLAP)
CHAPTER 8: ONLINE ANALYTICAL PROCESSING(OLAP) INTRODUCTION A dimension is an attribute within a multidimensional model consisting of a list of values (called members). A fact is defined by a combination
More informationModern Data Warehouse The New Approach to Azure BI
Modern Data Warehouse The New Approach to Azure BI History On-Premise SQL Server Big Data Solutions Technical Barriers Modern Analytics Platform On-Premise SQL Server Big Data Solutions Modern Analytics
More informationApache Kylin. OLAP on Hadoop
Apache Kylin OLAP on Hadoop Agenda What s Apache Kylin? Tech Highlights Performance Roadmap Q & A http://kylin.io What s Kylin kylin / ˈkiːˈlɪn / 麒麟 --n. (in Chinese art) a mythical animal of composite
More informationThe University of Iowa Intelligent Systems Laboratory The University of Iowa Intelligent Systems Laboratory
Warehousing Outline Andrew Kusiak 2139 Seamans Center Iowa City, IA 52242-1527 andrew-kusiak@uiowa.edu http://www.icaen.uiowa.edu/~ankusiak Tel. 319-335 5934 Introduction warehousing concepts Relationship
More informationCONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED DATA PLATFORM
CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED PLATFORM Executive Summary Financial institutions have implemented and continue to implement many disparate applications
More informationDistributed systems for stream processing
Distributed systems for stream processing Apache Kafka and Spark Structured Streaming Alena Hall Alena Hall Large-scale data processing Distributed Systems Functional Programming Data Science & Machine
More informationOskari Heikkinen. New capabilities of Azure Data Factory v2
Oskari Heikkinen New capabilities of Azure Data Factory v2 Oskari Heikkinen Lead Cloud Architect at BIGDATAPUMP Microsoft P-TSP Azure Advisors Numerous projects on Azure Worked with Microsoft Data Platform
More informationBig Data on AWS. Peter-Mark Verwoerd Solutions Architect
Big Data on AWS Peter-Mark Verwoerd Solutions Architect What to get out of this talk Non-technical: Big Data processing stages: ingest, store, process, visualize Hot vs. Cold data Low latency processing
More informationMicrosoft Developer Day
Microsoft Developer Day Pradeep Menon Microsoft Developer Day Solutions Architect Agenda Microsoft Developer Day Traditional Business Intelligence Architecture Structured Sources Extract Transform Structurize
More informationIntro to Big Data on AWS Igor Roiter Big Data Cloud Solution Architect
Intro to Big Data on AWS Igor Roiter Big Data Cloud Solution Architect Igor Roiter Big Data Cloud Solution Architect Working as a Data Specialist for the last 11 years 9 of them as a Consultant specializing
More informationData 101 Which DB, When. Joe Yong Azure SQL Data Warehouse, Program Management Microsoft Corp.
Data 101 Which DB, When Joe Yong (joeyong@microsoft.com) Azure SQL Data Warehouse, Program Management Microsoft Corp. The world is changing AI increased by 300% in 2017 Data will grow to 44 ZB in 2020
More information12 June 2015, Free University of Bozen-Bolzano. Building Large Scale Recommender Systems. Omar Moling
12 June 2015, Free University of Bozen-Bolzano Building Large Scale Recommender Systems Omar Moling AGENDA Intro Data Algorithms Systems 42MATTERS FACTS Founded in 2011 in Zurich, Switzerland Very strong
More informationBig Trend in Business Intelligence: Data Mining over Big Data Web Transaction Data. Fall 2012
Big Trend in Business Intelligence: Data Mining over Big Data Web Transaction Data Fall 2012 Data Warehousing and OLAP Introduction Decision Support Technology On Line Analytical Processing Star Schema
More informationHolodesk A distributed in-memory columnar store for interactive analysis
Holodesk A distributed in-memory columnar store for interactive analysis 张常淳星环科技 www.transwarp.io 05-7-9 www.transwarp.io OUTLINE Overview Architecture Optimization Technique Update & Delete 05-7-9 www.transwarp.io
More informationLambda Architecture for Batch and Stream Processing. October 2018
Lambda Architecture for Batch and Stream Processing October 2018 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document is provided for informational purposes only.
More informationLectures for the course: Data Warehousing and Data Mining (IT 60107)
Lectures for the course: Data Warehousing and Data Mining (IT 60107) Week 1 Lecture 1 21/07/2011 Introduction to the course Pre-requisite Expectations Evaluation Guideline Term Paper and Term Project Guideline
More informationDelving Deep into Hadoop Course Contents Introduction to Hadoop and Architecture
Delving Deep into Hadoop Course Contents Introduction to Hadoop and Architecture Hadoop 1.0 Architecture Introduction to Hadoop & Big Data Hadoop Evolution Hadoop Architecture Networking Concepts Use cases
More informationMariaDB MaxScale 2.0 and ColumnStore 1.0 for the Boston MySQL Meetup Group Jon Day, Solution Architect - MariaDB
MariaDB MaxScale 2.0 and ColumnStore 1.0 for the Boston MySQL Meetup Group Jon Day, Solution Architect - MariaDB 2016 MariaDB Corporation Ab 1 Tonight s Topics: MariaDB MaxScale 2.0 Currently in Beta MariaDB
More informationData Warehouse Design Decisions
Data Warehouse Design Decisions August 2015 Colleen Barnitz Director, IT Development MVT Services Colleen Barnitz over 20 Years in IT worked with SQL Server since version 6.5 developer and an architect
More informationAgenda. Spark Platform Spark Core Spark Extensions Using Apache Spark
Agenda Spark Platform Spark Core Spark Extensions Using Apache Spark About me Vitalii Bondarenko Data Platform Competency Manager Eleks www.eleks.com 20 years in software development 9+ years of developing
More informationCapture Business Opportunities from Systems of Record and Systems of Innovation
Capture Business Opportunities from Systems of Record and Systems of Innovation Amit Satoor, SAP March Hartz, SAP PUBLIC Big Data transformation powers digital innovation system Relevant nuggets of information
More informationMicrosoft Exam
Volume: 42 Questions Case Study: 1 Relecloud General Overview Relecloud is a social media company that processes hundreds of millions of social media posts per day and sells advertisements to several hundred
More informationData Warehousing. Ritham Vashisht, Sukhdeep Kaur and Shobti Saini
Advance in Electronic and Electric Engineering. ISSN 2231-1297, Volume 3, Number 6 (2013), pp. 669-674 Research India Publications http://www.ripublication.com/aeee.htm Data Warehousing Ritham Vashisht,
More informationAzure Data Factory VS. SSIS. Reza Rad, Consultant, RADACAD
Azure Data Factory VS. SSIS Reza Rad, Consultant, RADACAD 2 Please silence cell phones Explore Everything PASS Has to Offer FREE ONLINE WEBINAR EVENTS FREE 1-DAY LOCAL TRAINING EVENTS VOLUNTEERING OPPORTUNITIES
More informationAdnan YAZICI Computer Engineering Department
Data Warehouse Adnan YAZICI Computer Engineering Department Middle East Technical University, A.Yazici, 2010 Definition A data warehouse is a subject-oriented integrated time-variant nonvolatile collection
More informationData Modeling and Databases Ch 7: Schemas. Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich
Data Modeling and Databases Ch 7: Schemas Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich Database schema A Database Schema captures: The concepts represented Their attributes
More informationCarbonData : An Indexed Columnar File Format For Interactive Query HUAWEI TECHNOLOGIES CO., LTD.
CarbonData : An Indexed Columnar File Format For Interactive Query HUAWEI TECHNOLOGIES CO., LTD. Outline u Motivation : Why introducing a new file format? u CarbonData Deep Dive u Tuning Hint 2 Big Data
More informationData Warehousing and Decision Support (mostly using Relational Databases) CS634 Class 20
Data Warehousing and Decision Support (mostly using Relational Databases) CS634 Class 20 Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke, Chapter 25 Introduction Increasingly,
More informationActivator Library. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success.
Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success. ACTIVATORS Designed to give your team assistance when you need it most without
More informationFrom Single Purpose to Multi Purpose Data Lakes. Thomas Niewel Technical Sales Director DACH Denodo Technologies March, 2019
From Single Purpose to Multi Purpose Data Lakes Thomas Niewel Technical Sales Director DACH Denodo Technologies March, 2019 Agenda Data Lakes Multiple Purpose Data Lakes Customer Example Demo Takeaways
More informationIOTA ARCHITECTURE: DATA VIRTUALIZATION AND PROCESSING MEDIUM DR. KONSTANTIN BOUDNIK DR. ALEXANDRE BOUDNIK
IOTA ARCHITECTURE: DATA VIRTUALIZATION AND PROCESSING MEDIUM DR. KONSTANTIN BOUDNIK DR. ALEXANDRE BOUDNIK DR. KONSTANTIN BOUDNIK DR.KONSTANTIN BOUDNIK EPAM SYSTEMS CHIEF TECHNOLOGIST BIGDATA, OPEN SOURCE
More information#MicroFocusCyberSummit
#MicroFocusCyberSummit Data Simplicity: ArcSight Data Platform enhances enterprise data via the Common Event Format Peter Titov Micro Focus #MicroFocusCyberSummit Agenda Usage Ingestion Management Solutions
More informationCSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores
CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores Announcements Shumo office hours change See website for details HW2 due next Thurs
More informationOSIsoft Cloud Services Core Infrastructure for Developing Partner Applications
OSIsoft Cloud Services Core Infrastructure for Developing Partner Applications Presented by Laurent Garrigues, Gregg Le Blanc, Paul Kaiser Agenda Overview Platform Tour Demo Partner Preview Program Q&A
More informationRocky Mountain Technology Ventures
Rocky Mountain Technology Ventures Comparing and Contrasting Online Analytical Processing (OLAP) and Online Transactional Processing (OLTP) Architectures 3/19/2006 Introduction One of the most important
More informationIntroduction to Big-Data
Introduction to Big-Data Ms.N.D.Sonwane 1, Mr.S.P.Taley 2 1 Assistant Professor, Computer Science & Engineering, DBACER, Maharashtra, India 2 Assistant Professor, Information Technology, DBACER, Maharashtra,
More informationData Warehouses. Yanlei Diao. Slides Courtesy of R. Ramakrishnan and J. Gehrke
Data Warehouses Yanlei Diao Slides Courtesy of R. Ramakrishnan and J. Gehrke Introduction v In the late 80s and early 90s, companies began to use their DBMSs for complex, interactive, exploratory analysis
More informationInformation Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 07 : 06/11/2012 Data Mining: Concepts and Techniques (3 rd ed.) Chapter
More informationMIS2502: Data Analytics The Information Architecture of an Organization. Jing Gong
MIS2502: Data Analytics The Information Architecture of an Organization Jing Gong gong@temple.edu http://community.mis.temple.edu/gong What Do You Do With Data? Gather Retrieve Interpret The Information
More informationIDU0010 ERP,CRM ja DW süsteemid Loeng 5 DW concepts. Enn Õunapuu
IDU0010 ERP,CRM ja DW süsteemid Loeng 5 DW concepts Enn Õunapuu enn.ounapuu@ttu.ee Content Oveall approach Dimensional model Tabular model Overall approach Data modeling is a discipline that has been practiced
More informationLambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015
Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document
More informationBasics of Dimensional Modeling
Basics of Dimensional Modeling Data warehouse and OLAP tools are based on a dimensional data model. A dimensional model is based on dimensions, facts, cubes, and schemas such as star and snowflake. Dimension
More informationPersonalizing Netflix with Streaming datasets
Personalizing Netflix with Streaming datasets Shriya Arora Senior Data Engineer Personalization Analytics @shriyarora What is this talk about? Helping you decide if a streaming pipeline fits your ETL problem
More information