How eharmony Turns Big Data into True Love Sridhar Chiguluri, Lead ETL Developer eharmony
Transcription
1 How eharmony Turns Big Data into True Love
Sridhar Chiguluri, Lead ETL Developer, eharmony
Grant Parsamyan, Director of BI & Data Warehousing, eharmony
2 Agenda Company Overview What is Big Data? Challenges Implementation Phase 1 Architecture
3 Company Overview
eharmony was founded in 2000 and pioneered the use of relationship science to match singles seeking long-term relationships. Today the company offers a variety of relationship services in the United States, Canada, Australia, the United Kingdom and Brazil, with members in more than 150 countries around the world. With more than 40 million registered users, eharmony's highly regarded singles matching service is a market leader in online relationships. On average, 542 eharmony members marry every day in the United States as a result of being matched on the site.* eharmony also operates Jazzed.com, a casual and fun dating site where users can browse their matches directly.
4 Data Analytics Group
- Our team (DAG) is responsible for providing Business Analytics and reporting solutions to internal Business Users across all departments.
- Each person on the team is responsible for a specific business unit: Accounting, Finance, Marketing, Customer Care, Life Cycle Marketing and International.
- Business users have very limited direct data access.
- All the data is provided through ad hoc SQL and MicroStrategy reports.
5 Big Data
- Gartner: 'Big Data' Is Only the Beginning of Extreme Information Management
- McKinsey & Company: Big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.
6 Big Data Event: JSON
JavaScript Object Notation
Widely hailed as the successor to XML in the browser, JSON aspires to be nothing more than a simple and elegant data format for the exchange of information between the browser and server; and in doing this simple task it will usher in the next version of the World Wide Web itself.
o JSON can be represented in two structures:
Object - Unordered set of name/value pairs
Array - Ordered collection of values
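The two JSON structures can be seen directly with Python's standard json module. This is only a small illustration; the sample values are made up for the example and are not taken from an eharmony event.

```python
import json

# Hypothetical sample values, chosen only for illustration.
obj = json.loads('{"name": "agerangemin", "newvalue": 18}')  # JSON object -> Python dict
arr = json.loads('[18, 24, "singles"]')                      # JSON array  -> Python list

# An object is looked up by name; an array is looked up by position.
print(type(obj).__name__, obj["newvalue"])
print(type(arr).__name__, arr[0])
```

An event can nest the two freely, for example an object whose value is an array of objects.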
7 Sample JSON event (figure with three labeled sections: Context, Changes, Header)
8 JSON rows as they appear in the database after being flattened out by HParser
Columns: CATEGORY | ENTITY_ID | ID | PRODUCER | EVENT_TIMESTAMP | PROPERTY_NAME | PROPERTY_NEW_VALUE | PROPERTY_SOURCE
qaasanswers.data.update | singles-7- | a2547c49-6a75-4c50-9ad4-8c7bc023447f | QAAS | 2/16/ :31 | locale | en_us | CONTEXT
qaasanswers.data.update | singles-7- | a2547c49-6a75-4c50-9ad4-8c7bc023447f | QAAS | 2/16/ :31 | useranswers[singles ].desc | | CHANGE
qaasanswers.data.update | singles-7- | a2547c49-6a75-4c50-9ad4-8c7bc023447f | QAAS | 2/16/ :31 | site | singles | CONTEXT
qaasanswers.data.update | singles-7- | a2547c49-6a75-4c50-9ad4-8c7bc023447f | QAAS | 2/16/ :31 | useranswers[singles ].ignored | TRUE | CHANGE
qaasanswers.data.update | singles-7- | a2547c49-6a75-4c50-9ad4-8c7bc023447f | QAAS | 2/16/ :31 | type | 7 | CONTEXT
qaasanswers.data.update | singles-7- | a2547c49-6a75-4c50-9ad4-8c7bc023447f | QAAS | 2/16/ :31 | useranswers[singles ].type | MULTISELECT | CHANGE
qaasanswers.data.update | singles-7- | a2547c49-6a75-4c50-9ad4-8c7bc023447f | QAAS | 2/16/ :31 | useranswers | {"type":7,"version":1} | CONTEXT
qaasanswers.data.update | singles-7- | a2547c49-6a75-4c50-9ad4-8c7bc023447f | QAAS | 2/16/ :31 | useranswers[singles ].answer | [] | CHANGE
qaasanswers.data.update | singles-7- | a2547c49-6a75-4c50-9ad4-8c7bc023447f | QAAS | 2/16/ :31 | useranswers[singles ].date | | CHANGE
qaasanswers.data.update | singles-7- | a2547c49-6a75-4c50-9ad4-8c7bc023447f | QAAS | 2/16/ :31 | userid | | CONTEXT
qaasanswers.data.update | singles-7- | a2547c49-6a75-4c50-9ad4-8c7bc023447f | QAAS | 2/16/ :31 | version | 1 | CONTEXT
9 Sections in a JSON
Changes: contains the list of variables that have changed, which resulted in this event's generation.
Sample row where a User chose their desired age range for their match:
"changes":[{"name":"agerangemin","newvalue":18,"oldvalue":0},{"name":"agerangemax","newvalue":24,"oldvalue":0}]
Context: provides contextual information to the changes, such as User Id, User Name, etc.
Sample row showing User's Name and Match details:
"context":{"userfirstname":"John","userLocation":"Santa Monica, CA","matchId":"353861","matchUserId":" "}
Header: provides Header level information.
Sample header row:
"headers":{"id":"03c57fe3-21bd-4bde-8c5a-679b5fb3c38a","x-category":"mds_savematch.resource.post","x-instance":"matchdata01-i8","x-timestamp":" t00:46: "}
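To make the three sections concrete, here is a minimal Python sketch of flattening one event into rows with a PROPERTY_NAME, PROPERTY_NEW_VALUE and PROPERTY_SOURCE, in the spirit of the flattened table on the previous slide. It is not HParser and not the presenters' code. The function name flatten_event is hypothetical, the sample event combines the separate sample rows above into one event for illustration, and the x-timestamp value is a placeholder because the original is truncated.

```python
import json

def flatten_event(event):
    """Return one flat row per context property and per changed property."""
    headers = event.get("headers", {})
    base = {
        "ID": headers.get("id"),
        "CATEGORY": headers.get("x-category"),
        "EVENT_TIMESTAMP": headers.get("x-timestamp"),
    }
    rows = []
    # Every key in the context section becomes a CONTEXT row.
    for name, value in event.get("context", {}).items():
        rows.append({**base, "PROPERTY_NAME": name,
                     "PROPERTY_NEW_VALUE": value, "PROPERTY_SOURCE": "CONTEXT"})
    # Every entry in the changes list becomes a CHANGE row.
    for change in event.get("changes", []):
        rows.append({**base, "PROPERTY_NAME": change["name"],
                     "PROPERTY_NEW_VALUE": change["newvalue"], "PROPERTY_SOURCE": "CHANGE"})
    return rows

# Illustrative event assembled from the sample rows; the timestamp is a placeholder.
SAMPLE_EVENT = json.loads("""
{
  "headers": {"id": "03c57fe3-21bd-4bde-8c5a-679b5fb3c38a",
              "x-category": "mds_savematch.resource.post",
              "x-timestamp": "PLACEHOLDER"},
  "context": {"userfirstname": "John", "matchId": "353861"},
  "changes": [{"name": "agerangemin", "newvalue": 18, "oldvalue": 0},
              {"name": "agerangemax", "newvalue": 24, "oldvalue": 0}]
}
""")

rows = flatten_event(SAMPLE_EVENT)
for row in rows:
    print(row["PROPERTY_SOURCE"], row["PROPERTY_NAME"], row["PROPERTY_NEW_VALUE"])
```

Repeating the header fields on every row is what lets the flat rows be regrouped by event later.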
10 Challenges
- Millions of Events generated every hour as JSON files
- How to handle the large volume?
- No relational source database, how to process JSON?
- How do you create reporting that finds trends in that large amount of data?
- Quick turnaround time for prototypes
- Create an analytics stack that could process large amounts of data and have real time reporting.
- Achieve a 3-week release cycle to provide reporting solutions on new event structure
11 Phase 1 - Duration: 3 Months
Step 1: Processing the JSON event files each hour
Step 2: Flattening the JSON events (most tricky)
Step 3: Loading the flattened events into Netezza staging tables
Step 4: Finding the relationships
Step 5: Defining the Data Model
Step 6: ETL (Extract, Transform and Load)
Step 7: Building MicroStrategy Reports and Dashboards
Step 8: Storing Historical Data/Events
12 Step 1, 2 & 3: Reading, Flattening and Loading Events
- Events are stored in text files.
- HParser & scripts process the files every hour and flatten each event into CSV files (also a Hive table).
- PWX HDFS plug-in is used to load the CSV rows into Netezza staging tables.
- Using a PowerCenter mapping, the properties that changed then become rows, and the Contextual Information in the event becomes columns.
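The last point, changed properties becoming rows and contextual information becoming columns, can be sketched in plain Python as follows. This only illustrates the reshaping; the actual work is done by a PowerCenter mapping. The function name pivot_rows and the shortened property names are assumptions, and the sample values are loosely based on the flattened rows shown earlier.

```python
from collections import defaultdict

def pivot_rows(flat_rows):
    """Emit one output row per CHANGE row, with that event's CONTEXT values as columns."""
    by_event = defaultdict(list)
    for row in flat_rows:
        by_event[row["ID"]].append(row)

    result = []
    for event_id, rows in by_event.items():
        # Collect the event's context properties into a name -> value mapping.
        context = {r["PROPERTY_NAME"]: r["PROPERTY_NEW_VALUE"]
                   for r in rows if r["PROPERTY_SOURCE"] == "CONTEXT"}
        for r in rows:
            if r["PROPERTY_SOURCE"] == "CHANGE":
                result.append({"ID": event_id,
                               "PROPERTY_NAME": r["PROPERTY_NAME"],
                               "PROPERTY_NEW_VALUE": r["PROPERTY_NEW_VALUE"],
                               **context})
    return result

# Hypothetical flattened rows for a single event.
EVENT_ID = "a2547c49-6a75-4c50-9ad4-8c7bc023447f"
flat = [
    {"ID": EVENT_ID, "PROPERTY_NAME": "locale", "PROPERTY_NEW_VALUE": "en_us", "PROPERTY_SOURCE": "CONTEXT"},
    {"ID": EVENT_ID, "PROPERTY_NAME": "site", "PROPERTY_NEW_VALUE": "singles", "PROPERTY_SOURCE": "CONTEXT"},
    {"ID": EVENT_ID, "PROPERTY_NAME": "ignored", "PROPERTY_NEW_VALUE": "TRUE", "PROPERTY_SOURCE": "CHANGE"},
    {"ID": EVENT_ID, "PROPERTY_NAME": "type", "PROPERTY_NEW_VALUE": "MULTISELECT", "PROPERTY_SOURCE": "CHANGE"},
]

pivoted = pivot_rows(flat)
for row in pivoted:
    print(row)
```

Each output row then carries everything a report needs about one change, without a second lookup for the context.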
13 The Big Staging Table
- Contains all events
- Grows exponentially
- 200 million new rows per day: 30 Billion so far
- Current Size: 1.2 TB with 4x Compression
- Basis for the whole Data Model
- Needs to be archived
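The figures on this slide can be related with some back-of-the-envelope arithmetic. The sketch below assumes a constant daily rate and decimal terabytes (10^12 bytes). Both are assumptions, so the results are rough orders of magnitude rather than measured values.

```python
rows_per_day = 200_000_000      # from the slide
total_rows = 30_000_000_000     # from the slide
compressed_tb = 1.2             # from the slide
compression_ratio = 4           # from the slide

# Assumption: the daily rate was constant over the table's lifetime.
days_of_data = total_rows / rows_per_day

# Assumption: 1 TB = 10**12 bytes.
uncompressed_tb = compressed_tb * compression_ratio
compressed_bytes_per_row = compressed_tb * 1e12 / total_rows

print(f"days of data at the current rate: {days_of_data:.0f}")
print(f"approximate uncompressed size: {uncompressed_tb:.1f} TB")
print(f"approximate compressed bytes per row: {compressed_bytes_per_row:.0f}")
```

At these rates a year of events would add roughly 73 billion rows, which gives a sense of why the slide notes that the table needs to be archived.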
14 Finding Relationships
Top Down Approach
- Get the Business Reporting Requirements
- Analyze the Flattened events in Hadoop
- Write ad hoc Hive queries directly on HDFS or Netezza staging tables
- Outline the findings and define the relationships
- Define the Data Model
15 Data Model
Define Logical Data Model based on:
- Business and Analytics Requirements
- Relationships and Findings from the last step
Tips and Tricks
o Only Define/Build what is needed for Reporting and Analytics; don't model anything you don't need right away
o Easy to get lost in the amount of information
o Keep it simple
16 ETL
- Pass Logical Data Model and Relationships on to ETL team
- PowerCenter reads the files in HDFS and loads into the individual tables using PWX HDFS plug-in
- Data is loaded hourly and nightly
- Goal: To process within 2 hours, from the time the event is fired to the data being in the tables.
17 Reporting
- Keep the Reporting Requirements in mind
- Define MicroStrategy Architecture: Attributes/Facts and Hierarchies
- Pass it on to team of BI Developers
- Build MicroStrategy Intelligent Cubes and Dashboards based on these cubes
- Triggers in place to run the Cubes hourly as soon as the data is updated in the tables
18 Storing Historical Data
- Processed event logs are stored in local HDFS (< 1 year) and in S3 for long term storage
- Data can be reprocessed from the JSON event files in case an unused event has to be analyzed
19 Flow of Events (diagram labels): NFS HDFS Netezza Amazon S3 Oracle Event Server Network Drive Hadoop Copy Parse JSONs in Informatica HParser Hive Staging Table Informatica PowerCenter Grid with PWX for HDFS In-house Hadoop Cluster MicroStrategy Reports Netezza
20 High Level Systems Overview & Data Flow
21 HParser How Does It Work?
hadoop dt-hadoop.jar My_Parser /input/*/input*.txt
1. Define JSON parser in HParser visual studio
2. Deploy the parser on Hadoop Distributed File System (HDFS)
3. Run HParser to extract data from JSON, flatten, and stage in Hadoop
22 Sample JSON to CSV Transformation in DT
23 Sample mapping that reads HParser output to Netezza
HDFS Application Connection
Sample workflow that calls a HParser script and parses the output data into Netezza
24 Workflow Controlled by Informatica (diagram labels): Informatica HParser, Staging Table, Informatica PowerCenter, Netezza
25 Next Steps
- Phase 1 was about capturing huge volumes of data and creating the MSTR architecture, operational reports and dashboards.
- Phase 2: Provide concise analytics anywhere and anytime
26 Business Benefit
- Have a scalable infrastructure
- Adding additional ETL and analytical capabilities without increasing overhead
- Creating an agile environment to keep up with business expectations (2 to 3 day turnaround for new data)
27 Thank You
CERTIFICATE IN SOFTWARE DEVELOPMENT LIFE CYCLE IN BIG DATA AND BUSINESS INTELLIGENCE (SDLC-BD & BI) The Certificate in Software Development Life Cycle in BIGDATA, Business Intelligence and Tableau program
More informationOracle Enterprise Manager 12c Sybase ASE Database Plug-in
Oracle Enterprise Manager 12c Sybase ASE Database Plug-in May 2015 Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only,
More informationAfter completing this course, participants will be able to:
Designing a Business Intelligence Solution by Using Microsoft SQL Server 2008 T h i s f i v e - d a y i n s t r u c t o r - l e d c o u r s e p r o v i d e s i n - d e p t h k n o w l e d g e o n d e s
More informationImplement a Data Warehouse with Microsoft SQL Server
Implement a Data Warehouse with Microsoft SQL Server 20463D; 5 days, Instructor-led Course Description This course describes how to implement a data warehouse platform to support a BI solution. Students
More informationELTMaestro for Spark: Data integration on clusters
Introduction Spark represents an important milestone in the effort to make computing on clusters practical and generally available. Hadoop / MapReduce, introduced the early 2000s, allows clusters to be
More informationOracle GoldenGate for Big Data
Oracle GoldenGate for Big Data The Oracle GoldenGate for Big Data 12c product streams transactional data into big data systems in real time, without impacting the performance of source systems. It streamlines
More informationTop Five Reasons for Data Warehouse Modernization Philip Russom
Top Five Reasons for Data Warehouse Modernization Philip Russom TDWI Research Director for Data Management May 28, 2014 Sponsor Speakers Philip Russom TDWI Research Director, Data Management Steve Sarsfield
More informationRIPE NCC Routing Information Service (RIS)
RIPE NCC Routing Information Service (RIS) Overview Colin Petrie 14/12/2016 RON++ What is RIS? What is RIS? Worldwide network of BGP collectors Deployed at Internet Exchange Points - Including at AMS-IX
More informationHow to choose the right approach to analytics and reporting
SOLUTION OVERVIEW How to choose the right approach to analytics and reporting A comprehensive comparison of the open source and commercial versions of the OpenText Analytics Suite In today s digital world,
More informationMODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS
MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS SUJEE MANIYAM FOUNDER / PRINCIPAL @ ELEPHANT SCALE www.elephantscale.com sujee@elephantscale.com HI, I M SUJEE MANIYAM Founder / Principal @ ElephantScale
More informationEmbedded Technosolutions
Hadoop Big Data An Important technology in IT Sector Hadoop - Big Data Oerie 90% of the worlds data was generated in the last few years. Due to the advent of new technologies, devices, and communication
More information