How eharmony Turns Big Data into True Love Sridhar Chiguluri, Lead ETL Developer eharmony

1 How eharmony Turns Big Data into True Love
Sridhar Chiguluri, Lead ETL Developer, eharmony
Grant Parsamyan, Director of BI & Data Warehousing, eharmony

2 Agenda
Company Overview
What is Big Data?
Challenges
Implementation Phase 1
Architecture

3 Company Overview
eharmony was founded in 2000 and pioneered the use of relationship science to match singles seeking long-term relationships. Today the company offers a variety of relationship services in the United States, Canada, Australia, the United Kingdom and Brazil, with members in more than 150 countries around the world. With more than 40 million registered users, eharmony's highly regarded singles matching service is a market leader in online relationships. On average, 542 eharmony members marry every day in the United States as a result of being matched on the site.* eharmony also operates Jazzed.com, a casual and fun dating site where users can browse their matches directly.

4 Data Analytics Group
Our team (DAG) is responsible for providing business analytics and reporting solutions to internal business users across all departments.
Each person on the team is responsible for a specific business unit: Accounting, Finance, Marketing, Customer Care, Life Cycle Marketing and International.
Business users have very limited direct access to the data. All data is provided through ad hoc SQL and MicroStrategy reports.

5 Big Data
Gartner: 'Big Data' Is Only the Beginning of Extreme Information Management
McKinsey & Company: Big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.

6 Big Data Event: JSON
JSON: JavaScript Object Notation
Widely hailed as the successor to XML in the browser, JSON aspires to be nothing more than a simple and elegant data format for the exchange of information between the browser and the server; in doing this simple task it will usher in the next version of the World Wide Web itself.
JSON can be represented in two structures:
o Object - unordered set of name/value pairs
o Array - ordered collection of values
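As a minimal illustration of the two structures, the Python sketch below parses one JSON object and one JSON array with the standard `json` module. The object reuses field names from the sample event in this deck; the array values are made up for the example.

```python
import json

# An object: an unordered set of name/value pairs (becomes a Python dict).
obj = json.loads('{"name": "agerangemin", "newvalue": 18, "oldvalue": 0}')

# An array: an ordered collection of values (becomes a Python list).
arr = json.loads('[18, 24, "singles", true]')

print(type(obj).__name__, obj["newvalue"])
print(type(arr).__name__, arr[0], arr[-1])
```

Objects and arrays nest freely, which is why an event can carry a list of change objects inside a single top-level object.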

7 Sample JSON event (figure; labeled sections: Context, Changes, Header)

8 JSON rows as they appear in the database after being flattened out by HParser

Columns with the same value in every row:
CATEGORY: qaasanswers.data.update
ENTITY_ID: singles-7-
ID: a2547c49-6a75-4c50-9ad4-8c7bc023447f
PRODUCER: QAAS
EVENT_TIMESTAMP: 2/16/ :31

PROPERTY_NAME | PROPERTY_NEW_VALUE | PROPERTY_SOURCE
locale | en_us | CONTEXT
useranswers[singles ].desc | | CHANGE
site | singles | CONTEXT
useranswers[singles ].ignored | TRUE | CHANGE
type | 7 | CONTEXT
useranswers[singles ].type | MULTISELECT | CHANGE
useranswers | {"type":7,"version":1} | CONTEXT
useranswers[singles ].answer | [] | CHANGE
useranswers[singles ].date | | CHANGE
userid | | CONTEXT
version | 1 | CONTEXT

9 Sections in a JSON
Changes: contains the list of variables that have changed, which resulted in this event's generation
o Sample row where a user chose the desired age range for their match:
"changes":[{"name":"agerangemin","newvalue":18,"oldvalue":0},{"name":"agerangemax","newvalue":24,"oldvalue":0}]
Context: provides contextual information for the changes, such as User Id, User Name, etc.
o Sample row showing the user's name and match details:
"context":{"userfirstname":"John","userLocation":"Santa Monica, CA","matchId":"353861","matchUserId":" "}
Header: provides header-level information
o Sample header row:
"headers":{"id":"03c57fe3-21bd-4bde-8c5a-679b5fb3c38a","x-category":"mds_savematch.resource.post","x-instance":"matchdata01-i8","x-timestamp":" t00:46: "}
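To make the three sections concrete, here is a rough Python sketch of flattening one event into property rows. It is a plain-Python stand-in, not HParser. The sample event is a composite stitched together from the fragments above, and the mapping of header fields to the CATEGORY and ID columns is an assumption made for illustration.

```python
import json

def flatten_event(event):
    """Turn one event dict into a list of flat property rows."""
    headers = event.get("headers", {})
    # Assumed mapping from header fields to staging columns.
    base = {"CATEGORY": headers.get("x-category"), "ID": headers.get("id")}
    rows = []
    # Each context entry becomes one row tagged CONTEXT.
    for name, value in event.get("context", {}).items():
        rows.append({**base, "PROPERTY_NAME": name,
                     "PROPERTY_NEW_VALUE": value,
                     "PROPERTY_SOURCE": "CONTEXT"})
    # Each changed variable becomes one row tagged CHANGE.
    for change in event.get("changes", []):
        rows.append({**base, "PROPERTY_NAME": change["name"],
                     "PROPERTY_NEW_VALUE": change.get("newvalue"),
                     "PROPERTY_SOURCE": "CHANGE"})
    return rows

# Composite sample built from the slide fragments (illustrative only).
sample = json.loads("""
{
  "headers": {"id": "03c57fe3-21bd-4bde-8c5a-679b5fb3c38a",
              "x-category": "mds_savematch.resource.post"},
  "context": {"userfirstname": "John", "matchId": "353861"},
  "changes": [{"name": "agerangemin", "newvalue": 18, "oldvalue": 0},
              {"name": "agerangemax", "newvalue": 24, "oldvalue": 0}]
}
""")

for row in flatten_event(sample):
    print(row["PROPERTY_SOURCE"], row["PROPERTY_NAME"], row["PROPERTY_NEW_VALUE"])
```

Repeating the header values on every row is what lets a flat staging table hold events of any shape without a schema change.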

10 Challenges
Millions of events are generated every hour as JSON files
o How to handle the large volume?
o No relational source database, so how to process JSON?
o How do you create reporting that finds trends in that large amount of data?
Quick turnaround time for prototypes
Create an analytics stack that can process large amounts of data and provide real-time reporting
Achieve a 3-week release cycle to provide reporting solutions on new event structures

11 Phase 1 - Duration: 3 Months
Step 1: Processing the JSON event files each hour
Step 2: Flattening the JSON events (the trickiest step)
Step 3: Loading the flattened events into staging tables
Step 4: Finding the relationships
Step 5: Defining the Data Model
Step 6: ETL (Extract, Transform and Load)
Step 7: Building MicroStrategy Reports and Dashboards
Step 8: Storing Historical Data/Events

12 Step 1, 2 & 3: Reading, Flattening and Loading Events
Events are stored in text files.
HParser and scripts process the files every hour and flatten each event into CSV files (also exposed as a Hive table).
The PWX HDFS plug-in is used to load the CSV rows into Netezza staging tables.
Using PowerCenter mappings, the changed properties then become rows and the contextual information in the event becomes columns.
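The last point, changes becoming rows and context becoming columns, can be sketched in a few lines of Python. This is only an illustration of the reshaping idea, not the PowerCenter mapping itself. The input rows belong to a single event and reuse property names from the slides.

```python
def pivot_event_rows(rows):
    """Return one output row per CHANGE, widened with the event's CONTEXT values."""
    # Collect the contextual properties once; they become columns.
    context = {r["PROPERTY_NAME"]: r["PROPERTY_NEW_VALUE"]
               for r in rows if r["PROPERTY_SOURCE"] == "CONTEXT"}
    # Every changed property stays a row and carries the context columns.
    return [
        {"PROPERTY_NAME": r["PROPERTY_NAME"],
         "PROPERTY_NEW_VALUE": r["PROPERTY_NEW_VALUE"],
         **context}
        for r in rows if r["PROPERTY_SOURCE"] == "CHANGE"
    ]

# Flattened rows for one event (illustrative values).
staged = [
    {"PROPERTY_NAME": "locale", "PROPERTY_NEW_VALUE": "en_us", "PROPERTY_SOURCE": "CONTEXT"},
    {"PROPERTY_NAME": "site", "PROPERTY_NEW_VALUE": "singles", "PROPERTY_SOURCE": "CONTEXT"},
    {"PROPERTY_NAME": "agerangemin", "PROPERTY_NEW_VALUE": 18, "PROPERTY_SOURCE": "CHANGE"},
    {"PROPERTY_NAME": "agerangemax", "PROPERTY_NEW_VALUE": 24, "PROPERTY_SOURCE": "CHANGE"},
]

for out in pivot_event_rows(staged):
    print(out)
```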

13 The Big Staging Table
Contains all events
Grows exponentially
o 200 million new rows per day; 30 billion so far
o Current size: 1.2 TB with 4x compression
Basis for the whole Data Model
Needs to be archived
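A quick back-of-the-envelope check of these figures, written as a small Python sketch. It assumes decimal terabytes and a constant load rate; neither assumption comes from the slide, so treat the results as rough orders of magnitude.

```python
ROWS_PER_DAY = 200_000_000      # from the slide
TOTAL_ROWS = 30_000_000_000     # from the slide
COMPRESSED_TB = 1.2             # from the slide
COMPRESSION_RATIO = 4           # from the slide
BYTES_PER_TB = 10**12           # assumption: decimal terabytes

# Days of load needed to reach the total at today's rate (assumes a constant rate).
days_of_data = TOTAL_ROWS / ROWS_PER_DAY

# Size the table would occupy without compression.
uncompressed_tb = COMPRESSED_TB * COMPRESSION_RATIO

# Average on-disk footprint of one row, and the resulting daily growth.
compressed_bytes_per_row = COMPRESSED_TB * BYTES_PER_TB / TOTAL_ROWS
compressed_gb_per_day = ROWS_PER_DAY * compressed_bytes_per_row / 10**9

print(round(days_of_data), "days of data at the current rate")
print(round(uncompressed_tb, 1), "TB uncompressed")
print(round(compressed_bytes_per_row), "compressed bytes per row")
print(round(compressed_gb_per_day), "GB of compressed growth per day")
```

Under these assumptions the table holds about 150 days of load, would take about 4.8 TB uncompressed, and averages about 40 compressed bytes per row, or about 8 GB of growth per day. If the earlier load rate was lower than today's, the real history is longer than 150 days.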

14 Finding Relationships
Top-Down Approach
o Get the business reporting requirements
o Analyze the flattened events in Hadoop
o Write ad hoc Hive queries directly on HDFS or on the Netezza staging tables
o Outline the findings and define the relationships
o Define the Data Model
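The exploration step can be pictured with a small Python stand-in for those ad hoc queries: group the flattened rows by category, list the property names each category carries, and look for names shared across categories as candidate join keys. The category names come from the slides; which properties appear under each category is invented here for illustration.

```python
from collections import defaultdict

def properties_by_category(rows):
    """Map each event category to the sorted property names seen with it."""
    seen = defaultdict(set)
    for row in rows:
        seen[row["CATEGORY"]].add(row["PROPERTY_NAME"])
    return {category: sorted(names) for category, names in seen.items()}

def shared_properties(profile):
    """Property names present in every category: candidate join keys."""
    name_sets = [set(names) for names in profile.values()]
    return sorted(set.intersection(*name_sets)) if name_sets else []

# Hypothetical flattened rows; the property lists are made up for the example.
rows = [
    {"CATEGORY": "qaasanswers.data.update", "PROPERTY_NAME": "userid"},
    {"CATEGORY": "qaasanswers.data.update", "PROPERTY_NAME": "locale"},
    {"CATEGORY": "mds_savematch.resource.post", "PROPERTY_NAME": "userid"},
    {"CATEGORY": "mds_savematch.resource.post", "PROPERTY_NAME": "matchId"},
]

profile = properties_by_category(rows)
print(profile)
print(shared_properties(profile))
```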

15 Data Model
Define the Logical Data Model based on:
o Business and analytics requirements
o Relationships and findings from the last step
Tips and Tricks
o Only define and build what is needed for reporting and analytics; don't model anything you don't need right away
o It is easy to get lost in the amount of information
o Keep it simple

16 ETL
Pass the Logical Data Model and relationships on to the ETL team
PowerCenter reads the files in HDFS and loads them into the individual tables using the PWX HDFS plug-in
Data is loaded hourly and nightly
Goal: process within 2 hours, from the time an event is fired to the time its data is in the tables
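One way to keep an eye on the 2-hour goal is to compare each event's fire time with its load time. The Python sketch below shows the idea. The record layout, field names and timestamps are hypothetical and are not taken from the actual pipeline.

```python
from datetime import datetime, timedelta

TARGET = timedelta(hours=2)  # goal stated on the slide

def late_loads(records, target=TARGET):
    """Return the records that took longer than `target` from firing to load."""
    return [r for r in records if r["loaded_at"] - r["fired_at"] > target]

# Hypothetical sample: one event loaded after 90 minutes, one after 3 hours.
records = [
    {"id": "a", "fired_at": datetime(2013, 2, 16, 10, 0),
     "loaded_at": datetime(2013, 2, 16, 11, 30)},
    {"id": "b", "fired_at": datetime(2013, 2, 16, 10, 0),
     "loaded_at": datetime(2013, 2, 16, 13, 0)},
]

for r in late_loads(records):
    print(r["id"], r["loaded_at"] - r["fired_at"])
```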

17 Reporting
Keep the reporting requirements in mind
Define the MicroStrategy architecture: attributes, facts and hierarchies
Pass it on to the team of BI developers
Build MicroStrategy Intelligent Cubes, and dashboards based on these cubes
Triggers are in place to run the cubes hourly, as soon as the data is updated in the tables

18 Storing Historical Data
Processed event logs are stored in local HDFS (< 1 year) and in S3 for long-term storage
Data can be reprocessed from the JSON event files in case a previously unused event has to be analyzed

19 Flow of Events (diagram)
Legend: NFS, HDFS, Netezza, Amazon S3, Oracle
Diagram labels: Event Server; Network Drive; Hadoop Copy; Parse JSONs in Informatica HParser; Hive Staging Table; Informatica PowerCenter Grid with PWX for HDFS; In-house Hadoop Cluster; MicroStrategy Reports; Netezza

20 High Level Systems Overview & Data Flow (figure)

21 HParser: How Does It Work?
hadoop dt-hadoop.jar My_Parser /input/*/input*.txt
1. Define the JSON parser in HParser visual studio
2. Deploy the parser on the Hadoop Distributed File System (HDFS)
3. Run HParser to extract data from JSON, flatten it, and stage it in Hadoop

22 Sample JSON to CSV Transformation in DT (figure)

23 Sample mapping that reads HParser output to Netezza (figure; label: HDFS Application Connection)
Sample workflow that calls an HParser script and parses the output data into Netezza (figure)

24 Workflow Controlled by Informatica (diagram)
Diagram labels: Informatica HParser; Staging Table; Informatica PowerCenter; Netezza

25 Next Steps
Phase 1 was about capturing huge volumes of data and creating the MSTR architecture, operational reports and dashboards.
Phase 2: Provide concise analytics anywhere and anytime

26 Business Benefit
A scalable infrastructure
Additional ETL and analytical capabilities without increased overhead
An agile environment that keeps up with business expectations (2 to 3 day turnaround for new data)

27 Thank You


Data-Intensive Distributed Computing Data-Intensive Distributed Computing CS 451/651 431/631 (Winter 2018) Part 5: Analyzing Relational Data (1/3) February 8, 2018 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo

More information

Fast Innovation requires Fast IT

Fast Innovation requires Fast IT Fast Innovation requires Fast IT Cisco Data Virtualization Puneet Kumar Bhugra Business Solutions Manager 1 Challenge In Data, Big Data & Analytics Siloed, Multiple Sources Business Outcomes Business Opportunity:

More information

MicroStrategy Desktop MicroStrategy 10.2: New features overview. microstrategy.com 1

MicroStrategy Desktop MicroStrategy 10.2: New features overview. microstrategy.com 1 MicroStrategy Desktop 10.2 MicroStrategy 10.2: New features overview. microstrategy.com 1 TABLE OF CONTENTS MicroStrategy Desktop 10.2 Easier integration of custom visualizations 3 BETA Dashboard annotation

More information

Cloud Computing 2. CSCI 4850/5850 High-Performance Computing Spring 2018

Cloud Computing 2. CSCI 4850/5850 High-Performance Computing Spring 2018 Cloud Computing 2 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning

More information

Egypt s Bavarian Auto Group Deploys SAP On SQL Server 2005 to Support Rapid Growth

Egypt s Bavarian Auto Group Deploys SAP On SQL Server 2005 to Support Rapid Growth Microsoft SQL Server Customer Solution Case Study Egypt s Bavarian Auto Group Deploys SAP On SQL Server 2005 to Support Rapid Growth Overview Country or Region: Egypt Industry: Manufacturing Automotive

More information

Actian Vector Benchmarks. Cloud Benchmarking Summary Report

Actian Vector Benchmarks. Cloud Benchmarking Summary Report Actian Vector Benchmarks Cloud Benchmarking Summary Report April 2018 The Cloud Database Performance Benchmark Executive Summary The table below shows Actian Vector as evaluated against Amazon Redshift,

More information

Answer: A Reference:http://www.vertica.com/wpcontent/uploads/2012/05/MicroStrategy_Vertica_12.p df(page 1, first para)

Answer: A Reference:http://www.vertica.com/wpcontent/uploads/2012/05/MicroStrategy_Vertica_12.p df(page 1, first para) 1 HP - HP2-N44 Selling HP Vertical Big Data Solutions QUESTION: 1 When is Vertica a better choice than SAP HANA? A. The customer wants a closed ecosystem for BI and analytics, and is unconcerned with support

More information

microsoft

microsoft 70-775.microsoft Number: 70-775 Passing Score: 800 Time Limit: 120 min Exam A QUESTION 1 Note: This question is part of a series of questions that present the same scenario. Each question in the series

More information

DURATION : 03 DAYS. same along with BI tools.

DURATION : 03 DAYS. same along with BI tools. AWS REDSHIFT TRAINING MILDAIN DURATION : 03 DAYS To benefit from this Amazon Redshift Training course from mildain, you will need to have basic IT application development and deployment concepts, and good

More information

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015 Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document

More information

Hadoop is supplemented by an ecosystem of open source projects IBM Corporation. How to Analyze Large Data Sets in Hadoop

Hadoop is supplemented by an ecosystem of open source projects IBM Corporation. How to Analyze Large Data Sets in Hadoop Hadoop Open Source Projects Hadoop is supplemented by an ecosystem of open source projects Oozie 25 How to Analyze Large Data Sets in Hadoop Although the Hadoop framework is implemented in Java, MapReduce

More information

Massively Parallel Processing. Big Data Really Fast. A Proven In-Memory Analytical Processing Platform for Big Data

Massively Parallel Processing. Big Data Really Fast. A Proven In-Memory Analytical Processing Platform for Big Data Big Data Really Fast A Proven In-Memory Analytical Processing Platform for Big Data 2 Executive Summary / Overview: Big Data can be a big headache for organizations that have outgrown the practicality

More information

High-Performance Distributed DBMS for Analytics

High-Performance Distributed DBMS for Analytics 1 High-Performance Distributed DBMS for Analytics 2 About me Developer, hardware engineering background Head of Analytic Products Department in Yandex jkee@yandex-team.ru 3 About Yandex One of the largest

More information

EsgynDB Enterprise 2.0 Platform Reference Architecture

EsgynDB Enterprise 2.0 Platform Reference Architecture EsgynDB Enterprise 2.0 Platform Reference Architecture This document outlines a Platform Reference Architecture for EsgynDB Enterprise, built on Apache Trafodion (Incubating) implementation with licensed

More information

Acquiring Big Data to Realize Business Value

Acquiring Big Data to Realize Business Value Acquiring Big Data to Realize Business Value Agenda What is Big Data? Common Big Data technologies Use Case Examples Oracle Products in the Big Data space In Summary: Big Data Takeaways

More information

Analyze Big Data Faster and Store It Cheaper

Analyze Big Data Faster and Store It Cheaper Analyze Big Data Faster and Store It Cheaper Dr. Steve Pratt, CenterPoint Russell Hull, SAP Public About CenterPoint Energy, Inc. Publicly traded on New York Stock Exchange Headquartered in Houston, Texas

More information

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Data Center Software Architecture: Topic 3: Programming Models RCFile: A Fast and Space-efficient Data

More information

CERTIFICATE IN SOFTWARE DEVELOPMENT LIFE CYCLE IN BIG DATA AND BUSINESS INTELLIGENCE (SDLC-BD & BI)

CERTIFICATE IN SOFTWARE DEVELOPMENT LIFE CYCLE IN BIG DATA AND BUSINESS INTELLIGENCE (SDLC-BD & BI) CERTIFICATE IN SOFTWARE DEVELOPMENT LIFE CYCLE IN BIG DATA AND BUSINESS INTELLIGENCE (SDLC-BD & BI) The Certificate in Software Development Life Cycle in BIGDATA, Business Intelligence and Tableau program

More information

Oracle Enterprise Manager 12c Sybase ASE Database Plug-in

Oracle Enterprise Manager 12c Sybase ASE Database Plug-in Oracle Enterprise Manager 12c Sybase ASE Database Plug-in May 2015 Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only,

More information

After completing this course, participants will be able to:

After completing this course, participants will be able to: Designing a Business Intelligence Solution by Using Microsoft SQL Server 2008 T h i s f i v e - d a y i n s t r u c t o r - l e d c o u r s e p r o v i d e s i n - d e p t h k n o w l e d g e o n d e s

More information

Implement a Data Warehouse with Microsoft SQL Server

Implement a Data Warehouse with Microsoft SQL Server Implement a Data Warehouse with Microsoft SQL Server 20463D; 5 days, Instructor-led Course Description This course describes how to implement a data warehouse platform to support a BI solution. Students

More information

ELTMaestro for Spark: Data integration on clusters

ELTMaestro for Spark: Data integration on clusters Introduction Spark represents an important milestone in the effort to make computing on clusters practical and generally available. Hadoop / MapReduce, introduced the early 2000s, allows clusters to be

More information

Oracle GoldenGate for Big Data

Oracle GoldenGate for Big Data Oracle GoldenGate for Big Data The Oracle GoldenGate for Big Data 12c product streams transactional data into big data systems in real time, without impacting the performance of source systems. It streamlines

More information

Top Five Reasons for Data Warehouse Modernization Philip Russom

Top Five Reasons for Data Warehouse Modernization Philip Russom Top Five Reasons for Data Warehouse Modernization Philip Russom TDWI Research Director for Data Management May 28, 2014 Sponsor Speakers Philip Russom TDWI Research Director, Data Management Steve Sarsfield

More information

RIPE NCC Routing Information Service (RIS)

RIPE NCC Routing Information Service (RIS) RIPE NCC Routing Information Service (RIS) Overview Colin Petrie 14/12/2016 RON++ What is RIS? What is RIS? Worldwide network of BGP collectors Deployed at Internet Exchange Points - Including at AMS-IX

More information

How to choose the right approach to analytics and reporting

How to choose the right approach to analytics and reporting SOLUTION OVERVIEW How to choose the right approach to analytics and reporting A comprehensive comparison of the open source and commercial versions of the OpenText Analytics Suite In today s digital world,

More information

MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS

MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS SUJEE MANIYAM FOUNDER / PRINCIPAL @ ELEPHANT SCALE www.elephantscale.com sujee@elephantscale.com HI, I M SUJEE MANIYAM Founder / Principal @ ElephantScale

More information

Embedded Technosolutions

Embedded Technosolutions Hadoop Big Data An Important technology in IT Sector Hadoop - Big Data Oerie 90% of the worlds data was generated in the last few years. Due to the advent of new technologies, devices, and communication

More information