Fast and Efficient A/B Testing Analysis with Shiny and SQL. Charlie Thompson Storyblocks


1 Fast and Efficient A/B Testing Analysis with Shiny and SQL
Charlie Thompson, Storyblocks

2 A/B Testing at Storyblocks

3 Our search page for stock video

4 Related Search cards test

5 Related Search cards test
[Screenshots: Test vs. Control]

6 We store results for our tests in Shiny

7 We have > 100 metrics to analyze per test

8 A/B testing generates big data
We have thousands of A/B tests with millions of users
Multiple ways to measure users
Lots of metrics per user

9 Shiny and SQL together

10 A brief history
[Timeline, 2014 to 2017; labels in extraction order]
Automated online dashboard in SQL
Outsourced to 3rd party
Scaling within Shiny
Ad hoc SQL queries
To Shiny!

11 Loading big data into Shiny
Overnight preprocessing on Shiny server: an R script queries the SQL database and saves off an .rdata file for each test that contains the raw data.
Diagram: Raw A/B testing data (SQL) -> load_data.r -> test_1.rdata, test_2.rdata, test_3.rdata, test_4.rdata

12 Loading big data into Shiny
Overnight preprocessing on Shiny server: an R script queries the SQL database and saves off an .rdata file for each test that contains the raw data.
Live in dashboard: as tests are selected in the dashboard, Shiny pulls the raw data file and computes all the metrics needed, including hypothesis tests.
Diagram: Raw A/B testing data (SQL) -> load_data.r -> test_1.rdata, test_2.rdata, test_3.rdata, test_4.rdata -> server.r -> Shiny Dashboard

13 Constraints with Shiny at scale
Overnight preprocessing on Shiny server: an R script queries the SQL database and saves off an .rdata file for each test that contains the raw data.
Live in dashboard: as tests are selected in the dashboard, Shiny pulls the raw data file and computes all the metrics needed, including hypothesis tests.
Diagram: Raw A/B testing data (SQL) -> load_data.r -> test_1.rdata, test_2.rdata, test_3.rdata, test_4.rdata -> server.r -> Shiny Dashboard
Bottleneck #1: Reading in large tests
Bottleneck #2: Calculating hypothesis tests for 50+ metrics
Bottleneck #3: Users queue

14 Overcoming Shiny constraints
Overnight preprocessing on Shiny server: an R script queries the SQL database, calculates hypothesis tests, and saves off an .rdata file for each test that contains the aggregated data.
Live in dashboard: as tests are selected in the dashboard, Shiny pulls the aggregated file for each test, which now contains historical values instead of daily snapshots.
Diagram: Raw A/B testing data (SQL) -> load_data.r -> test_1.rdata, test_2.rdata, test_3.rdata, test_4.rdata -> server.r -> Shiny Dashboard
Bottleneck #1: Reading in large tests. FUHGETTABOUTIT! Aggregated data is wicked small.
Bottleneck #2: Calculating hypothesis tests for 50+ metrics. NOT ANYMORE! This is done in the morning.
Bottleneck #3: Users queue. NO WORRIES! The dashboard is so fast we won't notice.
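The overnight step described on this slide can be sketched roughly as follows. This is a minimal illustration, not the speaker's code: the talk uses an R script (load_data.r) and .rdata files, while this sketch uses Python with an in-memory SQLite database and pickle files. The table name `ab_events`, its columns, the pooled two-proportion z-test, and the `test_<id>.pkl` naming are all assumptions made for the example.

```python
import math
import pickle
import sqlite3
import tempfile
from pathlib import Path


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_a / n_a - conv_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def preprocess(conn, out_dir):
    """Aggregate in SQL, run the test once, write one small file per test."""
    rows = conn.execute(
        """
        SELECT test_id, variant, COUNT(*) AS users, SUM(converted) AS conversions
        FROM ab_events            -- hypothetical table name
        GROUP BY test_id, variant
        ORDER BY test_id, variant
        """
    ).fetchall()
    by_test = {}
    for test_id, variant, users, conversions in rows:
        by_test.setdefault(test_id, {})[variant] = (conversions, users)
    paths = []
    for test_id, arms in by_test.items():
        (c_conv, c_n), (t_conv, t_n) = arms["control"], arms["test"]
        summary = {
            "test_id": test_id,
            "control_rate": c_conv / c_n,
            "test_rate": t_conv / t_n,
            "p_value": two_proportion_p_value(c_conv, c_n, t_conv, t_n),
        }
        path = Path(out_dir) / f"test_{test_id}.pkl"  # hypothetical file naming
        path.write_bytes(pickle.dumps(summary))
        paths.append(path)
    return paths


def demo():
    """Build a toy database, preprocess it, and read the summary back."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ab_events "
        "(test_id INTEGER, variant TEXT, user_id INTEGER, converted INTEGER)"
    )
    rows, uid = [], 0
    for variant, n, conv in [("control", 1000, 100), ("test", 1000, 130)]:
        for i in range(n):
            rows.append((1, variant, uid, 1 if i < conv else 0))
            uid += 1
    conn.executemany("INSERT INTO ab_events VALUES (?, ?, ?, ?)", rows)
    with tempfile.TemporaryDirectory() as out_dir:
        paths = preprocess(conn, out_dir)
        return pickle.loads(paths[0].read_bytes())
```

The dashboard side then only has to load a file holding a handful of numbers per test, which is the point the slide makes about bottlenecks #1 and #2.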

15 Making the most of your data

16 When is a test done?

17 Aggregated data gives a time series view
[Chart annotation: Test begins]

18 Time series helps prevent premature reads
[Chart: P Value vs. Date. Annotation: Test looks 95% significant here!]

19 P-value should stabilize over time
Win or lose, the P-value should stabilize before a test is finished
[Chart: P Value vs. Date]
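One way to picture the time series view is to recompute the p-value on cumulative totals after each day and watch whether it settles. The sketch below is an assumption-laden illustration: the talk does not say which test or stopping rule Storyblocks uses, so the pooled two-proportion z-test and the "last `window` days on the same side of `alpha`" rule in `is_stable` are hypothetical stand-ins, and the rule is not a substitute for a proper sequential testing procedure.

```python
import math


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_a / n_a - conv_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def p_value_series(daily):
    """daily: per-day (control_conv, control_n, test_conv, test_n) counts.

    Returns the p-value computed on cumulative totals after each day.
    """
    totals = [0, 0, 0, 0]
    series = []
    for day in daily:
        totals = [t + d for t, d in zip(totals, day)]
        series.append(two_proportion_p_value(*totals))
    return series


def is_stable(series, window=3, alpha=0.05):
    """Hypothetical rule: the last `window` p-values sit on one side of alpha."""
    if len(series) < window:
        return False
    recent = series[-window:]
    return all(p < alpha for p in recent) or all(p >= alpha for p in recent)


# Made-up data: a lucky first day for the test arm, then identical rates.
daily_counts = [(5, 100, 15, 100), (10, 100, 10, 100),
                (10, 100, 10, 100), (10, 100, 10, 100)]
series = p_value_series(daily_counts)
```

With this made-up data the first day looks significant, but the cumulative p-value rises above 0.05 from the second day on, which is the premature read the slides warn about.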

20 When to think about scaling

21 Shiny: prototype vs production
Aspect | Prototype | Production
Hosting | Local | Shiny server, shinyapps.io, etc.
Number of concurrent users | One | Multiple
Page load time | Easy to overlook | Instant, UX is important
Data storage | Often pull in unused rows or columns | Loads only necessary data
Stability and maintenance | Only needs to be working when demoing | Minimal downtime

22 Measuring Shiny usage
Make sure you know how many users you have!
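As a rough illustration of what "knowing how many users you have" can mean in practice, the sketch below computes peak concurrent sessions from a list of session intervals. The `(start, end)` log format is hypothetical; the talk does not describe how usage was actually measured, and no Shiny API is involved here.

```python
def peak_concurrent(sessions):
    """Return the maximum number of simultaneously open sessions.

    sessions: iterable of (start, end) timestamps with end > start
    (hypothetical log format).
    """
    events = []
    for start, end in sessions:
        events.append((start, 1))   # session opens
        events.append((end, -1))    # session closes
    # At equal timestamps, process closes (-1) before opens (+1) so that
    # back-to-back sessions are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak
```

A peak well above one is a signal that the single-user assumptions of a prototype (see the prototype vs production comparison) no longer hold.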

23 What we learned

24 Let SQL be SQL and R be R
Task | R | SQL
Big data aggregation | Possible, but slow | Made for exactly this
Hypothesis tests and charts | Made for exactly this | Painful, need tools

25 Data tips for Shiny in production
1. Subset your input data before reading it in
2. Use .RData files
3. Consider ETL process - do you really need real-time data?
4. Monitor usage
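Tip 1 can be illustrated by pushing both the row filter and the column selection into the query, so that only the data the dashboard needs ever leaves the database. This is a sketch in Python with SQLite rather than the R and SQL stack from the talk; the `ab_events` table and its columns are hypothetical.

```python
import sqlite3


def load_subset(conn, test_id):
    """Fetch only the rows and columns needed for one test."""
    return conn.execute(
        # Select two columns and filter by test in SQL, instead of
        # SELECT * followed by filtering in application code.
        "SELECT variant, converted FROM ab_events WHERE test_id = ?",
        (test_id,),
    ).fetchall()


def build_demo_db():
    """Create a tiny in-memory table with an extra, unused column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ab_events (test_id INTEGER, variant TEXT, "
        "user_id INTEGER, converted INTEGER, user_agent TEXT)"
    )
    conn.executemany(
        "INSERT INTO ab_events VALUES (?, ?, ?, ?, ?)",
        [
            (1, "control", 1, 0, "ua-a"),
            (1, "test", 2, 1, "ua-b"),
            (2, "control", 3, 1, "ua-c"),
        ],
    )
    return conn
```

Calling `load_subset(build_demo_db(), 1)` returns only the two rows for test 1, with the unused `user_id` and `user_agent` columns left behind in the database.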

26 Additional resources
A/B Testing in the Wild [Etsy] - Emily Robinson
A/B Testing at Stack Overflow - Julia Silge
Experiments at Airbnb - Jan Overgoor
Shiny server system performance monitoring - Huidong Tian

27 Questions?
We're hiring!
Contact me

VOLTDB + HP VERTICA. page

VOLTDB + HP VERTICA. page VOLTDB + HP VERTICA ARCHITECTURE FOR FAST AND BIG DATA ARCHITECTURE FOR FAST + BIG DATA FAST DATA Fast Serve Analytics BIG DATA BI Reporting Fast Operational Database Streaming Analytics Columnar Analytics

More information

Sub-Second Response Times with New In-Memory Analytics in MicroStrategy 10. Onur Kahraman

Sub-Second Response Times with New In-Memory Analytics in MicroStrategy 10. Onur Kahraman Sub-Second Response Times with New In-Memory Analytics in MicroStrategy 10 Onur Kahraman High Performance Is No Longer A Nice To Have In Analytical Applications Users expect Google Like performance from

More information

High-Performance Distributed DBMS for Analytics

High-Performance Distributed DBMS for Analytics 1 High-Performance Distributed DBMS for Analytics 2 About me Developer, hardware engineering background Head of Analytic Products Department in Yandex jkee@yandex-team.ru 3 About Yandex One of the largest

More information

Database Performance Analyzer (DPA) Quick Demo

Database Performance Analyzer (DPA) Quick Demo Database Performance Analyzer (DPA) Quick Demo http://database.demo.solarwinds.com/ Log in with the username demo and password demo1. NOTE: You may encounter the following recommended video, while demoing

More information

Case Study: Tata Communications Delivering a Truly Interactive Business Intelligence Experience on a Large Multi-Tenant Hadoop Cluster

Case Study: Tata Communications Delivering a Truly Interactive Business Intelligence Experience on a Large Multi-Tenant Hadoop Cluster Case Study: Tata Communications Delivering a Truly Interactive Business Intelligence Experience on a Large Multi-Tenant Hadoop Cluster CASE STUDY: TATA COMMUNICATIONS 1 Ten years ago, Tata Communications,

More information

Evolution of Big Data Facebook. Architecture Summit, Shenzhen, August 2012 Ashish Thusoo

Evolution of Big Data Facebook. Architecture Summit, Shenzhen, August 2012 Ashish Thusoo Evolution of Big Data Architectures@ Facebook Architecture Summit, Shenzhen, August 2012 Ashish Thusoo About Me Currently Co-founder/CEO of Qubole Ran the Data Infrastructure Team at Facebook till 2011

More information

New Data Architectures For Netflow Analytics NANOG 74. Fangjin Yang - Imply

New Data Architectures For Netflow Analytics NANOG 74. Fangjin Yang - Imply New Data Architectures For Netflow Analytics NANOG 74 Fangjin Yang - Cofounder @ Imply The Problem Comparing technologies Overview Operational analytic databases Try this at home The Problem Netflow data

More information

How do we build TiDB. a Distributed, Consistent, Scalable, SQL Database

How do we build TiDB. a Distributed, Consistent, Scalable, SQL Database How do we build TiDB a Distributed, Consistent, Scalable, SQL Database About me LiuQi ( 刘奇 ) JD / WandouLabs / PingCAP Co-founder / CEO of PingCAP Open-source hacker / Infrastructure software engineer

More information

Scaling with Continuous Deployment

Scaling with Continuous Deployment Scaling with Continuous Deployment Web 2.0 Expo New York, NY, September 29, 2010 Brett G. Durrett (@bdurrett) Vice President Engineering & Operations, IMVU, Inc. 0 An online community where members use

More information

Data Analytics at Logitech Snowflake + Tableau = #Winning

Data Analytics at Logitech Snowflake + Tableau = #Winning Welcome # T C 1 8 Data Analytics at Logitech Snowflake + Tableau = #Winning Avinash Deshpande I am a futurist, scientist, engineer, designer, data evangelist at heart Find me at Avinash Deshpande Chief

More information

Two Success Stories - Optimised Real-Time Reporting with BI Apps

Two Success Stories - Optimised Real-Time Reporting with BI Apps Oracle Business Intelligence 11g Two Success Stories - Optimised Real-Time Reporting with BI Apps Antony Heljula October 2013 Peak Indicators Limited 2 Two Success Stories - Optimised Real-Time Reporting

More information

Genesys Info Mart. gim-etl-media-chat Section

Genesys Info Mart. gim-etl-media-chat Section Genesys Info Mart gim-etl-media-chat Section 11/23/2017 Contents 1 gim-etl-media-chat Section 1.1 q-answer-threshold 1.2 q-short-abandoned-threshold 1.3 short-abandoned-threshold Genesys Info Mart 2 gim-etl-media-chat

More information

PORTAL. A Case Study. Dr. Kristin Tufte Mark Wong September 23, Linux Plumbers Conference 2009

PORTAL. A Case Study. Dr. Kristin Tufte Mark Wong September 23, Linux Plumbers Conference 2009 PORTAL A Case Study Dr. Kristin Tufte (tufte@cecs.pdx.edu) Mark Wong (markwkm@postgresql.org) Linux Plumbers Conference 2009 September 23, 2009 Overview What is PORTAL? How PORTAL works Improving PORTAL

More information

5/2/2015. Overview of SSIS performance Troubleshooting methods Performance tips

5/2/2015. Overview of SSIS performance Troubleshooting methods Performance tips Overview of SSIS performance Troubleshooting methods Performance tips 2 Business intelligence consultant Partner, Linchpin People SQL Server MVP TimMitchell.net / @Tim_Mitchell tim@timmitchell.net 3 1

More information

Jens Bollmann. Welcome! Performance 101 for Small Web Apps. Principal consultant and trainer within the Professional Services group at SkySQL Ab.

Jens Bollmann. Welcome! Performance 101 for Small Web Apps. Principal consultant and trainer within the Professional Services group at SkySQL Ab. Welcome! Jens Bollmann jens@skysql.com Principal consultant and trainer within the Professional Services group at SkySQL Ab. Who is SkySQL Ab? SkySQL Ab is the alternative source for software, services

More information

What is Standard APEX? TOOLBOX FLAT DESIGN CARTOON PEOPLE

What is Standard APEX? TOOLBOX FLAT DESIGN CARTOON PEOPLE What is Standard APEX? TOOLBOX FLAT DESIGN CARTOON PEOPLE About me Freelancer since 2010 Consulting and development Oracle databases APEX BI Blog: APEX-AT-WORK Twitter: @tobias_arnhold - Oracle ACE Associate

More information

BECOME AN APPLICATION SUPER-HERO

BECOME AN APPLICATION SUPER-HERO BECOME AN APPLICATION SUPER-HERO MINIMIZE APPLICATION DOWNTIME AND ACCELERATE TIME TO RESOLUTION Charlie Arehart Independent Consultant charlie@carehart.org / @carehart INTRODUCTION For those new to FusionReactor,

More information

Leveraging Customer Behavioral Data to Drive Revenue the GPU S7456

Leveraging Customer Behavioral Data to Drive Revenue the GPU S7456 Leveraging Customer Behavioral Data to Drive Revenue the GPU way 1 Hi! Arnon Shimoni Senior Solutions Architect I like hardware & parallel / concurrent stuff In my 4 th year at SQream Technologies Send

More information

Analysis Services. Show Me Where It Hurts. Bill Anton Head Prime Data Intelligence

Analysis Services. Show Me Where It Hurts. Bill Anton Head Prime Data Intelligence Analysis Services Show Me Where It Hurts Bill Anton Head Beaver @ Prime Data Intelligence Life Is Good! Photo Credit: SuperCar-RoadTrip.fr Life is Photo Credit: Charlie This is avoidable! Bill Anton Business

More information

Data Science. Data Analyst. Data Scientist. Data Architect

Data Science. Data Analyst. Data Scientist. Data Architect Data Science Data Analyst Data Analysis in Excel Programming in R Introduction to Python/SQL/Tableau Data Visualization in R / Tableau Exploratory Data Analysis Data Scientist Inferential Statistics &

More information

Improve the Performance of Your T-SQL by Changing Your Habits. Mickey Stuewe Microsoft Junkie Sr Database Developer

Improve the Performance of Your T-SQL by Changing Your Habits. Mickey Stuewe Microsoft Junkie Sr Database Developer Improve the Performance of Your T-SQL by Changing Your Habits Mickey Stuewe Microsoft Junkie Sr Database Developer Your Background DBA Database Developer Programmer Manager Just Checking Things Out 2 Objectives

More information

Big Data Facebook

Big Data Facebook Big Data Architectures@ Facebook QCon London 2012 Ashish Thusoo Outline Big Data @ Facebook - Scope & Scale Evolution of Big Data Architectures @ FB Past, Present and Future Questions Big Data @ FB: Scale

More information

Syllabus. Syllabus. Motivation Decision Support. Syllabus

Syllabus. Syllabus. Motivation Decision Support. Syllabus Presentation: Sophia Discussion: Tianyu Metadata Requirements and Conclusion 3 4 Decision Support Decision Making: Everyday, Everywhere Decision Support System: a class of computerized information systems

More information

MySQL Performance Optimization and Troubleshooting with PMM. Peter Zaitsev, CEO, Percona

MySQL Performance Optimization and Troubleshooting with PMM. Peter Zaitsev, CEO, Percona MySQL Performance Optimization and Troubleshooting with PMM Peter Zaitsev, CEO, Percona In the Presentation Practical approach to deal with some of the common MySQL Issues 2 Assumptions You re looking

More information

Training Content Key Terms... 1 How to Run a Report... 2 How to View a Dashboard... 5 How to Modify & Customize Reports... 6

Training Content Key Terms... 1 How to Run a Report... 2 How to View a Dashboard... 5 How to Modify & Customize Reports... 6 Salesforce Reporting Tools Technical Assistance email: support@salesforce.asu.edu Salesforce: http://asu.my.salesforce.com Training Content Key Terms... 1 How to Run a Report... 2 How to View a Dashboard...

More information

Capacity metrics in daily MySQL checks. Vladimir Fedorkov MySQL and Friends Devroom FOSDEM 15

Capacity metrics in daily MySQL checks. Vladimir Fedorkov MySQL and Friends Devroom FOSDEM 15 Capacity metrics in daily MySQL checks Vladimir Fedorkov MySQL and Friends Devroom FOSDEM 15 About me Performance geek blog http://astellar.com Twitter @vfedorkov Enjoy LAMP stack tuning Especially MySQL

More information

PERFORMANCE INVESTIGATION TOOLS & TECHNIQUES. 7C Matthew Morris Desynit

PERFORMANCE INVESTIGATION TOOLS & TECHNIQUES. 7C Matthew Morris Desynit PERFORMANCE INVESTIGATION TOOLS & TECHNIQUES 7C Matthew Morris Desynit Desynit > Founded in 2001 > Based in Bristol, U.K > Customers worldwide > Technology Mix 2E/Plex Java &.Net Web & mobile applications

More information

Guide Users along Information Pathways and Surf through the Data

Guide Users along Information Pathways and Surf through the Data Guide Users along Information Pathways and Surf through the Data Stephen Overton, Overton Technologies, LLC, Raleigh, NC ABSTRACT Business information can be consumed many ways using the SAS Enterprise

More information

Cloud Monitoring as a Service. Built On Machine Learning

Cloud Monitoring as a Service. Built On Machine Learning Cloud Monitoring as a Service Built On Machine Learning Table of Contents 1 2 3 4 5 6 7 8 9 10 Why Machine Learning Who Cares Four Dimensions to Cloud Monitoring Data Aggregation Anomaly Detection Algorithms

More information

Update The Statistics On A Single Table+sql Server 2005

Update The Statistics On A Single Table+sql Server 2005 Update The Statistics On A Single Table+sql Server 2005 There are different ways statistics are created and maintained in SQL Server: to find out all of those statistics created by SQL Server Query Optimizer

More information

Rows and Range, Preceding and Following

Rows and Range, Preceding and Following Rows and Range, Preceding and Following SQL Server 2012 adds many new features to Transact SQL (T-SQL). One of my favorites is the Rows/Range enhancements to the over clause. These enhancements are often

More information

LazyBase: Trading freshness and performance in a scalable database

LazyBase: Trading freshness and performance in a scalable database LazyBase: Trading freshness and performance in a scalable database (EuroSys 2012) Jim Cipar, Greg Ganger, *Kimberly Keeton, *Craig A. N. Soules, *Brad Morrey, *Alistair Veitch PARALLEL DATA LABORATORY

More information

FUN WITH ANALYTIC FUNCTIONS UTOUG TRAINING DAYS 2017

FUN WITH ANALYTIC FUNCTIONS UTOUG TRAINING DAYS 2017 FUN WITH ANALYTIC FUNCTIONS UTOUG TRAINING DAYS 2017 ABOUT ME Born and raised here in UT In IT for 10 years, DBA for the last 6 Databases and Data are my hobbies, I m rather quite boring This isn t why

More information

Data Mining Concepts & Techniques

Data Mining Concepts & Techniques Data Mining Concepts & Techniques Lecture No. 01 Databases, Data warehouse Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro

More information

QLIKVIEW SCALABILITY BENCHMARK WHITE PAPER

QLIKVIEW SCALABILITY BENCHMARK WHITE PAPER QLIKVIEW SCALABILITY BENCHMARK WHITE PAPER Measuring Business Intelligence Throughput on a Single Server QlikView Scalability Center Technical White Paper December 2012 qlikview.com QLIKVIEW THROUGHPUT

More information

Incremental Updates VS Full Reload

Incremental Updates VS Full Reload Incremental Updates VS Full Reload Change Data Capture Minutes VS Hours 1 Table of Contents Executive Summary - 3 Accessing Data from a Variety of Data Sources and Platforms - 4 Approaches to Moving Changed

More information

Professional Edition Tutorial: Excel Spreadsheets

Professional Edition Tutorial: Excel Spreadsheets -- DRAFT DOCUMENTATION RELEASE-- Information Subject to Change Professional Edition Tutorial: Excel Spreadsheets Pronto, Visualizer, and Dashboards 2.0 Documentation Release 3/7/2017 i Copyright 2015-2017

More information

HOSTED CONTACT CENTRE

HOSTED CONTACT CENTRE ---------------------------------------------------------------------------- ------ HOSTED CONTACT CENTRE ANALYTICS GUIDE Version 9.4 Revision 1.0 Confidentiality and Proprietary Statement This document

More information

E(xtract) T(ransform) L(oad)

E(xtract) T(ransform) L(oad) Gunther Heinrich, Tobias Steimer E(xtract) T(ransform) L(oad) OLAP 20.06.08 Agenda 1 Introduction 2 Extract 3 Transform 4 Load 5 SSIS - Tutorial 2 1 Introduction 1.1 What is ETL? 1.2 Alternative Approach

More information

Accelerating BI on Hadoop: Full-Scan, Cubes or Indexes?

Accelerating BI on Hadoop: Full-Scan, Cubes or Indexes? White Paper Accelerating BI on Hadoop: Full-Scan, Cubes or Indexes? How to Accelerate BI on Hadoop: Cubes or Indexes? Why not both? 1 +1(844)384-3844 INFO@JETHRO.IO Overview Organizations are storing more

More information

In-Memory Data Management

In-Memory Data Management In-Memory Data Management Martin Faust Research Assistant Research Group of Prof. Hasso Plattner Hasso Plattner Institute for Software Engineering University of Potsdam Agenda 2 1. Changed Hardware 2.

More information

USE A/B TESTING TO MARKET LIKE HUBSPOT WORKBOOK for HubSpot Customers

USE A/B TESTING TO MARKET LIKE HUBSPOT WORKBOOK for HubSpot Customers USE A/B TESTING TO MARKET LIKE HUBSPOT WORKBOOK for HubSpot Customers Guide to using A/B testing to take your marketing to the next stratosphere. A Publication of 2 USE WITH THE COMPANION EBOOK Get the

More information

Partitioning in Oracle 12 c. Bijaya K Adient

Partitioning in Oracle 12 c. Bijaya K Adient Partitioning in Oracle 12 c Bijaya K Pusty @ Adient Partitioning in Oracle 12 c AGENDA Concepts of Partittioning? Partitioning Basis Partitioning Strategy Additions Improvments in 12c Partitioning Indexes

More information

MySQL Performance Optimization and Troubleshooting with PMM. Peter Zaitsev, CEO, Percona Percona Technical Webinars 9 May 2018

MySQL Performance Optimization and Troubleshooting with PMM. Peter Zaitsev, CEO, Percona Percona Technical Webinars 9 May 2018 MySQL Performance Optimization and Troubleshooting with PMM Peter Zaitsev, CEO, Percona Percona Technical Webinars 9 May 2018 Few words about Percona Monitoring and Management (PMM) 100% Free, Open Source

More information

Low Friction Data Warehousing WITH PERSPECTIVE ILM DATA GOVERNOR

Low Friction Data Warehousing WITH PERSPECTIVE ILM DATA GOVERNOR Low Friction Data Warehousing WITH PERSPECTIVE ILM DATA GOVERNOR Table of Contents Foreword... 2 New Era of Rapid Data Warehousing... 3 Eliminating Slow Reporting and Analytics Pains... 3 Applying 20 Years

More information

Flexible Network Analytics in the Cloud. Jon Dugan & Peter Murphy ESnet Software Engineering Group October 18, 2017 TechEx 2017, San Francisco

Flexible Network Analytics in the Cloud. Jon Dugan & Peter Murphy ESnet Software Engineering Group October 18, 2017 TechEx 2017, San Francisco Flexible Network Analytics in the Cloud Jon Dugan & Peter Murphy ESnet Software Engineering Group October 18, 2017 TechEx 2017, San Francisco Introduction Harsh realities of network analytics netbeam Demo

More information

TEN QUERY TUNING TECHNIQUES

TEN QUERY TUNING TECHNIQUES TEN QUERY TUNING TECHNIQUES Every SQL Programmer Should Know Kevin Kline Director of Engineering Services at SentryOne Microsoft MVP since 2003 Facebook, LinkedIn, Twitter at KEKLINE kkline@sentryone.com

More information

TRUE DATABASE VISIBILITY Meet your speakers Raymond Pe Sr Database Administrator Alliant Credit Union Ron Kozakowski Manager, Data Services Alliant Cr

TRUE DATABASE VISIBILITY Meet your speakers Raymond Pe Sr Database Administrator Alliant Credit Union Ron Kozakowski Manager, Data Services Alliant Cr MGT2426BU Alliant Credit Union Cashes in on True Database Visibility in vrealize Operations Raymond Pe, Ron Kozakowski, Alliant Credit Union Gregory Hohertz, Blue Medora TRUE DATABASE VISIBILITY Meet your

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 05(b) : 23/10/2012 Data Mining: Concepts and Techniques (3 rd ed.) Chapter

More information

OVERCOMING CHARTAPHOBIA

OVERCOMING CHARTAPHOBIA OVERCOMING CHARTAPHOBIA Moving Your Organization Toward Interesting and Enlightening Data Viz Meagan Longoria SQL Saturday #396 Getting Started Slides are on my blog. Questions and comments are expected

More information

Fast Innovation requires Fast IT

Fast Innovation requires Fast IT Fast Innovation requires Fast IT Cisco Data Virtualization Puneet Kumar Bhugra Business Solutions Manager 1 Challenge In Data, Big Data & Analytics Siloed, Multiple Sources Business Outcomes Business Opportunity:

More information

1. Attempt any two of the following: 10 a. State and justify the characteristics of a Data Warehouse with suitable examples.

1. Attempt any two of the following: 10 a. State and justify the characteristics of a Data Warehouse with suitable examples. Instructions to the Examiners: 1. May the Examiners not look for exact words from the text book in the Answers. 2. May any valid example be accepted - example may or may not be from the text book 1. Attempt

More information

Splunk Review. 1. Introduction

Splunk Review. 1. Introduction Splunk Review 1. Introduction 2. Splunk Splunk is a software tool for searching, monitoring and analysing machine generated data via web interface. It indexes and correlates real-time and non-real-time

More information

Oracle Big Data Cloud Service, Oracle Storage Cloud Service, Oracle Database Cloud Service

Oracle Big Data Cloud Service, Oracle Storage Cloud Service, Oracle Database Cloud Service Demo Introduction Keywords: Oracle Big Data Cloud Service, Oracle Storage Cloud Service, Oracle Database Cloud Service Goal of Demo: Oracle Big Data Preparation Cloud Services can ingest data from various

More information

Tips & Tricks: Vault QualityDocs Dashboards and Reports. October 22, 2014

Tips & Tricks: Vault QualityDocs Dashboards and Reports. October 22, 2014 Tips & Tricks: Vault QualityDocs Dashboards and Reports October 22, 2014 Today s Session Interactive session to build reports and dashboards in Vault QualityDocs Overview of the capabilities of Vault reporting

More information

Data-Intensive Distributed Computing

Data-Intensive Distributed Computing Data-Intensive Distributed Computing CS 451/651 431/631 (Winter 2018) Part 5: Analyzing Relational Data (1/3) February 8, 2018 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo

More information

Background. Let s see what we prescribed.

Background. Let s see what we prescribed. Background Patient B s custom application had slowed down as their data grew. They d tried several different relief efforts over time, but performance issues kept popping up especially deadlocks. They

More information

Vlookup for dummies two sheets vlookup

Vlookup for dummies two sheets vlookup Vlookup for dummies two sheets Click on the 'fx' button above column B many people start by typing "=vlookup. " but you don't have to! Clicking the "fx" button is much quicker!. * IF AND-OR Combinations:

More information

COMP390 (Design &) Implementation

COMP390 (Design &) Implementation COMP390 (Design &) Implementation A rough guide Consisting of some ideas to assist the development of large and small projects in Computer Science (With thanks to Dave Shield) Design & Implementation What

More information

How Rust is Tilde s Competitive Advantage

How Rust is Tilde s Competitive Advantage Jan. 2018 Rust Case Study: How Rust is Tilde s Competitive Advantage The analytics startup innovates safely with the help of Rust Copyright 2018 The Rust Project Developers All rights reserved graphics

More information

How eharmony Turns Big Data into True Love Sridhar Chiguluri, Lead ETL Developer eharmony

How eharmony Turns Big Data into True Love Sridhar Chiguluri, Lead ETL Developer eharmony How eharmony Turns Big Data into True Love Sridhar Chiguluri, Lead ETL Developer eharmony Grant Parsamyan, Director of BI & Data Warehousing eharmony 1 Agenda Company Overview What is Big Data? Challenges

More information

AZURE CONTAINER INSTANCES

AZURE CONTAINER INSTANCES AZURE CONTAINER INSTANCES -Krunal Trivedi ABSTRACT In this article, I am going to explain what are Azure Container Instances, how you can use them for hosting, when you can use them and what are its features.

More information

DATA VISUALIZATION Prepare the data for visualization Data presentation architecture (DPA) is a skill-set that seeks to identify, locate, manipulate,

DATA VISUALIZATION Prepare the data for visualization Data presentation architecture (DPA) is a skill-set that seeks to identify, locate, manipulate, DATA VISUALIZATION Prepare the data for visualization Data presentation architecture (DPA) is a skill-set that seeks to identify, locate, manipulate, format and present data in such a way as to optimally

More information

BIG DATA. Using the Lambda Architecture on a Big Data Platform to Improve Mobile Campaign Management. Author: Sandesh Deshmane

BIG DATA. Using the Lambda Architecture on a Big Data Platform to Improve Mobile Campaign Management. Author: Sandesh Deshmane BIG DATA Using the Lambda Architecture on a Big Data Platform to Improve Mobile Campaign Management Author: Sandesh Deshmane Executive Summary Growing data volumes and real time decision making requirements

More information

Massively Parallel Processing. Big Data Really Fast. A Proven In-Memory Analytical Processing Platform for Big Data

Massively Parallel Processing. Big Data Really Fast. A Proven In-Memory Analytical Processing Platform for Big Data Big Data Really Fast A Proven In-Memory Analytical Processing Platform for Big Data 2 Executive Summary / Overview: Big Data can be a big headache for organizations that have outgrown the practicality

More information

I Want To Go Faster! A Beginner s Guide to Indexing

I Want To Go Faster! A Beginner s Guide to Indexing I Want To Go Faster! A Beginner s Guide to Indexing Bert Wagner Slides available here! @bertwagner bertwagner.com youtube.com/c/bertwagner bert@bertwagner.com Why Indexes? Biggest bang for the buck Can

More information

Identify and Eliminate Oracle Database Bottlenecks

Identify and Eliminate Oracle Database Bottlenecks Identify and Eliminate Oracle Database Bottlenecks Improving database performance isn t just about optimizing your queries. Oftentimes the infrastructure that surrounds it can inhibit or enhance Oracle

More information

From 1 to 10K with Ganglia and Nagios. Spike Morelli aka Space Linden

From 1 to 10K with Ganglia and Nagios. Spike Morelli aka Space Linden From 1 to 10K with Ganglia and Nagios Spike Morelli aka Space Linden About Second Life 3D Virtual World Not a game About Second Life Built by Residents Textured Scripted Animated Owned About Second Life

More information

Lesson 11 Transcript: Concurrency and locking

Lesson 11 Transcript: Concurrency and locking Lesson 11 Transcript: Concurrency and locking Slide 1: Cover Welcome to Lesson 11 of the DB2 on Campus Lecture Series. We are going to talk today about concurrency and locking. My name is Raul Chong and

More information

Monitor DNS errors in a dashboard

Monitor DNS errors in a dashboard Monitor DNS errors in a dashboard Published: 2018-04-20 The Domain Name System (DNS) is an essential service for resolving hostnames to IP addresses. Any system that needs to locate and communicate with

More information

Monitor database health in a dashboard

Monitor database health in a dashboard Monitor database health in a dashboard Published: 2018-04-20 When someone reports that a database query failed or is too slow, several questions come to mind. Finding the answers can be a time-consuming

More information

Designing dashboards for performance. Reference deck

Designing dashboards for performance. Reference deck Designing dashboards for performance Reference deck Basic principles 1. Everything in moderation 2. If it isn t fast in database, it won t be fast in Tableau 3. If it isn t fast in desktop, it won t be

More information

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples Hadoop Introduction 1 Topics Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples 2 Big Data Analytics What is Big Data?

More information

Distributed Computing.

Distributed Computing. Distributed Computing at Hai.Thai@rackspace.com About: Me ME About: Me ME 09 Tech grad B.S. Computer Engineering 4 years at rackspace About: Rackspace About: Rackspace Managed + Cloud hosting Cloud Applications:

More information

Lecture 21 11/27/2017 Next Lecture: Quiz review & project meetings Streaming & Apache Kafka

Lecture 21 11/27/2017 Next Lecture: Quiz review & project meetings Streaming & Apache Kafka Lecture 21 11/27/2017 Next Lecture: Quiz review & project meetings Streaming & Apache Kafka What problem does Kafka solve? Provides a way to deliver updates about changes in state from one service to another

More information

MSG: An Overview of a Messaging System for the Grid

MSG: An Overview of a Messaging System for the Grid MSG: An Overview of a Messaging System for the Grid Daniel Rodrigues Presentation Summary Current Issues Messaging System Testing Test Summary Throughput Message Lag Flow Control Next Steps Current Issues

More information

CSE 410 Computer Systems. Hal Perkins Spring 2010 Lecture 12 More About Caches

CSE 410 Computer Systems. Hal Perkins Spring 2010 Lecture 12 More About Caches CSE 4 Computer Systems Hal Perkins Spring Lecture More About Caches Reading Computer Organization and Design Section 5. Introduction Section 5. Basics of Caches Section 5. Measuring and Improving Cache

More information
