real-time delivery architecture

1 real-time delivery uc berkeley - 27 august 2012

2 designing twitter

3 what are the goals? evolve from being solely a web stack

4 ROUTING PRESENTATION LOGIC STORAGE & RETRIEVAL T-Bird T-Flock + Haplo Monorail Darkwing Flock(s)

5 what are the goals? evolve from being solely a web stack; isolate responsibilities and concerns; site speed and reliability; developer innovation speed

7
         Targeted                                       Queried
  Pull   twitter.com, home_timeline API                 Search API
  Push   User / Site Streams, Mobile Push (SMS, etc.)   Track / Follow Streams

12 Write API Ingester Blender Timeline Service Mobile Push Hadoop Batch Compute HTTP Push Push Compute Timeline Cache Search Cache Fanout

13 fanout (diagram adds Social Graph Service): Fanout inserts into the Timeline Cache, pipelined 4k destinations at a time; Timeline Cache is replicated and keyed off recipient
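
Slide 13 notes that fanout is pipelined 4k destinations at a time. A minimal sketch of that batching, assuming the recipients arrive as an iterable of user ids; `batches` and `BATCH_SIZE` are hypothetical names, and the actual write to the timeline cache is left out:

```python
from itertools import islice
from typing import Iterable, Iterator, List

BATCH_SIZE = 4_000  # "4k destinations at a time" from the slide


def batches(recipient_ids: Iterable[int], size: int = BATCH_SIZE) -> Iterator[List[int]]:
    """Yield successive lists of at most `size` recipient ids."""
    it = iter(recipient_ids)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        # In a real fanout each chunk would be one pipelined round trip.
        yield chunk
```

With 10,000 recipients this yields three batches: two of 4,000 and one of 2,000.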

14 Timeline Cache: using redis, native list structure; RPUSHX to only add to cached timelines; each entry: Tweet ID (8 bytes), User ID (8 bytes), Bits (4 bytes)

15 [diagram: a cached timeline in redis as a native list of (Tweet ID, User ID, Bits) entries; RPUSHX to only add to cached timelines]
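
Slides 14 and 15 describe timeline entries of 8 + 8 + 4 bytes stored in redis lists and appended with RPUSHX. The sketch below illustrates both ideas under stated assumptions: the big-endian field order is a guess, and `TimelineCache` is an in-memory stand-in that mimics RPUSHX semantics (append only when the key already exists), not a redis client.

```python
import struct
from typing import Dict, List, Tuple

# Assumed layout: tweet id (8 bytes), user id (8 bytes), bits (4 bytes), big-endian.
ENTRY = struct.Struct(">QQI")


def encode_entry(tweet_id: int, user_id: int, bits: int = 0) -> bytes:
    """Pack one timeline entry into 20 bytes."""
    return ENTRY.pack(tweet_id, user_id, bits)


def decode_entry(raw: bytes) -> Tuple[int, int, int]:
    """Unpack a 20-byte entry back into (tweet_id, user_id, bits)."""
    return ENTRY.unpack(raw)


class TimelineCache:
    """In-memory stand-in for redis lists (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.lists: Dict[str, List[bytes]] = {}

    def rpushx(self, key: str, value: bytes) -> int:
        """Append only if the list already exists, as redis RPUSHX does.

        Returns the new list length, or 0 when the key is absent.
        """
        if key not in self.lists:
            return 0
        self.lists[key].append(value)
        return len(self.lists[key])
```

Because the push is a no-op for absent keys, timelines that are not currently cached are never created by fanout.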

17 Write API Fanout Timeline Cache Gizmoduck Timeline Service TweetyPie

21 blender: queries one replica of all indexes; merges & ranks results (diagram now shows Search Index in place of Search Cache)
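
Slide 21 says the blender queries one replica of all indexes, then merges & ranks results. A minimal scatter-gather sketch of that idea: each index partition is modeled as a list of replica callables, and hits are `(score, tweet_id)` pairs. The `blend` function, the data shapes, and the scoring are all assumptions made for illustration.

```python
import heapq
from typing import Callable, List, Sequence, Tuple

Hit = Tuple[float, int]  # (score, tweet_id); how scores are computed is assumed
Replica = Callable[[str], List[Hit]]


def blend(query: str, partitions: Sequence[Sequence[Replica]], k: int = 10) -> List[Hit]:
    """Query one replica of every partition, then merge and rank the hits."""
    hits: List[Hit] = []
    for replicas in partitions:
        search = replicas[0]  # any single replica would do; first keeps this deterministic
        hits.extend(search(query))
    # Rank by score, highest first, and keep the top k.
    return heapq.nlargest(k, hits)
```

Every partition must be consulted because a matching tweet can live in any of them; only one replica per partition is needed because replicas hold the same data.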

22 Write API Ingester Blender Timeline Service Mobile Push Hadoop Batch Compute HTTP Push Push Compute Timeline Cache Search Index Fanout

26 http push / hosebird: maintains persistent connections with end clients; processes tweet & social graph events; event-based router

27 event propagation (diagram: Write API, Hosebird Firehose, Hosebird User Streams, Hosebird Track / Follow): write API sends all events into hosebird; sees content creation events, social graph changes, etc.; different queues for public tweets, protected tweets, social events, etc.

28 hosebird: event cascading; bandwidth management; simultaneous connection management (~1m long-lived, open connections to this cluster)

29 firehose: edge machine simply outputs the public tweet queue; only allow a limited number of firehoses per hosebird box for bandwidth management

30 track / follow: simple query based on tweet content; keeps list of terms / users of interest; parses public tweets at the edge, and if a term matches a token, or the user is of interest, then route
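
The track / follow rule on slide 30 (route when a tracked term matches a token, or the author is a user of interest) can be sketched as below. `Subscription`, `tokenize`, and `route` are hypothetical names, and the whitespace tokenizer is a deliberate simplification, not the tokenizer hosebird uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Subscription:
    """Terms and users of interest for one streaming connection."""
    terms: Set[str] = field(default_factory=set)
    user_ids: Set[int] = field(default_factory=set)


def tokenize(text: str) -> Set[str]:
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return {word.strip(".,!?") for word in text.lower().split()}


def route(author_id: int, text: str, subs: Dict[str, Subscription]) -> List[str]:
    """Return the connection ids that should receive this public tweet."""
    tokens = tokenize(text)
    return sorted(
        conn
        for conn, sub in subs.items()
        if author_id in sub.user_ids or sub.terms & tokens
    )
```

The tweet is tokenized once and then checked against every subscription, so matching stays a pair of set lookups per connection.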

31 user streams: replicate home timeline experience; upon login, obtain following list; keep cached following list coherent by seeing social graph updates; route tweet if from a followed user
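
Slide 31 describes a user stream as a cached following list that is kept coherent by social graph events and consulted for each tweet. A minimal sketch of that state machine, assuming events arrive as `(kind, source_id, target_id)` with kinds `"follow"` and `"unfollow"`; the class and event shape are hypothetical.

```python
from typing import Iterable, Set


class UserStream:
    """Per-connection state for one logged-in user (illustrative only)."""

    def __init__(self, user_id: int, following: Iterable[int]) -> None:
        self.user_id = user_id
        # Obtained once at login, then maintained from the event stream.
        self.following: Set[int] = set(following)

    def on_social_graph_event(self, kind: str, source_id: int, target_id: int) -> None:
        """Keep the cached following list coherent with follow/unfollow events."""
        if source_id != self.user_id:
            return  # someone else's edge; not relevant to this stream
        if kind == "follow":
            self.following.add(target_id)
        elif kind == "unfollow":
            self.following.discard(target_id)

    def should_route(self, author_id: int) -> bool:
        """Route a tweet only if it comes from a followed user."""
        return author_id in self.following
```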

35 Write API Ingester Blender Timeline Service Mobile Push Hadoop Social Graph Service Batch Compute HTTP Push Push Compute Timeline Cache Search Index Fanout

41 Synchronous Path Write API Ingester Blender Timeline Service Mobile Push Asynchronous Path Hadoop Batch Compute HTTP Push Push Compute Timeline Cache Search Index Fanout Query Path

44 Blender Timeline Service Write Path HTTP Push Mobile Push Hadoop Batch Compute Timeline Cache Fanout Push Compute Ingester Search Index Read Path Write API

46 things we're trying...

48 (diagram: Write API, Ingester, Fanout, Search Index, Timeline Cache) search index: [hello, world]; fanout index: [@danadanger,...]

49 User Intent Query Expansion Hello, world Hello AND s home timeline user_timeline:nelson OR user_timeline:danadanger

50 Write API; fan-in (Ingester, Search Index): O(1) write, O(n) read; fan-out (Fanout, Timeline Cache): O(n) write, O(1) read
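
The trade-off on slide 50 can be made concrete with two toy stores: fan-out does one write per follower and a single-list read, while fan-in does one write and merges per-author lists at read time. Both classes are hypothetical illustrations, and sorting by tweet id in the fan-in read assumes ids increase over time.

```python
from collections import defaultdict
from typing import Dict, List, Set


class FanOutStore:
    """O(n) write (one insert per follower), O(1) read (one list lookup)."""

    def __init__(self, followers: Dict[str, Set[str]]) -> None:
        self.followers = followers
        self.home: Dict[str, List[int]] = defaultdict(list)

    def write(self, author: str, tweet_id: int) -> None:
        for follower in self.followers.get(author, ()):
            self.home[follower].append(tweet_id)

    def read(self, user: str) -> List[int]:
        return list(self.home.get(user, []))


class FanInStore:
    """O(1) write (one insert), O(n) read (merge across followed authors)."""

    def __init__(self, following: Dict[str, Set[str]]) -> None:
        self.following = following
        self.by_author: Dict[str, List[int]] = defaultdict(list)

    def write(self, author: str, tweet_id: int) -> None:
        self.by_author[author].append(tweet_id)

    def read(self, user: str) -> List[int]:
        merged: List[int] = []
        for author in self.following.get(user, ()):
            merged.extend(self.by_author.get(author, []))
        return sorted(merged)  # assumes tweet ids are time-ordered
```

Both stores return the same home timeline for the same inputs; they differ only in where the work is paid.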

51 User Intent Hello, world Query Expansion Hello AND s home timeline home_timeline:raffi

52 User Intent Query Expansion Hello, world Hello AND s home timeline home_timeline:raffi OR user_timeline:taylorswift13

53 streaming compute: continuous computation driven by the events that come into twitter; generalizing the push mechanism

55 timeline query statistics: >150m active users worldwide; 300k qps poll-based; 1ms p50 / 4ms p99; 30k qps search-based timelines

56 tweet input: ~340m tweets per day; ~4K/sec daily average; ~6K/sec daily peak; >10K/sec during large events
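
As a quick check on slide 56, the daily average follows directly from the daily total. The variable names below are illustrative:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

tweets_per_day = 340_000_000  # "~340m tweets per day" from the slide

# Roughly 3.9K tweets per second, which the slide rounds to ~4K/sec.
avg_tweets_per_sec = tweets_per_day / SECONDS_PER_DAY
```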

58 followed by following

59 timeline delivery statistics: 26b deliveries / day (~18m / min); 3.5 p50 to deliver to 1m; ~300k deliveries / sec
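
The per-minute and per-second figures on slide 59 are consistent with the daily total, as the arithmetic below shows. The last line combines it with the ~340m tweets per day from slide 56 to estimate the average deliveries per tweet; that ratio is a derived figure, not one stated in the deck.

```python
MINUTES_PER_DAY = 24 * 60       # 1,440
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

deliveries_per_day = 26_000_000_000  # "26b deliveries / day"
tweets_per_day = 340_000_000         # "~340m tweets per day" (slide 56)

per_minute = deliveries_per_day / MINUTES_PER_DAY  # about 18.06 million
per_second = deliveries_per_day / SECONDS_PER_DAY  # about 301 thousand

# Derived, not from the slides: average timeline deliveries per tweet (about 76).
deliveries_per_tweet = deliveries_per_day / tweets_per_day
```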

60 thanks!

Realtime & Personalized

Realtime & Personalized Realtime & Personalized Notifications @Twitter @pathak_s @lamgary March 8 2017 I was following it on Twitter, I didn't actually see it live. I kept on refreshing my notifications, I saw people were tweeting

More information

Data Acquisition. The reference Big Data stack

Data Acquisition. The reference Big Data stack Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Data Acquisition Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini The reference

More information

Solace JMS Broker Delivers Highest Throughput for Persistent and Non-Persistent Delivery

Solace JMS Broker Delivers Highest Throughput for Persistent and Non-Persistent Delivery Solace JMS Broker Delivers Highest Throughput for Persistent and Non-Persistent Delivery Java Message Service (JMS) is a standardized messaging interface that has become a pervasive part of the IT landscape

More information

Building Durable Real-time Data Pipeline

Building Durable Real-time Data Pipeline Building Durable Real-time Data Pipeline Apache BookKeeper at Twitter @sijieg Twitter Background Layered Architecture Agenda Design Details Performance Scale @Twitter Q & A Publish-Subscribe Online services

More information

Improve Web Application Performance with Zend Platform

Improve Web Application Performance with Zend Platform Improve Web Application Performance with Zend Platform Shahar Evron Zend Sr. PHP Specialist Copyright 2007, Zend Technologies Inc. Agenda Benchmark Setup Comprehensive Performance Multilayered Caching

More information

Reactive Microservices Architecture on AWS

Reactive Microservices Architecture on AWS Reactive Microservices Architecture on AWS Sascha Möllering Solutions Architect, @sascha242, Amazon Web Services Germany GmbH Why are we here today? https://secure.flickr.com/photos/mgifford/4525333972

More information

Last Class: RPCs and RMI. Today: Communication Issues

Last Class: RPCs and RMI. Today: Communication Issues Last Class: RPCs and RMI Case Study: Sun RPC Lightweight RPCs Remote Method Invocation (RMI) Design issues Lecture 9, page 1 Today: Communication Issues Message-oriented communication Persistence and synchronicity

More information

PNUTS: Yahoo! s Hosted Data Serving Platform. Reading Review by: Alex Degtiar (adegtiar) /30/2013

PNUTS: Yahoo! s Hosted Data Serving Platform. Reading Review by: Alex Degtiar (adegtiar) /30/2013 PNUTS: Yahoo! s Hosted Data Serving Platform Reading Review by: Alex Degtiar (adegtiar) 15-799 9/30/2013 What is PNUTS? Yahoo s NoSQL database Motivated by web applications Massively parallel Geographically

More information

How you can benefit from using. javier

How you can benefit from using. javier How you can benefit from using I was Lois Lane redis has super powers myth: the bottleneck redis-benchmark -r 1000000 -n 2000000 -t get,set,lpush,lpop,mset -P 16 -q On my laptop: SET: 513610 requests

More information

The Road to a Complete Tweet Index

The Road to a Complete Tweet Index The Road to a Complete Tweet Index Yi Zhuang Staff Software Engineer @ Twitter Outline 1. Current Scale of Twitter Search 2. The History of Twitter Search Infra 3. Complete Tweet Index 4. Search Engine

More information

BUILDING LARGE VOD LIBRARIES WITH NEXT GENERATION ON DEMAND ARCHITECTURE. Weidong Mao Comcast Fellow Office of the CTO Comcast Cable

BUILDING LARGE VOD LIBRARIES WITH NEXT GENERATION ON DEMAND ARCHITECTURE. Weidong Mao Comcast Fellow Office of the CTO Comcast Cable BUILDING LARGE VOD LIBRARIES WITH NEXT GENERATION ON DEMAND ARCHITECTURE Weidong Mao Comcast Fellow Office of the CTO Comcast Cable Abstract The paper presents an integrated Video On Demand (VOD) content

More information

<Insert Picture Here> QCon: London 2009 Data Grid Design Patterns

<Insert Picture Here> QCon: London 2009 Data Grid Design Patterns QCon: London 2009 Data Grid Design Patterns Brian Oliver Global Solutions Architect brian.oliver@oracle.com Oracle Coherence Oracle Fusion Middleware Product Management Agenda Traditional

More information

An Introduction to Apache Spark

An Introduction to Apache Spark An Introduction to Apache Spark 1 History Developed in 2009 at UC Berkeley AMPLab. Open sourced in 2010. Spark becomes one of the largest big-data projects with more 400 contributors in 50+ organizations

More information

Tutorial 8 Build resilient, responsive and scalable web applications with SocketPro

Tutorial 8 Build resilient, responsive and scalable web applications with SocketPro Tutorial 8 Build resilient, responsive and scalable web applications with SocketPro Contents: Introduction SocketPro ways for resilient, responsive and scalable web applications Vertical scalability o

More information

AWS Lambda + nodejs Hands-On Training

AWS Lambda + nodejs Hands-On Training AWS Lambda + nodejs Hands-On Training (4 Days) Course Description & High Level Contents AWS Lambda is changing the way that we build systems in the cloud. This new compute service in the cloud runs your

More information

Apache BookKeeper. A High Performance and Low Latency Storage Service

Apache BookKeeper. A High Performance and Low Latency Storage Service Apache BookKeeper A High Performance and Low Latency Storage Service Hello! I am Sijie Guo - PMC Chair of Apache BookKeeper Co-creator of Apache DistributedLog Twitter Messaging/Pub-Sub Team Yahoo! R&D

More information

SparkStreaming. Large scale near- realtime stream processing. Tathagata Das (TD) UC Berkeley UC BERKELEY

SparkStreaming. Large scale near- realtime stream processing. Tathagata Das (TD) UC Berkeley UC BERKELEY SparkStreaming Large scale near- realtime stream processing Tathagata Das (TD) UC Berkeley UC BERKELEY Motivation Many important applications must process large data streams at second- scale latencies

More information

The Stream Processor as a Database. Ufuk

The Stream Processor as a Database. Ufuk The Stream Processor as a Database Ufuk Celebi @iamuce Realtime Counts and Aggregates The (Classic) Use Case 2 (Real-)Time Series Statistics Stream of Events Real-time Statistics 3 The Architecture collect

More information

Using the SDACK Architecture to Build a Big Data Product. Yu-hsin Yeh (Evans Ye) Apache Big Data NA 2016 Vancouver

Using the SDACK Architecture to Build a Big Data Product. Yu-hsin Yeh (Evans Ye) Apache Big Data NA 2016 Vancouver Using the SDACK Architecture to Build a Big Data Product Yu-hsin Yeh (Evans Ye) Apache Big Data NA 2016 Vancouver Outline A Threat Analytic Big Data product The SDACK Architecture Akka Streams and data

More information

Data Acquisition. The reference Big Data stack

Data Acquisition. The reference Big Data stack Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Data Acquisition Corso di Sistemi e Architetture per Big Data A.A. 2017/18 Valeria Cardellini The reference

More information

10. Replication. CSEP 545 Transaction Processing Philip A. Bernstein. Copyright 2003 Philip A. Bernstein. Outline

10. Replication. CSEP 545 Transaction Processing Philip A. Bernstein. Copyright 2003 Philip A. Bernstein. Outline 10. Replication CSEP 545 Transaction Processing Philip A. Bernstein Copyright 2003 Philip A. Bernstein 1 Outline 1. Introduction 2. Primary-Copy Replication 3. Multi-Master Replication 4. Other Approaches

More information

Splunk & AWS. Gain real-time insights from your data at scale. Ray Zhu Product Manager, AWS Elias Haddad Product Manager, Splunk

Splunk & AWS. Gain real-time insights from your data at scale. Ray Zhu Product Manager, AWS Elias Haddad Product Manager, Splunk Splunk & AWS Gain real-time insights from your data at scale Ray Zhu Product Manager, AWS Elias Haddad Product Manager, Splunk Forward-Looking Statements During the course of this presentation, we may

More information

Technical Note. Abstract

Technical Note. Abstract Technical Note Dell PowerEdge Expandable RAID Controllers 5 and 6 Dell PowerVault MD1000 Disk Expansion Enclosure Solution for Microsoft SQL Server 2005 Always On Technologies Abstract This technical note

More information

Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Yahoo! Sunnyvale, California USA {Shv, Hairong, SRadia,

Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Yahoo! Sunnyvale, California USA {Shv, Hairong, SRadia, Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Yahoo! Sunnyvale, California USA {Shv, Hairong, SRadia, Chansler}@Yahoo-Inc.com Presenter: Alex Hu } Introduction } Architecture } File

More information

Databricks Delta: Bringing Unprecedented Reliability and Performance to Cloud Data Lakes

Databricks Delta: Bringing Unprecedented Reliability and Performance to Cloud Data Lakes Databricks Delta: Bringing Unprecedented Reliability and Performance to Cloud Data Lakes AN UNDER THE HOOD LOOK Databricks Delta, a component of the Databricks Unified Analytics Platform*, is a unified

More information

The Technology of the Business Data Lake. Appendix

The Technology of the Business Data Lake. Appendix The Technology of the Business Data Lake Appendix Pivotal data products Term Greenplum Database GemFire Pivotal HD Spring XD Pivotal Data Dispatch Pivotal Analytics Description A massively parallel platform

More information

Building loosely coupled and scalable systems using Event-Driven Architecture. Jonas Bonér Patrik Nordwall Andreas Källberg

Building loosely coupled and scalable systems using Event-Driven Architecture. Jonas Bonér Patrik Nordwall Andreas Källberg Building loosely coupled and scalable systems using Event-Driven Architecture Jonas Bonér Patrik Nordwall Andreas Källberg Why is EDA Important for Scalability? What building blocks does EDA consists of?

More information

Implementing Replication. Overview of Replication Managing Publications and Subscriptions Configuring Replication in Some Common Scenarios

Implementing Replication. Overview of Replication Managing Publications and Subscriptions Configuring Replication in Some Common Scenarios Implementing Replication Overview of Replication Managing Publications and Subscriptions Configuring Replication in Some Common Scenarios Lesson 1: Overview of Replication Distributing and Synchronizing

More information

IBM Active Cloud Engine/Active File Management. Kalyan Gunda

IBM Active Cloud Engine/Active File Management. Kalyan Gunda IBM Active Cloud Engine/Active File Management Kalyan Gunda kgunda@in.ibm.com Agenda Need of ACE? Inside ACE Use Cases Data Movement across sites How do you move Data across sites today? FTP, Parallel

More information

Oracle Responsys. Release 18B. New Feature Summary ORACLE

Oracle Responsys. Release 18B. New Feature Summary ORACLE Oracle Responsys Release 18B New Feature Summary ORACLE TABLE OF CONTENTS Revision History 4 Overview 4 APIs 4 New Throttling Limits for Web Services APIs 4 New Asynchronous Web Services APIs 5 New REST

More information

Griddable.io architecture

Griddable.io architecture Griddable.io architecture Executive summary This whitepaper presents the architecture of griddable.io s smart grids for synchronized data integration. Smart transaction grids are a novel concept aimed

More information

Managing IoT and Time Series Data with Amazon ElastiCache for Redis

Managing IoT and Time Series Data with Amazon ElastiCache for Redis Managing IoT and Time Series Data with ElastiCache for Redis Darin Briskman, ElastiCache Developer Outreach Michael Labib, Specialist Solutions Architect 2016, Web Services, Inc. or its Affiliates. All

More information

Functionality, Challenges and Architecture of Social Networks

Functionality, Challenges and Architecture of Social Networks Functionality, Challenges and Architecture of Social Networks INF 5370 Outline Social Network Services Functionality Business Model Current Architecture and Scalability Challenges Conclusion 1 Social Network

More information

To Shard or Not to Shard That is the question! Peter Zaitsev April 21, 2016

To Shard or Not to Shard That is the question! Peter Zaitsev April 21, 2016 To Shard or Not to Shard That is the question! Peter Zaitsev April 21, 2016 Story Let s start with the story 2 First things to decide Before you decide how to shard you d best understand whether or not

More information

PASS4TEST. IT Certification Guaranteed, The Easy Way! We offer free update service for one year

PASS4TEST. IT Certification Guaranteed, The Easy Way!   We offer free update service for one year PASS4TEST IT Certification Guaranteed, The Easy Way! \ http://www.pass4test.com We offer free update service for one year Exam : 0B0-105 Title : BEA8.1 Certified Architect:Enterprise Architecture Vendors

More information

Prototyping Data Intensive Apps: TrendingTopics.org

Prototyping Data Intensive Apps: TrendingTopics.org Prototyping Data Intensive Apps: TrendingTopics.org Pete Skomoroch Research Scientist at LinkedIn Consultant at Data Wrangling @peteskomoroch 09/29/09 1 Talk Outline TrendingTopics Overview Wikipedia Page

More information

Microsoft Perform Data Engineering on Microsoft Azure HDInsight.

Microsoft Perform Data Engineering on Microsoft Azure HDInsight. Microsoft 70-775 Perform Data Engineering on Microsoft Azure HDInsight http://killexams.com/pass4sure/exam-detail/70-775 QUESTION: 30 You are building a security tracking solution in Apache Kafka to parse

More information

From Single Purpose to Multi Purpose Data Lakes. Thomas Niewel Technical Sales Director DACH Denodo Technologies March, 2019

From Single Purpose to Multi Purpose Data Lakes. Thomas Niewel Technical Sales Director DACH Denodo Technologies March, 2019 From Single Purpose to Multi Purpose Data Lakes Thomas Niewel Technical Sales Director DACH Denodo Technologies March, 2019 Agenda Data Lakes Multiple Purpose Data Lakes Customer Example Demo Takeaways

More information

TSAR A TimeSeries AggregatoR. Anirudh Todi TSAR

TSAR A TimeSeries AggregatoR. Anirudh Todi TSAR TSAR A TimeSeries AggregatoR Anirudh Todi Twitter @anirudhtodi TSAR What is TSAR? What is TSAR? TSAR is a framework and service infrastructure for specifying, deploying and operating timeseries aggregation

More information

Revolutionizing the Datacenter Join the Conversation #OpenPOWERSummit

Revolutionizing the Datacenter Join the Conversation #OpenPOWERSummit Redis Labs on POWER8 Server: The Promise of OpenPOWER Value Jeffrey L. Leeds, Ph.D. Vice President, Alliances & Channels Revolutionizing the Datacenter Join the Conversation #OpenPOWERSummit Who We Are

More information

UNIVERSITY OF TORONTO FACULTY OF APPLIED SCIENCE AND ENGINEERING

UNIVERSITY OF TORONTO FACULTY OF APPLIED SCIENCE AND ENGINEERING UNIVERSITY OF TORONTO FACULTY OF APPLIED SCIENCE AND ENGINEERING ECE361 Computer Networks Midterm March 06, 2017, 6:15PM DURATION: 80 minutes Calculator Type: 2 (non-programmable calculators) Examiner:

More information

Today: Distributed Middleware. Middleware

Today: Distributed Middleware. Middleware Today: Distributed Middleware Middleware concepts Case study: CORBA Lecture 24, page 1 Middleware Software layer between application and the OS Provides useful services to the application Abstracts out

More information

Storm. Distributed and fault-tolerant realtime computation. Nathan Marz Twitter

Storm. Distributed and fault-tolerant realtime computation. Nathan Marz Twitter Storm Distributed and fault-tolerant realtime computation Nathan Marz Twitter Storm at Twitter Twitter Web Analytics Before Storm Queues Workers Example (simplified) Example Workers schemify tweets and

More information

BUILDING MICROSERVICES ON AZURE. ~ Vaibhav

BUILDING MICROSERVICES ON AZURE. ~ Vaibhav BUILDING MICROSERVICES ON AZURE ~ Vaibhav Gujral @vabgujral About Me Over 11 years of experience Working with Assurant Inc. Microsoft Certified Azure Architect MCSD, MCP, Microsoft Specialist Aspiring

More information

Twitter Adaptation Layer Submitted for Drexel University s CS544

Twitter Adaptation Layer Submitted for Drexel University s CS544 Twitter Adaptation Layer Submitted for Drexel University s CS544 Josh Datko www.datko.net 9 June 2012 1 Description of Service The Twitter Adaptation Layer (TWAL) provides connected, best-effort-end-to-end

More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff and Shun Tak Leung Google* Shivesh Kumar Sharma fl4164@wayne.edu Fall 2015 004395771 Overview Google file system is a scalable distributed file system

More information

Migrating massive monitoring to Bigtable without downtime. Martin Parm, Infrastructure Engineer for Monitoring

Migrating massive monitoring to Bigtable without downtime. Martin Parm, Infrastructure Engineer for Monitoring Migrating massive monitoring to Bigtable without downtime Martin Parm, Infrastructure Engineer for Monitoring This is a big deal. -- Nicholas Harteau/VP, Engineering & Infrastructure https://news.spotify.com/dk/2016/02/23/announcing-spotify-infrastructures-googley-future/

More information

Improving efficiency of Twitter Infrastructure using Chargeback

Improving efficiency of Twitter Infrastructure using Chargeback Improving efficiency of Twitter Infrastructure using Chargeback @vinucharanya @micheal AGENDA Brief History Problem Chargeback Engineering Challenges The product Impact Future Getty Images from http://www.fifa.com/worldcup/news/y=2010/m=7/news=pride-for-africa-spain-strike-gold-2247372.html

More information

for Multi-Services Gateways

for Multi-Services Gateways KURA an OSGi-basedApplication Framework for Multi-Services Gateways Introduction & Technical Overview Pierre Pitiot Grenoble 19 février 2014 Multi-Service Gateway Approach ESF / Increasing Value / Minimizing

More information

1z0-479 oracle. Number: 1z0-479 Passing Score: 800 Time Limit: 120 min.

1z0-479 oracle. Number: 1z0-479 Passing Score: 800 Time Limit: 120 min. 1z0-479 oracle Number: 1z0-479 Passing Score: 800 Time Limit: 120 min Exam A QUESTION 1 What is the role of a user data store in Oracle Identity Federation (OIF) 11g when it is configured as an Identity

More information

Technical Note. Dell/EMC Solutions for Microsoft SQL Server 2005 Always On Technologies. Abstract

Technical Note. Dell/EMC Solutions for Microsoft SQL Server 2005 Always On Technologies. Abstract Technical Note Dell/EMC Solutions for Microsoft SQL Server 2005 Always On Technologies Abstract This technical note provides information on the Dell/EMC storage solutions, based on the Microsoft SQL Server

More information

Building a Data-Friendly Platform for a Data- Driven Future

Building a Data-Friendly Platform for a Data- Driven Future Building a Data-Friendly Platform for a Data- Driven Future Benjamin Hindman - @benh 2016 Mesosphere, Inc. All Rights Reserved. INTRO $ whoami BENJAMIN HINDMAN Co-founder and Chief Architect of Mesosphere,

More information

The Google File System

The Google File System October 13, 2010 Based on: S. Ghemawat, H. Gobioff, and S.-T. Leung: The Google file system, in Proceedings ACM SOSP 2003, Lake George, NY, USA, October 2003. 1 Assumptions Interface Architecture Single

More information

Smart Client Offline Data Caching and Synchronization

Smart Client Offline Data Caching and Synchronization Smart Client Offline Data Caching and Synchronization Brian Noyes Principal Software Architect IDesign,, Inc. www.idesign.net Offline Operations Challenges 1 What is a Smart Client Rich user interface

More information

Which compute option is designed for the above scenario? A. OpenWhisk B. Containers C. Virtual Servers D. Cloud Foundry

Which compute option is designed for the above scenario? A. OpenWhisk B. Containers C. Virtual Servers D. Cloud Foundry 1. A developer needs to create support for a workload that is stateless and short-living. The workload can be any one of the following: - API/microservice /web application implementation - Mobile backend

More information

Tungsten Replicator for Kafka, Elasticsearch, Cassandra

Tungsten Replicator for Kafka, Elasticsearch, Cassandra Tungsten Replicator for Kafka, Elasticsearch, Cassandra Topics In todays session Replicator Basics Filtering and Glue Kafka and Options Elasticsearch and Options Cassandra Future Direction 2 Asynchronous

More information

Bringing Data to Life

Bringing Data to Life Bringing Data to Life Data management and Visualization Techniques Benika Hall Rob Harrison Corporate Model Risk March 16, 2018 Introduction Benika Hall Analytic Consultant Wells Fargo - Corporate Model

More information

Deep Learning Inference as a Service

Deep Learning Inference as a Service Deep Learning Inference as a Service Mohammad Babaeizadeh Hadi Hashemi Chris Cai Advisor: Prof Roy H. Campbell Use case 1: Model Developer Use case 1: Model Developer Inference Service Use case

More information

Everything You Need to Know About MySQL Group Replication

Everything You Need to Know About MySQL Group Replication Everything You Need to Know About MySQL Group Replication Luís Soares (luis.soares@oracle.com) Principal Software Engineer, MySQL Replication Lead Copyright 2017, Oracle and/or its affiliates. All rights

More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung SOSP 2003 presented by Kun Suo Outline GFS Background, Concepts and Key words Example of GFS Operations Some optimizations in

More information

Document Sub Title. Yotpo. Technical Overview 07/18/ Yotpo

Document Sub Title. Yotpo. Technical Overview 07/18/ Yotpo Document Sub Title Yotpo Technical Overview 07/18/2016 2015 Yotpo Contents Introduction... 3 Yotpo Architecture... 4 Yotpo Back Office (or B2B)... 4 Yotpo On-Site Presence... 4 Technologies... 5 Real-Time

More information

CSCI 466 Midterm Networks Fall 2013

CSCI 466 Midterm Networks Fall 2013 CSCI 466 Midterm Networks Fall 2013 Name: This exam consists of 6 problems on the following 7 pages. You may use your single-sided hand-written 8 ½ x 11 note sheet and a calculator during the exam. No

More information

<Insert Picture Here> Oracle NoSQL Database A Distributed Key-Value Store

<Insert Picture Here> Oracle NoSQL Database A Distributed Key-Value Store Oracle NoSQL Database A Distributed Key-Value Store Charles Lamb The following is intended to outline our general product direction. It is intended for information purposes only,
