From the event loop to the distributed system. Martyn 3rd November, 2011


2 From the event loop to the distributed system Martyn 3rd November, 2011

10 From the event loop to the distributed system
- An introduction to Pusher
- The event loop
  - Why you'd use it
  - Managing complexity
- The distributed system
  - Some general considerations
  - Some specific problems and how we solved them

17 Who am I?
- Martyn Loughran
- CTO of Pusher
- We're based in London, England
- Rubyist and EventMachine enthusiast
- Started building Pusher in January 2010
- I don't speak Portuguese (Eu não falo Português)

18 Part I An introduction to Pusher

22 So what is Pusher anyway?
- A web service which helps developers add real-time functionality to their web applications
- It makes scaling easy
- Complex distributed system
I have stickers ->

23 Notifications


26 Chat

27 Collaboration

28 Data Sync CloudApp

36 WebSocket, the basics:
- A pretty silly logo
- Sockets for the web
- Bidirectional
- Low latency
- Bandwidth efficient
- Already supported in Safari, Chrome, and Firefox
- Coming to IE in version 10

40 We use EventMachine
- Evented IO for Ruby
- em-websocket
- em-hiredis

41 Ruby doesn't scale!

42 13,969,264 Number of API requests made yesterday (There are 86,400 seconds in a day)
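
As a rough worked figure, the daily total above can be turned into an average rate. This assumes the load is spread evenly over the day, which the slide does not claim; real traffic is likely burstier.

```ruby
# Average request rate implied by the two numbers on the slide.
# Assumption: requests are spread evenly across the day.
requests_per_day = 13_969_264
seconds_per_day  = 24 * 60 * 60   # 86,400, as stated on the slide

requests_per_second = requests_per_day.to_f / seconds_per_day
puts requests_per_second.round(1)   # prints 161.7
```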

43 35,552,810,379 Total number of messages sent to clients since launch

44 < 10ms Mean end to end latency (excluding the internet)

45 Part II The Event Loop

51 Why use an event loop?
- To handle massive numbers of connections
- To share data without Mutexes
- Efficient scheduling of work

52 It's really easy to use in Ruby
    require 'eventmachine'
    EM.run do
      # Start a server
      # Make some network connections
      # Create a timer
      # etc.
    end
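
To make the idea behind a call like EM.run more concrete, here is a minimal sketch of a single-threaded reactor written with only the Ruby standard library. It is not how EventMachine is implemented; TinyReactor, watch and add_timer are hypothetical names chosen for illustration. One loop waits on every registered IO object and timer, so callbacks run one at a time and can share data without a Mutex.

```ruby
# A toy single-threaded reactor built on IO.select.
# Hypothetical names (TinyReactor, watch, add_timer); not EventMachine's API.
class TinyReactor
  def initialize
    @readers = {}   # IO object => callback run when the IO is readable
    @timers  = []   # [deadline, callback] pairs
    @running = false
  end

  def watch(io, &callback)
    @readers[io] = callback
  end

  def add_timer(delay, &callback)
    @timers << [now + delay, callback]
  end

  def stop
    @running = false
  end

  def run
    @running = true
    while @running
      next_deadline = @timers.map(&:first).min
      break if @readers.empty? && next_deadline.nil?   # nothing left to wait for

      # Sleep inside select until an IO is readable or the next timer is due.
      timeout = next_deadline ? [next_deadline - now, 0].max : nil
      ready, = IO.select(@readers.keys, nil, nil, timeout)
      (ready || []).each { |io| @readers[io]&.call(io) }

      # Fire every timer whose deadline has passed.
      due, @timers = @timers.partition { |deadline, _| deadline <= now }
      due.each { |_, callback| callback.call }
    end
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

events = []
reactor = TinyReactor.new
reader, writer = IO.pipe

reactor.watch(reader) { |io| events << "read: #{io.read_nonblock(64)}" }
reactor.add_timer(0.01) { writer.write("hello") }              # simulate incoming data
reactor.add_timer(0.05) { events << "timer"; reactor.stop }
reactor.run

p events
```

The pipe stands in for a network connection. Both callbacks append to the same array with no locking, because only one callback ever runs at a time.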

53 The Pusher event loop
[Diagram: a single event loop servicing Redis pub/sub, Redis, several WebSocket connections, several timers, and ZeroMQ]

54 Never block the reactor
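
One common way to follow this rule, sketched here under the assumption that the work can be cut into pieces, is to split a long job into small slices that re-schedule themselves, so other callbacks get a turn in between. The slide does not say which technique Pusher uses; the plain array below is a hypothetical stand-in for a reactor's queue of pending callbacks.

```ruby
# A toy "next tick" queue standing in for a reactor's scheduler (hypothetical).
queue = []
log = []

# A long job split into slices of two items. Each slice re-queues the rest,
# which lets other queued callbacks run between slices.
process_slice = lambda do |items|
  slice = items.first(2)
  rest  = items.drop(2)
  log << "work #{slice.inspect}"
  queue << -> { process_slice.call(rest) } unless rest.empty?
end

queue << -> { process_slice.call([1, 2, 3, 4, 5]) }
queue << -> { log << "heartbeat" }   # a latency-sensitive callback

# The "loop": run queued callbacks one at a time until none remain.
queue.shift.call until queue.empty?

p log
```

The heartbeat runs after the first slice rather than after the whole job, which is the point of the exercise.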

55 Don't get caught up in callback spaghetti

56 Using callbacks and deferrable objects
    EM.run {
      stream = TwitterStream.new('yourtwitterusername', 'pass', 'term')
      stream.ontweet { |tweet|
        LanguageDetector.new(tweet).callback { |lang|
          puts "New tweet in #{lang}: #{tweet}"
        }
      }
    }

57 Return a deferrable from a function
    def do_something_complex
      df = EM::DefaultDeferrable.new
      use_lots_of_callbacks {
        ... { df.succeed(result) } ...
        df.fail(error)
      }
      return df
    end
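
For readers who have not met deferrables, the following sketch shows the general shape of the pattern in plain Ruby. TinyDeferrable is a hypothetical, simplified stand-in: it mimics the callback / errback / succeed / fail vocabulary used on the slides but is not EventMachine's implementation, and it assumes the first outcome wins.

```ruby
# Simplified stand-in for a deferrable (hypothetical class, not EventMachine's).
class TinyDeferrable
  def initialize
    @status = :pending
    @args = []
    @callbacks = []
    @errbacks = []
  end

  # Run the block on success; run it immediately if success already happened.
  def callback(&block)
    if @status == :succeeded
      block.call(*@args)
    elsif @status == :pending
      @callbacks << block
    end
    self
  end

  # Run the block on failure; run it immediately if failure already happened.
  def errback(&block)
    if @status == :failed
      block.call(*@args)
    elsif @status == :pending
      @errbacks << block
    end
    self
  end

  def succeed(*args)
    settle(:succeeded, args, @callbacks)
  end

  def fail(*args)
    settle(:failed, args, @errbacks)
  end

  private

  def settle(status, args, handlers)
    return unless @status == :pending   # assumption: first outcome wins
    @status = status
    @args = args
    handlers.each { |handler| handler.call(*args) }
    @callbacks.clear
    @errbacks.clear
  end
end

# A function that hides its internals and hands back a deferrable.
def parse_number(text)
  df = TinyDeferrable.new
  begin
    df.succeed(Integer(text))
  rescue ArgumentError
    df.fail("not a number")
  end
  df
end

results = []
parse_number("42").callback { |n| results << n }.errback { |msg| results << msg }
parse_number("x").callback { |n| results << n }.errback { |msg| results << msg }
p results
```

The caller only sees one object with two hooks, regardless of how many nested callbacks the function needed internally.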

58 Pass a deferrable to a strategy
    Juggler.juggle(:send_webhook, 100) do |df, job_params|
      http = EM::HttpRequest.new(job_params['url']).post({
        :body => job_params["data"]
      })
      http.callback do |response|
        df.succeed
      end
      http.errback do
        df.fail
      end
    end

59 Or try Fibers
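
A minimal sketch of what the Fiber approach can look like, using only core Ruby. FakeAsyncStore and its methods are hypothetical; the array of pending lambdas stands in for an event loop delivering results later. The calling code reads top to bottom, yet the thread is never blocked: the fiber is suspended while it waits.

```ruby
require 'fiber'   # needed for Fiber.current on older Rubies; harmless on newer ones

# Toy callback-based client plus a Fiber wrapper (all names hypothetical).
class FakeAsyncStore
  def initialize
    @pending = []   # callbacks the "event loop" will run later
  end

  # Callback style: returns immediately, the result arrives via the block.
  def get(key, &on_done)
    @pending << -> { on_done.call("value-for-#{key}") }
  end

  # Fiber style: looks synchronous, but suspends the calling fiber
  # instead of blocking the thread.
  def get_sync(key)
    fiber = Fiber.current
    get(key) { |value| fiber.resume(value) }
    Fiber.yield   # returns the value later passed to fiber.resume
  end

  # Stand-in for the event loop delivering results.
  def run_pending
    @pending.shift.call until @pending.empty?
  end
end

store = FakeAsyncStore.new
log = []

Fiber.new do
  a = store.get_sync(:a)   # no nested callbacks
  b = store.get_sync(:b)
  log << [a, b]
end.resume

log << :loop_free          # the fiber is suspended, so the loop is free to do other work
store.run_pending

p log
```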

60 Part III The distributed system

61 "A distributed system is a collection of independent computers that appears to its users as a single coherent system." (Distributed Systems: Principles and Paradigms, Tanenbaum and van Steen, 2006)

70 The distributed system
Why would I build one?
- Work doesn't fit on a single machine any more
- You need better availability
How can I make one?
- Decouple the application so that each function is handled by a separate component
- Scale components horizontally, and independently
- Make components tolerant to failure

71 State

72 Messaging

73 "Do not communicate by sharing memory; instead, share memory by communicating." (Effective Go, Google)
State, Messaging

74 State: CAP theorem
It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
- Consistency (all nodes see the same data at the same time)
- Availability (a guarantee that every request receives a response about whether it was successful or failed)
- Partition tolerance (the system continues to operate despite arbitrary message loss)

80 State: More questions
- What performance do you need?
- How durable does it need to be?
- How much data do you need to store?
- Does it need to be highly available?
- Does it need to be consistent / eventually consistent?

81 State: so what do we use?

88 MySQL ~ 20GB
- Consistent
- Durable
- Not highly available, but this doesn't matter
- Rails models
- Aggregated usage statistics

94 Redis ~ 500MB
- Consistent
- Very fast
- Shared memory for all processes
- Some current statistics, waiting to be aggregated

101 ZooKeeper ~ 1MB
- Slow
- Consistent
- Highly available
- Not partition tolerant
- Process state, and assignment of roles

110 Messaging
Central broker
- AMQP - the SQL of messaging?
- A single all powerful box
- Simple, but hard to scale
Custom messaging topologies
- ZeroMQ - point to point, fanout, pubsub, load balanced
- Lots of choices, therefore complex
- This is the future, but we're not quite there yet

114 Messaging: what do we use?
- Redis pub/sub
- ZeroMQ
- Beanstalkd

115 Some examples

120 Usage statistics and latency metrics
- Loads of events
- Collect incrementers and distributions in memory
- Flush to Redis every minute
- Eventually consistent state
[Diagram: In memory -> Redis -> MySQL]
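
The collect-then-flush idea can be sketched as follows. This is an illustration under stated assumptions, not Pusher's code: FakeStatsStore is a hypothetical in-memory stand-in that offers only hincrby and hgetall (real Redis would be reached over the network and returns strings), and StatsBuffer and the key name are invented for the example.

```ruby
# Hypothetical in-memory stand-in for the two Redis hash commands used below.
class FakeStatsStore
  def initialize
    @hashes = Hash.new { |h, k| h[k] = Hash.new(0) }
  end

  def hincrby(key, field, amount)
    @hashes[key][field] += amount
  end

  def hgetall(key)
    @hashes[key].dup
  end
end

# Counts events in process memory and pushes only the deltas on flush.
class StatsBuffer
  def initialize(store, key)
    @store = store
    @key = key
    @counts = Hash.new(0)
  end

  # Hot path: a plain in-memory increment, no network round trip.
  def increment(name, by = 1)
    @counts[name] += by
  end

  # Meant to be called from a periodic timer (every minute on the slide).
  def flush
    counts, @counts = @counts, Hash.new(0)
    counts.each { |name, delta| @store.hincrby(@key, name, delta) }
  end
end

store = FakeStatsStore.new
process_a = StatsBuffer.new(store, "usage")   # "usage" is a made-up key name
process_b = StatsBuffer.new(store, "usage")

3.times { process_a.increment("messages") }
2.times { process_b.increment("messages") }

process_a.flush
partial = store.hgetall("usage")   # only process A has flushed so far
process_b.flush
total = store.hgetall("usage")

p partial, total
```

Between flushes the shared view lags behind reality, which is the "eventually consistent" trade-off named on the slide: cheap increments in exchange for totals that are up to one flush interval stale.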

126 Storing presence information
- Need to know when a user joins or leaves a channel
- Needs to be consistent across processes
  - Use Redis incrementers
- Needs to survive process failure
  - Use a global hash, and a hash per process, with Redis transactions
- Consistent state
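
The slide gives only a one-line description, so the following is one plausible reading of "a global hash, and a hash per process", not Pusher's actual schema. PresenceStore is hypothetical, the hashes live in memory, and the Mutex stands in for a Redis transaction that updates both hashes atomically.

```ruby
# One plausible reading of the scheme (hypothetical class, in-memory only).
class PresenceStore
  def initialize
    @global = Hash.new(0)                                   # member => total connections
    @per_process = Hash.new { |h, k| h[k] = Hash.new(0) }   # process id => member => connections
    @lock = Mutex.new                                       # stand-in for a Redis transaction
  end

  # Returns true when this is the member's first connection (they "joined").
  def join(process_id, member)
    @lock.synchronize do
      @per_process[process_id][member] += 1
      (@global[member] += 1) == 1
    end
  end

  # Returns true when the member's last connection went away (they "left").
  def leave(process_id, member)
    @lock.synchronize do
      @per_process[process_id][member] -= 1
      (@global[member] -= 1).zero?
    end
  end

  # Undo everything a dead process contributed; returns the members who left.
  def reap(process_id)
    @lock.synchronize do
      contributions = @per_process.delete(process_id) || {}
      contributions.each_with_object([]) do |(member, count), gone|
        next if count.zero?
        @global[member] -= count
        gone << member if @global[member] <= 0
      end
    end
  end

  def present?(member)
    @lock.synchronize { @global[member] > 0 }
  end
end

store = PresenceStore.new
first_join  = store.join("proc-1", "alice")   # first connection anywhere
second_join = store.join("proc-2", "alice")   # same user on another process
store.join("proc-1", "bob")

gone = store.reap("proc-1")                   # proc-1 crashed
p first_join, second_join, gone
```

The per-process hash is what makes recovery possible: when a process dies, its contribution is known exactly and can be subtracted from the global counts.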

131 Optimising internal messaging
- Debug console shows all events for all connections
- Unnecessary messaging, most of the time
- Only publish data when it's needed
- Eventually consistent, distributed state

132 Redis caches, and live caches
    # (pseudo simplified version)
    set = RedisLiveSet.new("debug_open")
    set.add('42')
    # redis.sadd("debug_open", 42)
    # redis.publish("debug_open", ["sadd", "42"])
    # On another process
    set.member?('42')
    # Checks the in memory set
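
A fuller sketch of how such a live set might work, using an in-memory fake instead of a real Redis connection. FakePubSubRedis and LiveSetSketch are hypothetical names, the fake delivers published messages synchronously, and only the add path shown on the slide is modelled.

```ruby
require 'set'

# Hypothetical in-memory stand-in for the Redis calls named on the slide
# (sadd, publish) plus smembers and a subscribe hook.
class FakePubSubRedis
  def initialize
    @sets = Hash.new { |h, k| h[k] = Set.new }
    @subscribers = Hash.new { |h, k| h[k] = [] }
  end

  def sadd(key, member)
    @sets[key].add(member)
  end

  def smembers(key)
    @sets[key].to_a
  end

  def publish(channel, message)
    @subscribers[channel].each { |handler| handler.call(message) }
  end

  def subscribe(channel, &handler)
    @subscribers[channel] << handler
  end
end

# Each process keeps a local copy of the set. Reads never touch the network;
# writes go to the store and are broadcast so other copies catch up.
class LiveSetSketch
  def initialize(redis, name)
    @redis = redis
    @name = name
    @local = Set.new(redis.smembers(name))   # initial snapshot
    redis.subscribe(name) do |message|
      command, member = message
      @local.add(member) if command == "sadd"
    end
  end

  def add(member)
    @redis.sadd(@name, member)
    @redis.publish(@name, ["sadd", member])
  end

  def member?(member)
    @local.include?(member)   # in-memory check only
  end
end

redis = FakePubSubRedis.new
process_a = LiveSetSketch.new(redis, "debug_open")
process_b = LiveSetSketch.new(redis, "debug_open")

process_a.add("42")
p process_b.member?("42"), process_b.member?("7")
```

With a real Redis, delivery is asynchronous, so another process may briefly answer from a stale copy; that is the eventually consistent trade-off the previous slide describes.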

136 Recovering from process failure
- Store process UUIDs in ZooKeeper as ephemeral nodes
- Leader process notices process failure, and takes required action
- Low volume, highly available, and consistent
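
The mechanism can be illustrated with a toy registry whose entries vanish when the session that created them ends. FakeCoordinator is hypothetical and is not the ZooKeeper client API; the paths and UUIDs are made up, and how the leader is elected is not covered on the slide, so the leader's reaction is modelled as a plain callback.

```ruby
# Toy registry imitating ephemeral entries (hypothetical, not ZooKeeper's API).
class FakeCoordinator
  def initialize
    @entries = {}    # path => id of the session that created it
    @watchers = []
  end

  def create_ephemeral(path, session_id)
    @entries[path] = session_id
  end

  # Register a block to be told about every entry that disappears.
  def on_delete(&block)
    @watchers << block
  end

  # Called when a session ends, for example because its process crashed.
  def expire_session(session_id)
    dead, @entries = @entries.partition { |_, owner| owner == session_id }.map(&:to_h)
    dead.each_key { |path| @watchers.each { |watcher| watcher.call(path) } }
  end

  def children
    @entries.keys.sort
  end
end

coordinator = FakeCoordinator.new
%w[uuid-a uuid-b uuid-c].each do |uuid|
  coordinator.create_ephemeral("/processes/#{uuid}", uuid)
end

# The leader watches for disappearing entries and cleans up after them.
cleaned_up = []
coordinator.on_delete { |path| cleaned_up << File.basename(path) }

coordinator.expire_session("uuid-b")   # simulate a crashed process
p cleaned_up, coordinator.children
```

In the clean-up callback a leader could, for instance, remove the state that the dead process contributed to shared stores.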

137 Some other thoughts...

138 Avoid configuration

139 Distributed locking

140 Delay anything you can

141 Think about concurrency

146 In Conclusion
- Consider an event loop for concurrency
- EventMachine is great, you don't need to use node.js
- Think about state & messaging
- It's all about compromises; there are no right answers
- Find creative solutions to your problems

147 Any questions?
- We're hiring (in London)
- Come and talk about WebSockets, EM or whatever
Thanks for listening
Martyn
Try Pusher, it's great! I have stickers, come and get one! ->


The Gearman Cookbook OSCON Eric Day  Senior Software Rackspace The Gearman Cookbook OSCON 2010 Eric Day http://oddments.org/ Senior Software Engineer @ Rackspace Thanks for being here! OSCON 2010 The Gearman Cookbook 2 Ask questions! Grab a mic for long questions.

More information

Get it done. One event at a time. How I learned to stop worrying and love EventMachine.

Get it done. One event at a time. How I learned to stop worrying and love EventMachine. Get it done. One event at a time. How I learned to stop worrying and love EventMachine. Table of Contents Introduction... 3 Getting Started... 5 Timers... 6 Deferring and Delaying Work... 8 EM#next_tick...

More information

Module 6 Node.js and Socket.IO

Module 6 Node.js and Socket.IO Module 6 Node.js and Socket.IO Module 6 Contains 2 components Individual Assignment and Group Assignment Both are due on Wednesday November 15 th Read the WIKI before starting Portions of today s slides

More information

PNUTS and Weighted Voting. Vijay Chidambaram CS 380 D (Feb 8)

PNUTS and Weighted Voting. Vijay Chidambaram CS 380 D (Feb 8) PNUTS and Weighted Voting Vijay Chidambaram CS 380 D (Feb 8) PNUTS Distributed database built by Yahoo Paper describes a production system Goals: Scalability Low latency, predictable latency Must handle

More information

NoSQL systems: sharding, replication and consistency. Riccardo Torlone Università Roma Tre

NoSQL systems: sharding, replication and consistency. Riccardo Torlone Università Roma Tre NoSQL systems: sharding, replication and consistency Riccardo Torlone Università Roma Tre Data distribution NoSQL systems: data distributed over large clusters Aggregate is a natural unit to use for data

More information

Database Availability and Integrity in NoSQL. Fahri Firdausillah [M ]

Database Availability and Integrity in NoSQL. Fahri Firdausillah [M ] Database Availability and Integrity in NoSQL Fahri Firdausillah [M031010012] What is NoSQL Stands for Not Only SQL Mostly addressing some of the points: nonrelational, distributed, horizontal scalable,

More information

Horizontal or vertical scalability? Horizontal scaling is challenging. Today. Scaling Out Key-Value Storage

Horizontal or vertical scalability? Horizontal scaling is challenging. Today. Scaling Out Key-Value Storage Horizontal or vertical scalability? Scaling Out Key-Value Storage COS 418: Distributed Systems Lecture 8 Kyle Jamieson Vertical Scaling Horizontal Scaling [Selected content adapted from M. Freedman, B.

More information

Corey Clark PhD Daniel Montgomery

Corey Clark PhD Daniel Montgomery Corey Clark PhD Daniel Montgomery Web Dev Platform Cross Platform Cross Browser WebGL HTML5 Web Socket Web Worker Hardware Acceleration Optimized Communication Channel Parallel Processing JaHOVA OS Kernel

More information

Scaling Out Key-Value Storage

Scaling Out Key-Value Storage Scaling Out Key-Value Storage COS 418: Distributed Systems Logan Stafman [Adapted from K. Jamieson, M. Freedman, B. Karp] Horizontal or vertical scalability? Vertical Scaling Horizontal Scaling 2 Horizontal

More information

Triple R Riak, Redis and RabbitMQ at XING

Triple R Riak, Redis and RabbitMQ at XING Triple R Riak, Redis and RabbitMQ at XING Dr. Stefan Kaes, Sebastian Röbke NoSQL matters Cologne, April 27, 2013 ActivityStream Intro 3 Types of Feeds News Feed Me Feed Company Feed Activity Creation

More information

Scaling DreamFactory

Scaling DreamFactory Scaling DreamFactory This white paper is designed to provide information to enterprise customers about how to scale a DreamFactory Instance. The sections below talk about horizontal, vertical, and cloud

More information

GETTING STARTED 8 December 2016

GETTING STARTED 8 December 2016 GETTING STARTED 8 December 2016 About Platform... 4 Browser support... 5 Registration Registering as a Teacher... 6 Registering as a Student... 6 Registering as School... 6 Registering as Municipality

More information

A short-term plan for Redis

A short-term plan for Redis A short-term plan for Redis @antirez - Pivotal Redis is made of pieces Transactions Replication Storage API Scripting Sentinel Pub/Sub CLI Cluster Persistence Networking Evolution Redis can be analyzed

More information

App Engine: Datastore Introduction

App Engine: Datastore Introduction App Engine: Datastore Introduction Part 1 Another very useful course: https://www.udacity.com/course/developing-scalableapps-in-java--ud859 1 Topics cover in this lesson What is Datastore? Datastore and

More information

ZooKeeper & Curator. CS 475, Spring 2018 Concurrent & Distributed Systems

ZooKeeper & Curator. CS 475, Spring 2018 Concurrent & Distributed Systems ZooKeeper & Curator CS 475, Spring 2018 Concurrent & Distributed Systems Review: Agreement In distributed systems, we have multiple nodes that need to all agree that some object has some state Examples:

More information

DATABASE SYSTEMS. Database programming in a web environment. Database System Course, 2016

DATABASE SYSTEMS. Database programming in a web environment. Database System Course, 2016 DATABASE SYSTEMS Database programming in a web environment Database System Course, 2016 AGENDA FOR TODAY Advanced Mysql More than just SELECT Creating tables MySQL optimizations: Storage engines, indexing.

More information

Panoptes: A Network Telemetry Ecosystem - Part Deux

Panoptes: A Network Telemetry Ecosystem - Part Deux Panoptes: A Network Telemetry Ecosystem - Part Deux Panoptes is: Greenfield Python based network telemetry platform that provides real time telemetry and analytics @ Yahoo Implements discovery, polling,

More information

Large-scale Game Messaging in Erlang at IMVU

Large-scale Game Messaging in Erlang at IMVU Large-scale Game Messaging in Erlang at IMVU Jon Watte Technical Director, IMVU Inc @jwatte / #erlangfactory Presentation Overview Describe the problem Low-latency game messaging and state distribution

More information

Back-end architecture

Back-end architecture Back-end architecture Tiberiu Vilcu Prepared for EECS 411 Sugih Jamin 2 January 2018 https://education.github.com/pack 1 2 Outline HTTP 1. HTTP and useful web tools 2. Designing APIs 3. Back-end services

More information

Data-Intensive Distributed Computing

Data-Intensive Distributed Computing Data-Intensive Distributed Computing CS 451/651 (Fall 2018) Part 7: Mutable State (2/2) November 13, 2018 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These slides are

More information

The Application Layer HTTP and FTP

The Application Layer HTTP and FTP The Application Layer HTTP and FTP File Transfer Protocol (FTP) Allows a user to copy files to/from remote hosts Client program connects to FTP server provides a login id and password allows the user to

More information

Consistency in Distributed Storage Systems. Mihir Nanavati March 4 th, 2016

Consistency in Distributed Storage Systems. Mihir Nanavati March 4 th, 2016 Consistency in Distributed Storage Systems Mihir Nanavati March 4 th, 2016 Today Overview of distributed storage systems CAP Theorem About Me Virtualization/Containers, CPU microarchitectures/caches, Network

More information

GridGain and Apache Ignite In-Memory Performance with Durability of Disk

GridGain and Apache Ignite In-Memory Performance with Durability of Disk GridGain and Apache Ignite In-Memory Performance with Durability of Disk Dmitriy Setrakyan Apache Ignite PMC GridGain Founder & CPO http://ignite.apache.org #apacheignite Agenda What is GridGain and Ignite

More information

MySQL Database Scalability

MySQL Database Scalability MySQL Database Scalability Nextcloud Conference 2016 TU Berlin Oli Sennhauser Senior MySQL Consultant at FromDual GmbH oli.sennhauser@fromdual.com 1 / 14 About FromDual GmbH Support Consulting remote-dba

More information

Exploiting Commutativity For Practical Fast Replication. Seo Jin Park and John Ousterhout

Exploiting Commutativity For Practical Fast Replication. Seo Jin Park and John Ousterhout Exploiting Commutativity For Practical Fast Replication Seo Jin Park and John Ousterhout Overview Problem: replication adds latency and throughput overheads CURP: Consistent Unordered Replication Protocol

More information

Making RAMCloud Writes Even Faster

Making RAMCloud Writes Even Faster Making RAMCloud Writes Even Faster (Bring Asynchrony to Distributed Systems) Seo Jin Park John Ousterhout Overview Goal: make writes asynchronous with consistency. Approach: rely on client Server returns

More information

RabbitMQ Overview. Tony Garnock-Jones

RabbitMQ Overview. Tony Garnock-Jones RabbitMQ Overview Tony Garnock-Jones Agenda AMQP in 3 minutes RabbitMQ architecture Availability, Clustering, Federation Durability, Persistence, Memory usage Security Operational Tools

More information

PROFESSIONAL. NoSQL. Shashank Tiwari WILEY. John Wiley & Sons, Inc.

PROFESSIONAL. NoSQL. Shashank Tiwari WILEY. John Wiley & Sons, Inc. PROFESSIONAL NoSQL Shashank Tiwari WILEY John Wiley & Sons, Inc. Examining CONTENTS INTRODUCTION xvil CHAPTER 1: NOSQL: WHAT IT IS AND WHY YOU NEED IT 3 Definition and Introduction 4 Context and a Bit

More information

CSC443: Web Programming 2

CSC443: Web Programming 2 CSC443: Web Programming Lecture 20: Web Sockets Haidar M. Harmanani HTML5 WebSocket Standardized by IETF in 2011. Supported by most major browsers including Google Chrome, Internet Explorer, Firefox, Safari

More information

MySQL. The Right Database for GIS Sometimes

MySQL. The Right Database for GIS Sometimes MySQL The Right Database for GIS Sometimes Who am I? Web/GIS Software Engineer with Cimbura.com BS in IT, MGIS Michael Moore I like making and using tools (digital or physical) GIS Web Services I m most

More information

Data Management CS 4720 Mobile Application Development

Data Management CS 4720 Mobile Application Development Data Management Mobile Application Development Desktop Applications What are some common applications you use day-to-day? Browser (Chrome, Firefox, Safari, etc.) Music Player (Spotify, itunes, etc.) Office

More information

Workshop Report: ElaStraS - An Elastic Transactional Datastore in the Cloud

Workshop Report: ElaStraS - An Elastic Transactional Datastore in the Cloud Workshop Report: ElaStraS - An Elastic Transactional Datastore in the Cloud Sudipto Das, Divyakant Agrawal, Amr El Abbadi Report by: Basil Kohler January 4, 2013 Prerequisites This report elaborates and

More information

Introduction to NoSQL

Introduction to NoSQL Introduction to NoSQL Agenda History What is NoSQL Types of NoSQL The CAP theorem History - RDBMS Relational DataBase Management Systems were invented in the 1970s. E. F. Codd, "Relational Model of Data

More information

CS 655 Advanced Topics in Distributed Systems

CS 655 Advanced Topics in Distributed Systems Presented by : Walid Budgaga CS 655 Advanced Topics in Distributed Systems Computer Science Department Colorado State University 1 Outline Problem Solution Approaches Comparison Conclusion 2 Problem 3

More information

Lecture 21 11/27/2017 Next Lecture: Quiz review & project meetings Streaming & Apache Kafka

Lecture 21 11/27/2017 Next Lecture: Quiz review & project meetings Streaming & Apache Kafka Lecture 21 11/27/2017 Next Lecture: Quiz review & project meetings Streaming & Apache Kafka What problem does Kafka solve? Provides a way to deliver updates about changes in state from one service to another

More information

Reactor Pattern & Event-Driven Programming

Reactor Pattern & Event-Driven Programming Reactor Pattern & Event-Driven Programming A scalable concurrent approach, using EventMachine with Thin as an example Lin Jen-Shin, http://godfat.org/ Reactor Pattern & Event-Driven Programming A scalable

More information

Databases suck for Messaging

Databases suck for Messaging Databases suck for Messaging Alexis Richardson Oxford Geek Night May 2009 1 Computers were meant to get rid of this 2 A new kind of fail? 3 Solution - use a database? 4 Databases were meant to get rid

More information

Huge market -- essentially all high performance databases work this way

Huge market -- essentially all high performance databases work this way 11/5/2017 Lecture 16 -- Parallel & Distributed Databases Parallel/distributed databases: goal provide exactly the same API (SQL) and abstractions (relational tables), but partition data across a bunch

More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff and Shun Tak Leung Google* Shivesh Kumar Sharma fl4164@wayne.edu Fall 2015 004395771 Overview Google file system is a scalable distributed file system

More information

TRANSACTIONS AND ABSTRACTIONS

TRANSACTIONS AND ABSTRACTIONS TRANSACTIONS AND ABSTRACTIONS OVER HBASE Andreas Neumann @anew68! Continuuity AGENDA Transactions over HBase: Why? What? Implementation: How? The approach Transaction Manager Abstractions Future WHO WE

More information

CTI-TC Weekly Working Sessions

CTI-TC Weekly Working Sessions CTI-TC Weekly Working Sessions Meeting Date: October 4, 2016 Time: 15:00:00 UTC Purpose: Weekly CTI-TC Joint Working Session Attendees: Agenda: Jordan Trey Darley Wunder Ivan Kirillov Stephen Banghart

More information

the road to cloud native applications Fabien Hermenier

the road to cloud native applications Fabien Hermenier the road to cloud native applications Fabien Hermenier 1 cloud ready applications single-tiered monolithic hardware specific cloud native applications leverage cloud services scalable reliable 2 Agenda

More information

Designing a scalable twitter

Designing a scalable twitter Designing a scalable twitter Nati Shalom, CTO & Founder Gigas John D. Mitchell Mad Scientist of Friendster. a2 About Gigas Technologies Enabling applications to run a distributed cluster as if it was a

More information

Recap. CSE 486/586 Distributed Systems Case Study: Amazon Dynamo. Amazon Dynamo. Amazon Dynamo. Necessary Pieces? Overview of Key Design Techniques

Recap. CSE 486/586 Distributed Systems Case Study: Amazon Dynamo. Amazon Dynamo. Amazon Dynamo. Necessary Pieces? Overview of Key Design Techniques Recap Distributed Systems Case Study: Amazon Dynamo CAP Theorem? Consistency, Availability, Partition Tolerance P then C? A? Eventual consistency? Availability and partition tolerance over consistency

More information

Evolution of the "Web

Evolution of the Web Evolution of the "Web App" @HenrikJoreteg @Hoarse_JS THIS USED TO BE SIMPLE! 1. WRITE SOME HTML 2. LAY IT OUT WITH FRAMES OR TABLES 3. FTP IT TO A SERVER! 4. BAM! CONGRATULATIONS, YOU RE A WEB DEVELOPER!

More information

RailsConf Europe 2008 Juggernaut Realtime Rails. Alex MacCaw and Stuart Eccles

RailsConf Europe 2008 Juggernaut Realtime Rails. Alex MacCaw and Stuart Eccles RailsConf Europe 2008 Juggernaut Realtime Rails Alex MacCaw and Stuart Eccles RailsConf Europe 2008 Juggernaut Realtime Rails Alex MacCaw and Stuart Eccles http://www.madebymany.co.uk/ server push HTTP

More information

WELCOME

WELCOME WELCOME Josh Josh Kalderimis @j2h github.com/joshk #38ish Wellington NEW ZEALAND Amsterdam but now... before we get going... -35 -35 WAT!! Desconstruindo Travis LOGGING METRICS MONITORING

More information

The Right Read Optimization is Actually Write Optimization. Leif Walsh

The Right Read Optimization is Actually Write Optimization. Leif Walsh The Right Read Optimization is Actually Write Optimization Leif Walsh leif@tokutek.com The Right Read Optimization is Write Optimization Situation: I have some data. I want to learn things about the world,

More information