Ken Birman. Cornell Dept of Computer Science Colloquium

Size: px
Start display at page:

Download "Ken Birman. Cornell Dept of Computer Science Colloquium"

Transcription

1 Ken Birman Sept 24, 2009 Cornell Dept of Computer Science Colloquium 1

2 My career has been focused on multicast Mostly, in support of service replication and this was important in the old world Client server computing, dominated by databases ACID consistency guarantees were the key The virtual synchrony model is a form of ACID property But a new world has displaced the old one Sept 24, 2009 Cornell Dept of Computer Science Colloquium 2

3 Sept 24, 2009 Cornell Dept of Computer Science Colloquium 3

4 Web 2.0 Mashups Simple ways to create and share collaboration and social network applications [Try it! Examples: Live Objects, Google Wave, Javascript/AJAX, Silverlight, Java Fx, Adobe FLEX and AIR, etc. Sept 24, 2009 Cornell Dept of Computer Science Colloquium 4

5 Cloud computing platforms have a characteristic style Client system is a browser or an application pretending to be a browser (web services). If a request times out the client will retry it Cloud platform has a massive number of servers facing the clients. These are stateless and handle client requests with a focus on speed. If one fails, there is no warning. The cluster manager just restarts it, perhaps elsewhere Behind the servers is a row of stateful (usually database) servers. These operate off some form of work queues, usually implemented as enterprise service buses (ESBs). Think publish subscribe with logging. They push updates to the client facing systems. Finally, there is a big pool of machines to run backend applications. This looks like a loosely coupled collection of Linux machines sharing a file system and a lock manager, and often programmed mostly with Hadoop (MapReduce). Sept 24, 2009 Cornell Dept of Computer Science Colloquium 5

6 Client systems use web technologies , file storage, IM, search Web services Databases, files with data used by servers ESB glues it together Google/IBM/Amazon/Yahoo! host the services Embarrassingly parallel backend tasks prepare the databases and files

7 Range in size from edge facilities to megascale. Incredible economies of scale Approximate costs for a small size center (1K servers) and a larger, 50K server center. Technology Cost in smallsized Data Center Cost in Large Data Center Cloud Advantage Network $95 per Mbps/ month $13 per Mbps/ month 7.1 Storage Administratio n $2.20 per GB/ month ~140 servers/ Administrator $0.40 per GB/ month >1000 Servers/ Administrator Each data center is 11.5 times the size of a football field Slides 4 17 provided by Roger Barga, Head of Cloud Computing, Microsoft

8 Old world: we replicated servers for speed and availability, but maintained consistency New world: scalability matters most of all Lots of replication, but weak consistency Sept 24, 2009 Cornell Dept of Computer Science Colloquium 8

9 Why can t we have both? Scalability Consistency too Sept 24, 2009 Cornell Dept of Computer Science Colloquium 9

10 Sept 24, 2009 Cornell Dept of Computer Science Colloquium 10

11 Why do reliability/consistency mechanisms scale poorly? Focus on replication supported by multicast Can we do something about it? For example, can we fix the whole mess? How would a truly scalable platform look? Sept 24, 2009 Cornell Dept of Computer Science Colloquium 11

12 Sept 24, 2009 Cornell Dept of Computer Science Colloquium 12

13 As described by Randy Shoup at LADIS 2008 Thou shalt 1. Partition Everything 2. Use Asynchrony Everywhere 3. Automate Everything 4. Remember: Everything Fails 5. Embrace Inconsistency Sept 24, 2009 Cornell Dept of Computer Science Colloquium 13

14 Werner Vogels is CTO at Amazon.com His first act? He banned multicast! Amazon was troubled by platform instability Vogels decreed: all communication via SOAP/TCP This was slower but Stability matters more than speed Sept 24, 2009 Cornell Dept of Computer Science Colloquium 14

15 Key to scalability is decoupling, loosest possible synchronization Any synchronized mechanism is a risk His approach: create a committee Anyone who wants to deploy a highly consistent mechanism needs committee approval. They don t meet very often Sept 24, 2009 Cornell Dept of Computer Science Colloquium 15

16 Consistency technologies just don t scale! Sept 11, 24, 2009 P2P Cornell 2009 Dept Seattle, of Computer Washington Science Colloquium 16

17 This is the common thread All three guys Really build massive data centers, that work And are opposed to consistency mechanisms Sept 24, 2009 Cornell Dept of Computer Science Colloquium 17

18 A consistent distributed system will often have many components, but users observe behavior indistinguishable from that of a single component reference system Reference Model Implementation Sept 24, 2009 Cornell Dept of Computer Science Colloquium 18

19 Transactions that update replicated data Atomic broadcast or other forms of reliable multicast protocols Distributed 2 phase locking mechanisms Sept 24, 2009 Cornell Dept of Computer Science Colloquium 19

20 A=3 B=7 B = B A A=A+1 Non replicated reference execution Synchronous execution Virtually synchronous execution Virtual synchrony is a consistency model: Synchronous runs: indistinguishable from non replicated object that saw the same updates (like Paxos) Virtually synchronous runs are indistinguishable from synchronous runs Sept 24, 2009 Cornell Dept of Computer Science Colloquium 20

21 Inconsistency causes bugs Clients would never be able to trust servers a free for all My rent check bounced? That can t be right! Weak or best effort consistency? Jason Fane Properties Sept 2009 Tommy Tenant Strong security guarantees demand consistency Would you trust a medical electronic health records system or a bank that used weak consistency for better scalability? Sept 24, 2009 Cornell Dept of Computer Science Colloquium 21

22 They think consistency mechanisms are capable of destabilizing the whole data center They typically run over UDP and IP multicast As a result, are fast in the laboratory, but lack flow control. If a node gets overwhelmed they just keep on sending, and things get worse and worse They also believe that these mechanisms tend to cause synchronization over sets of nodes, which can cause load oscillations Sept 24, 2009 Cornell Dept of Computer Science Colloquium 22

23 A blend of stories (ebay, Amazon, Yahoo): Pub sub message bus very popular. System scaled up. Rolled out a faster ethernet. Product uses IPMC to accelerate sending All goes well until one day, under heavy load, loss rates spike, triggering collapse Oscillation observed Sept 24, 2009 Cornell Dept of Computer Science Colloquium 23

24 Without IP multicast (IPMC)? Pub/sub struggles to disseminate data This inevitably limits speed, scalability But IPMC itself scales poorly! Routers and NICs become promiscuous if we use large numbers of multicast addresses [Tock] Then everyone gets every IPMC packet Sept 24, 2009 Cornell Dept of Computer Science Colloquium 24

25 Routers and NICs use Bloom Filters for quick forwarding/receive decisions Should I forward packet P on link L? Is this packet one I should accept? Such a filter is just an array of k bit vectors Hash message k times; test/set bit b k in bvec[k] Too many addresses cause the filters to fill up and high false positive rate ensues Sept 24, 2009 Cornell Dept of Computer Science Colloquium 25

26 If IPMC becomes promiscuous, receivers get lots of junk It shows up fast, mixed with valid data This swamps the receivers, which drop packets including good packets So loss rates skyrocket, TCP shuts down. Woe ensues! Sept 24, 2009 Cornell Dept of Computer Science Colloquium 26

27 1. System gets larger and more loaded 2. Generalized loss starts to occur 3. Causes a massive spike of NAK messages. Retransmissions just make things worse meltdown! 4. After 90 seconds, the pub sub product resets Sept 24, 2009 Cornell Dept of Computer Science Colloquium 27

28 Oscillations can also arise in other ways For example, a very slow server that grabs a lock Some systems even self synchronize But those problems can usually be solved Distinction: Those are software issues; better designs can fix IPMC is hardware that breaks if we just scale the size of our overall system up! Sept 24, 2009 Cornell Dept of Computer Science Colloquium 28

29 A consistent system fights to maintain some guarantee: a feedback loop Thus when the network collapses, the consistency property is often seen to be at fault Had we not cared about consistency, feedback instability would not have occurred So consistency is dangerous! Sept 24, 2009 Cornell Dept of Computer Science Colloquium 29

30 Replicate with TCP overlay? Data often stale Use IPMC? It melts down catastrophically My code was bug free. It worked in testing and ran for years in our data center Then they upgraded the data center and my old application explodes. Was that my fault??! Sept 24, 2009 Cornell Dept of Computer Science Colloquium 30

31 My group has been working on this topic for about two years now Quick summary: Fix IP multicast to make multicast a friendly and safe citizen again [Dr Multicast: HotNets 08, LADIS 08, submission to Eurosys 10] Gossip tunnels for UDP and other point to point traffic: provably safe and scalable [Gossip Objects: ICP2P 09; submitted to ACM TOCS] Sept 24, 2009 Cornell Dept of Computer Science Colloquium 31

32 Underway now: Collaboration with Cisco to put these and related tools onto Internet backbone routers Goal: transparent software fault tolerance, with a reliable multicast that runs at 100Gbit data rates Also will be able to host virtual services on the routers themselves Translated back to a data center: safe and guaranteed stable reliability mechanisms Sept 24, 2009 Cornell Dept of Computer Science Colloquium 32

33 CTOs fear disruptive network meltdowns Fix the network, then reimplement consistency protocols as scalable solutions within NTI Well I ll give you another chance making a story that even a skeptical CTO might consider Sept 24, 2009 Cornell Dept of Computer Science Colloquium 33

34 c c Today s embrace of inconsistency has given us scalable services we just can t trust. As cloud takes on medical care, banking inconsistency just isn t good enough NTI: cloud scale consistency without fear has papers (NTI is still work in progress) Sept 24, 2009 Cornell Dept of Computer Science Colloquium 34

CS5412: ANATOMY OF A CLOUD

CS5412: ANATOMY OF A CLOUD 1 CS5412: ANATOMY OF A CLOUD Lecture VIII Ken Birman How are cloud structured? 2 Clients talk to clouds using web browsers or the web services standards But this only gets us to the outer skin of the cloud

More information

CS5412: HOW MUCH ORDERING?

CS5412: HOW MUCH ORDERING? 1 CS5412: HOW MUCH ORDERING? Lecture XVI Ken Birman Ordering 2 The key to consistency turns has turned out to be delivery ordering (durability is a separate thing) Given replicas that are initially in

More information

CLOUD HOSTED COMPUTING FOR DEMANDING REAL-TIME SETTINGS KEN BIRMAN. Rao Professor of Computer Science Cornell University

CLOUD HOSTED COMPUTING FOR DEMANDING REAL-TIME SETTINGS KEN BIRMAN. Rao Professor of Computer Science Cornell University CLOUD HOSTED COMPUTING FOR DEMANDING REAL-TIME SETTINGS KEN BIRMAN Rao Professor of Computer Science Cornell University CAN THE CLOUD DO REAL-TIME? Internet of Things, Smart Grid / Buildings / Cars,...

More information

CS603: Distributed Systems

CS603: Distributed Systems CS603: Distributed Systems Lecture 1: Basic Communication Services Cristina Nita-Rotaru Lecture 1/ Spring 2006 1 Reference Material Textbooks Ken Birman: Reliable Distributed Systems Recommended reading

More information

DIVING IN: INSIDE THE DATA CENTER

DIVING IN: INSIDE THE DATA CENTER 1 DIVING IN: INSIDE THE DATA CENTER Anwar Alhenshiri Data centers 2 Once traffic reaches a data center it tunnels in First passes through a filter that blocks attacks Next, a router that directs it to

More information

Introduction to Windows Azure Cloud Computing Futures Group, Microsoft Research Roger Barga, Jared Jackson, Nelson Araujo, Dennis Gannon, Wei Lu, and

Introduction to Windows Azure Cloud Computing Futures Group, Microsoft Research Roger Barga, Jared Jackson, Nelson Araujo, Dennis Gannon, Wei Lu, and Introduction to Windows Azure Cloud Computing Futures Group, Microsoft Research Roger Barga, Jared Jackson, Nelson Araujo, Dennis Gannon, Wei Lu, and Jaliya Ekanayake Range in size from edge facilities

More information

CS5412: THE REALTIME CLOUD

CS5412: THE REALTIME CLOUD 1 CS5412: THE REALTIME CLOUD Lecture XXIV Ken Birman Can the Cloud Support Real-Time? 2 More and more real time applications are migrating into cloud environments Monitoring of traffic in various situations,

More information

CS5412: DIVING IN: INSIDE THE DATA CENTER

CS5412: DIVING IN: INSIDE THE DATA CENTER 1 CS5412: DIVING IN: INSIDE THE DATA CENTER Lecture V Ken Birman We ve seen one cloud service 2 Inside a cloud, Dynamo is an example of a service used to make sure that cloud-hosted applications can scale

More information

CS5412: DIVING IN: INSIDE THE DATA CENTER

CS5412: DIVING IN: INSIDE THE DATA CENTER 1 CS5412: DIVING IN: INSIDE THE DATA CENTER Lecture V Ken Birman Data centers 2 Once traffic reaches a data center it tunnels in First passes through a filter that blocks attacks Next, a router that directs

More information

Reliable Distributed System Approaches

Reliable Distributed System Approaches Reliable Distributed System Approaches Manuel Graber Seminar of Distributed Computing WS 03/04 The Papers The Process Group Approach to Reliable Distributed Computing K. Birman; Communications of the ACM,

More information

Solace JMS Broker Delivers Highest Throughput for Persistent and Non-Persistent Delivery

Solace JMS Broker Delivers Highest Throughput for Persistent and Non-Persistent Delivery Solace JMS Broker Delivers Highest Throughput for Persistent and Non-Persistent Delivery Java Message Service (JMS) is a standardized messaging interface that has become a pervasive part of the IT landscape

More information

CS October 2017

CS October 2017 Atomic Transactions Transaction An operation composed of a number of discrete steps. Distributed Systems 11. Distributed Commit Protocols All the steps must be completed for the transaction to be committed.

More information

CS5412: OTHER DATA CENTER SERVICES

CS5412: OTHER DATA CENTER SERVICES 1 CS5412: OTHER DATA CENTER SERVICES Lecture V Ken Birman Tier two and Inner Tiers 2 If tier one faces the user and constructs responses, what lives in tier two? Caching services are very common (many

More information

CLOUD COMPUTING Lecture 27: CS 2110 Spring 2013

CLOUD COMPUTING Lecture 27: CS 2110 Spring 2013 CLOUD COMPUTING Lecture 27: CS 2110 Spring 2013 Computing has evolved... 2 Fifteen years ago: desktop/laptop + clusters Then Web sites Social networking sites with photos, etc Cloud computing systems Cloud

More information

DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN. Chapter 1. Introduction

DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN. Chapter 1. Introduction DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 1 Introduction Modified by: Dr. Ramzi Saifan Definition of a Distributed System (1) A distributed

More information

CISC 7610 Lecture 2b The beginnings of NoSQL

CISC 7610 Lecture 2b The beginnings of NoSQL CISC 7610 Lecture 2b The beginnings of NoSQL Topics: Big Data Google s infrastructure Hadoop: open google infrastructure Scaling through sharding CAP theorem Amazon s Dynamo 5 V s of big data Everyone

More information

Cornell University, Hebrew University * (*) Others are also involved in some aspects of this project I ll mention them when their work arises

Cornell University, Hebrew University * (*) Others are also involved in some aspects of this project I ll mention them when their work arises Live Objects Krzys Ostrowski, Ken Birman, Danny Dolev Cornell University, Hebrew University * (*) Others are also involved in some aspects of this project I ll mention them when their work arises Live

More information

RELIABILITY & AVAILABILITY IN THE CLOUD

RELIABILITY & AVAILABILITY IN THE CLOUD RELIABILITY & AVAILABILITY IN THE CLOUD A TWILIO PERSPECTIVE twilio.com To the leaders and engineers at Twilio, the cloud represents the promise of reliable, scalable infrastructure at a price that directly

More information

CS5412: BIMODAL MULTICAST ASTROLABE

CS5412: BIMODAL MULTICAST ASTROLABE 1 CS5412: BIMODAL MULTICAST ASTROLABE Lecture XIX Ken Birman Gossip 201 2 Recall from early in the semester that gossip spreads in log(system size) time But is this actually fast? 1.0 % infected 0.0 Time

More information

Distributed Data Store

Distributed Data Store Distributed Data Store Large-Scale Distributed le system Q: What if we have too much data to store in a single machine? Q: How can we create one big filesystem over a cluster of machines, whose data is

More information

Indirect Communication

Indirect Communication Indirect Communication To do q Today q q Space and time (un)coupling Common techniques q Next time: Overlay networks xkdc Direct coupling communication With R-R, RPC, RMI Space coupled Sender knows the

More information

Infrastructures for Cloud Computing and Big Data M

Infrastructures for Cloud Computing and Big Data M University of Bologna Dipartimento di Informatica Scienza e Ingegneria (DISI) Engineering Bologna Campus Class of Infrastructures for Cloud Computing and Big Data M Cloud support and Global strategies

More information

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University CS 555: DISTRIBUTED SYSTEMS [REPLICATION & CONSISTENCY] Frequently asked questions from the previous class survey Shrideep Pallickara Computer Science Colorado State University L25.1 L25.2 Topics covered

More information

Data Intensive Scalable Computing

Data Intensive Scalable Computing Data Intensive Scalable Computing Randal E. Bryant Carnegie Mellon University http://www.cs.cmu.edu/~bryant Examples of Big Data Sources Wal-Mart 267 million items/day, sold at 6,000 stores HP built them

More information

Distributed Systems CS6421

Distributed Systems CS6421 Distributed Systems CS6421 Intro to Distributed Systems and the Cloud Prof. Tim Wood v I teach: Software Engineering, Operating Systems, Sr. Design I like: distributed systems, networks, building cool

More information

10. Replication. Motivation

10. Replication. Motivation 10. Replication Page 1 10. Replication Motivation Reliable and high-performance computation on a single instance of a data object is prone to failure. Replicate data to overcome single points of failure

More information

ebay Marketplace Architecture

ebay Marketplace Architecture ebay Marketplace Architecture Architectural Strategies, Patterns, and Forces Randy Shoup, ebay Distinguished Architect QCon SF 2007 November 9, 2007 What we re up against ebay manages Over 248,000,000

More information

Our topic. Concurrency Support. Why is this a problem? Why threads? Classic solution? Resulting architecture. Ken Birman

Our topic. Concurrency Support. Why is this a problem? Why threads? Classic solution? Resulting architecture. Ken Birman Our topic Concurrency Support Ken Birman To get high performance, distributed systems need to achieve a high level of concurrency As a practical matter: interleaving tasks, so that while one waits for

More information

Reliable Distributed Systems. Scalability

Reliable Distributed Systems. Scalability Reliable Distributed Systems Scalability Scalability Today we ll focus on how things scale Basically: look at a property that matters Make something bigger Like the network, the number of groups, the number

More information

ebay s Architectural Principles

ebay s Architectural Principles ebay s Architectural Principles Architectural Strategies, Patterns, and Forces for Scaling a Large ecommerce Site Randy Shoup ebay Distinguished Architect QCon London 2008 March 14, 2008 What we re up

More information

CS505: Distributed Systems

CS505: Distributed Systems Cristina Nita-Rotaru CS505: Distributed Systems Protocols. Slides prepared based on material by Prof. Ken Birman at Cornell University, available at http://www.cs.cornell.edu/ken/book/ Required reading

More information

Introduction to the Active Everywhere Database

Introduction to the Active Everywhere Database Introduction to the Active Everywhere Database INTRODUCTION For almost half a century, the relational database management system (RDBMS) has been the dominant model for database management. This more than

More information

GFS Overview. Design goals/priorities Design for big-data workloads Huge files, mostly appends, concurrency, huge bandwidth Design for failures

GFS Overview. Design goals/priorities Design for big-data workloads Huge files, mostly appends, concurrency, huge bandwidth Design for failures GFS Overview Design goals/priorities Design for big-data workloads Huge files, mostly appends, concurrency, huge bandwidth Design for failures Interface: non-posix New op: record appends (atomicity matters,

More information

MySQL Group Replication. Bogdan Kecman MySQL Principal Technical Engineer

MySQL Group Replication. Bogdan Kecman MySQL Principal Technical Engineer MySQL Group Replication Bogdan Kecman MySQL Principal Technical Engineer Bogdan.Kecman@oracle.com 1 Safe Harbor Statement The following is intended to outline our general product direction. It is intended

More information

QOS Quality Of Service

QOS Quality Of Service QOS Quality Of Service Michael Schär Seminar in Distributed Computing Outline Definition QOS Attempts and problems in the past (2 Papers) A possible solution for the future: Overlay networks (2 Papers)

More information

Advanced option settings on the command line. Set the interface and ports for the OpenVPN daemons

Advanced option settings on the command line. Set the interface and ports for the OpenVPN daemons Advanced option settings on the command line docs.openvpn.net/command-line/advanced-option-settings-on-the-command-line Set the interface and ports for the OpenVPN daemons In the Admin UI under Server

More information

Data Consistency Now and Then

Data Consistency Now and Then Data Consistency Now and Then Todd Schmitter JPMorgan Chase June 27, 2017 Room #208 Data consistency in real life Social media Facebook post: January 22, 2017, at a political rally Comments displayed are

More information

Cassandra, MongoDB, and HBase. Cassandra, MongoDB, and HBase. I have chosen these three due to their recent

Cassandra, MongoDB, and HBase. Cassandra, MongoDB, and HBase. I have chosen these three due to their recent Tanton Jeppson CS 401R Lab 3 Cassandra, MongoDB, and HBase Introduction For my report I have chosen to take a deeper look at 3 NoSQL database systems: Cassandra, MongoDB, and HBase. I have chosen these

More information

Best Practices for Scaling Websites Lessons from ebay

Best Practices for Scaling Websites Lessons from ebay Best Practices for Scaling Websites Lessons from ebay Randy Shoup ebay Distinguished Architect QCon Asia 2009 Challenges at Internet Scale ebay manages 86.3 million active users worldwide 120 million items

More information

TCP Strategies. Keepalive Timer. implementations do not have it as it is occasionally regarded as controversial. between source and destination

TCP Strategies. Keepalive Timer. implementations do not have it as it is occasionally regarded as controversial. between source and destination Keepalive Timer! Yet another timer in TCP is the keepalive! This one is not required, and some implementations do not have it as it is occasionally regarded as controversial! When a TCP connection is idle

More information

CS /15/16. Paul Krzyzanowski 1. Question 1. Distributed Systems 2016 Exam 2 Review. Question 3. Question 2. Question 5.

CS /15/16. Paul Krzyzanowski 1. Question 1. Distributed Systems 2016 Exam 2 Review. Question 3. Question 2. Question 5. Question 1 What makes a message unstable? How does an unstable message become stable? Distributed Systems 2016 Exam 2 Review Paul Krzyzanowski Rutgers University Fall 2016 In virtual sychrony, a message

More information

Data Centers. Tom Anderson

Data Centers. Tom Anderson Data Centers Tom Anderson Transport Clarification RPC messages can be arbitrary size Ex: ok to send a tree or a hash table Can require more than one packet sent/received We assume messages can be dropped,

More information

CS5412: THE BASE METHODOLOGY VERSUS THE ACID MODEL

CS5412: THE BASE METHODOLOGY VERSUS THE ACID MODEL 1 CS5412: THE BASE METHODOLOGY VERSUS THE ACID MODEL Lecture VIII Ken Birman Today s lecture will be a bit short 2 We have a guest with us today: Kate Jenkins from Akamai The world s top content hosting

More information

Cloud Computing. DB Special Topics Lecture (10/5/2012) Kyle Hale Maciej Swiech

Cloud Computing. DB Special Topics Lecture (10/5/2012) Kyle Hale Maciej Swiech Cloud Computing DB Special Topics Lecture (10/5/2012) Kyle Hale Maciej Swiech Managing servers isn t for everyone What are some prohibitive issues? (we touched on these last time) Cost (initial/operational)

More information

Scaling Without Sharding. Baron Schwartz Percona Inc Surge 2010

Scaling Without Sharding. Baron Schwartz Percona Inc Surge 2010 Scaling Without Sharding Baron Schwartz Percona Inc Surge 2010 Web Scale!!!! http://www.xtranormal.com/watch/6995033/ A Sharding Thought Experiment 64 shards per proxy [1] 1 TB of data storage per node

More information

The Internet. CS514: Intermediate Course in Operating Systems. Massive scale. Constant flux. Typical examples. A tale of woe. Sample tale of woe

The Internet. CS514: Intermediate Course in Operating Systems. Massive scale. Constant flux. Typical examples. A tale of woe. Sample tale of woe The Internet CS54: Intermediate Course in Operating Systems Massive scale. Constant flux Professor Ken Birman Vivek Vishnumurthy: TA Source: Burch and Cheswick Demand for more autonomic, intelligent behavior

More information

CPSC 426/526. Cloud Computing. Ennan Zhai. Computer Science Department Yale University

CPSC 426/526. Cloud Computing. Ennan Zhai. Computer Science Department Yale University CPSC 426/526 Cloud Computing Ennan Zhai Computer Science Department Yale University Recall: Lec-7 In the lec-7, I talked about: - P2P vs Enterprise control - Firewall - NATs - Software defined network

More information

Optimization of Windows 8, 8.1 and 10 for better performance

Optimization of Windows 8, 8.1 and 10 for better performance Windows 10 Optimization of Windows 8, 8.1 and 10 for better performance By default, Windows 10 turns your PC into a server for distributing updates to other machines. Here's how to make it stop. One of

More information

17655: Discussion: The New z/os Interface for the Touch Generation

17655: Discussion: The New z/os Interface for the Touch Generation 17655: Discussion: The New z/os Interface for the Touch Generation Thursday, August 13, 2015: 12:30 PM-1:30 PM Europe 2 (Walt Disney World Dolphin ) Speaker: Geoff Smith(IBM Corporation) 1 Trademarks The

More information

Distributed Systems Intro and Course Overview

Distributed Systems Intro and Course Overview Distributed Systems Intro and Course Overview COS 418: Distributed Systems Lecture 1 Wyatt Lloyd Distributed Systems, What? 1) Multiple computers 2) Connected by a network 3) Doing something together Distributed

More information

Performance Benchmarking an Enterprise Message Bus. Anurag Sharma Pramod Sharma Sumant Vashisth

Performance Benchmarking an Enterprise Message Bus. Anurag Sharma Pramod Sharma Sumant Vashisth Performance Benchmarking an Enterprise Message Bus Anurag Sharma Pramod Sharma Sumant Vashisth About the Authors Sumant Vashisth is Director of Engineering, Security Management Business Unit at McAfee.

More information

Solace Message Routers and Cisco Ethernet Switches: Unified Infrastructure for Financial Services Middleware

Solace Message Routers and Cisco Ethernet Switches: Unified Infrastructure for Financial Services Middleware Solace Message Routers and Cisco Ethernet Switches: Unified Infrastructure for Financial Services Middleware What You Will Learn The goal of zero latency in financial services has caused the creation of

More information

Consensus a classic problem. Consensus, impossibility results and Paxos. Distributed Consensus. Asynchronous networks.

Consensus a classic problem. Consensus, impossibility results and Paxos. Distributed Consensus. Asynchronous networks. Consensus, impossibility results and Paxos Ken Birman Consensus a classic problem Consensus abstraction underlies many distributed systems and protocols N processes They start execution with inputs {0,1}

More information

Map-Reduce. Marco Mura 2010 March, 31th

Map-Reduce. Marco Mura 2010 March, 31th Map-Reduce Marco Mura (mura@di.unipi.it) 2010 March, 31th This paper is a note from the 2009-2010 course Strumenti di programmazione per sistemi paralleli e distribuiti and it s based by the lessons of

More information

Overview. TCP & router queuing Computer Networking. TCP details. Workloads. TCP Performance. TCP Performance. Lecture 10 TCP & Routers

Overview. TCP & router queuing Computer Networking. TCP details. Workloads. TCP Performance. TCP Performance. Lecture 10 TCP & Routers Overview 15-441 Computer Networking TCP & router queuing Lecture 10 TCP & Routers TCP details Workloads Lecture 10: 09-30-2002 2 TCP Performance TCP Performance Can TCP saturate a link? Congestion control

More information

CS514: Intermediate Course in Computer Systems

CS514: Intermediate Course in Computer Systems : Intermediate Course in Computer Systems Lecture 15: February 24, 2003 Overcoming network outages in applications that need high availability Network outages We ve seen a number of security mechanisms

More information

CS5412 CLOUD COMPUTING: PRELIM EXAM Open book, open notes. 90 minutes plus 45 minutes grace period, hence 2h 15m maximum working time.

CS5412 CLOUD COMPUTING: PRELIM EXAM Open book, open notes. 90 minutes plus 45 minutes grace period, hence 2h 15m maximum working time. CS5412 CLOUD COMPUTING: PRELIM EXAM Open book, open notes. 90 minutes plus 45 minutes grace period, hence 2h 15m maximum working time. SOLUTION SET In class we often used smart highway (SH) systems as

More information

Ananta: Cloud Scale Load Balancing. Nitish Paradkar, Zaina Hamid. EECS 589 Paper Review

Ananta: Cloud Scale Load Balancing. Nitish Paradkar, Zaina Hamid. EECS 589 Paper Review Ananta: Cloud Scale Load Balancing Nitish Paradkar, Zaina Hamid EECS 589 Paper Review 1 Full Reference Patel, P. et al., " Ananta: Cloud Scale Load Balancing," Proc. of ACM SIGCOMM '13, 43(4):207-218,

More information

Keeping Your Network at Peak Performance as You Virtualize the Data Center

Keeping Your Network at Peak Performance as You Virtualize the Data Center Keeping Your Network at Peak Performance as You Virtualize the Data Center Laura Knapp WW Business Consultant Laurak@aesclever.com Applied Expert Systems, Inc. 2011 1 Background The Physical Network Inside

More information

Chapter 8 Fault Tolerance

Chapter 8 Fault Tolerance DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 8 Fault Tolerance 1 Fault Tolerance Basic Concepts Being fault tolerant is strongly related to

More information

COMMUNICATION IN DISTRIBUTED SYSTEMS

COMMUNICATION IN DISTRIBUTED SYSTEMS Distributed Systems Fö 3-1 Distributed Systems Fö 3-2 COMMUNICATION IN DISTRIBUTED SYSTEMS Communication Models and their Layered Implementation 1. Communication System: Layered Implementation 2. Network

More information

CS514: Intermediate Course in Computer Systems. Lecture 38: April 23, 2003 Nested transactions and other weird implementation issues

CS514: Intermediate Course in Computer Systems. Lecture 38: April 23, 2003 Nested transactions and other weird implementation issues : Intermediate Course in Computer Systems Lecture 38: April 23, 2003 Nested transactions and other weird implementation issues Transactions Continued We saw Transactions on a single server 2 phase locking

More information

NFON Whitepaper: Integrating Microsoft Lync (Skype for Business) with Telephony

NFON Whitepaper: Integrating Microsoft Lync (Skype for Business) with Telephony . Myths and facts - for enterprise owners, managers and buyers. Document Version: V1.0 Date: November 2015 NFON UK Ltd, 140 Wales Farm Road, London, W3 6UG, UK Page 1 of 7 1. INTRODUCTION While the business

More information

GFS: The Google File System

GFS: The Google File System GFS: The Google File System Brad Karp UCL Computer Science CS GZ03 / M030 24 th October 2014 Motivating Application: Google Crawl the whole web Store it all on one big disk Process users searches on one

More information

Distributed Systems. Lec 10: Distributed File Systems GFS. Slide acks: Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung

Distributed Systems. Lec 10: Distributed File Systems GFS. Slide acks: Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Distributed Systems Lec 10: Distributed File Systems GFS Slide acks: Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung 1 Distributed File Systems NFS AFS GFS Some themes in these classes: Workload-oriented

More information

Example File Systems Using Replication CS 188 Distributed Systems February 10, 2015

Example File Systems Using Replication CS 188 Distributed Systems February 10, 2015 Example File Systems Using Replication CS 188 Distributed Systems February 10, 2015 Page 1 Example Replicated File Systems NFS Coda Ficus Page 2 NFS Originally NFS did not have any replication capability

More information

Third Wave. How to Incorporate Legacy Devices to the. of Internet Evolution

Third Wave. How to Incorporate Legacy Devices to the. of Internet Evolution How to Incorporate Legacy Devices to the Third Wave of Internet Evolution John Rinaldi Business Development Manager Real Time Automation N26 W23315 Paul Rd Pewaukee, WI 53072 (p) 262-436-9299 414-460-6556

More information

CMSC 433 Programming Language Technologies and Paradigms. Spring 2013

CMSC 433 Programming Language Technologies and Paradigms. Spring 2013 1 CMSC 433 Programming Language Technologies and Paradigms Spring 2013 Distributed Computing Concurrency and the Shared State This semester we have been looking at concurrent programming and how it is

More information

Fighting Phishing I: Get phish or die tryin.

Fighting Phishing I: Get phish or die tryin. Fighting Phishing I: Get phish or die tryin. Micah Nelson and Max Hyppolite bit.ly/nercomp_sap918 Please, don t forget to submit your feedback for today s session at the above URL. If you use social media

More information

CS5412: NETWORKS AND THE CLOUD

CS5412: NETWORKS AND THE CLOUD 1 CS5412: NETWORKS AND THE CLOUD Lecture III Ken Birman The Internet and the Cloud 2 Cloud computing is transforming the Internet! Mix of traffic has changed dramatically Demand for networking of all kinds

More information

Introduction to Distributed * Systems

Introduction to Distributed * Systems Introduction to Distributed * Systems Outline about the course relationship to other courses the challenges of distributed systems distributed services *ility for distributed services about the course

More information

AWS Solution Architecture Patterns

AWS Solution Architecture Patterns AWS Solution Architecture Patterns Objectives Key objectives of this chapter AWS reference architecture catalog Overview of some AWS solution architecture patterns 1.1 AWS Architecture Center The AWS Architecture

More information

Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions

Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions Chapter 1: Solving Integration Problems Using Patterns 2 Introduction The Need for Integration Integration Challenges

More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff and Shun Tak Leung Google* Shivesh Kumar Sharma fl4164@wayne.edu Fall 2015 004395771 Overview Google file system is a scalable distributed file system

More information

CLUSTERING HIVEMQ. Building highly available, horizontally scalable MQTT Broker Clusters

CLUSTERING HIVEMQ. Building highly available, horizontally scalable MQTT Broker Clusters CLUSTERING HIVEMQ Building highly available, horizontally scalable MQTT Broker Clusters 12/2016 About this document MQTT is based on a publish/subscribe architecture that decouples MQTT clients and uses

More information

TIME FOR A NEW NETWORK REINVENTING THE NETWORK. Abner Germanow March 15, 2011

TIME FOR A NEW NETWORK REINVENTING THE NETWORK. Abner Germanow March 15, 2011 TIME FOR A NEW NETWORK REINVENTING THE NETWORK Abner Germanow March 15, 2011 IT SPENDING The Network 3 Copyright 2010 Juniper Networks, Inc. www.juniper.net IMPACT OF THE NETWORK The Network Not Connected

More information

GOSSIP ARCHITECTURE. Gary Berg css434

GOSSIP ARCHITECTURE. Gary Berg css434 GOSSIP ARCHITECTURE Gary Berg css434 WE WILL SEE Architecture overview Consistency models How it works Availability and Recovery Performance and Scalability PRELIMINARIES Why replication? Fault tolerance

More information

ApsaraDB for Redis. Product Introduction

ApsaraDB for Redis. Product Introduction ApsaraDB for Redis is compatible with open-source Redis protocol standards and provides persistent memory database services. Based on its high-reliability dual-machine hot standby architecture and seamlessly

More information

Like It Or Not Web Applications and Mashups Will Be Hot

Like It Or Not Web Applications and Mashups Will Be Hot Like It Or Not Web Applications and Mashups Will Be Hot Tommi Mikkonen Tampere University of Technology tommi.mikkonen@tut.fi Antero Taivalsaari Sun Microsystems Laboratories antero.taivalsaari@sun.com

More information

Distributed Systems. Characteristics of Distributed Systems. Lecture Notes 1 Basic Concepts. Operating Systems. Anand Tripathi

Distributed Systems. Characteristics of Distributed Systems. Lecture Notes 1 Basic Concepts. Operating Systems. Anand Tripathi 1 Lecture Notes 1 Basic Concepts Anand Tripathi CSci 8980 Operating Systems Anand Tripathi CSci 8980 1 Distributed Systems A set of computers (hosts or nodes) connected through a communication network.

More information

Distributed Systems. Characteristics of Distributed Systems. Characteristics of Distributed Systems. Goals in Distributed System Designs

Distributed Systems. Characteristics of Distributed Systems. Characteristics of Distributed Systems. Goals in Distributed System Designs 1 Anand Tripathi CSci 8980 Operating Systems Lecture Notes 1 Basic Concepts Distributed Systems A set of computers (hosts or nodes) connected through a communication network. Nodes may have different speeds

More information

PNUTS and Weighted Voting. Vijay Chidambaram CS 380 D (Feb 8)

PNUTS and Weighted Voting. Vijay Chidambaram CS 380 D (Feb 8) PNUTS and Weighted Voting Vijay Chidambaram CS 380 D (Feb 8) PNUTS Distributed database built by Yahoo Paper describes a production system Goals: Scalability Low latency, predictable latency Must handle

More information

Indirect Communication

Indirect Communication Indirect Communication Vladimir Vlassov and Johan Montelius KTH ROYAL INSTITUTE OF TECHNOLOGY Time and Space In direct communication sender and receivers exist in the same time and know of each other.

More information

ICANN and Technical Work: Really? Yes! Steve Crocker DNS Symposium, Madrid, 13 May 2017

ICANN and Technical Work: Really? Yes! Steve Crocker DNS Symposium, Madrid, 13 May 2017 ICANN and Technical Work: Really? Yes! Steve Crocker DNS Symposium, Madrid, 13 May 2017 Welcome, everyone. I appreciate the invitation to say a few words here. This is an important meeting and I think

More information

SEARCH ENGINE OPTIMIZATION Noun The process of maximizing the number of visitors to a particular website by ensuring that the site appears high on the list of results returned by a search engine such as

More information

Programming Models MapReduce

Programming Models MapReduce Programming Models MapReduce Majd Sakr, Garth Gibson, Greg Ganger, Raja Sambasivan 15-719/18-847b Advanced Cloud Computing Fall 2013 Sep 23, 2013 1 MapReduce In a Nutshell MapReduce incorporates two phases

More information

Eventual Consistency 1

Eventual Consistency 1 Eventual Consistency 1 Readings Werner Vogels ACM Queue paper http://queue.acm.org/detail.cfm?id=1466448 Dynamo paper http://www.allthingsdistributed.com/files/ amazon-dynamo-sosp2007.pdf Apache Cassandra

More information

Current Topics in OS Research. So, what s hot?

Current Topics in OS Research. So, what s hot? Current Topics in OS Research COMP7840 OSDI Current OS Research 0 So, what s hot? Operating systems have been around for a long time in many forms for different types of devices It is normally general

More information

More on LANS. LAN Wiring, Interface

More on LANS. LAN Wiring, Interface More on LANS Chapters 10-11 LAN Wiring, Interface Mostly covered this material already NIC = Network Interface Card Separate processor, buffers incoming/outgoing data CPU might not be able to keep up network

More information

MODELS OF DISTRIBUTED SYSTEMS

MODELS OF DISTRIBUTED SYSTEMS Distributed Systems Fö 2/3-1 Distributed Systems Fö 2/3-2 MODELS OF DISTRIBUTED SYSTEMS Basic Elements 1. Architectural Models 2. Interaction Models Resources in a distributed system are shared between

More information

Your Data Demands More NETAPP ENABLES YOU TO LEVERAGE YOUR DATA & COMPUTE FROM ANYWHERE

Your Data Demands More NETAPP ENABLES YOU TO LEVERAGE YOUR DATA & COMPUTE FROM ANYWHERE Your Data Demands More NETAPP ENABLES YOU TO LEVERAGE YOUR DATA & COMPUTE FROM ANYWHERE IN ITS EARLY DAYS, NetApp s (www.netapp.com) primary goal was to build a market for network-attached storage and

More information

Cloud Computing CS

Cloud Computing CS Cloud Computing CS 15-319 Programming Models- Part III Lecture 6, Feb 1, 2012 Majd F. Sakr and Mohammad Hammoud 1 Today Last session Programming Models- Part II Today s session Programming Models Part

More information

ICANN Start, Episode 1: Redirection and Wildcarding. Welcome to ICANN Start. This is the show about one issue, five questions:

ICANN Start, Episode 1: Redirection and Wildcarding. Welcome to ICANN Start. This is the show about one issue, five questions: Recorded in October, 2009 [Music Intro] ICANN Start, Episode 1: Redirection and Wildcarding Welcome to ICANN Start. This is the show about one issue, five questions: What is it? Why does it matter? Who

More information

Primary-Backup Replication

Primary-Backup Replication Primary-Backup Replication CS 240: Computing Systems and Concurrency Lecture 7 Marco Canini Credits: Michael Freedman and Kyle Jamieson developed much of the original material. Simplified Fault Tolerance

More information

Failure Tolerance. Distributed Systems Santa Clara University

Failure Tolerance. Distributed Systems Santa Clara University Failure Tolerance Distributed Systems Santa Clara University Distributed Checkpointing Distributed Checkpointing Capture the global state of a distributed system Chandy and Lamport: Distributed snapshot

More information

SCALABLE CONSISTENCY AND TRANSACTION MODELS

SCALABLE CONSISTENCY AND TRANSACTION MODELS Data Management in the Cloud SCALABLE CONSISTENCY AND TRANSACTION MODELS 69 Brewer s Conjecture Three properties that are desirable and expected from realworld shared-data systems C: data consistency A:

More information

Upgrade Your MuleESB with Solace s Messaging Infrastructure

Upgrade Your MuleESB with Solace s Messaging Infrastructure The era of ubiquitous connectivity is upon us. The amount of data most modern enterprises must collect, process and distribute is exploding as a result of real-time process flows, big data, ubiquitous

More information

Azure Cloud Architecture

Azure Cloud Architecture Azure Cloud Architecture Training Schedule 2015 May 18-20 Belgium (TBD) Overview This course is a deep dive in every architecture aspect of the Azure Platform-as-a-Service components. It delivers the needed

More information

Managing Data at Scale: Microservices and Events. Randy linkedin.com/in/randyshoup

Managing Data at Scale: Microservices and Events. Randy linkedin.com/in/randyshoup Managing Data at Scale: Microservices and Events Randy Shoup @randyshoup linkedin.com/in/randyshoup Background VP Engineering at Stitch Fix o Combining Art and Science to revolutionize apparel retail Consulting

More information