Ken Birman. Cornell Dept of Computer Science Colloquium

Size: px

Start display at page:

Download "Ken Birman. Cornell Dept of Computer Science Colloquium"

Hubert Gallagher
5 years ago
Views:

1 Ken Birman Sept 24, 2009 Cornell Dept of Computer Science Colloquium 1

2 My career has been focused on multicast Mostly, in support of service replication and this was important in the old world Client server computing, dominated by databases ACID consistency guarantees were the key The virtual synchrony model is a form of ACID property But a new world has displaced the old one Sept 24, 2009 Cornell Dept of Computer Science Colloquium 2

3 Sept 24, 2009 Cornell Dept of Computer Science Colloquium 3

Web 2.0 Mashups Simple ways to create and share collaboration and social network applications [Try it! http://liveobjects.cs.cornell.

4 Web 2.0 Mashups Simple ways to create and share collaboration and social network applications [Try it! Examples: Live Objects, Google Wave, Javascript/AJAX, Silverlight, Java Fx, Adobe FLEX and AIR, etc. Sept 24, 2009 Cornell Dept of Computer Science Colloquium 4

5 Cloud computing platforms have a characteristic style Client system is a browser or an application pretending to be a browser (web services). If a request times out the client will retry it Cloud platform has a massive number of servers facing the clients. These are stateless and handle client requests with a focus on speed. If one fails, there is no warning. The cluster manager just restarts it, perhaps elsewhere Behind the servers is a row of stateful (usually database) servers. These operate off some form of work queues, usually implemented as enterprise service buses (ESBs). Think publish subscribe with logging. They push updates to the client facing systems. Finally, there is a big pool of machines to run backend applications. This looks like a loosely coupled collection of Linux machines sharing a file system and a lock manager, and often programmed mostly with Hadoop (MapReduce). Sept 24, 2009 Cornell Dept of Computer Science Colloquium 5

6 Client systems use web technologies , file storage, IM, search Web services Databases, files with data used by servers ESB glues it together Google/IBM/Amazon/Yahoo! host the services Embarrassingly parallel backend tasks prepare the databases and files

Range in size from edge facilities to megascale.

small size center (1K servers) and a larger, 50K server

Technology Cost in smallsized Data Center Cost in Large

$13 per Mbps/ month 7.1 Storage Administratio n $2.

40 per GB/ month >1000 Servers/ Administrator 5.7 7.

7 Range in size from edge facilities to megascale. Incredible economies of scale Approximate costs for a small size center (1K servers) and a larger, 50K server center. Technology Cost in smallsized Data Center Cost in Large Data Center Cloud Advantage Network $95 per Mbps/ month $13 per Mbps/ month 7.1 Storage Administratio n $2.20 per GB/ month ~140 servers/ Administrator $0.40 per GB/ month >1000 Servers/ Administrator Each data center is 11.5 times the size of a football field Slides 4 17 provided by Roger Barga, Head of Cloud Computing, Microsoft

8 Old world: we replicated servers for speed and availability, but maintained consistency New world: scalability matters most of all Lots of replication, but weak consistency Sept 24, 2009 Cornell Dept of Computer Science Colloquium 8

9 Why can t we have both? Scalability Consistency too Sept 24, 2009 Cornell Dept of Computer Science Colloquium 9

10 Sept 24, 2009 Cornell Dept of Computer Science Colloquium 10

11 Why do reliability/consistency mechanisms scale poorly? Focus on replication supported by multicast Can we do something about it? For example, can we fix the whole mess? How would a truly scalable platform look? Sept 24, 2009 Cornell Dept of Computer Science Colloquium 11

12 Sept 24, 2009 Cornell Dept of Computer Science Colloquium 12

13 As described by Randy Shoup at LADIS 2008 Thou shalt 1. Partition Everything 2. Use Asynchrony Everywhere 3. Automate Everything 4. Remember: Everything Fails 5. Embrace Inconsistency Sept 24, 2009 Cornell Dept of Computer Science Colloquium 13

14 Werner Vogels is CTO at Amazon.com His first act? He banned multicast! Amazon was troubled by platform instability Vogels decreed: all communication via SOAP/TCP This was slower but Stability matters more than speed Sept 24, 2009 Cornell Dept of Computer Science Colloquium 14

Key to scalability is decoupling, loosest possible synchronization Any synchronized mechanism is a risk His approach: create a committee Anyone who wants

15 Key to scalability is decoupling, loosest possible synchronization Any synchronized mechanism is a risk His approach: create a committee Anyone who wants to deploy a highly consistent mechanism needs committee approval. They don t meet very often Sept 24, 2009 Cornell Dept of Computer Science Colloquium 15

16 Consistency technologies just don t scale! Sept 11, 24, 2009 P2P Cornell 2009 Dept Seattle, of Computer Washington Science Colloquium 16

17 This is the common thread All three guys Really build massive data centers, that work And are opposed to consistency mechanisms Sept 24, 2009 Cornell Dept of Computer Science Colloquium 17

18 A consistent distributed system will often have many components, but users observe behavior indistinguishable from that of a single component reference system Reference Model Implementation Sept 24, 2009 Cornell Dept of Computer Science Colloquium 18

19 Transactions that update replicated data Atomic broadcast or other forms of reliable multicast protocols Distributed 2 phase locking mechanisms Sept 24, 2009 Cornell Dept of Computer Science Colloquium 19

A=3 B=7 B = B A A=A+1 Non replicated reference execution

synchrony is a consistency model: Synchronous runs:

updates (like Paxos) Virtually synchronous runs are

20 A=3 B=7 B = B A A=A+1 Non replicated reference execution Synchronous execution Virtually synchronous execution Virtual synchrony is a consistency model: Synchronous runs: indistinguishable from non replicated object that saw the same updates (like Paxos) Virtually synchronous runs are indistinguishable from synchronous runs Sept 24, 2009 Cornell Dept of Computer Science Colloquium 20

00 Sept 2009 Tommy Tenant Strong security guarantees demand consistency Would you trust a medical electronic

21 Inconsistency causes bugs Clients would never be able to trust servers a free for all My rent check bounced? That can t be right! Weak or best effort consistency? Jason Fane Properties Sept 2009 Tommy Tenant Strong security guarantees demand consistency Would you trust a medical electronic health records system or a bank that used weak consistency for better scalability? Sept 24, 2009 Cornell Dept of Computer Science Colloquium 21

22 They think consistency mechanisms are capable of destabilizing the whole data center They typically run over UDP and IP multicast As a result, are fast in the laboratory, but lack flow control. If a node gets overwhelmed they just keep on sending, and things get worse and worse They also believe that these mechanisms tend to cause synchronization over sets of nodes, which can cause load oscillations Sept 24, 2009 Cornell Dept of Computer Science Colloquium 22

23 A blend of stories (ebay, Amazon, Yahoo): Pub sub message bus very popular. System scaled up. Rolled out a faster ethernet. Product uses IPMC to accelerate sending All goes well until one day, under heavy load, loss rates spike, triggering collapse Oscillation observed Sept 24, 2009 Cornell Dept of Computer Science Colloquium 23

24 Without IP multicast (IPMC)? Pub/sub struggles to disseminate data This inevitably limits speed, scalability But IPMC itself scales poorly! Routers and NICs become promiscuous if we use large numbers of multicast addresses [Tock] Then everyone gets every IPMC packet Sept 24, 2009 Cornell Dept of Computer Science Colloquium 24

25 Routers and NICs use Bloom Filters for quick forwarding/receive decisions Should I forward packet P on link L? Is this packet one I should accept? Such a filter is just an array of k bit vectors Hash message k times; test/set bit b k in bvec[k] Too many addresses cause the filters to fill up and high false positive rate ensues Sept 24, 2009 Cornell Dept of Computer Science Colloquium 25

26 If IPMC becomes promiscuous, receivers get lots of junk It shows up fast, mixed with valid data This swamps the receivers, which drop packets including good packets So loss rates skyrocket, TCP shuts down. Woe ensues! Sept 24, 2009 Cornell Dept of Computer Science Colloquium 26

27 1. System gets larger and more loaded 2. Generalized loss starts to occur 3. Causes a massive spike of NAK messages. Retransmissions just make things worse meltdown! 4. After 90 seconds, the pub sub product resets Sept 24, 2009 Cornell Dept of Computer Science Colloquium 27

28 Oscillations can also arise in other ways For example, a very slow server that grabs a lock Some systems even self synchronize But those problems can usually be solved Distinction: Those are software issues; better designs can fix IPMC is hardware that breaks if we just scale the size of our overall system up! Sept 24, 2009 Cornell Dept of Computer Science Colloquium 28

29 A consistent system fights to maintain some guarantee: a feedback loop Thus when the network collapses, the consistency property is often seen to be at fault Had we not cared about consistency, feedback instability would not have occurred So consistency is dangerous! Sept 24, 2009 Cornell Dept of Computer Science Colloquium 29

center Then they upgraded the data center and my old

30 Replicate with TCP overlay? Data often stale Use IPMC? It melts down catastrophically My code was bug free. It worked in testing and ran for years in our data center Then they upgraded the data center and my old application explodes. Was that my fault??! Sept 24, 2009 Cornell Dept of Computer Science Colloquium 30

31 My group has been working on this topic for about two years now Quick summary: Fix IP multicast to make multicast a friendly and safe citizen again [Dr Multicast: HotNets 08, LADIS 08, submission to Eurosys 10] Gossip tunnels for UDP and other point to point traffic: provably safe and scalable [Gossip Objects: ICP2P 09; submitted to ACM TOCS] Sept 24, 2009 Cornell Dept of Computer Science Colloquium 31

32 Underway now: Collaboration with Cisco to put these and related tools onto Internet backbone routers Goal: transparent software fault tolerance, with a reliable multicast that runs at 100Gbit data rates Also will be able to host virtual services on the routers themselves Translated back to a data center: safe and guaranteed stable reliability mechanisms Sept 24, 2009 Cornell Dept of Computer Science Colloquium 32

CTOs fear disruptive network meltdowns Fix the

story that even a skeptical CTO might consider

33 CTOs fear disruptive network meltdowns Fix the network, then reimplement consistency protocols as scalable solutions within NTI Well I ll give you another chance making a story that even a skeptical CTO might consider Sept 24, 2009 Cornell Dept of Computer Science Colloquium 33

34 c c Today s embrace of inconsistency has given us scalable services we just can t trust. As cloud takes on medical care, banking inconsistency just isn t good enough NTI: cloud scale consistency without fear has papers (NTI is still work in progress) Sept 24, 2009 Cornell Dept of Computer Science Colloquium 34

CS5412: ANATOMY OF A CLOUD

1 CS5412: ANATOMY OF A CLOUD Lecture VIII Ken Birman How are cloud structured? 2 Clients talk to clouds using web browsers or the web services standards But this only gets us to the outer skin of the cloud