Architecture of Systems for Processing Massive Amounts of Data


Milan Vojnović, Microsoft Research, April 2011

The Goals of this Lecture
- Learn about the underlying principles of system design for processing massive amounts of data
- Learn about the state-of-the-art systems used in the production and commercial systems of major Internet online service providers (ex. Amazon, Google and Microsoft)
- Learn about some alternative system designs proposed in research papers

Typical System Characteristics
- Distributed system built from inexpensive commodity components
- Shared-nothing model: machines with their own CPU, memory and hard disks, interconnected with a network
- Failures of machines and network components are common

Application Requirements
- Support for storing and processing of large files (GB, TB, PB quite common)
- Efficient processing that uses large streaming reads and writes: operations on contiguous regions of a file
- Support for structured data (ex. tables, incremental processing)
- Parallel processing: parallel computation complexities hidden from the programmer; accommodate declarative and imperative programming
- Quality of service requirements (ex. fast processing speed, high availability)

Contents
- Network Architecture
- File System
- Job Scheduling
- Parallel Computing: MapReduce, Dryad, SCOPE and DryadLINQ
- Structured Data: BigTable, Amazon's Dynamo, Percolator

Network Architecture
- Typically hierarchical organization: either two- or three-level trees of switches or routers
- Core tier (ex. 10 GigE ports)
- Aggregation tier (ex. GigE ports)
- Computers clustered in racks

Oversubscription
- Switches allow all directly connected hosts to communicate with one another at the full speed of their network interface
- Oversubscription: the ratio of the worst-case achievable aggregate bandwidth among the end hosts to the total bisection bandwidth of a particular network
- Pros: lowers the cost
- Cons: complicates protocol design, since the design must be conscious of network bandwidth asymmetries

Oversubscription: Examples
- 1:1 = all hosts may potentially communicate with arbitrary other hosts at the full bandwidth of their network interface
- 5:1 = 20% of available host bandwidth is available for some connection patterns
- Typical designs: 2.5:1 (400 Mbps) to 8:1 (125 Mbps)

Alternative: Fat-Tree Topology
- Built using many small commodity switches: lower cost
- Fat-tree topology: a k-ary tree
- k pods, each containing two layers (upper and lower) of k/2 aggregation switches
- (k/2) x (k/2) k-port core switches
- Each switch in the lower aggregation layer is directly connected to k/2 hosts; its other k/2 ports connect to k/2 switches in the upper aggregation layer
- Each core switch has one port connected to each of the k pods
- A fat-tree with k-port switches supports k^3/4 hosts

An Example Fat-Tree Topology, k =
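The sizing rules above can be checked with a few lines of arithmetic. This is a sketch; the helper name is made up:

```python
# Sketch of the fat-tree sizing arithmetic from the slides.
# For k-port switches: k pods, each with k/2 upper and k/2 lower aggregation
# switches; a (k/2) x (k/2) grid of core switches; k^3/4 hosts in total.

def fat_tree_sizes(k):
    """Return (pods, aggregation switches per pod, core switches, hosts)."""
    assert k % 2 == 0, "k must be even"
    pods = k
    agg_per_pod = 2 * (k // 2)   # k/2 upper-layer + k/2 lower-layer switches
    core = (k // 2) ** 2         # (k/2) x (k/2) core switch grid
    hosts = k ** 3 // 4          # each lower-layer switch connects k/2 hosts
    return pods, agg_per_pod, core, hosts

print(fat_tree_sizes(4))   # (4, 4, 4, 16)
print(fat_tree_sizes(48))  # 48-port switches support 27,648 hosts
```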

Addressing
- IP address block 10.0.0.0/8
- Pod switch: 10.pod.switch.1; pod = the pod number, in [0, k-1]; switch = position of the switch in the pod, in [0, k-1], left to right, bottom to top
- Core switch: 10.k.j.i; (j, i) = switch coordinate in the (k/2) x (k/2) core switch grid, with i, j in [1, k/2] from top-left
- Host: 10.pod.switch.ID; ID = host position in the subnet, in [2, k/2+1], left to right

IP Routing
- Goal: route IP traffic over paths across the network so that the load on switches is balanced
- Two-level prefix lookup: the primary table contains first-level prefixes; the secondary table contains (second-level suffix, port) entries
- A first-level prefix in the primary table may contain a pointer to a secondary table; a prefix is said to be terminating if no (suffix, port) entry is associated with it
- Inter-pod routing uses the default /0 prefix and suffix routing based on the host ID
- Centralized configuration of routing table entries: appropriate for data centre scenarios
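The addressing scheme above can be sketched directly as string formatting (a toy illustration of the layout, not production code):

```python
# Sketch of the 10.0.0.0/8 fat-tree addressing scheme described above.

def pod_switch_addr(pod, switch):
    # switch = position in the pod, [0, k-1], left to right, bottom to top
    return f"10.{pod}.{switch}.1"

def core_switch_addr(k, j, i):
    # (j, i) = coordinate in the (k/2) x (k/2) core grid, i, j in [1, k/2]
    return f"10.{k}.{j}.{i}"

def host_addr(pod, switch, position):
    # hosts occupy IDs [2, k/2 + 1] in the switch's subnet
    return f"10.{pod}.{switch}.{position + 2}"

print(pod_switch_addr(0, 3))     # 10.0.3.1
print(core_switch_addr(4, 1, 2)) # 10.4.1.2
print(host_addr(2, 0, 0))        # first host under switch 0 of pod 2: 10.2.0.2
```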

IP Routing Example
(Figure: routing table at a pod switch; one destination IP address is forwarded to port 1, another to port 3.)

IP Routing Table Generation
- Aggregation switch routing tables for load balancing

IP Routing Table Generation (cont'd)
- Core switch routing tables
- Load balancing in the initial part of the route, through the aggregation switches

Packing
- One drawback of the fat-tree topology is the number of required cables (larger fan-out of switches)
- Packing aims at minimizing the number of external cables, reducing the overall cable length, and allowing for incremental deployment
- Aggregation switches partitioned over pod racks
- Star layout per pod to reduce cable length: the pod rack is a hub and the other racks are leaves
- The only external cabling is to the core switches

Packing: Example
(Figure.)

Contents
- Network Architecture
- File System
- Job Scheduling
- Parallel Computing: MapReduce, Dryad, SCOPE and DryadLINQ
- Structured Data: BigTable, Amazon's Dynamo, Percolator

File System
- Needs to meet a set of design requirements for efficient processing of massive data sets
- Ex. Google File System (GFS), Cosmos (Microsoft)

Design Requirements
- Storing a modest number of large files: ex. millions of files, each of size 100 MB or larger; multi-GB file sizes are common
- Large streaming reads: individual operations typically read multiple MBs; clients often read through a contiguous region of a file
- Many large sequential writes: mostly appends to a file; modifications are rare
- Semantics to support multiple clients concurrently appending to a file: atomicity with minimal overhead
- High sustained bandwidth more important than latency

System Architecture
- File partitioned into chunks ("extents" in Cosmos); ex. chunk size 64 MB, chosen for efficient reads
- Specialized master nodes: ex. handle namespace management and locking, replica placement, creation, re-replication and rebalancing
- Chunkservers: maintain replicas
- Clients: issue read and write requests

GFS Architecture
- Single master node
- A centralized component simplifies the design: ex. easier to implement chunk placement strategies using global knowledge
- Clients never read or write file data through the master; otherwise, the master may become a bottleneck

GFS Architecture (cont'd)
- The client sends a request for a chunk index
- The master replies with the corresponding chunk handle and the locations of the replicas
- The client reads data directly from a chunkserver

Consistency Model
- A file region is consistent if all clients always see the same data, regardless of which replicas they read from
- Uses a lease mechanism for a consistent mutation order across replicas
- Mutation = an operation that changes the contents or metadata of a chunk (ex. a write or an append)
- The master grants a chunk lease to one of the replicas (called the primary): delegation to minimize management overhead at the master
- The lease is granted for a time period that can be repeatedly extended by the primary
- The primary determines a serial order for all mutations of the chunk, which is then followed by all replicas
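The read path described above (client asks the master for metadata, then contacts a chunkserver directly) can be sketched as follows. This is a minimal illustration, not the real GFS API; the names and chunk table are invented:

```python
# Sketch of the GFS read path: the client converts a byte offset into a chunk
# index, asks the master for the chunk handle and replica locations, and then
# reads the data directly from a chunkserver.

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks

class Master:
    def __init__(self, chunk_table):
        # (filename, chunk_index) -> (chunk_handle, [replica locations])
        self.chunk_table = chunk_table

    def lookup(self, filename, chunk_index):
        return self.chunk_table[(filename, chunk_index)]

def read(master, filename, offset):
    chunk_index = offset // CHUNK_SIZE        # which chunk holds this offset
    handle, replicas = master.lookup(filename, chunk_index)
    # The client now contacts one of `replicas` directly; the master serves
    # only metadata, never file data.
    return handle, replicas[0]

m = Master({("/logs/web.0", 1): ("0xabc", ["chunkserver-7", "chunkserver-12"])})
print(read(m, "/logs/web.0", 100 * 1024 * 1024))  # ('0xabc', 'chunkserver-7')
```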

The Lease Mechanism
1. The client asks the master for the chunkserver that holds the lease for the chunk; if none exists, the master grants one to a replica it chooses
2. The primary and secondary replicas are communicated to the client
3. The client pushes the data to all the replicas
4. Once all replicas acknowledge having received the data, the client sends a write request to the primary
5. The primary forwards the write request to all secondary replicas
6. Each secondary acknowledges to the primary that it has applied the operation
7. The primary replies to the client

Chunk Replication
- Resilience to failures of machines and network partitioning
- Chunks replicated to multiple chunkservers on different racks
- K-copy replication

Data Flow
- Pipelined data transmission through a chain of chunkservers; the goal is to fully utilize each machine's network interface
- Each machine forwards the data to the closest machine in the network topology that has not yet received it; distances estimated from IP addresses
- Data pipelined over TCP connections
- In the absence of network congestion, the transfer time for R replicas = B/C + R·L, where B = number of bytes to transfer, C = network interface bandwidth, L = latency to transfer bytes between two machines

Contents
- Network Architecture
- File System
- Job Scheduling
- Parallel Computing: MapReduce, Dryad, SCOPE and DryadLINQ
- Structured Data: BigTable, Amazon's Dynamo, Percolator
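The transfer-time formula B/C + R·L for pipelined replication can be checked with a quick worked example (the numbers below are illustrative, not from the slides):

```python
# Worked example of the pipelined transfer-time formula: T = B/C + R*L.
# Thanks to pipelining, replication adds only per-hop latency, not extra
# full copies of the data on the client's link.

def transfer_time(bytes_, bandwidth, replicas, latency):
    return bytes_ / bandwidth + replicas * latency

# 64 MB chunk, 12.5 MB/s (100 Mbps) NIC, 3 replicas, 1 ms per-hop latency:
t = transfer_time(64e6, 12.5e6, 3, 1e-3)
print(round(t, 3))  # ~5.123 s: dominated by B/C; the R*L term is tiny
```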

Job Scheduling
- A job consists of multiple tasks; ex. a task needs to process data on a machine
- Scheduling of tasks to machines with respect to the following criteria:
- Inter-job fairness: allocation of resources is fair across jobs, with respect to an adopted notion of fairness
- Data locality: tasks placed near their data
- Scheduling of tasks performed by the execution control (runtime)
- Examples: Quincy (Microsoft), Delay Scheduling (Hadoop)

Principles of Job Scheduling in Distributed Cluster Systems
- Separation of the inter-job fairness objective and the data-locality objective
- Inter-job fairness: ex. weighted round robin, or the special case of uniform round robin
- Data locality: different ways to accommodate this

Queue-based Scheduling
- Data structure to encode locality preference, with no inter-job fairness
- C_i = machine i, R_i = rack i, X = cluster, w_i^j = task i of job j

Simple Greedy Fairness
- M = number of machines, K = current number of jobs, N_j = number of unfinished tasks of job j
- Baseline allocation to job j: B_j = min(M/K, N_j)
- If B_j < M/K, the remaining slots are divided equally among jobs that have additional tasks, to determine the allocation A_j; else A_j = B_j
- Greedy Fair Scheduler: block job j if the number of machines allocated to it is A_j or more
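The allocation rule above can be sketched in a few lines. This is a sketch under stated assumptions: integer slots, and leftover slots split equally (with a minimum share of one) among jobs that still have unfinished tasks; the slides leave these details open:

```python
# Sketch of the greedy fair allocation: each job starts from the baseline
# B_j = min(M/K, N_j), and leftover slots are split among jobs that still
# have unfinished tasks.

def greedy_fair(M, tasks):
    K = len(tasks)
    alloc = [min(M // K, n) for n in tasks]          # baseline B_j
    leftover = M - sum(alloc)
    while leftover > 0:
        hungry = [j for j in range(K) if alloc[j] < tasks[j]]
        if not hungry:
            break                                    # everyone is satisfied
        share = max(leftover // len(hungry), 1)
        for j in hungry:
            extra = min(share, tasks[j] - alloc[j], leftover)
            alloc[j] += extra
            leftover -= extra
            if leftover == 0:
                break
    return alloc

# 10 machines, one small job and two large ones: the small job keeps its 2
# tasks, the large jobs split the remaining 8 slots equally.
print(greedy_fair(10, [2, 8, 8]))  # [2, 4, 4]
```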

Simple Greedy Fairness (cont'd)
- Suffers from a sticky-slot problem: when a task finishes, fairness requires serving the same job again
- Solution: a hysteresis approach; a job is unblocked if its number of running tasks falls below A_j - α, for some α > 0
- Essentially uniform round-robin: uniform allocation across jobs, same as with the Hadoop Fair Scheduler

Combining Inter-Job Fairness and Locality Preference
- Each job is given an allocation according to a fairness criterion
- The allocation is derived by solving a min-cost flow problem
- The costs encode locality preference (ex. machine and rack preference)

Delay Scheduling
- Inter-job fairness criterion: essentially the same as with Quincy
- Basic idea: if the head-of-line task of the job that should be served next according to the inter-job fairness criterion cannot be assigned locally, that job's scheduling is postponed, for a limited number of scheduling slots

Delay Scheduling (cont'd)
- D = input parameter determining the maximum number of skips per job

Configuring the Number of Skips
- M = number of machines, L = slots per machine
- P_j = set of machines on which job j has data to process (preferred machines for job j); p_j = |P_j|/M
- T = task processing time, R = number of replicas per data chunk
- For a job j that is farthest below its fair share, the probability of launching a non-local task = (1 - p_j)^D: exponentially decreasing with D
- Choose D such that the average fraction of locally assigned tasks for a job with N tasks is at least 1 - ε, for a given ε > 0

Assumption 1
- Ass. 1: all N tasks require data from the same machine
- Sufficient: D ≥ log(1/ε) / log(1/(1 - R/M)) = (M/R) log(1/ε) + o(M/R)
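The bound under Assumption 1 can be verified numerically (parameter values here are illustrative):

```python
# Numeric check of the Assumption 1 bound: choosing
# D = ceil(log(1/eps) / log(1/(1 - R/M))) makes the probability of
# launching a non-local task, (1 - R/M)^D, at most eps.

import math

def skips_needed(M, R, eps):
    return math.ceil(math.log(1 / eps) / math.log(1 / (1 - R / M)))

M, R, eps = 1000, 3, 0.05
D = skips_needed(M, R, eps)
print(D, (1 - R / M) ** D <= eps)
# D is close to (M/R) * log(1/eps) ~= 333.3 * 3.0 ~= 999, as the
# approximation in the slide predicts.
```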

Proof Sketch
- Probability that a task is assigned to a preferred machine: 1 - (1 - R/M)^D ≥ 1 - e^(-RD/M)
- Therefore, it suffices to choose D such that 1 - e^(-RD/M) ≥ 1 - ε, which yields the result

Assumption 2
- Ass. 2: each task prefers a machine selected uniformly at random from the set of M machines
- Suppose NR = o(√M); then for every ε > 1/N it suffices that D ≥ (M/R) log(1/(1 - 1/(εN))) = (M/R) (1/(Nε) + o(1/(Nε)))

Proof Sketch
- First note that the machines preferred by the tasks are all distinct with high probability, going to 1 if NR = o(√M):
  1 · (1 - 1/M)(1 - 2/M) ··· (1 - (NR-1)/M) ≥ (1 - NR/M)^NR ≈ e^(-(NR)^2/M) → 1
- Given that there are K unfinished tasks of job j, the probability of a local task assignment is approximately 1 - (1 - KR/M)^D ≥ 1 - e^(-RDK/M)
- The average fraction of local assignments per job is therefore at least
  (1/N) Σ_{K=1..N} (1 - e^(-RDK/M)) ≥ 1 - (1/N) Σ_{K=0..∞} e^(-RDK/M) = 1 - (1/N) · 1/(1 - e^(-RD/M))
- The result follows by requiring that the right-hand side is at least 1 - ε

Contents
- Network Architecture
- File System
- Job Scheduling
- Parallel Computing: MapReduce, Dryad, SCOPE and DryadLINQ
- Structured Data: BigTable, Amazon's Dynamo, Percolator

MapReduce
- Abstraction of group-by and aggregation
- Applications typically use several rounds of Map and Reduce phases

Example of Map and Reduce
(Figure: map and reduce phases.)
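The group-by-and-aggregation abstraction can be illustrated with the classic word-count example, run sequentially on one machine (a toy sketch, not a distributed implementation; the example data is invented):

```python
# Toy MapReduce: map emits (key, value) pairs, the shuffle groups values
# by key, and reduce aggregates each group.

from collections import defaultdict

def map_fn(document):
    for word in document.split():
        yield word, 1                    # emit (key, value) pairs

def reduce_fn(word, counts):
    return word, sum(counts)             # aggregate all values for one key

def run_mapreduce(documents):
    groups = defaultdict(list)
    for doc in documents:                # map phase
        for k, v in map_fn(doc):
            groups[k].append(v)          # shuffle: group values by key
    return dict(reduce_fn(k, vs) for k, vs in groups.items())  # reduce phase

print(run_mapreduce(["to be or not to be"]))
# {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```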

System Components
(Figure.)

Dryad
- A general-purpose distributed execution engine for coarse-grained data-parallel computations
- Based on specifying a dataflow graph: vertices contain code; directed edges (channels) describe data flows
- DAG = Directed Acyclic Graph

System Architecture
- NS = name server
- JM = job manager: determines the assignment of vertices to machines and orchestrates the overall execution
- D = machines, V = vertices

Vertices and Channels
(Figure: a vertex with input channels from producers and output channels to consumers.)
- Vertex: denotes computation code; typically sequential, but event-based programming is also supported, ex. using a shared thread pool
- Channel types:
  - File (default): preserved after vertex execution until the job completes
  - TCP: requires no disk accesses, but both end-point vertices must be scheduled to run at the same time
  - Shared-memory FIFO: low communication cost, but the end-point vertices must run within the same process

Data Flow Graph: Construction Operators
- Clone
- Point-wise composition
- Complete bipartite composition
- Merge

Construction Operators (cont'd)
(Figure.)

Example: Histogram Computation
- Compute the histogram of record frequencies
- Map phase: P = read a part of the file to extract records; D = distribute the input using hash partitioning; S = perform an in-memory sort; C = compute the total count per record

Example: Histogram Computation
- Reduce phase: MS = sort based on the record hash; C = compute the total count per record

Example: Histogram Computation, Optimized Version
- Wasteful to execute Q vertices for every input partition: input partitions are small (much smaller than RAM size), and it is inefficient to read from many input partitions

SCOPE
- Structured Computations Optimized for Parallel Execution
- Data modelled as a set of rows comprised of typed columns
- Declarative language: a program tells what to do, not how to do it; resembles SQL with C# expressions
- A sequence of commands, typically data transformation operators (take one or more rowsets as input, perform some operation on the data, output a rowset)
- The compiler and optimizer are responsible for generating an efficient execution plan, and the runtime for executing the plan with minimal overhead

SCOPE Software Stack
- SCOPE script -> SCOPE compiler / SCOPE optimizer -> SCOPE runtime -> Cosmos execution environment -> Cosmos file system -> Cosmos files

Example: Histogram
- Find the most popular queries that were requested more than 1000 times
- Step-by-step equivalent: (SCOPE script shown as a figure in the original slides)

Example: Histogram Execution Plan
- Extractors read extents in parallel
- Partial aggregation at the rack level (exploits knowledge about the network topology)
- Distribute: partition on the grouping column
- Final aggregation
- Take only rows with count larger than 1000
- Sort by count
- Merge sorted results

DryadLINQ
- Similar purpose as SCOPE, but uses LINQ
- LINQ = Language Integrated Query: a set of .NET constructs for programming with datasets
- Objects can be of any .NET type; easy to compute with vectors and matrices
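The histogram execution plan above (partial aggregation, final aggregation, filter on the count, sort) can be sketched in plain Python; the query data here is invented for illustration:

```python
# Sketch of the histogram plan: aggregate each partition locally (the
# "rack-level" partial aggregation), merge the partials, keep only queries
# with count > 1000, and sort by count.

from collections import Counter

def histogram(partitions, threshold=1000):
    partials = [Counter(p) for p in partitions]     # partial aggregation
    total = Counter()
    for c in partials:                              # final aggregation
        total.update(c)
    popular = [(q, n) for q, n in total.items() if n > threshold]
    return sorted(popular, key=lambda qn: qn[1], reverse=True)  # sort by count

parts = [["weather"] * 600 + ["news"] * 400,
         ["weather"] * 500 + ["news"] * 300]
print(histogram(parts))  # [('weather', 1100)] -- "news" (700) is filtered out
```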

DryadLINQ Software Stack
- DryadLINQ: high-level language API
- Dryad: distributed execution, fault tolerance, scheduling
- Cluster services: remote process execution, naming, storage
- Windows Server machines

Example: Histogram
(Figure.)

Contents
- Network Architecture
- File System
- Job Scheduling
- Parallel Computing: MapReduce, Dryad, SCOPE and DryadLINQ
- Structured Data: BigTable, Amazon's Dynamo, Percolator

Design Principles
- Provide a client with a structured data model that supports control over layout and format
- Distributed storage system for structured data
- Efficient reads/writes
- Consistency
- High availability

BigTable
- Data model: a multidimensional sorted map; (row: string, column: string, time: int64) -> string
- Column families: groups of column keys; the basic unit of access control; a small number of column families, each possibly consisting of many columns
- Atomic reads and writes
- Uses horizontal partitioning: rowsets distributed across machines; efficient reads of short ranges, as they typically require access to a small number of machines
- Consistency: uses the highly-available and persistent distributed lock service Chubby

An Example Table
- Rows correspond to reversed URLs
- "Contents" is the web page content
- The "anchor" column family consists of the anchor text that referenced the web page
- Timestamps t_i indicate various snapshots
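The (row, column, timestamp) -> string map above can be sketched with nested dictionaries (a toy model only; real Bigtable adds column families, tablets, compactions, etc.):

```python
# Toy sketch of the Bigtable data model: a sorted, multi-versioned map
# from (row, column, timestamp) to a string value.

class Table:
    def __init__(self):
        self.cells = {}   # (row, column) -> {timestamp: value}

    def write(self, row, column, timestamp, value):
        self.cells.setdefault((row, column), {})[timestamp] = value

    def read(self, row, column, timestamp=None):
        versions = self.cells[(row, column)]
        ts = timestamp if timestamp is not None else max(versions)
        return versions[ts]              # latest version by default

t = Table()
t.write("com.cnn.www", "contents:", 1, "<html>v1</html>")  # reversed-URL row key
t.write("com.cnn.www", "contents:", 2, "<html>v2</html>")
print(t.read("com.cnn.www", "contents:"))     # '<html>v2</html>'
print(t.read("com.cnn.www", "contents:", 1))  # '<html>v1</html>'
```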

Interface to a Table
- Write to a table / read from a table (code examples shown as figures in the original slides)

System Architecture
- Tablet = contiguous region of the key space
- Master: assigns tablets to tablet servers; detects the addition and expiration of tablet servers; balances tablet server load; garbage collection
- Tablet servers: each stores a collection of tablets

Indexing of Tablet Locations
- Three-level hierarchy similar to B+ trees
- Chubby stores the location of the root tablet
- The root tablet stores the locations of all METADATA tablets
- A METADATA tablet points to the locations of a set of user tablets

Amazon's Dynamo
- Dictionary: key -> value
- Many services only require storing and retrieving values by a primary key; no need for complex relational database queries
- Key requirement: high availability; "always writable", relaxing consistency guarantees
- Other requirements: incremental scalability; symmetry (no special roles taken by some components); decentralization (no centralized components); leverage system heterogeneity

Key System Design Choices
- Partitioning by consistent hashing: allows for incremental scalability
- High availability for writes: using vector clocks, with reconciliation during reads
- Handling temporary failures: using a quorum
- Recovering from system failures: using Merkle trees
- Membership and failure detection: using a gossip-based membership protocol

Consistent Hashing (for resilience to failures)
(Figure: B is the coordinator for key K; the preference list for key K contains B, C and D.)
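The consistent-hashing partitioning above can be sketched as follows: nodes and keys hash onto a ring, a key's coordinator is the first node clockwise from the key's position, and the preference list is the next N nodes. This is a minimal sketch (no virtual nodes, and node names are invented), not Dynamo's actual implementation:

```python
# Minimal consistent-hashing ring: the coordinator for a key is the first
# node whose hash follows the key's hash on the ring (wrapping around).

import bisect
import hashlib

def h(x):
    return int(hashlib.md5(x.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        self.ring = sorted((h(n), n) for n in nodes)  # positions on the ring

    def preference_list(self, key, n=3):
        # first node clockwise from the key, then the next n-1 successors
        start = bisect.bisect(self.ring, (h(key), chr(0x10FFFF)))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(n)]

ring = Ring(["A", "B", "C", "D", "E"])
print(ring.preference_list("K"))  # coordinator first, then two successors
```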

Data Versioning using Vector Clocks
- Vector clock = a list of (node, counter) pairs
- When two versions are causally unrelated (concurrent), both versions must be kept

Percolator
- Incremental processing using distributed transactions and notifications
- Two main abstractions: ACID transactions over a random-access repository; observers, a way to organize an incremental computation
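The vector-clock comparison behind "both versions must be kept" can be sketched in a few lines (the node names Sx, Sy, Sz are illustrative):

```python
# Sketch of vector-clock causality: version b descends from version a if
# b's counters dominate a's; clocks that dominate in neither direction are
# concurrent, and both versions must be kept for later reconciliation.

def descends(a, b):
    """True if clock b dominates clock a (b has seen everything in a)."""
    return all(b.get(node, 0) >= count for node, count in a.items())

def concurrent(a, b):
    return not descends(a, b) and not descends(b, a)

d1 = {"Sx": 1}
d2 = {"Sx": 2, "Sy": 1}   # written after reading d1: descends from d1
d3 = {"Sx": 2, "Sz": 1}   # divergent write handled by another node
print(descends(d1, d2), concurrent(d2, d3))  # True True
```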

Notifications
- Observers: user-written code that is triggered by changes to the table; similar to database triggers or events in active databases
- A Percolator application is a series of observers
- Notifications are designed to help structure an incremental computation

References
- Network Architecture: "A Scalable, Commodity Data Center Network Architecture", M. Al-Fares, A. Loukissas and A. Vahdat, SIGCOMM 2008
- File System: "The Google File System", S. Ghemawat, H. Gobioff and S.-T. Leung, SOSP 2003
- Job Scheduling: "Quincy: Fair Scheduling for Distributed Computing Clusters", M. Isard et al., SOSP 2009; "Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling", M. Zaharia et al., EuroSys

References (cont'd)
- Parallel Computing: "MapReduce: Simplified Data Processing on Large Clusters", J. Dean and S. Ghemawat, OSDI 2004; "Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks", M. Isard et al., EuroSys 2007; "SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets", R. Chaiken et al., VLDB 2008; "DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language", Y. Yu et al.
- Structured Data: "Bigtable: A Distributed Storage System for Structured Data", F. Chang et al., OSDI 2006; "Dynamo: Amazon's Highly Available Key-value Store", G. DeCandia et al., SOSP 2007; "Large-scale Incremental Processing Using Distributed Transactions and Notifications", D. Peng and F. Dabek, OSDI


More information

GFS Overview. Design goals/priorities Design for big-data workloads Huge files, mostly appends, concurrency, huge bandwidth Design for failures

GFS Overview. Design goals/priorities Design for big-data workloads Huge files, mostly appends, concurrency, huge bandwidth Design for failures GFS Overview Design goals/priorities Design for big-data workloads Huge files, mostly appends, concurrency, huge bandwidth Design for failures Interface: non-posix New op: record appends (atomicity matters,

More information

DYNAMO: AMAZON S HIGHLY AVAILABLE KEY-VALUE STORE. Presented by Byungjin Jun

DYNAMO: AMAZON S HIGHLY AVAILABLE KEY-VALUE STORE. Presented by Byungjin Jun DYNAMO: AMAZON S HIGHLY AVAILABLE KEY-VALUE STORE Presented by Byungjin Jun 1 What is Dynamo for? Highly available key-value storages system Simple primary-key only interface Scalable and Reliable Tradeoff:

More information

Distributed Systems. Lec 10: Distributed File Systems GFS. Slide acks: Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung

Distributed Systems. Lec 10: Distributed File Systems GFS. Slide acks: Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Distributed Systems Lec 10: Distributed File Systems GFS Slide acks: Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung 1 Distributed File Systems NFS AFS GFS Some themes in these classes: Workload-oriented

More information

Georgia Institute of Technology ECE6102 4/20/2009 David Colvin, Jimmy Vuong

Georgia Institute of Technology ECE6102 4/20/2009 David Colvin, Jimmy Vuong Georgia Institute of Technology ECE6102 4/20/2009 David Colvin, Jimmy Vuong Relatively recent; still applicable today GFS: Google s storage platform for the generation and processing of data used by services

More information

Bigtable: A Distributed Storage System for Structured Data By Fay Chang, et al. OSDI Presented by Xiang Gao

Bigtable: A Distributed Storage System for Structured Data By Fay Chang, et al. OSDI Presented by Xiang Gao Bigtable: A Distributed Storage System for Structured Data By Fay Chang, et al. OSDI 2006 Presented by Xiang Gao 2014-11-05 Outline Motivation Data Model APIs Building Blocks Implementation Refinement

More information

Programming model and implementation for processing and. Programs can be automatically parallelized and executed on a large cluster of machines

Programming model and implementation for processing and. Programs can be automatically parallelized and executed on a large cluster of machines A programming model in Cloud: MapReduce Programming model and implementation for processing and generating large data sets Users specify a map function to generate a set of intermediate key/value pairs

More information

ΕΠΛ 602:Foundations of Internet Technologies. Cloud Computing

ΕΠΛ 602:Foundations of Internet Technologies. Cloud Computing ΕΠΛ 602:Foundations of Internet Technologies Cloud Computing 1 Outline Bigtable(data component of cloud) Web search basedonch13of thewebdatabook 2 What is Cloud Computing? ACloudis an infrastructure, transparent

More information

4/9/2018 Week 13-A Sangmi Lee Pallickara. CS435 Introduction to Big Data Spring 2018 Colorado State University. FAQs. Architecture of GFS

4/9/2018 Week 13-A Sangmi Lee Pallickara. CS435 Introduction to Big Data Spring 2018 Colorado State University. FAQs. Architecture of GFS W13.A.0.0 CS435 Introduction to Big Data W13.A.1 FAQs Programming Assignment 3 has been posted PART 2. LARGE SCALE DATA STORAGE SYSTEMS DISTRIBUTED FILE SYSTEMS Recitations Apache Spark tutorial 1 and

More information

Yuval Carmel Tel-Aviv University "Advanced Topics in Storage Systems" - Spring 2013

Yuval Carmel Tel-Aviv University Advanced Topics in Storage Systems - Spring 2013 Yuval Carmel Tel-Aviv University "Advanced Topics in About & Keywords Motivation & Purpose Assumptions Architecture overview & Comparison Measurements How does it fit in? The Future 2 About & Keywords

More information

Distributed System. Gang Wu. Spring,2018

Distributed System. Gang Wu. Spring,2018 Distributed System Gang Wu Spring,2018 Lecture7:DFS What is DFS? A method of storing and accessing files base in a client/server architecture. A distributed file system is a client/server-based application

More information

CS 138: Google. CS 138 XVII 1 Copyright 2016 Thomas W. Doeppner. All rights reserved.

CS 138: Google. CS 138 XVII 1 Copyright 2016 Thomas W. Doeppner. All rights reserved. CS 138: Google CS 138 XVII 1 Copyright 2016 Thomas W. Doeppner. All rights reserved. Google Environment Lots (tens of thousands) of computers all more-or-less equal - processor, disk, memory, network interface

More information

CS 138: Google. CS 138 XVI 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.

CS 138: Google. CS 138 XVI 1 Copyright 2017 Thomas W. Doeppner. All rights reserved. CS 138: Google CS 138 XVI 1 Copyright 2017 Thomas W. Doeppner. All rights reserved. Google Environment Lots (tens of thousands) of computers all more-or-less equal - processor, disk, memory, network interface

More information

Percolator. Large-Scale Incremental Processing using Distributed Transactions and Notifications. D. Peng & F. Dabek

Percolator. Large-Scale Incremental Processing using Distributed Transactions and Notifications. D. Peng & F. Dabek Percolator Large-Scale Incremental Processing using Distributed Transactions and Notifications D. Peng & F. Dabek Motivation Built to maintain the Google web search index Need to maintain a large repository,

More information

Map Reduce. Yerevan.

Map Reduce. Yerevan. Map Reduce Erasmus+ @ Yerevan dacosta@irit.fr Divide and conquer at PaaS 100 % // Typical problem Iterate over a large number of records Extract something of interest from each Shuffle and sort intermediate

More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung ACM SIGOPS 2003 {Google Research} Vaibhav Bajpai NDS Seminar 2011 Looking Back time Classics Sun NFS (1985) CMU Andrew FS (1988) Fault

More information

Outline. INF3190:Distributed Systems - Examples. Last week: Definitions Transparencies Challenges&pitfalls Architecturalstyles

Outline. INF3190:Distributed Systems - Examples. Last week: Definitions Transparencies Challenges&pitfalls Architecturalstyles INF3190:Distributed Systems - Examples Thomas Plagemann & Roman Vitenberg Outline Last week: Definitions Transparencies Challenges&pitfalls Architecturalstyles Today: Examples Googel File System (Thomas)

More information

Cloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018

Cloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018 Cloud Computing and Hadoop Distributed File System UCSB CS70, Spring 08 Cluster Computing Motivations Large-scale data processing on clusters Scan 000 TB on node @ 00 MB/s = days Scan on 000-node cluster

More information

Google File System 2

Google File System 2 Google File System 2 goals monitoring, fault tolerance, auto-recovery (thousands of low-cost machines) focus on multi-gb files handle appends efficiently (no random writes & sequential reads) co-design

More information

Performance Gain with Variable Chunk Size in GFS-like File Systems

Performance Gain with Variable Chunk Size in GFS-like File Systems Journal of Computational Information Systems4:3(2008) 1077-1084 Available at http://www.jofci.org Performance Gain with Variable Chunk Size in GFS-like File Systems Zhifeng YANG, Qichen TU, Kai FAN, Lei

More information

CPSC 426/526. Cloud Computing. Ennan Zhai. Computer Science Department Yale University

CPSC 426/526. Cloud Computing. Ennan Zhai. Computer Science Department Yale University CPSC 426/526 Cloud Computing Ennan Zhai Computer Science Department Yale University Recall: Lec-7 In the lec-7, I talked about: - P2P vs Enterprise control - Firewall - NATs - Software defined network

More information

Bigtable. A Distributed Storage System for Structured Data. Presenter: Yunming Zhang Conglong Li. Saturday, September 21, 13

Bigtable. A Distributed Storage System for Structured Data. Presenter: Yunming Zhang Conglong Li. Saturday, September 21, 13 Bigtable A Distributed Storage System for Structured Data Presenter: Yunming Zhang Conglong Li References SOCC 2010 Key Note Slides Jeff Dean Google Introduction to Distributed Computing, Winter 2008 University

More information

Outline. Spanner Mo/va/on. Tom Anderson

Outline. Spanner Mo/va/on. Tom Anderson Spanner Mo/va/on Tom Anderson Outline Last week: Chubby: coordina/on service BigTable: scalable storage of structured data GFS: large- scale storage for bulk data Today/Friday: Lessons from GFS/BigTable

More information

Abstract. 1. Introduction. 2. Design and Implementation Master Chunkserver

Abstract. 1. Introduction. 2. Design and Implementation Master Chunkserver Abstract GFS from Scratch Ge Bian, Niket Agarwal, Wenli Looi https://github.com/looi/cs244b Dec 2017 GFS from Scratch is our partial re-implementation of GFS, the Google File System. Like GFS, our system

More information

Introduction to MapReduce

Introduction to MapReduce Basics of Cloud Computing Lecture 4 Introduction to MapReduce Satish Srirama Some material adapted from slides by Jimmy Lin, Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed

More information

Bigtable: A Distributed Storage System for Structured Data. Andrew Hon, Phyllis Lau, Justin Ng

Bigtable: A Distributed Storage System for Structured Data. Andrew Hon, Phyllis Lau, Justin Ng Bigtable: A Distributed Storage System for Structured Data Andrew Hon, Phyllis Lau, Justin Ng What is Bigtable? - A storage system for managing structured data - Used in 60+ Google services - Motivation:

More information

What Is Datacenter (Warehouse) Computing. Distributed and Parallel Technology. Datacenter Computing Architecture

What Is Datacenter (Warehouse) Computing. Distributed and Parallel Technology. Datacenter Computing Architecture What Is Datacenter (Warehouse) Computing Distributed and Parallel Technology Datacenter, Warehouse and Cloud Computing Hans-Wolfgang Loidl School of Mathematical and Computer Sciences Heriot-Watt University,

More information

BigTable: A Distributed Storage System for Structured Data (2006) Slides adapted by Tyler Davis

BigTable: A Distributed Storage System for Structured Data (2006) Slides adapted by Tyler Davis BigTable: A Distributed Storage System for Structured Data (2006) Slides adapted by Tyler Davis Motivation Lots of (semi-)structured data at Google URLs: Contents, crawl metadata, links, anchors, pagerank,

More information

Distributed Systems. GFS / HDFS / Spanner

Distributed Systems. GFS / HDFS / Spanner 15-440 Distributed Systems GFS / HDFS / Spanner Agenda Google File System (GFS) Hadoop Distributed File System (HDFS) Distributed File Systems Replication Spanner Distributed Database System Paxos Replication

More information

Engineering Goals. Scalability Availability. Transactional behavior Security EAI... CS530 S05

Engineering Goals. Scalability Availability. Transactional behavior Security EAI... CS530 S05 Engineering Goals Scalability Availability Transactional behavior Security EAI... Scalability How much performance can you get by adding hardware ($)? Performance perfect acceptable unacceptable Processors

More information

Distributed computing: index building and use

Distributed computing: index building and use Distributed computing: index building and use Distributed computing Goals Distributing computation across several machines to Do one computation faster - latency Do more computations in given time - throughput

More information

Map-Reduce. Marco Mura 2010 March, 31th

Map-Reduce. Marco Mura 2010 March, 31th Map-Reduce Marco Mura (mura@di.unipi.it) 2010 March, 31th This paper is a note from the 2009-2010 course Strumenti di programmazione per sistemi paralleli e distribuiti and it s based by the lessons of

More information

Google File System and BigTable. and tiny bits of HDFS (Hadoop File System) and Chubby. Not in textbook; additional information

Google File System and BigTable. and tiny bits of HDFS (Hadoop File System) and Chubby. Not in textbook; additional information Subject 10 Fall 2015 Google File System and BigTable and tiny bits of HDFS (Hadoop File System) and Chubby Not in textbook; additional information Disclaimer: These abbreviated notes DO NOT substitute

More information

NPTEL Course Jan K. Gopinath Indian Institute of Science

NPTEL Course Jan K. Gopinath Indian Institute of Science Storage Systems NPTEL Course Jan 2012 (Lecture 40) K. Gopinath Indian Institute of Science Google File System Non-Posix scalable distr file system for large distr dataintensive applications performance,

More information

Hadoop Distributed File System(HDFS)

Hadoop Distributed File System(HDFS) Hadoop Distributed File System(HDFS) Bu eğitim sunumları İstanbul Kalkınma Ajansı nın 2016 yılı Yenilikçi ve Yaratıcı İstanbul Mali Destek Programı kapsamında yürütülmekte olan TR10/16/YNY/0036 no lu İstanbul

More information

HDFS: Hadoop Distributed File System. Sector: Distributed Storage System

HDFS: Hadoop Distributed File System. Sector: Distributed Storage System GFS: Google File System Google C/C++ HDFS: Hadoop Distributed File System Yahoo Java, Open Source Sector: Distributed Storage System University of Illinois at Chicago C++, Open Source 2 System that permanently

More information

Introduction to Distributed Data Systems

Introduction to Distributed Data Systems Introduction to Distributed Data Systems Serge Abiteboul Ioana Manolescu Philippe Rigaux Marie-Christine Rousset Pierre Senellart Web Data Management and Distribution http://webdam.inria.fr/textbook January

More information

Distributed File Systems (Chapter 14, M. Satyanarayanan) CS 249 Kamal Singh

Distributed File Systems (Chapter 14, M. Satyanarayanan) CS 249 Kamal Singh Distributed File Systems (Chapter 14, M. Satyanarayanan) CS 249 Kamal Singh Topics Introduction to Distributed File Systems Coda File System overview Communication, Processes, Naming, Synchronization,

More information

FLAT DATACENTER STORAGE. Paper-3 Presenter-Pratik Bhatt fx6568

FLAT DATACENTER STORAGE. Paper-3 Presenter-Pratik Bhatt fx6568 FLAT DATACENTER STORAGE Paper-3 Presenter-Pratik Bhatt fx6568 FDS Main discussion points A cluster storage system Stores giant "blobs" - 128-bit ID, multi-megabyte content Clients and servers connected

More information

BigTable. CSE-291 (Cloud Computing) Fall 2016

BigTable. CSE-291 (Cloud Computing) Fall 2016 BigTable CSE-291 (Cloud Computing) Fall 2016 Data Model Sparse, distributed persistent, multi-dimensional sorted map Indexed by a row key, column key, and timestamp Values are uninterpreted arrays of bytes

More information

A BigData Tour HDFS, Ceph and MapReduce

A BigData Tour HDFS, Ceph and MapReduce A BigData Tour HDFS, Ceph and MapReduce These slides are possible thanks to these sources Jonathan Drusi - SCInet Toronto Hadoop Tutorial, Amir Payberah - Course in Data Intensive Computing SICS; Yahoo!

More information

Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Yahoo! Sunnyvale, California USA {Shv, Hairong, SRadia,

Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Yahoo! Sunnyvale, California USA {Shv, Hairong, SRadia, Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Yahoo! Sunnyvale, California USA {Shv, Hairong, SRadia, Chansler}@Yahoo-Inc.com Presenter: Alex Hu } Introduction } Architecture } File

More information

Seminar Report On. Google File System. Submitted by SARITHA.S

Seminar Report On. Google File System. Submitted by SARITHA.S Seminar Report On Submitted by SARITHA.S In partial fulfillment of requirements in Degree of Master of Technology (MTech) In Computer & Information Systems DEPARTMENT OF COMPUTER SCIENCE COCHIN UNIVERSITY

More information

DISTRIBUTED FILE SYSTEMS CARSTEN WEINHOLD

DISTRIBUTED FILE SYSTEMS CARSTEN WEINHOLD Department of Computer Science Institute of System Architecture, Operating Systems Group DISTRIBUTED FILE SYSTEMS CARSTEN WEINHOLD OUTLINE Classical distributed file systems NFS: Sun Network File System

More information

Big Table. Google s Storage Choice for Structured Data. Presented by Group E - Dawei Yang - Grace Ramamoorthy - Patrick O Sullivan - Rohan Singla

Big Table. Google s Storage Choice for Structured Data. Presented by Group E - Dawei Yang - Grace Ramamoorthy - Patrick O Sullivan - Rohan Singla Big Table Google s Storage Choice for Structured Data Presented by Group E - Dawei Yang - Grace Ramamoorthy - Patrick O Sullivan - Rohan Singla Bigtable: Introduction Resembles a database. Does not support

More information

MapReduce & BigTable

MapReduce & BigTable CPSC 426/526 MapReduce & BigTable Ennan Zhai Computer Science Department Yale University Lecture Roadmap Cloud Computing Overview Challenges in the Clouds Distributed File Systems: GFS Data Process & Analysis:

More information

This material is covered in the textbook in Chapter 21.

This material is covered in the textbook in Chapter 21. This material is covered in the textbook in Chapter 21. The Google File System paper, by S Ghemawat, H Gobioff, and S-T Leung, was published in the proceedings of the ACM Symposium on Operating Systems

More information

Distributed computing: index building and use

Distributed computing: index building and use Distributed computing: index building and use Distributed computing Goals Distributing computation across several machines to Do one computation faster - latency Do more computations in given time - throughput

More information

CS 345A Data Mining. MapReduce

CS 345A Data Mining. MapReduce CS 345A Data Mining MapReduce Single-node architecture CPU Machine Learning, Statistics Memory Classical Data Mining Disk Commodity Clusters Web data sets can be very large Tens to hundreds of terabytes

More information

GFS. CS6450: Distributed Systems Lecture 5. Ryan Stutsman

GFS. CS6450: Distributed Systems Lecture 5. Ryan Stutsman GFS CS6450: Distributed Systems Lecture 5 Ryan Stutsman Some material taken/derived from Princeton COS-418 materials created by Michael Freedman and Kyle Jamieson at Princeton University. Licensed for

More information

Database Architectures

Database Architectures Database Architectures CPS352: Database Systems Simon Miner Gordon College Last Revised: 4/15/15 Agenda Check-in Parallelism and Distributed Databases Technology Research Project Introduction to NoSQL

More information

Distributed Systems. 15. Distributed File Systems. Paul Krzyzanowski. Rutgers University. Fall 2017

Distributed Systems. 15. Distributed File Systems. Paul Krzyzanowski. Rutgers University. Fall 2017 Distributed Systems 15. Distributed File Systems Paul Krzyzanowski Rutgers University Fall 2017 1 Google Chubby ( Apache Zookeeper) 2 Chubby Distributed lock service + simple fault-tolerant file system

More information

The Google File System GFS

The Google File System GFS The Google File System GFS Common Goals of GFS and most Distributed File Systems Performance Reliability Scalability Availability Other GFS Concepts Component failures are the norm rather than the exception.

More information

big picture parallel db (one data center) mix of OLTP and batch analysis lots of data, high r/w rates, 1000s of cheap boxes thus many failures

big picture parallel db (one data center) mix of OLTP and batch analysis lots of data, high r/w rates, 1000s of cheap boxes thus many failures Lecture 20 -- 11/20/2017 BigTable big picture parallel db (one data center) mix of OLTP and batch analysis lots of data, high r/w rates, 1000s of cheap boxes thus many failures what does paper say Google

More information

BigTable A System for Distributed Structured Storage

BigTable A System for Distributed Structured Storage BigTable A System for Distributed Structured Storage Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber Adapted

More information

Staggeringly Large Filesystems

Staggeringly Large Filesystems Staggeringly Large Filesystems Evan Danaher CS 6410 - October 27, 2009 Outline 1 Large Filesystems 2 GFS 3 Pond Outline 1 Large Filesystems 2 GFS 3 Pond Internet Scale Web 2.0 GFS Thousands of machines

More information

Dynamo. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Motivation System Architecture Evaluation

Dynamo. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Motivation System Architecture Evaluation Dynamo Smruti R. Sarangi Department of Computer Science Indian Institute of Technology New Delhi, India Smruti R. Sarangi Leader Election 1/20 Outline Motivation 1 Motivation 2 3 Smruti R. Sarangi Leader

More information

Camdoop Exploiting In-network Aggregation for Big Data Applications Paolo Costa

Camdoop Exploiting In-network Aggregation for Big Data Applications Paolo Costa Camdoop Exploiting In-network Aggregation for Big Data Applications costa@imperial.ac.uk joint work with Austin Donnelly, Antony Rowstron, and Greg O Shea (MSR Cambridge) MapReduce Overview Input file

More information