Architecture of Systems for Processing Massive Amounts of Data
Milan Vojnović, Microsoft Research, April 2011

The Goals of this Lecture
- Learn about the underlying principles of system design for processing massive amounts of data
- Learn about the state-of-the-art systems used in production by major Internet online service providers, e.g. Amazon, Google and Microsoft
- Learn about some alternative system designs proposed in research papers
Typical System Characteristics
- Distributed system built from inexpensive commodity components
- Shared-nothing model: machines with their own CPU, memory and hard disks, interconnected by a network
- Failures of machines and network components are common

Application Requirements
- Support for storing and processing large files; GB, TB and PB sizes are quite common
- Efficient processing that uses large streaming reads and writes, i.e. operations on contiguous regions of a file
- Support for structured data, e.g. tables and incremental processing
- Parallel processing, with parallel-computation complexities hidden from the programmer
- Accommodate declarative and imperative programming
- Quality-of-service requirements, e.g. fast processing speed and high availability
Contents
- Network Architecture
- File System
- Job Scheduling
- Parallel Computing: MapReduce, Dryad, SCOPE and DryadLINQ
- Structured Data: BigTable, Amazon's Dynamo, Percolator

Network Architecture
- Typically a hierarchical organization: either two- or three-level trees of switches or routers
- Core tier (e.g. 10 GigE ports)
- Aggregation tier (e.g. GigE ports)
- Computers clustered in racks
Oversubscription
- Switches allow all directly connected hosts to communicate with one another at the full speed of their network interface
- Oversubscription: the ratio of the worst-case achievable aggregate bandwidth among the end hosts to the total bisection bandwidth of a particular network
- Pros: lowers the cost
- Cons: complicates the design of protocols, as the design must be conscious of network bandwidth asymmetries

Oversubscription: Examples
- 1:1 = all hosts may potentially communicate with arbitrary other hosts at the full bandwidth of their network interface
- 5:1 = 20% of available host bandwidth is available for some communication patterns
- Typical designs: 2.5:1 (400 Mbps) to 8:1 (125 Mbps)
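The examples above are simple ratio arithmetic; a tiny illustrative sketch (my own, assuming a 1 Gbps host interface as in the slide's Mbps figures):

```python
# Illustrative sketch (not from the lecture): worst-case per-host bandwidth
# under a given oversubscription ratio, assuming a 1 Gbps (1000 Mbps) NIC.

def worst_case_host_bandwidth_mbps(oversubscription_ratio: float,
                                   nic_mbps: float = 1000.0) -> float:
    """Worst-case achievable bandwidth per host for some communication patterns."""
    return nic_mbps / oversubscription_ratio

# The slide's examples: 1:1 keeps the full 1000 Mbps, 5:1 leaves 20% (200 Mbps),
# 2.5:1 leaves 400 Mbps and 8:1 leaves 125 Mbps.
print(worst_case_host_bandwidth_mbps(1))    # 1000.0
print(worst_case_host_bandwidth_mbps(5))    # 200.0
print(worst_case_host_bandwidth_mbps(2.5))  # 400.0
print(worst_case_host_bandwidth_mbps(8))    # 125.0
```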
Alternative: Fat-Tree Topology
- Built using many small commodity switches; lower cost
- A k-ary fat-tree:
- k pods, each containing two layers of k/2 aggregation switches (upper and lower layers)
- (k/2) x (k/2) k-port core switches
- Each switch in the lower aggregation layer is directly connected to k/2 hosts; its other k/2 ports connect to the k/2 switches in the upper aggregation layer
- Each core switch has one port connected to each of the k pods
- A fat-tree built with k-port switches supports k^3/4 hosts

An Example Fat-Tree Topology (figure)
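The sizing rules above can be sketched directly; a small illustration (my own helper, following the slide's counts):

```python
# Sketch of the fat-tree sizing rules from the slide, for k-port switches.

def fat_tree_sizes(k: int) -> dict:
    assert k % 2 == 0, "k must be even"
    return {
        "pods": k,
        "aggregation_switches": k * k,   # k pods x (k/2 upper + k/2 lower)
        "core_switches": (k // 2) ** 2,  # the (k/2) x (k/2) core grid
        "hosts": k ** 3 // 4,            # k pods x (k/2 lower switches) x (k/2 hosts)
    }

# The canonical k = 4 example has 4 pods, 16 pod switches, 4 core switches
# and 16 hosts; k = 48 already supports 27,648 hosts.
print(fat_tree_sizes(4))
print(fat_tree_sizes(48)["hosts"])  # 27648
```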
Addressing
- IP address blocks from 10.0.0.0/8
- Pod switch: 10.pod.switch.1, where pod = the pod number in [0, k-1] and switch = the position of the switch in the pod in [0, k-1], left to right, bottom to top
- Core switch: 10.k.j.i, where (j, i) = the switch coordinate in the (k/2) x (k/2) core switch grid, with i, j in [1, k/2] from the top-left
- Host: 10.pod.switch.ID, where ID = the host position in the subnet in [2, k/2+1], left to right

IP Routing
- Goal: route IP traffic across paths through the network so that the load on switches is balanced
- Two-level prefix lookup: the primary table contains first-level prefixes; a secondary table contains (second-level suffix, port) entries
- A first-level prefix in the primary table may contain a pointer to a secondary table
- A prefix is said to be terminating if no (suffix, port) entry is associated with it
- Inter-pod routing uses the default /0 prefix, with suffix routing based on the host ID
- Centralized configuration of routing table entries: appropriate for data centre scenarios
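The two-level lookup above can be illustrated with a toy table; a minimal sketch (my own construction, not the paper's implementation — the table contents are invented for illustration):

```python
# Toy two-level prefix/suffix lookup: the primary table maps first-level
# prefixes to either a port (terminating prefix) or the name of a secondary
# table, which matches on the host-ID suffix. The "" prefix models /0.

def lookup(ip: str, primary: list, secondaries: dict) -> int:
    last_octet = ip.split(".")[-1]
    for prefix, target in primary:     # longest-prefix entries listed first
        if ip.startswith(prefix):      # "" (the /0 default) matches everything
            if isinstance(target, int):
                return target          # terminating prefix
            for suffix, port in secondaries[target]:
                if last_octet == suffix:
                    return port        # suffix routing on the host ID
    raise KeyError(ip)

# Intra-pod traffic matches a terminating prefix; inter-pod traffic falls
# through to the /0 default and is spread over ports by host-ID suffix.
primary = [("10.2.0.", 0), ("10.2.1.", 1), ("", "t0")]
secondaries = {"t0": [("2", 2), ("3", 3)]}
print(lookup("10.2.0.5", primary, secondaries))  # 0
print(lookup("10.1.1.2", primary, secondaries))  # 2
print(lookup("10.1.1.3", primary, secondaries))  # 3
```

Spreading inter-pod flows by host-ID suffix is what balances load over the upward links.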
IP Routing Example
- Routing table at a switch: some IP addresses are forwarded to port 1, others to port 3 (figure)

IP Routing Table Generation
- Aggregation switch routing tables generated for load balancing (figure)
IP Routing Table Generation (cont'd)
- Core switch routing tables (figure)
- Load balancing in the initial part of the route, through the aggregation switches

Packing
- One drawback of the fat-tree topology is the number of required cables, due to the larger fan-out of switches
- Packing aims at minimizing the number of external cables, reducing the overall cable length, and allowing for incremental deployment
- Aggregation switches partitioned over pod racks
- Star layout per pod to reduce cable length: the pod rack is the hub and the racks are the leaves
- The only external cabling is to the core switches
Packing: Example (figure)
File System
- Needs to meet a set of design requirements for efficient processing of massive data sets
- Examples: Google File System (GFS), Cosmos (Microsoft)

Design Requirements
- Storing a modest number of large files: e.g. millions of files, each of size 100 MB or larger; multi-GB file sizes are common
- Large streaming reads: individual operations typically read multiple MBs; clients often read through a contiguous region of a file
- Many large sequential writes: mostly appends to a file; modifications are rare
- Semantics to support multiple clients concurrently appending to a file, with atomicity at minimal overhead
- High sustained bandwidth is more important than low latency
System Architecture
- A file is partitioned into chunks ("extents" in Cosmos); e.g. chunk size 64 MB, chosen for efficient reads
- Specialized master nodes: handle namespace management and locking, replica placement, creation, re-replication and rebalancing
- Chunkservers: maintain replicas
- Clients: issue read and write requests

GFS Architecture
- Single master node
- A centralized component simplifies the design: e.g. it is easier to implement chunk placement strategies using global knowledge
- Clients never read or write file data through the master; otherwise, the master may become a bottleneck
GFS Architecture (cont'd)
- The client sends a request with a chunk index
- The master replies with the corresponding chunk handle and the locations of the replicas
- The client reads data directly from a chunkserver

Consistency Model
- A file region is consistent if all clients always see the same data, regardless of which replicas they read from
- Uses a lease mechanism for a consistent mutation order across replicas
- Mutation = an operation that changes the contents or metadata of a chunk (e.g. a write or an append)
- The master grants a chunk lease to one of the replicas (called the primary); this delegation minimizes the management overhead of the master
- A lease is granted for a time period that can be repeatedly extended by the primary
- The primary determines a serial order for all mutations of the chunk, which is then followed by all replicas
The Lease Mechanism
1. The client asks the master for the chunkserver that holds the lease for the chunk; if none exists, the master grants one to a replica it chooses
2. The master communicates the primary and secondary replicas to the client
3. The client pushes the data to all the replicas
4. Once all replicas acknowledge having received the data, the client sends a write request to the primary
5. The primary forwards the write request to all secondary replicas
6. Each secondary acknowledges to the primary that it has applied the operation
7. The primary replies to the client

Chunk Replication
- Resilience to failures of machines and to network partitioning
- Chunks are replicated to multiple chunkservers on different racks
- K-copy replication
Data Flow
- Pipelined data transmission through a chain of chunkservers; the goal is to fully utilize each machine's network interface
- Each machine forwards the data to the closest machine in the network topology that has not yet received it; distances are estimated from IP addresses
- Data is pipelined over TCP connections
- In the absence of network congestion, the transfer time for R replicas = B/C + R*L, where B = number of bytes to transfer, C = network interface bandwidth, and L = latency to transfer bytes between two machines
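The idealized transfer-time formula above is easy to evaluate; a small sketch (my own, with illustrative numbers in the spirit of the GFS paper's 100 Mbps setting):

```python
# Idealized pipelined transfer time from the slide: with B bytes, interface
# bandwidth C and per-hop latency L, pushing data through a chain of R
# replicas takes about B/C + R*L, because each replica forwards bytes to the
# next one as soon as they arrive rather than after the whole chunk lands.

def pipelined_transfer_time(B: float, C: float, L: float, R: int) -> float:
    return B / C + R * L

# Illustrative values: 1 MB pushed at 12.5 MB/s (100 Mbps) through 3 replicas
# with 1 ms per-hop latency takes about 83 ms.
t = pipelined_transfer_time(B=1e6, C=12.5e6, L=1e-3, R=3)
print(round(t * 1000, 1))  # 83.0
```

Note the replication factor R adds only latency terms, not full serial retransmissions; that is the point of pipelining.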
Job Scheduling
- Jobs consist of multiple tasks; e.g. a task needs to process data on a machine
- Scheduling of tasks to machines with respect to the following criteria:
- Inter-job fairness: the allocation of resources is fair across jobs, with respect to an adopted notion of fairness
- Data locality: tasks are placed near their data
- Scheduling of tasks is performed by the execution control (runtime)
- Examples: Quincy (Microsoft), Delay Scheduling (Hadoop)

Principles of Job Scheduling in Distributed Cluster Systems
- Separation of the inter-job fairness objective and the data-locality objective
- Inter-job fairness: e.g. weighted round robin, or the special case of uniform round robin
- Data locality: different ways to accommodate this
Queue-based Scheduling
- A data structure to encode locality preference, with no inter-job fairness (figure)
- Legend: C_i = machine i, R_i = rack i, X = cluster, w_i^j = task i of job j

Simple Greedy Fairness
- M = number of machines, K = current number of jobs, N_j = number of unfinished tasks of job j
- Baseline allocation to job j: B_j = min(M/K, N_j)
- If B_j < M/K for some job j, the remaining slots are divided equally among the jobs that have additional tasks, to determine the allocation A_j; else A_j = B_j
- Greedy Fair Scheduler: block job j if its number of allocated machines is A_j or more
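The allocation rule above can be sketched with integer slots; a toy version (my own rendering of the slide's description — the slide's M/K may be fractional, here rounded down, with leftover slots handed out one at a time):

```python
# Sketch of the baseline greedy fair allocation: each of the K active jobs
# starts from an equal share of the M machines, and slots left unused by jobs
# with too few tasks are repeatedly redivided among jobs that still have
# unscheduled tasks.

def greedy_fair_allocation(M: int, tasks: list) -> list:
    alloc = [0] * len(tasks)
    active = [j for j, n in enumerate(tasks) if n > 0]
    free = M
    while free > 0 and active:
        share = free // len(active)
        if share == 0:                       # fewer free slots than jobs:
            for j in active[:free]:          # hand them out one each
                alloc[j] += 1
            break
        free -= share * len(active)          # rounding remainder carries over
        still_active = []
        for j in active:
            give = min(share, tasks[j] - alloc[j])
            alloc[j] += give
            free += share - give             # unused part of the share
            if alloc[j] < tasks[j]:
                still_active.append(j)
        active = still_active
    return alloc

# 10 machines, jobs with 2, 9 and 9 unfinished tasks: job 0 takes only 2 of
# its share, and the leftover slots go to the two larger jobs.
print(greedy_fair_allocation(10, [2, 9, 9]))  # [2, 4, 4]
```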
Simple Greedy Fairness (cont'd)
- Suffers from a sticky-slot problem: when a task finishes, fairness requires serving the same job again
- Solution: a hysteresis approach; a job is unblocked if its number of running tasks falls below A_j - α, for some α > 0
- Essentially uniform round robin: uniform allocation across jobs, the same as with the Hadoop Fair Scheduler

Combining Inter-Job Fairness and Locality Preference
- Each job is given an allocation according to a fairness criterion
- The allocation is derived by solving a min-cost flow problem
- The costs encode locality preference, e.g. machine and rack preference
Delay Scheduling
- Inter-job fairness criterion: essentially the same as with Quincy
- Basic idea: scheduling a job that should be served according to the inter-job fairness criterion is postponed for a limited number of scheduling slots if the head-of-line task of this job cannot be assigned locally

Delay Scheduling (cont'd)
- D = input parameter determining the maximum number of skips per job (pseudocode in figure)
Configuring the Number of Skips
- M = number of machines, L = slots per machine
- P_j = set of machines on which job j has data to process (preferred machines for job j); p_j = |P_j|/M
- T = task processing time, R = number of replicas per data chunk
- For a job j that is farthest below its fair share, the probability of launching a non-local task = (1 - p_j)^D, exponentially decreasing with D
- Choose D such that the average portion of locally assigned tasks for a job with N tasks is at least 1 - ε, for given ε > 0

Assumption 1
- Ass. 1: all N tasks require data from the same machine
- Sufficient: D ≥ log(1/ε) / log(1/(1 - R/M)) = (M/R) log(1/ε) + o(M/R)
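The sufficiency condition under Assumption 1 can be checked numerically; a small sketch (my own code, with illustrative parameter values):

```python
import math

# Under Assumption 1 each scheduling opportunity is local with probability
# R/M, so after D skips the probability of a non-local launch is
# (1 - R/M)^D, and D of about (M/R) * ln(1/eps) skips suffice to push the
# non-locality probability below eps.

def non_local_probability(M: int, R: int, D: int) -> float:
    return (1 - R / M) ** D

def skips_needed(M: int, R: int, eps: float) -> int:
    return math.ceil(M / R * math.log(1 / eps))

M, R, eps = 1000, 3, 0.05
D = skips_needed(M, R, eps)
print(D, non_local_probability(M, R, D) <= eps)  # 999 True
```

Note D grows only logarithmically in 1/ε, which is why a modest skip budget achieves high locality.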
Proof Sketch
- Probability that a task is assigned to a preferred machine: 1 - (1 - R/M)^D ≈ 1 - e^(-RD/M)
- Therefore, it suffices to choose D such that 1 - e^(-RD/M) ≥ 1 - ε, which yields the result

Assumption 2
- Ass. 2: each task prefers a machine selected uniformly at random from the set of M machines
- Suppose NR = o(√M); then for every ε > 1/N it suffices that D ≥ (M/R) log(εN/(εN - 1)) = (M/R) (1/(Nε) + o(1/(Nε)))
Proof Sketch
- First note that the machines preferred by the tasks are all distinct with high probability, e.g. with probability going to 1 if NR = o(√M):
  (1 - 1/M)(1 - 2/M) ... (1 - (NR-1)/M) ≥ (1 - NR/M)^NR ≈ e^(-(NR)^2/M) → 1
- Given that there are K unfinished tasks for job j, the probability of a local task assignment is approximately 1 - (1 - KR/M)^D ≈ 1 - e^(-RDK/M)
- The average fraction of local assignments per job is at least
  (1/N) Σ_{K=1..N} (1 - e^(-RDK/M)) ≥ 1 - (1/N) Σ_{K=0..∞} e^(-RDK/M) = 1 - (1/N) · 1/(1 - e^(-RD/M))
- The result follows by requiring that the right-hand side is at least 1 - ε
MapReduce
- An abstraction of group-by and aggregation
- Applications typically use several rounds of Map and Reduce phases

Example of Map and Reduce (figure)
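The group-by-and-aggregate abstraction can be shown in miniature; a toy in-process driver (my own sketch, not Google's implementation), applied to word counting:

```python
from collections import defaultdict

# Toy MapReduce: map emits (word, 1) pairs, the framework groups values by
# key (the shuffle phase), and reduce sums the counts per word.

def map_fn(line):
    for word in line.split():
        yield word, 1

def reduce_fn(word, counts):
    yield word, sum(counts)

def map_reduce(inputs, map_fn, reduce_fn):
    groups = defaultdict(list)              # shuffle / group-by phase
    for record in inputs:
        for key, value in map_fn(record):
            groups[key].append(value)
    out = {}
    for key, values in sorted(groups.items()):
        for k, v in reduce_fn(key, values):
            out[k] = v
    return out

print(map_reduce(["the quick fox", "the lazy dog"], map_fn, reduce_fn))
# {'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```

A real system runs many map and reduce workers in parallel and moves the grouped data over the network; the logical contract is exactly the one above.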
System Components (figure)

Dryad
- A general-purpose distributed execution engine for coarse-grained data-parallel computations
- Based on specifying a dataflow graph: vertices contain code; directed edges (channels) describe data flows
- DAG = directed acyclic graph
System Architecture (figure)
- Legend: NS = name server; JM = job manager, which determines the assignment of vertices to machines and orchestrates the overall execution; D = machines; V = vertices

Vertices and Channels
- A vertex has input channels and output channels; a channel connects a producer vertex to a consumer vertex
- Vertex: denotes computation code; typically sequential, but also supports event-based programming, e.g. using a shared thread pool
- Channel types:
- File (default): preserved after vertex execution until the job completes
- TCP: requires no disk accesses, but both end-point vertices must be scheduled to run at the same time
- Shared-memory FIFO: low communication cost, but the end-point vertices must run within the same process
Data Flow Graph: Construction Operators
- Clone
- Point-wise composition
- Complete bipartite composition
- Merge

Construction Operators (cont'd) (figure)
Example: Histogram Computation
- Compute a histogram of record frequencies
- Map phase: P = read a part of the file to extract records; D = distribute the input using hash partitioning; S = perform an in-memory sort; C = compute the total count per record

Example: Histogram Computation
- Reduce phase: MS = sort based on the record hash; C = compute the total count per record
Example: Histogram Computation, Optimized Version
- Wasteful to execute the Q vertices for every input partition: input partition sizes are small (much smaller than RAM size), and it is inefficient to read from many input partitions

SCOPE
- SCOPE = Structured Computations Optimized for Parallel Execution
- Data is modelled as a set of rows comprised of typed columns
- Declarative language: a program tells what to do, not how to do it; resembles SQL with C# expressions
- A script is a sequence of commands, typically data transformation operators that take one or more rowsets as input, perform some operation on the data, and output a rowset
- The compiler and optimizer are responsible for generating an efficient execution plan, and the runtime for executing the plan with minimal overhead
SCOPE Software Stack
- SCOPE script → SCOPE compiler / SCOPE optimizer → SCOPE runtime → Cosmos execution environment → Cosmos file system → Cosmos files

Example: Histogram
- Find the most popular queries that were requested more than 1000 times (SCOPE script and step-by-step equivalent in figure)
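The SCOPE script itself is not reproduced in this transcription; below is a plain-Python rendering (my own) of what the declarative query computes: group query-log rows by query string, keep groups with more than 1000 occurrences, and sort by count. The names (`rows`, `threshold`) are mine.

```python
from collections import Counter

# What the declarative histogram query computes, written imperatively:
# GROUP BY query, HAVING count > threshold, ORDER BY count DESC.

def popular_queries(rows, threshold=1000):
    counts = Counter(rows)
    return sorted(((q, c) for q, c in counts.items() if c > threshold),
                  key=lambda qc: qc[1], reverse=True)

rows = ["a"] * 1500 + ["b"] * 900 + ["c"] * 2000
print(popular_queries(rows))  # [('c', 2000), ('a', 1500)]
```

In SCOPE the same intent is written declaratively and the optimizer, not the programmer, decides how to partition and aggregate.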
Example: Histogram Execution Plan
- Extractors read extents in parallel
- Partial aggregation at the rack level (exploits knowledge about the network topology)
- Distribute (partition) on the grouping column
- Final aggregation
- Take only rows with count larger than 1000
- Sort by count
- Merge sorted results

DryadLINQ
- Similar purpose as SCOPE, but uses LINQ
- LINQ = Language INtegrated Query: a set of .NET constructs for programming with datasets
- Objects can be of any .NET type; easy to compute with vectors and matrices
DryadLINQ Software Stack
- DryadLINQ: high-level language API
- Dryad: distributed execution, fault tolerance, scheduling
- Cluster services: remote process execution, naming, storage
- Windows Server machines

Example: Histogram (figure)
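The DryadLINQ histogram is written in C# as a chain of language-integrated operators; a rough Python analogue of that chained style (my own sketch, with `itertools.groupby` standing in for LINQ's `GroupBy`):

```python
from itertools import groupby

# Histogram as a chain of dataset operators: SelectMany (split into words),
# GroupBy, Select (count each group), OrderByDescending, Take.

def histogram(lines, k):
    words = (w for line in lines for w in line.split())
    groups = groupby(sorted(words))                   # GroupBy
    counts = ((w, len(list(g))) for w, g in groups)   # Select
    return sorted(counts, key=lambda wc: -wc[1])[:k]  # OrderByDescending, Take

print(histogram(["a b a", "c a b"], 2))  # [('a', 3), ('b', 2)]
```

DryadLINQ compiles such a chain into a Dryad dataflow graph, so the same program runs unchanged over a single process or a cluster.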
Design Principles
- Provide the client with a structured data model that supports control over layout and format
- Distributed storage system for structured data
- Efficient reads/writes
- Consistency
- High availability
BigTable
- Data model: a multidimensional sorted map; (row: string, column: string, time: int64) -> string
- Column families: groups of column keys, the basic unit of access control; a small number of column families, each possibly consisting of many columns
- Atomic reads and writes
- Uses horizontal partitioning: row ranges are distributed across machines; efficient reads of short ranges, as they typically require access to only a small number of machines
- Consistency: uses Chubby, a highly available and persistent distributed lock service

An Example Table
- Rows correspond to reversed URLs
- The contents column holds the web page content
- The anchor column family consists of the anchor text that referenced the web page
- Timestamps t_i indicate various snapshots
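The (row, column, timestamp) → string model can be sketched as a toy in-memory map (my own rendering, not Google's API): reads return the most recent version at or before a given timestamp.

```python
# Toy rendering of the Bigtable data model: a map from (row, column,
# timestamp) to a string value, with timestamp-aware reads.

class TinyTable:
    def __init__(self):
        self.cells = {}  # (row, column) -> {timestamp: value}

    def put(self, row, column, ts, value):
        self.cells.setdefault((row, column), {})[ts] = value

    def get(self, row, column, ts=None):
        """Latest version at or before ts; latest overall if ts is None."""
        versions = self.cells.get((row, column), {})
        usable = [t for t in versions if ts is None or t <= ts]
        return versions[max(usable)] if usable else None

t = TinyTable()
t.put("com.cnn.www", "contents:", 1, "<html>v1</html>")
t.put("com.cnn.www", "contents:", 2, "<html>v2</html>")
print(t.get("com.cnn.www", "contents:"))     # <html>v2</html>
print(t.get("com.cnn.www", "contents:", 1))  # <html>v1</html>
```

The reversed-URL row key ("com.cnn.www") matches the slide's example and keeps pages from the same domain adjacent in the sorted row space.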
Interface to a Table
- Writing to a table and reading from a table (code examples in figure)

System Architecture
- Tablet = a contiguous region of the key space
- Master: assigns tablets to tablet servers; detects the addition and expiration of tablet servers; balances tablet server load; garbage collection
- Tablet servers: each stores a collection of tablets
Indexing of Tablet Locations
- Three-level hierarchy, similar to a B+ tree
- A file in Chubby stores the location of the root tablet
- The root tablet stores the locations of all METADATA tablets
- Each METADATA tablet points to the locations of a set of user tablets

Amazon's Dynamo
- A dictionary: key -> value
- Many services require only storing and retrieving values by a primary key; no need for complex relational database queries
- Key requirement: high availability; always writable, relaxing consistency guarantees
- Other requirements: incremental scalability; symmetry (no special roles taken by some components); decentralization (no centralized components); leverage system heterogeneity
Key System Design Choices
- Partitioning by consistent hashing: allows for incremental scalability
- High availability for writes: using vector clocks, with reconciliation during reads
- Handling temporary failures: using a quorum
- Recovering from permanent failures: using Merkle trees
- Membership and failure detection: using a gossip-based membership protocol

Consistent Hashing (for resilience to failures)
- B is the coordinator for key K; the preference list for key K contains B, C and D (figure)
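The figure's coordinator-plus-successors layout can be sketched with a minimal hash ring (my own construction following the slide; the hash function and class names are mine):

```python
import bisect
import hashlib

# Minimal consistent-hashing ring: nodes and keys hash onto a circular
# space; each key is stored on the first node clockwise from its position
# (the coordinator), and that node's successors form the preference list.

def h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        self.ring = sorted((h(n), n) for n in nodes)

    def preference_list(self, key, n=3):
        i = bisect.bisect(self.ring, (h(key),))  # first node clockwise
        return [self.ring[(i + j) % len(self.ring)][1] for j in range(n)]

ring = Ring(["A", "B", "C", "D", "E"])
print(ring.preference_list("my-key"))  # coordinator plus two successors
```

Adding or removing a node moves only the keys between it and its predecessor, which is what makes the scheme incrementally scalable.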
Data Versioning using Vector Clocks
- A vector clock = a list of (node, counter) pairs
- When two versions are causally concurrent, both versions must be kept (figure)

Percolator
- Incremental processing using distributed transactions and notifications
- Two main abstractions: ACID transactions over a random-access repository, and observers, a way to organize an incremental computation
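The "both versions must be kept" rule follows from comparing vector clocks component-wise; a small sketch (my own helper names):

```python
# Vector clock comparison: a clock is a dict node -> counter. One version
# descends from another iff it is at least as large in every component;
# otherwise the versions are concurrent (a conflict) and both must be kept
# for read-time reconciliation.

def descends(a: dict, b: dict) -> bool:
    """True iff version a is a causal descendant of (or equal to) version b."""
    return all(a.get(n, 0) >= c for n, c in b.items())

def conflict(a: dict, b: dict) -> bool:
    return not descends(a, b) and not descends(b, a)

v1 = {"Sx": 2}            # written twice at node Sx
v2 = {"Sx": 2, "Sy": 1}   # later update handled by node Sy
v3 = {"Sx": 2, "Sz": 1}   # concurrent update handled by node Sz
print(descends(v2, v1))   # True: v2 supersedes v1
print(conflict(v2, v3))   # True: both versions must be kept
```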
Notifications
- Observers: user-written code that is triggered by changes to the table; similar to database triggers or events in active databases
- A Percolator application is a series of observers
- Notifications are designed to help structure an incremental computation

References
- Network Architecture: A Scalable, Commodity Data Center Network Architecture, M. Al-Fares, A. Loukissas and A. Vahdat, SIGCOMM 2008
- File System: The Google File System, S. Ghemawat, H. Gobioff and S.-T. Leung, SOSP 2003
- Job Scheduling: Quincy: Fair Scheduling for Distributed Computing Clusters, M. Isard et al., SOSP 2009
- Job Scheduling: Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling, M. Zaharia et al., EuroSys 2010
References (cont'd)
- Parallel Computing: MapReduce: Simplified Data Processing on Large Clusters, J. Dean and S. Ghemawat, OSDI 2004
- Parallel Computing: Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks, M. Isard et al., EuroSys 2007
- Parallel Computing: SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets, R. Chaiken et al., VLDB 2008
- Parallel Computing: DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language, Y. Yu et al., OSDI 2008
- Structured Data: Bigtable: A Distributed Storage System for Structured Data, F. Chang et al., OSDI 2006
- Structured Data: Dynamo: Amazon's Highly Available Key-value Store, G. DeCandia et al., SOSP 2007
- Structured Data: Large-scale Incremental Processing Using Distributed Transactions and Notifications, D. Peng and F. Dabek, OSDI 2010
W13.A.0.0 CS435 Introduction to Big Data W13.A.1 FAQs Programming Assignment 3 has been posted PART 2. LARGE SCALE DATA STORAGE SYSTEMS DISTRIBUTED FILE SYSTEMS Recitations Apache Spark tutorial 1 and
More informationYuval Carmel Tel-Aviv University "Advanced Topics in Storage Systems" - Spring 2013
Yuval Carmel Tel-Aviv University "Advanced Topics in About & Keywords Motivation & Purpose Assumptions Architecture overview & Comparison Measurements How does it fit in? The Future 2 About & Keywords
More informationDistributed System. Gang Wu. Spring,2018
Distributed System Gang Wu Spring,2018 Lecture7:DFS What is DFS? A method of storing and accessing files base in a client/server architecture. A distributed file system is a client/server-based application
More informationCS 138: Google. CS 138 XVII 1 Copyright 2016 Thomas W. Doeppner. All rights reserved.
CS 138: Google CS 138 XVII 1 Copyright 2016 Thomas W. Doeppner. All rights reserved. Google Environment Lots (tens of thousands) of computers all more-or-less equal - processor, disk, memory, network interface
More informationCS 138: Google. CS 138 XVI 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.
CS 138: Google CS 138 XVI 1 Copyright 2017 Thomas W. Doeppner. All rights reserved. Google Environment Lots (tens of thousands) of computers all more-or-less equal - processor, disk, memory, network interface
More informationPercolator. Large-Scale Incremental Processing using Distributed Transactions and Notifications. D. Peng & F. Dabek
Percolator Large-Scale Incremental Processing using Distributed Transactions and Notifications D. Peng & F. Dabek Motivation Built to maintain the Google web search index Need to maintain a large repository,
More informationMap Reduce. Yerevan.
Map Reduce Erasmus+ @ Yerevan dacosta@irit.fr Divide and conquer at PaaS 100 % // Typical problem Iterate over a large number of records Extract something of interest from each Shuffle and sort intermediate
More informationThe Google File System
The Google File System Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung ACM SIGOPS 2003 {Google Research} Vaibhav Bajpai NDS Seminar 2011 Looking Back time Classics Sun NFS (1985) CMU Andrew FS (1988) Fault
More informationOutline. INF3190:Distributed Systems - Examples. Last week: Definitions Transparencies Challenges&pitfalls Architecturalstyles
INF3190:Distributed Systems - Examples Thomas Plagemann & Roman Vitenberg Outline Last week: Definitions Transparencies Challenges&pitfalls Architecturalstyles Today: Examples Googel File System (Thomas)
More informationCloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018
Cloud Computing and Hadoop Distributed File System UCSB CS70, Spring 08 Cluster Computing Motivations Large-scale data processing on clusters Scan 000 TB on node @ 00 MB/s = days Scan on 000-node cluster
More informationGoogle File System 2
Google File System 2 goals monitoring, fault tolerance, auto-recovery (thousands of low-cost machines) focus on multi-gb files handle appends efficiently (no random writes & sequential reads) co-design
More informationPerformance Gain with Variable Chunk Size in GFS-like File Systems
Journal of Computational Information Systems4:3(2008) 1077-1084 Available at http://www.jofci.org Performance Gain with Variable Chunk Size in GFS-like File Systems Zhifeng YANG, Qichen TU, Kai FAN, Lei
More informationCPSC 426/526. Cloud Computing. Ennan Zhai. Computer Science Department Yale University
CPSC 426/526 Cloud Computing Ennan Zhai Computer Science Department Yale University Recall: Lec-7 In the lec-7, I talked about: - P2P vs Enterprise control - Firewall - NATs - Software defined network
More informationBigtable. A Distributed Storage System for Structured Data. Presenter: Yunming Zhang Conglong Li. Saturday, September 21, 13
Bigtable A Distributed Storage System for Structured Data Presenter: Yunming Zhang Conglong Li References SOCC 2010 Key Note Slides Jeff Dean Google Introduction to Distributed Computing, Winter 2008 University
More informationOutline. Spanner Mo/va/on. Tom Anderson
Spanner Mo/va/on Tom Anderson Outline Last week: Chubby: coordina/on service BigTable: scalable storage of structured data GFS: large- scale storage for bulk data Today/Friday: Lessons from GFS/BigTable
More informationAbstract. 1. Introduction. 2. Design and Implementation Master Chunkserver
Abstract GFS from Scratch Ge Bian, Niket Agarwal, Wenli Looi https://github.com/looi/cs244b Dec 2017 GFS from Scratch is our partial re-implementation of GFS, the Google File System. Like GFS, our system
More informationIntroduction to MapReduce
Basics of Cloud Computing Lecture 4 Introduction to MapReduce Satish Srirama Some material adapted from slides by Jimmy Lin, Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed
More informationBigtable: A Distributed Storage System for Structured Data. Andrew Hon, Phyllis Lau, Justin Ng
Bigtable: A Distributed Storage System for Structured Data Andrew Hon, Phyllis Lau, Justin Ng What is Bigtable? - A storage system for managing structured data - Used in 60+ Google services - Motivation:
More informationWhat Is Datacenter (Warehouse) Computing. Distributed and Parallel Technology. Datacenter Computing Architecture
What Is Datacenter (Warehouse) Computing Distributed and Parallel Technology Datacenter, Warehouse and Cloud Computing Hans-Wolfgang Loidl School of Mathematical and Computer Sciences Heriot-Watt University,
More informationBigTable: A Distributed Storage System for Structured Data (2006) Slides adapted by Tyler Davis
BigTable: A Distributed Storage System for Structured Data (2006) Slides adapted by Tyler Davis Motivation Lots of (semi-)structured data at Google URLs: Contents, crawl metadata, links, anchors, pagerank,
More informationDistributed Systems. GFS / HDFS / Spanner
15-440 Distributed Systems GFS / HDFS / Spanner Agenda Google File System (GFS) Hadoop Distributed File System (HDFS) Distributed File Systems Replication Spanner Distributed Database System Paxos Replication
More informationEngineering Goals. Scalability Availability. Transactional behavior Security EAI... CS530 S05
Engineering Goals Scalability Availability Transactional behavior Security EAI... Scalability How much performance can you get by adding hardware ($)? Performance perfect acceptable unacceptable Processors
More informationDistributed computing: index building and use
Distributed computing: index building and use Distributed computing Goals Distributing computation across several machines to Do one computation faster - latency Do more computations in given time - throughput
More informationMap-Reduce. Marco Mura 2010 March, 31th
Map-Reduce Marco Mura (mura@di.unipi.it) 2010 March, 31th This paper is a note from the 2009-2010 course Strumenti di programmazione per sistemi paralleli e distribuiti and it s based by the lessons of
More informationGoogle File System and BigTable. and tiny bits of HDFS (Hadoop File System) and Chubby. Not in textbook; additional information
Subject 10 Fall 2015 Google File System and BigTable and tiny bits of HDFS (Hadoop File System) and Chubby Not in textbook; additional information Disclaimer: These abbreviated notes DO NOT substitute
More informationNPTEL Course Jan K. Gopinath Indian Institute of Science
Storage Systems NPTEL Course Jan 2012 (Lecture 40) K. Gopinath Indian Institute of Science Google File System Non-Posix scalable distr file system for large distr dataintensive applications performance,
More informationHadoop Distributed File System(HDFS)
Hadoop Distributed File System(HDFS) Bu eğitim sunumları İstanbul Kalkınma Ajansı nın 2016 yılı Yenilikçi ve Yaratıcı İstanbul Mali Destek Programı kapsamında yürütülmekte olan TR10/16/YNY/0036 no lu İstanbul
More informationHDFS: Hadoop Distributed File System. Sector: Distributed Storage System
GFS: Google File System Google C/C++ HDFS: Hadoop Distributed File System Yahoo Java, Open Source Sector: Distributed Storage System University of Illinois at Chicago C++, Open Source 2 System that permanently
More informationIntroduction to Distributed Data Systems
Introduction to Distributed Data Systems Serge Abiteboul Ioana Manolescu Philippe Rigaux Marie-Christine Rousset Pierre Senellart Web Data Management and Distribution http://webdam.inria.fr/textbook January
More informationDistributed File Systems (Chapter 14, M. Satyanarayanan) CS 249 Kamal Singh
Distributed File Systems (Chapter 14, M. Satyanarayanan) CS 249 Kamal Singh Topics Introduction to Distributed File Systems Coda File System overview Communication, Processes, Naming, Synchronization,
More informationFLAT DATACENTER STORAGE. Paper-3 Presenter-Pratik Bhatt fx6568
FLAT DATACENTER STORAGE Paper-3 Presenter-Pratik Bhatt fx6568 FDS Main discussion points A cluster storage system Stores giant "blobs" - 128-bit ID, multi-megabyte content Clients and servers connected
More informationBigTable. CSE-291 (Cloud Computing) Fall 2016
BigTable CSE-291 (Cloud Computing) Fall 2016 Data Model Sparse, distributed persistent, multi-dimensional sorted map Indexed by a row key, column key, and timestamp Values are uninterpreted arrays of bytes
More informationA BigData Tour HDFS, Ceph and MapReduce
A BigData Tour HDFS, Ceph and MapReduce These slides are possible thanks to these sources Jonathan Drusi - SCInet Toronto Hadoop Tutorial, Amir Payberah - Course in Data Intensive Computing SICS; Yahoo!
More informationKonstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Yahoo! Sunnyvale, California USA {Shv, Hairong, SRadia,
Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Yahoo! Sunnyvale, California USA {Shv, Hairong, SRadia, Chansler}@Yahoo-Inc.com Presenter: Alex Hu } Introduction } Architecture } File
More informationSeminar Report On. Google File System. Submitted by SARITHA.S
Seminar Report On Submitted by SARITHA.S In partial fulfillment of requirements in Degree of Master of Technology (MTech) In Computer & Information Systems DEPARTMENT OF COMPUTER SCIENCE COCHIN UNIVERSITY
More informationDISTRIBUTED FILE SYSTEMS CARSTEN WEINHOLD
Department of Computer Science Institute of System Architecture, Operating Systems Group DISTRIBUTED FILE SYSTEMS CARSTEN WEINHOLD OUTLINE Classical distributed file systems NFS: Sun Network File System
More informationBig Table. Google s Storage Choice for Structured Data. Presented by Group E - Dawei Yang - Grace Ramamoorthy - Patrick O Sullivan - Rohan Singla
Big Table Google s Storage Choice for Structured Data Presented by Group E - Dawei Yang - Grace Ramamoorthy - Patrick O Sullivan - Rohan Singla Bigtable: Introduction Resembles a database. Does not support
More informationMapReduce & BigTable
CPSC 426/526 MapReduce & BigTable Ennan Zhai Computer Science Department Yale University Lecture Roadmap Cloud Computing Overview Challenges in the Clouds Distributed File Systems: GFS Data Process & Analysis:
More informationThis material is covered in the textbook in Chapter 21.
This material is covered in the textbook in Chapter 21. The Google File System paper, by S Ghemawat, H Gobioff, and S-T Leung, was published in the proceedings of the ACM Symposium on Operating Systems
More informationDistributed computing: index building and use
Distributed computing: index building and use Distributed computing Goals Distributing computation across several machines to Do one computation faster - latency Do more computations in given time - throughput
More informationCS 345A Data Mining. MapReduce
CS 345A Data Mining MapReduce Single-node architecture CPU Machine Learning, Statistics Memory Classical Data Mining Disk Commodity Clusters Web data sets can be very large Tens to hundreds of terabytes
More informationGFS. CS6450: Distributed Systems Lecture 5. Ryan Stutsman
GFS CS6450: Distributed Systems Lecture 5 Ryan Stutsman Some material taken/derived from Princeton COS-418 materials created by Michael Freedman and Kyle Jamieson at Princeton University. Licensed for
More informationDatabase Architectures
Database Architectures CPS352: Database Systems Simon Miner Gordon College Last Revised: 4/15/15 Agenda Check-in Parallelism and Distributed Databases Technology Research Project Introduction to NoSQL
More informationDistributed Systems. 15. Distributed File Systems. Paul Krzyzanowski. Rutgers University. Fall 2017
Distributed Systems 15. Distributed File Systems Paul Krzyzanowski Rutgers University Fall 2017 1 Google Chubby ( Apache Zookeeper) 2 Chubby Distributed lock service + simple fault-tolerant file system
More informationThe Google File System GFS
The Google File System GFS Common Goals of GFS and most Distributed File Systems Performance Reliability Scalability Availability Other GFS Concepts Component failures are the norm rather than the exception.
More informationbig picture parallel db (one data center) mix of OLTP and batch analysis lots of data, high r/w rates, 1000s of cheap boxes thus many failures
Lecture 20 -- 11/20/2017 BigTable big picture parallel db (one data center) mix of OLTP and batch analysis lots of data, high r/w rates, 1000s of cheap boxes thus many failures what does paper say Google
More informationBigTable A System for Distributed Structured Storage
BigTable A System for Distributed Structured Storage Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber Adapted
More informationStaggeringly Large Filesystems
Staggeringly Large Filesystems Evan Danaher CS 6410 - October 27, 2009 Outline 1 Large Filesystems 2 GFS 3 Pond Outline 1 Large Filesystems 2 GFS 3 Pond Internet Scale Web 2.0 GFS Thousands of machines
More informationDynamo. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Motivation System Architecture Evaluation
Dynamo Smruti R. Sarangi Department of Computer Science Indian Institute of Technology New Delhi, India Smruti R. Sarangi Leader Election 1/20 Outline Motivation 1 Motivation 2 3 Smruti R. Sarangi Leader
More informationCamdoop Exploiting In-network Aggregation for Big Data Applications Paolo Costa
Camdoop Exploiting In-network Aggregation for Big Data Applications costa@imperial.ac.uk joint work with Austin Donnelly, Antony Rowstron, and Greg O Shea (MSR Cambridge) MapReduce Overview Input file
More information