ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective
- Kory Jenkins
1 ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Data Center Software Architecture: Topic 3: Programming Models CIEL: A Universal Execution Engine for Distributed Dataflow Computing Presented by Saeed Shokri
2 Outline 1. Overview 2. Why CIEL 3. Goals 4. Design 5. Fault Tolerance 6. Performance 7. Conclusion. "Ciel" means "sky" in French and is pronounced "see-elle".
3 Overview Some existing distributed execution engines for cluster hardware: MapReduce, Pregel, Dryad, Piccolo, CIEL.
4 Overview A distributed execution engine is a software system that automatically executes a program in parallel on a cluster of networked computers, which together provide a large aggregate amount of computational and I/O performance. Distributed execution engines are attractive because they shield developers from the challenging aspects of distributed and parallel computing, such as synchronization, scheduling, data transfer, and dealing with failures. Data-dependent control flow is the fundamental concept that enables a machine to change its behavior on the basis of intermediate results. This ability increases the computational power of a machine, because it enables the machine to execute iterative algorithms.
5 Overview Google's MapReduce: runs programs defined by two functions: map(), which operates on the input records to produce intermediate data, and reduce(), which operates on the intermediate data to produce a final result. Apache Hadoop MapReduce: the simplicity of MapReduce led to several clones, including a popular open-source version called Hadoop. Dryad: Microsoft developed a more general execution engine, called Dryad, which operates on programs that are written as data-flow graphs.
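As a rough illustration of the map()/reduce() model on this slide, the following single-process Python sketch counts words. The names map_fn, reduce_fn, and run_mapreduce are hypothetical and are not part of the Google MapReduce or Hadoop APIs; a real engine would run the map and reduce calls in parallel across many machines.

```python
from collections import defaultdict


def map_fn(record):
    """Emit an intermediate (word, 1) pair for every word in one input record."""
    for word in record.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    """Combine all intermediate values that share one key."""
    return key, sum(values)


def run_mapreduce(records):
    """Sequential stand-in for the engine: map, group by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in groups.items())


counts = run_mapreduce(["the sky", "The blue sky"])
```

The grouping step in the middle is the part the engine performs on the user's behalf; the user supplies only the two functions.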
6 Comparison of distributed execution engines CIEL provides: distributed data-flow computing, task dependencies, dynamic coordination, and transparency (fault tolerance, scaling, locality).
7 Why CIEL Why do we need another distributed execution engine, called CIEL? MapReduce and Dryad have two disadvantages: 1. They are designed to maximize throughput, not to minimize latency. 2. They perform scheduling before running the algorithm, so the resulting schedule is static. These properties make MapReduce and Dryad unsuitable for iterative algorithms. Many algorithms contain data-dependent control flow and cannot be expressed using previous execution engines.
8 Sample iterative algorithm The result of the do_lots_of_work() function is used to decide whether or not the while loop should terminate. The amount of work depends on the input data and can only be determined by actually running the algorithm. MapReduce and Dryad require a complete list of tasks to be provided when a job is submitted, so they cannot natively handle this type of algorithm. In MapReduce and Dryad, the user must write a separate driver program, which submits multiple jobs, fetches their results, and decides when to terminate the computation. Because the driver program runs outside the cluster, it does not enjoy the benefits of running on an execution engine, in particular transparent fault tolerance. If the driver program crashes, or loses network connectivity to the cluster, the entire computation is lost.
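The driver-program pattern described on this slide might look roughly like the Python sketch below. Only the name do_lots_of_work() comes from the slide; its body here (one Newton step towards the square root of 2) is an assumption chosen so that the example runs, and the driver function and its parameters are hypothetical.

```python
def do_lots_of_work(x):
    """Hypothetical stand-in for one cluster job: a Newton step towards sqrt(2)."""
    return 0.5 * (x + 2.0 / x)


def driver(x, tolerance=1e-9, max_jobs=100):
    """Submit one job per iteration; the job's result decides whether to continue."""
    jobs = 0
    while jobs < max_jobs:
        # With MapReduce or Dryad this line would submit a job, wait for it,
        # and fetch its result from the cluster.
        new_x = do_lots_of_work(x)
        jobs += 1
        converged = abs(new_x - x) < tolerance
        x = new_x
        if converged:  # data-dependent termination test
            break
    return x, jobs


result, jobs_submitted = driver(1.0)
```

The number of jobs is not known until the loop has run, which is why a static task list submitted up front cannot express this computation.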
9 Goals Design a distributed execution framework that can: 1. efficiently run iterative algorithms; 2. provide a simple interface; 3. offer transparent fault tolerance.
10 CIEL's Model CIEL is an execution model for distributed execution engines that supports data-dependent control flow. The model is based on dynamic task graphs, in which each vertex is a sequential computation that may decide, on the basis of its input, to spawn additional computation and hence rewrite the graph. Data-dependent control flow can be supported in a distributed execution engine by adding the facility for a task to spawn further tasks. A dynamic task graph is like a Dryad data-flow graph, but it also allows tasks to rewrite the graph by spawning new tasks and delegating their outputs.
11 Primitives of the model: dynamic task graphs Objects: The goal of a CIEL job is to produce one or more output objects. An object is an unstructured, finite-length sequence of bytes. Every object has a unique name. To simplify consistency and replication, an object is immutable once it has been written, but it is sometimes possible to append to an object. References: A reference describes an object without possessing its full contents. A reference comprises a name and a set of locations (e.g. hostname-port pairs) where the object with that name is stored. The set of locations may be empty: in that case, the reference is a future reference to an object that has not yet been produced. Otherwise, it is a concrete reference, which may be consumed.
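The distinction between future and concrete references can be sketched as a small immutable Python type. The class name, field names, and the with_location helper are assumptions made for illustration; they are not CIEL's actual data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """A name plus the set of (hostname, port) pairs that store the object."""
    name: str
    locations: frozenset = frozenset()

    @property
    def is_future(self):
        # No known location: the object has not been produced yet.
        return len(self.locations) == 0

    @property
    def is_concrete(self):
        return not self.is_future

    def with_location(self, host, port):
        """Return a new reference that also records one more location."""
        return Reference(self.name, self.locations | {(host, port)})


future = Reference("job1:output0")
concrete = future.with_location("worker-3", 9000)
```

Both values carry the same name; only the location set differs, so a future reference can later be replaced by a concrete one without renaming anything.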
12 Dynamic task graphs Tasks: A CIEL job makes progress by executing tasks. A task is a non-blocking atomic computation that executes completely on a single machine. A task has one or more dependencies, which are represented by references. The task becomes runnable when all of its dependencies become concrete. The dependencies include a special object that specifies the behavior of the task (such as an executable binary or a Java class). A task also has one or more expected outputs, which are the names of objects that the task will either create or delegate another task to create. (Figure: objects, references, and tasks.)
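The runnable condition above (every dependency is concrete) can be written as a one-line check. The Task fields and the object names in this Python sketch are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    dependencies: list       # names of objects the task reads, including its code object
    expected_outputs: list   # names of objects the task must create or delegate


def is_runnable(task, concrete_names):
    """A task is runnable once every one of its dependencies is concrete."""
    return all(dep in concrete_names for dep in task.dependencies)


concrete = {"code:grep", "input:part0"}
t1 = Task("t1", ["code:grep", "input:part0"], ["out:part0"])
t2 = Task("t2", ["code:merge", "out:part0"], ["out:final"])

runnable_before = [t.task_id for t in (t1, t2) if is_runnable(t, concrete)]
concrete |= {"code:merge", "out:part0"}   # t1 finished and the merge code arrived
runnable_after = [t.task_id for t in (t1, t2) if is_runnable(t, concrete)]
```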
13 Data-dependent control flow For each of its expected outputs, a task must either publish a concrete reference, or spawn a child task with that name as an expected output. The task can publish objects for its expected outputs, which may cause other tasks to become runnable if they depend on those outputs. When the children eventually terminate, any task that depends on the parent's output will eventually become runnable. A child task must only depend on concrete references (i.e. objects that already exist) or future references to the outputs of tasks that have already been spawned (i.e. objects that are already expected to be published). This prevents deadlock, as a cycle cannot form in the dependency graph. The key feature of CIEL is the dynamic task graph.
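The two rules on this slide (delegate an expected output to a child, and depend only on objects that exist or are already expected) can be sketched as follows. The TaskGraph class and its method names are assumptions for illustration, not CIEL's implementation.

```python
class TaskGraph:
    """Minimal bookkeeping for spawn and publish in a dynamic task graph."""

    def __init__(self, concrete=()):
        self.concrete = set(concrete)   # names of objects that already exist
        self.producer = {}              # object name -> task expected to produce it

    def spawn(self, task_id, deps, outputs):
        # A new task may only depend on concrete objects or on outputs of
        # tasks that were spawned earlier, so no dependency cycle can form.
        for dep in deps:
            if dep not in self.concrete and dep not in self.producer:
                raise ValueError(f"{task_id} depends on unknown object {dep!r}")
        for out in outputs:
            self.producer[out] = task_id   # delegation: newest task owns the name

    def publish(self, name):
        self.concrete.add(name)


g = TaskGraph(concrete={"input"})
g.spawn("root", ["input"], ["result"])
g.spawn("child", ["input"], ["result"])   # root delegates "result" to its child
try:
    g.spawn("bad", ["never-spawned"], ["x"])
    rejected = False
except ValueError:
    rejected = True
g.publish("result")                        # the child eventually publishes it
```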
14 A Dynamic Task Graph (Figure: a dynamic task graph in which a task spawns further tasks.)
15 System Architecture A CIEL cluster has a single master and many workers. The master dispatches tasks to the workers for execution. After a task completes, the worker publishes a set of objects and may spawn further tasks. Clients submit jobs to the master. (Figure: a CIEL cluster.)
16 System Architecture Master: The master maintains the current state of the dynamic task graph in the object table and the task table. Each row in the object table contains the latest reference for that object, including its locations, and a pointer to the task that is expected to produce it. Each row in the task table corresponds to a spawned task and contains pointers to the references on which the task depends. The master scheduler is responsible for making progress in a CIEL computation: it lazily evaluates output objects and pairs runnable tasks with idle workers. Because task I/O may be large (gigabytes per task), all bulk data is stored on the workers themselves, and the master handles only references. The master uses a multiple-queue-based scheduler to dispatch each task to the worker nearest its data.
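Lazy evaluation over the two tables can be sketched as a walk backwards from the requested output. The dictionary layout and the function name tasks_to_run below are assumptions; the sketch ignores worker selection and locality.

```python
def tasks_to_run(output, object_table, task_table):
    """Walk back from a requested output to the tasks that can run right now."""
    runnable, seen, stack = [], set(), [output]
    while stack:
        name = stack.pop()
        entry = object_table[name]
        if entry["locations"] or entry["producer"] in seen:
            continue  # already concrete, or its producer was already visited
        task_id = entry["producer"]
        seen.add(task_id)
        missing = [d for d in task_table[task_id]["deps"]
                   if not object_table[d]["locations"]]
        if missing:
            stack.extend(missing)      # these must be produced first
        else:
            runnable.append(task_id)   # all dependencies are concrete
    return runnable


object_table = {
    "in":     {"locations": {("w1", 9000)}, "producer": None},
    "mid":    {"locations": set(), "producer": "t1"},
    "out":    {"locations": set(), "producer": "t2"},
    "unused": {"locations": set(), "producer": "t3"},
}
task_table = {
    "t1": {"deps": ["in"]},
    "t2": {"deps": ["mid"]},
    "t3": {"deps": ["in"]},
}

first = tasks_to_run("out", object_table, task_table)
object_table["mid"]["locations"].add(("w1", 9000))   # t1 published "mid"
second = tasks_to_run("out", object_table, task_table)
```

Task t3 is never selected, because nothing the job asks for depends on its output; that is the sense in which evaluation is lazy.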
17 Task and object tables maintained on the master node (Figure.) The object table contains the latest reference for each object, including its locations, and a pointer to the task that is expected to produce it.
18 System Architecture Workers: The workers execute tasks and store objects. If a worker needs to fetch a remote object, it reads the object directly from another worker. A worker registers with the master and periodically sends a heartbeat to demonstrate its availability. When a task is dispatched to a worker, the appropriate executor is invoked. An executor is a generic component that prepares input data for consumption and invokes some computation on it. When a worker finishes executing a task, it replies to the master with the set of references that it wishes to publish and a list of any new tasks that it wishes to spawn. The master then updates the object table and task table, and re-evaluates the set of tasks that are now runnable.
19 Skywriting A language for expressing task-level parallelism that runs on top of CIEL. Task creation in Skywriting: task creation is the distinctive feature that facilitates data-dependent control flow. Essential Skywriting constructs: 1. spawn(f, [args, ...]): spawns a parallel task that computes f(args, ...) and returns a reference to the result. 2. spawn_exec(executor, args, n): spawns a parallel task to run executor with the given args. 3. exec(executor, args, n): runs the executor synchronously. 4. Dereference (unary *): a unary operator that applies to a reference; it loads the referenced data and evaluates to the resulting data structure.
20 Skywriting script for computing the Fibonacci number The Fibonacci sequence is a sequence of numbers that starts with a one or a zero, followed by a one, and proceeds based on the rule that each number (called a Fibonacci number) is equal to the sum of the preceding two numbers. The Fibonacci sequence is the series of numbers: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, ... Fibonacci numbers are of interest to biologists and physicists because they are frequently observed in various natural objects and phenomena. The branching patterns in trees and leaves, for example, and the distribution of seeds in a raspberry are based on Fibonacci numbers.
21 Skywriting script for computing the Fibonacci number Skywriting can be used to define a data-dependent parallel algorithm. For n > 1, the fib(n) function spawns two tasks to calculate fib(n - 1) and fib(n - 2), dereferences the results of these tasks, adds them together, and returns the sum. The dereference operator (*) is applied to x and y, which blocks the current task until the future reference has become concrete. This example suggests the possibility of using Skywriting and CIEL to execute parallel divide-and-conquer algorithms, such as decision tree learning.
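The structure of the fib(n) script can be imitated in Python as shown below. This is not Skywriting and not the script from the slide: the Ref class and the spawn function are hypothetical stand-ins, and the sketch evaluates each spawned task sequentially at the point of dereference, whereas CIEL would run spawned tasks in parallel on workers.

```python
class Ref:
    """Hypothetical stand-in for a future reference to a task result."""

    def __init__(self, fn, args):
        self._fn, self._args = fn, args
        self._done, self._value = False, None

    def deref(self):
        """Analogue of the unary * operator: wait for (here: compute) the value."""
        if not self._done:
            self._value = self._fn(*self._args)
            self._done = True
        return self._value


def spawn(fn, args):
    """Analogue of spawn(f, [args, ...]): returns a reference, not a value."""
    return Ref(fn, args)


def fib(n):
    if n <= 1:
        return n
    x = spawn(fib, [n - 1])
    y = spawn(fib, [n - 2])
    return x.deref() + y.deref()   # data-dependent: the task tree depends on n


result = fib(10)
```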
22 Spawning Tasks The distinctive feature of Skywriting is its ability to spawn new tasks in the middle of executing a job. The language provides two explicit mechanisms for spawning new tasks (the spawn() and spawn_exec() functions) and one implicit mechanism (the *-operator).
23 Skywriting Task The spawn() function creates a new task to run the given Skywriting function. The Skywriting runtime first creates a data object that contains the new task's environment, including the text of the function to be executed and the values of any arguments passed to the function. This object is called a Skywriting continuation, because it encapsulates the state of a computation. The runtime then creates a task descriptor for the new task, which includes a dependency on the new continuation. Finally, it assigns a reference for the task result, which it returns to the calling script. (Figure: blocking on futures.)
24 Non-Skywriting Task Creation The spawn_exec() function is a lower-level task creation mechanism that allows the caller to invoke code written in a different language. This function is usually not called directly, but rather through a wrapper for the relevant executor. When spawn_exec() is called, the runtime serializes the arguments into a data object and creates a task that depends on that object. If the arguments to spawn_exec() include references, the runtime adds those references to the new task's dependencies to ensure that CIEL will not schedule the task until all of its arguments are available. Finally, the runtime creates references for the task outputs and returns them to the calling script. (Figure: a non-Skywriting task created with spawn_exec().)
25 Implicit Task Creation If the task attempts to dereference an object that has not yet been created (i.e. the result of a call to spawn()), the current task must block. However, CIEL tasks are non-blocking: all synchronization (and data flow) must be made explicit in the dynamic task graph. The runtime therefore implicitly creates a continuation task that depends on the dereferenced object and the current continuation (i.e. the current Skywriting execution stack). The new task will only run when the dereferenced object has been produced, which provides the necessary synchronization. (Figure: a Skywriting script that spawns two tasks and blocks on their results.)
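One way to picture the implicit continuation task is with a Python generator, whose suspended state plays the role of the saved execution stack. This is only an analogy under stated assumptions: the function names script and step are hypothetical, and the real runtime serializes the continuation into an object rather than keeping a live generator.

```python
def script():
    """A task body; each yield marks a dereference of a named future reference."""
    x = yield "x"
    y = yield "y"
    return x + y


def step(gen, store, pending=None):
    """Advance the task until it finishes or needs an object that is missing.

    Returns ("done", value), or ("blocked", name) meaning that a continuation
    task depending on the object `name` has to be scheduled later.
    """
    try:
        name = pending if pending is not None else next(gen)
        while name in store:
            name = gen.send(store[name])   # object is concrete: resume the body
        return ("blocked", name)
    except StopIteration as stop:
        return ("done", stop.value)


store = {}
task = script()
s1 = step(task, store)                # blocks on "x"
store["x"] = 3
s2 = step(task, store, pending="x")   # resumes, then blocks on "y"
store["y"] = 4
s3 = step(task, store, pending="y")   # resumes and finishes
```

Each call to step corresponds to one non-blocking task: it either finishes or ends by naming the object its continuation depends on.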
26 Task Termination A task terminates when it reaches a return statement (or it blocks on a future reference). A Skywriting task has a single output, which is the value of the expression in the return statement. On termination, the runtime stores the output in the local object store, publishes a concrete reference to the object, and sends a list of spawned tasks to the master, in order of creation.
27 Fault Tolerance Client: trivial, since no driver program is required. Worker: monitored by the master (similar to Dryad). Master: master state can be derived from the set of active jobs. This is accomplished with persistent logging, secondary masters, and object table reconstruction by workers.
28 Master fault tolerance (log approach) The persistent log approach creates one log file per job. When a job is submitted, a new log file is created and the initial log entry containing the job submission message is written synchronously to that file. All spawn and publish messages that the master receives can be written to the log asynchronously. After the master fails, a new master will replay the log, applying each operation in order to rebuild the dynamic task graph for the job.
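Log replay can be sketched as applying each logged operation, in order, to an empty state. The JSON entry format, the field names, and the functions apply_entry and replay are assumptions for illustration; the slide does not specify CIEL's log format.

```python
import json


def apply_entry(graph, entry):
    """Apply one logged operation to the in-memory task graph state."""
    if entry["op"] == "submit":
        graph["job"] = entry["job"]
    elif entry["op"] == "spawn":
        graph["tasks"][entry["task"]] = entry["deps"]
    elif entry["op"] == "publish":
        graph["objects"][entry["name"]] = entry["locations"]


def replay(log_lines):
    """Rebuild the state a failed master held by replaying its log in order."""
    graph = {"job": None, "tasks": {}, "objects": {}}
    for line in log_lines:
        apply_entry(graph, json.loads(line))
    return graph


log = [json.dumps(e) for e in [
    {"op": "submit", "job": "job1"},
    {"op": "spawn", "task": "t1", "deps": ["input"]},
    {"op": "publish", "name": "out", "locations": [["worker-1", 9000]]},
]]
recovered = replay(log)
```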
29 Master fault tolerance (secondary master approach) The secondary master approach is similar to the persistent log approach. The job submission message and all spawn and publish messages are forwarded to a secondary master. The secondary master immediately applies these operations to build a hot standby version of the dynamic task graph. To maintain the same reliability guarantees, the master must wait until the secondary master acknowledges the job submission message before returning an acknowledgement to the client. All other messages may be sent asynchronously.
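The difference between the synchronous job submission and the asynchronous forwarding of other messages can be sketched with two small classes. The class and method names are hypothetical, and the outbox list is a simplification of real asynchronous messaging.

```python
class Secondary:
    """Hot standby: applies every forwarded operation as soon as it arrives."""

    def __init__(self):
        self.ops = []

    def receive(self, op):
        self.ops.append(op)
        return True   # acknowledgement


class Primary:
    def __init__(self, secondary):
        self.secondary = secondary
        self.ops = []
        self.outbox = []   # operations recorded locally but not yet forwarded

    def submit_job(self, job):
        op = ("submit", job)
        self.ops.append(op)
        acked = self.secondary.receive(op)   # synchronous: wait for the ack
        return acked                          # only now acknowledge the client

    def record(self, op):
        self.ops.append(op)
        self.outbox.append(op)                # spawn/publish: forwarded later

    def flush(self):
        while self.outbox:
            self.secondary.receive(self.outbox.pop(0))


standby = Secondary()
master = Primary(standby)
client_ack = master.submit_job("job1")
master.record(("spawn", "t1"))
lag = len(master.ops) - len(standby.ops)   # standby may briefly lag behind
master.flush()
```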
30 Performance: comparison with a production system (Figure: distributed grep on Hadoop and CIEL.)
31 Performance of an iterative algorithm (Figure: k-means on Hadoop and CIEL with 20 workers.)
32 Related Work Pregel: Google's distributed execution engine (designed primarily for graph algorithms). HaLoop: the task scheduler is made loop-aware by adding caching mechanisms (lacks fault tolerance). Apache Mahout: uses Hadoop as its execution engine, with a driver program that runs iterative algorithms (lacks master fault tolerance and requires a driver program). Dryad: allows data flow to follow a more general directed acyclic graph (does not support dynamic, data-dependent control flow). Naiad: a timely dataflow system; a distributed system for executing data-parallel, cyclic dataflow programs.
33 Conclusion CIEL and Skywriting are not good for: sharing large amounts of data; fine-grained parallelization; fully automatic parallelism; serving as a relational algebra environment; serving as a distributed operating system. CIEL and Skywriting are good for: writing iterative algorithms; data-dependent control flow using dynamic task graphs; transparent fault tolerance and automatic distribution; scaling across hundreds of machines.
34 Reference Derek G. Murray, Malte Schwarzkopf, Christopher Smowton, Steven Smith, Anil Madhavapeddy, and Steven Hand. CIEL: a universal execution engine for distributed data-flow computing. In NSDI 2011: Proceedings of the 8th USENIX Symposium on Networked Systems Design and Implementation, page 00,
35 Reference 1. In what aspect(s) is CIEL different from MapReduce and Dryad? (the first paragraph of Section 1) 2. What weaknesses of Pregel and MapReduce are addressed by CIEL, respectively? (the 4th, 5th, and 6th paragraphs in Section 2)
More informationParallel Programming Concepts
Parallel Programming Concepts MapReduce Frank Feinbube Source: MapReduce: Simplied Data Processing on Large Clusters; Dean et. Al. Examples for Parallel Programming Support 2 MapReduce 3 Programming model
More informationStorm. Distributed and fault-tolerant realtime computation. Nathan Marz Twitter
Storm Distributed and fault-tolerant realtime computation Nathan Marz Twitter Storm at Twitter Twitter Web Analytics Before Storm Queues Workers Example (simplified) Example Workers schemify tweets and
More informationOutline. CS-562 Introduction to data analysis using Apache Spark
Outline Data flow vs. traditional network programming What is Apache Spark? Core things of Apache Spark RDD CS-562 Introduction to data analysis using Apache Spark Instructor: Vassilis Christophides T.A.:
More informationHadoop. copyright 2011 Trainologic LTD
Hadoop Hadoop is a framework for processing large amounts of data in a distributed manner. It can scale up to thousands of machines. It provides high-availability. Provides map-reduce functionality. Hides
More informationCS370 Operating Systems
CS370 Operating Systems Colorado State University Yashwant K Malaiya Fall 2016 Lecture 2 Slides based on Text by Silberschatz, Galvin, Gagne Various sources 1 1 2 System I/O System I/O (Chap 13) Central
More informationBig Data XML Parsing in Pentaho Data Integration (PDI)
Big Data XML Parsing in Pentaho Data Integration (PDI) Change log (if you want to use it): Date Version Author Changes Contents Overview... 1 Before You Begin... 1 Terms You Should Know... 1 Selecting
More informationYARN: A Resource Manager for Analytic Platform Tsuyoshi Ozawa
YARN: A Resource Manager for Analytic Platform Tsuyoshi Ozawa ozawa.tsuyoshi@lab.ntt.co.jp ozawa@apache.org About me Tsuyoshi Ozawa Research Engineer @ NTT Twitter: @oza_x86_64 Over 150 reviews in 2015
More informationLarge-Scale GPU programming
Large-Scale GPU programming Tim Kaldewey Research Staff Member Database Technologies IBM Almaden Research Center tkaldew@us.ibm.com Assistant Adjunct Professor Computer and Information Science Dept. University
More informationThe Google File System (GFS)
1 The Google File System (GFS) CS60002: Distributed Systems Antonio Bruto da Costa Ph.D. Student, Formal Methods Lab, Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur 2 Design constraints
More informationData Informatics. Seon Ho Kim, Ph.D.
Data Informatics Seon Ho Kim, Ph.D. seonkim@usc.edu HBase HBase is.. A distributed data store that can scale horizontally to 1,000s of commodity servers and petabytes of indexed storage. Designed to operate
More informationMap Reduce. Yerevan.
Map Reduce Erasmus+ @ Yerevan dacosta@irit.fr Divide and conquer at PaaS 100 % // Typical problem Iterate over a large number of records Extract something of interest from each Shuffle and sort intermediate
More informationGoogle File System (GFS) and Hadoop Distributed File System (HDFS)
Google File System (GFS) and Hadoop Distributed File System (HDFS) 1 Hadoop: Architectural Design Principles Linear scalability More nodes can do more work within the same time Linear on data size, linear
More informationScalable Computing: Practice and Experience Volume 10, Number 4, pp
Scalable Computing: Practice and Experience Volume 10, Number 4, pp. 413 418. http://www.scpe.org ISSN 1895-1767 c 2009 SCPE MULTI-APPLICATION BAG OF JOBS FOR INTERACTIVE AND ON-DEMAND COMPUTING BRANKO
More informationSocket attaches to a Ratchet. 2) Bridge Decouple an abstraction from its implementation so that the two can vary independently.
Gang of Four Software Design Patterns with examples STRUCTURAL 1) Adapter Convert the interface of a class into another interface clients expect. It lets the classes work together that couldn't otherwise
More informationMiddleware Mediated Transactions & Conditional Messaging
Middleware Mediated Transactions & Conditional Messaging Expert Topic Report ECE1770 Spring 2003 Submitted by: Tim Chen John C Wu To: Prof Jacobsen Date: Apr 06, 2003 Electrical and Computer Engineering
More informationThe Google File System
The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung December 2003 ACM symposium on Operating systems principles Publisher: ACM Nov. 26, 2008 OUTLINE INTRODUCTION DESIGN OVERVIEW
More informationDRYAD: DISTRIBUTED DATA- PARALLEL PROGRAMS FROM SEQUENTIAL BUILDING BLOCKS
DRYAD: DISTRIBUTED DATA- PARALLEL PROGRAMS FROM SEQUENTIAL BUILDING BLOCKS Authors: Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, Dennis Fetterly Presenter: Zelin Dai WHAT IS DRYAD Combines computational
More informationCS 6453: Parameter Server. Soumya Basu March 7, 2017
CS 6453: Parameter Server Soumya Basu March 7, 2017 What is a Parameter Server? Server for large scale machine learning problems Machine learning tasks in a nutshell: Feature Extraction (1, 1, 1) (2, -1,
More informationMultiprocessor Systems
White Paper: Virtex-II Series R WP162 (v1.1) April 10, 2003 Multiprocessor Systems By: Jeremy Kowalczyk With the availability of the Virtex-II Pro devices containing more than one Power PC processor and
More informationDistributed Object-Based Systems The WWW Architecture Web Services Handout 11 Part(a) EECS 591 Farnam Jahanian University of Michigan.
Distributed Object-Based Systems The WWW Architecture Web Services Handout 11 Part(a) EECS 591 Farnam Jahanian University of Michigan Reading List Remote Object Invocation -- Tanenbaum Chapter 2.3 CORBA
More informationCS 138: Google. CS 138 XVII 1 Copyright 2016 Thomas W. Doeppner. All rights reserved.
CS 138: Google CS 138 XVII 1 Copyright 2016 Thomas W. Doeppner. All rights reserved. Google Environment Lots (tens of thousands) of computers all more-or-less equal - processor, disk, memory, network interface
More informationCAS 703 Software Design
Dr. Ridha Khedri Department of Computing and Software, McMaster University Canada L8S 4L7, Hamilton, Ontario Acknowledgments: Material based on Software by Tao et al. (Chapters 9 and 10) (SOA) 1 Interaction
More informationPLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS
PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS By HAI JIN, SHADI IBRAHIM, LI QI, HAIJUN CAO, SONG WU and XUANHUA SHI Prepared by: Dr. Faramarz Safi Islamic Azad
More informationIntroduction to MapReduce
732A54 Big Data Analytics Introduction to MapReduce Christoph Kessler IDA, Linköping University Towards Parallel Processing of Big-Data Big Data too large to be read+processed in reasonable time by 1 server
More informationApril Final Quiz COSC MapReduce Programming a) Explain briefly the main ideas and components of the MapReduce programming model.
1. MapReduce Programming a) Explain briefly the main ideas and components of the MapReduce programming model. MapReduce is a framework for processing big data which processes data in two phases, a Map
More informationAdaptive Cluster Computing using JavaSpaces
Adaptive Cluster Computing using JavaSpaces Jyoti Batheja and Manish Parashar The Applied Software Systems Lab. ECE Department, Rutgers University Outline Background Introduction Related Work Summary of
More informationDistributed Programming
Distributed Programming Marcel Heinz & Ralf Lämmel Software Languages Team University of Koblenz-Landau Motivation How can we achieve better performance? How can we distribute computations? How can we
More informationChapter 5. The MapReduce Programming Model and Implementation
Chapter 5. The MapReduce Programming Model and Implementation - Traditional computing: data-to-computing (send data to computing) * Data stored in separate repository * Data brought into system for computing
More informationSurvey on MapReduce Scheduling Algorithms
Survey on MapReduce Scheduling Algorithms Liya Thomas, Mtech Student, Department of CSE, SCTCE,TVM Syama R, Assistant Professor Department of CSE, SCTCE,TVM ABSTRACT MapReduce is a programming model used
More informationThe Google File System
The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung SOSP 2003 presented by Kun Suo Outline GFS Background, Concepts and Key words Example of GFS Operations Some optimizations in
More informationIntroduction to Spark
Introduction to Spark Outlines A brief history of Spark Programming with RDDs Transformations Actions A brief history Limitations of MapReduce MapReduce use cases showed two major limitations: Difficulty
More informationOperating Systems. Computer Science & Information Technology (CS) Rank under AIR 100
GATE- 2016-17 Postal Correspondence 1 Operating Systems Computer Science & Information Technology (CS) 20 Rank under AIR 100 Postal Correspondence Examination Oriented Theory, Practice Set Key concepts,
More informationThe Google File System
The Google File System By Ghemawat, Gobioff and Leung Outline Overview Assumption Design of GFS System Interactions Master Operations Fault Tolerance Measurements Overview GFS: Scalable distributed file
More informationChapter 4: Threads. Operating System Concepts 9 th Edit9on
Chapter 4: Threads Operating System Concepts 9 th Edit9on Silberschatz, Galvin and Gagne 2013 Chapter 4: Threads 1. Overview 2. Multicore Programming 3. Multithreading Models 4. Thread Libraries 5. Implicit
More informationCS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University
CS 555: DISTRIBUTED SYSTEMS [RPC & DISTRIBUTED OBJECTS] Shrideep Pallickara Computer Science Colorado State University Frequently asked questions from the previous class survey XDR Standard serialization
More informationCSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 10 Parallel Programming Models: Map Reduce and Spark
CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 10 Parallel Programming Models: Map Reduce and Spark Announcements HW2 due this Thursday AWS accounts Any success? Feel
More information