Cloud Computing Application I: Virtualization, Hadoop. Dr. Laiping Zhao ( 赵来平 ) School of Computer Software, Tianjin University

Size: px
Start display at page:

Download "Cloud Computing Application I: Virtualization, Hadoop. Dr. Laiping Zhao ( 赵来平 ) School of Computer Software, Tianjin University"

Transcription

1 Cloud Computing Application I: Virtualization, Hadoop Dr. Laiping Zhao ( 赵来平 ) School of Computer Software, Tianjin University laiping@tju.edu.cn 1

2 Outline Virtualization GFS/HDFS Mapreduce: Part I 2

3 IaaS 3

4 Service Model Software as a Service (SaaS) Use provider's applications over a network SalesForce.com, Gmail etc. Platform as a Service (PaaS) Deploy customer-created applications to a cloud AppEngine, Hadoop, Azure etc. Infrastructure as a Service (IaaS) Rent processing, storage, network Capacity and other fundamental computing resources Amazon EC2, S3 etc. 4

5 Service Model IaaS PaaS- SaaS Use So<ware Applica8on use Use Responsibilityboundary Applica8on-run8me-environment Computer-resource- e.g.-machine,-storage Hardware Policy Policy Policy Parts-related-with-the-cloud-service-provider Parts-related-with-the-cloud-user 5

6 Service Model 6

7 IaaS IaaS (Infrastructure as a Service): The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. Example: 7

8 IaaS With IaaS: IaaS cloud provider takes care of all the IT infrastructure complexities. IaaS cloud provider provides all the infrastructure functionalities. IaaS cloud provider guarantees qualified infrastructure services. IaaS cloud provider charges clients according to their resource usage. Enabling techniques: Virtualization. 8

9 Virtualization 9

10 Virtualization Virtualization is an enabling technique to provide an abstraction of logical resources away from underlying physical resources. 10

11 Virtualization What is virtualization? Virtualization is the creation of a virtual (rather than physical) version of something, such as an operating system, a server, a storage device or network resources. It hides the physical characteristics of a resource from users, instead showing another abstract resource. But, where does virtualization come from? Virtualization is NOT a new idea of computer science. Virtualization concept comes from the component abstraction of system design, and it has been adapted in many system level. Now, let s take a look of our original system architecture!! 11

12 Layers of Abstraction System abstraction: Computer systems are built on levels of abstraction. Higher level of abstraction hide details at lower levels. Designer of each abstraction level make use of the functions supported from its lower level, and provide another abstraction to its higher one. Example Files are an abstraction of a disk. 12

13 Layers of Abstraction Machine level abstraction : For OS developers, a machine is defined by ISA (Instruction Set Architecture). This is the major division between hardware and software. Examples : CISC X86 RISC ARM RISC MIPS 13

14 Layers of Abstraction OS level abstraction : For compiler or library developers, a machine is defined by ABI (Application Binary Interface). This defines the basic OS interface which may be used by libraries or user. Examples : User ISA OS system call 14

15 Layers of Abstraction Library level abstraction : For application developers, a machine is defined by API (Application Programming Interface). This abstraction provides the well-rounded functionalities. Examples : User ISA Standard C library Graphical library 15

16 Layers of Abstraction The concept of virtualization is everywhere!! In IaaS, we focus the virtualization granularity at each physical hardware device. General virtualization implementation level : Virtualized instance Software virtualized hardware instance Virtualization layer Software virtualization implementation Abstraction layer Various types of hardware access interface Physical hardware Various types of infrastructure resources Different physical resources : Server, Storage and Network. 16

17 Virtual Machine What is Virtual Machine (VM)? VM is a software implementation of a machine (i.e. a computer) that executes programs like a real machine. Terminology : Host (Target) The primary environment where will be the target of virtualization. Guest (Source) The virtualized environment where will be the source of virtualization. Virtualization involves the construction of an isomorphism ( 同形 ) from guest state to host state. 17

18 System Virtual Machine System virtual machine Provide the entire operating system on same or different host ISA (Instruction Set Architecture). Constructed at ISA level. Examples: XEN, KVM, VMWare. 18

19 Virtual Machine Monitor (VMM) What s Virtual Machine Monitor (VMM)? VMM or Hypervisor is the software layer providing the virtualization. System architecture: 19

20 Virtualization Types Virtualization Types : Type 1 Bare metal ( 原 生虚拟化 ) VMMs run directly on the host's hardware as a hardware control and guest operating system monitor. Type 2 Hosted ( 寄宿虚拟化 ) VMMs are software applications running within a conventional operating system. 20

21 Virtualization Approaches Virtualization Approaches : Full-Virtualization ( 完全虚拟化 ) VMM simulates enough hardware to allow an unmodified guest OS. Para-Virtualization ( 部分虚拟化 ) VMM does not necessarily simulate hardware, but instead offers a special API that can only be used by the modified guest OS. 21

22 Virtualization Approaches Full-Virtualization 22

23 Virtualization Approaches Para-Virtualization ( 部分虚拟化 ) 23

24 Examples 24

25 CPU virtualization 25

26 Virtualization Technique Virtualization requirements from Popek and Goldberg : Popek and Goldberg provide a set of sufficient conditions for a computer architecture to efficiently support system virtualization. Popek and Goldberg provide guidelines for the design of virtualized computer architectures. In Popek and Goldberg terminology, a VMM must present all three properties : Equivalence (Same as real machine) 同质 Resource control (Totally control) 资源受控 Efficiency (Native execution) 高效 26

27 CPU Architecture Modern CPU status is usually classified as several modes. In general, we conceptually divide them into two modes: Kernel mode (Ring 0) CPU may perform any operation allowed by its architecture, including any instruction execution, IO operation, area of memory access, and so on. Traditional OS kernel runs in Ring 0 mode. User mode (Ring 1 ~ 3) CPU can typically only execute a subset of those available instructions in kernel mode. Traditional application runs in Ring 3 mode. 27

28 CPU Architecture By the classification of CPU modes, we divide instructions into following types : Privileged instruction ( 特权指令 ) Those instructions that trap if the machine is in user mode and do not trap if the machine is in kernel mode. Sensitive instructions ( 敏敏感指令 ) Those instructions that interact with hardware, which include controlsensitive and behavior-sensitive instructions. Innocuous instruction ( 无害指令 ) All other instructions. Critical instruction ( 临界指令 ) Those sensitive but not privileged instructions. 28

29 CPU Architecture X86 ISA RISC ISA 29

30 CPU Architecture What is trap ( 陷 入 )? When CPU is running in user mode, some internal or external events, which need to be handled in kernel mode, take place. Then CPU will jump to hardware exception handler vector, and execute system operations in kernel mode. Trap types : System Call ( 系统调 用 ) Invoked by application in user mode. For example, application ask OS for system IO. Hardware Interrupts ( 硬件中断 ) Invoked by some hardware events in any mode. For example, hardware clock timer trigger event. Exception ( 异常 ) Invoked when unexpected error or system malfunction occur. For example, execute privilege instructions in user mode. 30

31 Trap and Emulate Model Traditional OS : When application invoke a system call : CPU will trap to interrupt handler vector in OS. CPU will switch to kernel mode (Ring 0) and execute OS instructions. When hardware event : Hardware will interrupt CPU execution, and jump to interrupt handler in OS. 31

32 Trap and Emulate Model If we want CPU virtualization to be efficient, how should we implement the VMM? We should make guest binaries run on CPU as fast as possible. Theoretically speaking, if we can run all guest binaries natively, there will NO overhead at all. But we cannot let guest OS handle everything, VMM should be able to control all hardware resources. Solution : Ring Compression Shift traditional OS from kernel mode(ring 0) to user mode(ring 1), and run VMM in kernel mode. Then VMM will be able to intercept all trapping event. 32

33 Trap and Emulate Model VMM virtualization paradigm (trap and emulate) : Let normal instructions of guest OS run directly on processor in user mode. When executing privileged instructions, hardware will make processor trap into the VMM. The VMM emulates the effect of the privileged instructions for the guest OS and return to guest. 33

34 Trap and Emulate Model VMM and Guest OS : System Call CPU will trap to interrupt handler vector of VMM. VMM jump back into guest OS. Hardware Interrupt Hardware make CPU trap to interrupt handler of VMM. VMM jump to corresponding interrupt handler of guest OS. Privilege Instruction Running privilege instructions in guest OS will be trapped to VMM for instruction emulation. After emulation, VMM jump back to guest OS. 34

35 Virtualization Theorem Subset theorem : For any conventional third-generation computer, a VMM may be constructed if the set of sensitive instructions for that computer is a subset of the set of privileged instructions. Recursive Emulation: A conventional third-generation computer is recursively virtualizable if It is virtualizable. VMM without any timing dependencies can be constructed for it. Under this theorem, x86 architecture cannot be virtualized directly. Other techniques are needed. 35

36 Virtualization Techniques How to virtualize unvirtualizable hardware : Para-virtualization ( 部分虚拟化 ) Modify guest OS to skip the critical instructions. Implement some hyper-calls to trap guest OS to VMM. Binary translation ( 二进制翻译 ) Use emulation technique to make hardware virtualizable. Skip the critical instructions by means of these translations. Hardware assistance Modify or enhance ISA of hardware to provide virtualizable architecture. Reduce the complexity of VMM implementation. Examples: Intel VT-x, AMD-V. 36

37 Hardware Assistance Intel VT-X: Include one more mode in X86 Root mode: Normal Instructions can run like the original mode; VMM runs in this mode Non-Root mode: All sensitive instructions are redefined; Sensitive instructions are trapped into Rood mode; Guest OS runs in this mode. 37

38 Hardware Assistance Intel VT-X: System call: Trapped into Guest OS. Hardware Interrupt: Handled by VMM. Sensitive instructions: Trapped into Non-root mode. 38

39 Memory Virtualization Memory management in OS Traditionally, OS fully controls all physical memory space and provide a continuous addressing space to each process. In server virtualization, VMM should make all virtual machines share the physical memory space without knowing the fact. Goals of memory virtualization: Address Translation Control table-walking hardware that accesses translation tables in main memory. Memory Protection Define access permission which uses the Access Control Hardware. Access Attribute Define attribute and type of memory region to direct how memory operation to be handled. 39

40 Memory Virtualization Memory virtualization architecture: 40

41 Memory Virtualization The performance drop of memory access is usually unbearable. VMM needs further optimization. VMM maintains shadow page tables: Direct virtual-to-physical address mapping. Use hardware Translation Lookaside Buffer (TLB) for address translation. 41

42 Shadow Page Table Map guest virtual address to host physical address Shadow page table Guest OS will maintain its own virtual memory page table in the guest physical memory frames. For each guest physical memory frame, VMM should map it to host physical memory frame. Shadow page table maintains the mapping from guest virtual address to host physical address. Page table protection VMM will apply write protection to all the physical frames of guest page tables, which lead the guest page table write exception and trap to VMM. 42

43 Shadow Page Table Construct shadow page table: Guest OS will maintain its own page table for each process. VMM maps each guest physical page to host physical page. Create shadow page tables for each guest page table. VMM should protect host frame which contains guest page table. 43

44 Other Issues Page fault and page protection issue When a physical page fault occurs, VMM need to decide whether this exception should be injected to guest OS or not. If the page entry in the page table of guest OS is still valid, VMM should prepare the corresponding page and not inject any exception to guest OS. If the page entry in the page table of guest OS is invalid either, then VMM should directly inject the virtual page fault to guest OS. When guest OS want to modify its page tables, VMM need to intercept this operation. When guest OS reload page table, CPU will trap to VMM due to the Ring Compression nature. VMM will walk the page table of guest OS and modify the related shadow page table to make MMU get host physical address. 44

45 Other Virtualization Server Virtualization CPU Virtualization Memory Virtualization I/O Virtualization Storage Virtualization Network Virtualization 45

46 PaaS 46

47 IaaS is not Enough IaaS provides many virtual or physical machines, but it cannot alter the quantity automatically. Consumers might: Require automatic make-decisions of dispatching jobs to available resources. Need a running environment or a development and testing platform to design their applications or services. 47

48 More Requirements Consumers require more and more... Large-scale resource abstraction and management Requirement of large-scale resources on demand Running and hosting environment Automatic and autonomous mechanism Distribution and management of jobs Access control and authentication... 48

49 PaaS Platform as a Service (PaaS) is a computing platform that abstracts the infrastructure, OS, and middleware to drive developer productivity. 49

50 PaaS Deliver the computing platform as a service Developing applications using programming languages and tools supported by the PaaS provider. Deploying consumer-created applications onto the cloud infrastructure. 50

51 Core Platform PaaS providers can provide a runtime environment for the developer platform. Runtime environment is automatic control such that consumers can focus on their services Dynamic provisioning On-demand resource provisioning Load balancing Distribute workload evenly among resources Fault tolerance Continuously operating in the presence of failures System monitoring Monitor the system status and measure the usage of resources. 51

52 PaaS PaaS Venders 52

53 Hadoop Distributed File System GFS vs HDFS Distributed Data Processing Mapreduce 53

54 Motivation: Large Scale Data Processing Many tasks composed of processing lots of data to produce lots of other data. Large-Scale Data Processing Want to use 1000s of CPUs. But don t want hassle of managing things. Storage devices fail 1.7% year 1-8.6% year 3(Google, 2007) 10,000 nodes, 7disks per node, year1,1190 failures/yr or 3.3failures/day. 54

55 Example: in Astronomy SKA Square Kilometer Array ( 平 方公 里里阵列列望远镜 ) Investment: $2 billion Data volume: over 12TB per second. 55

56 Motivation Data processing system provides: User-defined functions. Automatic parallelization and distribution. Fault tolerance. I/O scheduling. Status and monitoring. 56

57 Google Cloud Computing GFS (Google File System): Large Scale Distributed File System Paper: Interpreting the Data: Parallel Analysis with Sawzall (2005) Paper: MapReduce: Simplified Data Processing on Large Clusters (2004) White paper: The Datacenter as a Computer. An Introduction to the Design of Warehouse-Scale Machines (2009) Web Search Programming Language Sawzall Parallel Programming Model MapReduce Distributed Database BigTable Distributed File System Google File System (GFS) Distributed lock System Application Service Log Analysis Chubby Gmail Datacenter Construction Google s platform Google Maps Paxos Paper: Bigtable: A Distributed Storage System for Structured Data (2006) Paper: The Google File System (2003) Paper: The Chubby lock service for looselycoupled distributed systems (2006) Paper: Failure Trends in a Large Disk Drive Population (2007) 57

58 Google Cloud Computing Google Technologies vs Open Source Technologies MapReduce BigTable Hadoop MapReduce HBase Google File System Hadoop Distributed File System Google s Technologies Open Source Technologies 58

59 Motivation: GFS Google needs a file system supporting storing massive data. Buy one (Probably including both software and hardware) Expensive! 59

60 Motivation: GFS Why not using the existing file system [2003]? Examples: RedHat GFS, IBM GPFS, Sun Lustre etc. The problem is different: Different workload Huge files (100s of GB to TB) Data is rarely updated in place Reads and appends are common Running on commodity hardware. Compatible with Google s service. 60

61 Whether could we build a file system which runs on commodity hardware? 61

62 Motivation: GFS Design overview: The system is built from many inexpensive commodity components that often fail. The system stores a modest number of large files. The workloads primarily consist of two kinds of reads: large streaming reads and small random reads. The workloads also have many large, sequential writes that append data to files. The system must efficiently implement well-defined semantics for multiple clients that concurrently append to the same file. High sustained bandwidth is more important than low latency. 62

63 Google File System Design of GFS Client: Implement file system API, communicate with master and chunk servers. Master: A single master maintains all file system metadata. Chunk Server: store data chunks on local disk as linux files. 63

64 Google File System Minimize the master s involvement in all operations. Decouple the flow of data from the flow of control to use the network efficiently. A large chunk size: 64MB Reduces clients need to interact with the master because reads and writes on the same chunk require only one initial request to the master for chunk location information. A client is more likely to perform many operations on a large chunk, it can reduce network overhead by keeping a persistent TCP connection to the chunkserver over an extended period of time. Reduces the size of the metadata stored on the master. 64

65 Google File System Master operations: Namespace management and locking Replica placement By default, each chunk is replicated for 3 times. Creation, Re-replication, Rebalancing Garbage collection After a file is deleted, GFS does not immediately reclaim the available physical storage. Stale replica detection ( 过期的数据副本 ) The master removes stale replicas in its regular garbage collection. 65

66 HDFS The Apache Hadoop software library: A framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. HDFS (Hadoop Distributed File System): A distributed file system that provides high-throughput access to application data. Written in Java programming language. 66

67 HDFS Namenode Corresponding to GFS s Master. Secondary Namenode Namenode s backup. Datanode Corresponding to GFS s Chunkserver. 67

68 Hadoop Hadoop Cluster 68

69 Heartbeat DataNode periodically NameNode 1. I am alive; 2. Blocks table. 69

70 HDFS Read RPC 70

71 HDFS Read Network topology and Hadoop 71

72 HDFS Write 72

73 HDFS Write HDFS replica placement A trade-off between reliability and write bandwidth and read bandwidth here. Default strategy: The first replica on the same node as the client. (random when client is out of the system) The second replica is placed on a different rack from the first (off-rack), chosen at random. The third replica is placed on the same rack as the second, but on a different node chosen at random. 73

74 Mapreduce Jeff Dean 74

75 Motivation 75

76 Motivation Every search: 200+ CPU 200TB Data 0.1 second response time 5 revenue 76

77 Motivation Web data sets can be very large: Tens to hundreds of terabytes Cannot mine on a single server Data processing examples: Word Count Google Trends PageRank 77

78 Motivation Simple problem, difficult to solve: How to solve the problem within bounded time. Divide and Conquer! 78

79 Mapreduce Mapreduce: A programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of realworld tasks. Map: takes an input pair and produces a set of intermediate key/value pairs. Reduce: accepts an intermediate key I and a set of values for that key. It merges these values together to form a possibly smaller set of values. 79

80 Mapreduce 80

81 Mapreduce 81

82 Example: WordCount Input: Page 1: the weather is good Page 2: today is good Page 3: good weather is good 82

83 Example: WordCount map output: Worker 1: (the 1), (weather 1), (is 1), (good 1). Worker 2: (today 1), (is 1), (good 1). Worker 3: (good 1), (weather 1), (is 1), (good 1). 83

84 Example: WordCount Input of Reduce: Worker 1: (the 1) Worker 2: (is 1), (is 1), (is 1) Worker 3: (weather 1), (weather 1) Worker 4: (today 1) Worker 5: (good 1), (good 1), (good 1), (good 1) 84

85 Example: WordCount Reduce output: Worker 1: (the 1) Worker 2: (is 3) Worker 3: (weather 2) Worker 4: (today 1) Worker 5: (good 4) 85

86 86

87 clear 87

88 88

89 Hadoop Mapreduce Programming Class Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT> Maps input key/value pairs to a set of intermediate key/value pairs. map() method: Called once for each key/value pair in the input split. Most applications should override this. 89

90 Hadoop Mapreduce Programming Context object: Allows the Mapper/Reducer to interact with the rest of the Hadoop system. It includes configuration data for the job as well as interfaces which allow it to emit output. Applications can use the Context: to report progress to set application-level status messages update Counters indicate they are alive to get the values that are stored in job configuration across map/reduce phase. 90

91 Hadoop Mapreduce Programming Class Reducer<KEYIN,VALUEIN,KEYOUT,VALUEOUT> Reduces a set of intermediate values which share a key to a smaller set of values. Reducer has 3 primary phases: Shuffle: Copy the sorted output from each Mapper using HTTP across the network. Sort: Sort Reducer inputs by keys. Reduce reduce() method is called. 91

92 Hadoop Mapreduce Programming reduce() method: This method is called once for each key. Most applications will define their reduce class by overriding this method. 92

PaaS and Hadoop. Dr. Laiping Zhao ( 赵来平 ) School of Computer Software, Tianjin University

PaaS and Hadoop. Dr. Laiping Zhao ( 赵来平 ) School of Computer Software, Tianjin University PaaS and Hadoop Dr. Laiping Zhao ( 赵来平 ) School of Computer Software, Tianjin University laiping@tju.edu.cn 1 Outline PaaS Hadoop: HDFS and Mapreduce YARN Single-Processor Scheduling Hadoop Scheduling

More information

CPSC 426/526. Cloud Computing. Ennan Zhai. Computer Science Department Yale University

CPSC 426/526. Cloud Computing. Ennan Zhai. Computer Science Department Yale University CPSC 426/526 Cloud Computing Ennan Zhai Computer Science Department Yale University Recall: Lec-7 In the lec-7, I talked about: - P2P vs Enterprise control - Firewall - NATs - Software defined network

More information

The Challenges of X86 Hardware Virtualization. GCC- Virtualization: Rajeev Wankar 36

The Challenges of X86 Hardware Virtualization. GCC- Virtualization: Rajeev Wankar 36 The Challenges of X86 Hardware Virtualization GCC- Virtualization: Rajeev Wankar 36 The Challenges of X86 Hardware Virtualization X86 operating systems are designed to run directly on the bare-metal hardware,

More information

CLOUD-SCALE FILE SYSTEMS

CLOUD-SCALE FILE SYSTEMS Data Management in the Cloud CLOUD-SCALE FILE SYSTEMS 92 Google File System (GFS) Designing a file system for the Cloud design assumptions design choices Architecture GFS Master GFS Chunkservers GFS Clients

More information

Distributed Systems 16. Distributed File Systems II

Distributed Systems 16. Distributed File Systems II Distributed Systems 16. Distributed File Systems II Paul Krzyzanowski pxk@cs.rutgers.edu 1 Review NFS RPC-based access AFS Long-term caching CODA Read/write replication & disconnected operation DFS AFS

More information

ΕΠΛ 602:Foundations of Internet Technologies. Cloud Computing

ΕΠΛ 602:Foundations of Internet Technologies. Cloud Computing ΕΠΛ 602:Foundations of Internet Technologies Cloud Computing 1 Outline Bigtable(data component of cloud) Web search basedonch13of thewebdatabook 2 What is Cloud Computing? ACloudis an infrastructure, transparent

More information

Google File System (GFS) and Hadoop Distributed File System (HDFS)

Google File System (GFS) and Hadoop Distributed File System (HDFS) Google File System (GFS) and Hadoop Distributed File System (HDFS) 1 Hadoop: Architectural Design Principles Linear scalability More nodes can do more work within the same time Linear on data size, linear

More information

Virtualization. Pradipta De

Virtualization. Pradipta De Virtualization Pradipta De pradipta.de@sunykorea.ac.kr Today s Topic Virtualization Basics System Virtualization Techniques CSE506: Ext Filesystem 2 Virtualization? A virtual machine (VM) is an emulation

More information

Cloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018

Cloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018 Cloud Computing and Hadoop Distributed File System UCSB CS70, Spring 08 Cluster Computing Motivations Large-scale data processing on clusters Scan 000 TB on node @ 00 MB/s = days Scan on 000-node cluster

More information

CA485 Ray Walshe Google File System

CA485 Ray Walshe Google File System Google File System Overview Google File System is scalable, distributed file system on inexpensive commodity hardware that provides: Fault Tolerance File system runs on hundreds or thousands of storage

More information

Introduction to MapReduce

Introduction to MapReduce Basics of Cloud Computing Lecture 4 Introduction to MapReduce Satish Srirama Some material adapted from slides by Jimmy Lin, Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed

More information

MapReduce. U of Toronto, 2014

MapReduce. U of Toronto, 2014 MapReduce U of Toronto, 2014 http://www.google.org/flutrends/ca/ (2012) Average Searches Per Day: 5,134,000,000 2 Motivation Process lots of data Google processed about 24 petabytes of data per day in

More information

Distributed File Systems II

Distributed File Systems II Distributed File Systems II To do q Very-large scale: Google FS, Hadoop FS, BigTable q Next time: Naming things GFS A radically new environment NFS, etc. Independence Small Scale Variety of workloads Cooperation

More information

Distributed Filesystem

Distributed Filesystem Distributed Filesystem 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributing Code! Don t move data to workers move workers to the data! - Store data on the local disks of nodes in the

More information

Cloud Programming. Programming Environment Oct 29, 2015 Osamu Tatebe

Cloud Programming. Programming Environment Oct 29, 2015 Osamu Tatebe Cloud Programming Programming Environment Oct 29, 2015 Osamu Tatebe Cloud Computing Only required amount of CPU and storage can be used anytime from anywhere via network Availability, throughput, reliability

More information

BigData and Map Reduce VITMAC03

BigData and Map Reduce VITMAC03 BigData and Map Reduce VITMAC03 1 Motivation Process lots of data Google processed about 24 petabytes of data per day in 2009. A single machine cannot serve all the data You need a distributed system to

More information

Virtual Machines. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Virtual Machines. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University Virtual Machines Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Today's Topics History and benefits of virtual machines Virtual machine technologies

More information

CS6030 Cloud Computing. Acknowledgements. Today s Topics. Intro to Cloud Computing 10/20/15. Ajay Gupta, WMU-CS. WiSe Lab

CS6030 Cloud Computing. Acknowledgements. Today s Topics. Intro to Cloud Computing 10/20/15. Ajay Gupta, WMU-CS. WiSe Lab CS6030 Cloud Computing Ajay Gupta B239, CEAS Computer Science Department Western Michigan University ajay.gupta@wmich.edu 276-3104 1 Acknowledgements I have liberally borrowed these slides and material

More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google SOSP 03, October 19 22, 2003, New York, USA Hyeon-Gyu Lee, and Yeong-Jae Woo Memory & Storage Architecture Lab. School

More information

24-vm.txt Mon Nov 21 22:13: Notes on Virtual Machines , Fall 2011 Carnegie Mellon University Randal E. Bryant.

24-vm.txt Mon Nov 21 22:13: Notes on Virtual Machines , Fall 2011 Carnegie Mellon University Randal E. Bryant. 24-vm.txt Mon Nov 21 22:13:36 2011 1 Notes on Virtual Machines 15-440, Fall 2011 Carnegie Mellon University Randal E. Bryant References: Tannenbaum, 3.2 Barham, et al., "Xen and the art of virtualization,"

More information

CS370 Operating Systems

CS370 Operating Systems CS370 Operating Systems Colorado State University Yashwant K Malaiya Spring 2018 Lecture 25 RAIDs, HDFS/Hadoop Slides based on Text by Silberschatz, Galvin, Gagne (not) Various sources 1 1 FAQ Striping:

More information

Hadoop File System S L I D E S M O D I F I E D F R O M P R E S E N T A T I O N B Y B. R A M A M U R T H Y 11/15/2017

Hadoop File System S L I D E S M O D I F I E D F R O M P R E S E N T A T I O N B Y B. R A M A M U R T H Y 11/15/2017 Hadoop File System 1 S L I D E S M O D I F I E D F R O M P R E S E N T A T I O N B Y B. R A M A M U R T H Y Moving Computation is Cheaper than Moving Data Motivation: Big Data! What is BigData? - Google

More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google* 정학수, 최주영 1 Outline Introduction Design Overview System Interactions Master Operation Fault Tolerance and Diagnosis Conclusions

More information

Map-Reduce. Marco Mura 2010 March, 31th

Map-Reduce. Marco Mura 2010 March, 31th Map-Reduce Marco Mura (mura@di.unipi.it) 2010 March, 31th This paper is a note from the 2009-2010 course Strumenti di programmazione per sistemi paralleli e distribuiti and it s based by the lessons of

More information

CSE 124: Networked Services Lecture-16

CSE 124: Networked Services Lecture-16 Fall 2010 CSE 124: Networked Services Lecture-16 Instructor: B. S. Manoj, Ph.D http://cseweb.ucsd.edu/classes/fa10/cse124 11/23/2010 CSE 124 Networked Services Fall 2010 1 Updates PlanetLab experiments

More information

Parallel Programming Principle and Practice. Lecture 10 Big Data Processing with MapReduce

Parallel Programming Principle and Practice. Lecture 10 Big Data Processing with MapReduce Parallel Programming Principle and Practice Lecture 10 Big Data Processing with MapReduce Outline MapReduce Programming Model MapReduce Examples Hadoop 2 Incredible Things That Happen Every Minute On The

More information

CSE 124: Networked Services Fall 2009 Lecture-19

CSE 124: Networked Services Fall 2009 Lecture-19 CSE 124: Networked Services Fall 2009 Lecture-19 Instructor: B. S. Manoj, Ph.D http://cseweb.ucsd.edu/classes/fa09/cse124 Some of these slides are adapted from various sources/individuals including but

More information

Map Reduce. Yerevan.

Map Reduce. Yerevan. Map Reduce Erasmus+ @ Yerevan dacosta@irit.fr Divide and conquer at PaaS 100 % // Typical problem Iterate over a large number of records Extract something of interest from each Shuffle and sort intermediate

More information

COMPUTER ARCHITECTURE. Virtualization and Memory Hierarchy

COMPUTER ARCHITECTURE. Virtualization and Memory Hierarchy COMPUTER ARCHITECTURE Virtualization and Memory Hierarchy 2 Contents Virtual memory. Policies and strategies. Page tables. Virtual machines. Requirements of virtual machines and ISA support. Virtual machines:

More information

Operating Systems 4/27/2015

Operating Systems 4/27/2015 Virtualization inside the OS Operating Systems 24. Virtualization Memory virtualization Process feels like it has its own address space Created by MMU, configured by OS Storage virtualization Logical view

More information

Distributed Systems COMP 212. Lecture 18 Othon Michail

Distributed Systems COMP 212. Lecture 18 Othon Michail Distributed Systems COMP 212 Lecture 18 Othon Michail Virtualisation & Cloud Computing 2/27 Protection rings It s all about protection rings in modern processors Hardware mechanism to protect data and

More information

The Google File System

The Google File System October 13, 2010 Based on: S. Ghemawat, H. Gobioff, and S.-T. Leung: The Google file system, in Proceedings ACM SOSP 2003, Lake George, NY, USA, October 2003. 1 Assumptions Interface Architecture Single

More information

Chapter 5 C. Virtual machines

Chapter 5 C. Virtual machines Chapter 5 C Virtual machines Virtual Machines Host computer emulates guest operating system and machine resources Improved isolation of multiple guests Avoids security and reliability problems Aids sharing

More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung SOSP 2003 presented by Kun Suo Outline GFS Background, Concepts and Key words Example of GFS Operations Some optimizations in

More information

The Google File System. Alexandru Costan

The Google File System. Alexandru Costan 1 The Google File System Alexandru Costan Actions on Big Data 2 Storage Analysis Acquisition Handling the data stream Data structured unstructured semi-structured Results Transactions Outline File systems

More information

Big Data Analytics. Izabela Moise, Evangelos Pournaras, Dirk Helbing

Big Data Analytics. Izabela Moise, Evangelos Pournaras, Dirk Helbing Big Data Analytics Izabela Moise, Evangelos Pournaras, Dirk Helbing Izabela Moise, Evangelos Pournaras, Dirk Helbing 1 Big Data "The world is crazy. But at least it s getting regular analysis." Izabela

More information

The Architecture of Virtual Machines Lecture for the Embedded Systems Course CSD, University of Crete (April 29, 2014)

The Architecture of Virtual Machines Lecture for the Embedded Systems Course CSD, University of Crete (April 29, 2014) The Architecture of Virtual Machines Lecture for the Embedded Systems Course CSD, University of Crete (April 29, 2014) ManolisMarazakis (maraz@ics.forth.gr) Institute of Computer Science (ICS) Foundation

More information

! Design constraints. " Component failures are the norm. " Files are huge by traditional standards. ! POSIX-like

! Design constraints.  Component failures are the norm.  Files are huge by traditional standards. ! POSIX-like Cloud background Google File System! Warehouse scale systems " 10K-100K nodes " 50MW (1 MW = 1,000 houses) " Power efficient! Located near cheap power! Passive cooling! Power Usage Effectiveness = Total

More information

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Data Center Software Architecture: Topic 1: Distributed File Systems GFS (The Google File System) 1 Filesystems

More information

Clustering Lecture 8: MapReduce

Clustering Lecture 8: MapReduce Clustering Lecture 8: MapReduce Jing Gao SUNY Buffalo 1 Divide and Conquer Work Partition w 1 w 2 w 3 worker worker worker r 1 r 2 r 3 Result Combine 4 Distributed Grep Very big data Split data Split data

More information

A BigData Tour HDFS, Ceph and MapReduce

A BigData Tour HDFS, Ceph and MapReduce A BigData Tour HDFS, Ceph and MapReduce These slides are possible thanks to these sources Jonathan Drusi - SCInet Toronto Hadoop Tutorial, Amir Payberah - Course in Data Intensive Computing SICS; Yahoo!

More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung December 2003 ACM symposium on Operating systems principles Publisher: ACM Nov. 26, 2008 OUTLINE INTRODUCTION DESIGN OVERVIEW

More information

A brief history on Hadoop

A brief history on Hadoop Hadoop Basics A brief history on Hadoop 2003 - Google launches project Nutch to handle billions of searches and indexing millions of web pages. Oct 2003 - Google releases papers with GFS (Google File System)

More information

PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS

PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS By HAI JIN, SHADI IBRAHIM, LI QI, HAIJUN CAO, SONG WU and XUANHUA SHI Prepared by: Dr. Faramarz Safi Islamic Azad

More information

TITLE: PRE-REQUISITE THEORY. 1. Introduction to Hadoop. 2. Cluster. Implement sort algorithm and run it using HADOOP

TITLE: PRE-REQUISITE THEORY. 1. Introduction to Hadoop. 2. Cluster. Implement sort algorithm and run it using HADOOP TITLE: Implement sort algorithm and run it using HADOOP PRE-REQUISITE Preliminary knowledge of clusters and overview of Hadoop and its basic functionality. THEORY 1. Introduction to Hadoop The Apache Hadoop

More information

Virtualization. Operating Systems, 2016, Meni Adler, Danny Hendler & Amnon Meisels

Virtualization. Operating Systems, 2016, Meni Adler, Danny Hendler & Amnon Meisels Virtualization Operating Systems, 2016, Meni Adler, Danny Hendler & Amnon Meisels 1 What is virtualization? Creating a virtual version of something o Hardware, operating system, application, network, memory,

More information

Virtualization. Virtualization

Virtualization. Virtualization Virtualization Virtualization Memory virtualization Process feels like it has its own address space Created by MMU, configured by OS Storage virtualization Logical view of disks connected to a machine

More information

CS370 Operating Systems

CS370 Operating Systems CS370 Operating Systems Colorado State University Yashwant K Malaiya Fall 2017 Lecture 27 Virtualization Slides based on Various sources 1 1 Virtualization Why we need virtualization? The concepts and

More information

HDFS: Hadoop Distributed File System. Sector: Distributed Storage System

HDFS: Hadoop Distributed File System. Sector: Distributed Storage System GFS: Google File System Google C/C++ HDFS: Hadoop Distributed File System Yahoo Java, Open Source Sector: Distributed Storage System University of Illinois at Chicago C++, Open Source 2 System that permanently

More information

Introduction to Virtual Machines. Carl Waldspurger (SB SM 89 PhD 95) VMware R&D

Introduction to Virtual Machines. Carl Waldspurger (SB SM 89 PhD 95) VMware R&D Introduction to Virtual Machines Carl Waldspurger (SB SM 89 PhD 95) VMware R&D Overview Virtualization and VMs Processor Virtualization Memory Virtualization I/O Virtualization Typesof Virtualization Process

More information

Google File System. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google fall DIP Heerak lim, Donghun Koo

Google File System. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google fall DIP Heerak lim, Donghun Koo Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google 2017 fall DIP Heerak lim, Donghun Koo 1 Agenda Introduction Design overview Systems interactions Master operation Fault tolerance

More information

Virtualization and memory hierarchy

Virtualization and memory hierarchy Virtualization and memory hierarchy Computer Architecture J. Daniel García Sánchez (coordinator) David Expósito Singh Francisco Javier García Blas ARCOS Group Computer Science and Engineering Department

More information

The Google File System

The Google File System The Google File System By Ghemawat, Gobioff and Leung Outline Overview Assumption Design of GFS System Interactions Master Operations Fault Tolerance Measurements Overview GFS: Scalable distributed file

More information

Virtualization. ! Physical Hardware Processors, memory, chipset, I/O devices, etc. Resources often grossly underutilized

Virtualization. ! Physical Hardware Processors, memory, chipset, I/O devices, etc. Resources often grossly underutilized Starting Point: A Physical Machine Virtualization Based on materials from: Introduction to Virtual Machines by Carl Waldspurger Understanding Intel Virtualization Technology (VT) by N. B. Sahgal and D.

More information

Virtualization. Starting Point: A Physical Machine. What is a Virtual Machine? Virtualization Properties. Types of Virtualization

Virtualization. Starting Point: A Physical Machine. What is a Virtual Machine? Virtualization Properties. Types of Virtualization Starting Point: A Physical Machine Virtualization Based on materials from: Introduction to Virtual Machines by Carl Waldspurger Understanding Intel Virtualization Technology (VT) by N. B. Sahgal and D.

More information

CS427 Multicore Architecture and Parallel Computing

CS427 Multicore Architecture and Parallel Computing CS427 Multicore Architecture and Parallel Computing Lecture 9 MapReduce Prof. Li Jiang 2014/11/19 1 What is MapReduce Origin from Google, [OSDI 04] A simple programming model Functional model For large-scale

More information

The Google File System (GFS)

The Google File System (GFS) 1 The Google File System (GFS) CS60002: Distributed Systems Antonio Bruto da Costa Ph.D. Student, Formal Methods Lab, Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur 2 Design constraints

More information

Distributed System. Gang Wu. Spring,2018

Distributed System. Gang Wu. Spring,2018 Distributed System Gang Wu Spring,2018 Lecture7:DFS What is DFS? A method of storing and accessing files base in a client/server architecture. A distributed file system is a client/server-based application

More information

Learning Outcomes. Extended OS. Observations Operating systems provide well defined interfaces. Virtual Machines. Interface Levels

Learning Outcomes. Extended OS. Observations Operating systems provide well defined interfaces. Virtual Machines. Interface Levels Learning Outcomes Extended OS An appreciation that the abstract interface to the system can be at different levels. Virtual machine monitors (VMMs) provide a lowlevel interface An understanding of trap

More information

Hadoop Distributed File System(HDFS)

Hadoop Distributed File System(HDFS) Hadoop Distributed File System(HDFS) Bu eğitim sunumları İstanbul Kalkınma Ajansı nın 2016 yılı Yenilikçi ve Yaratıcı İstanbul Mali Destek Programı kapsamında yürütülmekte olan TR10/16/YNY/0036 no lu İstanbul

More information

CS November 2017

CS November 2017 Bigtable Highly available distributed storage Distributed Systems 18. Bigtable Built with semi-structured data in mind URLs: content, metadata, links, anchors, page rank User data: preferences, account

More information

OS Virtualization. Why Virtualize? Introduction. Virtualization Basics 12/10/2012. Motivation. Types of Virtualization.

OS Virtualization. Why Virtualize? Introduction. Virtualization Basics 12/10/2012. Motivation. Types of Virtualization. Virtualization Basics Motivation OS Virtualization CSC 456 Final Presentation Brandon D. Shroyer Types of Virtualization Process virtualization (Java) System virtualization (classic, hosted) Emulation

More information

Yuval Carmel Tel-Aviv University "Advanced Topics in Storage Systems" - Spring 2013

Yuval Carmel Tel-Aviv University Advanced Topics in Storage Systems - Spring 2013 Yuval Carmel Tel-Aviv University "Advanced Topics in About & Keywords Motivation & Purpose Assumptions Architecture overview & Comparison Measurements How does it fit in? The Future 2 About & Keywords

More information

Virtual Machines Disco and Xen (Lecture 10, cs262a) Ion Stoica & Ali Ghodsi UC Berkeley February 26, 2018

Virtual Machines Disco and Xen (Lecture 10, cs262a) Ion Stoica & Ali Ghodsi UC Berkeley February 26, 2018 Virtual Machines Disco and Xen (Lecture 10, cs262a) Ion Stoica & Ali Ghodsi UC Berkeley February 26, 2018 Today s Papers Disco: Running Commodity Operating Systems on Scalable Multiprocessors, Edouard

More information

CS 138: Google. CS 138 XVII 1 Copyright 2016 Thomas W. Doeppner. All rights reserved.

CS 138: Google. CS 138 XVII 1 Copyright 2016 Thomas W. Doeppner. All rights reserved. CS 138: Google CS 138 XVII 1 Copyright 2016 Thomas W. Doeppner. All rights reserved. Google Environment Lots (tens of thousands) of computers all more-or-less equal - processor, disk, memory, network interface

More information

HDFS: Hadoop Distributed File System. CIS 612 Sunnie Chung

HDFS: Hadoop Distributed File System. CIS 612 Sunnie Chung HDFS: Hadoop Distributed File System CIS 612 Sunnie Chung What is Big Data?? Bulk Amount Unstructured Introduction Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per

More information

CS 138: Google. CS 138 XVI 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.

CS 138: Google. CS 138 XVI 1 Copyright 2017 Thomas W. Doeppner. All rights reserved. CS 138: Google CS 138 XVI 1 Copyright 2017 Thomas W. Doeppner. All rights reserved. Google Environment Lots (tens of thousands) of computers all more-or-less equal - processor, disk, memory, network interface

More information

Xen and the Art of Virtualization. Nikola Gvozdiev Georgian Mihaila

Xen and the Art of Virtualization. Nikola Gvozdiev Georgian Mihaila Xen and the Art of Virtualization Nikola Gvozdiev Georgian Mihaila Outline Xen and the Art of Virtualization Ian Pratt et al. I. The Art of Virtualization II. Xen, goals and design III. Xen evaluation

More information

The amount of data increases every day Some numbers ( 2012):

The amount of data increases every day Some numbers ( 2012): 1 The amount of data increases every day Some numbers ( 2012): Data processed by Google every day: 100+ PB Data processed by Facebook every day: 10+ PB To analyze them, systems that scale with respect

More information

Google File System and BigTable. and tiny bits of HDFS (Hadoop File System) and Chubby. Not in textbook; additional information

Google File System and BigTable. and tiny bits of HDFS (Hadoop File System) and Chubby. Not in textbook; additional information Subject 10 Fall 2015 Google File System and BigTable and tiny bits of HDFS (Hadoop File System) and Chubby Not in textbook; additional information Disclaimer: These abbreviated notes DO NOT substitute

More information

2/26/2017. The amount of data increases every day Some numbers ( 2012):

2/26/2017. The amount of data increases every day Some numbers ( 2012): The amount of data increases every day Some numbers ( 2012): Data processed by Google every day: 100+ PB Data processed by Facebook every day: 10+ PB To analyze them, systems that scale with respect to

More information

Module 1: Virtualization. Types of Interfaces

Module 1: Virtualization. Types of Interfaces Module 1: Virtualization Virtualization: extend or replace an existing interface to mimic the behavior of another system. Introduced in 1970s: run legacy software on newer mainframe hardware Handle platform

More information

云计算入门 Introduction to Cloud Computing GESC1001

云计算入门 Introduction to Cloud Computing GESC1001 Lecture #6 云计算入门 Introduction to Cloud Computing GESC1001 Philippe Fournier-Viger Professor School of Humanities and Social Sciences philfv8@yahoo.com Fall 2017 1 Introduction Last week: how cloud applications

More information

GFS: The Google File System. Dr. Yingwu Zhu

GFS: The Google File System. Dr. Yingwu Zhu GFS: The Google File System Dr. Yingwu Zhu Motivating Application: Google Crawl the whole web Store it all on one big disk Process users searches on one big CPU More storage, CPU required than one PC can

More information

CS 345A Data Mining. MapReduce

CS 345A Data Mining. MapReduce CS 345A Data Mining MapReduce Single-node architecture CPU Machine Learning, Statistics Memory Classical Data Mining Disk Commodity Clusters Web data sets can be very large Tens to hundreds of terabytes

More information

18-hdfs-gfs.txt Thu Oct 27 10:05: Notes on Parallel File Systems: HDFS & GFS , Fall 2011 Carnegie Mellon University Randal E.

18-hdfs-gfs.txt Thu Oct 27 10:05: Notes on Parallel File Systems: HDFS & GFS , Fall 2011 Carnegie Mellon University Randal E. 18-hdfs-gfs.txt Thu Oct 27 10:05:07 2011 1 Notes on Parallel File Systems: HDFS & GFS 15-440, Fall 2011 Carnegie Mellon University Randal E. Bryant References: Ghemawat, Gobioff, Leung, "The Google File

More information

Programming Systems for Big Data

Programming Systems for Big Data Programming Systems for Big Data CS315B Lecture 17 Including material from Kunle Olukotun Prof. Aiken CS 315B Lecture 17 1 Big Data We ve focused on parallel programming for computational science There

More information

Georgia Institute of Technology ECE6102 4/20/2009 David Colvin, Jimmy Vuong

Georgia Institute of Technology ECE6102 4/20/2009 David Colvin, Jimmy Vuong Georgia Institute of Technology ECE6102 4/20/2009 David Colvin, Jimmy Vuong Relatively recent; still applicable today GFS: Google s storage platform for the generation and processing of data used by services

More information

CS370: Operating Systems [Spring 2017] Dept. Of Computer Science, Colorado State University

CS370: Operating Systems [Spring 2017] Dept. Of Computer Science, Colorado State University Frequently asked questions from the previous class survey CS 370: OPERATING SYSTEMS [VIRTUALIZATION] Shrideep Pallickara Computer Science Colorado State University Difference between physical and logical

More information

Introduction to MapReduce

Introduction to MapReduce Basics of Cloud Computing Lecture 4 Introduction to MapReduce Satish Srirama Some material adapted from slides by Jimmy Lin, Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed

More information

Lecture 11 Hadoop & Spark

Lecture 11 Hadoop & Spark Lecture 11 Hadoop & Spark Dr. Wilson Rivera ICOM 6025: High Performance Computing Electrical and Computer Engineering Department University of Puerto Rico Outline Distributed File Systems Hadoop Ecosystem

More information

GFS: The Google File System

GFS: The Google File System GFS: The Google File System Brad Karp UCL Computer Science CS GZ03 / M030 24 th October 2014 Motivating Application: Google Crawl the whole web Store it all on one big disk Process users searches on one

More information

Google File System. Arun Sundaram Operating Systems

Google File System. Arun Sundaram Operating Systems Arun Sundaram Operating Systems 1 Assumptions GFS built with commodity hardware GFS stores a modest number of large files A few million files, each typically 100MB or larger (Multi-GB files are common)

More information

Introduction to Hadoop and MapReduce

Introduction to Hadoop and MapReduce Introduction to Hadoop and MapReduce Antonino Virgillito THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION Large-scale Computation Traditional solutions for computing large

More information

Virtualization. Dr. Yingwu Zhu

Virtualization. Dr. Yingwu Zhu Virtualization Dr. Yingwu Zhu Virtualization Definition Framework or methodology of dividing the resources of a computer into multiple execution environments. Types Platform Virtualization: Simulate a

More information

Distributed Data Infrastructures, Fall 2017, Chapter 2. Jussi Kangasharju

Distributed Data Infrastructures, Fall 2017, Chapter 2. Jussi Kangasharju Distributed Data Infrastructures, Fall 2017, Chapter 2 Jussi Kangasharju Chapter Outline Warehouse-scale computing overview Workloads and software infrastructure Failures and repairs Note: Term Warehouse-scale

More information

Data Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros

Data Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros Data Clustering on the Parallel Hadoop MapReduce Model Dimitrios Verraros Overview The purpose of this thesis is to implement and benchmark the performance of a parallel K- means clustering algorithm on

More information

CS60021: Scalable Data Mining. Sourangshu Bhattacharya

CS60021: Scalable Data Mining. Sourangshu Bhattacharya CS60021: Scalable Data Mining Sourangshu Bhattacharya In this Lecture: Outline: HDFS Motivation HDFS User commands HDFS System architecture HDFS Implementation details Sourangshu Bhattacharya Computer

More information

Chapter 5. The MapReduce Programming Model and Implementation

Chapter 5. The MapReduce Programming Model and Implementation Chapter 5. The MapReduce Programming Model and Implementation - Traditional computing: data-to-computing (send data to computing) * Data stored in separate repository * Data brought into system for computing

More information

Distributed Systems. 15. Distributed File Systems. Paul Krzyzanowski. Rutgers University. Fall 2017

Distributed Systems. 15. Distributed File Systems. Paul Krzyzanowski. Rutgers University. Fall 2017 Distributed Systems 15. Distributed File Systems Paul Krzyzanowski Rutgers University Fall 2017 1 Google Chubby ( Apache Zookeeper) 2 Chubby Distributed lock service + simple fault-tolerant file system

More information

Lessons Learned While Building Infrastructure Software at Google

Lessons Learned While Building Infrastructure Software at Google Lessons Learned While Building Infrastructure Software at Google Jeff Dean jeff@google.com Google Circa 1997 (google.stanford.edu) Corkboards (1999) Google Data Center (2000) Google Data Center (2000)

More information

CS November 2018

CS November 2018 Bigtable Highly available distributed storage Distributed Systems 19. Bigtable Built with semi-structured data in mind URLs: content, metadata, links, anchors, page rank User data: preferences, account

More information

Distributed Systems. 15. Distributed File Systems. Paul Krzyzanowski. Rutgers University. Fall 2016

Distributed Systems. 15. Distributed File Systems. Paul Krzyzanowski. Rutgers University. Fall 2016 Distributed Systems 15. Distributed File Systems Paul Krzyzanowski Rutgers University Fall 2016 1 Google Chubby 2 Chubby Distributed lock service + simple fault-tolerant file system Interfaces File access

More information

Hedvig as backup target for Veeam

Hedvig as backup target for Veeam Hedvig as backup target for Veeam Solution Whitepaper Version 1.0 April 2018 Table of contents Executive overview... 3 Introduction... 3 Solution components... 4 Hedvig... 4 Hedvig Virtual Disk (vdisk)...

More information

MapReduce: Simplified Data Processing on Large Clusters 유연일민철기

MapReduce: Simplified Data Processing on Large Clusters 유연일민철기 MapReduce: Simplified Data Processing on Large Clusters 유연일민철기 Introduction MapReduce is a programming model and an associated implementation for processing and generating large data set with parallel,

More information

Next-Generation Cloud Platform

Next-Generation Cloud Platform Next-Generation Cloud Platform Jangwoo Kim Jun 24, 2013 E-mail: jangwoo@postech.ac.kr High Performance Computing Lab Department of Computer Science & Engineering Pohang University of Science and Technology

More information

Cloud Computing Virtualization

Cloud Computing Virtualization Cloud Computing Virtualization Anil Madhavapeddy anil@recoil.org Contents Virtualization. Layering and virtualization. Virtual machine monitor. Virtual machine. x86 support for virtualization. Full and

More information

Virtual Machines. To do. q VM over time q Implementation methods q Hardware features supporting VM q Next time: Midterm?

Virtual Machines. To do. q VM over time q Implementation methods q Hardware features supporting VM q Next time: Midterm? Virtual Machines To do q VM over time q Implementation methods q Hardware features supporting VM q Next time: Midterm? *Partially based on notes from C. Waldspurger, VMware, 2010 and Arpaci-Dusseau s Three

More information

Nested Virtualization and Server Consolidation

Nested Virtualization and Server Consolidation Nested Virtualization and Server Consolidation Vara Varavithya Department of Electrical Engineering, KMUTNB varavithya@gmail.com 1 Outline Virtualization & Background Nested Virtualization Hybrid-Nested

More information

System Virtual Machines

System Virtual Machines System Virtual Machines Outline Need and genesis of system Virtual Machines Basic concepts User Interface and Appearance State Management Resource Control Bare Metal and Hosted Virtual Machines Co-designed

More information