PaaS and Hadoop. Dr. Laiping Zhao ( 赵来平 ) School of Computer Software, Tianjin University


PaaS and Hadoop Dr. Laiping Zhao ( 赵来平 ) School of Computer Software, Tianjin University laiping@tju.edu.cn 1

Outline PaaS; Hadoop: HDFS and MapReduce; YARN; Single-Processor Scheduling; Hadoop Scheduling; Dominant-Resource Fair Scheduling 2

PaaS Platform as a Service (PaaS) is a computing platform that abstracts the infrastructure, OS, and middleware to drive developer productivity. 3

PaaS Delivers the computing platform as a service: consumers develop applications using programming languages and tools supported by the PaaS provider, and deploy the consumer-created applications onto the cloud infrastructure. 4

Core Platform PaaS providers supply a runtime environment for the developer platform. The runtime environment is managed automatically so that consumers can focus on their services: dynamic provisioning (on-demand resource provisioning), load balancing (distributing workload evenly among resources), fault tolerance (continuing to operate in the presence of failures), and system monitoring (monitoring system status and measuring resource usage). 5

PaaS PaaS Vendors 6

Hadoop Distributed File System (GFS vs HDFS); Distributed Data Processing (MapReduce) 7

Motivation: Large Scale Data Processing Many tasks consist of processing lots of data to produce lots of other data. Large-scale data processing: we want to use 1000s of CPUs, but don't want the hassle of managing them. Storage devices fail: 1.7% in year 1 to 8.6% in year 3 (Google, 2007); with 10,000 nodes and 7 disks per node, year 1 sees about 1,190 failures/yr, or 3.3 failures/day. 8

Example: in Astronomy SKA, the Square Kilometer Array (平方公里阵列望远镜). Investment: $2 billion. Data volume: over 12 TB per second. 9

Motivation A data processing system should provide: user-defined functions, automatic parallelization and distribution, fault tolerance, I/O scheduling, and status and monitoring. 10

Google Cloud Computing Google's platform: the Sawzall programming language, the MapReduce parallel programming model, the BigTable distributed database, the Google File System (GFS), the Chubby distributed lock service built on Paxos, and datacenter construction, supporting application services such as Web Search, Log Analysis, Gmail, and Google Maps. Key papers: Interpreting the Data: Parallel Analysis with Sawzall (2005); MapReduce: Simplified Data Processing on Large Clusters (2004); Bigtable: A Distributed Storage System for Structured Data (2006); The Google File System (2003); The Chubby Lock Service for Loosely-Coupled Distributed Systems (2006); Failure Trends in a Large Disk Drive Population (2007); and the white paper The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines (2009). 11

Google Cloud Computing Google technologies and their open-source counterparts: MapReduce → Hadoop MapReduce; BigTable → HBase; Google File System → Hadoop Distributed File System. 12

Motivation: GFS Google needs a file system that supports storing massive data. Buying one (probably including both software and hardware) would be expensive! 13

Motivation: GFS Why not use an existing file system [2003]? Examples: Red Hat GFS, IBM GPFS, Sun Lustre, etc. The problem is different: a different workload, with huge files (100s of GB to TB), data that is rarely updated in place, and reads and appends being common; it must run on commodity hardware and be compatible with Google's services. 14

Could we build a file system that runs on commodity hardware? 15

Motivation: GFS Design overview: The system is built from many inexpensive commodity components that often fail. The system stores a modest number of large files. The workloads primarily consist of two kinds of reads: large streaming reads and small random reads. The workloads also have many large, sequential writes that append data to files. The system must efficiently implement well-defined semantics for multiple clients that concurrently append to the same file. High sustained bandwidth is more important than low latency. 16

Google File System Design of GFS Client: implements the file system API and communicates with the master and chunk servers. Master: a single master maintains all file system metadata. Chunk server: stores data chunks on local disks as Linux files. 17

Google File System Minimize the master's involvement in all operations. Decouple the flow of data from the flow of control to use the network efficiently. A large chunk size (64 MB/128 MB): it reduces clients' need to interact with the master, because reads and writes on the same chunk require only one initial request to the master for chunk location information; because a client is more likely to perform many operations on a large chunk, it can reduce network overhead by keeping a persistent TCP connection to the chunkserver over an extended period of time; and it reduces the size of the metadata stored on the master. 18
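As a small illustration of the first benefit, the client library can compute which chunk an offset falls in on its own, and only then ask the master for that chunk's location. A minimal sketch (the class and method names are made up for illustration):

```java
// Sketch only: mapping a file byte offset to a chunk index, client-side,
// before contacting the master for that chunk's location.
public class ChunkMath {
    static final long CHUNK_SIZE = 64L * 1024 * 1024;   // 64 MB, as above

    static long chunkIndex(long byteOffset) {
        return byteOffset / CHUNK_SIZE;
    }

    public static void main(String[] args) {
        // A read at offset 200 MB falls in chunk index 3.
        System.out.println(chunkIndex(200L * 1024 * 1024));
    }
}
```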

Google File System Master operations: namespace management and locking; replica placement (by default, each chunk is replicated 3 times); chunk creation, re-replication, and rebalancing; garbage collection (after a file is deleted, GFS does not immediately reclaim the available physical storage); and stale replica detection (the master removes stale replicas in its regular garbage collection). 19

HDFS The Apache Hadoop software library: a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. HDFS (Hadoop Distributed File System): a distributed file system that provides high-throughput access to application data, written in the Java programming language. 20

HDFS Namenode: corresponds to GFS's Master. Secondary Namenode: checkpoints the Namenode's metadata (commonly, if somewhat misleadingly, described as the Namenode's backup). Datanode: corresponds to GFS's Chunkserver. 21

Hadoop Hadoop Cluster 22

Heartbeat Each DataNode periodically reports to the NameNode: 1. I am alive; 2. its blocks table. 23

HDFS Read RPC 24

HDFS Read Network topology and Hadoop 25

HDFS Write 26

HDFS Write HDFS replica placement: a trade-off among reliability, write bandwidth, and read bandwidth. Default strategy: the first replica is placed on the same node as the client (a random node when the client is outside the cluster); the second replica is placed on a different rack from the first (off-rack), chosen at random; the third replica is placed on the same rack as the second, but on a different node chosen at random. 27
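A minimal sketch of these three rules, using made-up Node/host/rack types rather than HDFS's real block placement classes (it assumes the cluster has at least two racks):

```java
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

// Hypothetical (host, rack) pair; not an HDFS type.
record Node(String host, String rack) {}

class ReplicaPlacementSketch {
    private final Random rnd = new Random();

    Node[] chooseTargets(Node client, List<Node> cluster) {
        // 1st replica: the client's own node if it is inside the cluster, else random.
        Node first = cluster.contains(client) ? client : pick(cluster, n -> true);
        // 2nd replica: a random node on a different rack (off-rack).
        Node second = pick(cluster, n -> !n.rack().equals(first.rack()));
        // 3rd replica: a different node on the same rack as the second.
        Node third = pick(cluster, n -> n.rack().equals(second.rack()) && !n.equals(second));
        return new Node[] {first, second, third};
    }

    private Node pick(List<Node> cluster, Predicate<Node> ok) {
        List<Node> candidates = cluster.stream().filter(ok).toList();
        return candidates.get(rnd.nextInt(candidates.size()));
    }
}
```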

MapReduce Jeff Dean 28

Motivation 29

Motivation Every search: 200+ CPUs, 200 TB of data, 0.1 second response time, 5¢ of revenue. 30

Motivation Web data sets can be very large: Tens to hundreds of terabytes Cannot mine on a single server Data processing examples: Word Count Google Trends PageRank 31

Motivation A simple problem that is difficult to solve at scale: how to solve it within bounded time. Divide and conquer! 32

MapReduce MapReduce: a programming model and an associated implementation for processing and generating large datasets, amenable to a broad variety of real-world tasks. Map: takes an input pair and produces a set of intermediate key/value pairs. Reduce: accepts an intermediate key I and a set of values for that key, and merges these values together to form a possibly smaller set of values. 33

MapReduce 34

MapReduce 35

Example: WordCount Input: Page 1: the weather is good Page 2: today is good Page 3: good weather is good 36

Example: WordCount map output: Worker 1: (the 1), (weather 1), (is 1), (good 1). Worker 2: (today 1), (is 1), (good 1). Worker 3: (good 1), (weather 1), (is 1), (good 1). 37

Example: WordCount Input of Reduce: Worker 1: (the 1) Worker 2: (is 1), (is 1), (is 1) Worker 3: (weather 1), (weather 1) Worker 4: (today 1) Worker 5: (good 1), (good 1), (good 1), (good 1) 38

Example: WordCount Reduce output: Worker 1: (the 1) Worker 2: (is 3) Worker 3: (weather 2) Worker 4: (today 1) Worker 5: (good 4) 39
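The same data flow can be reproduced with a few lines of plain Java (no Hadoop involved), which makes the grouping that happens between map and reduce concrete; a sketch:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Simulates the word-count data flow above: map, shuffle (group by key), reduce.
public class WordCountByHand {
    public static void main(String[] args) {
        List<String> pages = List.of(
            "the weather is good", "today is good", "good weather is good");

        // Map: one (word, 1) pair per token.
        List<Map.Entry<String, Integer>> mapped = pages.stream()
            .flatMap(p -> Arrays.stream(p.split(" ")))
            .map(w -> Map.entry(w, 1))
            .collect(Collectors.toList());

        // Shuffle: group the values by key, as the framework does between map and reduce.
        Map<String, List<Integer>> grouped = mapped.stream()
            .collect(Collectors.groupingBy(Map.Entry::getKey,
                     Collectors.mapping(Map.Entry::getValue, Collectors.toList())));

        // Reduce: sum each key's values -> good=4, is=3, the=1, today=1, weather=2.
        Map<String, Integer> counts = new TreeMap<>();
        grouped.forEach((word, ones) ->
            counts.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        System.out.println(counts);
    }
}
```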


Hadoop MapReduce Programming Class Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>: maps input key/value pairs to a set of intermediate key/value pairs. map() method: called once for each key/value pair in the input split; most applications should override this. 43

Hadoop MapReduce Programming Context object: allows the Mapper/Reducer to interact with the rest of the Hadoop system. It includes configuration data for the job as well as interfaces which allow it to emit output. Applications can use the Context to report progress, to set application-level status messages, to update Counters, to indicate they are alive, and to get values stored in the job configuration across the map/reduce phases. 44
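A word-count Mapper in this style might look like the following sketch (written against the org.apache.hadoop.mapreduce API; the counter group and name strings are illustrative):

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits (word, 1) for every token in its input split, as in the
// word-count walkthrough earlier (e.g. ("good", 1) four times).
public class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Context used to update a Counter and report liveness.
        context.getCounter("WordCount", "LINES_SEEN").increment(1);
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);   // intermediate key/value pair
        }
        context.progress();             // tell the framework the task is alive
    }
}
```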

Hadoop MapReduce Programming Class Reducer<KEYIN,VALUEIN,KEYOUT,VALUEOUT>: reduces a set of intermediate values which share a key to a smaller set of values. Reducer has 3 primary phases: Shuffle (copy the sorted output from each Mapper across the network using HTTP), Sort (sort Reducer inputs by key), and Reduce (the reduce() method is called). 45

Hadoop MapReduce Programming reduce() method: this method is called once for each key. Most applications will define their reduce class by overriding this method. 46
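A matching word-count Reducer, again as a sketch:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sums the 1s emitted for each word, e.g. ("good", [1,1,1,1]) -> ("good", 4).
public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
```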

Apache Hadoop The project includes these modules: Hadoop Common (the common utilities that support the other Hadoop modules), Hadoop YARN (a framework for job scheduling and cluster resource management), the Hadoop Distributed File System (HDFS), and Hadoop MapReduce. Other Hadoop-related projects at Apache include: Ambari, Avro, Cassandra (a scalable database), HBase (a distributed database), Hive (data summarization and ad hoc querying), Pig (a data-flow language), Spark (a fast and general compute engine), Tez (executes an arbitrary DAG of tasks), Chukwa, and ZooKeeper (a coordination service). 47

What is YARN? Yet Another Resource Negotiator. Provides resource management services Scheduling Monitoring Control Replaces the resource management services of the JobTracker. Bundled with Hadoop 0.23 and Hadoop 2.x. 48

Why YARN? 49

Why YARN? The Hadoop JobTracker was a barrier to scaling, and is the primary reason Hadoop 1.x is recommended for clusters no larger than 4,000 nodes. With thousands of applications each running tens of thousands of tasks, the JobTracker was not able to schedule resources as fast as they became available, and the distinct map and reduce slots led to artificial bottlenecks and low cluster utilization. 50

Why YARN? MapReduce was being abused by other application frameworks: frameworks tried to work around sort and shuffle, and iterative algorithms were suboptimal. YARN strives to be application-framework agnostic: different application types can share the same cluster, and MapReduce runs out of the box as part of Apache Hadoop. 51

YARN High-Level Architecture ResourceManager: a single, centralized daemon for scheduling containers; monitors nodes and applications. NodeManager: a daemon running on each worker node in the cluster; launches, monitors, and controls containers. ApplicationMaster: provides scheduling, monitoring, and control for one application instance; the RM launches an AM for each application submitted to the cluster, and the AM requests containers via the RM and launches them via the NMs. Containers: the unit of allocation and control for YARN; the ApplicationMaster and application-specific tasks run within containers. 52
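The control flow just described can be sketched with deliberately made-up types (ResourceManagerClient, NodeManagerClient, ContainerSpec, and Container below are placeholders, not the real org.apache.hadoop.yarn client API):

```java
import java.util.List;

// Hypothetical, simplified types to illustrate the YARN control flow above.
record ContainerSpec(int vCores, int memoryMb) {}
record Container(String id, String node) {}

interface ResourceManagerClient {
    String submitApplication(ContainerSpec amSpec);                    // RM launches an AM
    List<Container> requestContainers(ContainerSpec spec, int count);  // AM asks RM for containers
}

interface NodeManagerClient {
    void launch(Container container, String command);                  // AM launches tasks via NMs
}

class ApplicationMasterSketch {
    // The AM requests containers from the RM, then starts its tasks through the NMs.
    void run(ResourceManagerClient rm, NodeManagerClient nm, int numMapTasks) {
        List<Container> containers =
            rm.requestContainers(new ContainerSpec(1, 1024), numMapTasks);
        for (Container c : containers) {
            nm.launch(c, "run a map task");
        }
    }
}
```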

YARN High-Level Architecture 53

MapReduce on YARN (a sequence of figures, slides 54-60)

Scheduling 61

Why Scheduling? Multiple tasks to schedule: the processes on a single-core OS, the tasks of a Hadoop job, the tasks of multiple Hadoop jobs. Limited resources that these tasks require: processor(s), memory, and (less contentious) disk and network. Scheduling goals: 1. good throughput or response time for tasks (or jobs); 2. high utilization of resources. 62

Single Processor Scheduling 63

FIFO Scheduling/FCFS Maintain tasks in a queue in order of arrival. When processor free, dequeue head and schedule it. 64

FIFO/FCFS Performance Average completion time may be high. For the example on the previous slides (three tasks with running times 10, 5, and 3, arriving in that order): average completion time of FIFO/FCFS = (Task 1 + Task 2 + Task 3)/3 = (10+15+18)/3 = 43/3 = 14.33 65

STF Scheduling (Shortest Task First) Maintain all tasks in a queue, in increasing order of running time. When processor free, dequeue head and schedule. 66

STF Is Optimal! The average completion time of STF is the shortest among all scheduling approaches! For the same example, average completion time of STF = (Task 1 + Task 2 + Task 3)/3 = (18+8+3)/3 = 29/3 = 9.66 (versus 14.33 for FIFO/FCFS). In general, STF is a special case of priority scheduling: instead of using running time as the priority, the scheduler could use a user-provided priority. 67
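Both averages can be checked with a few lines of code; a sketch assuming the three task lengths 10, 5, and 3 implied by the numbers above:

```java
import java.util.Arrays;

// Checks the two averages above, assuming task running times of 10, 5 and 3
// (FIFO completes them at times 10, 15, 18; STF at 3, 8, 18).
public class SchedulingDemo {
    static double avgCompletionTime(int[] runTimes) {
        double clock = 0, total = 0;
        for (int t : runTimes) {
            clock += t;        // each task finishes once all earlier ones are done
            total += clock;
        }
        return total / runTimes.length;
    }

    public static void main(String[] args) {
        int[] arrivalOrder = {10, 5, 3};       // FIFO runs tasks in arrival order
        int[] stfOrder = arrivalOrder.clone();
        Arrays.sort(stfOrder);                 // STF runs the shortest task first
        System.out.println("FIFO/FCFS: " + avgCompletionTime(arrivalOrder)); // ~14.33
        System.out.println("STF:       " + avgCompletionTime(stfOrder));     // ~9.67
    }
}
```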

Round-Robin Scheduling Use a quantum (say 1 time unit) to run a portion of the task at the queue head. Pre-empt the task by saving its state and resuming it later. After pre-empting, add it to the end of the queue. 68
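For completeness, the same three tasks under round-robin with a 1-unit quantum (a sketch; the task lengths 10, 5, and 3 are the same assumption as before):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Round-robin with a 1-unit quantum over the same three tasks (lengths 10, 5, 3).
public class RoundRobinDemo {
    public static void main(String[] args) {
        int[] remaining = {10, 5, 3};
        int quantum = 1, clock = 0;
        Deque<Integer> queue = new ArrayDeque<>(List.of(0, 1, 2));  // task indices, arrival order
        while (!queue.isEmpty()) {
            int task = queue.pollFirst();
            int slice = Math.min(quantum, remaining[task]);
            clock += slice;
            remaining[task] -= slice;
            if (remaining[task] > 0) {
                queue.addLast(task);            // pre-empt: back of the queue
            } else {
                System.out.println("Task " + (task + 1) + " completes at t=" + clock);
            }
        }
    }
}
```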

Round-Robin vs. STF/FIFO Round-Robin preferable for: Interactive applications. User needs quick responses from system. FIFO/STF preferable for Batch applications User submits jobs, goes away, comes back to get result. 69

Hadoop Scheduling Activities: Mappers, Reducers. Resources: TaskTrackers. Scheduling goal: time efficiency. Scheduler: JobTracker (MRv1) / RM (YARN). Default scheduling algorithm: FIFO. 70

FIFO in Hadoop Supports 5 levels of priority. Tasks are sorted according to their priority and submission time. Step 1: select from the list of tasks with the highest priority. Step 2: select the task with the earliest submission time in that list. Assign the selected task to a TaskTracker nearest to the target data. 71
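Steps 1 and 2 amount to sorting by (priority, submission time); a sketch with a made-up JobInfo record:

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical job descriptor: a higher 'priority' is more urgent,
// 'submitTime' is a millisecond timestamp.
record JobInfo(String id, int priority, long submitTime) {}

public class FifoPriorityOrder {
    // Highest priority first, earliest submission time breaking ties.
    static final Comparator<JobInfo> ORDER =
        Comparator.<JobInfo>comparingInt(JobInfo::priority).reversed()
                  .thenComparingLong(JobInfo::submitTime);

    public static void main(String[] args) {
        List<JobInfo> queue = List.of(
            new JobInfo("j1", 2, 100), new JobInfo("j2", 5, 300), new JobInfo("j3", 5, 200));
        System.out.println(queue.stream().sorted(ORDER).toList());  // j3, j2, j1
    }
}
```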

FIFO in Hadoop Improve data locality to reduce communication: same node, same rack, remote rack. 72

FIFO in Hadoop A short task submitted later may have to wait a very long time if a previously submitted task is quite time-consuming. The job queue: User 1, User 2, User 3, User 4. 73

Hadoop Fair Scheduler Job Scheduling for Multi-User MapReduce Clusters, 2009. Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling, M. Zaharia et al., EuroSys 2010. 74

Hadoop Fair Scheduler Design goals: Isolation Give each user (job) the illusion of owning (running) a private cluster. Statistical Multiplexing Redistribute capacity unused by some users (jobs) to other users (jobs). 75

Hadoop Fair Scheduler A two-level hierarchy. At the top level, FAIR allocates task slots across pools, and each pool receives its minimum share. At the second level, each pool allocates its slots among the jobs in the pool. In the example, Pools 1 and 3 have minimum shares of 60 and 10 slots, respectively; because Pool 3 is not using its share, its slots are given to Pool 2. Each user can choose their own internal scheduling algorithm (FIFO or Fair). 76

Hadoop Fair Scheduler d: the demand (capacity); m: the minimum share. 77

Hadoop Fair Scheduler FAIR operates in three phases. Phase 1: It fills each unmarked bucket, i.e., it satisfies the demand of each bucket whose minimum share is larger than its demand. Phase 2: It fills all remaining buckets up to their marks. With this step, the isolation property is enforced as each bucket has received either its minimum share, or its demand has been satisfied. Phase 3: FAIR implements statistical multiplexing by pouring the remaining water evenly into unfilled buckets, starting with the bucket with the least water and continuing until all buckets are full or the water runs out. 78
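A minimal sketch of this water-filling allocation follows. It is illustrative only, not the real Fair Scheduler code; the minimum shares (60, 0, 10) come from the earlier example, while the demands (80, 50, 0 slots) and the total capacity of 100 slots are assumed for illustration:

```java
import java.util.Arrays;

// Illustrative sketch of the three-phase allocation described above.
// d[i] is pool i's demand, m[i] its minimum share, 'capacity' the total slots.
public class FairShareSketch {
    static double[] allocate(double[] d, double[] m, double capacity) {
        int n = d.length;
        double[] alloc = new double[n];
        // Phases 1 and 2: every pool gets min(demand, minimum share).
        for (int i = 0; i < n; i++) {
            alloc[i] = Math.min(d[i], m[i]);
            capacity -= alloc[i];
        }
        // Phase 3: water-filling. Find the level L such that topping every pool
        // up to min(L, demand) uses exactly the remaining capacity, so the
        // least-filled buckets are filled first.
        double lo = 0, hi = Arrays.stream(d).max().orElse(0);
        for (int iter = 0; iter < 60; iter++) {
            double level = (lo + hi) / 2, needed = 0;
            for (int i = 0; i < n; i++) {
                needed += Math.max(0, Math.min(level, d[i]) - alloc[i]);
            }
            if (needed > capacity) hi = level; else lo = level;
        }
        for (int i = 0; i < n; i++) {
            alloc[i] = Math.max(alloc[i], Math.min(lo, d[i]));
        }
        return alloc;
    }

    public static void main(String[] args) {
        // Pools 1-3: minimum shares 60, 0, 10; Pool 3 is idle, so its unused
        // share flows to Pool 2. Prints roughly [60.0, 40.0, 0.0].
        System.out.println(Arrays.toString(
            allocate(new double[] {80, 50, 0}, new double[] {60, 0, 10}, 100)));
    }
}
```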

Hadoop Fair Scheduler FAIR uses two timeouts: one for guaranteeing the minimum share (Tmin) and one for guaranteeing the fair share (Tfair), with Tmin < Tfair. If a newly started job does not get its minimum share before Tmin expires, FAIR kills other pools' tasks and re-allocates them to the job. Then, if the job has not achieved its fair share by Tfair, FAIR kills more tasks. It picks the most recently launched tasks in over-scheduled jobs to minimize wasted computation. 79

Estimating Task Lengths HCS/HFS (the Hadoop Capacity and Fair Schedulers) use FIFO internally, which may not be optimal (as we know!). Why not use shortest-task-first instead? It's optimal (as we know!). Challenge: it is hard to know a task's expected running time before it has completed. Solution: estimate the task length. Some approaches: within a job, calculate a task's running time as proportional to the size of its input; across tasks, calculate a task's running time in a given job as the average of the other tasks in that job (weighted by input size). There are lots of recent research results in this area! 80
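One simple way to realize the within-a-job estimate is to learn a bytes-per-second rate from that job's completed tasks; a purely illustrative sketch:

```java
import java.util.List;

// Illustrative estimator for the "within a job" heuristic: assume running time
// is proportional to input size, with the rate learned from this job's
// already-completed tasks (weighted by input size).
class TaskLengthEstimator {
    static double estimateSeconds(long newTaskInputBytes,
                                  List<Long> completedInputBytes,
                                  List<Double> completedSeconds) {
        double totalBytes = completedInputBytes.stream().mapToLong(Long::longValue).sum();
        double totalSeconds = completedSeconds.stream().mapToDouble(Double::doubleValue).sum();
        double bytesPerSecond = totalBytes / totalSeconds;   // input-size-weighted rate
        return newTaskInputBytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        // Two finished tasks: 128 MB in 40 s and 64 MB in 20 s -> 3.2 MB/s.
        double est = estimateSeconds(256L << 20,
            List.of(128L << 20, 64L << 20), List.of(40.0, 20.0));
        System.out.printf("estimated running time: %.1f s%n", est);   // ~80 s
    }
}
```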

Dominant-Resource Fair Scheduling Ali Ghodsi, Matei Zaharia, et al., Dominant Resource Fairness: Fair Allocation of Multiple Resource Types. NSDI 2011 81

Challenge What about scheduling VMs in a cloud (cluster)? Jobs may have multi-resource requirements: Job 1's tasks: 2 CPUs, 8 GB; Job 2's tasks: 6 CPUs, 2 GB. How do you schedule these jobs in a fair manner? That is, how many tasks of each job do you allow the system to run concurrently? What does fairness even mean? 82

Dominant Resource Fairness (DRF) Proposed by researchers from U. California, Berkeley. It proposes a notion of fairness across jobs with multi-resource requirements. They showed that DRF is: fair for multi-tenant systems; strategy-proof (a tenant can't benefit by lying); and envy-free (a tenant can't envy another tenant's allocations). Dominant Resource Fairness: Fair Allocation of Multiple Resource Types. NSDI 2011. 83

Where is DRF Useful? DRF is usable for scheduling VMs in a cluster and for scheduling Hadoop in a cluster. DRF is used in Mesos, an OS intended for cloud environments. DRF-like strategies are also used in some cloud computing companies' distributed OSes. 84

How DRF Works? Our example: Job 1's tasks: 2 CPUs, 8 GB => Job 1's resource vector = <2 CPUs, 8 GB>. Job 2's tasks: 6 CPUs, 2 GB => Job 2's resource vector = <6 CPUs, 2 GB>. Consider a cloud with <18 CPUs, 36 GB RAM>. 85

How DRF Works? Each Job 1 task consumes 2/18 = 1/9 of the total CPUs and 8/36 = 2/9 of the total RAM. Since 1/9 < 2/9, Job 1's dominant resource is RAM, i.e., Job 1 is more memory-intensive than it is CPU-intensive. 86

How DRF Works? Each Job 2 task consumes 6/18 = 1/3 of the total CPUs and 2/36 = 1/18 of the total RAM. Since 1/3 > 1/18, Job 2's dominant resource is CPU, i.e., Job 2 is more CPU-intensive than it is memory-intensive. 87

DRF Fairness For every job, the percentage of its dominant resource type that it gets cluster-wide is the same: Job 1's % of RAM = Job 2's % of CPU. This can be written as linear equations and solved. 88
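For the running example, the equations can be written out explicitly (x and y below are the task counts given to Jobs 1 and 2; this is a worked-out version of the slide's claim, not taken from the slides themselves):

```latex
% x = number of Job 1 tasks, y = number of Job 2 tasks
\begin{align*}
\text{maximize } & x,\; y \\
\text{subject to } \quad 2x + 6y &\le 18 && \text{(CPUs)} \\
                   8x + 2y &\le 36 && \text{(GB RAM)} \\
                   \frac{8x}{36} &= \frac{6y}{18} && \text{(equal dominant shares)}
\end{align*}
% The fairness constraint gives y = 2x/3. Substituting into the CPU constraint:
% 2x + 4x = 6x <= 18, so x <= 3 (RAM allows x <= 27/7). Hence x = 3, y = 2.
```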

DRF Solution DRF ensures Job 1's % of RAM = Job 2's % of CPU. Solution for our example: Job 1 gets 3 tasks, each with <2 CPUs, 8 GB>; Job 2 gets 2 tasks, each with <6 CPUs, 2 GB>. Job 1's % of RAM = number of tasks * RAM per task / total cluster RAM = 3*8/36 = 2/3. Job 2's % of CPU = number of tasks * CPUs per task / total cluster CPUs = 2*6/18 = 2/3. 89
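The same allocation falls out of the paper's progressive-filling algorithm: repeatedly give one task to the job with the currently smallest dominant share. A sketch specialized to this two-job example:

```java
// Sketch of DRF's progressive filling for the two-job example above:
// cluster <18 CPUs, 36 GB>, Job 1 tasks <2 CPUs, 8 GB>, Job 2 tasks <6 CPUs, 2 GB>.
public class DrfDemo {
    // Dominant share = max over resources of (used / total capacity).
    static double dominantShare(double[] used, double[] capacity) {
        return Math.max(used[0] / capacity[0], used[1] / capacity[1]);
    }

    public static void main(String[] args) {
        double[] capacity = {18, 36};              // {CPUs, GB RAM}
        double[][] demand = {{2, 8}, {6, 2}};      // per-task demand of each job
        double[][] used = new double[2][2];
        int[] tasks = new int[2];

        while (true) {
            // Give the next task to the job with the smallest dominant share.
            int next = dominantShare(used[0], capacity) <= dominantShare(used[1], capacity) ? 0 : 1;
            double cpuLeft = capacity[0] - used[0][0] - used[1][0];
            double memLeft = capacity[1] - used[0][1] - used[1][1];
            // Simplification: stop as soon as the picked job's next task no longer fits.
            if (demand[next][0] > cpuLeft || demand[next][1] > memLeft) break;
            used[next][0] += demand[next][0];
            used[next][1] += demand[next][1];
            tasks[next]++;
        }
        // Prints: Job 1: 3 tasks, Job 2: 2 tasks (each gets 2/3 of its dominant resource).
        System.out.printf("Job 1: %d tasks, Job 2: %d tasks%n", tasks[0], tasks[1]);
    }
}
```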

Other DRF Details DRF generalizes to multiple jobs. DRF also generalizes to more than 2 resource types: CPU, RAM, network, disk, etc. DRF ensures that each job gets a fair share of the resource type that the job desires the most; hence fairness. 90

Summary Scheduling is a very important problem in cloud computing: limited resources, and lots of jobs requiring access to those resources. Single-processor scheduling: FIFO/FCFS, STF, priority, round-robin. Hadoop scheduling: FIFO scheduler, Fair scheduler. Dominant-Resource Fairness. 91