CS370 Operating Systems


CS370 Operating Systems
Colorado State University
Yashwant K Malaiya
Spring 2018, Lecture 25: RAIDs, HDFS/Hadoop
Slides based on various sources (this material is not from the Silberschatz, Galvin, Gagne text)

FAQ
- Striping: disks can be accessed in parallel (example: RAID 0). Higher performance and capacity, but not higher reliability.
- Parity: an extra, redundant bit derived from the actual information bits. It can be used for error detection, or for correction provided the bad block is identified.
- Block vs. file striping.
- Can a 1 TB disk be mirrored by two 500 GB disks?
- Is removing soft links easier? A soft link is just a small file that contains a pointer.

CS370 Operating Systems
Colorado State University
Yashwant K Malaiya
Reliability & RAIDs

Reliability
Storage is inherently unreliable. How can it be made more reliable?
- Redundancy. Complete mirrors of the data: 2, 3, or more copies; use a good copy when there is a failure.
- Additional bits: use parity to detect and reconstruct corrupted data.
- Rollback and retry: go back to a previously saved known-good state and recompute.

RAID Structure
RAID (redundant array of inexpensive disks): multiple disk drives providing higher reliability, higher performance/storage capacity, or a combination.
- Mean time to failure: increased by redundancy.
- Mean time to repair: the exposure time during which another failure could cause data loss.
- Mean time to data loss: determined by the two factors above.
If mirrored disks fail independently, consider a disk with a 100,000-hour mean time to failure and a 10-hour mean time to repair. The mean time to data loss is 100,000^2 / (2 * 10) = 500 * 10^6 hours, or about 57,000 years! This is the inverse of the probability that both disks fail within the same 10-hour repair window.
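
That arithmetic can be checked directly; a minimal sketch in Python, using the disk parameters from the slide:

    # Mean time to data loss for a mirrored pair, assuming independent failures.
    # Data is lost only if the second disk fails during the repair of the first.
    mttf = 100_000                # mean time to failure of one disk, hours
    mttr = 10                     # mean time to repair, hours
    mttdl = mttf**2 / (2 * mttr)
    print(mttdl)                  # 5e8 hours
    print(mttdl / (24 * 365))     # roughly 57,000 years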

RAID Techniques
- Striping uses multiple disks in parallel by splitting data: higher performance, no redundancy (e.g., RAID 0).
- Mirroring keeps a duplicate of each disk: higher reliability (e.g., RAID 1).
- Block parity: one disk holds the parity block for the other disks; a failed disk can be rebuilt using parity. Wear is leveled if parity is interleaved across disks (RAID 5; double-parity RAID 6).
Ideas that did not work out: bit- or byte-level striping (RAID 2, 3), bit-level coding theory (RAID 2), a dedicated parity disk (RAID 4).
Nested combinations:
- RAID 01: mirror of RAID 0 stripes
- RAID 10: multiple RAID 1 mirrors, striped
- RAID 50: multiple RAID 5 arrays, striped
- others

RAID: Replicate Data for Availability
- RAID 0: no replication.
- RAID 1: mirror data across two or more disks. (The Google File System replicated its data on three disks, spread across multiple racks.)
- RAID 5: split data across disks, with redundancy to recover from a single disk failure.
- RAID 6: RAID 5, with extra redundancy to recover from two disk failures.

RAID 1: Mirroring
Writes are replicated to both disks; reads can go to either disk.

Parity
Parity block = block1 xor block2 xor block3:
    10001101  block1
    01101100  block2
    11000110  block3
    --------
    00100111  parity block (ensures an even number of 1s in each bit position)
Any missing block can be reconstructed from the others.
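
A minimal sketch of XOR parity in Python, using the 8-bit blocks from the slide:

    # XOR parity: the parity block makes the count of 1s in each bit position even.
    b1, b2, b3 = 0b10001101, 0b01101100, 0b11000110
    parity = b1 ^ b2 ^ b3
    print(f"{parity:08b}")           # 00100111

    # Reconstruction: XOR the surviving blocks with the parity block.
    recovered = b1 ^ b2 ^ parity     # pretend block3 was lost
    print(f"{recovered:08b}")        # 11000110 == block3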

Parity Exercise
Parity block = block1 xor block2 xor block3:
    10001101  block1
    ????????  block2 (bad)
    11000110  block3
    --------
    00100111  parity block (ensures an even number of 1s in each bit position)
Can you reconstruct the bad block using the others?

RAID 5: Rotating Parity

Read Errors and RAID Recovery
Example: RAID 5 with ten 1-TB disks, one of which fails. The remaining disks must be read to reconstruct the missing data.
Probability of an unrecoverable error while reading the 9 surviving disks = 10^-15 * (9 disks * 8 bits * 10^12 bytes/disk) = 7.2%. Thus the recovery probability is 92.8%.
Even better: RAID 6 keeps two redundant disk blocks, so recovery can succeed even in the presence of one bad disk.
Scrubbing: read disk sectors in the background to find and fix latent errors.
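
The same arithmetic as a sketch (the 10^-15 unrecoverable-bit-error rate is the one assumed on the slide):

    # Probability of hitting at least one unrecoverable read error while
    # reading the 9 surviving 1-TB disks during a RAID 5 rebuild.
    ber = 1e-15                  # unrecoverable errors per bit read
    bits_read = 9 * 8 * 1e12     # 9 disks * 8 bits/byte * 10^12 bytes/disk
    p_error = ber * bits_read    # linear approximation, valid for small values
    print(p_error)               # 0.072, i.e., 7.2%
    print(1 - p_error)           # 0.928 recovery probability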

CS370 Operating Systems
Colorado State University
Yashwant K Malaiya
Big Data: Hadoop, HDFS, and MapReduce

Hadoop: Distributed Framework for Big Data
A recent development; not in the book.
- Distributed file system
- Distributed execution

Hadoop: Core Components
Hadoop (originally) = MapReduce + HDFS.
- MapReduce: a programming framework for processing parallelizable problems across huge datasets using a large number of commodity machines.
- HDFS: a distributed file system designed to efficiently allocate data across multiple commodity machines and to provide self-healing functions when some of them go down.
Commodity machines: lower performance per machine, lower cost, and perhaps lower reliability compared with specialized high-performance machines.
Compare RAID (redundant array of inexpensive disks): multiple disks in a single enclosure for higher reliability and/or higher performance.
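
To make the programming model concrete, here is a toy, single-process word count written in the MapReduce style (a sketch only: map_phase and reduce_phase are illustrative names, and real Hadoop jobs are typically written in Java against the MapReduce API):

    from collections import defaultdict

    def map_phase(document):
        # Mapper: emit a (word, 1) pair for every word in the input split.
        for word in document.split():
            yield (word, 1)

    def reduce_phase(pairs):
        # Shuffle: group values by key; Reducer: sum the counts per word.
        groups = defaultdict(list)
        for word, count in pairs:
            groups[word].append(count)
        return {word: sum(counts) for word, counts in groups.items()}

    print(reduce_phase(map_phase("the cat sat on the mat")))
    # {'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1}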

Challenges in Distributed Big Data
Common challenges in distributed systems:
- Node failure: individual computer nodes may overheat, crash, experience hard drive failures, or run out of memory or disk space.
- Network issues: congestion and delays (large data volumes), communication failures.
- Bad data: data may be corrupted, or maliciously or improperly transmitted.
- Other issues: multiple versions of client software may use slightly different protocols; security.

HDFS Architecture
- HDFS block size: 64-128 MB (compare ext4: 4 KB). HDFS files are big.
- A single HDFS cluster can span many nodes, possibly geographically distributed: datacenters -> racks -> blades. A node is a system with CPU and memory.
- Metadata (corresponding to superblocks and inodes): the NameNode holds the metadata, including where blocks are physically located.
- Data (file blocks): DataNodes hold the blocks of files (files are distributed).
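
One reason for the large block size is to keep the NameNode's per-block metadata manageable; a rough comparison, assuming a hypothetical 1 TB file:

    # Number of blocks (hence metadata entries) needed for a 1 TB file.
    file_size = 10**12                  # 1 TB
    print(file_size // (128 * 2**20))   # ~7,450 blocks at HDFS's 128 MB
    print(file_size // (4 * 2**10))     # ~244 million blocks at ext4's 4 KB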

HDFS Architecture (figure: http://a4academics.com/images/hadoop/hadoop-architecture-read-write.jpg)

HDFS write operation (figure credit: CERN)

HDFS Fault Tolerance
- Disks use error-detecting codes to detect corruption. Individual nodes or racks may fail.
- DataNodes (on slave nodes): data is replicated, 3 times by default, with one copy kept far away.
- DataNodes send a periodic heartbeat ("I'm OK") to the NameNode, perhaps once every 10 minutes; the NameNode creates another copy of the affected blocks if the heartbeat stops (see the sketch below).
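
A toy sketch of the NameNode side of the heartbeat protocol (the function names and the 10-minute timeout are illustrative assumptions, not Hadoop's actual API):

    import time

    HEARTBEAT_TIMEOUT = 600    # seconds; assumed 10-minute window
    last_heartbeat = {}        # DataNode id -> time of last heartbeat

    def on_heartbeat(node_id):
        last_heartbeat[node_id] = time.time()

    def find_dead_nodes():
        # Any DataNode silent past the timeout is presumed dead; the
        # NameNode would then re-replicate its blocks on other nodes.
        now = time.time()
        return [n for n, t in last_heartbeat.items()
                if now - t > HEARTBEAT_TIMEOUT]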

HDFS Fault Tolerance: NameNode
NameNode (on the master node) protection:
- A transaction log records file deletes, adds, etc. (only metadata is recorded).
- More replica blocks are created when necessary after a DataNode failure.
- Standby NameNode: a namespace backup. In the event of a failover, the standby ensures that it has read all of the edits from the JournalNodes and then promotes itself to the active state (implementation and delay are version dependent).
NameNode metadata is kept in RAM and checkpointed on disk. On disk the state is stored in two files:
- fsimage: a snapshot of the file system metadata
- editlog: changes since the last snapshot
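
Conceptually this is write-ahead logging plus checkpointing. A minimal sketch of the idea (the file names match the slide, but the JSON format and the functions are illustrative assumptions, not HDFS's actual on-disk layout):

    import json

    def log_edit(op):
        # Append each metadata change to the edit log before applying it
        # to the in-RAM namespace, so it survives a crash.
        with open("editlog", "a") as f:
            f.write(json.dumps(op) + "\n")

    def checkpoint(namespace):
        # Periodically snapshot the namespace and truncate the edit log;
        # recovery = load fsimage, then replay the remaining edits.
        with open("fsimage", "w") as f:
            json.dump(namespace, f)
        open("editlog", "w").close()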

HDFS Command-Line Interface
hadoop fs -help
hadoop fs -ls : list a directory
hadoop fs -mkdir : make a directory in HDFS
hadoop fs -rm : delete a file in HDFS
hadoop fs -copyFromLocal : copy data to HDFS from the local filesystem
hadoop fs -copyToLocal : copy data from HDFS to the local filesystem
Java code can also read or write HDFS files directly (via a URI).
https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/filesystemshell.html

Distributing Tasks
MapReduce engine: the JobTracker splits a job into smaller tasks ("map") and sends them to the TaskTracker process on each node. Each TaskTracker reports back to the JobTracker on job progress, sends partial results ("reduce"), or requests new tasks. Tasks run on local data, thus avoiding movement of bulk data. MapReduce was originally developed for search-engine implementation.

Hadoop Ecosystem Evolution
- Hadoop YARN: a framework for job scheduling and cluster resource management; can run on top of Windows Azure or Amazon S3.
- Apache Spark is more general, faster, and easier to program than MapReduce. See "Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing" (Berkeley, 2012).

CS370 Operating Systems
Colorado State University
Yashwant K Malaiya
Spring 2018, Lecture T: Virtualization
Slides based on various sources

Virtualization
- Why do we need virtualization?
- Concepts and terms
- A brief history of virtualization
- Types of virtualization
- Implementation issues


Virtualization
- Processors have gradually become very powerful, so dedicated servers can be very underutilized (5-15%).
- Virtualization allows a single server to support several virtualized servers; a typical consolidation ratio is 6:1.
- Power is a major expense for data centers; companies frequently locate their data centers in the middle of nowhere, where power is cheap.
- If a hardware server crashes, it would be nice to migrate its load to another one.
- Virtualization is a key component of cloud computing.

Virtual Machines (VM)
- Virtualization technology enables a single PC or server to simultaneously run multiple operating systems, or multiple sessions of a single OS.
- A machine with virtualization can host many applications, including those that run on different operating systems, on a single platform.
- The host operating system can support a number of virtual machines, each of which has the characteristics of a particular OS.
- The software that enables virtualization is a virtual machine monitor (VMM), or hypervisor.

Virtual Machines (VM): diagram contrasting a traditional physical machine with a hypervisor hosting virtual machines.

Virtualization Terminology
- Hypervisor based: full virtualization, paravirtualization
- Host OS virtualization
- Container systems: environment virtualization
- Process/language VMs: Java virtual machine (JDK), Dalvik virtual machine (Android)
- Software simulation of hardware/ISA: SoftPC etc.; emulation using microcode

Brief History
- Early 1960s: IBM experimented with two independently developed hypervisors, SIMMON and CP-40.
- In 1974, Gerald Popek and Robert Goldberg published a paper listing the conditions a computer architecture should satisfy to support virtualization efficiently:
  - Privileged instructions: those that trap if the processor is in user mode and do not trap if it is in system (supervisor) mode.
  - Sensitive instructions: those that attempt to change the configuration of resources in the system, or whose behavior or result depends on that configuration.
  - Theorem 1: For any conventional third-generation computer, an effective VMM may be constructed if the set of sensitive instructions for that computer is a subset of the set of privileged instructions.
- The x86 architecture, which originated in the 1970s, did not meet these requirements for decades.

Brief History (continued)
- Stanford researchers developed a new hypervisor and then founded VMware: the first virtualization solution for x86 (1999), hosting Linux and Windows guests.
- Others followed:
  - Xen, 2003 (University of Cambridge, Xen Project community)
  - KVM, 2007 (merged into the Linux kernel)
  - VirtualBox (Innotek GmbH/Sun/Oracle), 2007
  - Hyper-V (Microsoft), 2008

Implementation of VMMs
- Type 1 hypervisors: operating-system-like software built to provide virtualization; runs on bare metal. Examples include VMware ESX, Joyent SmartOS, and Citrix XenServer. Also includes general-purpose operating systems that provide standard functions as well as VMM functions, such as Microsoft Windows Server with Hyper-V and Red Hat Linux with KVM.
- Type 2 hypervisors: applications that run on standard operating systems but provide VMM features to guest operating systems. Examples include VMware Workstation and Fusion, Parallels Desktop, and Oracle VirtualBox.

Implementation of VMMs (figure: https://microkerneldude.files.wordpress.com/2012/01/type1-vs-2.png)

Market share: all three leading enterprise hypervisors are Type 1.
http://www.virtualizationsoftware.com/top-5-enterprise-type-1-hypervisors/

User Mode and Kernel (Supervisor) Mode
Special instructions behave differently depending on whether they execute in kernel or user mode:
- Sensitive instructions: those whose behavior depends on the mode or that manipulate resource configuration.
- Privileged instructions: those that cause a trap when executed in user mode.
A machine is virtualizable only if the sensitive instructions are a subset of the privileged instructions. Intel's 386 did not satisfy this: several sensitive 386 instructions were simply ignored if executed in user mode. This was fixed in 2005:
- Intel CPUs: VT (Virtualization Technology)
- AMD CPUs: SVM (Secure Virtual Machine)

Virtualization Support
- The hardware creates containers in which VMs can run.
- When a guest OS is started in a container, it continues to run until it causes an exception and traps to the hypervisor, e.g., by executing an I/O instruction.
- The set of operations that trap is controlled by a hardware bitmap set by the hypervisor.
- The trap-and-emulate approach thus becomes possible (see the toy sketch below).
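
A toy illustration of trap-and-emulate dispatch (a plain Python simulation; the operation names in TRAP_SET are made up, and real hypervisors rely on hardware support, not an interpreter):

    # Operations whose trapping the hypervisor has enabled via the bitmap.
    TRAP_SET = {"io_out", "load_cr3"}

    def hypervisor_emulate(op, vm_state):
        # Emulate the sensitive operation against the VM's virtual devices,
        # then resume the guest as if the instruction had executed.
        vm_state.setdefault("trapped", []).append(op)

    def execute(op, vm_state):
        if op in TRAP_SET:
            hypervisor_emulate(op, vm_state)               # trap to hypervisor
        else:
            vm_state.setdefault("direct", []).append(op)   # runs at full speed

    state = {}
    for op in ["add", "io_out", "mov", "load_cr3"]:
        execute(op, state)
    print(state)   # {'direct': ['add', 'mov'], 'trapped': ['io_out', 'load_cr3']}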

Implementation of VMMs: Questions
What problems do you see? What mode does the hypervisor run in? What about guest OSs? Are guest OSs aware of the hypervisor? How is memory managed? How do we know which choice is best?

Terms
- Guest operating system: the OS running on top of the hypervisor.
- Host operating system: for a type 2 hypervisor, the OS that runs on the hardware.