Partitioning Computers. Rolf M Dietze,

Similar documents
IBM p5 and pseries Enterprise Technical Support AIX 5L V5.3. Download Full Version :

iseries Tech Talk Linux on iseries Technical Update 2004

Vendor: IBM. Exam Code: C Exam Name: Power Systems with POWER8 Scale-out Technical Sales Skills V1. Version: Demo

System i and System p. Creating a virtual computing environment

OPENSPARC T1 OVERVIEW

IBM's POWER5 Micro Processor Design and Methodology

BMC Capacity Optimization Extended Edition AIX Enhancements

CS370 Operating Systems

Live Partition Mobility Update

Planning for Virtualization on System P

CS 152 Computer Architecture and Engineering

Ron Kalla, Balaram Sinharoy, Joel Tendler IBM Systems Group

Power Systems with POWER8 Scale-out Technical Sales Skills V1

IBM p5 Virtualization Technical Support AIX 5L V5.3. Download Full Version :

CPU Scheduling. Operating Systems (Fall/Winter 2018) Yajin Zhou ( Zhejiang University

Module 18: "TLP on Chip: HT/SMT and CMP" Lecture 39: "Simultaneous Multithreading and Chip-multiprocessing" TLP on Chip: HT/SMT and CMP SMT

Computer-System Organization (cont.)

WHY PARALLEL PROCESSING? (CE-401)

Multiprocessor Systems. COMP s1

Planning and Sizing Virtualization. Jaqui Lynch Architect, Systems Engineer Mainline Information Systems

CS 152 Computer Architecture and Engineering

Non-uniform memory access machine or (NUMA) is a system where the memory access time to any region of memory is not the same for all processors.

CS 152 Computer Architecture and Engineering

IBM EXAM QUESTIONS & ANSWERS

System i and System p. Capacity on Demand

IBM _` p5 570 servers

Multiprocessor Systems. Chapter 8, 8.1

June IBM Power Academy. IBM PowerVM memory virtualization. Luca Comparini STG Lab Services Europe IBM FR. June,13 th Dubai

Multiprocessor System. Multiprocessor Systems. Bus Based UMA. Types of Multiprocessors (MPs) Cache Consistency. Bus Based UMA. Chapter 8, 8.

Multithreaded Processors. Department of Electrical Engineering Stanford University

What SMT can do for You. John Hague, IBM Consultant Oct 06

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading

CPU Scheduling: Objectives

February 5, 2008 Virtualization Trends On IBM s System P Unraveling The Benefits In IBM s PowerVM

Multiple Processor Systems. Lecture 15 Multiple Processor Systems. Multiprocessor Hardware (1) Multiprocessors. Multiprocessor Hardware (2)

Performance Impacts of Using Shared ICF CPs

Chapter 07: Instruction Level Parallelism VLIW, Vector, Array and Multithreaded Processors. Lesson 06: Multithreaded Processors

Chapter 18 Parallel Processing

CS425 Computer Systems Architecture

Lecture 9: MIMD Architectures

Virtualization Benefits IBM Corporation

CS 326: Operating Systems. CPU Scheduling. Lecture 6

Parallel Processing. Computer Architecture. Computer Architecture. Outline. Multiple Processor Organization

IBM Exam A Virtualization Technical Support for AIX and Linux Version: 6.0 [ Total Questions: 93 ]

IBM Power Systems Technical Support for AIX and Linux. Download Full Version :

IBM SYSTEM POWER7. PowerVM. Jan Kristian Nielsen Erik Rex IBM Corporation

pseries Partitioning AIX 101 Jaqui Lynch USERblue 2004

WebSphere. Virtual Enterprise Version Virtualization and WebSphere Virtual Enterprise Version 6.1.1

p5 520 server Robust entry system designed for the on demand world Highlights

Actual4Test. Actual4test - actual test exam dumps-pass for IT exams

Chapter 2. Parallel Hardware and Parallel Software. An Introduction to Parallel Programming. The Von Neuman Architecture

SMD149 - Operating Systems - Multiprocessing

Overview. SMD149 - Operating Systems - Multiprocessing. Multiprocessing architecture. Introduction SISD. Flynn s taxonomy

For use by students enrolled in #71251 CSE430 Fall 2012 at Arizona State University. Do not use if not enrolled.

IBM PowerVM. Virtualization without limits. Highlights. IBM Systems and Technology Data Sheet

CEC 450 Real-Time Systems

Chapter 6: CPU Scheduling. Operating System Concepts 9 th Edition

Operating Systems: Internals and Design Principles. Chapter 2 Operating System Overview Seventh Edition By William Stallings

Abstract. Testing Parameters. Introduction. Hardware Platform. Native System

Computing architectures Part 2 TMA4280 Introduction to Supercomputing

The Power of PowerVM Power Systems Virtualization. Eyal Rubinstein

CSE502: Computer Architecture CSE 502: Computer Architecture

Multi-Processor / Parallel Processing

Software Editions. Systems Software leadership for innovation. Mike Perry IT Specialist IBM Systems & Technology Group.

EECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141

IBM Power Systems Sales for the IBM I Operating System. Download Full Version :

HMC ENHANCED GUI QUICK START GUIDE 1.0 Classic GUI to Enhanced GUI Mappings and Enhanced GUI Improvements

Threads, SMP, and Microkernels

Lecture 9: MIMD Architecture

Multiprocessing and Scalability. A.R. Hurson Computer Science and Engineering The Pennsylvania State University

Lecture Topics. Announcements. Today: Advanced Scheduling (Stallings, chapter ) Next: Deadlock (Stallings, chapter

IBM i5 iseries Technical Solutions Designer V5R3. Download Full Version :

How much energy can you save with a multicore computer for web applications?

Planning and Sizing Virtualization

Niagara-2: A Highly Threaded Server-on-a-Chip. Greg Grohoski Distinguished Engineer Sun Microsystems

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 18 Multicore Computers

Advanced Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

250P: Computer Systems Architecture. Lecture 9: Out-of-order execution (continued) Anton Burtsev February, 2019

OPERATING SYSTEM. Chapter 9: Virtual Memory

Introduction to Operating Systems Prof. Chester Rebeiro Department of Computer Science and Engineering Indian Institute of Technology, Madras

10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems

AIX Power System Assessment

Handout 2 ILP: Part B

CPU Scheduling. Daniel Mosse. (Most slides are from Sherif Khattab and Silberschatz, Galvin and Gagne 2013)

Application and Partition Mobility

CS370 Operating Systems

theguard! ApplicationManager System AIX Data Collector

Chapter 2. OS Overview

Chapter 9: Virtual Memory. Operating System Concepts 9 th Edition

Advanced Processor Architecture

IBM POWER8 100 GigE Adapter Best Practices

Utility Capacity on Demand: What Utility CoD Is and How to Use It

Threads, SMP, and Microkernels. Chapter 4

CS370 Operating Systems

Computer Organization

Introduction. What is an Operating System? A Modern Computer System. Computer System Components. What is an Operating System?

Lecture 9: MIMD Architectures

Simultaneous Multithreading on Pentium 4

Memory. Objectives. Introduction. 6.2 Types of Memory

COSC 6385 Computer Architecture - Thread Level Parallelism (I)

Transcription:

Partitioning Computers Rolf M Dietze rolf.dietze@dietze-consulting.de

Consolidation/Partitioning

Hardware Partitioning

Resource Direction

Resource Shareing

Soft Partitioning

Hypervisor

Classifying Server Partitioning Hardware Partitioning Applications Linux Software Partitioning Logical Partitioning Applications Apps Apps Apps Apps Apps Apps Windows Linux Windows Windows Linux z/os VSE/ESA Partitioning Software Partitioning Firmware CPU 1 CPU 2 CPU 3 CPU 4 CPU 1 CPU 2 CPU 3 CPU 4 CPU 1 CPU 2 CPU 3 CPU 4

Virtualisation for Partitioning

Advanced Virtualization Technologies Copyright IBM Corporation 2005

IVM/VIOS

HMC + VIOS

Capacity on Demand Business Peaks Planned Growth Permanent Capacity: CUoD pay when purchased (processors & memory) Temporary Capacity: On/Off CoD pay after use (processors & memory) Reserve Capacity: Trial Capacity: CoD pay before use (processors) CoD no-charge for use (processors & memory) Copyright IBM Corporation 2005

i5/os Dynamic Logical Partitioning New POWER Hypervisor for ~ i5 supports i5/os, AIX 5L and Linux and up to 254 partitions Up to 10 Partition per processor Increase server utilization rates across multiple workloads Dynamic resource movement Automatic processor balancing with uncapped partitions Copyright IBM Corporation 2005

Der Weg zur logischen CPU

Scheduling Processes

Spreading to Containers

MP Konzepte SMP: Symmetric MultiProcessing NUMA: Non-Uniform Memory Access Hardwaregesteuerte Cachecoherence COMA: Cache Only Memory Architecture variable RAM-Zugriffslatenz CC-Numa: Cache Coherent NUMA uniformer Zugriff aller CPUs auf RAM+I/O Hardwaregesteuerte Replikation und Coherence S-COMA: Simple-COMA Software-Replikation, Hardware-Coherence

SMP Topology Restrictions The bottleneck of the Von-Neumann Architecture

Die 80-20 Regel gilt auch für CPUs Relative Performance Per Core 1.0 0.5 0.5X 1X 10% 100% Core Size vs. Die Usage Copyright Sun Microsystems

Die Idee des Chip Multithreading 100% 100% x 1 = 1x 10% 100% 50% x 10 = 5x Copyright Sun Microsystems

Compute- vs. Memorycycle

Thread CPU Model

UltraSPARC T1 Pipeline Diagramm Fetch Thrd Sel Decode Execute Memory WB Regfile x4 Inst buf x 4 ICache Itlb Thread selects Thrd Sel Mux Thrd Sel Mux Decode Thread select logic Alu Mul Shft Div DCache Dtlb Stbuf x 4 Crossbar Interface Instruction type misses traps & interrupts resource conflicts PC logic x4 Copyright Sun Microsystems

Timeslice CPU Model

Multithreading Evolution Single thread Out of Order S80 Hardware Multi-thread FX0 FX1 FP0 FP1 LS0 LS1 BRX CRL FX0 FX1 FP0 FP1 LS0 LS1 BRX CRL POWER5 2 Way SMT POWER7 4 Way SMT FX0 FX1 FP0 FP1 LS0 LS1 BRX CRL FX0 FX1 FP0 FP1 LS0 LS1 BRX CRL No Thread Executing Thread 2 Executing Thread 0 Executing Thread 1 Executing Copyright IBM Thread 3 Executing

Understanding Shared Processors To understand Processing Units there are four main concepts 1. One single processor is equivalent to 1.00 Processing Units, 2 Processors = 2.00 Processing Units, etc. 0.5 processing units is NOT same as half a processor. 2. Shared processor pool. A processor must live in the shared processor pool (now the default) to become Processing units. 3. Virtual Processor how many processors do you want the partition to be able to use (run jobs/threads on) simultaneously. It s also the number of processors that the operating system thinks it has to use. Copyright IBM

10 Milliseconds Time Slice 4. The iseries processors run on 10 ms time slices 10 milliseconds 10 ms 10 ms 10 ms 10 ms Each Processor use is allocated within a 10 ms Cycle A partition s use of a processor is limited to its allocation during each 10 ms cycle. For example, 80% of a processor (.80 Processing Units) yields up to 8 ms of processing time out of this 10 ms time slice. It also yields.8 X CPW rating of the processor. Every 10 ms this cycle repeats itself. Copyright IBM

How Does a Job Get Into the Processor? 10 milliseconds 10 ms 10 ms 10 ams 10 ms a Cache a For a job to run it has to get it s data from DISK into memory, into cache, and finally into the processor. Memory Slowest - disk Fastest - cache Disk a Even if the data is already in memory it has to be loaded into cache to run this takes time. Copyright IBM

Example of Two Partitions Sharing a Processor ( capped ) 10 milliseconds 10 ms 1 a (0) 10 ms 1 10 ms 1 (1) b (2) cache a completes ba b 1 (3) 1 b released Partition dog jobs a,b,c allocated.6 Processing Units Partition cat jobs 1,2,3 allocated.4 Processing Units Copyright IBM

Potential Shared Processor Penalty There is a potential for a performance penalty (from 0 to 10%) when using shared processors, due to: Increasing the possibility that jobs won t complete, and Having to be redispatch and potentially reload cache, and Increasing the chance of a cache miss Reduce the chance for processor and memory affinity The POWER Hypervisor overhead of: Managing multiple processors Tracking each partitions use of its allotted milliseconds Managing time slices from multiple partitions All of the above are affected by how you allocate your virtual processors next couple of foils Copyright IBM

Desired Minimum/Maximum Processing Units How about 0.2 Processing Units Minimum of.1 Maximum of 2.00 Select Advanced Copyright IBM

Capped Partitions First time through we select Capped You can t have less than.10 processing units per virtual processor I ll allocate two virtuals for my.2 PUs What s a virtual processor? Copyright IBM

Introduction to Virtual Processors For every 10 milliseconds of wall clock time each processor in the shared pool is capable of 10 milliseconds of processing time If you give Partition x.5 processing units it could use (up to) 5 milliseconds of processing time capped. (more on capped soon) But you have ABSOLUTELY no control over which processors your jobs/threads run on All you CAN control is how many of the processors in the pool, your jobs/threads do run on (potentially) simultaneously, via Virtual Processors = 80 milliseconds Copyright IBM

Virtual Processors - Capped b b Job b a P1 1.5 processing unit default of 2 virtual processors max of 15 milliseconds capped. Each job potentially could get 7.5 milliseconds (15/2 = 7.5) b e c a f d P2 1.5 processing unit but using 6 virtual processors max of 15 milliseconds capped. But if all 6 Jobs ran at same time each may Copyright IBM get no more than 2.5 milliseconds per job. (15/6 = 2.5)

Uncapped - Introduction As of IBM eserver i5/os, and POWER5-based servers, it is now possible, by using the uncapped mode, to use more milliseconds than are allocated to a partition. An uncapped partition can use excess pool Processing Units. But even an uncapped partition could still be limited by setting the number of virtual processors too low. The number of processors it can use simultaneously is still limited by the number of virtual processors assigned. As of Power5 it s also possible to allocate more virtual processors than there are processors in the pool, since the actual number of processors in the pool is a floating number. However, you still cannot allocate less than 1 ms (.10 PUs) per processor per job (virtual processor). For example,.5 PUs and 6 virtuals is a dog that doesn t hunt. 5 (milliseconds)/6 (jobs) < 1 milliseconds per job. Copyright IBM

Uncapped - Configuring This time we deal with uncapped You can t have any less than.10 processing units per virtual processor Allocate two virtuals for my.2 PUs Select OK Copyright IBM

Virtual Processors (Limited) Uncapped b a b c e a f d P1 1.5 processing unit default of P2 1.5 processing unit 2 virtual processors max of 6 virtual processors max of 20 milliseconds uncapped because you 60 milliseconds uncapped Copyright IBM are limited to only use 2 Processors simultaneously

Example of Two Partitions Sharing a Processor ( uncapped ) 10 milliseconds 1 a (0) 10 ms 1 1 10 ms 1 1 (1) (2) cache a completes ba b 10 ms 1 1 1 (3) 1 b released b does an I/O Partition dog jobs a,b,c allocated.6 Processing Units Copyright IBM Partition cat jobs 1,2,3 allocated.4 Processing Units

iseries uncapped shared pool with CoD Shared Processor Pool mit 3 Prozessoren 120% 30% LPAR 1 (uncapped) 1 2 3 4 5 6 7 8 If workload demands < 0,4 CPUs, LPAR1 frees unused resources to PHYP 120% 1 2 3 4 5 6 7 8 70% 120% CE= 1,0 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 10ms = 1 Dispatch Interval 1 2 3 4 5 6 7 8 65% 70% vp vp 60% 110% CE= 0,6 85% LPAR 3 (uncapped) vp vp vp 70% LPAR 2 (uncapped) Optimized capacity of 0.6 processors for LPAR3 CE= 0,4 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 55% Optimized capacity of an entire processor for LPAR 2 vp vp vp 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 Ressources can be requested by any partition Unused resources can be released Priorities can be assigned Unused resources CPUs/MEM will automatically be used to solve failures in a running operating environment Optimized capacity of 0.4 processors for LPAR1 70% 120% 100% CoD Copyright IBM

Virtual Processors (Unlimited) - Uncapped b g e f c a d h P2 1.5 processing unit with 15 Virtual Processors (maximum allowed) gives Partition potentially ALL 80 milliseconds of Processing time for every 10 physical milliseconds BUT only as long as the other shared processor partitions DON T have jobs ready to run! Copyright IBM

Floating Processors You have eight processors on your system. Seven are in the pool and one partition uses a dedicated processor. Dedicated partitions can allow its processors to be used for uncapped capacity (returned to the shared pool) when the partition is powered off, or a processor is removed from the partition. This is the default. Copyright IBM

Dedicated or Shared / Capped or Uncapped?? The best performance may well be achieved by using dedicated processors. However, dedicated processors cannot utilize excess capacity. For both capped and uncapped partitions, setting the virtual processor number too high may degrade performance. Shared uncapped allows use of excess capacity of the processors in the shared pool. Setting virtual processors too low limits the amount of excess you can use. Setting too high may negatively impact performance. Also be aware for uncapped partitions the operating system sees the number of desired virtual processors as equal to the number of physical processors, you need an OS license (i5/os, Linux and AIX 5L) for the lesser of the number of virtual processors or the number of processors in the shared pool. So what could be recommended? The right answer depends on workload. Copyright IBM

Partitioning Computers Copyright IBM

Virtual I/O Example Client Partition Virtual Ethernet Virtual Ethernet Client Adapter DMA Buffer Server Partition Virtual Switch Virtual Ethernet POWER Hypervisor Virtual SCSI Physical Ethernet Proxy ARP Physical Network Virtual Disk NWSD Server Adapter Device Mapping Copyright IBM SCSI, SSA, FC Physical or Logical Disks Copyright IBM Corporation 2005 Material may not be reproduced in whole or in part without the prior written permission of IBM.

Virtual Serial i5/os Partition Linux Partition First 2 virtual slots in every partition reserved for virtual serial server adapters for system console in HMC For i5/os, virtual serial adapters provide 5250 console For Linux and AIX 5L, they provide Copyright IBM character console Copyright IBM Corporation 2005 Material may not be reproduced in whole or in part without the prior written permission of IBM.

Virtual Serial Copyright IBM Copyright IBM Corporation 2005 Material may not be reproduced in whole or in part without the prior written permission of IBM.

Adding SCSI Adapter via DLPAR Create a virtual SCSI client adapter via the partition creation wizard Create a virtual SCSI server adapter via Dynamic LPAR on the i5/os partition Does not require any restart Copyright IBM Copyright IBM Corporation 2005 Material may not be reproduced in whole or in part without the prior written permission of IBM.

Virtual SCSI Linux/AIX 5L i5/os Individual slot numbers do no matter, as long as they are configured in pairs Copyright IBM Copyright IBM Corporation 2005 Material may not be reproduced in whole or in part without the prior written permission of IBM.