Administrivia. Minute Essay From 4/11


Administrivia

All homeworks graded. If you missed one, I'm willing to accept it for partial credit (provided, of course, that you haven't looked at a sample solution!) through next Wednesday. I will grade Exam 2 soon and send out grade summaries.

Extra-credit problems posted. Totally optional, and they can only help your grade (I add any points to your points total without changing the total-points number). Due May 8 at 5:30pm (firm deadline).

About office hours: I may be on campus a few days next week (details by e-mail early next week), and either way I plan to do some virtual office hours (times TBA). And I do respond to e-mail pretty well!

Slide 1

Minute Essay From 4/11

(Not a lot of responses, but most were fairly sensible. Review question / my answer.)

Slide 2

Virtual Machines: Executive-Level Summary

There's been increasing interest lately in virtual machines / virtualization. Some are purely software (e.g., the Java Virtual Machine), but others involve, or at least rely on, hardware.

The idea actually goes back a long time; it was supported in the 1970s by IBM's VM/370, which was (or is?) in some sense a stripped-down O/S that allowed running multiple guest O/S'es side by side. Very useful in its time, since physical machines often needed to be shared among people with very different needs w.r.t. O/S. Current versions include the VMware ESX server (other examples in the textbook, but this is the name I recognize).

Slide 3

Sidebar: Dual-Mode Operation

For simple single-user machines it's more or less reasonable to allow any application to do anything the hardware can do (access all memory, do I/O, etc.), though even there it means whatever O/S there is is vulnerable to malicious or buggy programs.

For anything less simple, it's useful to have a notion of two modes, regular and privileged, where in regular mode some instructions (e.g., instructions to do I/O) are not permitted; attempts to execute them cause hardware exceptions. Normally at least some O/S code runs in privileged mode, and applications run in regular mode. This makes it possible for the O/S to defend itself from malicious or buggy applications, and also keeps applications from interfering with each other.

Slide 4

Virtual Machines: Semi-Executive-Level Summary

What the real hardware is running is a Virtual Machine Monitor, a.k.a. hypervisor (a term analogous to "supervisor", a term for an O/S). Interrupts and exceptions transfer control to this hypervisor, which then decides which guest O/S they're meant for and does the right thing.

This all works better with hardware support for dual-mode operation: guest O/S's run in regular mode, and when they execute privileged instructions (as they more or less have to), the hypervisor gets control and can then simulate the effect. Other than that, programs run as they do without this extra layer of abstraction; they're just executing instructions, after all?

Slide 5

Virtual Machines: Semi-Executive-Level Summary, Continued

Some architectures make this easier than others; they're "virtualizable".

Interestingly enough(?), IBM's rather old 370 is, but for many newer architectures the needed support has had to be added on, not always neatly. Hm!? (The textbook has a few more details, in section 5.8.)

Slide 6

Parallel Computing Overview

Support for things happening "at the same time" goes back to early mainframe days, in the sense of having more than one program loaded into memory and available to be worked on. If there's only one processor, "at the same time" actually means interleaved in some way that's a good fake. (Why? To hide latency.)

Support for actual parallelism goes back almost as far, though mostly of interest to those needing maximum performance for large problems. Somewhat controversial, and for many years "wait for Moore's law to provide a faster processor" worked well enough. Now, however...

Slide 7

Parallel Computing Overview, Continued

Improvements in processing elements (processors, cores, etc.) seem to have stalled some years ago. Instead, hardware designers are coming up with ways to provide more processing elements.

One result is that multiple applications can execute really at the same time. Another result is that individual applications could run faster by using multiple processing elements. Non-technical analogy: if the job is too big for one person, you hire a team. But making this effective involves some challenges (how to split up the work, how to coordinate).

In a perfect world, maybe compilers could be made smart enough to convert programs written for a single processing element into ones that can take advantage of multiple PEs. Some progress has been made, but the goal is elusive.

Slide 8

Parallel Computing Hardware Platforms (Overview)

Clusters: multiple processor/memory systems connected by some sort of interconnection (could be an ordinary network or fast special-purpose hardware). Examples go back many years.

Multiprocessor systems: a single system with multiple processors sharing access to a single memory. Examples also go back many years.

Multicore processors: a single processor with multiple independent PEs sharing access to a single memory. Relatively new, but conceptually quite similar to multiprocessors.

SIMD platforms: hardware that executes a single stream of instructions but operates on multiple pieces of data at the same time. Popular early on (vector processors, early Connection Machines) and now being revived (GPUs used for general-purpose computing).

Slide 9

Parallel Programming Software (Overview)

Key idea is to split up the application's work among multiple units of execution (processes or threads) and coordinate their actions as needed. Non-trivial in general, but not too difficult for some special cases ("embarrassingly parallel") that turn out to cover a lot of ground.

Two basic models, shared-memory and distributed-memory. Shared-memory has two variants, SIMD ("single instruction, multiple data") and MIMD ("multiple instruction, multiple data"). SPMD ("single program, multiple data") can be used with either one, and often is, since it simplifies things.

Slide 10

Shared-Memory Model (MIMD)

Units of execution are (typically) threads, all with access to a common memory space, potentially executing different code.

Convenient in a lot of ways, but sharing variables makes race conditions possible. (Now that you know more about how hardware works you may understand the issues better! A single line of HLL code may translate to multiple instructions...)

Typical programming environments include ways to start threads, split up work, and synchronize. OpenMP extensions (C/C++/Fortran) are a somewhat low-level standard.

Slide 11

Distributed-Memory Model

Units of execution are processes, each with its own memory space, communicating using message passing, potentially executing different code.

Less convenient, and performance may suffer if there's too much communication relative to the amount of computation, but race conditions are much less likely.

Typical programming environments include ways to start processes and pass messages among them. The MPI library (C/C++/Fortran) is a somewhat low-level standard.

Slide 12
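(Not from the slides: a minimal OpenMP sketch in C of the shared-memory model. The array size and contents are just illustrative; the reduction clause does the coordination that would otherwise risk a race on the shared sum. Compile with something like cc -fopenmp.)

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        double a[1000], sum = 0.0;
        for (int i = 0; i < 1000; ++i) a[i] = i * 0.5;

        /* threads split the iterations; "reduction" combines the
           per-thread partial sums safely at the end */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < 1000; ++i)
            sum += a[i];

        printf("sum = %f, threads available = %d\n", sum, omp_get_max_threads());
        return 0;
    }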

SIMD Model

The term "units of execution" may not make sense here. Parallelism comes from all processing elements executing the same program in lockstep, but with different processing elements operating on different data elements.

Excellent fit for some problems ("data-parallel"), not for others. Very convenient when it fits, pretty inconvenient when not.

Typical programming environments feature ways to express data parallelism. OpenCL (C/C++) may emerge as a somewhat low-level standard, especially suited for GPGPU.

Slide 13

Shared-Memory Hardware, Revisited

Figure 6.7 sketches the basic idea: multiple processing elements (call them processors, cores, whatever) connected to a single memory.

Synchronization (locking) can be done with no hardware support, but it's tricky. A simple approach is something such as:

    while (lock != 0) {}
    lock = 1;

which doesn't work because the test and the set are separate instructions. (What goes wrong?) Somewhat-tricky algorithms exist for solving this problem in software, but...

Slide 14
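(Not from the slides: a minimal C sketch of where the simple approach breaks down. The gap between the test and the set is the problem.)

    volatile int lock = 0;        /* 0 = free, 1 = held */

    void broken_acquire(void) {
        while (lock != 0) {       /* test: wait until the lock looks free */
            /* spin */
        }
        /* another thread can run right here, also see lock == 0, and fall
           through as well, since the test above and the set below are
           separate instructions */
        lock = 1;                 /* set: claim the lock (too late to be safe) */
    }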

Shared-Memory Hardware: Locking

Locking is much easier if the ISA provides some support, in the form of an instruction that allows... well, essentially allows both read and write access to a location as a single atomic operation.

Some architectures implement this directly, via a compare-and-swap or test-and-set instruction. But for MIPS that might be challenging (why?). So MIPS defines two instructions, load linked (ll) and store conditional (sc). Tricky, but an example may help some.

Slide 15

Shared-Memory Hardware: Locking, Continued

Example from the text, p. 122, to exchange a memory location with register contents, slightly modified (the first line is wrong?!):

    again:  add  $t0, $zero, $s2    # $t0 <- $s2
            ll   $t1, 0($s1)        # $t1 <- 0($s1)
            sc   $t0, 0($s1)        # $t0 -> 0($s1) IF unchanged
            beq  $t0, $zero, again  # try again if changed
            add  $s2, $zero, $t1    # $s2 <- old value of 0($s1)

How this works: the ll remembers the load-from address. The following sc succeeds only if the value at that address hasn't been changed (e.g., by another processor). Tricky!

Slide 16
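(Not from the slides: in C you rarely write ll/sc by hand; C11's <stdatomic.h> exposes an atomic read-modify-write, which on an ll/sc architecture such as MIPS a compiler would typically implement with a loop like the one above. A minimal spin-lock sketch, with the variable names my own:)

    #include <stdatomic.h>

    atomic_int lock = 0;               /* 0 = free, 1 = held */

    void acquire(void) {
        /* atomic_exchange does the read and the write as one atomic step */
        while (atomic_exchange(&lock, 1) != 0) {
            /* spin: someone else holds the lock */
        }
    }

    void release(void) {
        atomic_store(&lock, 0);
    }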

Shared-Memory Hardware: Memory

Access to RAM can be reasonably straightforward: only one processor at a time. Caches complicate things (next slide).

The single memory may actually be multiple memories, with each processing element having access to all memory but faster access to one section (NUMA, Non-Uniform Memory Access). Making good use of this can affect performance and may be non-trivial to accomplish, especially if the programming environment doesn't give you appropriate tools.

Slide 17

Shared-Memory Hardware: Caches

As noted, even if access to RAM is one-processor-at-a-time, if each processing element has its own cache, things may get tricky. Typically hardware provides some way to keep them all in sync (the cache coherency problem discussed in Chapter 5).

Further, application programs may have to deal with "false sharing": multiple threads access distinct data in the same cache line. Cache coherency guarantees correctness of the result, but performance may well be affected. (Example: a multithreaded program where each thread computes a partial sum. Having the partial sums as thread-local variables can be much faster than having a shared array of partial sums.)

Slide 18
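(Not from the slides: a C/OpenMP sketch of that partial-sum example. The array size and the 64-slot partial array are illustrative assumptions; the point is only where each thread accumulates.)

    #include <omp.h>

    #define N 10000000
    double a[N];

    /* Slower: adjacent partial[] elements likely share a cache line, so every
       update by one thread invalidates that line in the other threads' caches. */
    double sum_false_sharing(void) {
        double partial[64] = {0};        /* one slot per thread, packed together */
        #pragma omp parallel
        {
            int id = omp_get_thread_num();
            #pragma omp for
            for (int i = 0; i < N; ++i)
                partial[id] += a[i];
        }
        double sum = 0;
        for (int t = 0; t < omp_get_max_threads(); ++t) sum += partial[t];
        return sum;
    }

    /* Faster: each thread accumulates into its own local variable and touches
       shared data only once at the end. */
    double sum_local(void) {
        double sum = 0;
        #pragma omp parallel
        {
            double my_sum = 0;           /* thread-local, no shared cache line */
            #pragma omp for
            for (int i = 0; i < N; ++i)
                my_sum += a[i];
            #pragma omp atomic
            sum += my_sum;
        }
        return sum;
    }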

Distributed-Memory Hardware, Revisited

Figure 6.13 sketches the basic idea: multiple systems (processor(s) plus memory) communicating over a network. No special hardware is required, though really high-end systems may provide a fast special-purpose network.

Slide 19

SIMD Hardware

There are various ways to implement this idea in hardware.

One approach: multiple processing elements sharing access to memory and all executing the same instruction stream. This is more or less how GPUs work. A complication: they often have a separate memory, so data must be copied to/from RAM. A potential performance problem, and it may be cumbersome for programmers.

Another approach: vector processing units that stream/pipeline operations on data elements to get the data-parallelism effect.

Slide 20
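(Not from the slides: a minimal MPI sketch in C of the message-passing style that runs on such hardware; the sum-of-squares computation is just illustrative. Typically built with mpicc and run with something like mpiexec -n 4.)

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        MPI_Init(&argc, &argv);

        int rank, nprocs;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* which process am I?  */
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);  /* how many are there?  */

        int value = rank * rank;                 /* each process computes its piece */
        int total = 0;

        /* combine the pieces on process 0 by message passing */
        MPI_Reduce(&value, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum of squares of ranks 0..%d = %d\n", nprocs - 1, total);

        MPI_Finalize();
        return 0;
    }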

Other Hardware Support for Parallelism

Instruction-level parallelism (discussed in not-assigned section(s) of Chapter 4) allows executing instructions from a single instruction stream at the same time, if it's safe to do so. This requires the hardware and compiler to cooperate, and (sometimes?) involves duplicating parts of the hardware (functional units).

Hardware multithreading (discussed in Chapter 6) includes several strategies for speeding up execution of multiple threads by duplicating parts of a processing element (as opposed to duplicating the full PE, as happens with "cores").

Slide 21

Course Recap: Topics

A little about performance. (It's not simple!)

MIPS assembler language; translating C to MIPS assembler language. Compiling, assembling, and linking.

Binary representation of instructions. Binary representation of data (integers, ASCII, floating-point numbers); basics of computer arithmetic.

Gate-level logic design. Design of a processor: ALU, datapath, control; a little about pipelining. A little about caches.

Other schools spread this material over two or even three courses (though they presumably cover more in all). So, we have done a lot?

Slide 22

Course Recap, Continued

Some topics (representation of data, computer arithmetic, maybe finite state machines) are review, or a small extension of what you already know.

Others, though (assembler language, gate-level logic, designing a processor in terms of AND/OR/NOT and how it works), are not familiar to most, and involve a new perspective, or mindset, or mental model. My observation: some students take to it, others struggle.

I do hope, however, that all of you have come away with more understanding of how things work "under the hood" than you had!

Slide 23

Minute Essay

How did you like having class via video lecture? My thinking is that the videos can be replayed, which could be a plus, but they don't offer an easy way to ask questions.

If you've taken another course in which lectures were all on video and class time was used for something else, how did that work for you? If you haven't, does it sound like something you'd like? (Now that I know how to make video lectures, I'm speculating about how to use them as a supplement rather than a substitute.)

And best wishes for a good summer break!

Slide 24