ECE 588/688 Advanced Computer Architecture II
Instructor: Alaa Alameldeen (alaa@ece.pdx.edu)
Winter 2018, Portland State University
Copyright Alaa Alameldeen and Haitham Akkary, 2018
When and Where?
- When: Monday & Wednesday, 5:00-6:50 PM
- Where: PCC Willow Creek 312
- Office hours: after class, or by appointment
- TA: Aarti Rane; office hours: TBD (starting next week)
- Webpage: http://www.cecs.pdx.edu/~alaa/ece588/
- Go to the webpage for: class slides and papers, homework and project assignments, and changes in the class schedule
Portland State University ECE 588/688 Winter 2018
Course Description
- Parallel computing and multiprocessors
- Symmetric Multiprocessors (SMPs)
- Chip Multiprocessors (CMPs), aka multi-core processors
- Multithreading and parallel programming models
- Multiprocessor memory systems
- Emphasis is on paper readings, NOT on a textbook: tutorial papers, original sources and idea papers, and papers covering the most recent trends
Expected Background
- ECE 587/687 or equivalent: superscalar processor microarchitecture, branch prediction, cache organization, memory ordering, speculative execution, multithreading
- Programming experience in C
Grading Policy
- Class participation: 5%
- Paper reviews: 10%
- Homework: 20%
- Project: 25%
- Midterm exam: 20%
- Final exam: 20%
Grading Scale: A: 92-100%, A-: 86-91.5%, B+: 80-85.5%, B: 76-79.5%, B-: 72-75.5%, C+: 68-71.5%, C: 64-67.5%, C-: 60-63.5%, D+: 57-59.5%, D: 54-56.5%, D-: 50-53.5%, F: below 50%
Important Dates
- Midterm exam: Monday, February 12th, in class
- Final exam: Wednesday, March 14th, in class
- Project presentations: Friday, March 16th, 7-9 PM, location TBD
- Project reports due: Tuesday, March 20th (by email)
- No class next Monday (holiday). Other classes may be combined, cancelled, or moved.
Why Study Computer Architecture?
- Technology advancements require continuous optimization of cost, performance, and power
- Moore's law
  - Original version: transistor counts scale exponentially
  - Popular version: processor performance increases exponentially
- Innovation is needed to satisfy market trends: user and software requirements keep changing, and software developers expect continued improvements in computing power
Why Study Parallel Computing?
- Technology made multi-core processors both feasible AND necessary for performance
- Moore's law: more transistors on a die than can be used (efficiently) by a single processor
- Traditional out-of-order processors face memory and power walls
- Software requires more computing power than a single processor can provide: scientific computations, commercial applications
Moore's Law (1965)
[Chart: #transistors per chip (Intel), 1971-2004, log scale from 1,000 to 10,000,000,000]
- Almost 75% increase per year
Memory Wall
[Chart: CPU cycle time vs. DRAM latency in ns, log scale, 1982-2006]
- CPU cycle time: 500 times faster since 1982
- DRAM latency: only ~5 times faster since 1982
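The slide's two numbers imply how much more expensive a memory access has become in processor cycles. A quick back-of-the-envelope sketch using the slide's approximate improvement factors (the function name and exact figures are illustrative, not measured data):

```python
# Memory-wall arithmetic: if the CPU got ~500x faster since 1982 but DRAM
# only ~5x faster, a memory access that once cost 1 cycle now costs ~100.
def relative_miss_cost(cpu_speedup, dram_speedup, base_cycles=1):
    """Cycles a DRAM access costs today, if it cost `base_cycles` in 1982."""
    return base_cycles * cpu_speedup / dram_speedup

print(relative_miss_cost(cpu_speedup=500, dram_speedup=5))  # 100.0
```

The gap between processor and memory speed, not either speed alone, is what makes caches and latency tolerance central to the rest of the course.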
Why Not Build Larger OoO Processors?
- Memory wall: in 1980, memory latency was ~1 instruction; now it is ~1000 instructions
- Power wall: dynamic power increases quadratically with frequency; static power increases exponentially
- ILP limitations: serial programs do not have an unlimited supply of instructions to execute in parallel
- Other problems: temperature & cooling, on-chip interconnect, complexity & testing
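The power-wall bullet can be sketched with the standard first-order CMOS switching-power model, P ≈ α·C·V²·f. The superlinear growth with frequency comes from the common assumption (mine, not the slide's) that supply voltage V must also rise as frequency rises:

```python
# First-order CMOS dynamic power model (a standard textbook approximation):
# activity factor * switched capacitance * voltage squared * frequency.
def dynamic_power(alpha, c, v, f):
    return alpha * c * v**2 * f

# If voltage must scale up roughly with frequency (a rough assumption),
# doubling f while doubling v multiplies power by 2 * 2**2 = 8.
base    = dynamic_power(alpha=0.5, c=1.0, v=1.0, f=1.0)
doubled = dynamic_power(alpha=0.5, c=1.0, v=2.0, f=2.0)
print(doubled / base)  # 8.0
```

This is why vendors turned the same transistor budget into more cores at moderate frequency rather than one core at ever-higher frequency.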
Technology Trends: Growing #Transistors Causes Inflection Points
- Most famous inflection point: enough transistors to enable a simple microprocessor
  - 1971 Intel 4004: 2,300 transistors; could access 300 bytes of memory (0.0003 megabytes)
- Bigger and better-performing processors were enabled by more transistors
(Source: Prof. Mark Hill, 2006)
Microprocessor Design Complexity (Millions of Transistors)
- Bit-level parallelism (BLP): 64-bit multiply in one or two cycles
- Instruction-level parallelism (ILP): dozens of instructions in flight, branch prediction, out-of-order execution
- Dominating use of caches: level-one cache, level-two cache
- Emerging: chip multiprocessing
[Die photo: PowerPC 750 (1998), showing the L1 instruction cache, L1 data cache, and L2 cache tags]
Multi-core Processors (Chip Multiprocessors)
- Replicate processor cores & caches
- Uses more transistors
- Tolerates the memory wall
- Simpler, lower-power cores
- Reduces complexity
[Diagram: CPU 0 + L1 cache and CPU 1 + L1 cache sharing an L2 cache; example: IBM Power4]
Software Trends
[Chart: TPC-C transactions per minute (tpmC), Dec-99 through Jul-09]
- TPC-C throughput increased by more than 75% per year over eight years
- What about desktop applications?
- What about scientific applications?
Multi-Core Performance
- Recall the popular version of Moore's law: microprocessor performance doubles every 1.5-2 years
- Future multi-core performance doublings require: effective multithreaded programming, better communication, faster synchronization, and more cores
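One standard way to see why "more cores" alone cannot keep doubling performance is Amdahl's law (the slide does not name it, but it is the usual model): if a fraction s of a program is serial, speedup on n cores is bounded by 1/(s + (1-s)/n).

```python
# Amdahl's law: upper bound on speedup when a fraction of the work is serial.
def amdahl_speedup(serial_fraction, cores):
    s = serial_fraction
    return 1.0 / (s + (1.0 - s) / cores)

# Even with only 10% serial work, 16 cores yield well under 16x,
print(round(amdahl_speedup(0.10, 16), 2))   # 6.4
# and the limit as cores -> infinity is only 1/s = 10x.
```

Shrinking the serial fraction (effective multithreaded programming, faster synchronization) matters as much as adding cores.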
Multi-core Processor Architecture
[Diagrams: two alternatives: a few large cores (Core0, Core1) with large caches vs. sixteen smaller cores (Core0-Core15) with smaller caches]
- Which design is better?
- Balance cores, caches, and pin bandwidth
Programming Models
- Single-core: one program runs on the core; programmers write efficient programs; architects design for fast execution of instructions
- Multi-core: each core can run a thread/task/program; can programmers split up their programs?
- Multiprocessor programming models: shared memory, message passing, threads, data parallel
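The two dominant models in the list above can be sketched in miniature with Python's threading primitives (a toy illustration, not from the slides): shared memory communicates through a common variable that the programmer must guard with a lock, while message passing communicates through an explicit channel.

```python
import threading, queue

# Shared-memory style: threads update one shared variable under a lock.
counter = 0
lock = threading.Lock()

def shared_worker(n):
    global counter
    for _ in range(n):
        with lock:              # synchronization is the programmer's job
            counter += 1

# Message-passing style: threads exchange values through an explicit queue.
channel = queue.Queue()

def producer(values):
    for v in values:
        channel.put(v)          # "send"
    channel.put(None)           # sentinel: no more messages

def consumer(result):
    while True:
        v = channel.get()       # "receive"
        if v is None:
            break
        result.append(v)

received = []
threads = [threading.Thread(target=shared_worker, args=(1000,)) for _ in range(4)]
threads += [threading.Thread(target=producer, args=([1, 2, 3],)),
            threading.Thread(target=consumer, args=(received,))]
for t in threads: t.start()
for t in threads: t.join()
print(counter, received)   # 4000 [1, 2, 3]
```

The shared-memory version needs explicit synchronization to be correct; the message-passing version synchronizes implicitly through the sends and receives. That trade-off recurs throughout the course.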
Introduction to Parallel Computing
- Serial vs. parallel computation (tutorial pictures)
- Parallel computing includes:
  - Breaking up the problem into smaller discrete parts that can be solved concurrently
  - Breaking each part further into a sequence of (serial) instructions
  - Running all parts simultaneously on different CPUs
- Why use parallel computing?
  - Save time
  - Solve larger problems
  - Provide concurrency (do more than one thing at the same time)
  - Save money (use multiple cheaper resources vs. a single expensive supercomputer)
  - Use more memory than is available on a single computer
  - Use non-local resources
- Who uses parallel computing, and why? (tutorial graphs)
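The "break the problem into parts solved concurrently" recipe above can be sketched with a chunked sum (a minimal illustration; the LLNL tutorial's own examples are richer):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(data, parts=4):
    """Split `data` into `parts` chunks, sum each concurrently, combine results."""
    n = len(data)
    step = (n + parts - 1) // parts                  # chunk size, rounded up
    chunks = [data[i:i + step] for i in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partials = list(pool.map(sum, chunks))       # each chunk summed in its own task
    return sum(partials)                             # serial combine step

print(parallel_sum(list(range(100))))  # 4950
```

Note the structure: decompose, compute parts independently, then a (serial) combine step; that final step is exactly the kind of serial fraction that limits speedup.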
Reading Assignment
- "Introduction to Parallel Computing," Lawrence Livermore National Laboratory tutorial: read before Wednesday's class
- Wed 1/17: Olukotun et al., "The Case for a Single-Chip Multiprocessor," ASPLOS 1996
- Submit a review before the beginning of class: paper summary, strong points (2-4 points), weak points (2-4 points)
- Reviews are due before 5 PM on the day of class
- Send by email to aarane AT pdx DOT edu: plain text, no attachments; the subject line must be ECE588_REVIEW_01_17