ELEC3441 Computer Architecture
Introduction: What is Computer Architecture?
2nd Semester, 2017-18
Dr. Hayden Kwok-Hay So
Department of Electrical and Electronic Engineering

Meltdown & Spectre
- Meltdown breaks the most fundamental isolation between user applications and the operating system.
- Spectre breaks the isolation between different applications.
- The attacks are very intricate, but the root cause is speculative state left unflushed in the cache, which results in timing variations.

What is Computer Architecture?
Computer architecture is the study of the design and implementation of computing systems.
Design constraints:
- Function
- Performance
- Cost
- Power
- Compatibility

Architecture continually changing
Applications and Technology:
- Improved technologies make new applications possible
- Applications suggest how to improve technology, provide revenue to fund development
- Cost of software development makes compatibility a major force in market

Evolution of Computer Design Tradeoffs
- Function: Basic Arithmetic
- Performance: Low
- Cost: Low
- Power: Low

Calculators today

Computer in the 60s: DEC's PDP-10
- General Purpose
- High Performance
- Multi-user
- Low (?) Cost
- Compatible
Src: http://www.computerhistory.org

Today's Computer

Machine Learning -> Architecture

Uniprocessor Performance
[Figure: performance relative to the VAX-11/780 (log scale, 1 to 100,000) versus year, 1978 to 2012. Plotted machines range from the VAX-11/780 (5 MHz, performance 1) to multi-core Intel Xeon and Core i7 processors (performance above 20,000). Annotated growth segments, in chronological order: 25%/year, 52%/year, 22%/year. Annotation: "Limited by Power, ILP, Memory speed".]

Multi-Core CPU
[Figure: Intel Lynnfield processor (source: AnandTech)]

Warehouse Scale Computer
- A new class of computer: a massively parallel cluster of computers
- Datacenter as a computer
  - ~100,000 servers
  - Includes design choices in electrical, electronic and building construction
- Exploits service-level parallelism
- Designed for cloud-based services

The End of the Uniprocessor Era
Single biggest change in the history of computing systems
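
To get a feel for the annual growth rates annotated on the uniprocessor performance plot (25%, 52%, and 22% per year), the following sketch compounds each rate over a fixed window and estimates how long performance takes to double. The 10-year window and the function names are assumptions chosen for illustration; they are not figures from the slides.

```python
import math


def compound_growth(rate_per_year, years):
    """Total speedup after `years` of growth at `rate_per_year` (0.52 means 52%/year)."""
    return (1.0 + rate_per_year) ** years


def doubling_time(rate_per_year):
    """Years needed for performance to double at a constant annual growth rate."""
    return math.log(2.0) / math.log(1.0 + rate_per_year)


if __name__ == "__main__":
    # Rates taken from the plot annotations; the 10-year window is an assumption.
    for rate in (0.25, 0.52, 0.22):
        print(f"{rate:.0%}/year: x{compound_growth(rate, 10):.1f} in 10 years, "
              f"doubles every {doubling_time(rate):.1f} years")
```

Small differences in the annual rate compound into large differences over a decade, which is why the drop from 52%/year to 22%/year marks such a sharp change in what a single processor can deliver.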

ELEC3441 Course Objectives
- To make a simple processor
  - Construct a workable processor
- To make a uniprocessor run fast
  - Various techniques developed through the 1990s
- To appreciate the latest developments in computer architecture research
  - Techniques to overcome the Power Wall, ILP Wall, and Memory Wall

A Quest into HW + SW
- Computer architecture study requires deep understanding of both hardware and software
- Software:
  - Assume you know basic C/C++/Java
  - Assume you know basic compilation flow
  - Will learn assembly languages in homework/project/tutorial
  - Linux programming
- Hardware:
  - You need to know basic digital system design concepts (will cover briefly in class)
  - Good to take ELEC2302/3342 concurrently
  - Will learn to use a hardware description language: Chisel

Chisel simulators
- Chisel is a new hardware description language we developed at Berkeley based on Scala
  - Constructing Hardware in a Scala Embedded Language
- Homework will use RISC-V processor simulators derived from Chisel processor designs
  - Gives you much more detailed information than other simulators
  - Can map to FPGA or real chip layout
- You need to learn some minimal Chisel in class, but we'll make Chisel RTL source available so you can see all the details of our processors
- Homework projects based on modifying the Chisel RTL code if desired

Chisel Design Flow
Chisel Design Description -> Chisel Compiler, which emits:
- C++ code -> C++ Compiler -> C++ Simulator
- FPGA Verilog -> FPGA Tools -> FPGA Emulation
- ASIC Verilog -> ASIC Tools -> GDS Layout

ELEC3441 Administrivia
Instructors: Dr. Hayden So
TA:          Nina Engelhardt
Lectures:    Mo 3:30-5:20pm, Th 3:30-4:20pm, LG036
Tutorials:   No scheduled time, but additional office hours will be arranged for homework
Web:         http://www.eee.hku.hk/~elec3441

Textbook
- Reading materials will be drawn from two main textbooks: mostly from COD, with later modules from CAQ
- COD: Computer Organization and Design: The Hardware/Software Interface
  David Patterson, John Hennessy
  5th Edition; Morgan Kaufmann (2013)
  ISBN-13: 978-0124077263
- CAQ: Computer Architecture: A Quantitative Approach
  John Hennessy, David Patterson
  5th Edition; Morgan Kaufmann (2011)
  ISBN-13: 978-0123838728

Assessments
- Homework 45%
  - 3 homework assignments
  - mini-project
  - Includes hands-on work building real processors using Chisel
- Quiz 15%
  - 5 in-class quizzes
  - A way to make sure you follow class progress
- Exam 40%
  - Cumulative over the entire semester

Agenda for this semester
- The Basics
  - Single cycle processor
- Improving CPI
  - Memory Hierarchy
  - Simple Pipelining
- Breaking the CPI=1 barrier
  - Out-of-order execution
  - Super-scalar processor
- Advanced Architectures
  - Multi-core processors
  - Vector processors
  - VLIW

Module 1
THE BASICS

Computer Architecture: HW/SW Interface
Layers, from software down to hardware:
- Software: Applications, Compiler, Assembler, Operating System
- Instruction Set Architecture
- Hardware: Microarchitecture (Processor, Memory, I/O), Digital Design, Circuit Design, Transistors

Instruction Set Architecture (ISA)
- The contract between software and hardware
- Typically described by giving all the programmer-visible state (registers + memory) plus the semantics of the instructions that operate on that state
- IBM 360 was the first line of machines to separate ISA from implementation (a.k.a. microarchitecture)
- Many implementations possible for a given ISA
  - E.g. 1: the Soviets built code-compatible clones of the IBM 360, as did Amdahl after he left IBM.
  - E.g. 2: today you can buy AMD or Intel processors that run the x86-64 ISA.
  - E.g. 3: many cellphones use the ARM ISA with implementations from many different companies including TI, Qualcomm, Samsung, Marvell, etc.
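
The idea that an ISA is a contract on programmer-visible state, with many implementations possible behind it, can be sketched with a toy example. The one-instruction "ISA" below (an 8-bit wrapping ADD on a small register file) and both function names are hypothetical and are not RISC-V or any real ISA; the point is only that two very different implementations leave identical architectural state.

```python
# Toy illustration only: a hypothetical one-instruction ISA, not a real one.
WIDTH = 8                       # assumed register width in bits
MASK = (1 << WIDTH) - 1


def add_direct(regs, rd, rs1, rs2):
    """Implementation A: use the host's adder, then wrap to WIDTH bits."""
    out = list(regs)
    out[rd] = (regs[rs1] + regs[rs2]) & MASK
    return out


def add_bit_serial(regs, rd, rs1, rs2):
    """Implementation B: one full-adder step per bit, like a slow serial datapath."""
    a, b = regs[rs1], regs[rs2]
    result, carry = 0, 0
    for i in range(WIDTH):
        x = (a >> i) & 1
        y = (b >> i) & 1
        result |= (x ^ y ^ carry) << i            # sum bit for position i
        carry = (x & y) | (carry & (x ^ y))       # carry into position i + 1
    out = list(regs)
    out[rd] = result                              # final carry-out is discarded (wraps)
    return out


if __name__ == "__main__":
    regs = [0, 200, 100, 7]
    # Software sees the same register contents regardless of implementation.
    print(add_direct(regs, 0, 1, 2) == add_bit_serial(regs, 0, 1, 2))
```

Software written against the contract cannot tell the two apart by their results; they differ only in cost and speed, which is the microarchitect's concern.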

ISA to Microarchitecture Mapping
- ISA often designed with particular microarchitectural style in mind, e.g.,
  - Accumulator -> hardwired, unpipelined
  - CISC -> microcoded
  - RISC -> hardwired, pipelined
  - VLIW -> fixed-latency in-order parallel pipelines
  - JVM -> software interpretation
- But can be implemented with any microarchitectural style
  - Intel Ivy Bridge: hardwired pipelined CISC (x86) machine (with some microcode support)
  - Simics: software-interpreted SPARC RISC machine
  - ARM Jazelle: a hardware JVM processor

Notable ISAs
ISA      Year  Notes
Alpha    1992  DEC; in 21264 processors, etc.
MIPS     1986  research/mips
PA-RISC  1986  HP; HP workstations in the 90s
PowerPC  1993  IBM/Motorola
SPARC    1987  Sun Microsystems
ARM      1985  ARM
x86      1978  Intel
VAX      1977  DEC
IBM360   1964  IBM; defined computer architecture
Annotation (ARM, x86): dominate today; too complex for undergrad teaching
See Appendix K of H&P for a survey of major ISAs

This Course: RISC-V ISA
- RISC-V is a new simple, clean, extensible ISA that was originally developed at Berkeley for education and research
  - RISC-I/II, first Berkeley RISC implementations
  - Berkeley research machines SOAR/SPUR considered RISC-III/IV
- Both of the dominant ISAs (x86 and ARM) are too complex to use for teaching
- RISC-V ISA manual available on web page
- Full GCC-based tool chain available

Electronic Numerical Integrator and Computer (ENIAC)
- Inspired by Atanasoff and Berry, Eckert and Mauchly designed and built ENIAC (1943-45) at the University of Pennsylvania
- The first, completely electronic, operational, general-purpose analytical calculator!
  - 30 tons, 72 square meters, 200 kW
- Performance
  - Read in 120 cards per minute
  - Addition took 200 µs, division 6 ms
  - 1000 times faster than Mark I
- Not very reliable!
Application: Ballistic calculations (WW-2 effort)
  angle = f(location, tail wind, cross wind, air density, temperature, weight of shell, propellant charge, ...)

Electronic Discrete Variable Automatic Computer (EDVAC)
- ENIAC's programming system was external
  - Sequences of instructions were executed independently of the results of the calculation
  - Human intervention required to take instructions "out of order"
- Eckert, Mauchly, John von Neumann and others designed EDVAC (1944) to solve this problem
  - Solution was the stored program computer: "program can be manipulated as data"
- First Draft of a Report on the EDVAC was published in 1945, but just had von Neumann's signature!
  - In 1973 the court of Minneapolis attributed the honor of inventing the computer to John Atanasoff

Stored-Program Computer
Program = A sequence of instructions
Execution repeats three steps:
- Get the current instruction
- Execute the instruction
- Determine the next instruction to fetch

Ex: Calculating Class Grades*
  grade = 0.1 * lab + 0.2 * mt + 0.3 * hw + 0.4 * proj;

The same computation as a sequence of instructions, executed in order over time:
  grade = 0;
  tmp = 0.1 * lab;
  grade = grade + tmp;
  tmp = 0.2 * mt;
  grade = grade + tmp;
  tmp = 0.3 * hw;
  grade = grade + tmp;
  tmp = 0.4 * proj;
  grade = grade + tmp;

*This is not how we are going to calculate your grades
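
The fetch, execute, next-instruction loop above can be sketched as a tiny interpreter whose program sits in the same memory as its data. This is a minimal sketch under stated assumptions: the opcodes (`set`, `scale`, `add`, `halt`), the memory layout, and the sample scores are all hypothetical and chosen only to mirror the grade example; they do not correspond to RISC-V or any real ISA.

```python
# Toy stored-program machine. Opcodes and memory layout are hypothetical.
GRADE_PROGRAM = [
    ("set", "grade", 0.0),              # grade = 0
    ("scale", "tmp", 0.1, "lab"),       # tmp = 0.1 * lab
    ("add", "grade", "grade", "tmp"),   # grade = grade + tmp
    ("scale", "tmp", 0.2, "mt"),
    ("add", "grade", "grade", "tmp"),
    ("scale", "tmp", 0.3, "hw"),
    ("add", "grade", "grade", "tmp"),
    ("scale", "tmp", 0.4, "proj"),
    ("add", "grade", "grade", "tmp"),
    ("halt",),
]


def make_memory(lab, mt, hw, proj):
    """Place the program and its data in one memory, as a stored-program machine does."""
    return {"program": list(GRADE_PROGRAM), "lab": lab, "mt": mt,
            "hw": hw, "proj": proj, "tmp": 0.0, "grade": 0.0}


def run(memory):
    """Repeat: get the current instruction, execute it, determine the next one."""
    pc = 0
    while True:
        op, *args = memory["program"][pc]       # get the current instruction
        next_pc = pc + 1                        # straight-line code: next is pc + 1
        if op == "set":
            dst, value = args
            memory[dst] = value
        elif op == "scale":
            dst, weight, src = args
            memory[dst] = weight * memory[src]
        elif op == "add":
            dst, a, b = args
            memory[dst] = memory[a] + memory[b]
        elif op == "halt":
            return memory
        else:
            raise ValueError(f"unknown opcode: {op}")
        pc = next_pc


if __name__ == "__main__":
    result = run(make_memory(lab=80, mt=70, hw=90, proj=60))
    print(round(result["grade"], 2))
```

Because the program is just a list held in memory, another piece of code could edit it before calling `run`, which is the sense in which a stored program can be manipulated as data.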

And in conclusion
- The study of computer architecture allows us to construct better computer systems
  - Performance, power
- Computer architecture is a study that crosses software and hardware
- We will use RISC-V as the main ISA for class work, but the design principles are applicable to other computer designs
- The stored-program computer will be the basic computing model studied

Acknowledgements
- These slides contain material developed and copyright by:
  - Arvind (MIT)
  - Krste Asanovic (MIT/UCB)
  - Joel Emer (Intel/MIT)
  - James Hoe (CMU)
  - John Kubiatowicz (UCB)
  - David Patterson (UCB)
- MIT material derived from course 6.823
- UCB material derived from course CS152, CS252