Computer Architecture
ELEC3441 Introduction
2nd Semester, 2016-17
Dr. Hayden Kwok-Hay So
Department of Electrical and Electronic Engineering

What is Computer Architecture?
Computer architecture is the study of the design and implementation of computing systems.
Design constraints: function, cost, performance, power, compatibility.
Architecture Continually Changing
- Applications suggest how to improve technology and provide the revenue to fund development.
- Improved technologies make new applications possible.
- The cost of software development makes compatibility a major force in the market.
Example: Machine Learning → Architecture

Evolution of Computer Design Tradeoffs
- A basic calculator: Function: basic arithmetic; Performance: low; Cost: low; Power: low
- Calculators today
- Computers: general purpose, high performance, multi-user, low (?) cost, compatible
Computer in the 60s vs. Today's Computer (photos; the 1960s machine is DEC's PDP-10; source: http://www.computerhistory.org)

[Figure: Uniprocessor Performance, relative to the VAX-11/780, 1978 to 2012 (log scale), from the VAX-11/780 up to multi-core Intel Xeon processors. Annotated growth rates: 25%/year, then 52%/year, then 22%/year, the last phase limited by power, ILP, and memory speed.]

Multi-Core CPU
[Figure: Intel Lynnfield processor (source: AnandTech)]
Warehouse-Scale Computer
- A new class of computer: a massively parallel cluster of computers.
- The datacenter as a computer: ~100,000 servers; design choices span electrical, electronic, and building construction.
- Exploits service-level parallelism.
- Designed for cloud-based services.

The End of the Uniprocessor Era
The single biggest change in the history of computing systems.

ELEC3441 Course Objectives
- To make a simple processor: construct a workable processor.
- To make a uniprocessor run fast: various techniques developed through the 1990s.
- To appreciate the latest developments in computer architecture research: techniques to overcome the Power Wall, ILP Wall, and Memory Wall.

A Quest into HW + SW
- Studying computer architecture requires a deep understanding of both hardware and software.
- Software:
  - Assumes you know basic C/C++/Java.
  - Assumes you know the basic compilation flow.
  - You will learn assembly language in homework, the project, and tutorials.
  - Linux programming.
- Hardware:
  - You need to know basic digital system design concepts (covered briefly in class).
  - It is good to take ELEC2302/3342 concurrently.
  - You will learn to use a hardware description language: Chisel.
Chisel Design Flow
- Chisel (Constructing Hardware in a Scala Embedded Language) is a new hardware description language, based on Scala, developed at UC Berkeley.
- The Chisel compiler translates a Chisel design description into one of three outputs:
  - C++ code → C++ compiler → C++ simulator
  - FPGA Verilog → FPGA tools → FPGA emulation
  - ASIC Verilog → ASIC tools → GDS layout

Chisel Simulators
- Homework will use RISC-V processor simulators derived from Chisel processor designs.
  - They give much more detailed information than other simulators.
  - The designs can be mapped to an FPGA or to a real chip layout.
- You need to learn some minimal Chisel in class, but we'll make the Chisel RTL source available so you can see all the details of our processors.
- Homework projects are based on modifying the Chisel RTL code, if desired.

ELEC3441 Administrivia
- Instructor: Dr. Hayden So
- TA: Nina Engelhardt
- Lectures: Mon 3:30-5:20pm, Thu 3:30-4:20pm (MB 113G)
- Tutorials: no scheduled time, but additional office hours will be arranged for homework
- Web: http://www.eee.hku.hk/~elec3441
Administrivia
- Thursday 1/26 (the 29th day of the 12th lunar month, just before Chinese New Year): no class? Tutorial?

Textbook
- Reading materials will be drawn from two main textbooks, mostly from COD, with later modules from CAQ:
  - COD: Computer Organization and Design: The Hardware/Software Interface, David Patterson and John Hennessy, 5th Edition, Morgan Kaufmann (2013), ISBN-13: 978-0124077263
  - CAQ: Computer Architecture: A Quantitative Approach, John Hennessy and David Patterson, 5th Edition, Morgan Kaufmann (2011), ISBN-13: 978-0123838728

Assessments
- Homework (45%): 3 homework assignments and a mini-project, including hands-on work building real processors using Chisel
- Quiz (15%): 5 in-class quizzes, a way to make sure you follow the class progress
- Exam (40%): cumulative over the entire semester

Agenda for this semester
- The Basics: single-cycle processor
- Improving CPI: memory hierarchy, simple pipelining
- Breaking the CPI = 1 barrier: out-of-order execution, superscalar processors
- Advanced Architectures: multi-core processors, vector processors, VLIW
Module 1: THE BASICS

Computer Architecture: HW/SW Interface
Layers, from top to bottom:
- Software: applications; compiler and assembler; operating system
- Instruction Set Architecture
- Hardware: microarchitecture; processor, memory, I/O; digital design; circuit design; transistors

Instruction Set Architecture (ISA)
- The contract between software and hardware.
- Typically described by giving all the programmer-visible state (registers + memory) plus the semantics of the instructions that operate on that state.
- The IBM 360 was the first line of machines to separate the ISA from its implementation (a.k.a. microarchitecture).
- Many implementations are possible for a given ISA:
  - E.g., the Soviets built code-compatible clones of the IBM 360, as did Amdahl after he left IBM.
  - E.g., today you can buy AMD or Intel processors that run the x86-64 ISA.
  - E.g., many cellphones use the ARM ISA, with implementations from many different companies including TI, Qualcomm, Samsung, Marvell, etc.

ISA to Microarchitecture Mapping
- An ISA is often designed with a particular microarchitectural style in mind, e.g.:
  - Accumulator: hardwired, unpipelined
  - CISC: microcoded
  - RISC: hardwired, pipelined
  - VLIW: fixed-latency, in-order parallel pipelines
  - JVM: software interpretation
- But it can be implemented with any microarchitectural style:
  - Intel Ivy Bridge: hardwired, pipelined CISC (x86) machine (with some microcode support)
  - Simics: software-interpreted SPARC RISC machine
  - ARM Jazelle: a hardware JVM processor
Notable ISAs (ISA, year, origin/notes)
- Alpha (1992): DEC; used in the 21264 and other processors
- MIPS (1986): research/MIPS
- PA-RISC (1986): HP; HP workstations in the 90s
- PowerPC (1993): IBM/Motorola
- SPARC (1987): Sun Microsystems
- ARM (1985): ARM; dominant today
- x86 (1978): Intel; dominant today
- VAX (1977): DEC
- IBM 360 (1964): IBM; defined computer architecture

This Course: RISC-V ISA
- RISC-V is a new, simple, clean, extensible ISA originally developed at Berkeley for education and research.
  - RISC-I/II were the first Berkeley RISC implementations.
  - The Berkeley research machines SOAR/SPUR were considered RISC-III/IV.
- Both of the dominant ISAs (x86 and ARM) are too complex for undergraduate teaching.
  - See Appendix K of H&P for a survey of major ISAs.
- The RISC-V ISA manual is available on the course web page.
- A full GCC-based tool chain is available.

Electronic Numerical Integrator and Computer (ENIAC)
- Inspired by Atanasoff and Berry, Eckert and Mauchly designed and built ENIAC (1943-45) at the University of Pennsylvania.
- The first completely electronic, operational, general-purpose analytical calculator!
  - 30 tons, 72 square meters, 200 kW
- Performance:
  - Read in 120 cards per minute
  - Addition took 200 µs, division 6 ms
  - 1000 times faster than the Mark I
- Not very reliable!

WWII Effort
Application: ballistic calculations
angle = f(location, tail wind, cross wind, air density, temperature, weight of shell, propellant charge, ...)
Electronic Discrete Variable Automatic Computer (EDVAC)
- ENIAC's programming system was external:
  - Sequences of instructions were executed independently of the results of the calculation.
  - Human intervention was required to take instructions out of order.
- Eckert, Mauchly, John von Neumann, and others designed EDVAC (1944) to solve this problem.
  - The solution was the stored-program computer: the program can be manipulated as data.
- The First Draft of a Report on the EDVAC was published in 1945, but it bore only von Neumann's signature!
  - In 1973, a court in Minneapolis attributed the honor of inventing the computer to John Atanasoff.

Stored-Program Computer
Program = a sequence of instructions. The machine repeatedly:
1. Gets the current instruction.
2. Executes the instruction.
3. Determines the next instruction to fetch.

Ex: Calculating Class Grades*
The single statement
    grade = 0.1 * lab + 0.2 * mt + 0.3 * hw + 0.4 * proj;
is executed over time as a sequence of simpler steps:
    grade = 0;
    tmp = 0.1 * lab;
    grade = grade + tmp;
    tmp = 0.2 * mt;
    grade = grade + tmp;
    tmp = 0.3 * hw;
    grade = grade + tmp;
    tmp = 0.4 * proj;
    grade = grade + tmp;
*This is not how we are going to calculate your grades.

And in conclusion
- The study of computer architecture allows us to construct better computer systems, in terms of performance and power.
- Computer architecture is a study that crosses software and hardware.
- We will use RISC-V as the main ISA for class work, but the design principles apply to other computer designs.
- The stored-program computer will be the basic computing model studied.
Acknowledgements
- These slides contain material developed and copyrighted by: Arvind (MIT), Krste Asanovic (MIT/UCB), Joel Emer (Intel/MIT), James Hoe (CMU), John Kubiatowicz (UCB), David Patterson (UCB)
- MIT material derived from course 6.823
- UCB material derived from courses CS152 and CS252