ECE 154A Introduction to Computer Architecture Dmitri Strukov Lecture 1
Outline
  Admin
  What is this class about?
  Prerequisites
  Simple computer
  Performance
  Historical trends
  Economics
Admin
  Office hours: W 3 pm - 5 pm, HFH 5153
  Course load and grading: ~5 projects (30%), ~5 HWs (10%), 2 midterms (15% each), final (30%), up to 5% extra for participation and attendance
  TAs:
    Michael Klachko (M 8 am - 1 pm / W 8 am - 12 pm / Th 8 am - 11 am)
    Joseph McMahan (M 10 am - 3 pm / T 9 am - 12 pm / W 10 am - 3 pm / Th 9 am - 12 pm / F 1 pm - 5 pm)
    Fan Lin, F 9 am Rec (M Th 10 am - 12 pm / M & W 1 pm - 3 pm)
  Website: http://www.ece.ucsb.edu/~strukov/ (to be set up this weekend)
  No recitation this week; an extra one will be held at the end of the course
Admin: Textbooks
  Required: Computer Organization and Design: The Hardware/Software Interface, Fourth Edition, Patterson and Hennessy (COD). The third edition is also fine.
  Recommended: Digital Design and Computer Architecture, David and Sarah Harris, 2012 (2nd Ed.).
  Recommended: Computer Architecture: From Microprocessors to Supercomputers, Behrooz Parhami, 2005.
  C language manual webpage from Stanford University
Major computing platforms
  Application-Specific Integrated Circuit (ASIC)
  Field-Programmable Gate Array (FPGA)
  Microprocessor
  [Figure: density and speed increase toward ASICs; flexibility increases toward microprocessors]
  In this class, the focus is on microprocessors only
  Chip cost = Non-recurring engineering cost / volume + cost per chip
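The chip cost formula can be sketched in code to show how production volume shifts the balance between platforms. The function name and all dollar figures here are hypothetical, chosen only for illustration; they do not come from the lecture.

```python
def cost_per_chip(nre_cost, volume, unit_cost):
    """Chip cost = non-recurring engineering (NRE) cost / volume + cost per chip."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return nre_cost / volume + unit_cost


# Hypothetical numbers: an ASIC with a large NRE cost but cheap units,
# versus an FPGA with no NRE cost but expensive units.
for volume in (1_000, 100_000):
    asic = cost_per_chip(nre_cost=2_000_000, volume=volume, unit_cost=5)
    fpga = cost_per_chip(nre_cost=0, volume=volume, unit_cost=50)
    print(f"volume={volume}: ASIC ${asic:.2f}/chip, FPGA ${fpga:.2f}/chip")
```

With these made-up numbers the FPGA is cheaper at 1,000 units and the ASIC is cheaper at 100,000 units, because the NRE term shrinks as volume grows.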
What is Computer Architecture?
  [Figure: Application at one end, Physics at the other; the gap is too large to bridge in one step (but there are exceptions, e.g. magnetic compass)]
  In its broadest definition, computer architecture is the design of the abstraction layers that allow us to implement information processing applications efficiently using available manufacturing technologies.
How do we handle complexity?
  Coordination of many levels of abstraction
  [Figure: stack of abstraction layers, with ECE 154A marked on it]
    Software: Application (ex: browser); Compiler; Assembler; Operating System (Mac OSX)
    Instruction Set Architecture
    Hardware: Processor; Memory; I/O system; Datapath & Control; Digital Design; Circuit Design; Transistors
  Slide credit: Dan Garcia
Levels of Representation
  High Level Language Program (e.g., C)
      temp = v[k];
      v[k] = v[k+1];
      v[k+1] = temp;
  (Compiler)
  Assembly Language Program (e.g., ARM)
      ldr r0, [r2]
      ldr r1, [r2, #4]
      str r1, [r2]
      str r0, [r2, #4]
  (Assembler)
  Machine Language Program (ARM)
      0000 1001 1100 0110 1010 1111 0101 1000
      1010 1111 0101 1000 0000 1001 1100 0110
      1100 0110 1010 1111 0101 1000 0000 1001
      0101 1000 0000 1001 1100 0110 1010 1111
  (Machine Interpretation)
  Hardware Architecture Description (e.g., block diagrams)
  (Architecture Implementation)
  Logic Circuit Description (Circuit Schematic Diagrams)
  Slide credit: Dan Garcia
Prerequisites: Knowledge of
  Digital logic (ECE 152A, 15A)
    Combinational logic (logic gates, critical path)
    Sequential logic (clock cycle time, finite state machine)
    Basic logic circuits (muxes, registers, ALU, memories)
  Basic programming skills
    C language
    Procedures, pointers and arrays
Assignment for Next Week
  HW on the prerequisite and Chapter 1 material to be posted this weekend (due October 7th, 11 pm)
  Read Chapter 1 from P&H, plus Appendix C and D.1-D.4 (for prerequisites)
Simple Computer
  Keep data in memory
  Program the algorithm as a sequence of steps
  A typical step is some operation on data
  Store this sequence in memory
  Execute the steps one by one
  Control circuitry orchestrates the steps
  Datapath implements the steps
Simple Computer
  Stored-program (Von Neumann) computer
  Algorithm for F = A x B + C / D
    Step 1: Temp1 = A x B
    Step 2: Temp2 = C / D
    Step 3: F = Temp1 + Temp2
  [Figure: Control, Memory and Datapath blocks; Control sends addresses to Memory and the operation to Datapath; data moves between Memory and Datapath]
  Execution over time:
    Load first instruction to control
    Read A and B from memory, compute Temp1, write Temp1 to memory
    Load second instruction to control
    Read C and D from memory, compute Temp2, write Temp2 to memory
    Load third instruction to control
    Read Temp1 and Temp2 from memory, compute F, write F to memory
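The execution sequence for F = A x B + C / D can be mimicked with a few lines of code. This is a minimal sketch, not a model of any real machine: the instruction encoding, the `run` function and the input values are all hypothetical, and for brevity the program is kept in a separate list rather than in the same memory as the data.

```python
import operator

# Operations the datapath can perform (a hypothetical three-operation set).
OPERATIONS = {"mul": operator.mul, "div": operator.truediv, "add": operator.add}


def run(program, memory):
    """Execute a stored program one instruction at a time."""
    for op, dest, src1, src2 in program:   # control loads the next instruction
        a = memory[src1]                    # read the operands from memory
        b = memory[src2]
        result = OPERATIONS[op](a, b)       # datapath computes
        memory[dest] = result               # write the result back to memory
    return memory


# F = A x B + C / D, encoded as (operation, destination, source 1, source 2).
program = [
    ("mul", "Temp1", "A", "B"),
    ("div", "Temp2", "C", "D"),
    ("add", "F", "Temp1", "Temp2"),
]
memory = {"A": 2, "B": 3, "C": 8, "D": 4}   # made-up input values
run(program, memory)
print(memory["F"])  # 2 * 3 + 8 / 4 = 8.0
```

Each loop iteration corresponds to one "load instruction, read operands, compute, write back" group on the timeline.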
Components of Computer
  Processor
    Control ("brain")
    Datapath ("brawn")
  Memory (where programs, data live when running)
  Devices
    Input: Keyboard, Mouse
    Output: Display, Printer
    Disk (where programs, data live when not running)
Performance Metrics
  Execution time per application
  Energy per application
  Throughput (# apps executed per unit time)
Benchmarking
  Intended set of applications (SPEC)
  Geometric average: $\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$
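A geometric average of execution time ratios can be computed as follows. This is a small sketch; the function name and the three ratios are made up for illustration and are not SPEC results.

```python
import math


def geometric_mean(ratios):
    """n-th root of the product of n execution time ratios."""
    if not ratios:
        raise ValueError("need at least one ratio")
    # Summing logarithms avoids overflow when there are many ratios.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical execution time ratios for three benchmarks.
ratios = [2.0, 4.0, 8.0]
print(geometric_mean(ratios))  # cube root of 64, i.e. approximately 4
```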
Performance
  Performance = 1 / Execution Time
  Execution Time = Clock Cycle Time x # Cycles = # Cycles / Clock Rate
Ways to Improve Simple Computer?
  Reduce the number of clock cycles?
  Reduce the clock cycle time?
  [Figure: the simple computer (Control, Memory, Datapath) and its instruction-by-instruction execution timeline, repeated from the Simple Computer slide]
Performance
  Performance = 1 / Execution Time
  Execution Time = Clock Cycle Time x # Cycles = # Cycles / Clock Rate
  # Cycles = Instruction Count x (Average) Clocks Per Instruction
  Execution Time = CCT x IC x CPI = IC x CPI / Clock Rate
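CPI Example
  Computer A: CCT = 250 ps, CPI = 2.0
  Computer B: CCT = 500 ps, CPI = 1.2
  Same ISA
  Which is faster, and by how much?
  $$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{CCT}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$
  $$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{CCT}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$
  A is faster, by this much:
  $$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$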
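The same comparison can be checked numerically with Execution Time = CCT x IC x CPI. In this sketch the function name and the instruction count are hypothetical; the count is arbitrary because both computers run the same program on the same ISA, so it cancels in the ratio.

```python
def cpu_time_ps(instruction_count, cpi, cct_ps):
    """Execution time in picoseconds: IC x CPI x CCT."""
    return instruction_count * cpi * cct_ps


instruction_count = 1_000_000  # hypothetical; it cancels out in the ratio

time_a = cpu_time_ps(instruction_count, cpi=2.0, cct_ps=250)
time_b = cpu_time_ps(instruction_count, cpi=1.2, cct_ps=500)

print(f"A: {time_a / instruction_count:.0f} ps per instruction")
print(f"B: {time_b / instruction_count:.0f} ps per instruction")
print(f"A is faster by a factor of {time_b / time_a:.2f}")
```

Computer A wins despite its higher CPI because its clock cycle is half as long.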
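CPI in More Detail
  If different instruction classes take different numbers of cycles:
  $$\text{Clock Cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times \text{Instruction Count}_i)$$
  Weighted average CPI:
  $$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)$$
  where $\text{Instruction Count}_i / \text{Instruction Count}$ is the relative frequency of class $i$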
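The weighted average CPI can be sketched as below. The function name and the instruction mix are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
def average_cpi(instruction_classes):
    """Weighted average CPI from (cpi_i, instruction_count_i) pairs."""
    total_cycles = sum(cpi * count for cpi, count in instruction_classes)
    total_instructions = sum(count for _, count in instruction_classes)
    return total_cycles / total_instructions


# Hypothetical instruction mix: (CPI of the class, instructions in the class).
mix = [(1, 50), (2, 30), (3, 20)]
print(average_cpi(mix))  # (1*50 + 2*30 + 3*20) / 100 = 1.7
```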
Amdahl's Law
  Execution Time = CCT x IC x CPI = IC x CPI / Clock Rate
  Say there are instructions of type A and B: CPI = (IC_A x CPI_A + IC_B x CPI_B) / IC
  Improving only CPI_A gives a limited overall gain, because the time spent on type B instructions is unaffected:
  $$T_{\text{improved}} = \frac{T_{\text{affected}}}{\text{improvement factor}} + T_{\text{unaffected}}$$
  Corollary: make the common case fast
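A short numeric sketch shows how the unaffected part limits the overall gain. The function name and the 80 s / 20 s split are hypothetical values for illustration.

```python
def improved_time(t_affected, t_unaffected, improvement_factor):
    """T_improved = T_affected / improvement factor + T_unaffected."""
    return t_affected / improvement_factor + t_unaffected


# Hypothetical program: 100 s total, 80 s of which can be sped up.
t_affected, t_unaffected = 80.0, 20.0
total = t_affected + t_unaffected

for factor in (2, 4, 1000):
    t_new = improved_time(t_affected, t_unaffected, factor)
    print(f"factor {factor}: {t_new:.2f} s, overall speedup {total / t_new:.2f}x")
```

However large the improvement factor, the overall speedup in this example stays below 100 / 20 = 5x, since the 20 s unaffected part never shrinks.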
Pitfall: MIPS as a Performance Metric
  MIPS: Millions of Instructions Per Second
  Doesn't account for
    Differences in ISAs between computers
    Differences in complexity between instructions
  $$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{\text{Instruction count}}{\frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$
  CPI varies between programs on a given CPU
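The pitfall can be illustrated with a small sketch in which the version with the higher MIPS rating is the slower one. All names and numbers here (the 2 GHz clock, the two instruction counts and CPIs) are hypothetical.

```python
def mips(clock_rate_hz, cpi):
    """MIPS = clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


def execution_time_s(instruction_count, cpi, clock_rate_hz):
    """Execution time = IC x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


clock_rate_hz = 2e9  # hypothetical 2 GHz CPU

# Two hypothetical ways of coding the same task (made-up counts and CPIs).
version_1 = {"instruction_count": 10e9, "cpi": 1.0}  # many simple instructions
version_2 = {"instruction_count": 4e9, "cpi": 2.0}   # fewer, slower instructions

for name, v in (("version 1", version_1), ("version 2", version_2)):
    rate = mips(clock_rate_hz, v["cpi"])
    seconds = execution_time_s(v["instruction_count"], v["cpi"], clock_rate_hz)
    print(f"{name}: {rate:.0f} MIPS, {seconds:.1f} s")
```

Version 1 reaches 2000 MIPS but takes 5 s, while version 2 reaches only 1000 MIPS and finishes in 4 s, so MIPS alone ranks them the wrong way round.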
Ways to Improve Simple Computer?
  The BIG Picture: performance depends on
    Algorithm: affects IC, possibly CPI
    Programming language: affects IC, CPI
    Compiler: affects IC, CPI
    Instruction set architecture: affects IC, CPI, T_c (clock cycle time)
  [Figure: the simple computer (Control, Memory, Datapath) with addresses, data and operation signals]
  What is in the datapath? Width of the datapath? The number and type of instructions? Memory organization?
Computing Devices Then
  EDSAC, University of Cambridge, UK, 1949
Computing Devices Now
  Sensor nets, media players, cameras, games, set-top boxes, laptops, servers, smart phones, routers, robots, automobiles, supercomputers
The lowest layer of hierarchy
Moore's Law
  Predicts: 2X transistors / chip every 2 years
  [Figure: # of transistors on an integrated circuit (IC) versus year]
  Gordon Moore, Intel cofounder, B.S. Cal 1950!
  In 1965, Gordon Moore predicted that the number of transistors per chip would double every year; in 1975 he revised the rate to every two years. The often-quoted 18 months (1.5 years) is a later variant of the prediction.
  en.wikipedia.org/wiki/moore's_law
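The slide quotes two doubling periods, 2 years and 18 months. A quick calculation shows how much the choice matters over a decade. The function name and the 10-year horizon are arbitrary choices made for this sketch.

```python
def growth_factor(years, doubling_period_years):
    """How many times larger the transistor count gets after `years`."""
    return 2 ** (years / doubling_period_years)


for period in (1.5, 2.0):
    print(f"doubling every {period} years: "
          f"{growth_factor(10, period):.1f}x after 10 years")
```

Doubling every 2 years gives 32x in 10 years, while doubling every 18 months gives roughly 100x.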
Technology Scaling Road Map (ITRS)
  Year                 2004  2006  2008  2010  2012
  Feature size (nm)      90    65    45    32    22
  Intg. Capacity (BT)     2     4     6    16    32
Fun facts about 45 nm transistors
  30 million can fit on the head of a pin
  You could fit more than 2,000 across the width of a human hair
  If car prices had fallen at the same rate as the price of a single transistor has since 1968, a new car today would cost about 1 cent
Intel Core i7 2600K (Sandy Bridge)
  Launched in 2011
  1.16 billion transistors
  216 mm^2 die
  32 nm technology
  64 bit
  3.4 GHz
  4 cores
  8 MB cache
Power Trends
  In CMOS IC technology:
  $$\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$
  The power wall
    We can't reduce voltage further
    We can't remove more heat
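The power relation can be explored with relative numbers. In this sketch the function name, the arbitrary units and the 15% reductions are assumptions made for illustration only.

```python
def dynamic_power(capacitive_load, voltage, frequency):
    """Power = capacitive load x voltage^2 x frequency."""
    return capacitive_load * voltage ** 2 * frequency


# Hypothetical baseline in arbitrary units.
old = dynamic_power(capacitive_load=1.0, voltage=1.0, frequency=1.0)
# Hypothetical new design: 85% of the capacitive load, with voltage and
# frequency each reduced by 15%.
new = dynamic_power(capacitive_load=0.85, voltage=0.85, frequency=0.85)

print(f"new / old power = {new / old:.2f}")
```

Because voltage enters squared, a 15% cut in each of the three terms roughly halves the power (0.85^4 is about 0.52).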
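Solution #1
  [Figure: Single Processor Performance over time, annotated with "RISC" and "Move to multi-processor"]
  Frequency ~ V, so Power ~ V^3 (since Power = Capacitive load x Voltage^2 x Frequency)
  With ideal parallelism the power can be decreased for the same execution time
UCSB ECE 154A Fall 2013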
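The claim that ideal parallelism lowers power at equal execution time can be sketched as follows. The model is idealized and rests on stated assumptions: the work splits perfectly across cores, frequency is proportional to voltage, each core's power is capacitive load x voltage^2 x frequency, and voltage can be lowered freely (which real devices do not allow). The function name is hypothetical.

```python
def relative_power(cores):
    """Total power of `cores` slower cores relative to one fast core.

    Assumes ideal parallelism, frequency proportional to voltage, and
    power = capacitive load x voltage^2 x frequency for each core.
    """
    frequency = 1.0 / cores        # each core may run `cores` times slower
    voltage = frequency            # frequency ~ V, so voltage scales down too
    per_core = voltage ** 2 * frequency
    return cores * per_core


for n in (1, 2, 4):
    print(f"{n} core(s): {relative_power(n):.4f} of single-core power")
```

Under these assumptions two cores at half the frequency finish in the same time using a quarter of the power.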
Manufacturing ICs
  Yield: proportion of working dies per wafer
AMD Opteron X2 Wafer
  X2: 300 mm wafer, 117 chips, 90 nm technology
  X4: 45 nm technology
Integrated Circuit Cost
  $$\text{Cost per die} = \frac{\text{Cost per wafer}}{\text{Dies per wafer} \times \text{Yield}}$$
  $$\text{Dies per wafer} \approx \frac{\text{Wafer area}}{\text{Die area}}$$
  $$\text{Yield} = \frac{1}{(1 + (\text{Defects per area} \times \text{Die area} / 2))^2}$$
  Nonlinear relation to area and defect rate
    Wafer cost and area are fixed
    Defect rate determined by manufacturing process
    Die area determined by architecture and circuit design
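The three cost relations can be combined in a short sketch to see the nonlinear dependence on die area. The function names, wafer price, defect density and die areas are hypothetical values picked for illustration.

```python
import math


def dies_per_wafer(wafer_area, die_area):
    """Approximation: ignores dies lost at the round wafer's edge."""
    return wafer_area / die_area


def die_yield(defects_per_area, die_area):
    """Yield = 1 / (1 + defects per area x die area / 2)^2."""
    return 1.0 / (1.0 + defects_per_area * die_area / 2.0) ** 2


def cost_per_die(wafer_cost, wafer_area, die_area, defects_per_area):
    """Cost per die = cost per wafer / (dies per wafer x yield)."""
    good_dies = (dies_per_wafer(wafer_area, die_area)
                 * die_yield(defects_per_area, die_area))
    return wafer_cost / good_dies


# Hypothetical numbers: 300 mm wafer, $5,000 per wafer, 0.01 defects per mm^2.
wafer_area = math.pi * (300 / 2) ** 2  # mm^2
for die_area in (100, 200):            # mm^2
    cost = cost_per_die(5_000, wafer_area, die_area, 0.01)
    print(f"die area {die_area} mm^2: yield {die_yield(0.01, die_area):.2f}, "
          f"cost ${cost:.2f} per die")
```

With these made-up inputs, doubling the die area more than doubles the cost per die, because fewer dies fit on the wafer and a smaller fraction of them work.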
Acknowledgments
  Some of the slides contain material developed and copyrighted by Krste Asanovic (UCB) and instructor material for the textbook.