Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from David Culler, UC Berkeley CS252, Spr 2002 course slides, 2002 UC Berkeley Some material adapted from Hennessy & Patterson / 2003 Elsevier Science
High Level Language Program Compiler Assembly Language Program Assembler temp = v[k]; v[k] = v[k+1]; v[k+1] = temp; lw $15, 0($2) lw $16, 4($2) sw $16, 0($2) sw $15, 4($2) Machine Language Program Control Signal Specification Machine Interpretation 0000 1001 1100 0110 1010 1111 0101 1000 1010 1111 0101 1000 0000 1001 1100 0110 1100 0110 1010 1111 0101 1000 0000 1001 0101 1000 0000 1001 1100 0110 1010 1111 ALUOP[0:3] <= InstReg[9:11] & MASK Slide: David Patterson, UCB
S/W and H/W consists of hierarchical layers of abstraction, each hides details of lower layers from the above layer The instruction set arch. abstracts the H/W and S/W interface and allows many implementation of varying cost and performance to run the same S/W Applications Operating System Compiler Firmware Instruction Set Architecture Instruction Set Processor I/O System Datapath & Control Digital Design Circuit Design Layout Figure: David Patterson, UCB
Programming languages might encourage architecture features to improve performance and code size, e.g. Fortran and Java Operating systems rely on the hardware to support essential features such as semaphores and memory management Technology always raises the bar for what could be done and changes design s focus Applications usually derive capabilities and constrains History provides the starting point, filters out mistakes Applications Operating Systems Technology Computer Architecture Programming Languages History Figure: David Patterson, UCB
Processor logic capacity: about 30% increase per year clock rate: about 20% increase per year Higher logic density gave room for instruction pipeline & cache Memory DRAM capacity: about 60% increase per year (4x / 3 years) Memory speed: about 10% increase per year Cost per bit: about 25% improvement per year Performance optimization no longer implies smaller programs Disk Capacity: about 60% increase per year Computers became lighter and more power efficient
100000000 10000000 Alpha 21264: 15 million Pentium Pro: 5.5 million PowerPC 620: 6.9 million Alpha 21164: 9.3 million SPARC Ultra: 5.2 million R10000 Pentium R4400 T r a n s i s t o r s 1000000 100000 i80286 i80386 i80486 R3010 CMOS improvements: Die size: 2X every 3 yrs Line width: halve / 7 yrs 10000 i8086 SU MIPS i80x86 M68K MIPS Alpha i4004 1000 1965 1970 1975 1980 1985 1990 1995 2000 2005 Figure: David Patterson, UCB
350 Alpha 21264 exceeds 1200 300 Performance 250 200 150 100 RISC introduction RISC Intel x86 50 35%/yr 0 1982 1984 1986 1988 1990 1992 1994 Year Performance now improves ~ 50% per year (2x every 1.5 years) Slide: David Patterson, UCB
Relative Performance Architecture+ Technology Technology Relying on technology alone would have kept us 8 years behind
1 0 0, 0 0 0, 0 0 0 B i t - l e v e l p a r a l l e l i s m I n s t r u c t i o n - l e v e l T h r e a d - l e v e l Transistors 1 0, 0 0 0, 0 0 0 1, 0 0 0, 0 0 0 1 0 0, 0 0 0 i 8 0 2 8 6 R 10000 P e n t i u m i 80386 R 3000 R 2000 i 8 0 8 6 1 0, 0 0 0 i 8080 i 8008 i 4004 1, 000 1970 1975 1980 1985 1990 1995 2000 2005 Figure: David Culler, UCB
100,000 DRAM capacity 4x / 3 yrs; 16,000x in 20 yrs! Programming concern: cache not RAM size Processor organization becoming main focus for performance optimization HW designer focus not only performance but functional integration and power consumption (e.g. system on a chip) 10,000 4M 16M 64M Year Size Cyc Time 1980 64 K 250 s 1000 256K 1M 1983 256 K 220 s 1986 1 M 90 s 100 16K 10 1976 1978 64K 1980 1982 1984 1986 1988 1990 1992 1994 1996 1989 4 M 165 ns 1992 16 M 145 ns 1996 64 M 120 ns Year of introduction 2000 256 M 100 ns
Implementation Complexity Technology Trends Benchmarks Workloads Cost and performance are the main evaluation metrics for a design quality
Chips begins with silicon, found in sand Silicon does not conduct electricity well and thus called semiconductor A special chemical process can transform tiny areas of silicon to either: Excellent conductors of electricity (like copper) Excellent insulator from electricity (like glass) Areas that can conduct or insulate under a special condition (a switch) A transistor is simply an on/off switch controlled by electricity Integrated circuits combines dozens of hundreds of transistors in a chip
Technology innovations over time Year Technology used in computers Relative performance/unit cost 1951 Vacuum tube 1 1965 Transistor 35 1975 Integrated circuits 900 1995 Very large-scale integrated circuit 2,400,000 Advances of the IC technology affect H/W and S/W design philosophy
Silicon Ingot Slices Slicer Blank wafers 20-30 20 to processing 30 processing steps Bond die to package Package Tested dies Die Test Die tester Individual dies (one wafer) Dicer Patterned wafers Packaged dies Tested packaged dies Package Part tester Test Ship to customers Ship Silicon ingots: 6-12 inches in diameter and about 12-24 inches long Impurities in the wafer can lead to defective devices and reduces the yield
Dies_per_Wafer = (Wafer_diameter/2)2 Die_Area Wafer_Diameter 2 Die_Area Die_Yield = Wafer_Yield 1 + Defects_per_Unit_Area * Die_Area - Die_Cost = Wafer_Cost Dies_per_Wafer Die_Yield Die cost roughly goes with die area 4 IC_Cost = Die_Cost + Testing_Cost + Packing_Cost Final_Test_Yield