EE382 Processor Design
Stanford University
Winter Quarter 1998-1999
Instructor: Michael Flynn
Teaching Assistant: Steve Chou
Administrative Assistant: Susan Gere
Lecture 1 - Introduction
Slide 1

Class Objectives
  Learn theoretical analysis and limits
    develop intuition
    project long-term trends and bound design space more efficiently than simulation
  Learn models for VLSI component cost tradeoffs
    emphasis on microprocessor
  Learn modeling techniques for computer system performance
    emphasis on queuing
  Put it all together to balance system performance and cost
    Emphasis on multiprocessors, memory, and I/O
    Practical examples and design targets
Slide 2
Course Prerequisites
  Computer Architecture and Organization (EE282)
    Instruction Set Architecture
    Machine Organization
      Basic Pipeline Design
      Cache Organization
      Branch Prediction
      Superscalar Execution
        In-Order
        Out-of-Order
  Statistics
    Basic probability
      distribution functions
      statistical measures
  Familiarity with stochastic processes and Markov models is helpful, but not required
Slide 3

Course Information
  Access to the course web page is necessary
    http://www-leland.stanford.edu/class/ee382/
    Course info, assignments, old exams, design tools, FAQs, ...
  Textbook and reference material
    Computer Architecture: Pipelined and Parallel Processor Design, Michael J. Flynn
  Problem set and design problem philosophy
    Learn by doing: maximize learning/effort
  Exam philosophy
    Extend what you have learned
    Open-book, not a speed or trick contest
  You are expected to give us feedback
    Questions, office hours, email, surveys
Slide 4
Grading
  Problem Sets and Design Problems: 40%
    6 problem sets, 2 design problems
  Midterm: 20%
  Final Exam: 40%
    Covers entire course
    Scheduled March 15, 8:30-11:30 AM
Slide 5

Key Concepts of Abstraction
  Instruction Set Architecture (ISA)
    Functional interface for assembly-language programmer
    Examples: SGI MIPS, Sun SPARC, PowerPC, HPPA, DEC Alpha, Intel (x86), IBM System/390, IBM AS/400
  Implementation (Machine Organization)
    Partitioning into units and logic design
    Examples
      Intel386 CPU, Intel486 CPU, Pentium Processor, Pentium Pro Processor
      Alpha 21064, 21164, 21264
  Realization
    Physical fabrication and assembly
    Examples
      IBM 709 ('54) built with vacuum tubes and 7090 ('59) built with transistors
      Pentium Processor in 0.8 µm, 0.6 µm, 0.35 µm BiCMOS/CMOS
Slide 6
Instruction Set Architecture
  "... the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flow and controls, the logical design, and the physical implementation."
    Amdahl, Blaauw, and Brooks, 1964
  Consists of:
    Organization of storage
    Data types
    Encodings and representations (instruction formats)
    Instruction (or Operation Code) Set
    Modes for addressing data items and instructions
    Program visible exceptional conditions
  Specifies requirements for binary compatibility across implementations
Slide 7

Instruction Set Types
  Load/Store (L/S)
    Only load and store instructions refer to memory
    no memory ALU ops
    used by several microprocessors
      PowerPC, HP, DEC Alpha
  Register/Memory (R/M)
    ALU operations can have either source or destination in memory
    Used by mainframes and most microprocessors
      IBM System/370, Intel Architecture (x86), all x86 compatibles
  Register or Memory (R+M)
    ALU operations can have any/all operands in memory
    Not used commonly now
      DEC VAX
Slide 8
L/S ISA General Characteristics
  32 GPR x 32b ... more recently 64b
  instr size: 32b ... more recently 64b
  instr types
    R1 <- R2 op R3 for ALU ops
    R1 <-> MEM[RB, D] for LD/ST
Slide 9

R/M ISA General Characteristics
  16 GPR x 32b
  instr size ... 16b, 32b, 48b
  instr types
    RR  R1 <- R1 op R2
    RM  R1 <- R1 op MEM[RB, RX, D]
    MM  MEM1[RB, RX, D] <- MEM1[RB, RX, D] op MEM2[RB, RX, D]
        used for character, decimal ops only.
Slide 10
ISA Syntax Terminology
  OP.type destination, source1, source2
    e.g. ADD.F R1,R2,R3 puts result of floating pt. add in floating reg 1.
  OP without type implies integer type unless fp is clear from the context.
  destination is always first operand, so that store is
    ST MEM[RB, RX, D], R2
Slide 11

ISA Assumptions
  assume all i.s. have a PSW and condition codes ... cc
  Branch is BC.CC target; target is either R or Mem.
  unconditional branch is BR, even though it's implemented with BC
  other branches: BCT, BAL (branch and link)
Slide 12
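The operand ordering on the ISA Syntax Terminology slide can be illustrated with a small parser. This is only a sketch for reading the notation: the names `Instr`, `split_operands`, and `parse`, and the "INT" label for an untyped opcode, are assumptions made here and are not part of the course material.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str                 # opcode mnemonic, e.g. "ADD"
    dtype: str              # type suffix; "INT" (a label chosen here) when omitted
    dest: Optional[str]     # destination is always the first operand
    sources: Tuple[str, ...]


def split_operands(text: str) -> list:
    """Split on commas that are not inside a MEM[...] operand."""
    parts, depth, current = [], 0, ""
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        parts.append(current.strip())
    return parts


def parse(text: str) -> Instr:
    """Parse 'OP.type destination, source1, source2' into its fields."""
    mnemonic, _, operands = text.strip().partition(" ")
    op, _, suffix = mnemonic.partition(".")
    parts = split_operands(operands)
    dest = parts[0] if parts else None
    return Instr(op, suffix or "INT", dest, tuple(parts[1:]))


print(parse("ADD.F R1,R2,R3"))
print(parse("ST MEM[RB,RX,D], R2"))
```

For the store, the memory operand comes out as the destination and R2 as the only source, which matches the "destination is always first operand" rule. The sketch does not treat the condition suffix of BC.CC specially.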
Moore's Law
  [Figure: transistors per die (log scale, 1 to 10^8) vs. year (1970-2000). Memory: 1K, 4K, 16K, 64K, 256K, 1M, 4M, 16M. Microprocessor: 4004, 8080, 8086, 80286 Processor, Intel386 Processor, Intel486 Processor, Pentium.]
  Moore's Law: No. Tx per chip increases 4X every 3 years
  CAGR = 60%
  Source: Intel
Slide 13

Die Size Growth
  [Figure: die size (mm^2, log scale, 10 to 1000) vs. year (1975-2000). Logic: 8086, 68000, 80286, 68020, 80386, 80486, 68040, Pentium (tm). DRAM: 64K, 256K, 1M, 4M, 16M.]
  Source: Intel
Slide 14
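The CAGR figure on the Moore's Law slide follows from the "4X every 3 years" rule by taking a cube root. A minimal sketch of that arithmetic, with the function names `cagr` and `project` chosen here for illustration:

```python
def cagr(factor: float, years: float) -> float:
    """Compound annual growth rate implied by growing `factor`-fold in `years` years."""
    return factor ** (1.0 / years) - 1.0


def project(start: float, annual_rate: float, years: float) -> float:
    """Value after compounding `annual_rate` for `years` years."""
    return start * (1.0 + annual_rate) ** years


moore_rate = cagr(4.0, 3.0)
print(f"CAGR for 4X every 3 years: {moore_rate:.1%}")
print(f"Growth over 3 years at that rate: {project(1.0, moore_rate, 3):.2f}X")
```

The computed rate is a little under the rounded 60% shown on the slide.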
Finer Lithography
  [Figure: resolution (µm, log scale, 0.01 to 10) vs. year ('83-'01). Series labeled Resolution, Overlay, and CD Control; generations marked at 1.0, 0.8, 0.5, 0.35, 0.25.]
  Source: Intel
Slide 15

Limits on scaling
  As device sizes get smaller, it becomes difficult to maintain the rate at which feature sizes shrink
  It currently appears that around 50 nm several factors may limit scaling
    hot carrier effects
    time dependent dielectric breakdown
    gate tunneling current
    short channel effects and effect on V_T
Slide 16
Beyond CMOS MOSFETs
  If limits prove real, there are alternative technologies with systems implications
    low temperature CMOS
    sub threshold logic
    new gate oxide materials
    SOI
Slide 17

Fabrication Facility Costs
  [Figure: fab cost (dollars in millions, log scale, 1 to 10,000) vs. year (1965-2000).]
  Moore's Second Law: Fab Costs Grow 40% Per Year
  Source: VLSI Research, Inc.
Slide 18
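To get a feel for what 40% per year means for fab costs, the growth rate can be converted into a doubling time and a ten-year multiplier. This is a sketch of the compounding arithmetic only; the function names are assumptions, and no actual fab prices are implied.

```python
import math


def doubling_time(annual_rate: float) -> float:
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


def growth_factor(annual_rate: float, years: float) -> float:
    """Total multiplier after compounding `annual_rate` for `years` years."""
    return (1.0 + annual_rate) ** years


fab_rate = 0.40  # "Fab Costs Grow 40% Per Year" from the slide
print(f"Doubling time: {doubling_time(fab_rate):.2f} years")
print(f"Ten-year multiplier: {growth_factor(fab_rate, 10):.1f}X")
```

At this rate, costs double roughly every two years, which is why the Summary slide stresses high volume to recover costs.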
Microprocessor Business Model
  New generation of silicon technology every 2.5-3 years
    30% reduction in linear dimensions => 50% reduction in area
    30% reduction in device delay => 50% increase in speed
  Used to reduce cost and improve performance on previous generation microprocessor
  Used to enable new generation of microprocessor with wider, more parallel, more functional machine organization
    Incremental changes between generations
  Business growth enables investment in new technology
    Driven by performance, new applications, and dancing bunny people
Slide 19

Performance Growth
  [Figure 1.20 from P&H: workstation performance (0 to 1200) vs. year (1987-1997). Machines plotted: SUN-4/260, MIPS M/120, MIPS M2000, IBM RS6000, HP 9000/750, DEC AXP/500, IBM POWER 100, DEC Alpha 4/266, DEC Alpha 5/300, DEC Alpha 5/500, DEC Alpha 21264/600.]
  Workstation Performance Improving 54% per year
  That's almost 1% per week!
Slide 20
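The per-generation scaling figures and the "almost 1% per week" remark can be checked with a few lines of arithmetic. The sketch below assumes ideal scaling (area goes as the square of linear dimension, speed as the inverse of delay) and 52 weeks per year; the function names are hypothetical.

```python
def scale_generation(linear_shrink: float = 0.30, delay_reduction: float = 0.30):
    """Return (relative area, relative speed) after one technology generation.

    Assumes area scales with the square of linear dimension and speed with
    the inverse of device delay.
    """
    relative_area = (1.0 - linear_shrink) ** 2
    relative_speed = 1.0 / (1.0 - delay_reduction)
    return relative_area, relative_speed


def weekly_rate(annual_rate: float, weeks_per_year: int = 52) -> float:
    """Equivalent compound weekly growth rate for a given annual rate."""
    return (1.0 + annual_rate) ** (1.0 / weeks_per_year) - 1.0


area, speed = scale_generation()
print(f"Relative area: {area:.2f}, relative speed: {speed:.2f}")
print(f"54% per year is {weekly_rate(0.54):.2%} per week")
```

Under these assumptions the area result matches the slide's 50%, while the speed gain works out to about 43%, which the slide rounds up to 50%.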
PC Shipment Growth
  Performance Growth and New Applications Drive Volume
  Source: Dataquest by A. Yu in IEEE Micro 12/96
Slide 21

System Price/Performance
  Year  System                  Performance  Memory  Price            Price per MIPS
  1965  IBM System 360/50       0.15 MIPS    64 KB   $1M              $6.6M
  1977  DEC VAX11/780           1 MIPS       1 MB    $200K            $200K
  1998  Dell Dimension XPS-300  725 MIPS     64 MB   $2412 (1/4/98)   $3.33
  Photographs from Virtual Computing History Group
Slide 22
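The price-per-MIPS column on the System Price/Performance slide is simply price divided by performance. The sketch below recomputes it from the rounded figures on the slide and derives the implied average annual change, which is a calculation added here and not a number from the slide.

```python
# (system, year, MIPS, price in dollars), as shown on the slide
systems = [
    ("IBM System 360/50", 1965, 0.15, 1_000_000),
    ("DEC VAX11/780", 1977, 1.0, 200_000),
    ("Dell Dimension XPS-300", 1998, 725.0, 2412),
]


def price_per_mips(price: float, mips: float) -> float:
    """Dollars paid per MIPS of performance."""
    return price / mips


costs = {name: price_per_mips(price, mips) for name, _, mips, price in systems}
for name, cost in costs.items():
    print(f"{name}: ${cost:,.2f} per MIPS")

# Average compound annual change in price per MIPS from 1965 to 1998
first, last = systems[0], systems[-1]
span_years = last[1] - first[1]
annual_change = (costs[last[0]] / costs[first[0]]) ** (1.0 / span_years) - 1.0
print(f"Average annual change: {annual_change:.1%}")
```

With the rounded inputs, the 1965 entry comes out near $6.67M per MIPS, slightly above the slide's $6.6M, presumably because the slide's price and MIPS values are themselves rounded.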
Representative System
  [Figure: system block diagram. Labels: L2 Cache, L1 Icache, L1 Dcache, Pipelines, Registers, CPU, CPU, Chipset, Memory, I/O Bus(es).]
Slide 23

Summary
  Current architectures exploit parallelism for performance
    Multiple pipelines and caches
    Multiprocessors
  Technology costs are increasing rapidly
    High volume is critical to recover costs
      interface standards and evolution necessary
    Product success depends on cost-effective area allocation and partitioning
  Technology capacity and performance increasing rapidly
    Critical to evaluate broad space of design options at each generation
    Opportunity to learn from the past and to innovate
  Theoretical analysis and modeling combined with design targets are powerful tools for developing computer systems. This course will help prepare you to apply those for your future career in theory or practice.
Slide 24
This Week
  Check access to the web page
    Make sure you can read and print
    First problem set will be posted by Friday
  Reading
    Scan Chapter 1
    Sections 2.1, 2.2
  Room Change
    move to Gates B03
    no festival Friday lecture
Slide 25