ECE 475/CS 416 Computer Architecture - Introduction
  Edward Suh
  Computer Systems Laboratory
  suh@csl.cornell.edu

Today's Agenda
  Question 1: What is this course about? What will I learn from it?
  Question 2: How will the course be run? What do I need to know?
Title = Computer Architecture

What is Computer Architecture?
  Old definition (80s) = instruction set design
  Today's architects must do more; implementation hurdles are more challenging than those in instruction set design

Role of the Computer Architect
  To design and engineer the various levels of a computer system to maximize performance and programmability within limits of technology and cost.
  Architect must be aware of
    application characteristics and benchmarks
    measures of cost and performance
    technology trends
    software and hardware interaction
Performance?
  Desktop computers: largest market in dollar terms
  Web servers: Amazon.com had $1.35MM revenue / hour (2005)
  Embedded / mobile computers

Single-Processor Performance
  From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th Edition, 2006
Technology
  "If [...] history [...] teaches us anything, it is that man, in his quest for knowledge and progress, is determined and cannot be deterred." John F. Kennedy (1962)
  Amazing yearly advances
    ~60% more devices per chip (doubles every 18 months)
    ~15% faster devices (doubles every 5 years)
    disks increase ~60% in capacity
    circuit boards increase ~5% in wire density
  Faster devices and advances in circuit design improve performance

Clock Frequency
  Growth rate: 30% per year
  Source: Intel
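The yearly growth rates and doubling times quoted on the Technology slide are related by compound growth. A minimal sketch of that arithmetic, assuming a constant annual rate (the function names are illustrative and not part of the course material):

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double at a constant compound annual growth rate.

    annual_growth is a fraction, e.g. 0.60 for "60% more per year".
    """
    return math.log(2) / math.log(1 + annual_growth)


def annual_growth_from_doubling(years_to_double: float) -> float:
    """Inverse relation: annual growth fraction implied by a doubling time."""
    return 2 ** (1 / years_to_double) - 1


# Rates quoted on the slide (assumed constant from year to year).
for label, rate in [
    ("devices per chip", 0.60),
    ("device speed", 0.15),
    ("clock frequency", 0.30),
]:
    years = doubling_time_years(rate)
    print(f"{label}: {rate:.0%}/year doubles in {years:.2f} years "
          f"({years * 12:.0f} months)")
```

Under this assumption, 60% per year corresponds to a doubling time a little under 18 months, and 15% per year to roughly 5 years, which agrees with the parenthetical figures on the slide.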
Architecture Contribution
  Same program (binary) runs 1.58x faster each year!!

Part I. Single-Core Processors
  What kinds of architectural innovations enabled the uni-processor performance improvement over the past 20 years?
Moore's Law: 2X transistors / year
  "Cramming More Components onto Integrated Circuits", Gordon Moore, Electronics, 1965
  # of transistors / cost-effective integrated circuit doubles every N months (12 ≤ N ≤ 24)
  Source: UCB EECS 252 notes

CPUs: Archaic vs. Modern
  1982 Intel 80286               2001 Intel Pentium 4
  12.5 MHz                       1500 MHz (120X)
  2 MIPS (peak)                  4500 MIPS (peak) (2250X)
  Latency 320 ns                 Latency 15 ns (20X)
  134,000 xtors, 47 mm²          42,000,000 xtors, 217 mm² (310X)
  16-bit data bus, 68 pins       64-bit data bus, 423 pins
  Microcode interpreter,         3-way superscalar, dynamic translate to RISC,
  separate FPU chip              superpipelined (22 stage), out-of-order execution
  (no caches)                    On-chip 8KB data cache, 96KB instr. trace cache, 256KB L2 cache
  Source: UCB EECS 252 notes
Memory: Archaic vs. Modern
  1980 DRAM (async)                          2000 DDR SDRAM (clocked)
  0.06 Mbits/chip                            256.00 Mbits/chip (4000X)
  64,000 xtors, 35 mm²                       256,000,000 xtors, 204 mm²
  16-bit data bus per module, 16 pins/chip   64-bit data bus per DIMM, 66 pins/chip (4X)
  13 Mbytes/sec                              1600 Mbytes/sec (120X)
  Latency: 225 ns                            Latency: 52 ns (4X)
  (no block transfer)                        Block transfers (page mode)
  Source: UCB EECS 252 notes

Disk: Archaic vs. Modern
  CDC Wren I, 1983               Seagate 373453, 2003
  3600 RPM                       15000 RPM (4X)
  0.03 GBytes capacity           73.4 GBytes (2500X)
  Tracks/Inch: 800               Tracks/Inch: 64000 (80X)
  Bits/Inch: 9550                Bits/Inch: 533,000 (60X)
  Three 5.25" platters           Four 2.5" platters (in 3.5" form factor)
  Bandwidth: 0.6 MBytes/sec      Bandwidth: 86 MBytes/sec (140X)
  Latency: 48.3 ms               Latency: 5.7 ms (8X)
  Cache: none                    Cache: 8 MBytes
  Source: UCB EECS 252 notes
LANs: Archaic vs. Modern
  Ethernet 802.3, 1978           Ethernet 802.3ae, 2003
  Bandwidth: 10 Mbits/s          Bandwidth: 10,000 Mbits/s (1000X)
  Latency: 3000 µsec             Latency: 190 µsec (15X)
  Shared media                   Switched media
  Coaxial cable                  Category 5 copper wire
  Coaxial cable: plastic covering, braided outer conductor, insulator, copper core
  Twisted pair: copper, 1mm thick, twisted to avoid antenna effect; "Cat 5" is 4 twisted pairs in a bundle
  Source: UCB EECS 252 notes

How Did We Get Performance?
  Trade off transistors and bandwidth for latency
  Take advantage of parallelism
  Principle of locality
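The "archaic vs. modern" comparisons above quote total improvement factors over roughly two decades. A rough sketch that converts those factors into average yearly rates, to show how much faster bandwidth improved than latency. The time spans are taken from the years on the slides; treating peak MIPS as the CPU's "bandwidth" figure and assuming a constant yearly rate are simplifications made here, not claims from the slides:

```python
def annual_rate(total_ratio: float, years: int) -> float:
    """Average compound yearly improvement implied by a total ratio."""
    return total_ratio ** (1 / years) - 1


# (component, years between the two products,
#  bandwidth improvement, latency improvement), as quoted on the slides.
COMPARISONS = [
    ("CPU  (80286 -> Pentium 4)", 2001 - 1982, 2250, 20),
    ("DRAM (async -> DDR)", 2000 - 1980, 120, 4),
    ("Disk (Wren I -> Seagate)", 2003 - 1983, 140, 8),
    ("LAN  (802.3 -> 802.3ae)", 2003 - 1978, 1000, 15),
]

for name, years, bw_ratio, lat_ratio in COMPARISONS:
    bw = annual_rate(bw_ratio, years)
    lat = annual_rate(lat_ratio, years)
    print(f"{name}: bandwidth {bw:.0%}/year, latency {lat:.0%}/year")
```

In every row the bandwidth rate comes out well above the latency rate, which is one way to read the "trade off transistors and bandwidth for latency" point: latency is the scarce resource, so designs spend the plentiful ones to hide it.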
Pipelined Instruction Execution
  [Figure: instructions in program order (vertical axis) against time in clock cycles (horizontal axis, Cycle 1 to Cycle 7); each instruction passes through Ifetch, Reg, ALU, DMem, Reg, starting one cycle after the previous instruction]

Why Slowdown?
  From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th Edition, 2006
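The pipelined execution diagram above can be reproduced with a few lines of code. This is a minimal sketch of an ideal five-stage pipeline using the stage names from the slide; it assumes one instruction enters per cycle and ignores stalls and hazards, and the helper names are illustrative:

```python
STAGES = ["Ifetch", "Reg", "ALU", "DMem", "Reg"]


def pipeline_schedule(num_instructions, stages=STAGES):
    """Return one row per instruction; row[c] is the stage active in cycle c+1.

    Instruction i (0-based) enters the pipeline in cycle i+1, so its
    stage s occupies column i + s. Empty strings mark idle cycles.
    """
    total = num_instructions + len(stages) - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total
        for s, name in enumerate(stages):
            row[i + s] = name
        rows.append(row)
    return rows


def total_cycles(num_instructions, depth, pipelined=True):
    """Cycles to finish all instructions, with or without overlap."""
    if pipelined:
        return num_instructions + depth - 1
    return num_instructions * depth


def render(rows, width=8):
    """Format the schedule as a plain-text table, one column per cycle."""
    header = "".join(f"C{c + 1}".ljust(width) for c in range(len(rows[0])))
    lines = [" " * width + header]
    for i, row in enumerate(rows):
        cells = "".join(cell.ljust(width) for cell in row)
        lines.append(f"I{i + 1}".ljust(width) + cells)
    return "\n".join(lines)


print(render(pipeline_schedule(4)))
print("pipelined:", total_cycles(4, len(STAGES)), "cycles")
print("unpipelined:", total_cycles(4, len(STAGES), pipelined=False), "cycles")
```

With four instructions the overlapped schedule needs 8 cycles instead of 20; real pipelines fall short of this ideal whenever hazards force stalls.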
Architecture at a Crossroads
  How many cores does your computer have?
  Uniprocessor performance now 2x / 5(?) years
    "Power wall": power consumption limits the transistors that can be turned on
    "ILP wall": law of diminishing returns on more HW for ILP
    "Memory wall": off-chip memory accesses take hundreds of CPU cycles
  Change in chip design: multiple cores, i.e. Thread Level Parallelism (TLP)
    All microprocessor companies switch to multiprocessors (AMD, Intel, IBM, Sun; all new Apples have 2 CPUs)
  "We are dedicating all of our future product development to multicore designs. This is a sea change in computing." Paul Otellini, President, Intel (2004)

A Peek at the Syllabus
  Cost and performance
  In-order processors
  Memory hierarchy
  Out-of-order processors
  Branch prediction
  Speculative execution
  Superscalar processors
  VLIW, Vector
  Simultaneous multithreading (a.k.a. "Hyperthreading")
  Multicore hardware, parallel processing
  Virtual machines, I/O, networks
Labs
  Verilog design projects
    incredibly useful language to know
    industry loves Verilog
    projects done in teams of two
  Expand on a basic MIPS R3000 processor
    Lab 0: Welcome to Verilog (not graded)
    Lab 1: Get used to processor model, fix bugs, add instructions
    Lab 2: Pipeline model, add forwarding logic
    Lab 3: Add caches and cache controller
    Lab 4: Final Lab (next)

Final Lab
  Superscalar (dual-issue) pipeline
  Design a processor extension of your choosing
    branch prediction
    dynamic scheduling
    hardware prefetchers
    speculative loads
    multiple-level caches
    instruction set extensions
    [your idea here]
  Project report required
What You Will Learn
  How to evaluate architectural decisions
    You will need to choose among different designs
  Architectural techniques in modern microprocessors
    Go from 1986 (314) to 2002
    Apply to your own designs
  Why processors are moving towards multi-cores
    Problems and solutions in multi-core systems

What Do I Need to Know?
  You are expected to know MIPS ISA and Verilog
    alternatively, you are expected to learn them quickly and on your own
  What about C/C++?
    as a computer engineer you should know C
    we use small C programs to test Verilog designs
  What about Unix/Linux?
    basic Unix skills you should have or acquire:
      elementary tasks: logging in, changing password, manipulating files, etc.
      familiarity with a Unix text editor of your choosing (e.g., vi, emacs)
ECE 475/CS 416 Requirements
  Prerequisites
    ENGRD 230 or equivalent, and ECE 314 or equivalent
    logic design, FSM design
    basic computer organization
  Assets
    passion for computer hardware
    prior exposure to Unix and/or Verilog
    ability to work nonstop for extended periods of time
  You should not take this course if any of these apply
    you do not meet the prerequisites
    your schedule and/or lifestyle won't fit a(nother) high-workload course

Staff
  Instructor: Edward Suh, 338 Rhodes, office hours TuTh 11am-Noon
  Teaching assistants (office hours TuWTh 7-10pm, PH329):
    Jiho Choi
    Mark Cianchetti
    Richard Hough
    Yuan Ning
    KK Yu
  If you must, use the staff's email: ece475@csl.cornell.edu
    but we may post your question (and our answer) on Blackboard
Computing Resources
  Blackboard is used for course communication
    Announcements (e.g. errata, date changes, etc.)
    Handouts and lecture notes: print out before coming to lectures
    Questions / Answers
    http://blackboard.cornell.edu/
  All assignments handled through CMS
    http://cscms.cit.cornell.edu/
  ECE Computing Labs for lab assignments

Course Components
  Lectures: TuTh 2:55-4:10, PH219
    Download notes from Blackboard Course Documents
    5 min break in the middle
  4 Homeworks: individual assignment
  4 Labs: group of (one or) two
  2 Exams: Prelim & Final
Grading
  Grade distribution:
    Homework             15%  Individual
    Midterm              15%  (Oct. 16)
    Final                25%  (TBA)
    Verilog projects     40%  (5% + 5% + 10% + 20%), group of one or two
    Class participation   5%  / Half grade at my discretion
  Late policy:
    1 min late = not submitted = zero (I'm not kidding)
    but you have one lifeline on one assignment: 24 hours
    all parties involved must have a lifeline available

A Few Rules
  When in trouble with the material
    Use Blackboard! It's likely your question has been asked and answered
      do not send me questions by email
    Observe office hours
      we are all very busy
      do not randomly drop by
    Ask in class!
      good citizen's hallmark: in-class participation
  I have a keen eye and no tolerance for cheating
    disciplinary hearings are no fun
    check Cornell's Code of Academic Integrity
    http://cuinfo.cornell.edu/academic/aic.html
Textbook
  Computer Architecture: A Quantitative Approach, 4th Ed.
  by John L. Hennessy and David A. Patterson
  Morgan Kaufmann Publishers

FAQ
  I have a question about ECE 475/CS 416
    office hours: TuTh 11-Noon, 338 Rhodes Hall
  I have a question about conducting research in your group
    office hours: TuTh 11-Noon, 338 Rhodes Hall
  What courses complement ECE 475/CS 416?
    ECE 474 (VLSI), CS 412/413 (Compilers), CS 414 (OS)