Do we need more chips (ASICs)?

Similar documents
6.375 Complex Digital System Spring 2006

6.375: Complex Digital Systems. Something new and exciting as well as useful. Fun: Design systems that you never thought you could design in a course

6.375: Complex Digital Systems. February 3, L01-1. Something new and exciting as well as useful

6.375: Complex Digital Systems. Something new and exciting as well as useful

Why take CSE

6.884 Complex Digital Systems Spring 2005

Lab. Course Goals. Topics. What is VLSI design? What is an integrated circuit? VLSI Design Cycle. VLSI Design Automation

CS310 Embedded Computer Systems. Maeng

FABRICATION TECHNOLOGIES

Spiral 2-8. Cell Layout

Introduction to ICs and Transistor Fundamentals

ECE520 VLSI Design. Lecture 1: Introduction to VLSI Technology. Payman Zarkesh-Ha

6.884 Complex Digital Systems Spring 2005

Design Methodologies and Tools. Full-Custom Design

Hardware Modeling using Verilog Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Design Methodologies. Full-Custom Design

A VARIETY OF ICS ARE POSSIBLE DESIGNING FPGAS & ASICS. APPLICATIONS MAY USE STANDARD ICs or FPGAs/ASICs FAB FOUNDRIES COST BILLIONS

COE 561 Digital System Design & Synthesis Introduction

More Course Information

Digital Design Methodology (Revisited) Design Methodology: Big Picture

Digital Design Methodology

FPGA Programming Technology

ECE 154A. Architecture. Dmitri Strukov

ADVANCED FPGA BASED SYSTEM DESIGN. Dr. Tayab Din Memon Lecture 3 & 4

ECE 261: Full Custom VLSI Design

ELCT 501: Digital System Design

Chapter 1 Overview of Digital Systems Design

Design Methodologies

Chapter 5: ASICs Vs. PLDs

Introduction to Microprocessor

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

DIGITAL DESIGN TECHNOLOGY & TECHNIQUES

Digital Integrated CircuitDesign

Overview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips

L2: Design Representations

CMPEN 411 VLSI Digital Circuits. Lecture 01: Introduction

Digital Integrated Circuits (83-313) Lecture 2: Technology and Standard Cell Layout

An Introduction to Programmable Logic

Evolution of Implementation Technologies. ECE 4211/5211 Rapid Prototyping with FPGAs. Gate Array Technology (IBM s) Programmable Logic

FPGA Based Digital Design Using Verilog HDL

UNIT 4 INTEGRATED CIRCUIT DESIGN METHODOLOGY E5163

Cell Libraries and Design Hierarchy. Instructor S. Demlow ECE 410 February 1, 2012

FPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011

INTRODUCTION TO FPGA ARCHITECTURE

EE 466/586 VLSI Design. Partha Pande School of EECS Washington State University

Microprocessor Systems

EE241 - Spring 2004 Advanced Digital Integrated Circuits

EE5780 Advanced VLSI CAD

ECE 637 Integrated VLSI Circuits. Introduction. Introduction EE141

Memory in Digital Systems

discrete logic do not

Memory in Digital Systems

Thomas Polzer Institut für Technische Informatik

Microelettronica. J. M. Rabaey, "Digital integrated circuits: a design perspective" EE141 Microelettronica

Introduction 1. GENERAL TRENDS. 1. The technology scale down DEEP SUBMICRON CMOS DESIGN

Design Metrics. A couple of especially important metrics: Time to market Total cost (NRE + unit cost) Performance (speed latency and throughput)

VLSI Design Automation

EE241 - Spring 2000 Advanced Digital Integrated Circuits. Practical Information

The QR code here provides a shortcut to go to the course webpage.

An Overview of Standard Cell Based Digital VLSI Design

What is this class all about?

ECE 459/559 Secure & Trustworthy Computer Hardware Design

Memory in Digital Systems

3. Implementing Logic in CMOS

Functional Programming in Hardware Design

CS61C Machine Structures. Lecture 1 Introduction. 8/27/2006 John Wawrzynek (Warzneck)

Problem 2 If the cost of a 12 inch wafer (actually 300mm) is $3500, what is the cost/die for the circuit in Problem 1.

Computer Architecture

ECE484 VLSI Digital Circuits Fall Lecture 01: Introduction

CS61C Machine Structures. Lecture 1 Introduction. 8/25/2003 Brian Harvey. John Wawrzynek (Warznek) www-inst.eecs.berkeley.

EEM 486: Computer Architecture

PLAs & PALs. Programmable Logic Devices (PLDs) PLAs and PALs

CSE 141: Computer Architecture. Professor: Michael Taylor. UCSD Department of Computer Science & Engineering

CS250 VLSI Systems Design Lecture 9: Memory

VLSI Design Automation

CMPE 415 Programmable Logic Devices FPGA Technology I

EECS 598: Integrating Emerging Technologies with Computer Architecture. Lecture 10: Three-Dimensional (3D) Integration

Lecture #1: Introduction

What is this class all about?

ESE 570 Cadence Lab Assignment 2: Introduction to Spectre, Manual Layout Drawing and Post Layout Simulation (PLS)

Design Verification Lecture 01

CMPEN411 Memory Chip Design Project Report

FPGA BASED SYSTEM DESIGN. Dr. Tayab Din Memon Lecture 1 & 2

Computer Architecture!

MEMORIES. Memories. EEC 116, B. Baas 3

Outline Marquette University

Computer Architecture!

Physical Implementation

Design Methodologies. Digital Integrated Circuits A Design Perspective. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

Lecture 4: MIPS Processor Example

E 4.20 Introduction to Digital Integrated Circuit Design

Announcements. Midterm 2 next Thursday, 6-7:30pm, 277 Cory Review session on Tuesday, 6-7:30pm, 277 Cory Homework 8 due next Tuesday Labs: project

CS 250 VLSI Design Lecture 11 Design Verification

What is this class all about?

Introduction. Summary. Why computer architecture? Technology trends Cost issues

EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs)

Lecture #1. Teach you how to make sure your circuit works Do you want your transistor to be the one that screws up a 1 billion transistor chip?

The Microprocessor as a Microcosm:

Outline. EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) FPGA Overview. Why FPGAs?

Programmable Logic Devices Introduction CMPE 415. Programmable Logic Devices

Transcription:

6.375 Complex Digital System Spring 2007 Lecturers: Arvind & Krste Asanović TAs: Myron King & Ajay Joshi Assistant: Sally Lee L01-1 Do we need more chips (ASICs)? ASIC=Application-Specific Integrated Circuit L01-2 1

Wide Variety of Products Rely on ASICs Sensor Nets Cameras Media Players Smart phones Set-top boxes QuickTime and a TIFF (Uncompressed) decompressor are needed to see this picture. Laptops Servers Routers Games Robots Automobiles Supercomputers L01-3 What s required? ICs with dramatically higher performance, optimized for applications Source: http://www.intel.com/technology/silicon/mooreslaw/index.htm and at a size and power to deliver mobility cost to address mass consumer markets L01-4 2

Let s take a look at current CMOS technology... L01-5 Chip = Transistors + Wires Vias connect one layer to another Wiring added in layers on top of wafer Glass on top seals chip Thicker wires on higher layers used for power and ground, and long range signals Thinner wires on lower layers used for dense local wiring Transistors fabricated first on original surface of wafer Bulk of wafer Cross-section through IBM 90nm process, 10 metal layers [ISSCC 2004] L01-6 3

FET = Field-Effect Transistor A four terminal device (gate, source, drain, bulk) Surface of wafer Source diffusion Reverse side of wafer gate E h E v bulk inversion happens here Drain diffusion Inversion: A vertical field creates a channel between the source and drain. Conduction: If a channel exists, a horizontal field causes a drift current from the drain to the source. L01-7 Simplified FET Model Binary logic values represented by voltages: High = Supply Voltage, Low = Ground Voltage G S D PFET connects S and D when G= low =0V Supply Voltage = V DD G PFET only good at pulling up G D S NFET connects D and S when G= high =V DD G NFET only good at pulling down Ground = GND = 0V L01-8 4

NAND Gate A B (A.B) B (A.B) A When both A and B are high, output is low When either A or B is low, output is high L01-9 NAND Gate Layout Parallel PMOS Transistors P-Diffusion V DD (in N-well) B (A.B) A Poly wire connects PMOS & NMOS gates Metal 1-Diffusion Contact GND N-Diffusion Series NMOS Transistors L01-10 A B (A.B) Output on Metal-1 5

Exponential growth: Moore s Law Intel 8080A, 1974 3Mhz, 6K transistors, 6u Intel 8086, 1978, 33mm 2 10Mhz, 29K transistors, 3u Intel 80286, 1982, 47mm 2 12.5Mhz, 134K transistors, 1.5u Intel 386DX, 1985, 43mm 2 33Mhz, 275K transistors, 1u Intel 486, 1989, 81mm 2 50Mhz, 1.2M transistors,.8u Intel Pentium, 1993/1994/1996, 295/147/90mm 2 66Mhz, 3.1M transistors,.8u/.6u/.35u Intel Pentium II, 1997, 203mm 2 /104mm 2 300/333Mhz, 7.5M transistors,.35u/.25u Shown Shown with with approximate approximate relative relative sizes sizes http://www.intel.com/intel/intelis/museum/exhibit/hist_micro/hof/hof_main.htm L01-11 Intel Penryn (2007) Dual core Quad-issue out-of-order superscalar processors 6MB shared L2 cache 45nm technology Metal gate transistors High-K gate dielectric 410 Million transistors 3+? GHz clock frequency Could fit over 500 486 processors on same size die. L01-12 6

.. But Design Effort Growing Nvidia Graphics Processing Units Design Effort per Chip 120 100 80 60 40 Transistors (M) Relative staffing on back-end 9x growth in back-end staff 5x growth in front-end staff 20 0 Relative staffing on front-end 1993 1995 1996 1997 1998 1999 2000 2001 2001 2002 2002 Front-end is designing the logic (RTL) Back-end is fitting all the gates and wires on the chip; meeting timing specifications; wiring up power, ground, and clock L01-13 Design Cost Impacts Chip Cost 90nm ASIC cost breakdown, $30M total (Altera study): 59% chip design (architecture, logic & I/O design, product & test engineering) 30% software and applications development 11% prototyping (masks, wafers, boards) If we sell 100,000 units, Non-Recurring Engineering (NRE) costs add $30M/100K = $300 per chip! Example above is for design using automated tools Similar to what we ll be using in 6.375 Hand-crafted IBM-Sony-Toshiba Cell microprocessor achieves 4GHz in 90nm, but development cost was >$400M L01-14 7

Topics to address in 6.375 How can we design complex billion transistor ASICs with reasonable effort? How good are our designs? Performance, area, power L01-15 Designer s Dilemma ASIC Complexity 2000: 1M+ logic gates 2005: 10M+ logic gates 2010: 100M+ logic gates Constants 10-30 person design team size 18 month design schedule Design flow -- unchanged for 10+ years! Designer must take shortcuts Conservative design No time for exploration Educated guess & code Gates are free mentality [ICCAD 04] Static (2) What happens when a designer must implement a 1M gate block? Sub-optimal implementations! Alternatives? LPM Pipeline Area example: Speed Pipeline (gates) (ns) Which is best? Static Linear Circular L01-16 8,898 15,910 8,170 2,391 3.60 4.70 3.67 3.32 Memory Util (%) 63.5 99.9 99.9 63.5 8

6.375 Course Philosophy Effective abstractions to reduce design effort High-level design language rather than logic gates Control specified with Guarded Atomic Actions rather than with finite state machines Guarded module interfaces automatically ensure correctness of composition of existing modules Design discipline to avoid bad design points Decoupled units rather than tightly coupled state machines Design space exploration to find good designs Architecture choice has largest impact on solution quality A unified view of languages, disciplines and tools that supports rapid design space exploration to find best area, power, and performance point with reduced design effort L01-17 6.375 Objectives By end of term, you should be able to: Decompose system requirements into a hierarchy of sub-units that are easy to specify, implement, and verify, and which can be reused Develop efficient verification and test plans Select appropriate microarchitectures for a unit and perform microarchitectural exploration to meet price, performance, and power goals Use industry-standard tool flows Complete a working million gate chip design! Make millions $$$ at a new chip startup (Don t forget your alma mater!) L01-18 9

6.375 Prerequisites You must be familiar with undergraduate (6.004) logic design : Combinational and sequential logic design Dynamic Discipline (clocking, setup and hold) Finite State Machine design Binary arithmetic and other encodings Simple pipelining ROMs/RAMs/register files Additional circuit knowledge (6.002, 6.374) useful but not vital Architecture knowledge (6.823) helpful for projects L01-19 6.375 Structure First half of term (before Spring Break) Lecture or tutorial MWF, 2:30pm to 4:00pm in 32-124 Three labs (on Athena, lab machines in 38-301) Form project teams (2-3 students); prepare project proposal (watch website for project ideas) Closed-book 90 minute quiz (Friday before Spring Break) Second half of term (after Spring Break) Weekly project milestones, with 1-2 page report Weekly project meeting with the instructors and TAs Final project presentations in last week of classes Final project report (~15-20 pages) due Thursday May 17 (no extensions) L01-20 10

6.375 Grade Breakdown Three labs 30% Quiz 20% Five project milestones 25% Final project report 25% (including presentation) L01-21 6.375 Collaboration Policy We strongly encourage students to collaborate on understanding the course material, BUT: Each student must turn in individual solutions to labs Students must not discuss quiz contents with students who have not yet taken the quiz If you re inadvertently exposed to quiz contents before the exam, by whatever means, you must immediately inform the instructors or TA L01-22 11

ASIC Design Styles L01-23 Hardware Design Abstraction Levels Application Algorithm Unit-Transaction Level (UTL) Model Guarded Atomic Actions (Bluespec) Register-Transfer Level (Verilog RTL) Gates Circuits Devices Physics L01-24 12

ASIC Design Styles Full-Custom (every transistor hand-drawn) Best possible performance: as used by Intel μps Semi-Custom (Some custom + some cell-based design) Reduced design effort: AMD μps plus recent Intel μps Cell-Based ASICs (Only use cells in standard library) High-volume, moderate performance: Graphics chips, network chips, cellphone chips This is what we ll use in 6.375 Mask-Programmed Gate Arrays/Structured ASICs Medium-volume, moderate performance applications Field-Programmable Gate Arrays Low-volume, low-moderate performance applications, and prototyping Comparing styles: how many design-specific mask layers per ASIC? how much freedom to develop own circuits? what design methods and tools are needed? L01-25 Custom and Semi-Custom Usually, in-house design team develops own libraries of cells for commonly used components: memories register files datapath cells random logic cells repeaters clock buffers I/O pads In extreme cases, every transistor instance can be individually sized ($$$$) approach used in Alpha microprocessor development The trend is towards greater use of semi-custom design style use a few great circuit designers to create cells redirect most effort at microarchitecture and cell placement to keep wires short L01-26 13

Custom Designer works with Low-Level Design Rules Surround rule Exclusion rule Extension rules Width rules Spacing rules An abstraction of the fabrication process that specify various geometric constraints on how different masks can be drawn Design rules can be absolute measurements (e.g. in nm) or scaled to an abstract unit, the lambda. The value of lambda depends on the manufacturing process finally used. L01-27 Standard Cell ASICs aka Cell-Based ICs (CBICs) Fixed library of cells + memory generators, often provided by fabrication foundry or third-party library providers Cells can be synthesized from HDL, or entered in schematics Cells placed and routed automatically Requires complete set of custom masks for each design Currently most popular hard-wired ASIC type (6.375 will use this) Cells arranged in rows Mem 1 Mem 2 Generated memory arrays L01-28 14

Standard Cell Library Components Clock Rail (not typical) Clock Rail Well Contact under Power Rail Power Rails in M1 VDD Rail Cell I/O on M2 NAND2 GND Rail Flip-flop Cells have standard height but vary in width Designed to connect power, ground, and wells by abutment L01-29 6.375 Standard Cell Design Flow Bluespec SystemVerilog source Bluespec Compiler Blueview C Verilog 95 RTL Bluespec C sim Cycle Accurate Verilog sim RTL synthesis VCD output gates Legend files Bluespec tools Debussy Visualization 3 February rd party tools 7, 2007 http://csg.csail.mit.edu/6.375/ L01-30 15

Standard Cell Design Examples Channel routing for 1.0mm 2-metal stdcells Over cell routing for 0.18mm 6-metal stdcells L01-31 Mask-Programmed Gate Arrays GND Can cut mask costs by prefabricating arrays of fixed size transistors on wafers Only customize metal layer for each design NMOS PMOS VDD PMOS Two kinds: Channeled Gate Arrays Leave space between rows of transistors for routing Sea-of-Gates Route over the top of unused transistors NMOS GND [ OCEAN Sea-of-Gates Base Pattern ] L01-32 16

Gate Array Personalization Isolating transistors by shared GND contact Isolating transistors with off gate GND L01-33 Gate Array Pros and Cons Cheaper and quicker since less masks to make Can stockpile wafers with diffusion and poly finished Memory inefficient when made from gate array Embedded gate arrays add multiple fixed memory blocks to improve density (=>Structured ASICs) Cell-based array designed to provide efficient memory cell (6 transistors in basic cell) Logic slow and big due to fixed transistors and wiring overhead Advanced cell-based arrays hardwire logic functions (NANDs/NORs/LUTs) which are personalized with metal L01-34 17

Field-Programmable Gate Arrays (FPGAs) Arrays mass-produced and programmed by customer after fabrication Can be programmed by blowing fuses, loading SRAM bits, or loading FLASH memory Each cell in array contains a programmable logic function Array has programmable interconnect between logic functions Overhead of programmability makes arrays expensive and slow but startup costs are low, so much cheaper than ASIC for small volumes L01-35 Xilinx Configurable Logic Block L01-36 18

FPGA Pros and Cons Advantages Dramatically reduce the cost of errors Remove the reticle costs from each design Disadvantages (as compared to an ASIC) [Kuon & Rose, FPGA2006] Switching power around ~12X worse Performance up 3-4X worse Area 20-40X greater Still requires tremendous design effort at RTL level L01-37 19