1 INTRODUCTION (PART II) Maeng
Three key embedded system technologies 2
Technology: a manner of accomplishing a task, especially using technical processes, methods, or knowledge
Three key technologies for embedded systems
  Processor technology
  IC technology
  Design technology
From Chapter 1, Embedded system design: A unified HW/SW introduction, Frank Vahid/Tony Givargis.
3 Processor Technology
Processor technology 4
The architecture of the computation engine used to implement a system's desired functionality
A processor does not have to be programmable
"Processor" is not equal to "general-purpose processor"
Figure: controller and datapath of three processor types
  General-purpose ("software"): control logic and state register, IR, PC; register file and general ALU; program memory (assembly code for: total = 0 for i = 1 to ...) and data memory
  Application-specific: control logic and state register, IR, PC; registers and custom ALU; program memory (assembly code for: total = 0 for i = 1 to ...) and data memory
  Single-purpose ("hardware"): control logic and state register; index and total registers with an adder; data memory
Processor technology 5
Processors vary in their customization for the problem at hand
Desired functionality:
  total = 0
  for i = 1 to N loop
    total += M[i]
  end loop
Implementation options: general-purpose processor, application-specific processor, single-purpose processor
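The desired functionality on this slide can be written out as a short runnable sketch. It is only an illustration: the function name `sum_memory` and the sample memory contents are assumptions, and index 0 of the list is left unused so that the indices match the slide's 1..N range.

```python
def sum_memory(M, N):
    """Return M[1] + ... + M[N], using the slide's 1-based indexing."""
    total = 0
    for i in range(1, N + 1):
        total += M[i]
    return total


# M[0] is unused padding so that indices line up with 1..N on the slide.
M = [None, 3, 1, 4, 1, 5]
print(sum_memory(M, 5))  # 14
```

A general-purpose processor would run this as a stored program, while a single-purpose processor would implement the same loop directly in its controller and datapath.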
General-purpose processors 6
Programmable device used in a variety of applications
  Also known as "microprocessor"
Features
  Program memory
  General datapath with large register file and general ALU
User benefits
  Low time-to-market and NRE costs
  High flexibility
"Pentium" is the most well-known, but there are hundreds of others
Figure: controller (control logic and state register, IR, PC), datapath (register file, general ALU), program memory (assembly code for: total = 0 for i = 1 to ...), data memory
Single-purpose processors 7
Digital circuit designed to execute exactly one program
  a.k.a. coprocessor, accelerator, or peripheral
Features
  Contains only the components needed to execute a single program
  No program memory
Benefits
  Fast
  Low power
  Small size
Figure: controller (control logic, state register), datapath (index, total, adder), data memory
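To make the controller/datapath split concrete, here is a minimal cycle-level sketch of a single-purpose processor for the summation example. The registers `index` and `total` come from the slide's figure; the state names (`INIT`, `CHECK`, `ADD`, `DONE`) and the one-state-per-cycle timing are assumptions made for illustration, not the design shown in the lecture.

```python
def run_single_purpose(M, N):
    """Simulate a hard-wired summation circuit, one controller state per cycle.

    Returns (total, cycles). There is no program memory: the "program"
    is the fixed state-transition logic below.
    """
    state, index, total, cycles = "INIT", 0, 0, 0
    while state != "DONE":
        cycles += 1
        if state == "INIT":        # load the datapath registers
            index, total = 1, 0
            state = "CHECK"
        elif state == "CHECK":     # controller compares index against N
            state = "ADD" if index <= N else "DONE"
        elif state == "ADD":       # datapath: adder updates total, index increments
            total = total + M[index]
            index = index + 1
            state = "CHECK"
    return total, cycles


M = [None, 3, 1, 4, 1, 5]          # M[0] unused, indices 1..N as on the slide
print(run_single_purpose(M, 5))    # (14, 12)
```

In this model the cycle count grows as 2N + 2, which is the kind of fixed, predictable behavior that makes single-purpose hardware fast and small.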
Application-specific processors 8
Programmable processor optimized for a particular class of applications having common characteristics
  Compromise between general-purpose and single-purpose processors
Features
  Program memory
  Optimized datapath
  Special functional units
Benefits
  Some flexibility, good performance, size and power
TI's TMS320 digital signal processor
Figure: controller (control logic and state register, IR, PC), datapath (registers, custom ALU), program memory (assembly code for: total = 0 for i = 1 to ...), data memory
9 IC Technology
IC technology 10
The manner in which a digital (gate-level) implementation is mapped onto an IC
  IC: integrated circuit, or "chip"
  IC technologies differ in their customization to a design
  ICs consist of numerous layers (perhaps 10 or more)
  IC technologies differ with respect to who builds each layer and when
Figure: IC package and IC; transistor cross-section labeled source, gate, oxide, channel, drain, silicon substrate
IC technology 11
Three types of IC technologies
  Full-custom/VLSI
  Semi-custom ASIC (gate array and standard cell)
  PLD (Programmable Logic Device)
Outline 12
Anatomy of integrated circuits
Full-Custom (VLSI) IC Technology
Semi-Custom (ASIC) IC Technology
Programmable Logic Device (PLD) IC Technology
MOS transistor 13
Source, Drain
  Diffusion area where electrons can flow
  Can be connected to metal contacts (vias)
Gate
  Polysilicon area where control voltage is applied
Oxide
  SiO2 insulator so the gate voltage can't leak
NMOS Transistor fabrication process (1) 14
NMOS transistor (NMOS FET)
Silicon dioxide (SiO2, an oxide film about 0.6 micron thick) is grown all over the surface of the P-type silicon
Photolithography: photo-resist material on the SiO2 is exposed to ultra-violet light through a mask
NMOS Transistor fabrication process (2) 15
Gate oxide (about 0.05 micron) is grown
Polysilicon is deposited (Low Pressure Chemical Vapor Deposition)
As (n-type) is diffused
Source and drain structures (n+) are formed
NMOS Transistor fabrication process (3) 16
SiO2 is grown
Metal (aluminium) is deposited to make contact points
Length unit: λ (micron)
Figure: cross-section with n+ source and drain regions; dimensions marked λ and 2λ
Four views 17
Logic, Transistor, Layout, Physical
NAND 18
Metal layers for routing (~10)
A stick diagram forms the basis for mask sets (layout)
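The transistor view of the NAND gate can be read as a network of switches. The sketch below assumes a static CMOS implementation (two p-type transistors in parallel pulling the output up, two n-type transistors in series pulling it down), which the slide text itself does not spell out; the function names are illustrative only.

```python
def nmos_conducts(gate):
    """An n-type transistor conducts when its gate is at logic 1."""
    return gate == 1


def pmos_conducts(gate):
    """A p-type transistor conducts when its gate is at logic 0."""
    return gate == 0


def cmos_nand(a, b):
    """Switch-level model of a two-input static CMOS NAND gate."""
    pull_down = nmos_conducts(a) and nmos_conducts(b)  # series path to ground
    pull_up = pmos_conducts(a) or pmos_conducts(b)     # parallel paths to Vdd
    # In a well-formed static CMOS gate exactly one network conducts.
    assert pull_up != pull_down
    return 1 if pull_up else 0


for a in (0, 1):
    for b in (0, 1):
        print(a, b, cmos_nand(a, b))
```

The logic view (a truth table) and the transistor view (two switch networks) describe the same function; the layout and physical views then fix where those transistors and wires sit on the chip.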
IC manufacturing steps 19
Structural design: from functional descriptions to the optimized circuits at gate level
Layout design: from the gate-level descriptions to the physical layout
Tape-out: send the design to manufacturing
Photolithography: drawing patterns by using photo-resist to form barriers for deposition
Full Custom 20
Very Large Scale Integration (VLSI)
Placement
  Place and orient transistors
Routing
  Connect transistors
Sizing
  Make fat, fast wires or thin, slow wires
  May also need to size buffers
Design rules
  "Simple" rules for correct circuit function
  Metal/metal spacing, minimum poly width
Full Custom 21
Best size, power, performance
Hand design
  Horrible time-to-market, flexibility, and NRE cost
  Reserved for the most important units in a processor: ALU, instruction fetch
Physical design tools
  Less optimal, but faster
Semi-custom 22
Lower layers are fully or partially built
  Designers are left with routing of wires and maybe placing some blocks
Benefits
  Good performance, good size, less NRE cost than a full-custom implementation (perhaps $10k to $100k)
Drawbacks
  Still requires weeks to months to develop
Semi-Custom 23
Gate array
  Array of prefabricated gates
  "Place" and route
  Higher density, faster time-to-market
  Does not integrate as well with full-custom
Standard cell
  A library of pre-designed cells
  Place and route
  Lower density, higher complexity
  Integrates well with full-custom
A sea-of-gates gate array 24
The logic function f1 = x2 x3 + x1 x3 in the gate array
A section of two rows in a standard cell 25
Inputs x1, x2, x3; outputs f1, f2
f1 = x1 x2 + x1 x3 + x1 x2 x3
f2 = x1 x2 + x1 x2 x3 + x1 x3
Semi-Custom 26
Most popular design style
Jack of all trades
  Good power, time-to-market, performance, NRE cost, per-unit cost, area
Master of none
Standard cell integrated with full custom for critical regions of the design
Programmable Logic Devices 27
Programmable Logic Device
  Programmable Logic Array, Programmable Array Logic, Field Programmable Gate Array
All layers already exist
  Designers can purchase an IC
  To implement desired functionality, connections on the IC are either created or destroyed
Benefits
  Very low NRE costs
  Great time-to-market
Drawbacks
  High unit cost, bad for large volume
  Power (except special PLA)
  Slower
Example device: 1600 usable gates, 7.5 ns, $7 list price
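The trade-off between low NRE cost and high unit cost can be illustrated with a simple break-even calculation. This is a sketch under stated assumptions: total cost is modeled as NRE plus unit cost times volume; the PLD uses the $7 list price quoted above with NRE taken as zero; the semi-custom figures ($50,000 NRE, $2 per unit) are hypothetical values, with the NRE picked from the $10k to $100k range mentioned for semi-custom ICs.

```python
def total_cost(nre, unit_cost, volume):
    """Simple cost model: one-time NRE plus per-unit cost times volume."""
    return nre + unit_cost * volume


def break_even_volume(nre_a, unit_a, nre_b, unit_b):
    """Volume at which options A and B cost the same, or None if there is none."""
    if unit_a == unit_b:
        return None
    volume = (nre_b - nre_a) / (unit_a - unit_b)
    return volume if volume > 0 else None


PLD = {"nre": 0, "unit_cost": 7}            # $7 list price from the slide; zero NRE assumed
SEMI = {"nre": 50_000, "unit_cost": 2}      # hypothetical semi-custom figures

print(break_even_volume(PLD["nre"], PLD["unit_cost"],
                        SEMI["nre"], SEMI["unit_cost"]))   # 10000.0
for volume in (1_000, 100_000):
    print(volume,
          total_cost(volume=volume, **PLD),
          total_cost(volume=volume, **SEMI))
```

Under these assumed numbers the PLD is cheaper below 10,000 units and the semi-custom part is cheaper above it, which is the sense in which a high unit cost is "bad for large volume".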
Programmable Logic Array (PLA) 28
Pre-fabricated building block of many AND/OR gates
  "Personalized" by making or breaking connections among the gates
Programmable array block diagram for sum-of-products form
Figure: inputs x1 ... xn pass through input buffers and inverters, giving each input and its complement; these feed the AND plane, which forms product terms P1 ... Pk; the OR plane combines the product terms into outputs f1 ... fm
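The AND-plane/OR-plane structure can be modeled in a few lines. The sketch below is an illustration only: the data layout (`and_plane` as a list of dictionaries, `or_plane` as lists of product-term indices) and the particular connections programmed in the example are assumptions, not the contents of the lecture's figure.

```python
def pla(inputs, and_plane, or_plane):
    """Evaluate a PLA for one input vector.

    and_plane: one dict per product term, mapping input index -> required
               value (1 = true input connected, 0 = complemented input connected).
    or_plane:  one list per output, naming the product terms it ORs together.
    """
    products = [all(inputs[i] == v for i, v in term.items()) for term in and_plane]
    return [int(any(products[p] for p in terms)) for terms in or_plane]


# Hypothetical programming with inputs x1, x2, x3 at indices 0, 1, 2:
#   P1 = x1 x2,  P2 = x1 x3',  P3 = x1' x2' x3
#   f1 = P1 + P2 + P3,  f2 = P1 + P3
AND_PLANE = [{0: 1, 1: 1}, {0: 1, 2: 0}, {0: 0, 1: 0, 2: 1}]
OR_PLANE = [[0, 1, 2], [0, 2]]

print(pla([1, 1, 0], AND_PLANE, OR_PLANE))  # [1, 1]
print(pla([1, 0, 0], AND_PLANE, OR_PLANE))  # [1, 0]
```

"Making or breaking connections" corresponds to editing `AND_PLANE` and `OR_PLANE`; the evaluation logic itself never changes, just as the prefabricated gates do not.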
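Gate-level Diagram of a PLA 29
Figure: inputs x1, x2, x3 with programmable connections into the AND plane, which forms product terms P1 to P4; the OR plane forms outputs f1 and f2 as sums of product terms
What are f1 and f2?
Programmable Array Logic (PAL) 30
Figure: inputs x1, x2, x3 with programmable connections into the AND plane, which forms product terms P1 to P4; the connections from the product terms to outputs f1 and f2 are fixed (hardwired)
What is the difference?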
Field-Programmable Gate Arrays (FPGAs) 31
FPGAs are programmable devices that support relatively large circuits
  Macrocell of PLDs: 20 gates
  PAL: 8 macrocells (160 gates)
  CPLD: 500 macrocells (10,000 gates)
  Altera 40 nm Stratix IV in 2008: over 2.5 billion transistors, 8.1 M ASIC gate equivalent
Different from CPLDs since they do not contain AND and OR planes
  Provide logic blocks for implementing the logic functions
Three main types of resources
  Logic blocks
  I/O blocks
  Interconnection wires
Structure of an FPGA 32
Logic Blocks 33
Each block has a small number of inputs and one output
Usually uses lookup tables (LUTs)
  Contain storage cells used to implement a small logic function
  Each storage cell can hold a 0 or a 1
  The stored value is produced as the output of the storage cell
A two-input lookup table 34
(a) Circuit for a two-input LUT: inputs x1 and x2 select one of four storage cells (each holding 0 or 1) to drive output f
(b) f1 = x1'x2' + x1x2 (' denotes complement)
  x1  x2  f1
  0   0   1
  0   1   0
  1   0   0
  1   1   1
(c) Storage cell contents in the LUT: 1, 0, 0, 1
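A lookup table is essentially a small memory addressed by its inputs. The following sketch models that behavior; the class name `LUT` and the convention that the first input is the most significant address bit (so that cell order matches truth-table row order) are assumptions made for the example.

```python
class LUT:
    """n-input lookup table: 2**n storage cells addressed by the inputs."""

    def __init__(self, cells):
        n = len(cells).bit_length() - 1
        if len(cells) != 2 ** n:
            raise ValueError("number of storage cells must be a power of two")
        self.cells = list(cells)
        self.n = n

    def __call__(self, *inputs):
        if len(inputs) != self.n:
            raise ValueError(f"expected {self.n} inputs")
        address = 0
        for bit in inputs:          # first input is the most significant bit
            address = (address << 1) | bit
        return self.cells[address]


# Storage cell contents 1, 0, 0, 1 from the slide's truth table.
f1 = LUT([1, 0, 0, 1])
print([f1(x1, x2) for x1 in (0, 1) for x2 in (0, 1)])  # [1, 0, 0, 1]
```

Reprogramming the block means writing different values into `cells`; the selection circuitry stays the same.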
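A three-input LUT 35
Figure: inputs x1, x2, and x3 select one of eight storage cells (each holding 0 or 1) to drive output f
Inclusion of a flip-flop in an FPGA logic block 36
Figure: a LUT with inputs In1, In2, In3 feeds the D input of a flip-flop clocked by Clock; a multiplexer controlled by Select drives Out from either the LUT output or the flip-flop output Q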
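A logic block with an optional flip-flop can be sketched as a LUT plus one bit of state. The class below is an illustrative model only: the name `LogicBlock`, the meaning of the `registered` flag (standing in for the Select signal), and the initial flip-flop value of 0 are assumptions rather than details taken from the figure.

```python
class LogicBlock:
    """Three-input LUT followed by a D flip-flop and an output multiplexer."""

    def __init__(self, cells, registered):
        if len(cells) != 8:
            raise ValueError("a three-input LUT needs 8 storage cells")
        self.cells = list(cells)
        self.registered = registered   # plays the role of Select
        self.q = 0                     # flip-flop state, assumed to start at 0

    def lut(self, in1, in2, in3):
        return self.cells[(in1 << 2) | (in2 << 1) | in3]

    def clock(self, in1, in2, in3):
        """Rising clock edge: the flip-flop captures the LUT output."""
        self.q = self.lut(in1, in2, in3)

    def out(self, in1, in2, in3):
        """Out is the stored value if registered, else the LUT output directly."""
        return self.q if self.registered else self.lut(in1, in2, in3)


AND3 = [0, 0, 0, 0, 0, 0, 0, 1]        # LUT programmed as a three-input AND

comb = LogicBlock(AND3, registered=False)
print(comb.out(1, 1, 1), comb.out(1, 1, 0))   # 1 0

reg = LogicBlock(AND3, registered=True)
print(reg.out(1, 1, 1))                       # 0 (nothing clocked in yet)
reg.clock(1, 1, 1)
print(reg.out(0, 0, 0))                       # 1 (holds the captured value)
```

The flip-flop is what lets the same block implement either combinational logic or one bit of a sequential circuit.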
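A section of a programmed FPGA 37
f = x1 x2 + x2 x3
Figure: one LUT with inputs x1, x2 produces f1; a second LUT with inputs x2, x3 produces f2; a third LUT with storage contents 0, 1, 1, 1 takes f1 and f2 and produces f = f1 + f2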
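Larger functions are built by wiring several small LUTs together. The sketch below programs three two-input LUTs and connects them so that f = f1 + f2, taking the formula as it is printed on the slide (f1 = x1 x2, f2 = x2 x3); if either term is complemented in the original figure, only the contents of the corresponding LUT would differ. The helper names `program_lut` and `read_lut` are illustrative.

```python
from itertools import product


def program_lut(func, n):
    """Fill the 2**n storage cells of a LUT from a Python function of n bits."""
    return [func(*bits) & 1 for bits in product((0, 1), repeat=n)]


def read_lut(cells, *inputs):
    """Look up a cell; the first input is the most significant address bit."""
    address = 0
    for bit in inputs:
        address = (address << 1) | bit
    return cells[address]


LUT_F1 = program_lut(lambda x1, x2: x1 & x2, 2)   # f1 = x1 x2 (as printed)
LUT_F2 = program_lut(lambda x2, x3: x2 & x3, 2)   # f2 = x2 x3 (as printed)
LUT_F = program_lut(lambda f1, f2: f1 | f2, 2)    # f = f1 + f2, contents 0, 1, 1, 1


def f(x1, x2, x3):
    """Interconnect: the outputs of the first two LUTs drive the third."""
    f1 = read_lut(LUT_F1, x1, x2)
    f2 = read_lut(LUT_F2, x2, x3)
    return read_lut(LUT_F, f1, f2)


print(LUT_F)  # [0, 1, 1, 1]
print([f(*bits) for bits in product((0, 1), repeat=3)])
```

The wiring inside `f` plays the role of the programmed interconnection wires between logic blocks.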
Xilinx FPGA 38
Configurable Logic Block (CLB) 39
I/O Block 40
41 Cyclone II FPGA
Altera Cyclone II Device 42
Features
  90-nm low-k dielectric process
  High-density architecture with 4,608 to 68,416 LEs
  Up to 1.1 Mbits of RAM
    Variable port configurations (x1, x2, x4, x8, x16, x32, and x36)
    True dual-port operations
    Up to 260 MHz operation
  Embedded multipliers
  Advanced I/O support
  Flexible clock management circuitry
    Hierarchical clock network for up to 402.5 MHz
    Up to 4 PLLs
    Up to 16 global clock lines
Cyclone II FPGA Family Features 43
Speed grade -6 is the fastest
Cyclone II Architecture 44
Two-dimensional row- and column-based architecture
Logic array
  Logic Array Block (LAB): 16 logic elements (LEs)
  LE: small unit of logic providing efficient programming of user logic functions
  4,608 LEs (288 LABs) to 68,416 LEs (4,276 LABs)
Input/output elements (IOEs)
Global network and up to 4 PLLs
M4K memory blocks
Embedded multiplier blocks
Advanced I/O pins
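The LE and LAB counts quoted above follow directly from the 16-LEs-per-LAB organization, which a quick check confirms. The function name `lab_count` is an illustrative assumption.

```python
LES_PER_LAB = 16  # from the slide: one LAB holds 16 LEs


def lab_count(logic_elements):
    """Number of LABs for a device whose LEs exactly fill whole LABs."""
    labs, leftover = divmod(logic_elements, LES_PER_LAB)
    if leftover:
        raise ValueError("LE count is not a multiple of 16")
    return labs


for les in (4608, 68416):
    print(les, "LEs ->", lab_count(les), "LABs")
# 4608 LEs -> 288 LABs
# 68416 LEs -> 4276 LABs
```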
45 Cyclone II EP2C20 Block Diagram
Logic Elements 46
A four-input look-up table (LUT)
  Can implement any function of four variables
A programmable register
A carry chain connection
A register chain connection
The ability to drive all types of interconnects
  Local, row, column, register chain, and direct link interconnects
Support for register packing
Support for register feedback
Cyclone II LE diagram 47
LE Operating Modes 48 Normal Mode
LE Operating Modes, cont'd 49
Arithmetic Mode
Logic Array Blocks 50
LAB
  16 LEs
  LAB control signals
  LE carry chains
  Register chains
  Local interconnect
    Transfers signals between LEs in the same LAB
LAB Structure 51
LAB interconnects 52 Each LE can drive 48 LEs through fast local and direct link interconnects
Multi-track Interconnect 53
Connections between LEs, M4K memory blocks, embedded multipliers, and I/O pins
Consists of row and column interconnects that span fixed distances
  Row: direct link, R4, and R24
  Column: register chain, C4, and C16
Row interconnects 54
55 Column interconnects
Global Clock Networks & PLLs 56
I/O Structure and Features 57
Differential and single-ended I/O standards
3.3-V, 64- and 32-bit, 66- and 33-MHz PCI compliance
JTAG boundary-scan test (BST) support
IOE Structure 58
Independence of processor and IC technologies 59
Basic tradeoff
  General vs. custom
  With respect to processor technology or IC technology
  The two technologies are independent
General, providing improved:
  Flexibility, maintainability, NRE cost, time-to-prototype, time-to-market, cost (low volume)
Customized, providing improved:
  Power efficiency, performance, size, cost (high volume)
Processor technology, from general to customized: general-purpose processor, ASP (application-specific processor), single-purpose processor
IC technology, from general to customized: PLD, semi-custom, full-custom