COS VLSI Design IPS Processor Example Outline Design Partitioning IPS Processor Example Architecture icroarchitecture Logic Design Circuit Design Physical Design Fabrication, Packaging, Testing Slide 2
Review Sketch a stick diagram for a 4-input NOR gate Slide 3 Review Sketch a stick diagram for a 4-input NOR gate VDD A B C D Y GND Slide 4 2
Coping with Complexity How to design System-on-Chip? any millions (soon billions!) of transistors Tens to hundreds of engineers Structured Design Design Partitioning Slide 5 Structured Design Hierarchy: Divide and Conquer Recursively system into modules Regularity Reuse modules wherever possible Ex: Standard cell library odularity: well-formed interfaces Allows modules to be treated as black boxes Locality Physical and temporal Slide 6 3
Design Partitioning Architecture: User s perspective, what does it do? set, registers IPS, x86, Alpha, PIC, AR, icroarchitecture Single cycle, multi-cycle, pipelined, superscalar? Logic: how are functional blocks constructed Ripple carry, carry lookahead, carry select adders Circuit: how are transistors used Complementary COS, pass transistors, domino Physical: chip layout Datapaths, memories, random logic Slide 7 Gajski Y-Chart Slide 8 4
IPS Architecture Example: subset of IPS processor architecture Drawn from Patterson & Hennessy s textbook IPS is a 32-bit architecture with 32 registers Consider 8-bit subset using 8-bit datapath Only implement 8 registers ($ - $7) $ hardwired to 8-bit program counter Illustrate the key concepts in VLSI design Slide 9 IPS icroarchitecture ulticycle μarchitecture from Patterson & Hennessy PCEn PCWriteCond PCWrite IorD Outputs emread emwrite Control emtoreg PCSource ALUOp ALUSrcB ALUSrcA RegWrite PC ux Address Write data emory emdata [3:26] [25 : 2] [2 : 6] [5 : ] register [7: ] IRWrite[3:] [5: ] Op [5: ] u x u x RegDst [5: ] 6 8 Shift left 2 Read register Read register 2 Write register Write data Registers Read data Read data 2 A B u 2 x 3 ux Zero ALU ALU result Jump address ALUOut 2 u x emory data register ALU control ALUControl [5 : ] Slide 5
ulticycle Controller fetch emread ALUSrcA = IorD = IRWrite3 ALUSrcB = ALUOp = PCWrite PCSource = emread ALUSrcA = IorD = IRWrite2 ALUSrcB = ALUOp = PCWrite PCSource = 2 emread ALUSrcA = IorD = IRWrite ALUSrcB = ALUOp = PCWrite PCSource = 3 emread ALUSrcA = IorD = IRWrite ALUSrcB = ALUOp = PCWrite PCSource = decode/ register fetch 4 ALUSrcA = ALUSrcB = ALUOp = Reset emory address computation 5 ALUSrcA = ALUSrcB = ALUOp = (Op = 'L B ') or (Op = 'S B ') Execution 9 ALUSrcA = ALUSrcB = ALUOp= (Op = R-type) Branch Jump completion completion 2 ALUSrcA = ALUSrcB = PCWrite ALUOp = PCSource = PCWriteCond PCSource = (Op = 'BEQ') (Op = 'J') 6 (Op = 'LB') (Op = 'S B') emory access 8 emory access R-type completion emread IorD = emwrite IorD = RegDst = RegWrite emtoreg = 7 Write-back step RegDst= RegWrite emtoreg = Slide Logic Design Start at top level Hierarchically decompose IPS into units Top-level interface crystal oscillator 2-phase clock generator ph ph2 reset memread memwrite IPS processor adr writedata memdata 8 8 8 external memory Slide 2 6
PC PCEn ux Address Write data emory emdata register emory data register PCWriteCond PCWrite IorD emread emwrite emtoreg IRWrite[3:] Outputs Control Op [5: ] u x PCSource ALUOp ALUSrcB ALUSrcA RegWrite RegDst [5: ] 6 8 Shift left 2 [3:26] [25 : 2] [2 : 6] [5 : ] ux [5: ] [7 : ] Read register Read Read register 2 data Registers Write Read register data 2 Write data [5: ] A B 2 3 u x ALU control Zero ALU ALU result ALUControl Jump address ALUOut u x 2 Block Diagram memwrite memread u x controller aluop[:] alucontrol ph op[5:] zero alusrca alusrcb[:] pcen pcsource[:] memtoreg regdst iord regwrite irwrite[3:] funct[5:] alucontrol[2:] ph2 reset adr[7:] datapath writedata[7:] memdata[7:] Slide 3 Hierarchical Design mips controller alucontrol datapath standard cell library bitslice zipper alu inv4x flop ramslice fulladder or2 and2 mux4 nor2 inv nand2 mux2 tri Slide 4 7
HDLs Hardware Description Languages Widely used in logic design Verilog and VHDL Describe hardware using code Document logic functions Simulate logic before building Synthesize code into gates and layout Requires a library of standard cells Slide 5 Verilog Example module fulladder(input a, b, c, output s, cout); a b a b c cout c carry sum sum s(a, b, c, s); s fulladder carry c(a, b, c, cout); cout s endmodule module carry(input a, b, c, output cout) assign cout = (a&b) (a&c) (b&c); endmodule Slide 6 8
Circuit Design How should logic be implemented? NANDs and NORs vs. ANDs and ORs? Fan-in and fan-out? How wide should transistors be? These choices affect speed, area, power Logic synthesis makes these choices for you Good enough for many applications Hand-crafted circuits are still better Slide 7 Example: Carry Logic assign cout = (a&b) (a&c) (b&c); Transistors? Gate Delays? Slide 8 9
Example: Carry Logic assign cout = (a&b) (a&c) (b&c); a b a c b c g g2 g3 x y z g4 cout Transistors? Gate Delays? Slide 9 Gate-level Netlist module carry(input a, b, c, output cout) wire x, y, z; and g(x, a, b); and g2(y, a, c); and g3(z, b, c); or g4(cout, x, y, z); endmodule a b a c b c g g2 g3 x y z g4 cout Slide 2
SPICE Netlist.SUBCKT CARRY A B C COUT VDD GND N I A GND GND NOS W=U L=.8U AD=.3P AS=.5P N2 I B GND GND NOS W=U L=.8U AD=.3P AS=.5P N3 CN C I GND NOS W=U L=.8U AD=.5P AS=.5P N4 I2 B GND GND NOS W=U L=.8U AD=.5P AS=.5P N5 CN A I2 GND NOS W=U L=.8U AD=.5P AS=.5P P I3 A VDD VDD POS W=2U L=.8U AD=.6P AS= P P2 I3 B VDD VDD POS W=2U L=.8U AD=.6P AS=P P3 CN C I3 VDD POS W=2U L=.8U AD=P AS=P P4 I4 B VDD VDD POS W=2U L=.8U AD=.3P AS=P P5 CN A I4 VDD POS W=2U L=.8U AD=P AS=.3P N6 COUT CN GND GND NOS W=2U L=.8U AD=P AS=P P6 COUT CN VDD VDD POS W=4U L=.8U AD=2P AS=2P CI I GND 2FF CI3 I3 GND 3FF CA A GND 4FF CB B GND 4FF CC C GND 2FF CCN CN GND 4FF CCOUT COUT GND 2FF.ENDS Slide 2 Physical Design Floorplan Standard cells Place & route Datapaths Slice planning Area estimation Slide 22
IPS Floorplan I/O pads mips (4.6 λ2) 5 λ I/O pads 35 λ 69 λ control 5 λ x 4 λ (.6 λ2) wiring channel: 3 tracks = 24 λ zipper 27 λ x 25 λ datapath 27 λ x 5 λ (2.8 λ2) alucontrol 2 λ x λ (2 kλ2) I/O pads bitslice 27 λ x λ 27 λ 35 λ I/O pads 5 λ Slide 23 IPS Layout Slide 24 2
Standard Cells Uniform cell height Uniform well height V DD and GND rails 2 Access to I/Os Well / substrate taps Exploits regularity Slide 25 Synthesized Controller Synthesize HDL into gate-level netlist Place & Route using standard cell library Slide 26 3
Pitch atching Synthesized controller area is mostly wires Design is smaller if wires run through/over cells Smaller = faster, lower power as well! Design snap-together cells for datapaths and arrays Plan wires into cells A A A A B Connect by abutment A A A A B Exploits locality A A A A B Takes lots of effort A A A A B C C D Slide 27 IPS Datapath 8-bit datapath built from 8 bitslices (regularity) Zipper at top drives control signals to datapath Slide 28 4
IPS ALU Arithmetic / Logic Unit is part of bitslice Slide 29 Design Verification Fabrication is slow & expensive Specification OSIS.6μm: $, 3 months State of art: $, month Architecture Debugging chips is very hard Design Limited visibility into operation Logic Prove design is right before building! Design Logic simulation Circuit Design Ckt. simulation / formal verification Layout vs. schematic comparison Physical Design Design & electrical rule checks Verification is > 5% of effort on most chips! = = = = Function Function Function Function Timing Power Slide 3 5
Fabrication & Packaging Tapeout final layout Fabrication 6, 8, 2 wafers Optimized for throughput, not latency ( weeks!) Processed wafers are sliced into dice (chips) and packaged. Packaging Bond gold wires from die I/O pads to package Slide 3 Testing Test that chip operates Design errors anufacturing errors A single dust particle or wafer defect kills a die Yields from 9% to < % Depends on die size, maturity of process Test each part before shipping to customer Slide 32 6