Introduction
Sungho Kang
Yonsei University
Outline
- VLSI
- Design Styles
- Overview of Optimal Logic Synthesis
- Model
- Graph Algorithm and Complexity
- Asymptotic Complexity
- Brief Summary of MOS Device Behavior
VLSI
- Manufacturing Technology
  - Minimum length of MOS channels
  - The number of interconnection layers
- Design Technology
  - CAD tools
- Why VLSI
  - New markets
  - Operation speed
  - Protection of investments in design
Design Style - Decomposition
- Behavioral Synthesis: Resource allocation; Pipelining; Control flow parallelization; Communicating Sequential Processes; Partitioning
- Sequential Synthesis: Register Movement and Retiming; State Minimization; State Assignment; Synthesis for Testable FSMs; State Machine Verification
- Logic Synthesis: Extraction of combinational logic to HDL; Two-level minimization; Algebraic Decomposition; Multilevel Logic Minimization; Synthesis for Multi-fault Testability; Test Generation via Minimization; Technology mapping; Timing optimization
- Technology Mapping: Mapping to Library of Logic Gates; Timing Optimization
- Physical Design Synthesis: Cell Placement; Routing; Fabrication; Engineering Changes
Logic Design Styles
- Full custom design
  - Every circuit part is specially optimized for the purpose it must serve in the design
- Semi-custom design
  - The circuit is designed by assembling pre-designed and pre-characterized sub-circuits
  - Manufacturing may use a pre-diffused substrate
- Programmed design
  - The design is obtained by programming a standard part
  - Some circuits may be programmed only once, while others may be programmed an unlimited number of times
Design Methodology
- SEMICUSTOM
  - CELL-BASED
    - STANDARD CELLS: Hierarchical cells
    - MACRO CELLS: Generators (Memory, PLA, Sparse logic, Gate matrix)
  - ARRAY-BASED
    - PREDIFFUSED: Gate arrays, Sea of gates, Compacted arrays
    - PREWIRED: Anti-fuse based, Memory-based
SOC Design Paradigm
- Emergence of
  - Very large transistor counts on a single chip
  - Mixed technologies on the same chip: Logic, Analog, Memory, Processor
- Creation of Intellectual Property (IP)
- Reusable core-based design
- Cores replacing standard parts, such as DSP, DRAM, MCU, Flash, and FPGA
SOC Evolution
[Figure: three stages of SOC evolution. ASIC: original design blocks and user defined logic (UDL). Block-based SOC: original design block, UDL, SRAM, ROM, and a microprocessor IP core. Core-based SOC: original design block, UDL, DRAM, analog, ROM, and ATM and MPEG IP cores.]
IP Core Types
- Hard Core (technology-dependent layout)
  - Predictable area and performance
  - Lacks flexibility
- Soft Core (RTL)
  - Leaves much of the implementation to the designer
  - Flexible and process-independent
- Firm Core (netlist)
- Each type of core has different modeling and test requirements
Design Tradeoffs
- Factors to be optimized in chip design: Area, Delay, Power, Testability
- These competing objectives require tradeoffs
- Synthesis tools automate the tradeoff: according to the commands used by the designer, area or delay (or power, or testability) is reduced
Area vs Delay: The Bit-Serial Adder
- A typical tradeoff is area versus delay
- With just one full adder, this circuit can do 32-bit addition
- But it produces only 1 output bit per clock cycle, so it is 32x slower than a parallel adder built from 32 full adders
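The bit-serial idea can be sketched in a few lines of Python. This is only an illustrative software model, not the circuit from the slide: the function names and the choice to return the cycle count are assumptions made for the example.

```python
def full_adder(x, y, c_in):
    """One-bit full adder: returns (sum bit, carry out)."""
    s = x ^ y ^ c_in
    c_out = (x & y) | ((x | y) & c_in)
    return s, c_out


def bit_serial_add(a, b, width=32):
    """Add two width-bit numbers by reusing a single full adder.

    One bit is processed per loop iteration, which models one clock cycle,
    so the addition takes `width` cycles.
    Returns (sum modulo 2**width, final carry, cycles used).
    """
    carry = 0
    result = 0
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry, width
```

A call such as `bit_serial_add(5, 7)` reuses the same `full_adder` 32 times, whereas a parallel adder would instantiate 32 copies of it.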
Design Tradeoff Curve
- Holding other factors constant, the area vs delay tradeoff curve is typically parabolic
- The first design requirement is to meet the constraints on chip area and critical path delay (design 0 to design 1)
Design Tradeoffs
- The next priority is to optimize a feasible design
- Design 2 is optimal, in the sense that area and delay cannot both be decreased from this point
- A tradeoff is now necessary, according to policy
Design Tradeoffs
- A typical design policy is to optimize area subject to a delay constraint (design 2 to design 3)
- Often a preferred policy is to optimize delay subject to an area constraint (design 2 to design 4)
Prioritizing Testability
- Sometimes other factors, such as testability or power, take priority
- Typically this moves the area-delay tradeoff curve up and to the right
- Thus designs 1 and 2 are optimal
Area Optimization
- Typically performed in a technology-independent view of the circuit
- In this view gates are regarded as logic functions
- These functions are converted to physical gates by Technology Mapping
Technology Independent View
- $a = x_i \bar{y}_i$
- $b = \bar{x}_i y_i$
- $e = a + b$
- $z_i = e\,\bar{c}_{i-1} + \bar{e}\,c_{i-1}$
- $c = x_i y_i$
- $d = x_i + y_i$
- $f = d\,c_{i-1}$
- $c_i = c + f$
- In this view the gates of the full adder circuit are just logic equations
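As a sanity check, the equations can be evaluated exhaustively. The sketch below assumes the standard full-adder reading of the slide, in which a and b are the two product terms of the XOR of x_i and y_i (so each contains one complemented input) and e is their OR; the function and variable names are chosen for the example only.

```python
from itertools import product


def full_adder_equations(x, y, c_prev):
    """Evaluate the technology-independent equations on 0/1 inputs.

    Assumption: a = x AND (NOT y), b = (NOT x) AND y, e = a OR b.
    """
    a = x & (1 - y)
    b = (1 - x) & y
    e = a | b                                 # e = x XOR y
    z = (e & (1 - c_prev)) | ((1 - e) & c_prev)  # sum bit
    c = x & y
    d = x | y
    f = d & c_prev
    c_out = c | f                             # carry out
    return z, c_out


def equations_match_arithmetic():
    """True if (z, c_out) encodes x + y + c_prev for all 8 input rows."""
    return all(
        x + y + c_prev == z + 2 * c_out
        for x, y, c_prev in product((0, 1), repeat=3)
        for z, c_out in [full_adder_equations(x, y, c_prev)]
    )
```

Because the equations are plain Boolean functions, the same check works before and after technology mapping, as long as the mapped netlist is evaluated on the same eight input rows.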
Optimization and Technology Mapping
- Common subfunctions shared
- Functions technology mapped to negative gates
Testing
- Fault Models
  - Stuck-at faults
  - Delay faults
- Test vectors
- Fault simulation
- Automatic test pattern generation
- Diagnosis
- Testable Design
Delay Optimization
- First step is to identify the critical path
- Simplest delay model: number of logic levels
Critical Path Analyzers
- Static Delay Models
  - Levels of logic
  - Delay as a function of size and load
  - Worst-case and best-case models
- Dynamic Delay Models
  - Simplified device models
  - Full SPICE analysis
Model
- Behavioral
  - Represents the behavior of the system
  - Behavior can be specified as a mapping of logic values, of data words, etc.
  - A functional model is a representation of the system's logic function
  - A behavioral model consists of a functional model together with a representation of the associated timing relations
  - Advantages of separating function from timing
    - Circuits with the same function but different timing can share one functional model
    - Function and timing can be dealt with separately in design verification
- Structural
  - Collection of interconnected elements
  - Primitive elements
    - Block diagram: CPU, RAM, etc.
    - Schematic diagram: AND, OR, etc.
Model
- External
  - Model viewed by the user
  - Graphic or text-based
  - HDL (Hardware Description Language)
  - RTL (Register Transfer Level)
- Internal
  - Data structures and programs inside
Truth Tables
- Simplest way to represent a function
- n variables require 2^n entries

  X1  X2 | Y
   0   0 | 0
   0   1 | 0
   1   0 | 0
   1   1 | 1
Cube
- A cube of Z(x1, x2, x3) has the form (v1, v2, v3 | vZ), where vZ = Z(v1, v2, v3)
- An implicant g of Z can be represented by a cube constructed as follows
  - Set vi = 1 (0) if xi (the complement of xi) appears in g
  - Set vi = x if neither xi nor its complement appears in g
  - Set vZ = 1
- If the cube q can be obtained from the cube p by replacing one or more x values in p by 0 or 1, then p covers q
- Primitive cube
  - A cube representing a prime implicant
  - Primitive cubes provide a compact representation of a function
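The covering relation is easy to state in code. The sketch below is a minimal illustration that assumes cubes are written as strings over the characters 0, 1 and x; the function name `covers` is chosen for the example.

```python
def covers(p, q):
    """Return True if cube p covers cube q.

    p covers q when q can be obtained from p by replacing one or more
    x (don't-care) positions of p with 0 or 1. Cubes are strings such
    as "1x0"; this string encoding is an assumption of the sketch.
    """
    if len(p) != len(q) or p == q:
        return False
    # Every position must either be x in p or agree exactly with q.
    return all(pv == "x" or pv == qv for pv, qv in zip(p, q))
```

For instance, `covers("1x0", "110")` holds because the single x was replaced by 1, while the reverse direction does not hold.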
Intersection Operator
- Inconsistent result: ∅; consistent result: any value except ∅

  ∩ | 0  1  x
  --+---------
  0 | 0  ∅  0
  1 | ∅  1  1
  x | 0  1  x

- Compatible values: values whose intersection is consistent
- The intersection of two cubes is consistent iff all corresponding values are compatible
- To determine Z
  - Form the cube (v1, v2, ..., vn | x)
  - Intersect this cube with the primitive cubes of Z until a consistent intersection is obtained
  - The value of Z is obtained in the rightmost position
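The evaluation procedure can be sketched as follows. This is an illustrative model only: cubes are assumed to be strings over 0, 1 and x with the output value in the last position, and the example cube list for a 2-input AND is constructed for this sketch rather than taken from the slides.

```python
def intersect_values(a, b):
    """Intersect two values from {0, 1, x}; None stands for the empty result."""
    if a == "x":
        return b
    if b == "x" or a == b:
        return a
    return None  # 0 with 1 is inconsistent


def intersect_cubes(p, q):
    """Position-wise intersection; None if any position is inconsistent."""
    out = []
    for a, b in zip(p, q):
        v = intersect_values(a, b)
        if v is None:
            return None
        out.append(v)
    return "".join(out)


def evaluate(primitive_cubes, inputs):
    """Determine Z for a fully specified input string such as "11".

    Forms the cube (inputs | x) and intersects it with each primitive
    cube until a consistent intersection is found; the output value is
    then read from the rightmost position.
    """
    probe = inputs + "x"
    for cube in primitive_cubes:
        result = intersect_cubes(probe, cube)
        if result is not None:
            return result[-1]
    return None


# Hypothetical example: primitive cubes of Z = X1 AND X2 (inputs, then output).
AND2_CUBES = ["0x0", "x00", "111"]
```

With these cubes, `evaluate(AND2_CUBES, "11")` skips the first two cubes (both intersections are inconsistent) and reads the output from the third.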
State Tables
- Represents synchronous sequential circuits
- One row for every internal state
- One column for every possible input
- N(qi, Im), Z(qi, Im): the entry in row qi and column Im gives the next state and the output produced if Im is applied when the machine is in state qi
  - N: next state function
  - Z: output function

        x
   q    0     1
   1   2,1   3,0
   2   2,1   4,0
   3   1,0   4,0
   4   3,1   3,0
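A state table maps directly onto a nested dictionary, which makes the role of N and Z concrete. The sketch below transcribes the table from this slide, reading the entry printed as "1.0" in row 3 as "1,0" (an assumption); the names `STATE_TABLE` and `simulate` are chosen for the example.

```python
# STATE_TABLE[q][x] = (next state N(q, x), output Z(q, x)).
# Row 3, column 0 is read as (1, 0); the slide prints it as "1.0".
STATE_TABLE = {
    1: {0: (2, 1), 1: (3, 0)},
    2: {0: (2, 1), 1: (4, 0)},
    3: {0: (1, 0), 1: (4, 0)},
    4: {0: (3, 1), 1: (3, 0)},
}


def simulate(start_state, inputs):
    """Apply an input sequence, one value per clock cycle.

    Returns (final state, list of outputs produced).
    """
    state = start_state
    outputs = []
    for x in inputs:
        state, z = STATE_TABLE[state][x]
        outputs.append(z)
    return state, outputs
```

Starting in state 1 and applying 0, 1, 1, 0 visits states 2, 4, 3 and 1 in turn.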
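State Tables
- Canonical structure
[Figure: combinational circuit C with inputs x and outputs z; its next-state outputs Y drive clocked flip-flops (F/F) whose outputs y feed back into C]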
Flow Tables
- Represents the behavior of asynchronous sequential circuits
- A state transition may involve a sequence of state changes caused by a single input change to Ij, until a stable configuration is reached, denoted by N(qi, Ij) = qi

        x1 x2
   q    00    01    11    10
   1   1,0   5,1   2,0   1,0
   2   1,0   2,0   2,0   5,1
   3   3,1   2,0   4,0   3,0
   4   3,1   5,1   4,0   4,0
   5   3,1   5,1   4,0   5,1

[Figure: combinational circuit C with inputs x and outputs z, and state variables y fed back directly]
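The notion of settling into a stable configuration can be illustrated by following the flow table until N(q, I) = q. The sketch below transcribes the table from this slide; the function name `settle` and the step limit that guards against oscillation are assumptions of the example.

```python
# FLOW_TABLE[q][x1x2] = (next state, output), transcribed from the slide.
FLOW_TABLE = {
    1: {"00": (1, 0), "01": (5, 1), "11": (2, 0), "10": (1, 0)},
    2: {"00": (1, 0), "01": (2, 0), "11": (2, 0), "10": (5, 1)},
    3: {"00": (3, 1), "01": (2, 0), "11": (4, 0), "10": (3, 0)},
    4: {"00": (3, 1), "01": (5, 1), "11": (4, 0), "10": (4, 0)},
    5: {"00": (3, 1), "01": (5, 1), "11": (4, 0), "10": (5, 1)},
}


def settle(state, inputs, max_steps=10):
    """Follow state changes for a fixed input until a stable state is reached.

    Returns (stable state, output in that state, list of states visited).
    Raises RuntimeError if no stable state is found within max_steps,
    which would indicate an oscillation.
    """
    path = [state]
    for _ in range(max_steps):
        next_state, output = FLOW_TABLE[state][inputs]
        if next_state == state:  # stable configuration: N(q, I) = q
            return state, output, path
        state = next_state
        path.append(state)
    raise RuntimeError("no stable state reached")
```

For example, `settle(1, "01")` moves from state 1 to state 5, which is stable under input 01.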
Iterative Array
[Figure: iterative array of combinational cells labeled C(0) through C(i), with the flip-flops (F/F) shown alongside pseudo inputs and outputs]
Programs as Functional Models
- Assembly
    LDA A   /* load accumulator with value of A */
    AND B   /* compute A.B */
    AND C   /* compute A.B.C */
    STA E   /* store partial result */
    LDA D   /* load accumulator with value of D */
    INV     /* compute ~D */
    OR E    /* compute A.B.C + ~D */
    STA Z   /* store result */
- C
    E = A & B & C
    F = ~D
    Z = E | F
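To see that the two programs on this slide describe the same function, the assembly version can be run on a toy accumulator machine. The interpreter below is a hypothetical sketch written for this comparison (it is not a real instruction set simulator), and it restricts values to a fixed bit width so that INV behaves like a logical complement.

```python
from itertools import product

# The assembly program from the slide as (opcode, operand) tuples.
PROGRAM = [
    ("LDA", "A"), ("AND", "B"), ("AND", "C"), ("STA", "E"),
    ("LDA", "D"), ("INV",), ("OR", "E"), ("STA", "Z"),
]


def run_accumulator_program(program, memory, width=1):
    """Execute the program on a copy of `memory` and return the new memory."""
    mask = (1 << width) - 1
    mem = dict(memory)
    acc = 0
    for op, *operand in program:
        if op == "LDA":
            acc = mem[operand[0]]
        elif op == "AND":
            acc &= mem[operand[0]]
        elif op == "OR":
            acc |= mem[operand[0]]
        elif op == "INV":
            acc = ~acc & mask  # complement within `width` bits
        elif op == "STA":
            mem[operand[0]] = acc
        else:
            raise ValueError(f"unknown opcode {op}")
    return mem


def matches_expression():
    """Compare Z from the program with (A & B & C) | ~D for 1-bit inputs."""
    return all(
        run_accumulator_program(PROGRAM, dict(A=a, B=b, C=c, D=d))["Z"]
        == ((a & b & c) | (1 - d))
        for a, b, c, d in product((0, 1), repeat=4)
    )
```

The check in `matches_expression` covers all 16 combinations of 1-bit inputs.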
RTL Constructs
- 8-bit register IR
    register IR[0→7]
- 256-word memory ABC with 16 bits/word
    memory ABC[0→255; 0→15]
- When control X is 1, C = A + B
    if X then C = A + B
- Decoder
    test (IR[0→3])
      case 0 : operation0
      case 1 : operation1
      ...
      case 15 : operation15
    testend
Timing Models in RTL
- Procedural Languages
  - Similar to conventional programming languages: statements are executed sequentially, so the result of a statement is immediately available to the next one
  - Example: after A = B followed by C = A, C = B
  - Describes a system at the instruction set level of abstraction
- Nonprocedural Languages
  - Statements are conceptually executed in parallel
  - Example: A = B and B = A together exchange A and B
- Timing examples
    C = A + B, delay = 100
    delay C 100
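The difference between the two execution models shows up as soon as one statement reads a value another statement writes. The sketch below is an illustrative model, assuming register transfers are given as (destination, source) pairs; both function names are invented for the example.

```python
def run_procedural(state, transfers):
    """Execute transfers one after another; each sees earlier results."""
    s = dict(state)
    for dst, src in transfers:
        s[dst] = s[src]
    return s


def run_nonprocedural(state, transfers):
    """Execute transfers conceptually in parallel.

    Every right-hand side is read from the old state before any
    destination is updated.
    """
    old = dict(state)
    new = dict(state)
    for dst, src in transfers:
        new[dst] = old[src]
    return new
```

Applied to the transfers A = B and B = A with A = 1 and B = 2, the procedural model leaves both registers at 2, while the nonprocedural model exchanges them.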
Structural Models
- Fanout
- Fanout-free
- Reconvergent fanout
- Inversion parity
Graph Models and FSM
- Graph
  - Edge
  - Vertex
- Undirected graph
- Digraph (directed graph): all edges are directed
- Mixed graph: directed and undirected edges
- DAG (directed acyclic graph): a digraph with no cycles
Graph Terminology
- Graph: an ordered pair of sets G = (V, E)
  - V: a set of vertices or nodes
  - E: a set of edges or arcs
- 4->: the successors (fanouts) of node 4; 4-> = {1, 3}
- ->4: the predecessors (fanins) of node 4; ->4 = {1, 2}
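Successor and predecessor sets follow directly from the edge set. In the sketch below, the edge list is a hypothetical one chosen only so that node 4 has the successor and predecessor sets quoted on this slide; the figure's actual graph may contain more edges.

```python
# Hypothetical directed edge list (u, v) meaning u -> v.
EDGES = [(4, 1), (4, 3), (1, 4), (2, 4)]


def successors(edges, node):
    """Fanouts of `node`: all v such that (node, v) is an edge."""
    return {v for u, v in edges if u == node}


def predecessors(edges, node):
    """Fanins of `node`: all u such that (u, node) is an edge."""
    return {u for u, v in edges if v == node}
```

Both functions scan the full edge list, so each query costs O(|E|); an adjacency-list representation would answer the same queries in time proportional to the degree of the node.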
Graph Models
- Transitive closure
  - The extended edge relation E*(u,v) derived from a given edge relation E(u,v) is called the transitive closure of E
  - Fanout and fanin
- Source: a vertex v with no predecessors
- Sink: a vertex v with no successors
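One simple way to compute the transitive closure is a depth-first search from every vertex. The sketch below is an illustration under that choice (other methods, such as Warshall's algorithm, compute the same relation); the function names are assumptions of the example.

```python
def transitive_closure(edges):
    """Return E* as a set of pairs (u, v): v is reachable from u via >= 1 edge."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
        succ.setdefault(v, set())
    closure = set()
    for start in succ:
        stack = list(succ[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(succ[node])
    return closure


def sources_and_sinks(edges):
    """Sources have no predecessors; sinks have no successors."""
    nodes = {n for edge in edges for n in edge}
    has_pred = {v for _, v in edges}
    has_succ = {u for u, _ in edges}
    return nodes - has_pred, nodes - has_succ
```

For the chain 1 -> 2 -> 3 -> 4, the closure adds the pairs (1, 3), (1, 4) and (2, 4) to the original three edges, and the only source and sink are 1 and 4.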
Products of Sets of Sets
- Intersecting 2 sets of sets
- Takes as inputs the sets G and H and computes the product P = G × H of sets G and H

  Procedure SET_CARTESIAN_PRODUCT(G, H) {    Ops     Times/call (Best)   Times/call (Worst)
    m = |G|; n = |H|;                         c1      1                   1
    P = NULL                                  c2      1                   1
    for (i = 1, 2, ..., m) {                  c3 m    1                   1
      for (j = 1, 2, ..., n) {                c4 n    m                   m
        P = P ∪ (G_i ∩ H_j)                   c5      mn                  mnq
      }
    }
    return(P)                                 c6      1                   1
  }
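A runnable version of the procedure might look as follows. The set operators in the slide's inner statement did not survive extraction, so this sketch assumes, based on the slide heading "Intersecting 2 sets of sets", that each pair (G_i, H_j) contributes its intersection to P; the function name mirrors the slide's procedure name.

```python
def set_cartesian_product(G, H):
    """Product of two sets of sets.

    Assumption: the inner operation is P = P union {G_i intersect H_j}.
    The double loop runs m * n times, and each intersection costs time
    proportional to the size of the sets involved.
    """
    P = set()
    for g in G:            # i = 1 .. m
        for h in H:        # j = 1 .. n
            P.add(frozenset(g) & frozenset(h))
    return P
```

With G = [{1, 2}, {3}] and H = [{2, 3}, {1}], the four pairwise intersections are {2}, {1}, {3} and the empty set.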
Computing Critical Path Length
- This problem is modeled as that of finding the longest path in a DAG (Directed Acyclic Graph)
- The key point is that every edge is traversed exactly once
- T(m,n,q) = O(m)
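One standard way to find longest path lengths in a DAG is to relax edges in topological order, which visits every edge exactly once. The sketch below uses Kahn's algorithm for the ordering; this is a choice made for the example, and the slides do not say which traversal the course uses. Edge weights stand for gate or wire delays.

```python
from collections import defaultdict, deque


def longest_path_lengths(edges):
    """Longest path length from any source to each node of a DAG.

    `edges` is a list of (u, v, delay) triples. Each edge is relaxed
    exactly once, so the running time is linear in the number of edges.
    """
    succ = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for u, v, delay in edges:
        succ[u].append((v, delay))
        indegree[v] += 1
        nodes.update((u, v))

    length = {n: 0 for n in nodes}
    ready = deque(n for n in nodes if indegree[n] == 0)  # the sources
    while ready:
        u = ready.popleft()
        for v, delay in succ[u]:
            length[v] = max(length[v], length[u] + delay)
            indegree[v] -= 1
            if indegree[v] == 0:  # all fanins of v have been processed
                ready.append(v)
    return length
```

The critical path length is then the maximum value in the returned dictionary.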
Longest Paths
Backtracing
- The slack of an edge (a,v) is the slack of v plus the difference between the length of the longest path to v and the length of the longest path to v through (a,v)
- In formula: $\mathrm{slack}_{a,v} = \mathrm{slack}_v + (\lambda_v - (\lambda_a + L_{a,v}))$
- Here $\lambda_v$ is the length of the longest path to v, and $(\lambda_a + L_{a,v})$ is the length of the longest path to v that passes through the edge (a,v)
- The slack of a node a is defined as the minimum of its fanout edge slacks: $\mathrm{slack}_a = \min_{v} \mathrm{slack}_{a,v}$, taken over the fanouts v of a
- An edge (a,v) is critical if it connects two nodes a and v with slack value 0
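The slack formulas can be applied in a single backward sweep once the longest path lengths are known. The sketch below is an illustration under two assumptions that are not stated on the slide: the required time equals the critical path length, so a sink's slack is that length minus its own longest path length, and there are no parallel edges. All names are chosen for the example.

```python
from collections import defaultdict, deque


def compute_slacks(edges):
    """Return (lam, node_slack, edge_slack) for a DAG of (u, v, delay) edges."""
    succ = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for u, v, delay in edges:
        succ[u].append((v, delay))
        indegree[v] += 1
        nodes.update((u, v))

    # Forward pass: longest path length lam[v], in topological order.
    lam = {n: 0 for n in nodes}
    order = []
    ready = deque(n for n in nodes if indegree[n] == 0)
    while ready:
        u = ready.popleft()
        order.append(u)
        for v, delay in succ[u]:
            lam[v] = max(lam[v], lam[u] + delay)
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)

    # Backward pass (backtracing): slacks in reverse topological order.
    critical_length = max(lam.values())
    node_slack, edge_slack = {}, {}
    for a in reversed(order):
        if not succ[a]:
            # Assumption: required time at every sink is the critical length.
            node_slack[a] = critical_length - lam[a]
            continue
        for v, delay in succ[a]:
            edge_slack[(a, v)] = node_slack[v] + (lam[v] - (lam[a] + delay))
        node_slack[a] = min(edge_slack[(a, v)] for v, _ in succ[a])
    return lam, node_slack, edge_slack


def critical_edges(edges):
    """Edges whose two endpoints both have slack 0."""
    _, node_slack, _ = compute_slacks(edges)
    return {(a, v) for a, v, _ in edges
            if node_slack[a] == 0 and node_slack[v] == 0}
```

On the four-edge example s->a (2), s->b (1), a->t (2), b->t (5), the path through b has length 6 and is critical, while the path through a has slack 2.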
Asymptotic Complexity
- A function F(n) is in the set O(g(n)) if and only if there exist positive constants $c_o$ and $n_o$ such that $F(n) \le c_o\, g(n)$ for all $n \ge n_o$
  - This means that F(n) is asymptotically bounded from above by a constant multiple of g(n)
- A function F(n) is in the set Ω(g(n)) if and only if there exist positive constants $c_\Omega$ and $n_\Omega$ such that $F(n) \ge c_\Omega\, g(n)$ for all $n \ge n_\Omega$
  - This means that F(n) is asymptotically bounded from below by a constant multiple of g(n)
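As a worked example of both definitions, take the hypothetical function F(n) = 3n^2 + 5n with g(n) = n^2 (chosen here for illustration; it does not come from the slides). Suitable constants can be found by direct algebra:

```latex
\begin{align*}
3n^2 + 5n \le 4n^2 &\iff 5n \le n^2 \iff n \ge 5
  && \Rightarrow\ F(n) \in O(n^2) \text{ with } c_o = 4,\ n_o = 5 \\
3n^2 + 5n \ge 3n^2 &\text{ for all } n \ge 1
  && \Rightarrow\ F(n) \in \Omega(n^2) \text{ with } c_\Omega = 3,\ n_\Omega = 1
\end{align*}
```

The constants are not unique: any larger $c_o$ or $n_o$ also satisfies the first definition.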
Conclusion
- Design time spent in each design phase
  - Verification: 39%
  - RTL design and synthesis: 17%
  - IC layout: 13%
  - System design and integration: 11%
  - Test vector creation: 11%
  - Evaluation and procurement: 6%
  - Other: 3%