RTL Synthesis
Graphics: Alexandra Nolte, Gesine Marwedel, 2003 Universität Dortmund

Purpose of HDLs
Hardware Description Languages are used to:
- Capture the design in Register Transfer Level (RTL) form, i.e. with all registers specified
- Simulate the design to verify its correctness
- Pass the design through a synthesis tool to obtain a reasonably optimal gate-level design that meets timing
Design productivity:
- Automatic synthesis
- Capturing the design as RTL instead of as a schematic reduces the time to create a gate-level design by an order of magnitude
Synthesis (the focus of this lecture):
- Essentially a timing-aware optimizer for Boolean combinational logic

Hardware Implementations
HDLs can be compiled to semi-custom and programmable hardware implementations.
- Full custom: manual VLSI
- Semi-custom: standard cell, gate array
- Programmable: FPGA, PLD
[Figure: spectrum of implementation styles, with axes labelled "less work, faster time to market" and "implementation efficiency"]

ASIC: Application Specific Integrated Circuit
- A chip designed to perform a particular operation, as opposed to general-purpose integrated circuits
- An ASIC is generally NOT software programmable to perform a wide variety of different tasks
- An ASIC will often have an embedded CPU to manage suitable tasks
- An ASIC may be implemented as an FPGA, although FPGAs are sometimes considered a separate category

Examples of ASICs
- Video processor to decode or encode MPEG-2 digital TV signals
- Low-power dedicated DSP/controller/convergence device for mobile phones
- Encryption processor for security
- Many examples of graphics chips
- Network processor for managing packets, traffic flow, etc.

ASIC Styles
Full Custom ASICs
- Every transistor is designed and drawn by hand
- Typically the only way to design the analog portions of ASICs
- Gives the highest performance but the longest design time
- A full set of masks is required for fabrication

ASIC Styles (Contd.)
Standard-Cell-Based ASICs, also called Cell-Based ICs (CBIC) or semi-custom
- Standard cells are custom designed and then inserted into a library
- These cells are then used in the design by being placed in rows and wired together using place-and-route CAD tools
- Some standard cells, such as RAM and ROM cells, and some datapath cells (e.g. a multiplier) are tiled together to create macrocells
[Figure: standard-cell layouts of a D flip-flop and a NOR gate]

Standard Cell ASICs
- Standard-cell designs are usually synthesized from an RTL (Register Transfer Level) description of the design
- Intellectual Property blocks (IPs) are often used to decrease time to market
  - Hard IP (like SRAM): technology dependent, delivered as GDSII and libs
  - Soft IP (DW library): technology independent, delivered as RTL with synthesis and verification scripts
[Figure: sample ASIC floorplan showing the standard-cell area (soft macro), fixed blocks (hard macros) and I/O cells]

Standard Cell ASICs
[Figures: 3D view and 2D view of a standard-cell layout]

Logic Synthesis
- Automatic synthesis turns the RTL into a gate-level description, i.e. AND gates, OR gates, etc.
- Chip-test features are usually inserted at this point
- The gate-level design is verified for correctness
- The output of synthesis is a netlist, i.e. a list of logic gates and their implied connections:
    NOR2 U36 (.Y(n107), .A0(n109), .A1(\value[2] ));
    NAND2 U37 (.Y(n109), .A0(n105), .A1(n103));
    NAND2 U38 (.Y(n114), .A0(\value[1] ), .A1(\value[0] ));
    NOR2 U39 (.Y(n115), .A0(\value[3] ), .A1(\value[2] ));

Logic Synthesis
Inputs:
- RTL in a Hardware Description Language (HDL), for example:
    residue = 16'h0000;
    if (high_bits == 2'b10)
      residue = state_table[index];
    else
      state_table[index] = 16'h0000;
- Timing constraints
- Floorplan
- Timing/logic library, IP library (DW), physical library
- Target technology (standard cells)
Synthesis (DC, DCT):
- HDL translation
- Mapping
- Optimization
- Design rule fixing
- DFT
- Static timing, placement and routing estimation
Checks:
- Static timing (DC/DCT/PC/PT)
- Formal equivalence (FM)
- Power analysis (DC/DCT/PC/PT-PX)
Meets spec? If not, iterate; if yes, the output is a scan-ready netlist.

Floorplanning
[Figure: chip floorplan and package cross-section. Floorplan labels: corner cell, I/O cell, P/G buses, pad, die edge, bonding wire, soft macro, digital core (standard cells), PLL, RAM, core area. Package labels: leadframe, die, bonding wire, resin mould]

Placement
Physical design tools turn the gate-level design into a set of chip masks (for photolithography) or a configuration file for downloading to an FPGA.
- Floorplanning and power planning: positioning of the major functions
- Placement of the standard cells: gates are arranged in rows

Clock Tree Synthesis (CTS)
Clock and buffer insertion
- Distribute the clocks to the cells and locate buffers for use as amplifiers in long wires

Routing
Logic cells are wired together:
- Clock routing
- Global routing
- Detailed routing

Signoff & Chip Finishing
Inputs: route database, timing/logic library, hard macro library, physical library
Chip finishing (Astro):
- Metal fill
- Double via insertion
- Filler cell insertion
- Critical area optimization
- Route optimization
- Antenna fixing
- Route DRC fixing
- Design rule fixing
Signoff analysis:
- STA with SI, and SSTA for variations
- IR drop and EM analysis
- Static timing (PT/Star-RCXT)
- Formal equivalence (FM)
- Power analysis (PTPX/PrimeRail)
- Route DRC (Hercules)
- LVS (Hercules)
Meets spec? If not, iterate; if yes, the GDSII goes to mask synthesis.

Synthesis/Mapping/Optimization
- Synthesis: converting the RTL into a generic logic netlist
- Mapping: mapping the generic netlist onto standard cells from the core library
- Optimisation: optimising the logic to meet timing, area and power constraints
RTL (input to synthesis, together with the constraints):
    module counter(
      input clk, rstn, load,
      input [1:0] in,
      output reg [1:0] out);
      always @(posedge clk)
        if (!rstn) out <= 2'b0;
        else if (load) out <= in;
        else out <= out + 1;
    endmodule
Netlist (output of synthesis):
    module counter ( clk, rstn, load, in, out );
      input [1:0] in;
      output [1:0] out;
      input clk, rstn, load;
      wire N6, N7, n5, n6, n7, n8;
      HDDFFPQ1 out_reg_1 (.D(N7), .CK(clk), .Q(out[1]));
      HDDFFPQ1 out_reg_0 (.D(N6), .CK(clk), .Q(out[0]));
      HDNAN2DL U8 (.A1(out[0]), .A2(n5), .Z(n8));
      HDNAN2DL U9 (.A1(n5), .A2(n7), .Z(n6));
      HDINVDL U10 (.A(load), .Z(n5));
      HDOA211DL U11 (.A1(in[0]), .A2(n5), .B(rstn), .C(n8), .Z(N6));
      HDOA211DL U12 (.A1(in[1]), .A2(n5), .B(rstn), .C(n6), .Z(N7));
      HDEXNOR2DL U13 (.A1(out[1]), .A2(out[0]), .Z(n7));
    endmodule

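One way to see that the mapped netlist still implements the counter RTL is to model both in software and compare them over every input combination. The Python sketch below is only an illustration and not part of any synthesis flow: the function names are invented for this example, and the cell behaviours (HDINVDL as an inverter, HDNAN2DL as a 2-input NAND, HDEXNOR2DL as a 2-input XNOR, HDOA211DL as (A1 OR A2) AND B AND C) are assumptions inferred from the cell names, not taken from a library datasheet.

```python
from itertools import product


def rtl_next(rstn: int, load: int, din: int, out: int) -> int:
    """Next value of `out` according to the RTL always block."""
    if not rstn:
        return 0
    if load:
        return din
    return (out + 1) & 0b11  # 2-bit counter wraps around


def netlist_next(rstn: int, load: int, din: int, out: int) -> int:
    """Next value of `out` according to the gates in the netlist.

    Cell functions are ASSUMED from the cell names (see the text above).
    """
    in0, in1 = din & 1, (din >> 1) & 1
    out0, out1 = out & 1, (out >> 1) & 1
    n5 = 1 - load                  # HDINVDL   U10
    n7 = 1 - (out1 ^ out0)         # HDEXNOR2DL U13
    n8 = 1 - (out0 & n5)           # HDNAN2DL  U8
    n6 = 1 - (n5 & n7)             # HDNAN2DL  U9
    N6 = (in0 | n5) & rstn & n8    # HDOA211DL U11, D input of out_reg_0
    N7 = (in1 | n5) & rstn & n6    # HDOA211DL U12, D input of out_reg_1
    return (N7 << 1) | N6


def equivalent() -> bool:
    """Compare RTL and netlist for all 64 combinations of inputs and state."""
    cases = product((0, 1), (0, 1), range(4), range(4))
    return all(rtl_next(*c) == netlist_next(*c) for c in cases)


if __name__ == "__main__":
    print("netlist matches RTL:", equivalent())
```

An exhaustive comparison is only practical here because the design has six bits of input and state; formal equivalence tools address the same question for real designs without enumerating every case.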
Synchronous RTL design
[Figure: synchronous design showing primary inputs, primary outputs, pseudo inputs and pseudo outputs]

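RTL Synthesis
RTL file:
    always @(select, a, b)
    begin
      if (select == 1'b1)
        q = a ^ b;
      else
        q = a & b;
    end
Gate level: an XOR gate on a and b drives MUX input pin1, an AND gate on a and b drives MUX input pin0, select drives the MUX Sel input, and the MUX output is q.
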
Sequential parts
    always @(posedge clk, posedge reset)
    begin
      if (reset == 1'b1)
        q <= 0;
      else
        q <= a | b;
    end

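The two always blocks above can be mirrored by a small software model: the combinational block becomes a pure function, and the clocked block becomes an object whose state changes only on an edge event. The Python sketch below is a hedged illustration; `comb_rtl`, `comb_gates` and `ResetFlop` are hypothetical names, and starting the flip-flop at `None` is a modelling choice standing in for an unknown power-up value.

```python
def comb_rtl(select: int, a: int, b: int) -> int:
    """Behaviour of the combinational always block (if/else form)."""
    return a ^ b if select == 1 else a & b


def comb_gates(select: int, a: int, b: int) -> int:
    """Same function built as XOR and AND gates feeding a 2:1 MUX."""
    pin1 = a ^ b  # XOR gate output, selected when select = 1
    pin0 = a & b  # AND gate output, selected when select = 0
    return (select & pin1) | ((1 - select) & pin0)


class ResetFlop:
    """Flip-flop with asynchronous active-high reset and D = a | b."""

    def __init__(self) -> None:
        self.q = None  # unknown until the first reset or clock edge

    def posedge_reset(self) -> None:
        # A rising edge on reset clears q without waiting for the clock.
        self.q = 0

    def posedge_clk(self, a: int, b: int, reset: int = 0) -> None:
        # On a clock edge, reset still has priority over the data path.
        self.q = 0 if reset == 1 else (a | b)


if __name__ == "__main__":
    same = all(
        comb_rtl(s, a, b) == comb_gates(s, a, b)
        for s in (0, 1) for a in (0, 1) for b in (0, 1)
    )
    print("MUX structure matches if/else:", same)
```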
Synthesis Recap: The RTL
- The Register Transfer Level code can be written in VHDL, Verilog, SystemVerilog, or even SystemC
- RTL is a particular coding style, which defines I/Os, clocked sequential statements, and combinational logic
[Figure: the counter RTL, constraints and netlist example from the Synthesis/Mapping/Optimization slide]

Synthesis Recap: The Netlist
- The netlist is always written in Verilog format; other languages are not supported by the tools, or only poorly supported
- The netlist specifies:
  - the top-level interface
  - the connectivity between library instances
  - (usually) the logical hierarchy
[Figure: the counter RTL, constraints and netlist example from the Synthesis/Mapping/Optimization slide]

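Clock Insertion Delay
Definitions:
- $T_{period}$ = target clock period
- $T_{ck}$ = propagation delay of the launching flip-flop
- $T_{d}$ = propagation delay of the combinational cells
- $T_{su}$ = setup time of the capture flip-flop
- $T_{h}$ = hold time of the capture flip-flop
- $T_{id\text{-}launch}$ = clock insertion delay of the launch path
- $T_{id\text{-}capture}$ = clock insertion delay of the capture path
Setup check:
$$T_{id\text{-}capture} - T_{id\text{-}launch} - T_{ck} + T_{period} - T_{d} > T_{su}$$
Hold check:
$$T_{id\text{-}launch} - T_{id\text{-}capture} + T_{ck} + T_{d} > T_{h}$$
Setup slack:
$$SS = T_{id\text{-}capture} - T_{id\text{-}launch} - T_{ck} + T_{period} - T_{d} - T_{su}$$
Hold slack:
$$HS = T_{id\text{-}launch} - T_{id\text{-}capture} + T_{ck} + T_{d} - T_{h}$$
[Figure: launch and capture flip-flops with the delays $T_{ck}$, $T_{d}$, $T_{su}$, $T_{id\text{-}launch}$ and $T_{id\text{-}capture}$ annotated]
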
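The setup and hold slack expressions can be evaluated directly for a single launch/capture path. The Python sketch below is a minimal illustration under stated assumptions: the `PathTiming` class and its field names are hypothetical, and the numbers are made-up values in picoseconds, not data from any real library or design. A positive slack means the corresponding check passes.

```python
from dataclasses import dataclass


@dataclass
class PathTiming:
    """Timing of one register-to-register path (all values in ps)."""
    t_period: int      # target clock period
    t_ck: int          # propagation delay of the launching flip-flop
    t_d: int           # propagation delay of the combinational cells
    t_su: int          # setup time of the capture flip-flop
    t_h: int           # hold time of the capture flip-flop
    t_id_launch: int   # clock insertion delay of the launch path
    t_id_capture: int  # clock insertion delay of the capture path


def setup_slack(p: PathTiming) -> int:
    """SS = T_id-capture - T_id-launch - T_ck + T_period - T_d - T_su."""
    return (p.t_id_capture - p.t_id_launch - p.t_ck
            + p.t_period - p.t_d - p.t_su)


def hold_slack(p: PathTiming) -> int:
    """HS = T_id-launch - T_id-capture + T_ck + T_d - T_h."""
    return p.t_id_launch - p.t_id_capture + p.t_ck + p.t_d - p.t_h


if __name__ == "__main__":
    # Hypothetical example values, chosen only to exercise the formulas.
    path = PathTiming(t_period=2000, t_ck=150, t_d=1400, t_su=100,
                      t_h=50, t_id_launch=500, t_id_capture=600)
    print("setup slack (ps):", setup_slack(path))
    print("hold slack (ps):", hold_slack(path))
```

In this example the capture clock arrives 100 ps later than the launch clock, which adds 100 ps to the setup slack and removes 100 ps from the hold slack, showing how clock skew trades one check against the other.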
Scan Test Illustrated
Demonstrates two test vectors being applied: test vector 1 = 101, test vector 2 = 001.
Test mode proceeds through shift mode, capture mode, and shift mode again:
- Shift mode: shift in the test vector with the scan chain working as a shift register. Takes N clock cycles (N = number of scan cells).
- Capture mode:
  1. Hold and let the ScanEnable signal settle at 0
  2. Capture the responses into the scan registers
  3. Hold and let the ScanEnable signal settle at 1
- Shift mode: shift out the responses with the scan chain working as a shift register. Takes N clock cycles (N = number of scan cells).
The primary outputs (PO) are compared with the expected output.
[Figure: timing diagram over time steps 1 to 9 showing ScanEnable, DataIn, ScanIn and Clk, above a scan chain of flip-flops (pins SE, D, Si, Q) separated by combinational logic, with outputs Out1, Out2, Out3, DataOut and ScanOut]
Mark Wilmott - STFC
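The shift/capture/shift sequence can be mimicked in software with a list standing in for the scan chain. The Python sketch below is an illustration only: the function names are invented, the three-cell `example_logic` function is a hypothetical stand-in for the combinational logic between scan cells (the slide does not specify it), and primary inputs and outputs are left out. It does show the key property that shifting out one response and shifting in the next vector happen during the same N clocks.

```python
from typing import Callable, List

Bits = List[int]


def shift(chain: Bits, scan_in: Bits) -> Bits:
    """Shift mode (ScanEnable = 1): one bit in and one bit out per clock."""
    scan_out = []
    for bit in scan_in:
        scan_out.append(chain[-1])     # last cell drives ScanOut
        chain[:] = [bit] + chain[:-1]  # each cell takes its neighbour's value
    return scan_out


def capture(chain: Bits, logic: Callable[[Bits], Bits]) -> None:
    """Capture mode (ScanEnable = 0): one clock loads the logic responses."""
    chain[:] = logic(chain)


def run_scan_test(vectors: List[Bits], logic: Callable[[Bits], Bits]) -> List[Bits]:
    """Apply each vector and return the captured response for each one."""
    n = len(vectors[0])
    chain = [0] * n
    # The first bit shifted in ends up in the last cell, so feed it reversed.
    shift(chain, vectors[0][::-1])
    responses = []
    for i in range(len(vectors)):
        capture(chain, logic)
        # While the response shifts out, the next vector (or zeros) shifts in.
        nxt = vectors[i + 1] if i + 1 < len(vectors) else [0] * n
        out = shift(chain, nxt[::-1])
        responses.append(out[::-1])    # restore cell order
    return responses


def example_logic(state: Bits) -> Bits:
    """HYPOTHETICAL combinational logic feeding the D inputs of three cells."""
    s0, s1, s2 = state
    return [s0 ^ s1, s1 | s2, 1 - s2]


if __name__ == "__main__":
    # The two test vectors from the slide: 101 and 001.
    print(run_scan_test([[1, 0, 1], [0, 0, 1]], example_logic))
```

A tester would compare each returned response against the value predicted for a fault-free circuit; any mismatch points to a defect in the logic or in the scan chain itself.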