1 VLSI Programming 2016: Lecture 3
Course: 2IMN35
Teachers: Kees van Berkel (c.h.v.berkel@tue.nl), Rudolf Mak (r.h.mak@tue.nl)
Lab: Kees van Berkel, Rudolf Mak, Alok Lele
www:
Lecture 2: FPGAs, Verilog, lab assignment
26/04/16

2 VLSI Programming (2IMN35): time table 2016
Tue: h5-h8, MF.07 (columns in/out); Thu: h1-h4, Gemini-Z3A-08/10/13 (columns in/out)
19-Apr introduction, DSP graphs, bounds, 21-Apr pipelining, retiming, transposition, J-slow, unfolding 26-Apr tools Introductions to L1: audio filter L1 28-Apr T1 unfolding, look-ahead, L1 cntd installed FPGA and Verilog simulation L2 + T2 strength reduction 3-May folding L2: audio filter 5-May on XUP board 10-May T3 + T4 DSP processors L2 cntd L3 12-May L3: sequential FIR + strength-reduced FIR 17-May L3 cntd 19-May L3 cntd L4 24-May systolic computation T5 26-May L3 L4 31-May T5 L4: 2-Jun L4 cntd L5 audio sample rate convertor 7-Jun L5: 102x audio sample rate convertor 9-Jun L4 L5 cntd 14-Jun 16-Jun L5 deadline report L5 T1 + T2 T3 + T4

3 Note on course literature
The VLSI Programming lectures are loosely based on:
Keshab K. Parhi. VLSI Digital Signal Processing Systems, Design and Implementation. Wiley Inter-Science. This book is recommended, but not mandatory. Accompanying slides can be found on:
Mandatory reading:
- Edward A. Lee and David G. Messerschmitt. Synchronous Data Flow. Proc. of the IEEE, Vol. 75, No. 9, Sept 1987, pp
- Keshab K. Parhi. High-Level Algorithm and Architecture Transformations for DSP Synthesis. Journal of VLSI Signal Processing, 9, (1995), Kluwer Academic Publishers.

4 Outline Lecture 3
- Introduction to FPGAs
- Introduction to Verilog
- Introduction to Lab assignment 1
- Hands on!

5 FPGA IC on a Xilinx XUP board (Atlys): Xilinx Spartan-6 FPGA

6 Building an FPGA: Logic First (Xilinx slide)
A 4-input lookup table (LUT) with inputs [A,B,C,D] can implement any function of 4 inputs: it is a 16 words x 1 bit memory F, addressed by the inputs.
For example, a 1-bit full adder needs 2 LUTs: one computes the sum S and one the carry-out Co, both as functions of A, B and Ci.
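A minimal behavioral sketch of this idea in Verilog. The hypothetical module `lut4_model` treats a 16-bit parameter `INIT` as the 16 words x 1 bit memory addressed by its four inputs, and the equally hypothetical `full_adder_2lut` builds the 1-bit adder from two such LUTs. This models the principle only; it is not the vendor's LUT primitive, and the two INIT values are simply the truth tables of S and Co written out by hand.

```verilog
// Behavioral model of a 4-input LUT: a 16 x 1-bit memory whose contents
// (INIT) are fixed at configuration time and whose address is {i3,i2,i1,i0}.
// Hypothetical name; not the vendor LUT primitive.
module lut4_model #(parameter [15:0] INIT = 16'h0000) (
  input  i0, i1, i2, i3,
  output o
);
  assign o = INIT[{i3, i2, i1, i0}];
endmodule

// 1-bit full adder from two LUTs, one per output.
// Address bits: i0 = a, i1 = b, i2 = ci, i3 unused (tied to 0).
//   s  = a ^ b ^ ci        -> truth table 8'b1001_0110 = 8'h96
//   co = majority(a,b,ci)  -> truth table 8'b1110_1000 = 8'hE8
module full_adder_2lut (
  input  a, b, ci,
  output s, co
);
  lut4_model #(.INIT(16'h0096)) lut_sum   (.i0(a), .i1(b), .i2(ci), .i3(1'b0), .o(s));
  lut4_model #(.INIT(16'h00E8)) lut_carry (.i0(a), .i1(b), .i2(ci), .i3(1'b0), .o(co));
endmodule
```

Changing only the INIT values turns the same LUT into a different function, which is what configuring the logic of an FPGA amounts to at this level.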

7 Add FF to make a Logic Cell (Xilinx slide)
[Figure: input In -> 16 words x 1 bit memory -> flip-flop FF (Clk, CE, Rst) -> Out, with configuration cells M.]
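A logic cell as on this slide can be sketched as a 4-input LUT followed by an optional flip-flop with clock enable and reset. The names `logic_cell`, `INIT` and `USE_FF` are assumptions made for this illustration: `USE_FF` stands in for one of the configuration cells M, and the reset is modeled as synchronous, which is one possible choice.

```verilog
// Sketch of a logic cell: 4-input LUT plus optional flip-flop.
// Hypothetical module; USE_FF models one configuration cell M that selects
// the registered or the combinational LUT output.
module logic_cell #(
  parameter [15:0] INIT   = 16'h0000, // LUT contents (configuration)
  parameter        USE_FF = 1'b1      // 1: registered output, 0: combinational
) (
  input        clk, ce, rst,
  input  [3:0] lut_in,
  output       out
);
  wire lut_out = INIT[lut_in];        // 16 x 1-bit memory addressed by the inputs
  reg  q;

  always @(posedge clk)
    if (rst)     q <= 1'b0;           // synchronous reset (assumption of this sketch)
    else if (ce) q <= lut_out;        // load only when clock-enabled

  assign out = USE_FF ? q : lut_out;
endmodule
```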

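8 Arithmetic, Distributed RAM (Xilinx slide)
[Figure: the logic cell extended with carry logic (Cin, Cout) and a write port (Din, WE) on the 16 words x 1 bit memory; FF with CE, RST; configuration cells M.]
- Fast carry ripple to neighbor.
- With Din and WE, the LUT memory can also be written, i.e. used as (distributed) RAM.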
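The carry pattern can be made explicit with a small ripple-carry adder sketch: each bit position is a full adder whose carry-out feeds its neighbor's carry-in, which is the path the dedicated carry logic speeds up. The module name `ripple_adder` is hypothetical; synthesis tools typically map a plain `a + b` onto the carry chain themselves, so the explicit chain here only shows the structure.

```verilog
// N-bit ripple-carry adder: carry ripples from bit i to bit i+1.
// Hypothetical module for illustration.
module ripple_adder #(parameter N = 4) (
  input  [N-1:0] a, b,
  input          cin,
  output [N-1:0] sum,
  output         cout
);
  wire [N:0] c;                 // c[i] is the carry into bit i
  assign c[0] = cin;

  genvar i;
  generate
    for (i = 0; i < N; i = i + 1) begin : bit_slice
      assign sum[i] = a[i] ^ b[i] ^ c[i];
      assign c[i+1] = (a[i] & b[i]) | (c[i] & (a[i] ^ b[i]));
    end
  endgenerate

  assign cout = c[N];
endmodule
```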

9 Add Interconnect (Xilinx slide)
- Group logic cells to reduce overhead.
- Add horizontal (H) and vertical (V) routing channels with switchboxes.
- Add input and output MUXing between logic and routing.

10 Build an Array (Xilinx slide)

11 Putting the R in Reconfigurable Computing (Xilinx slide)
Fine-grained FPGAs are the platform of choice for reconfigurable computing.
[Figure labels: User Logic, Configuration RAM, State, Configuration.]

12 Add Bells & Whistles (Xilinx slide)
[Figure labels: Hard Processor, Gigabit Serial, Multiplier (18 Bit x 18 Bit, 36 Bit result), BRAM, I/O with Programmable Termination, Clock Mgmt.]

13 Spartan-6 DSP slice (EEtimes slide)
Useful for P := P + A·(B + D) and sub-expressions such as P := A·B.
Note: A, B, C, D [18b], multiplier output [36b], and P [48b].
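A sketch of the P := P + A·(B + D) pattern in Verilog, with widths taken from the slide (18-bit A, B, D; 48-bit P). The module name `mac_preadd` and its clear/enable inputs are assumptions. Whether a given synthesis tool maps this onto one DSP slice (and which pipeline registers it wants) should be checked in the synthesis report. In this sketch the pre-adder result is kept at 18 bits, so B + D wraps if it does not fit.

```verilog
// Pre-add, multiply, accumulate: P := P + A * (B + D).
// Hypothetical module; widths follow the slide.
module mac_preadd (
  input                    clk,
  input                    rst,     // synchronous clear of the accumulator
  input                    ce,      // accumulate enable
  input  signed [17:0]     a, b, d,
  output reg signed [47:0] p
);
  wire signed [17:0] pre  = b + d;   // pre-adder, truncated to 18 bits
  wire signed [35:0] prod = a * pre; // 18 x 18 -> 36-bit product

  always @(posedge clk)
    if (rst)     p <= 48'sd0;
    else if (ce) p <= p + prod;      // product sign-extended to 48 bits
endmodule
```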

14 Spartan-6 FPGA
- 1 slice = 4 LUTs [6-input each] + 8 flip-flops
- 1 DSP slice = 18b x 18b multiplier + adder + accumulator
- 1 BRAM = 1k x 18b (or 2 x 0.5k x 18b)
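The 1k x 18b block RAM configuration can be targeted from Verilog by describing a memory with a synchronous (clocked) read. Below is a sketch with the hypothetical name `ram_1k_x18`; this read-first coding style is commonly inferred as block RAM by synthesis tools, but the synthesis report is the place to confirm it.

```verilog
// 1k x 18-bit single-port memory with registered read.
// Hypothetical module for illustration.
module ram_1k_x18 (
  input             clk,
  input             we,
  input      [9:0]  addr,   // 2^10 = 1024 words
  input      [17:0] din,
  output reg [17:0] dout
);
  reg [17:0] mem [0:1023];

  always @(posedge clk) begin
    if (we) mem[addr] <= din;
    dout <= mem[addr];      // read-first: on a write, returns the old contents
  end
endmodule
```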

15 Atlys board, based on Xilinx Spartan-6 FPGA

16 FPGA comparison table [Xilinx]
Families compared: Spartan-6, Artix-7, Kintex-7, Virtex-7, Kintex UltraScale, Kintex UltraScale+, Virtex UltraScale, Virtex UltraScale+.
Table rows: feature size [nm], logic cells (K), UltraRAM (Mb), block RAM (Mb), DSP slices, DSP performance [GMACs], transceiver count, maximum transceiver speed (Gb/s), total transceiver bandwidth (full duplex) (Gb/s), memory interface (DDR3), PCI Express, I/O pins, I/O voltage [V].
Memory interface (DDR3): 800 | 1,066 | 1,866 | 1,866 | 2,133 | 2,133 | 2,133 | 2,133
PCI Express: x1 gen1 | x4 gen2 | x8 gen2 | x8 gen3 | x8 gen3 | x16 gen3 | x8 gen3 | x16 gen3

17 Introduction to Verilog

18 Verilog (IEEE Std 1364)
- Verilog is a Hardware Description Language (HDL).
- Verilog is a text-based way to describe and exchange designs.
- Verilog designs can be simulated, mapped onto gate-level designs (logic synthesis), and subsequently translated to silicon/FPGA primitives.
- Berkeley tutorial CS61c: Verilog Tutorial by J. Wawrzynek
- Verilog Golden Reference Guide by Doulos
(VHDL is an alternative HDL; Verilog is easier to learn and use, mainly due to its C-like syntax.)

19 Verilog
Despite its C-like syntax, Verilog is NOT an imperative programming language (such as C, C++, Java, Pascal, FORTRAN):
- implicit notion of global time (e.g. picoseconds);
- time units can be used to express delays ("postpone by N units");
- actions can be triggered by events.
Verilog is a popular language to describe digital circuits (e.g. circuits derived from data-flow graphs) as well as their test environments.

20 Mux2: a 2-way multiplexor

21 Mux2: a 2-way multiplexor (behavioral)

    module mux2 (in0, in1, select, out);
      input in0, in1, select;
      output out;
      assign out = select ? in1 : in0;   // Verilog's continuous assignment
    endmodule // mux2

Alternative, with a delay of 3 time units:

    assign #3 out = (select & in1) | (~select & in0);

22 Mux2: a 2-way multiplexor (gate level)

    module mux2 (in0, in1, select, out);
      input in0, in1, select;
      output out;
      wire s0, w0, w1;
      not #1 (s0, select);          // inverter, with 1 unit delay
      and #1 (w0, s0, in0),         // AND gate, with 1 unit delay
             (w1, select, in1);     // AND gate, with 1 unit delay
      or  #1 (out, w0, w1);         // OR gate, with 1 unit delay
    endmodule // mux2

23 Mux2: a 2-way multiplexor (test bench)

    module testmux;
      reg a, b, s;
      reg expected;
      wire f;
      mux2 mymux (.select(s), .in0(a), .in1(b), .out(f));   // device under test
      initial begin
        #0  s=0; a=0; b=1; expected=0;
        #10 a=1; b=0; expected=1;
        #10 s=1; a=0; b=1; expected=1;
        #10 $stop;
      end
      initial
        $monitor("select=%b in0=%b in1=%b out=%b, expected out=%b time=%d",
                 s, a, b, f, expected, $time);
    endmodule // testmux

24 Mux2: a 2-way multiplexor (test results)

    select=0 in0=0 in1=1 out=0, expected out=0 time=0
    select=0 in0=1 in1=0 out=1, expected out=1 time=10
    select=1 in0=0 in1=1 out=1, expected out=1 time=20

25 Behavioral model of a 4-bit register

    // positive edge-triggered,
    // synchronous active-high reset.
    module reg4 (CLK, Q, D, RST);   // "reg" is a reserved word and cannot name a module
      input [3:0] D;
      input CLK, RST;
      output [3:0] Q;
      reg [3:0] Q;
      always @(posedge CLK)
        if (RST) Q <= #1 4'b0000;   // clear after 1 time unit
        else     Q <= #1 D;         // load D after 1 time unit
    endmodule // reg4

26 Two possible assignment syntaxes: a = b (blocking) and a <= b (non-blocking)
Inside the same clocked always block:
- a <= b; b <= a; swaps the values of a and b;
- a = b; b = a; simply sets both a and b to the previous value of b.
Beware!
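A small sketch that puts both variants side by side on the same clock edge. The module `swap_demo` and its ports are hypothetical, and the blocking assignments in a clocked block are there only to show the difference, not as a recommended style for registers.

```verilog
// Both always blocks fire on the same edge and start from the same values;
// only the assignment operator differs. Hypothetical module.
module swap_demo (
  input            clk,
  input            load,         // load the initial values
  input      [7:0] init_a, init_b,
  output reg [7:0] a_nb, b_nb,   // updated with <= (non-blocking)
  output reg [7:0] a_bl, b_bl    // updated with =  (blocking)
);
  always @(posedge clk)
    if (load) begin
      a_nb <= init_a; b_nb <= init_b;
    end else begin
      a_nb <= b_nb;              // both right-hand sides are sampled
      b_nb <= a_nb;              // before either register changes: a swap
    end

  always @(posedge clk)
    if (load) begin
      a_bl = init_a; b_bl = init_b;
    end else begin
      a_bl = b_bl;               // a_bl is overwritten immediately...
      b_bl = a_bl;               // ...so b_bl gets the old b_bl back
    end
endmodule
```

After one edge with `load` low, starting from a = 1 and b = 2, the non-blocking pair holds (2, 1) while the blocking pair holds (2, 2).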

27 Designing a clock signal

    reg CLK;                      // clock is a state variable!
    initial begin
      CLK = 1'b0;                 // clock initially 0 (low)
      forever #5 CLK = ~CLK;      // clock period = 10 time units
    end

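28 A 22-stage FIR filter, comprising 22 identical FIR stages
[Figure: input x(n) runs through a chain of delay registers D holding x(n-1), ..., x(n-20), x(n-21); the taps are multiplied by coefficients h0, h1, ..., h20, h21 and accumulated along chain b into the output y(n) = h0·x(n) + h1·x(n-1) + ... + h21·x(n-21).]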

29 FIRstage as building block of the 22-stage FIR filter
[Figure: inputs a_in, h_in, b_in; register x drives a_out; b_out = b_in + a_in·h_in.]

    module FIRstage ...                   // parameters and ports: see slide 30
      reg signed [0:DWIDTH-1] x;
      assign a_out = x;
      assign b_out = b_in + (a_in * h_in);
      always @(posedge clk) begin
        if (enabled) x <= a_in;
      end
    endmodule

30 Module FIRstage

    module FIRstage #(
      parameter DWIDTH  = 16,
      parameter DDWIDTH = 2 * DWIDTH)
    (
      input                       clk,
      input                       enabled,
      input  signed [0:DWIDTH-1]  a_in,
      input  signed [0:DDWIDTH-1] b_in,
      output signed [0:DWIDTH-1]  a_out,
      output signed [0:DDWIDTH-1] b_out,
      input  signed [0:DWIDTH-1]  h_in);

      reg signed [0:DWIDTH-1] x;             // internal registers and wires

      assign a_out = x;
      assign b_out = b_in + (a_in * h_in);

      always @(posedge clk) begin            // process for the internal register
        if (enabled) x <= a_in;
      end
    endmodule

31 Module FIR (parameters and interface)

    module FIR #(
      parameter NR_STAGES = 22,
      parameter DWIDTH    = 16,
      parameter CWIDTH    = NR_STAGES * DWIDTH,   // filter coefficients
      parameter DDWIDTH   = 2 * DWIDTH)
    (
      input                      clk,
      input                      enabled,
      input  signed [0:DWIDTH-1] a_in,
      output signed [0:DWIDTH-1] b_out,
      input         [0:CWIDTH-1] h_in);           // 22x16 wires

      // Generate and connect NR_STAGES filter stages (next slide)
    endmodule

32 Module FIR (body)

    wire signed [0:DWIDTH-1]  a [0:NR_STAGES];   // internal registers, wires
    wire signed [0:DDWIDTH-1] b [0:NR_STAGES];

    generate                                     // generate filter stages
      genvar i;
      for (i = 0; i < NR_STAGES; i = i + 1) begin : stage
        FIRstage #(DWIDTH, DDWIDTH) comp (clk, enabled, a[i], b[i], a[i+1], b[i+1],
                                          h_in[i*DWIDTH:(i+1)*DWIDTH-1]);
      end
    endgenerate

    assign b[0] = 0;                             // connect stages to FIR interface
    assign a[0] = a_in;
    assign b_out = b[NR_STAGES][0:DWIDTH-1];     // most significant DWIDTH bits of the sum

(Verilog is case-sensitive: the parameter names DWIDTH and NR_STAGES must be written in upper case throughout.)

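33 A 22-stage FIR filter (clocked by clk)
[Figure: input x(n), delay registers D holding x(n-1), ..., x(n-21), coefficients h0 ... h21, accumulation chain b, output y(n); all registers driven by clk.]
- The 22 registers are clocked simultaneously (always @(posedge clk)), and the 22 multiply-adds run synchronously, at the sample rate of 44.1 kHz (audio).
- Critical path = 1 multiplication + 22 additions (not optimal).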

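34 A 22-stage FIR filter (clocked by clk)
[Figure: input x(n), delay registers D holding x(n-1), ..., x(n-21), coefficients h0 ... h21, accumulation chain b, output y(n); all registers driven by clk.]
- A transposed/retimed version of this filter can easily run at 100 MHz on an FPGA: maximum f_sample = f_clock = 100 MHz.
- With f_clock = 100 MHz and f_sample = 44.1 kHz, the hardware utilization is only 44.1 kHz / 100,000 kHz ≈ 0.044%; equivalently, about 2,268 clock cycles are available per sample.
- The filter can therefore also be realized with 1 adder + 1 multiplier (L3).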
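For comparison with the direct form above, a transposed-form FIR can be sketched as follows. Each register sits between two adders, so the critical path is one multiplication plus one addition regardless of the number of taps. This is a hypothetical module (`fir_transposed`) with 4 taps by default to keep it small (the lecture's filter would use `NTAPS = 22`); it uses [MSB:0] bit ordering instead of the slides' [0:DWIDTH-1], and it adds no guard bits, so large sums can overflow. It computes y(n) = h0·x(n) + s1(n), with registered partial sums s_k updated as s_k <= h_k·x(n) + s_(k+1) and s_(NTAPS-1) <= h_(NTAPS-1)·x(n).

```verilog
// Transposed-form FIR sketch (hypothetical module, NTAPS >= 2).
module fir_transposed #(
  parameter NTAPS  = 4,
  parameter DWIDTH = 16,
  parameter AWIDTH = 2 * DWIDTH                 // no guard bits in this sketch
) (
  input                          clk,
  input                          rst,           // synchronous clear of partial sums
  input                          enabled,       // advance by one sample
  input      signed [DWIDTH-1:0] x,
  input      [NTAPS*DWIDTH-1:0]  h,             // h_k = h[k*DWIDTH +: DWIDTH], h_0 in the LSBs
  output     signed [AWIDTH-1:0] y
);
  // s[k]: registered partial sum entering tap k-1
  reg signed [AWIDTH-1:0] s [1:NTAPS-1];
  integer k;

  always @(posedge clk)
    if (rst) begin
      for (k = 1; k < NTAPS; k = k + 1)
        s[k] <= 0;
    end else if (enabled) begin
      for (k = 1; k < NTAPS - 1; k = k + 1)
        s[k] <= $signed(h[k*DWIDTH +: DWIDTH]) * x + s[k+1];
      s[NTAPS-1] <= $signed(h[(NTAPS-1)*DWIDTH +: DWIDTH]) * x;
    end

  // Output tap: one multiplication plus one addition after the registers
  assign y = $signed(h[0 +: DWIDTH]) * x + s[1];
endmodule
```

Feeding a unit impulse with coefficients 1, 2, 3, 4 should reproduce the coefficients on y, one per enabled clock cycle, followed by zeros.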

35 2IN35: reporting guidelines 2016 (1)
1. Submit one report per team (2 students).
2. Respect deadlines:
   - Assignment L3: Thursday May 26, 2016
   - Assignment L4: Thursday June 9, 2016
   - Assignment L5: Thursday June 16, 2016
3. Make sure that assignments L3, L4, and L5 are demonstrated to and signed off by Alok, Rudolf, or Kees.
4. Report on lab assignments L3, L4, and L5.
5. Submit the reports using Peach (paper copies will not be accepted).

36 2IN35: reporting guidelines 2016 (2)
General guidelines (for each assignment), to be followed strictly:
6. Analyze the specifications and requirements.
7. Present/motivate key ideas/decisions, design options, alternatives, trade-offs.
8. Draw an architecture block diagram (= picture!).
9. Explain the functional correctness of your Verilog programs (include your complete Verilog programs in an appendix).
10. Explain the number of clock cycles per sample time T_s. Include waveforms.
11. Report, analyze & explain FPGA-resource usage and utilization {#multipliers, #BRAMs, #LUTs} in relation to your design.
12. Report, analyze & explain the (min) sample time T_s and (max) sample frequency f_s, both after synthesis and after placement & routing.

37 2IN35: reporting guidelines 2016 (3)
13. Include simulation results: waveforms both in the time domain and in the frequency domain (apply FFT) (assignments L3 and L4 only).
14. Include answers to the inline questions.
15. Annotate all graphs; for both axes include:
   - quantity (weight, distance, duration, ...)
   - unit (ounce, light year, century, ...)
   - linear/log/... (ok to assume linear)

38 Lab assignment 1
- Lab assignment 1: today: start; Tue May 3: completion.
- Lab assignment 2: Tue May 3: start.


