EE219A Spring 2008
Special Topics in Circuits and Signal Processing
Lecture 4
Simulink Design Environment
Dejan Markovic
dejan@ee.ucla.edu

Announcements
- Class wiki
  - Material being constantly updated
  - Please sign up using your UCLA EE user name (I need this for verification purposes)
- Homework #1 up online
  - Due: Friday, Apr 18, 4pm
Slide 2

- Multiple design descriptions
  - Algorithm (Matlab or C)
  - Fixed point description
  - RTL (behavioral, structural)
  - Test vectors for logic analysis
- Development
  - Multiple engineering teams involved
- This work demonstrates
  - Unified Matlab/Simulink environment
  - Optimized design and verification
Slide 3

Scaling Impacts Architecture
[Figure: E_op / E_ref versus T_op / T_ref, both at maximum V_dd, on logarithmic axes. Process options: L-Vt, S-Vt, H-Vt. Architecture options: P2 (parallel 2), T2 (time-mux 2). Annotation: Scaling.]
Slide 4

Simulink to Silicon Mapping
- MDL to RTL conversion, automated P&R flow
[Figure: design flow. Labels: Simulink, Fix-pt lib, MDL, Custom tool 1, backend. Outputs: Speed, Power, Area.]
[R. Davis et al., JSSC Mar 02]
Slide 5

Including Emulation
- XSG hardware library, RTL translation scripts
[Figure: design flow. Labels: Simulink, Hw lib, RTL, backend, Custom tool 2, backend. Outputs: Speed, Power, Area.]
- and are I/O equivalent
[K. Kuusilinna et al., book chap. in SoC Revolution, KAP 2003]
Slide 6

Closing the Loop: I/O Verification
- I/O hardware library, automated flow
[Figure: design flow. Labels: Simulink, I/O lib, Hw lib, RTL, Custom tool 3, backend, Custom tool 2, backend. Outputs: Speed, Power, Area.]
- implements logic analysis
[D. Markovic, C. Chang, B. Richards, H. So, B. Nikolic, R.W. Brodersen, CICC 07]
Slide 7

Proposed Approach
- Unified Simulink environment
  - Enter the design only once!
  - Algorithm verification / emulation
  - Abstract view of architecture based debug
- Hardware-equivalent blocks
  - Basic operators
    - Add, multiply, shift, mux
  - Implementation constraints
    - Word-size, latency
Slide 8
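
Note: the "Proposed Approach" slide describes hardware-equivalent blocks as basic operators constrained by word-size and latency. The following is a minimal Python sketch of that idea, not the Simulink library itself. The `Operator` class, the `wrap` helper, and the choice of two's-complement wrap-around on overflow are assumptions made for illustration; the slides do not state an overflow mode.

```python
from collections import deque


def wrap(value, word_size):
    """Wrap an integer into a signed two's-complement word of word_size bits."""
    value &= (1 << word_size) - 1
    # Reinterpret the top bit as the sign bit.
    if value >= 1 << (word_size - 1):
        value -= 1 << word_size
    return value


class Operator:
    """Hypothetical hardware-style operator: fixed word size, fixed latency."""

    def __init__(self, func, word_size, latency):
        self.func = func
        self.word_size = word_size
        # Pipeline registers, initialised to zero like reset flip-flops.
        self.pipe = deque([0] * latency)

    def clock(self, *inputs):
        """Advance one clock cycle and return the value visible at the output."""
        result = wrap(self.func(*inputs), self.word_size)
        if not self.pipe:  # latency 0 behaves as combinational logic
            return result
        self.pipe.append(result)
        return self.pipe.popleft()


# An 8-bit adder with one register stage and an 8-bit multiplier with two.
add8 = Operator(lambda a, b: a + b, word_size=8, latency=1)
mult8 = Operator(lambda a, b: a * b, word_size=8, latency=2)

add_out = [add8.clock(a, 100) for a in (20, 27, 28, 0)]
mult_out = [mult8.clock(a, 3) for a in (5, 50, 0, 0)]
print(add_out)
print(mult_out)
```

Each result appears `latency` cycles after its inputs, and any result that does not fit in the word wraps around. Those are the two effects a bit-true, cycle-accurate model has to reproduce.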

Bit True Cycle Accurate Model
[Figure labels: Speed, Power, Area]
[Slide from B. Brodersen]
Slide 9

Now to Implement That Description
Flow steps:
- Initial System Description (Floating point Matlab/Simulink)
- Determine Flexibility Requirements
- Description with Hardware Constraints (Fixed point Simulink, FSM Control in Stateflow)
- Real-time Emulation ( Array)
- Automated Generation (Chip-in-a-Day flow)
Annotations:
- Algorithm/flexibility evaluation
- Digital delay, area and energy estimates & effect of analog impairments
- Common test vectors, and hardware description of net list and modules
[Slide from B. Brodersen]
Slide 10

Simulink Based Chip Design Strategy
- Directly map diagram into hardware since there is a one-for-one relationship for each of the blocks
[Figure labels: S reg, X reg, Mult1, Mult2, Mac1, Mac2, Add, Sub, Shift]
- Result: An architecture that can be implemented rapidly
Slide 11

Hardware Libraries
- Xilinx System Generator
- Synplify DSP
Slide 12

Synplify DSP Overview
[Courtesy: Synplicity]
DSP Synthesis Solution:
- DSP Design Library
  - Capture, Simulate, Analyze
  - Sample Rates & Quantization
  - High Architectural Abstraction
- DSP Synthesis Engine
  - Architectural Optimizations
  - Synthesizable RTL
  - Technology Dependent
- Logic Synthesis Engine
  - Logic-level Optimizations
  - Map to Memory, Mults,
[Figure: abstraction levels and tools. Labels: Sample Time Definition, Physical Clock Constraint, Model Based Design, Architectural Optimizations, DSP Synthesis, Synplify DSP Application, Synplify DSP Library, Logic Optimizations, RTL Synthesis, Logic Synthesis, Synplify Pro Synthesis, synthesizable RTL. Code snippets shown: y=fir(x), c=a+b;, r(0)=p(1)^q(3);]

Synplify DSP Library Features
[Courtesy: Synplicity]
- HW Implementable Simulink Blockset
- Features:
  - Datapath Propagation
  - Sample Rate Display
  - Fixed-Point Analysis
- Fast Simulation: Simulink S-Functions
- Hardware Abstraction
Callouts:
- Automatic Propagation of Fixed-Point Quantization Parameters
- Full User Control of Precision
- Automatic Propagation and Display of Sample Rate Relationships
- Analyze and Debug Quantization Effects with Fixed-Point System Tools
Slide 14

Simple FIR Example
[Courtesy: Synplicity]
- Basic 6-Tap FIR
- 18-bit input
- Coefficients
  - 18-bit Fixed
  - Low Pass freq. spec
  - Positive Symmetric
  - Values: -0.0458, 0.1770, 0.4124, 0.4124, 0.1770, -0.0458
Slide 15

Advantages of DSP Synthesis
- Explore Speed/Area Tradeoffs from a Single Model
[Figure: tradeoff plot with axes Area and Delay. Labels: DSP Synthesis, Parameterized Schematics or RTL, Architectural Alternatives, More Parallelization, More Serialization.]
[Courtesy: Synplicity]
Slide 16
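
Note: the 6-tap FIR on the "Simple FIR Example" slide can be sketched in plain Python to show what "18-bit Fixed" and "Positive Symmetric" imply. This is an illustrative model, not Synplify DSP output. The split of the 18-bit word into 1 sign bit and 17 fractional bits (`FRAC_BITS`) is an assumption, since the slide gives only the word length, and all function names are hypothetical.

```python
COEFFS = [-0.0458, 0.1770, 0.4124, 0.4124, 0.1770, -0.0458]
WORD_BITS = 18
FRAC_BITS = 17  # assumption: 1 sign bit + 17 fractional bits


def to_fixed(value, frac_bits=FRAC_BITS, word_bits=WORD_BITS):
    """Round a real coefficient to a signed fixed-point integer."""
    scaled = int(round(value * (1 << frac_bits)))
    limit = 1 << (word_bits - 1)
    if not -limit <= scaled < limit:
        raise OverflowError("coefficient does not fit in the word")
    return scaled


def sample_at(samples, index):
    """Samples before time zero are treated as zero."""
    return samples[index] if index >= 0 else 0


def fir_direct(samples, taps):
    """Direct form: one multiplication per tap and output sample."""
    return [
        sum(h * sample_at(samples, n - k) for k, h in enumerate(taps))
        for n in range(len(samples))
    ]


def fir_symmetric(samples, taps):
    """Pre-add the two samples that share a coefficient, then multiply once."""
    last = len(taps) - 1
    half = len(taps) // 2  # valid for an even number of symmetric taps
    out = []
    for n in range(len(samples)):
        acc = 0
        for k in range(half):
            pair = sample_at(samples, n - k) + sample_at(samples, n - (last - k))
            acc += taps[k] * pair
        out.append(acc)
    return out


TAPS = [to_fixed(c) for c in COEFFS]
impulse = [1] + [0] * 7
impulse_response = fir_symmetric(impulse, TAPS)
print(TAPS)
print(impulse_response)
```

Because the coefficients are symmetric, the pre-add form needs three multiplications per output instead of six. That is one example of the speed/area alternatives a synthesis tool can explore from a single model.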

Architecture Transformations
- Systematic way to do this
- The goal is to reach the target point (e.g. the reference point) on the Energy-Delay line
[Figure: Energy plotted against Area and against Delay. Labels: reference, parallel, time-mux, pipeline, intl, fold, V_DD scaling.]
Slide 17

Pipelining Strategy
- Library blocks / macros synthesized @ V_DD ref
[Figure: Cycle Time versus Latency and Energy. Labels: T_Clk @ V_DD ref, T_Clk @ V_DD opt, mult, add, V_DD scaling, gate sizing, Pipeline logic scaling, FO4 inv simulation. Outputs: Speed, Power, Area.]
Slide 18
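
Note: the parallel and time-mux transformations named on the "Architecture Transformations" slide change how many hardware units and clock cycles a computation uses, not what it computes. The sketch below illustrates only that functional equivalence. The `kernel` function is a hypothetical stand-in datapath, the unit and cycle counts are a simple bookkeeping assumption, and no energy or delay figures are modeled.

```python
def kernel(x):
    """Hypothetical stand-in for a datapath operation."""
    return 3 * x + 1


def reference(stream):
    """Reference architecture: one unit, one sample per cycle."""
    return [kernel(x) for x in stream], {"units": 2 - 1, "cycles": len(stream)}


def parallel2(stream):
    """Parallel-2: two copies of the unit each take every other sample."""
    out = [None] * len(stream)
    out[0::2] = [kernel(x) for x in stream[0::2]]  # unit 0
    out[1::2] = [kernel(x) for x in stream[1::2]]  # unit 1
    return out, {"units": 2, "cycles": (len(stream) + 1) // 2}


def timemux2(stream_a, stream_b):
    """Time-mux-2: one unit alternates between two independent streams."""
    out_a, out_b, cycles = [], [], 0
    for a, b in zip(stream_a, stream_b):
        out_a.append(kernel(a))  # even cycle serves stream A
        out_b.append(kernel(b))  # odd cycle serves stream B
        cycles += 2
    return out_a, out_b, {"units": 1, "cycles": cycles}


stream = [1, 2, 3, 4, 5, 6]
ref_out, ref_cost = reference(stream)
par_out, par_cost = parallel2(stream)
mux_a, mux_b, mux_cost = timemux2(stream, stream[::-1])
print(ref_cost, par_cost, mux_cost)
```

Parallelism trades extra units for fewer cycles per sample, while time-multiplexing shares one unit across streams at the cost of more cycles. Where each option lands on the energy-delay plane depends on the supply voltage and technology, which this sketch leaves out.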

Comprehensive Library for Waveform Development
[Courtesy: Synplicity]
Synplify DSP v3.2
- Subsystems: Black Box
- Sources: Constant, Counter, DDS (NCO), Ramp, Random Sequence
- Transforms: FFT
- Math: Abs, Accumulator, Add, Binary Logic, Comparator, DivMod, Gain, Inverter, Log (ln, log10, log2), MinMax, Mult, Negate, Pow, Shifter, Sign, SinCos, Sqrt
- CORDIC: Div, Exp, Log, Polar, Rotator, SinCos, Sqrt
- Memories: Delay, FIFO, Permutation, RAM, Register, ROM, Shift Register
- Signal Operations: Concatenate, Convert, De/Commutator, De/Mux, Down/Upsample, Extract, Parallel to Serial, Recast, Serial to Parallel, Vector De/Mux
- Communications: Block De/Interleaver, Convolutional De/Interleaver, Convolutional Encoder, De/Puncture, Viterbi Decoder (New!)
- Control Logic: M Control, Mealy State Machine, Moore State Machine
- Filtering: CIC, Differentiator, FIR, FIR Rate Converter, IIR, Integrator, FIR Engine (Adaptive) (New!), Reloadable FIR
Slide 19

Vector Support
[Courtesy: Synplicity]
- Vector signals in Synplify DSP Designs
  - Concise description of parallel and multi-channel algorithms
- Blocks Updated w/vector Support: port in, port out, add, mult, counter, acc, fir, fft, fifo, ram, delay, gain, rom, SR, comparator, sincos, shifter, Upsample, Downsample, Mux, Demux, convert, Constant
- New Blocks: Vector Mux, Vector Demux
Slide 20

More Complex Example: Rx Chain
[Courtesy: Synplicity]
[Figure: receiver chain model. Labels: Multi-Rate Support, Synplify DSP Viterbi Decoder, Synplify DSP FIR, Waveform and Constellation Analysis, M Control.]
Slide 21

Simulink Design Flow
[Figure: design flow. Labels: DSP algorithm, Timed dataflow, SysGen, SynDSP, B-box HDL, Architectural Transformations, backend, backend, Hardware co-simulation. Outputs: Speed, Power, Area.]
Slide 22

Based Chip Verification
[Figure labels: Matlab, Simulink model, emulation, board, board, Real-time hardware co-simulation]
Slide 23

Based Verification
[Figure: block diagram combining I/O blocks and the TB, drawn as "+ + ="]
- Goal: use Simulink testbench (TB) for verification
- Develop custom interface blocks (I/O)
- Place I/O and RTL into TB model
- Simulink implicitly provides the testbench
Slide 24
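
Note: the goal above is to reuse one testbench for every implementation of the design. A minimal Python sketch of that idea follows: the same stimulus drives a reference model and a device under test, and the outputs are compared bit-exactly after aligning for latency. All names here (`run_testbench`, `golden`, `dut_ok`, `dut_narrow`) are hypothetical and stand in for the Simulink model and the hardware behind the I/O blocks.

```python
def run_testbench(stimulus, reference_model, device_under_test, latency=0):
    """Drive both models with the same vectors and list every mismatch."""
    expected = reference_model(stimulus)
    actual = device_under_test(stimulus)
    mismatches = []
    for n, want in enumerate(expected):
        m = n + latency  # the device output lags the reference by `latency`
        if m >= len(actual):
            break  # remaining results have not left the pipeline yet
        if actual[m] != want:
            mismatches.append((n, want, actual[m]))
    return mismatches


def golden(samples):
    """Reference accumulator with unlimited precision."""
    total, out = 0, []
    for x in samples:
        total += x
        out.append(total)
    return out


def dut_ok(samples):
    """Same accumulator behind one output register (one cycle of latency)."""
    return [0] + golden(samples)[:-1]


def dut_narrow(samples):
    """Faulty variant: the accumulator is only 8 bits wide and wraps."""
    return [0] + [value & 0xFF for value in golden(samples)[:-1]]


vectors = [100, 100, 100, 100]
clean = run_testbench(vectors, golden, dut_ok, latency=1)
faulty = run_testbench(vectors, golden, dut_narrow, latency=1)
print(clean)
print(faulty)
```

The comparison is exact rather than tolerance-based because the model is meant to be bit-true; a single wrong bit, as in the narrow accumulator, is reported with its sample index.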

Custom Interface Blocks (I/O)
- Sw / Hw interfaces
  - Regs, FIFOs, BRAMs
- External interfaces
  - GPIO ports
  - A/D & D/A
  - Analog subs.
- Debugging
  - Signal gen.
  - Hw scope
- Fully automated RTL flow for verification
Slide 25

Simulink Test Model
[Figure: test model block diagram. Labels: BRAM_IN and BRAM_ blocks (ports ADDR, IN, OUT, WE), constants (-c-), Simulink hardware model (in, rst, out), board (in, rst, clk, out), logic, gpio, reset, sim_rst, reg0, software_reg.]
Slide 26

Initial Verification Strategy
[Figure labels, in order: Client PC, RS232 (~Kb/s), board, GPIO (~130Mb/s), board]
- Testbench model on the board
- Test vectors entered through RS232
- Block read / write operation
- Custom read_xps, write_xps commands
- Real-time performance bounded by GPIO
Slide 27

Example: SVD Test Model
- Emulation-based I/O test
Slide 28

Based Verification
[Figure labels: board, GPIO, board, Real-time at-speed verification]
Slide 29

Measured Functionality
[Figure: 4x4 MIMO channel tracking. Eigenvalues (axis 0 to 12) versus Number of Symbols [k] (axis 0 to 32), with theoretical and hardware curves for σ_1^2, σ_2^2, σ_3^2, and σ_4^2.]
- Up to 10 b/s/Hz with adaptive PSK
Slide 30

Improved Verification Strategy
[Figure labels, in order: User Term., Ethernet, BORPH, board, Z Dok (~500Mb/s), board]
- Goal 1: Improved I/O speed
  - Standardized / interface
- Goal 2: test vectors on the
  - Remote control via BORPH, an OS
  - Testbench executes on a BORPH managed
Slide 31

Based Verification (Summary)
[Table: verification modes, with columns including I/O and TB, grouped into Simulation and Emulation rows.]
- Simulation
  - Pure SW simulation (Simulink throughout)
  - Simulink ModelSim co-simulation (HDL with Simulink)
- Emulation
  - Hardware in the loop simulation (HIL tools, Simulink)
  - Pure emulation (Custom SW)
- Row annotations: I/O Test; Testvectors outside; Testvectors inside
Slide 32

Summary
- Simulink provides design description that is convenient for algorithm developers and hardware designers
- Design is described using the block-based model that is bit-true and cycle-accurate with respect to hardware
- We can also perform mixed-mode simulations to quantify the impact of finite wordlength effects (to be discussed in detail after we finish with DSP kernels)
Slide 33
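
Note: the last summary point mentions quantifying finite wordlength effects. One common way to do this, sketched here in Python rather than Simulink, is to run a floating-point reference next to a quantized copy of the same signal and report the signal-to-quantization-noise ratio (SQNR). The test signal, the word lengths, and the round-to-nearest quantizer are all assumptions chosen for illustration.

```python
import math


def quantize(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits (no saturation)."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def sqnr_db(reference, quantized):
    """Signal power over error power, in decibels."""
    signal = sum(r * r for r in reference)
    noise = sum((r - q) ** 2 for r, q in zip(reference, quantized))
    return float("inf") if noise == 0 else 10 * math.log10(signal / noise)


# Floating-point reference: a sine wave that stays inside (-1, 1).
reference = [0.9 * math.sin(2 * math.pi * 0.013 * n) for n in range(2000)]

results = {
    bits: sqnr_db(reference, [quantize(x, bits) for x in reference])
    for bits in (6, 8, 10, 12)
}
for bits, value in results.items():
    print(f"{bits:2d} fractional bits: SQNR = {value:.1f} dB")
```

For a signal like this one, each extra fractional bit should add roughly 6 dB of SQNR, which gives a quick way to choose the smallest word length that meets a given accuracy target.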