Architecture Techniques
- Silvia Reed
- 5 years ago
Transcription
EE219A Spring 2008: Special Topics in Circuits and Signal Processing
Lecture 3: Architecture Techniques
Dejan Markovic, dejan@ee.ucla.edu

Announcements
- Class wiki up and running. Go to: EEWeb / Online Lab
- Please sign up using your UCLA EE user name (I need this for verification purposes)
- Homework #1 coming up on Wed: background, circuit and microarchitecture techniques
Slide 2
Real-Life Example: 802.11a Baseband
[Chip diagram blocks: Viterbi decoder, MAC core, DMA, PCI, time/frequency synchronization, ADC/DAC, FSM, AGC, FFT]
- Direct-mapped architecture
- 200 MOPS/mW
- 80 MHz clock!
- 40 GOPS
- Power = 200 mW
- 0.25 µm CMOS
- The architecture has to track technology
Atheros 802.11a baseband processor
Slide 3

Wireless Baseband Chip Design
- Direct mapping is the most energy-efficient
- Technology is too fast for dedicated hardware
- Opportunity to further reduce energy and area
[Figure: energy efficiency vs. clock period (speed of technology): microprocessors at GHz, programmable DSPs at 100s of MHz, hardwired logic at 10s of MHz]
Slide 4
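A Number of Variables to Consider
- How to optimally combine all variables?
[Equations garbled in extraction: expressions for leakage energy E_Lk, delay D, and switching energy E_Sw in terms of gate size W, supply V_dd, and threshold V_th]
[Figure: clock period axis from the speed of technology (fast) to the required speed (slow), a gap of 2 orders of magnitude (and growing); knobs: pipelining, parallelism, time-multiplexing, V_dd, sizing, V_th]
Slide 5

Introduction to Architecture Optimization
[Figure: design flow linking algorithm modeling (Simulink), DSP architectures (RTL), and circuit optimization (Cadence); E-A-D and E-D tradeoffs; parameters T_sample and T_clk; outputs power, area, timing]
Slide 6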
Architectural Feedback from Technology
- Simulink hardware library implicitly carries information only about latency and wordlength (we can later choose sample period when targeting an FPGA)
- For ASIC flow, block characterization also has to include technology features such as speed, power, and area
- But, technology parameters scale each generation
- Need a general and quick characterization methodology
- Propagate results back to Simulink to avoid iterations
Slide 7

Architecture-Circuit Co-design
[Figure: flow from behavioral (DSP architectures, E-D) through HDL and logical to physical design, with circuit optimization at T_clk; architectural feedback of speed, power, and area from the pre-layout and post-layout stages, with re-synthesis]
Slide 8
Starting Point: Datapath Characterization
- Balance tradeoffs due to gate size (W) and supply (V_DD)
[Figure, circuit level: energy vs. delay, with the W-sizing curve from the min-delay point and the V_DD-scaling curve from V_DD,ref meeting at the target delay]
- Optimal design point: curves due to W and V_DD are tangent (equal sensitivity)
- Goal: keep all pipelines at the same E-D point
Slide 9

Cycle Time is Common for All Blocks
[Figure: Simulink, RTL, Synopsys, netlist, HSPICE flow (switch-level accuracy) producing speed, power, and area; plot of latency vs. normalized cycle time for add and mult blocks]
Slide 10
Next Step: Block Characterization
- Goal: balance logic depth within a block
[Figure, micro-architecture level: latency vs. cycle time for add and mult blocks, with the target T_Clk marked; speed, power, area]
- Select block latency to achieve target T_Clk
- Balances pipeline logic depth
- Apply W and V_DD scaling to the underlying pipelines
Slide 11

Architectural Feedback to Simulink
- Characterize blocks with predetermined wordlength
- Translate timing specification to a target supply voltage
- Determine optimal latency for a given cycle time
[Figure (a): normalized energy vs. normalized delay for a simulated FO4 inverter under V_dd scaling, marking the target speed at nominal V_DD and the desired point at the optimal V_dd. Figure (b): latency vs. normalized cycle time for synthesized add and mult blocks at nominal V_dd, marking the target speed]
Slide 12
Basic Micro-Architectural Techniques
- Parallelism, pipelining, time-multiplexing
[Figure panels: (a) reference, (b) parallel, (c) pipeline, (d) reference for time-mux, (e) time-multiplex]
Slide 13

Architecture Trade-Offs: Reference Datapath
- Critical path delay: $T_{adder} + T_{comparator}$ (= 25 ns)
- Total capacitance being switched = $C_{ref}$
- $V_{DD} = V_{DD,ref} = 5\,\mathrm{V}$
- Power for reference datapath: $P_{ref} = C_{ref} V_{DD,ref}^2 f_{ref}$
[A. Chandrakasan, S. Sheng, R. Brodersen, JSSC 4/92]
Slide 14
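Parallel Datapath
- The clock rate can be reduced by half with the same throughput: $f_{par} = f_{ref}/2$
- $V_{DD,par} = V_{DD,ref}/1.7$, $C_{par} = 2.15\,C_{ref}$
- $P_{par} = (2.15\,C_{ref})(V_{DD,ref}/1.7)^2 (f_{ref}/2) \approx 0.36\,P_{ref}$
Slide 15

Parallelism Adds Latency
[Timing diagram: the reference REG/Add path takes inputs A1 to A5 and produces Z1 to Z5 at Clk; with level of parallelism P = 2, two REG/Add branches run at ½ Clk, one taking A1, A3, A5 and the other A2, A4, and the outputs Z1 to Z5 appear later]
Slide 16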
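As a quick numerical check of the parallel datapath estimate, the sketch below evaluates P = C · V_DD² · f relative to the reference. It assumes a supply divisor of 1.7 and a capacitance factor of 2.15 (some digits are lost in the extracted slide text; these are the values consistent with the stated result of about 0.36). The function and variable names are illustrative only.

```python
def relative_power(cap_scale, vdd_divisor, freq_scale):
    """Power relative to the reference datapath, using P = C * V_DD^2 * f."""
    return cap_scale * (1.0 / vdd_divisor) ** 2 * freq_scale

# Parallel datapath: assumed 2.15x capacitance, V_DD / 1.7, half the clock rate.
p_par = relative_power(cap_scale=2.15, vdd_divisor=1.7, freq_scale=0.5)
print(f"P_par / P_ref = {p_par:.2f}")
```

With these inputs the ratio comes out near 0.37; rounding the scaled supply 5 V / 1.7 down to 2.9 V gives the slide's 0.36.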
Increasing Level of Parallelism
- Area: $A_N \approx N \cdot A_{ref}$
[Figure: normalized E_Op vs. throughput (1/FO4) for increasing levels of parallelism]
- Parallelism improves throughput for the same energy
- Improves energy for the same throughput
- Cost: increased area
- The more parallel the better?
Slide 17

The More Parallel the Better?
[Figure: total energy vs. supply voltage V_DD for reference and parallel designs]
- Leakage and overhead start to dominate at high levels of parallelism, causing min E to increase
- Optimum voltage also increases with parallelism
Slide 18
Pipelined Datapath
- Critical path delay is less: $\max(T_{adder}, T_{comparator})$
- Keeping clock rate constant: $f_{pipe} = f_{ref}$
- Voltage can be dropped: $V_{DD,pipe} = V_{DD,ref}/1.7$
- Capacitance slightly higher: $C_{pipe} = 1.15\,C_{ref}$
- $P_{pipe} = (1.15\,C_{ref})(V_{DD,ref}/1.7)^2 f_{ref} \approx 0.39\,P_{ref}$
Slide 19

Pipelining: Real-Life Example
- Superscalar processor: determine optimal pipeline depth and target frequency
- Power model: PowerTimer toolset developed at IBM T.J. Watson RC; methodology to build energy models based on results of circuit-level power analysis tool
[V. Srinivasan et al., MICRO '02]
Slide 20
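The pipelined datapath estimate can be checked the same way. This sketch assumes a capacitance factor of 1.15 and a supply divisor of 1.7 (digits partly lost in the extracted text; these values match the stated result of about 0.39) and uses the 5 V reference supply. The names are illustrative.

```python
def pipelined_power_ratio(cap_scale=1.15, vdd_divisor=1.7):
    """P_pipe / P_ref for P = C * V_DD^2 * f with the clock rate unchanged."""
    return cap_scale / vdd_divisor ** 2

ratio = pipelined_power_ratio()
vdd_pipe = 5.0 / 1.7  # scaled supply in volts, starting from the 5 V reference
print(f"V_DD,pipe = {vdd_pipe:.2f} V, P_pipe / P_ref = {ratio:.2f}")
```

The unrounded ratio is just under 0.40; the clock rate stays at the reference value, so the saving comes from the lower supply alone, partly offset by the extra pipeline-register capacitance.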
Timing Model
- Analytical pipeline model
- Time per stage of pipe is $T_i = t_i/s_i + c_i$
- Time to complete FXU operation in presence of stalls: $T_{fxu} = T_1 + \mathrm{Stall}_{fxu\text{-}fxu} \cdot T_1 + \mathrm{Stall}_{fxu\text{-}fpu} \cdot T_2 + \dots + \mathrm{Stall}_{fxu\text{-}bru} \cdot T_4$
- $\mathrm{Stall}_{fxu\text{-}fxu} = f_1 (s_1 - 1) + f_2 (s_1 - 2) + \dots$
- $f_i$: conditional probability that an FXU instruction m depends on FXU instruction (m-i)
- Throughput = $u_1/T_{fxu} + u_2/T_{fpu} + u_3/T_{lsu} + u_4/T_{bru}$
- $u_i$: fraction of time pipe i has instructions arriving from FE of the machine; $u_i = 0$: unutilized pipe, $u_i = 1$: fully utilized
[V. Srinivasan et al., MICRO '02]
Slide 21

Simulation Results: SPEC 2000
- More stages for lower power!
[Figure: optimal design points at 18 FO4 (power) and 10 FO4 (performance)]
Slide 22
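A minimal sketch of the analytical timing model, restricted to a single pipe (the FXU stalling only on itself). All numeric values are hypothetical, chosen only to show how the per-stage time and the stall term pull in opposite directions as the stage count grows. Skipping terms where s - i is not positive is an assumption of this sketch, not something stated on the slide.

```python
def stage_time(t_logic, n_stages, latch_overhead):
    """T_i = t_i / s_i + c_i: logic delay split over the stages plus latch overhead."""
    return t_logic / n_stages + latch_overhead

def self_stall(dep_probs, n_stages):
    """f_1*(s-1) + f_2*(s-2) + ...; terms with s - i <= 0 are skipped (assumption)."""
    return sum(f * (n_stages - i)
               for i, f in enumerate(dep_probs, start=1)
               if n_stages - i > 0)

def fxu_time(t_logic, n_stages, latch_overhead, dep_probs):
    """T_fxu = T_1 + Stall_fxu-fxu * T_1, ignoring stalls on the other pipes."""
    t1 = stage_time(t_logic, n_stages, latch_overhead)
    return t1 + self_stall(dep_probs, n_stages) * t1

# Hypothetical numbers: 40 FO4 of logic, 2 FO4 latch overhead,
# dependence probabilities f_1 = 0.2 and f_2 = 0.1.
for stages in (4, 8):
    print(stages, fxu_time(40.0, stages, 2.0, [0.2, 0.1]))
```

Going from 4 to 8 stages nearly halves the stage time here, yet the time per FXU operation barely improves because the stall term grows with pipeline depth.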
Simulation Results: TPC-C
- Optimal pipeline depth is application dependent
[Figure: optimal design points at 23 FO4 (power) and 10 FO4 (performance)]
Slide 23

Choosing a Pipeline Register
- Faster latch = shallower pipe = higher performance
Slide 24
Conclusions from the Paper
- Performance-driven design leads to short pipelines
- Optimal pipeline depth for a superscalar processor:
  - Power: around 20 FO4
  - Performance: around 10 FO4
- Reference: Viji Srinivasan, David Brooks, Michael Gschwind, Pradip Bose, Victor Zyuban, Philip N. Strenski, and Philip G. Emma, "Optimizing Pipelines for Power and Performance," in Proc. 35th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-35), November 2002
Slide 25

Architecture Summary (Simple Datapath)
[A. Chandrakasan, S. Sheng, R. Brodersen, JSSC 4/92]
Slide 26
Summary: Parallelism and Pipelining
[Figure panels: (a) reference, (b) parallel, (c) pipeline. Plot: Energy/Op vs. Time/Op for the reference and the parallel/pipeline designs]
- It is important to link back to E-D tradeoff
Slide 27

Minimum Energy: $E_{Lk}/E_{Sw} \approx 0.5$
[Figure: $E_{Op}$ relative to the nominal reference vs. $E_{Leakage}/E_{Switching}$ for nominal, parallel, and pipeline designs, each curve annotated with a V_th shift and a maximum V_dd]
- Large $(E_{Lk}/E_{Sw})_{opt}$
- Flat $E_{Op}$ minimum
[Table: $(E_{Lk}/E_{Sw})_{opt}$ by topology (Inv, Add, Dec); values lost in extraction]
- Optimal designs have high leakage
- Must adapt to process and activity variations
Slide 28
Time Multiplexing
[Figure panels: (a) reference for time-mux, (b) time-multiplex. Plot: Energy/Op vs. Time/Op for reference and time-mux]
Slide 29

Data Stream Interleaving
- PE = recursive operation
[Figure: N parallel SVD blocks, each a PE handling one symbol stream (PE too fast, large area), versus an interleaved architecture with one PE between P/S and S/P converters handling N symbol streams (reduced area, P/S overhead, highly pipelined)]
Slide 30
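PE Performs Recursive Operation
- Interleave = upsample & pipeline
[Figure: recursive loop before and after interleaving; details lost in extraction]
Slide 31

Data Stream Interleaving Example
- Recursive operation: $z(k) = x(k) + c \cdot z(k-1)$
- N data streams: $x_1, x_2, \dots, x_N$
[Figure: inputs x_1 to x_N and outputs z_1 to z_N (y_1 to y_N) interleaved over time index k; the loop is clocked at N times the original clock rate and holds a + b + m = N pipeline registers]
Slide 32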
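The interleaving idea can be illustrated with a small behavioral model: one shared recursive datapath with N registers in its feedback loop produces the same outputs as N independent copies of z(k) = x(k) + c·z(k-1). This is a software sketch only; the function names and the sample data are hypothetical.

```python
from collections import deque

def recursive_reference(x, c):
    """One stream through its own datapath: z(k) = x(k) + c * z(k-1), z(-1) = 0."""
    z_prev, out = 0.0, []
    for xk in x:
        z_prev = xk + c * z_prev
        out.append(z_prev)
    return out

def interleaved(streams, c):
    """N streams time-shared on one datapath with N registers in the feedback loop."""
    n = len(streams)
    loop_regs = deque([0.0] * n, maxlen=n)  # the N pipeline registers
    outs = [[] for _ in range(n)]
    for k in range(len(streams[0])):
        for i in range(n):                  # one fast clock tick per stream sample
            z_old = loop_regs[0]            # written N ticks ago: this stream's z(k-1)
            z_new = streams[i][k] + c * z_old
            loop_regs.append(z_new)         # oldest register value drops out
            outs[i].append(z_new)
    return outs

streams = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
print(interleaved(streams, 0.5) == [recursive_reference(s, 0.5) for s in streams])
```

Because the value read from the loop was written exactly N ticks earlier, each stream only ever sees its own previous output, which is why the extra registers can be used for pipelining without changing the recursion.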
Folding
- PE = recursive operation
[Figure: N cascaded PE blocks processing a symbol (PE too fast, large area) versus a folded architecture with a single PE reused over N steps (reduced area, highly pipelined)]
Slide 33

Folding Example
[Figure: folded datapath for data sorting over multiple data streams, with a select signal s choosing between the input and the fed-back outputs y_1(k) to y_4(k); numeric annotations lost in extraction]
- Folding = upsampling & pipelining
- Reduced area (shared datapath logic)
Slide 34
Area Benefit of Interleaving and Folding
- Area: $A = A_{logic} + A_{registers}$
- Interleaving or folding of level N: $A = A_{logic} + N \cdot A_{registers}$
- Timing and energy stay the same
[Figure: Energy/Op vs. Time/Op, showing the upsample and pipeline steps]
Slide 35

Architectural Transformations
- Procedure: move toward desired E-D point while minimizing area
[Figure: energy vs. delay and area vs. delay planes with the reference point and the V_DD scaling curve]
Slide 36
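To make the area benefit concrete, the sketch below compares N independent copies of a block against one interleaved or folded block that shares the logic and replicates only the registers. The area numbers are hypothetical and in arbitrary units; the function names are illustrative.

```python
def replicated_area(a_logic, a_registers, n):
    """N independent copies of the block: N * (A_logic + A_registers)."""
    return n * (a_logic + a_registers)

def shared_area(a_logic, a_registers, n):
    """Interleaving or folding of level N: A = A_logic + N * A_registers."""
    return a_logic + n * a_registers

# Hypothetical block: logic is 10x the register area, N = 8 streams.
n = 8
full = replicated_area(10.0, 1.0, n)
shared = shared_area(10.0, 1.0, n)
print(f"replicated: {full}, shared: {shared}, saving: {1 - shared / full:.0%}")
```

The saving grows with the ratio of logic area to register area, since only the logic is shared.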
Architectural Transformations
- Parallelism & pipelining: reduce energy, increase area
[Figure: energy vs. delay and area vs. delay planes showing the reference, V_DD scaling, and the parallel and pipeline points]
Slide 37

Architectural Transformations
- Time-multiplexing: increase energy, reduce area
[Figure: the same planes with the time-mux point added]
Slide 38
Architectural Transformations
- Interleaving & folding: constant energy, reduce area
[Figure: the same planes with the interleaved (intl) and folded (fold) points added]
Slide 39

Back to Sensitivity Analysis
[Figure annotations, partly lost in extraction: one region relates a small change in T_Op to E_Op (Sens > 1), the other a small change in E_Op to T_Op (Sens < 1)]
- Parallelism: good to save energy
- Time-mux: good to save area
Slide 40
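Energy-Area Tradeoff
- High throughput: parallelism = large area
- Low throughput: time-mux = small area
[Figure: maximum E_Op vs. area relative to A_ref for an ALU at several levels of parallelism and time-multiplexing, with T_target marked; numeric labels lost in extraction]
Slide 41

It is Basically a Time-Space Tradeoff
[Figure: E_Op relative to the reference vs. area relative to the reference, with curves for operation times from one quarter of to four times the reference T_Op; higher throughput lies toward larger area]
Slide 42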
Another Requirement: Flexibility
- Determining how much to include and how to do it in the most efficient way possible
- Claims (to be shown):
  - There are good reasons for flexibility
  - The cost of flexibility is orders of magnitude of inefficiency over an optimized solution
  - There are many different ways to provide flexibility
[Remaining slides: courtesy of Prof. Bob Brodersen, UCB]
Slide 43

Good Reasons for Flexibility
- One design for a number of SoC customers: more sales volume
- Customers able to provide added value and uniqueness
- Unsure of specification or can't make a decision
- Backwards compatibility with debugged software
- Risk, cost and time of implementing hardwired solutions
- Important to note: these are business, not technical reasons
Slide 44
So, What is the Cost of Flexibility?
- We need technical metrics that we can use to compare flexible and non-flexible implementations:
  - A power metric because of thermal limitations
  - An energy metric for portable operation
  - A cost metric related to the area of the chip
  - Performance (computational throughput)
- Let's use metrics normalized to the amount of computation being performed, so now let's define computation
Slide 45

Definitions
Computation
- Operation = OP = algorithmically interesting computation (e.g. multiply, add, delay)
- MOPS = millions of OPs per second
- $N_{op}$ = number of parallel OPs in each clock cycle
Power
- $P_{chip}$ = total power of chip = $A_{chip} C_{sw} V_{DD}^2 f_{clk}$
- $C_{sw}$ = switched capacitance per mm² = $P_{chip} / (A_{chip} V_{DD}^2 f_{clk})$
Area
- $A_{chip}$ = total area of chip
- $A_{op}$ = average area of each operation = $A_{chip} / N_{op}$
Slide 46
Energy Efficiency Metric: MOPS/mW
- How much computing (number of operations) can we do with a finite energy source (e.g. battery)?
- Energy efficiency = (number of useful operations) / (energy required) = (number of operations) / nanojoule = OP/nJ
- OP/nJ = (OP/sec) / (nJ/sec) = MOPS/mW = power efficiency
- Energy efficiency = power efficiency
Slide 47

Energy and Power Efficiency
- OP/nJ = MOPS/mW
- Interestingly, the energy efficiency metric for energy-constrained applications (OP/nJ), for a fixed number of operations, is the same as that for thermal (power) considerations when maximizing throughput (MOPS/mW).
- So let's look at a number of chips to see how these efficiency numbers compare.
Slide 48
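The equality of the two metrics is a unit identity, which can be spelled out with the standard prefixes (mega = 10^6, milli = 10^-3, nano = 10^-9):

```latex
\[
1\,\frac{\mathrm{MOPS}}{\mathrm{mW}}
  = \frac{10^{6}\ \mathrm{OP/s}}{10^{-3}\ \mathrm{J/s}}
  = 10^{9}\ \frac{\mathrm{OP}}{\mathrm{J}}
  = 1\ \frac{\mathrm{OP}}{\mathrm{nJ}}
\]
```

The seconds cancel, so a throughput-per-power figure and an operations-per-energy figure are the same number.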
ISSCC Chips (0.18 µm to 0.25 µm)
[Table with columns Chip, Year, Paper, Description; numeric entries lost in extraction. Categories: microprocessors, general purpose DSPs, dedicated designs. Recoverable descriptions: Strong-Arm, PPC, Alpha, Comm, Graphics, Multimedia, MPEG Dec, Encryption, Hearing Aid, FIR, 802.11a]
Slide 49

Energy Efficiency (MOPS/mW or OP/nJ)
[Figure: energy (power) efficiency in MOPS/mW vs. chip number for microprocessors, general purpose DSPs, and dedicated designs]
- 3 orders of magnitude!
Slide 50
Why Such a Big Difference?
- Let's look at the components of MOPS/mW
- The operations per second: MOPS = $f_{clk} N_{op}$
- The power: $P_{chip} = A_{chip} C_{sw} V_{DD}^2 f_{clk}$
- The ratio (MOPS / $P_{chip}$) gives MOPS/mW = $(f_{clk} N_{op}) / (A_{chip} C_{sw} V_{DD}^2 f_{clk})$
- Simplifying, MOPS/mW = $1 / (A_{op} C_{sw} V_{DD}^2)$
- So let's look at the 3 components: $V_{DD}$, $C_{sw}$ and $A_{op}$
Slide 51

Supply Voltage, V_DD
- MOPS/mW = $1 / (A_{op} C_{sw} V_{DD}^2)$
[Figure: V_dd (volts) vs. chip number for microprocessors, general purpose DSPs, and dedicated designs]
- Supply voltage isn't the cause of the difference (it's actually a bit higher for the dedicated chips)
Slide 52
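The simplification can be checked numerically. The sketch below computes MOPS/mW once from throughput and chip power and once from the simplified per-operation form, with explicit unit handling (area in mm², switched capacitance in pF/mm², clock in MHz). The chip parameters are hypothetical and the function names are illustrative.

```python
def efficiency_direct(n_op, f_clk_mhz, a_chip_mm2, c_sw_pf_mm2, vdd):
    """MOPS / P_chip, with P_chip = A_chip * C_sw * V_DD^2 * f_clk."""
    mops = f_clk_mhz * n_op
    # pF * V^2 * MHz = 1e-12 * 1e6 W = microwatts; convert to milliwatts.
    power_mw = a_chip_mm2 * c_sw_pf_mm2 * vdd ** 2 * f_clk_mhz * 1e-3
    return mops / power_mw

def efficiency_simplified(a_op_mm2, c_sw_pf_mm2, vdd):
    """1 / (A_op * C_sw * V_DD^2): the energy per operation, inverted."""
    energy_pj = a_op_mm2 * c_sw_pf_mm2 * vdd ** 2  # pF * V^2 = pJ per operation
    return 1000.0 / energy_pj                      # 1 OP/pJ = 1000 OP/nJ = 1000 MOPS/mW

# Hypothetical chip: 10 mm^2, 20 parallel ops, 50 pF/mm^2, 2 V, 100 MHz.
direct = efficiency_direct(20, 100.0, 10.0, 50.0, 2.0)
simplified = efficiency_simplified(10.0 / 20, 50.0, 2.0)
print(direct, simplified)
```

The clock frequency cancels, which is why the efficiency depends only on the area per operation, the switched capacitance density, and the supply voltage.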
Switched Capacitance, C_sw (pF/mm²)
- MOPS/mW = $1 / (A_{op} C_{sw} V_{DD}^2)$
[Figure: C_sw (pF/mm²) vs. chip number for microprocessors, general purpose DSPs, and dedicated designs]
- C_sw is lower for dedicated, but only by a factor of 2-3
Slide 53

A_op = Area per Operation (A_chip / N_op)
- MOPS/mW = $1 / (A_{op} C_{sw} V_{DD}^2)$
[Figure: A_op (mm² per operation) vs. chip number for microprocessors, general purpose DSPs, and dedicated designs]
- A_op explains the difference: more parallelism (higher N_op) in a smaller chip area (less overhead)
Slide 54
Let's Look at Some Chips to Actually See the Different Architectures
- We'll look at one from each category
[Figure: energy (power) efficiency (MOPS/mW) vs. chip number, highlighting PPC (microprocessors), NEC DSP (general purpose DSP), and MUD (dedicated)]
Slide 55

Microprocessor: MOPS/mW = 0.13
[Die photo annotations: the only circuitry which supports useful operations; all the rest is overhead to support the time multiplexing]
- $N_{op} = 2$
- Clock = 450 MHz (2 way) => 900 MIPS
- Two operations each clock cycle, so $A_{op} = A_{chip}/2$ = 42 mm²
- Power = 7 Watts
Slide 56
General Purpose DSP: MOPS/mW = 7
- Same granularity (a datapath), more parallelism
- 4 parallel processors (4 ops each)
- $N_{op} = 16$
- Clock = 50 MHz => 800 MOPS
- Sixteen operations each clock cycle, so $A_{op} = A_{chip}/16$ = 5.3 mm²
- Power = 110 mW
Slide 57

Dedicated Design: MOPS/mW = 200
- Complex mult/add (8 ops)
- Fully parallel mapping of adaptive correlator algorithm. No time multiplexing.
- $N_{op} = 96$
- Clock = 25 MHz => 2400 MOPS
- $A_{op}$ = 5.4 mm²/96 = 0.5 mm²
- Power = 12 mW
Slide 58
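The efficiency figures for the three example chips follow from MOPS = f_clk · N_op divided by chip power. In the sketch below the operation counts and clock rates come from the slides; the power values of 7 W, 110 mW, and 12 mW are assumptions where the extracted text has lost digits, chosen because they agree with the stated throughput and efficiency figures.

```python
chips = {
    # name: (operations per clock cycle, clock in MHz, power in mW)
    "microprocessor": (2, 450.0, 7000.0),
    "general purpose DSP": (16, 50.0, 110.0),   # power value is an assumption
    "dedicated design": (96, 25.0, 12.0),       # power value is an assumption
}

def mops_per_mw(n_op, f_clk_mhz, power_mw):
    """Energy (power) efficiency: MOPS = f_clk * N_op, divided by chip power."""
    mops = n_op * f_clk_mhz
    return mops / power_mw

efficiency = {name: mops_per_mw(*spec) for name, spec in chips.items()}
for name, value in efficiency.items():
    print(f"{name}: {value:.2f} MOPS/mW")
```

Under these assumptions the spread between the microprocessor and the dedicated design is a little over three orders of magnitude, in line with the chart of ISSCC chips.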
The Basic Problem is Time Multiplexing
- Processor architectures obtain performance by increasing the clock rate, because the parallelism is low
- Results in ever increasing memory on the chip, high control overhead and fast area-consuming logic
- But doesn't time multiplexing give better area efficiency?
Slide 59

Area Efficiency
- SoC-based devices are often very cost sensitive
- So we need a $ cost metric => for SoCs that is equivalent to the efficiency of area utilization
- Area efficiency metric: computation per unit area = MOPS/mm²
- How much of a $ cost (area) penalty will we have if we put down many parallel hardware units and have limited time multiplexing?
Slide 60
Surprisingly, the Area Efficiency Roughly Tracks the Energy Efficiency
[Figure: MOPS/mm² vs. chip number for microprocessors, general purpose DSPs, and dedicated designs; ~2 orders of magnitude]
- The overhead of flexibility in processor architectures is so high that there is even an area penalty
Slide 61

Hardware / Software
- Conclusion: there is no software/hardware tradeoff. The difference between hardware and software in performance, power and area is so large that there is no tradeoff.
- It is reasons other than power, energy, performance or cost that drive a software solution (e.g. business, legacy, ...).
- The cost of flexibility is extremely high, so the other reasons better be good!
Slide 62
More informationMemory. Outline. ECEN454 Digital Integrated Circuit Design. Memory Arrays. SRAM Architecture DRAM. Serial Access Memories ROM
ECEN454 Digital Integrated Circuit Design Memory ECEN 454 Memory Arrays SRAM Architecture SRAM Cell Decoders Column Circuitry Multiple Ports DRAM Outline Serial Access Memories ROM ECEN 454 12.2 1 Memory
More informationMIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14
MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK
More informationAdaptive Computing Systems (ACS) Domain for Implementing DSP Algorithms in Reconfigurable Hardware. Objective/Approach/Process
Adaptive Computing Systems (ACS) Domain for Implementing DSP Algorithms in Reconfigurable Hardware John Zaino, Eric Pauer, Ken Smith, Paul Fiore, Jairam Ramanathan, Cory Myers {john.c.aino, ken.smith,
More informationThe Use of the No-Instruction-Set Computer (NISC) for Acceleration in WISHBONE-Based Systems
The Use o the No-Instruction-Set Computer () or Acceleration in WISHBONE-Based Systems Roko Grubišić, Vlado Sruk Technical Report 11-14-2008 Department o Electronics, Microelectronics, Computer and Intelligent
More information3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?
CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:
More informationEmerging DRAM Technologies
1 Emerging DRAM Technologies Michael Thiems amt051@email.mot.com DigitalDNA Systems Architecture Laboratory Motorola Labs 2 Motivation DRAM and the memory subsystem significantly impacts the performance
More information310/ ICTP-INFN Advanced Tranining Course on FPGA and VHDL for Hardware Simulation and Synthesis 27 November - 22 December 2006
310/1780-18 ICTP-INFN Advanced Tranining Course on FPGA and VHDL for Hardware Simulation and Synthesis 27 November - 22 December 2006 Design Methodology Tools Jorgen CHRISTIANSEN PH-ED CERN CH-1221 Geneva
More informationVerilog for High Performance
Verilog for High Performance Course Description This course provides all necessary theoretical and practical know-how to write synthesizable HDL code through Verilog standard language. The course goes
More informationRTL Power Estimation and Optimization
Power Modeling Issues RTL Power Estimation and Optimization Model granularity Model parameters Model semantics Model storage Model construction Politecnico di Torino Dip. di Automatica e Informatica RTL
More informationDigital Integrated Circuits
Digital Integrated Circuits EE141 Fall 2005 Tu & Th 11-12:30 203 McLaughlin What is This Class About? Introduction to Digital Integrated Circuits Introduction: Issues in digital design CMOS devices and
More information13. Power Management in Stratix IV Devices
February 2011 SIV51013-3.2 13. Power Management in Stratix IV Devices SIV51013-3.2 This chapter describes power management in Stratix IV devices. Stratix IV devices oer programmable power technology options
More informationAdvanced Design System 1.5. DSP Synthesis
Advanced Design System 1.5 DSP Synthesis December 2000 Notice The information contained in this document is subject to change without notice. Agilent Technologies makes no warranty of any kind with regard
More informationMore Course Information
More Course Information Labs and lectures are both important Labs: cover more on hands-on design/tool/flow issues Lectures: important in terms of basic concepts and fundamentals Do well in labs Do well
More informationHardware/Software Partitioning for SoCs. EECE Advanced Topics in VLSI Design Spring 2009 Brad Quinton
Hardware/Software Partitioning for SoCs EECE 579 - Advanced Topics in VLSI Design Spring 2009 Brad Quinton Goals of this Lecture Automatic hardware/software partitioning is big topic... In this lecture,
More informationMemory Arrays. Array Architecture. Chapter 16 Memory Circuits and Chapter 12 Array Subsystems from CMOS VLSI Design by Weste and Harris, 4 th Edition
Chapter 6 Memory Circuits and Chapter rray Subsystems from CMOS VLSI Design by Weste and Harris, th Edition E E 80 Introduction to nalog and Digital VLSI Paul M. Furth New Mexico State University Static
More informationComputer Systems Architecture Spring 2016
Computer Systems Architecture Spring 2016 Lecture 01: Introduction Shuai Wang Department of Computer Science and Technology Nanjing University [Adapted from Computer Architecture: A Quantitative Approach,
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationL2: Design Representations
CS250 VLSI Systems Design L2: Design Representations John Wawrzynek, Krste Asanovic, with John Lazzaro and Yunsup Lee (TA) Engineering Challenge Application Gap usually too large to bridge in one step,
More informationPower-Optimal Pipelining in Deep Submicron Technology Λ
8. Power-Optimal Pipelining in Deep Submicron Technology Λ Seongmoo Heo and Krste Asanović MIT Computer Science and Artificial Intelligence Laboratory Vassar Street, Cambridge, MA 9 fheomoo,krsteg@csail.mit.edu
More informationWhat is this class all about?
EE141-Fall 2007 Digital Integrated Circuits Instructor: Elad Alon TuTh 3:30-5pm 155 Donner 1 1 What is this class all about? Introduction to digital integrated circuit design engineering Will describe
More informationAn Analytic Model for Embedded Machine Vision: Architecture and Performance Exploration
419 An Analytic Model or Embedded Machine Vision: Architecture and Perormance Exploration Chan Kit Wai, Prahlad Vadakkepat, Tan Kok Kiong Department o Electrical and Computer Engineering, 4 Engineering
More informationReduce Your System Power Consumption with Altera FPGAs Altera Corporation Public
Reduce Your System Power Consumption with Altera FPGAs Agenda Benefits of lower power in systems Stratix III power technology Cyclone III power Quartus II power optimization and estimation tools Summary
More informationLinking Layout to Logic Synthesis: A Unification-Based Approach
Linking Layout to Logic Synthesis: A Unification-Based Approach Massoud Pedram Department of EE-Systems University of Southern California Los Angeles, CA February 1998 Outline Introduction Technology and
More informationUnderstanding Signal to Noise Ratio and Noise Spectral Density in high speed data converters
Understanding Signal to Noise Ratio and Noise Spectral Density in high speed data converters TIPL 4703 Presented by Ken Chan Prepared by Ken Chan 1 Table o Contents What is SNR Deinition o SNR Components
More informationHead, Dept of Electronics & Communication National Institute of Technology Karnataka, Surathkal, India
Mapping Signal Processing Algorithms to Architecture Sumam David S Head, Dept of Electronics & Communication National Institute of Technology Karnataka, Surathkal, India sumam@ieee.org Objectives At the
More informationEECS 244 Computer-Aided Design of Integrated Circuits and Systems
EECS 244 Computer-Aided Design of Integrated Circuits and Systems Professor A. Richard Newton Room 566 Cory Hall 642-2967, rnewton@ic.eecs Office Hours: Tu. Th. 3:30-4:30pm Fall 1997 Administrative Details
More informationINTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 9 /Issue 3 / OCT 2017
Design of Low Power Adder in ALU Using Flexible Charge Recycling Dynamic Circuit Pallavi Mamidala 1 K. Anil kumar 2 mamidalapallavi@gmail.com 1 anilkumar10436@gmail.com 2 1 Assistant Professor, Dept of
More informationCS 250 VLSI Design Lecture 11 Design Verification
CS 250 VLSI Design Lecture 11 Design Verification 2012-9-27 John Wawrzynek Jonathan Bachrach Krste Asanović John Lazzaro TA: Rimas Avizienis www-inst.eecs.berkeley.edu/~cs250/ IBM Power 4 174 Million Transistors
More informationTowards Optimal Custom Instruction Processors
Towards Optimal Custom Instruction Processors Wayne Luk Kubilay Atasu, Rob Dimond and Oskar Mencer Department of Computing Imperial College London HOT CHIPS 18 Overview 1. background: extensible processors
More information2. Design Planning with the Quartus II Software
November 2013 QII51016-13.1.0 2. Design Planning with the Quartus II Sotware QII51016-13.1.0 This chapter discusses key FPGA design planning considerations, provides recommendations, and describes various
More informationVLSI Signal Processing
VLSI Signal Processing Programmable DSP Architectures Chih-Wei Liu VLSI Signal Processing Lab Department of Electronics Engineering National Chiao Tung University Outline DSP Arithmetic Stream Interface
More informationDesign of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on. on-chip Architecture
Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on on-chip Architecture Avinash Kodi, Ashwini Sarathy * and Ahmed Louri * Department of Electrical Engineering and
More informationSynthesizable FPGA Fabrics Targetable by the VTR CAD Tool
Synthesizable FPGA Fabrics Targetable by the VTR CAD Tool Jin Hee Kim and Jason Anderson FPL 2015 London, UK September 3, 2015 2 Motivation for Synthesizable FPGA Trend towards ASIC design flow Design
More informationA 50% Lower Power ARM Cortex CPU using DDC Technology with Body Bias. David Kidd August 26, 2013
A 50% Lower Power ARM Cortex CPU using DDC Technology with Body Bias David Kidd August 26, 2013 1 HOTCHIPS 2013 Copyright 2013 SuVolta, Inc. All rights reserved. Agenda DDC transistor and PowerShrink platform
More informationIntroduction to SRAM. Jasur Hanbaba
Introduction to SRAM Jasur Hanbaba Outline Memory Arrays SRAM Architecture SRAM Cell Decoders Column Circuitry Non-volatile Memory Manufacturing Flow Memory Arrays Memory Arrays Random Access Memory Serial
More informationLecture 1: Introduction
Contemporary Computer Architecture Instruction set architecture Lecture 1: Introduction CprE 581 Computer Systems Architecture, Fall 2016 Reading: Textbook, Ch. 1.1-1.7 Microarchitecture; examples: Pipeline
More information