Architecture Techniques


EE219A Spring 2008: Special Topics in Circuits and Signal Processing
Lecture 3: Architecture Techniques
Dejan Markovic, dejan@ee.ucla.edu

Announcements
Class wiki is up and running. Go to: EEWeb / Online Lab
Please sign up using your UCLA EE user name (I need this for verification purposes)
Homework #1 coming up on Wed
Background, circuit and microarchitecture techniques
Slide 2

Real Life Example: 802.11a Baseband
[Figure: chip block diagram with Viterbi decoder, MAC core, DMA, PCI, time/frequency synchronization, ADC/DAC, FSM, AGC, FFT]
Direct-mapped architecture
200 MOPS/mW
80 MHz clock!
40 GOPS
Power = 200 mW
0.25 µm CMOS
The architecture has to track technology
Atheros 802.11a baseband processor
Slide 3

Wireless Baseband Chip Design
Direct mapping is the most energy-efficient
Technology is too fast for dedicated hardware
Opportunity to further reduce energy and area
[Figure: energy efficiency versus clock period (speed of technology) for microprocessors, programmable DSPs, and hardwired logic; clock rates labeled GHz, 100s of MHz, 10s of MHz]
Slide 4

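A Number of Variables to Consider
How to optimally combine all variables?
[Equations garbled in extraction: expressions for leakage energy E_Lk (involving W, V_th, V_DD, and T), delay D (involving V_DD, V_th, and α), and switching energy E_Sw (involving W and V_DD^2)]
Variables: pipelining, parallelism, time-multiplexing, V_DD, sizing, V_th
[Figure: clock period axis from speed of technology (fast) to required speed (slow), a gap of 2 orders of magnitude (and growing)]
Slide 5

Introduction to Architecture Optimization
[Figure: design flow linking algorithm modeling (Simulink), DSP architectures (RTL), and circuit optimization (Cadence), with E-A-D and E-D trade-offs and the parameters T_sample, T_clk, power, area, timing]
Slide 6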

Architectural Feedback from Technology
The Simulink hardware library implicitly carries information only about latency and wordlength (we can later choose the sample period when targeting an FPGA)
For an ASIC flow, block characterization also has to include technology features such as speed, power, and area
But technology parameters scale with each generation
Need a general and quick characterization methodology
Propagate results back to Simulink to avoid iterations
Slide 7

Architecture-Circuit Co-design
[Figure: co-design flow across behavioral, logical, and physical levels (E-D, DSP architectures, circuit optimization, T_clk, HDL), with architectural feedback of pre-layout and post-layout speed, power, and area, and re-synthesis]
Slide 8

Starting Point: Datapath Characterization
Balance trade-offs due to gate size (W) and supply (V_DD)
[Figure: energy versus delay at the circuit level, with W sizing and V_DD scaling curves, the minimum-delay reference point, the target delay, and the optimal design point]
Curves due to W and V_DD are tangent (equal sensitivity)
Goal: keep all pipelines at the same E-D point
Slide 9

Cycle Time is Common for All Blocks
[Figure: Simulink, RTL, Synopsys flow producing latency versus normalized cycle time curves for add and mult blocks (area, power, speed); netlist simulated in HSPICE with switch-level accuracy for speed, power, and area]
Slide 10

Next Step: Block Characterization
Goal: balance logic depth within a block
[Figure: latency versus cycle time at the micro-architecture level for add and mult blocks, with target T_Clk; speed, power, area]
Select block latency to achieve target T_Clk
Balances pipeline logic depth
Apply W and V_DD scaling to the underlying pipelines
Slide 11

Architectural Feedback to Simulink
Characterize blocks with predetermined wordlength
Translate timing specification to a target supply voltage
Determine optimal latency for a given cycle time
[Figure (a): normalized energy versus normalized delay for a simulated FO4 inverter under V_DD scaling, marking the target speed at nominal V_DD and the desired point at the optimal V_DD for the target speed. Figure (b): latency versus normalized cycle time for synthesized add and mult blocks at nominal V_DD, with target speed; area, power, speed. Annotations as extracted: 0.6V, m=8, a=2]
Slide 12

Basic Micro-Architectural Techniques
Parallelism, pipelining, time-multiplexing
[Figure: (a) reference, (b) parallel, (c) pipeline, (d) reference for time-mux, (e) time-multiplex]
Slide 13

Architecture Trade-Offs: Reference Datapath
Critical path delay: $T_{adder} + T_{comparator}$ (= 25 ns)
Total capacitance being switched = $C_{ref}$
$V_{DD} = V_{DD,ref} = 5\,\mathrm{V}$
Power for reference datapath: $P_{ref} = C_{ref} V_{DD,ref}^2 f_{ref}$
[A. Chandrakasan, S. Sheng, R. Brodersen, JSSC 4/92]
Slide 14

Parallel Datapath
The clock rate can be reduced by half with the same throughput: $f_{par} = f_{ref}/2$
$V_{DD,par} = V_{DD,ref}/1.7$, $C_{par} = 2.15\,C_{ref}$
$P_{par} = (2.15\,C_{ref})(V_{DD,ref}/1.7)^2 (f_{ref}/2) \approx 0.36\,P_{ref}$
Slide 15

Parallelism Adds Latency
[Figure: timing diagram comparing the reference datapath (inputs A1 to A5, outputs Z1 to Z5, clocked at Clk) with a two-way parallel datapath (level of parallelism P = 2) clocked at ½ Clk, where one branch takes A1, A3, A5 and the other A2, A4]
Slide 16

Increasing Level of Parallelism
Area: $A_N \approx N \cdot A_{ref}$
[Figure: normalized energy per operation E_Op versus throughput (1/FO4) for increasing parallelism]
Parallelism improves throughput for the same energy
Parallelism improves energy for the same throughput
Cost: increased area
The more parallel the better?
Slide 17

The More Parallel the Better?
[Figure: total energy versus supply voltage V_DD for reference and parallel designs]
Leakage and overhead start to dominate at high levels of parallelism, causing the minimum energy to increase
Optimum voltage also increases with parallelism
Slide 18

Pipelined Datapath
Critical path delay is less: $\max(T_{adder}, T_{comparator})$
Keeping the clock rate constant: $f_{pipe} = f_{ref}$
Voltage can be dropped: $V_{DD,pipe} = V_{DD,ref}/1.7$
Capacitance is slightly higher: $C_{pipe} = 1.15\,C_{ref}$
$P_{pipe} = (1.15\,C_{ref})(V_{DD,ref}/1.7)^2 f_{ref} \approx 0.39\,P_{ref}$
Slide 19

Pipelining: Real Life Example
Superscalar processor: determine optimal pipeline depth and target frequency
Power model: PowerTimer toolset developed at IBM T.J. Watson RC
Methodology to build energy models based on results of a circuit-level power analysis tool
[V. Srinivasan et al., MICRO 02]
Slide 20
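
The power figures quoted for the parallel and pipelined datapaths follow from P = C · V_DD² · f with every term expressed relative to the reference design. The sketch below is only a numerical check, assuming the slide values C_par = 2.15 C_ref, C_pipe = 1.15 C_ref, and a supply reduction by a factor of 1.7; the function and argument names are illustrative, not from the lecture.

```python
def relative_power(cap_ratio, vdd_divisor, freq_ratio):
    """Return P / P_ref for P = C * V_DD^2 * f, each factor given relative to the reference."""
    return cap_ratio * (1.0 / vdd_divisor) ** 2 * freq_ratio


# Parallel datapath: 2.15x capacitance, V_DD / 1.7, half the clock rate.
parallel = relative_power(cap_ratio=2.15, vdd_divisor=1.7, freq_ratio=0.5)

# Pipelined datapath: 1.15x capacitance, V_DD / 1.7, same clock rate.
pipelined = relative_power(cap_ratio=1.15, vdd_divisor=1.7, freq_ratio=1.0)

print(f"parallel:  {parallel:.3f} x P_ref")
print(f"pipelined: {pipelined:.3f} x P_ref")
```

With the rounded divisor 1.7, the ratios come out near 0.37 and 0.40, within a few percent of the quoted ~0.36 and ~0.39; the small gap is presumably due to rounding of the voltage ratio on the slides.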

Timing Model
Analytical pipeline model
Time per stage of pipe i: $T_i = t_i/s_i + c_i$
Time to complete an FXU operation in the presence of stalls:
$T_{fxu} = T_1 + Stall_{fxu\text{-}fxu} \cdot T_1 + Stall_{fxu\text{-}fpu} \cdot T_2 + \dots + Stall_{fxu\text{-}bru} \cdot T_4$
$Stall_{fxu\text{-}fxu} = f_1 (s_1 - 1) + f_2 (s_1 - 2) + \dots$
$f_i$: conditional probability that an FXU instruction m depends on FXU instruction (m − i)
$Throughput = u_1/T_{fxu} + u_2/T_{fpu} + u_3/T_{lsu} + u_4/T_{bru}$
$u_i$: fraction of time pipe i has instructions arriving from the FE of the machine; $u_i = 0$ for an unutilized pipe, $u_i = 1$ for a fully utilized pipe
[V. Srinivasan et al., MICRO 02]
Slide 21

Simulation Results: SPEC 2000
More stages or lower power!
[Figure: optimal logic depth per stage; power: 18 FO4, performance: 10 FO4]
Slide 22
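
The analytical timing model can be sketched in a few lines to see how the terms interact. This is a minimal illustration, not the PowerTimer methodology: the function names and all numeric inputs are hypothetical, and truncating the stall sum once s − i reaches zero is an assumption about how the elided series ends.

```python
def stage_time(logic_delay, stages, latch_overhead):
    """T_i = t_i / s_i + c_i: logic delay split over the stages, plus per-stage overhead."""
    return logic_delay / stages + latch_overhead


def stall_same_unit(dep_probs, stages):
    """Stall = f_1*(s-1) + f_2*(s-2) + ...; terms with s - i <= 0 are dropped (assumption)."""
    return sum(f * (stages - i)
               for i, f in enumerate(dep_probs, start=1)
               if stages - i > 0)


def throughput(utilizations, op_times):
    """Throughput = sum over pipes of u_i / T_pipe_i."""
    return sum(u / t for u, t in zip(utilizations, op_times))


# Hypothetical numbers, for illustration only.
t1 = stage_time(logic_delay=100.0, stages=5, latch_overhead=3.0)
stall = stall_same_unit(dep_probs=[0.2, 0.1], stages=3)
rate = throughput(utilizations=[1.0, 0.5], op_times=[2.0, 4.0])
print(t1, stall, rate)
```

Increasing the number of stages shrinks the per-stage time but grows the stall term, which is the tension behind an optimal pipeline depth.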

Simulation Results: TPC-C
Optimal pipeline depth is application dependent
[Figure: optimal logic depth per stage; power: 23 FO4, performance: 10 FO4]
Slide 23

Choosing a Pipeline Register
Faster latch = shallower pipe = higher performance
Slide 24

Conclusions from the Paper
Performance-driven design leads to short pipelines
Optimal pipeline depth for a superscalar processor:
Power: around 20 FO4
Performance: around 10 FO4
Reference: Viji Srinivasan, David Brooks, Michael Gschwind, Pradip Bose, Victor Zyuban, Philip N. Strenski, and Philip G. Emma, "Optimizing Pipelines for Power and Performance," in Proc. 35th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-35), November 2002.
Slide 25

Architecture Summary (Simple Datapath)
[A. Chandrakasan, S. Sheng, R. Brodersen, JSSC 4/92]
Slide 26

Summary: Parallelism and Pipelining
[Figure: (a) reference, (b) parallel, (c) pipeline; energy/op versus time/op showing the reference point and the parallel/pipeline point]
It is important to link back to the E-D trade-off
Slide 27

Minimum Energy: $E_{Lk}/E_{Sw} \approx 0.5$
[Figure: normalized energy per operation versus E_Leakage/E_Switching for reference, parallel, and pipeline designs, annotated with V_th and maximum V_DD values; numeric annotations garbled in extraction]
Large $(E_{Lk}/E_{Sw})_{opt}$
Flat $E_{Op}$ minimum
[Table: $(E_{Lk}/E_{Sw})_{opt}$ by topology (Inv, Add, Dec); values lost in extraction]
Optimal designs have high leakage
Must adapt to process and activity variations
Slide 28

Time Multiplexing
[Figure: (a) reference for time-mux, (b) time-multiplex; energy/op versus time/op showing the reference and time-mux points]
Slide 29

Data Stream Interleaving
PE = recursive operation
[Figure: N parallel PE blocks (SVD), one per symbol stream: PE too fast, large area. Interleaved architecture with P/S, a shared PE, and S/P serving N symbol streams: reduced area, P/S overhead, highly pipelined]
Slide 30

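PE Performs Recursive Operation
Interleave = upsample & pipeline
[Figure: recursive PE before and after interleaving; labels garbled in extraction]
Slide 31

Data Stream Interleaving Example
Recursive operation: $z(k) = x(k) + c \cdot z(k-1)$
N data streams: $x_1, x_2, \dots, x_N$
[Figure: interleaved datapath with inputs x_1 ... x_N, outputs y_1 ... y_N and z_1 ... z_N over time index k, clock labels Clk and N Clk, and pipeline registers a, b, m satisfying a + b + m = N]
Slide 32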
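
The interleaving idea can be checked behaviorally: one shared processing element with N registers in its feedback loop should reproduce the outputs of N independent copies of the recursion z(k) = x(k) + c·z(k−1). The sketch below is a software model only, assuming equal-length streams and zero initial state; the function names are hypothetical and nothing here models clocking or hardware cost.

```python
def recursive_filter(x, c):
    """Reference: one PE per stream, z(k) = x(k) + c * z(k-1) with z(-1) = 0."""
    z, prev = [], 0.0
    for sample in x:
        prev = sample + c * prev
        z.append(prev)
    return z


def interleaved_filter(streams, c):
    """One shared PE serving N equal-length streams via an N-deep feedback delay."""
    n = len(streams)
    length = len(streams[0])
    # Parallel-to-serial: x1(0), x2(0), ..., xN(0), x1(1), ...
    serial_in = [streams[i][k] for k in range(length) for i in range(n)]
    delay_line = [0.0] * n  # N registers replace the single z^-1
    serial_out = []
    for sample in serial_in:
        fed_back = delay_line.pop(0)  # output from N cycles ago: same stream, previous sample
        z = sample + c * fed_back
        delay_line.append(z)
        serial_out.append(z)
    # Serial-to-parallel: every N-th output belongs to the same stream.
    return [serial_out[i::n] for i in range(n)]


streams = [[1, 2, 3], [4, 5, 6], [0, 1, 0]]
c = 0.5
expected = [recursive_filter(s, c) for s in streams]
result = interleaved_filter(streams, c)
print(result == expected)
```

The N-deep delay line is what makes the extra pipeline registers available: in hardware they can be distributed inside the PE as long as the loop still contains N of them.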

Folding
PE = recursive operation
[Figure: N PE blocks per symbol: PE too fast, large area. Folded architecture with one PE reused N times: reduced area, highly pipelined]
Slide 33

Folding Example
16 data streams
[Figure: folded PE with data sorting, select signal s, coefficients c_1 ... c_16, and outputs y_1(k) ... y_4(k) over 64 clk cycles; labels partially garbled in extraction]
Folding = upsampling & pipelining
Reduced area (shared datapath logic)
Slide 34

Area Benefit of Interleaving and Folding
Area: $A = A_{logic} + A_{registers}$
Interleaving or folding of level N: $A = A_{logic} + N \cdot A_{registers}$
Timing and energy stay the same
[Figure: energy/op versus time/op, with upsample and pipeline steps]
Slide 35

Architectural Transformations
Procedure: move toward the desired E-D point while minimizing area
[Figure: energy versus delay and area versus delay, showing the reference point and V_DD scaling]
Slide 36
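
The area benefit of interleaving and folding can be made explicit by comparing against N separate blocks. The worked comparison below is a sketch under two assumptions that are not stated on the slides: each separate block costs one copy of the logic plus one set of registers, and the P/S and S/P overhead is ignored.

```latex
\begin{align*}
A_{N\ \text{blocks}} &= N \,(A_{logic} + A_{registers}) \\
A_{\text{interleaved or folded}} &= A_{logic} + N \, A_{registers} \\
A_{N\ \text{blocks}} - A_{\text{interleaved or folded}} &= (N - 1)\, A_{logic}
\end{align*}
```

Under these assumptions the saving is the logic of N − 1 blocks, so the benefit is largest when the datapath logic dominates the registers.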

Architectural Transformations
Parallelism & pipelining: reduce energy, increase area
[Figure: energy versus delay and area versus delay, showing reference, parallel, and pipeline points with V_DD scaling]
Slide 37

Architectural Transformations
Time-multiplexing: increase energy, reduce area
[Figure: energy versus delay and area versus delay, showing time-mux, reference, parallel, and pipeline points with V_DD scaling]
Slide 38

Architectural Transformations
Interleaving & folding: constant energy, reduce area
[Figure: energy versus delay and area versus delay, showing reference, parallel, pipeline, time-mux, and interleaved/folded (intl, fold) points with V_DD scaling]
Slide 39

Back to Sensitivity Analysis
[Figure annotations, partially garbled in extraction: one region relates a small change in T_Op to E_Op (Sens > 1), the other a small change in E_Op to T_Op (Sens < 1)]
Parallelism is good to save energy
Time-mux is good to save area
Slide 40

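Energy-Area Trade-off
High throughput: parallelism = large area
Low throughput: time-mux = small area
[Figure: energy per operation versus area for an ALU at the target T, with levels of parallelism and time-multiplexing and a maximum E_op; numeric labels garbled in extraction]
Slide 41

It is Basically a Time-Space Trade-off
[Figure: E_op normalized to the reference versus A_op normalized to the reference, for operating points at 4T_op, 3T_op, T_op, T_op/3, and T_op/4 relative to the reference; higher throughput indicated]
Slide 42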

Another Requirement: Flexibility
Determining how much to include and how to do it in the most efficient way possible
Claims (to be shown):
There are good reasons for flexibility
The cost of flexibility is orders of magnitude of inefficiency over an optimized solution
There are many different ways to provide flexibility
[Remaining slides: courtesy of Prof. Bob Brodersen, UCB]
Slide 43

Good Reasons for Flexibility
One design for a number of SoC customers: more sales volume
Customers able to provide added value and uniqueness
Unsure of specification or can't make a decision
Backwards compatibility with debugged software
Risk, cost and time of implementing hardwired solutions
Important to note: these are business, not technical reasons
Slide 44

So, What is the Cost of Flexibility?
We need technical metrics that we can use to compare flexible and non-flexible implementations:
A power metric because of thermal limitations
An energy metric for portable operation
A cost metric related to the area of the chip
Performance (computational throughput)
Let's use metrics normalized to the amount of computation being performed, so now let's define computation
Slide 45

Definitions
Computation:
Operation = OP = algorithmically interesting computation (e.g. multiply, add, delay)
MOPS = millions of OPs per second
$N_{op}$ = number of parallel OPs in each clock cycle
Power:
$P_{chip}$ = total power of chip = $A_{chip} \cdot C_{sw} \cdot V_{DD}^2 \cdot f_{clk}$
$C_{sw}$ = switched capacitance per mm² = $P_{chip} / (A_{chip} \cdot V_{DD}^2 \cdot f_{clk})$
Area:
$A_{chip}$ = total area of chip
$A_{op}$ = average area of each operation = $A_{chip} / N_{op}$
Slide 46

Energy Efficiency Metric: MOPS/mW
How much computing (number of operations) can we do with a finite energy source (e.g. battery)?
Energy efficiency = (number of useful operations) / (energy required) = (number of operations) / nanojoule = OP/nJ
Power efficiency = (OP/sec) / (nJ/sec) = MOPS/mW
Energy efficiency = power efficiency
Slide 47

Energy and Power Efficiency
OP/nJ = MOPS/mW
Interestingly, the energy efficiency metric for energy-constrained applications (OP/nJ), for a fixed number of operations, is the same as that for thermal (power) considerations when maximizing throughput (MOPS/mW).
So let's look at a number of chips to see how these efficiency numbers compare.
Slide 48
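
The equivalence OP/nJ = MOPS/mW is a matter of unit bookkeeping, which the following short derivation spells out. It uses only the definitions above; dividing numerator and denominator by one second and scaling both by 10^6 are the only steps.

```latex
\begin{align*}
\frac{\text{OP}}{\text{nJ}}
  &= \frac{\text{OP/s}}{\text{nJ/s}}
   = \frac{10^{6}\ \text{OP/s}}{10^{6}\ \text{nJ/s}}
   = \frac{1\ \text{MOPS}}{1\ \text{mW}},
\qquad \text{since } 10^{6}\ \text{nJ/s} = 10^{-3}\ \text{J/s} = 1\ \text{mW}.
\end{align*}
```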

ISSCC Chips (0.18 µm to 0.25 µm)
[Table: columns Chip, Year, Paper, Description, grouped into microprocessors, general-purpose DSPs, and dedicated designs; chip numbers, years, and paper numbers lost in extraction. Surviving descriptions include Strong-Arm, PPC, Alpha, Comm, Graphics, Multimedia, MPEG Dec, Encryption, Hearing Aid, and FIR]
Slide 49

Energy Efficiency (MOPS/mW or OP/nJ)
[Figure: energy (power) efficiency in MOPS/mW versus chip number for microprocessors, general-purpose DSPs, and dedicated designs]
3 orders of magnitude!
Slide 50

Why Such a Big Difference?
Let's look at the components of MOPS/mW.
The operations per second: $\mathrm{MOPS} = f_{clk} \cdot N_{op}$
The power: $P_{chip} = A_{chip} \cdot C_{sw} \cdot V_{DD}^2 \cdot f_{clk}$
The ratio (MOPS / P_chip) gives $\mathrm{MOPS/mW} = (f_{clk} \cdot N_{op}) / (A_{chip} \cdot C_{sw} \cdot V_{DD}^2 \cdot f_{clk})$
Simplifying, $\mathrm{MOPS/mW} = 1 / (A_{op} \cdot C_{sw} \cdot V_{DD}^2)$
So let's look at the 3 components: $V_{DD}$, $C_{sw}$ and $A_{op}$
Slide 51

Supply Voltage, V_DD
$\mathrm{MOPS/mW} = 1 / (A_{op} \cdot C_{sw} \cdot V_{DD}^2)$
[Figure: V_DD (volts) versus chip number for microprocessors, general-purpose DSPs, and dedicated designs]
Supply voltage isn't the cause of the difference (it's actually a bit higher for the dedicated chips).
Slide 52

Switched Capacitance, C_sw (pF/mm²)
$\mathrm{MOPS/mW} = 1 / (A_{op} \cdot C_{sw} \cdot V_{DD}^2)$
[Figure: C_sw (pF/mm²) versus chip number for microprocessors, general-purpose DSPs, and dedicated designs]
C_sw is lower for dedicated designs, but only by a factor of 2-3
Slide 53

A_op = Area per Operation (A_chip/N_op)
$\mathrm{MOPS/mW} = 1 / (A_{op} \cdot C_{sw} \cdot V_{DD}^2)$
[Figure: A_op (mm² per operation) versus chip number for microprocessors, general-purpose DSPs, and dedicated designs]
A_op explains the difference: more parallelism (higher N_op) in a smaller chip area (less overhead)
Slide 54

Let's Look at Some Chips to Actually See the Different Architectures
We'll look at one from each category
[Figure: energy (power) efficiency (MOPS/mW) versus chip number, highlighting PPC (microprocessors), NEC DSP (general-purpose DSP), and MUD (dedicated)]
Slide 55

Microprocessor: MOPS/mW = 0.13
[Die photo annotations: the only circuitry which supports useful operations; all the rest is overhead to support the time multiplexing]
$N_{op}$ = 2
$f_{clock}$ = 450 MHz (2-way) => 900 MIPS
Two operations each clock cycle, so $A_{op} = A_{chip}/2$ = 42 mm²
Power = 7 W
Slide 56

General Purpose DSP: MOPS/mW = 7
Same granularity (a datapath), more parallelism
4 parallel processors (4 ops each)
$N_{op}$ = 16
$f_{clock}$ = 50 MHz => 800 MOPS
Sixteen operations each clock cycle, so $A_{op} = A_{chip}/16$ = 5.3 mm²
Power = 110 mW
Slide 57

Dedicated Design: MOPS/mW = 200
Complex mult/add (8 ops)
Fully parallel mapping of adaptive correlator algorithm. No time multiplexing.
$N_{op}$ = 96
$f_{clock}$ = 25 MHz => 2400 MOPS
$A_{op}$ = 5.4 mm² / 96 = 0.5 mm² [area figures as extracted; digits may be missing]
Power = 12 mW
Slide 58
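
The three headline efficiencies can be reproduced from MOPS = f_clk · N_op divided by chip power. The sketch below assumes the per-chip values N_op = 2, 16, 96, clock rates of 450, 50, 25 MHz, and powers of 7 W, 110 mW, 12 mW; the power figures for the DSP and the dedicated design are reconstructed from a damaged transcript and should be treated as assumptions. The function name and dictionary layout are illustrative.

```python
def mops_per_mw(n_op, f_clk_mhz, power_mw):
    """Energy efficiency: (f_clk [MHz] * N_op) gives MOPS; divide by power in mW."""
    mops = n_op * f_clk_mhz
    return mops / power_mw


# (N_op, f_clk in MHz, power in mW); power values are assumptions, see lead-in.
chips = {
    "microprocessor": (2, 450, 7000),
    "general-purpose DSP": (16, 50, 110),
    "dedicated design": (96, 25, 12),
}

efficiency = {name: mops_per_mw(*params) for name, params in chips.items()}
for name, value in efficiency.items():
    print(f"{name}: {value:.2f} MOPS/mW")
```

The results land at roughly 0.13, 7, and 200 MOPS/mW, a spread of about three orders of magnitude between the microprocessor and the dedicated design.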

The Basic Problem is Time Multiplexing
Processor architectures obtain performance by increasing the clock rate, because the parallelism is low
This results in ever-increasing memory on the chip, high control overhead, and fast, area-consuming logic
But doesn't time multiplexing give better area efficiency?
Slide 59

Area Efficiency
SoC-based devices are often very cost sensitive
So we need a $ cost metric; for SoCs that is equivalent to the efficiency of area utilization
Area efficiency metric: computation per unit area = MOPS/mm²
How much of a $ cost (area) penalty will we have if we put down many parallel hardware units and have limited time multiplexing?
Slide 60
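
The area efficiency metric can be written in terms of the quantities defined earlier, which makes the role of time multiplexing visible. This short derivation assumes only MOPS = f_clk · N_op and A_op = A_chip / N_op from the definitions slide.

```latex
\begin{align*}
\frac{\text{MOPS}}{\text{mm}^2}
  = \frac{f_{clk} \cdot N_{op}}{A_{chip}}
  = \frac{f_{clk}}{A_{op}}
\end{align*}
```

A time-multiplexed design raises f_clk, but if the overhead it needs inflates A_op by a larger factor, its area efficiency is still lower than that of a slower, more parallel design.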

Surprisingly, the Area Efficiency Roughly Tracks the Energy Efficiency
[Figure: MOPS/mm² versus chip number for microprocessors, general-purpose DSPs, and dedicated designs; ~2 orders of magnitude]
The overhead of flexibility in processor architectures is so high that there is even an area penalty
Slide 61

Hardware / Software
Conclusion: there is no software/hardware trade-off.
The difference between hardware and software in performance, power and area is so large that there is no trade-off.
It is reasons other than power, energy, performance or cost that drive a software solution (e.g. business, legacy, ...).
The cost of flexibility is extremely high, so the other reasons better be good!
Slide 62


More information

ECE 747 Digital Signal Processing Architecture. ESL Design Methodologies

ECE 747 Digital Signal Processing Architecture. ESL Design Methodologies ECE 747 Digital Signal Processing Architecture ESL Design Methodologies Spring 2006 W. Rhett Davis NC State University W. Rhett Davis NC State University ECE 747 Spring 2006 Slide 1 What is ESL Design?

More information

Overview. Design flow. Principles of logic synthesis. Logic Synthesis with the common tools. Conclusions

Overview. Design flow. Principles of logic synthesis. Logic Synthesis with the common tools. Conclusions Logic Synthesis Overview Design flow Principles of logic synthesis Logic Synthesis with the common tools Conclusions 2 System Design Flow Electronic System Level (ESL) flow System C TLM, Verification,

More information

Introduction to CMOS VLSI Design Lecture 13: SRAM

Introduction to CMOS VLSI Design Lecture 13: SRAM Introduction to CMOS VLSI Design Lecture 13: SRAM David Harris Harvey Mudd College Spring 2004 1 Outline Memory Arrays SRAM Architecture SRAM Cell Decoders Column Circuitry Multiple Ports Serial Access

More information

Unleashing the Power of Embedded DRAM

Unleashing the Power of Embedded DRAM Copyright 2005 Design And Reuse S.A. All rights reserved. Unleashing the Power of Embedded DRAM by Peter Gillingham, MOSAID Technologies Incorporated Ottawa, Canada Abstract Embedded DRAM technology offers

More information

Embedded Systems. 7. System Components

Embedded Systems. 7. System Components Embedded Systems 7. System Components Lothar Thiele 7-1 Contents of Course 1. Embedded Systems Introduction 2. Software Introduction 7. System Components 10. Models 3. Real-Time Models 4. Periodic/Aperiodic

More information

Lecture 13: SRAM. Slides courtesy of Deming Chen. Slides based on the initial set from David Harris. 4th Ed.

Lecture 13: SRAM. Slides courtesy of Deming Chen. Slides based on the initial set from David Harris. 4th Ed. Lecture 13: SRAM Slides courtesy of Deming Chen Slides based on the initial set from David Harris CMOS VLSI Design Outline Memory Arrays SRAM Architecture SRAM Cell Decoders Column Circuitry Multiple Ports

More information

ECE331: Hardware Organization and Design

ECE331: Hardware Organization and Design ECE331: Hardware Organization and Design Lecture 19: Verilog and Processor Performance Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Verilog Basics Hardware description language

More information

URL: Offered by: Should already know how to design with logic. Will learn...

URL:  Offered by: Should already know how to design with logic. Will learn... 00 1 EE 3755 Computer Organization 00 1 URL: http://www.ece.lsu.edu/ee3755 Offered by: David M. Koppelman Room 3191 P. Taylor Hall 578-5482, koppel@ece.lsu.edu, http://www.ece.lsu.edu/koppel Tentative

More information

Design Methodologies. Kai Huang

Design Methodologies. Kai Huang Design Methodologies Kai Huang News Is that real? In such a thermally constrained environment, going quad-core only makes sense if you can properly power gate/turbo up when some cores are idle. I have

More information

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture Computer and Information Sciences College / Computer Science Department CS 207 D Computer Architecture The Computer Revolution Progress in computer technology Underpinned by Moore s Law Makes novel applications

More information

FABRICATION TECHNOLOGIES

FABRICATION TECHNOLOGIES FABRICATION TECHNOLOGIES DSP Processor Design Approaches Full custom Standard cell** higher performance lower energy (power) lower per-part cost Gate array* FPGA* Programmable DSP Programmable general

More information

A 167-processor 65 nm Computational Platform with Per-Processor Dynamic Supply Voltage and Dynamic Clock Frequency Scaling

A 167-processor 65 nm Computational Platform with Per-Processor Dynamic Supply Voltage and Dynamic Clock Frequency Scaling A 167-processor 65 nm Computational Platform with Per-Processor Dynamic Supply Voltage and Dynamic Clock Frequency Scaling Dean Truong, Wayne Cheng, Tinoosh Mohsenin, Zhiyi Yu, Toney Jacobson, Gouri Landge,

More information

Memory. Outline. ECEN454 Digital Integrated Circuit Design. Memory Arrays. SRAM Architecture DRAM. Serial Access Memories ROM

Memory. Outline. ECEN454 Digital Integrated Circuit Design. Memory Arrays. SRAM Architecture DRAM. Serial Access Memories ROM ECEN454 Digital Integrated Circuit Design Memory ECEN 454 Memory Arrays SRAM Architecture SRAM Cell Decoders Column Circuitry Multiple Ports DRAM Outline Serial Access Memories ROM ECEN 454 12.2 1 Memory

More information

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14 MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK

More information

Adaptive Computing Systems (ACS) Domain for Implementing DSP Algorithms in Reconfigurable Hardware. Objective/Approach/Process

Adaptive Computing Systems (ACS) Domain for Implementing DSP Algorithms in Reconfigurable Hardware. Objective/Approach/Process Adaptive Computing Systems (ACS) Domain for Implementing DSP Algorithms in Reconfigurable Hardware John Zaino, Eric Pauer, Ken Smith, Paul Fiore, Jairam Ramanathan, Cory Myers {john.c.aino, ken.smith,

More information

The Use of the No-Instruction-Set Computer (NISC) for Acceleration in WISHBONE-Based Systems

The Use of the No-Instruction-Set Computer (NISC) for Acceleration in WISHBONE-Based Systems The Use o the No-Instruction-Set Computer () or Acceleration in WISHBONE-Based Systems Roko Grubišić, Vlado Sruk Technical Report 11-14-2008 Department o Electronics, Microelectronics, Computer and Intelligent

More information

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle? CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:

More information

Emerging DRAM Technologies

Emerging DRAM Technologies 1 Emerging DRAM Technologies Michael Thiems amt051@email.mot.com DigitalDNA Systems Architecture Laboratory Motorola Labs 2 Motivation DRAM and the memory subsystem significantly impacts the performance

More information

310/ ICTP-INFN Advanced Tranining Course on FPGA and VHDL for Hardware Simulation and Synthesis 27 November - 22 December 2006

310/ ICTP-INFN Advanced Tranining Course on FPGA and VHDL for Hardware Simulation and Synthesis 27 November - 22 December 2006 310/1780-18 ICTP-INFN Advanced Tranining Course on FPGA and VHDL for Hardware Simulation and Synthesis 27 November - 22 December 2006 Design Methodology Tools Jorgen CHRISTIANSEN PH-ED CERN CH-1221 Geneva

More information

Verilog for High Performance

Verilog for High Performance Verilog for High Performance Course Description This course provides all necessary theoretical and practical know-how to write synthesizable HDL code through Verilog standard language. The course goes

More information

RTL Power Estimation and Optimization

RTL Power Estimation and Optimization Power Modeling Issues RTL Power Estimation and Optimization Model granularity Model parameters Model semantics Model storage Model construction Politecnico di Torino Dip. di Automatica e Informatica RTL

More information

Digital Integrated Circuits

Digital Integrated Circuits Digital Integrated Circuits EE141 Fall 2005 Tu & Th 11-12:30 203 McLaughlin What is This Class About? Introduction to Digital Integrated Circuits Introduction: Issues in digital design CMOS devices and

More information

13. Power Management in Stratix IV Devices

13. Power Management in Stratix IV Devices February 2011 SIV51013-3.2 13. Power Management in Stratix IV Devices SIV51013-3.2 This chapter describes power management in Stratix IV devices. Stratix IV devices oer programmable power technology options

More information

Advanced Design System 1.5. DSP Synthesis

Advanced Design System 1.5. DSP Synthesis Advanced Design System 1.5 DSP Synthesis December 2000 Notice The information contained in this document is subject to change without notice. Agilent Technologies makes no warranty of any kind with regard

More information

More Course Information

More Course Information More Course Information Labs and lectures are both important Labs: cover more on hands-on design/tool/flow issues Lectures: important in terms of basic concepts and fundamentals Do well in labs Do well

More information

Hardware/Software Partitioning for SoCs. EECE Advanced Topics in VLSI Design Spring 2009 Brad Quinton

Hardware/Software Partitioning for SoCs. EECE Advanced Topics in VLSI Design Spring 2009 Brad Quinton Hardware/Software Partitioning for SoCs EECE 579 - Advanced Topics in VLSI Design Spring 2009 Brad Quinton Goals of this Lecture Automatic hardware/software partitioning is big topic... In this lecture,

More information

Memory Arrays. Array Architecture. Chapter 16 Memory Circuits and Chapter 12 Array Subsystems from CMOS VLSI Design by Weste and Harris, 4 th Edition

Memory Arrays. Array Architecture. Chapter 16 Memory Circuits and Chapter 12 Array Subsystems from CMOS VLSI Design by Weste and Harris, 4 th Edition Chapter 6 Memory Circuits and Chapter rray Subsystems from CMOS VLSI Design by Weste and Harris, th Edition E E 80 Introduction to nalog and Digital VLSI Paul M. Furth New Mexico State University Static

More information

Computer Systems Architecture Spring 2016

Computer Systems Architecture Spring 2016 Computer Systems Architecture Spring 2016 Lecture 01: Introduction Shuai Wang Department of Computer Science and Technology Nanjing University [Adapted from Computer Architecture: A Quantitative Approach,

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

L2: Design Representations

L2: Design Representations CS250 VLSI Systems Design L2: Design Representations John Wawrzynek, Krste Asanovic, with John Lazzaro and Yunsup Lee (TA) Engineering Challenge Application Gap usually too large to bridge in one step,

More information

Power-Optimal Pipelining in Deep Submicron Technology Λ

Power-Optimal Pipelining in Deep Submicron Technology Λ 8. Power-Optimal Pipelining in Deep Submicron Technology Λ Seongmoo Heo and Krste Asanović MIT Computer Science and Artificial Intelligence Laboratory Vassar Street, Cambridge, MA 9 fheomoo,krsteg@csail.mit.edu

More information

What is this class all about?

What is this class all about? EE141-Fall 2007 Digital Integrated Circuits Instructor: Elad Alon TuTh 3:30-5pm 155 Donner 1 1 What is this class all about? Introduction to digital integrated circuit design engineering Will describe

More information

An Analytic Model for Embedded Machine Vision: Architecture and Performance Exploration

An Analytic Model for Embedded Machine Vision: Architecture and Performance Exploration 419 An Analytic Model or Embedded Machine Vision: Architecture and Perormance Exploration Chan Kit Wai, Prahlad Vadakkepat, Tan Kok Kiong Department o Electrical and Computer Engineering, 4 Engineering

More information

Reduce Your System Power Consumption with Altera FPGAs Altera Corporation Public

Reduce Your System Power Consumption with Altera FPGAs Altera Corporation Public Reduce Your System Power Consumption with Altera FPGAs Agenda Benefits of lower power in systems Stratix III power technology Cyclone III power Quartus II power optimization and estimation tools Summary

More information

Linking Layout to Logic Synthesis: A Unification-Based Approach

Linking Layout to Logic Synthesis: A Unification-Based Approach Linking Layout to Logic Synthesis: A Unification-Based Approach Massoud Pedram Department of EE-Systems University of Southern California Los Angeles, CA February 1998 Outline Introduction Technology and

More information

Understanding Signal to Noise Ratio and Noise Spectral Density in high speed data converters

Understanding Signal to Noise Ratio and Noise Spectral Density in high speed data converters Understanding Signal to Noise Ratio and Noise Spectral Density in high speed data converters TIPL 4703 Presented by Ken Chan Prepared by Ken Chan 1 Table o Contents What is SNR Deinition o SNR Components

More information

Head, Dept of Electronics & Communication National Institute of Technology Karnataka, Surathkal, India

Head, Dept of Electronics & Communication National Institute of Technology Karnataka, Surathkal, India Mapping Signal Processing Algorithms to Architecture Sumam David S Head, Dept of Electronics & Communication National Institute of Technology Karnataka, Surathkal, India sumam@ieee.org Objectives At the

More information

EECS 244 Computer-Aided Design of Integrated Circuits and Systems

EECS 244 Computer-Aided Design of Integrated Circuits and Systems EECS 244 Computer-Aided Design of Integrated Circuits and Systems Professor A. Richard Newton Room 566 Cory Hall 642-2967, rnewton@ic.eecs Office Hours: Tu. Th. 3:30-4:30pm Fall 1997 Administrative Details

More information

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 9 /Issue 3 / OCT 2017

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 9 /Issue 3 / OCT 2017 Design of Low Power Adder in ALU Using Flexible Charge Recycling Dynamic Circuit Pallavi Mamidala 1 K. Anil kumar 2 mamidalapallavi@gmail.com 1 anilkumar10436@gmail.com 2 1 Assistant Professor, Dept of

More information

CS 250 VLSI Design Lecture 11 Design Verification

CS 250 VLSI Design Lecture 11 Design Verification CS 250 VLSI Design Lecture 11 Design Verification 2012-9-27 John Wawrzynek Jonathan Bachrach Krste Asanović John Lazzaro TA: Rimas Avizienis www-inst.eecs.berkeley.edu/~cs250/ IBM Power 4 174 Million Transistors

More information

Towards Optimal Custom Instruction Processors

Towards Optimal Custom Instruction Processors Towards Optimal Custom Instruction Processors Wayne Luk Kubilay Atasu, Rob Dimond and Oskar Mencer Department of Computing Imperial College London HOT CHIPS 18 Overview 1. background: extensible processors

More information

2. Design Planning with the Quartus II Software

2. Design Planning with the Quartus II Software November 2013 QII51016-13.1.0 2. Design Planning with the Quartus II Sotware QII51016-13.1.0 This chapter discusses key FPGA design planning considerations, provides recommendations, and describes various

More information

VLSI Signal Processing

VLSI Signal Processing VLSI Signal Processing Programmable DSP Architectures Chih-Wei Liu VLSI Signal Processing Lab Department of Electronics Engineering National Chiao Tung University Outline DSP Arithmetic Stream Interface

More information

Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on. on-chip Architecture

Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on. on-chip Architecture Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on on-chip Architecture Avinash Kodi, Ashwini Sarathy * and Ahmed Louri * Department of Electrical Engineering and

More information

Synthesizable FPGA Fabrics Targetable by the VTR CAD Tool

Synthesizable FPGA Fabrics Targetable by the VTR CAD Tool Synthesizable FPGA Fabrics Targetable by the VTR CAD Tool Jin Hee Kim and Jason Anderson FPL 2015 London, UK September 3, 2015 2 Motivation for Synthesizable FPGA Trend towards ASIC design flow Design

More information

A 50% Lower Power ARM Cortex CPU using DDC Technology with Body Bias. David Kidd August 26, 2013

A 50% Lower Power ARM Cortex CPU using DDC Technology with Body Bias. David Kidd August 26, 2013 A 50% Lower Power ARM Cortex CPU using DDC Technology with Body Bias David Kidd August 26, 2013 1 HOTCHIPS 2013 Copyright 2013 SuVolta, Inc. All rights reserved. Agenda DDC transistor and PowerShrink platform

More information

Introduction to SRAM. Jasur Hanbaba

Introduction to SRAM. Jasur Hanbaba Introduction to SRAM Jasur Hanbaba Outline Memory Arrays SRAM Architecture SRAM Cell Decoders Column Circuitry Non-volatile Memory Manufacturing Flow Memory Arrays Memory Arrays Random Access Memory Serial

More information

Lecture 1: Introduction

Lecture 1: Introduction Contemporary Computer Architecture Instruction set architecture Lecture 1: Introduction CprE 581 Computer Systems Architecture, Fall 2016 Reading: Textbook, Ch. 1.1-1.7 Microarchitecture; examples: Pipeline

More information