Adaptive Robustness Tuning for High Performance Domino Logic
Bharan Giridhar (1), David Fick (1), Matthew Fojtik (1), Sudhir Satpathy (1), David Bull (2), Dennis Sylvester (1), and David Blaauw (1)
(1) University of Michigan, USA; (2) ARM, United Kingdom
Symposia on VLSI Technology and Circuits
Slide 1: Motivation
Targeted use of domino logic remains the designer's choice for implementing speed-critical paths in power-constrained designs (Samsung Hummingbird, AMD Bulldozer; Golden et al., ISSCC 2011).
With increasing process variation, domino logic is becoming less beneficial because of the increasing margins required for leakage, charge sharing, and noise.
Slide 2: Motivation
Margining the keeper for robustness under worst-case PVT can degrade performance by ~32%.
[Figure: keeper margining. Normalized delay versus normalized keeper size for four cases (nominal; process; process and temperature; process, temperature, and noise), showing ~32% performance degradation.]
ART Domino tracks PVT and trades extra robustness and timing margins for performance.
Slide 3: ART Domino Gate
Fast, speculative evaluation is followed by a fully margined (safe) evaluation to detect errors.
[Figure: domino gate with inputs A and B and clock CLK.]
Slide 5: ART Domino Gate
Fast, speculative evaluation is followed by a fully margined (safe) evaluation to detect errors.
Headers and footers are shared across gates to minimize overhead.
[Figure: gate schematic and timing diagram. Signals: CLK, EC, ECB (CHECK), A, B, V X, V Y, V DYN, Y. Phases: speculative precharge, speculative evaluation, checker precharge, checker evaluation.]
Slide 6: ART Domino Gate
V X < and V Y > speed critical transitions at both nodes by reducing voltage swings.
V Y > speeds the following gate by trading its noise margin for speed.
[Figure: gate schematic and timing diagram, "remove margins" step, with V X (<), V Y (>), and a leakier stack. Signals: CLK, EC, ECB (CHECK), A, B, V X, V Y, V DYN, Y. Phases: speculative precharge, speculative evaluation, checker precharge, checker evaluation.]
Slide 7: ART Domino Gate
V X < and V Y > speed critical transitions at both nodes by reducing voltage swings.
V Y > speeds the following gate by trading its noise margin for speed.
[Figure: gate schematic and timing diagram, "fast, speculative evaluation" step producing Y (Speculate), with V X (<), V Y (>), and a leakier stack.]
Slide 8: ART Domino Gate
The slower, safe evaluation is performed in the background, with no impact on computation latency.
[Figure: gate schematic and timing diagram, "restore margins" step, with V X (=) and V Y (=); Y (Speculate) has already been produced.]
Slide 9: ART Domino Gate
The slower, safe evaluation is performed in the background, with no impact on computation latency.
[Figure: gate schematic and timing diagram, "slower, safe evaluation" step producing Y (Safe) alongside Y (Speculate), with V X (=) and V Y (=).]
Slide 10: ART Domino Gate
In case of errors, the errant computation is flushed from the pipeline.
The result of the safe evaluation is propagated, guaranteeing forward progress.
[Figure: error detection. Y (Speculate) and Y (Safe) are compared to detect errors.]
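The speculate-then-check behaviour described on Slides 3 to 10 can be sketched as a small behavioural model. This is an illustration only, not the authors' circuit: the names `GateResult` and `art_gate` are hypothetical, the gate is assumed to compute a 2-input AND, and a robustness failure is assumed to appear as a false discharge that forces the speculative output to 1.

```python
from typing import NamedTuple


class GateResult(NamedTuple):
    speculative: int  # value forwarded early, with margins removed
    safe: int         # value from the fully margined re-evaluation
    error: bool       # True when the two evaluations disagree
    committed: int    # value the pipeline finally keeps


def art_gate(a: int, b: int, false_discharge: bool = False) -> GateResult:
    """Behavioural model of one speculate-then-check evaluation.

    Assumption: the gate computes a AND b, and a robustness failure can
    only turn a correct 0 into an incorrect 1 (a false discharge).
    """
    safe = a & b                                   # margined, assumed correct
    speculative = 1 if false_discharge else safe   # fast but possibly wrong
    error = speculative != safe                    # compare and detect
    committed = safe                               # safe value guarantees progress
    return GateResult(speculative, safe, error, committed)
```

In this model a false discharge that coincides with a legitimate 1 output is masked and raises no error, which matches the compare-and-detect scheme: only a disagreement between the two evaluations triggers a flush.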
Slide 11: ART Pipeline
[Figure: conventional domino pipeline, pipe stages N and N+1, with overlapping clocks and latchless hand-off.]
Slide 12: ART Pipeline
[Figure: pipe stages N and N+1.]
Slide 13: ART Pipeline
[Figure: pipe stages N and N+1, each with a DOMBUF, a MUX, and BUF1.]
Slide 14: ART Pipeline, Speculative Evaluation
DOMBUF stores the speculative result of the half-stage.
During this phase, DOMBUF snoops on the value propagated forward through the mux.
Overlapping clocks eliminate latches between pipe stages and provide skew tolerance.
[Figure: pipe stages N and N+1, each with a DOMBUF, a MUX, and BUF1, connected by a latchless hand-off.]
Slide 16: ART Pipeline, Checking Evaluation
To allow the slower safe evaluation to complete, each pipe stage is split in half during this phase.
The value stored on DOMBUF is propagated forward, cutting the stage depth in half.
[Figure: pipe-stage split with parallel evaluations in pipe stages N and N+1; latchless hand-off.]
Slide 17: ART Pipeline, Checking Evaluation
Both halves of each pipe stage perform safe evaluations simultaneously.
[Figure: pipe stages N and N+1, each with a DOMBUF, a MUX, and BUF1; latchless hand-off.]
Slide 18: ART Pipeline, Checking Evaluation
Both halves of each pipe stage perform safe evaluations simultaneously.
The error detector at each DOMBUF checks the segment back to the preceding DOMBUF for errors.
[Figure: pipe stages N and N+1; the results of the two evaluations are compared to check for errors in the segment.]
Slide 19: ART Pipeline, Error Detection
Error detection takes place in the next clock cycle, in parallel with the next set of gate evaluations.
Each bit (Bit0 to BitN) is checked by a fully margined domino XOR of BUF1 and BUF2 (identical to BUF1); a fully margined domino OR tree combines the per-bit results into the stage error signal (Stage_1 error to Stage_N error).
Sequence for pipe stage N (GATE1 to GATE8):
1. Precharge the error logic (shown in red in the figure).
2. Copy Gate4 to BUF2 and DOMBUF to BUF1.
3. Evaluate the domino XOR and the domino OR tree.
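The XOR-then-OR-tree reduction on Slide 19 can be sketched in software as follows. This is a minimal logical model under stated assumptions: `stage_error` is a hypothetical name, the two arguments stand for the bit vectors held in BUF1 and BUF2, and the pairwise reduction only mirrors the shape of an OR tree, not its timing or circuit style.

```python
from typing import List, Sequence


def stage_error(buf1_bits: Sequence[int], buf2_bits: Sequence[int]) -> int:
    """Return 1 if any bit of the two buffered results differs, else 0."""
    if len(buf1_bits) != len(buf2_bits):
        raise ValueError("both buffers must hold the same number of bits")

    # Per-bit XOR: 1 marks a mismatch between the two stored values.
    level: List[int] = [x ^ y for x, y in zip(buf1_bits, buf2_bits)]

    # OR tree: combine neighbouring pairs until a single bit remains.
    while len(level) > 1:
        nxt = [level[i] | level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # odd element passes through to the next level
            nxt.append(level[-1])
        level = nxt

    return level[0] if level else 0
```

A tree is used rather than a flat OR because its depth grows with the logarithm of the bit count, which is the usual reason for choosing an OR tree when many per-bit flags must be merged quickly.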
Slide 20: ART Pipeline
DOMBUF and BUF1(2) are fully margined.
[Figure: pipe stages N and N+1, each with a DOMBUF, a MUX, and BUF1. Labeled signals: SIG N, SIG_B_N, A0, A1, V X, V Y, EC_N, ECB_N, EC_N+1, ECB_N+1, STG_N, STG_N+1, BUFIN, BUFOUT.]
Slide 21: ART Clock Generation
Domino gates are clocked by the global clock generator.
Power gates and other locally derived signals are clocked using locally generated clocks.
[Figure: GCLK (from clock ring) with Ф1 (= STG_N), Ф2 (= STG_N+1), Ф3 (= EC_N), and Ф4 (= EC_N+1); global clocks STG_1 and STG_2; local generation of EC_N and ECB_N from STG_N and SIG N using D flip-flops with set and reset (pins SN, D, Q, CK, QN, RN).]
Slide 22: System Implementation
32 × 32b multiplier in 65nm CMOS.
4 T/T voltage domains, 2 pipeline stages.
[Figure: block diagram. Blocks: BOOTH ENCODER, BOOTH DECODER, MULTIPLIER ARRAY, RADIX-2 KOGGE STONE, TEST HARNESS. Signals: X[15:0], Y[15:0], NEG[15:0], ZERO[15:0], SHIFT[15:0], SIGN[15:0], PARTIAL PRODUCTS[15:0][32:0], A[63:0], PRODUCT[63:0]. Voltage pairs (T 1,T 1) to (T 4,T 4) span pipe stages 1 and 2.]
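The block diagram names a radix-2 Kogge-Stone adder as the final stage producing PRODUCT[63:0]. As a reminder of what that structure computes, here is a bit-parallel software sketch of a radix-2 Kogge-Stone prefix addition. It is a generic textbook model, not the chip's domino implementation; the function name and the default 64-bit width are assumptions made for illustration.

```python
def kogge_stone_add(a: int, b: int, width: int = 64) -> int:
    """Add two unsigned integers modulo 2**width with a radix-2 prefix network."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask

    generate = a & b        # bit i creates a carry on its own
    propagate = a ^ b       # bit i passes an incoming carry along
    half_sum = propagate    # kept for the final sum

    # Each level doubles the span over which carries are resolved,
    # so the loop runs log2(width) times.
    distance = 1
    while distance < width:
        generate = (generate | (propagate & (generate << distance))) & mask
        propagate = (propagate & (propagate << distance)) & mask
        distance <<= 1

    carries = (generate << 1) & mask   # carry into bit i is carry out of bit i-1
    return half_sum ^ carries
```

The result wraps at `width` bits, so adding 1 to the all-ones value returns 0, as it would in a fixed-width hardware adder without a carry-out.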
Slide 23: Die Micrograph
65nm CMOS process.
Die size: 1496 μm × 1496 μm.
[Figure: die micrograph, 1.5 mm × 1.5 mm, labeling the multiplier array, Kogge-Stone adder, Booth encoder, Booth decoders, test harness, CLKGEN and CLK mesh buffers, and scan.]
Slide 24: Measured Results
Measured maximum performance: 1192 MHz.
Performance with ART is up by 34% by eliminating robustness margins at nominal PVT.
[Figure: measured frequency over the two tuning voltages, with axes 1.2-T (V) and T (V). Labeled frequencies: 890, 944, 976, 1008, 1040, 1072, 1104, and 1136 MHz. A "No Adapting" point and an error region (robustness failure) are marked.]
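The 34% figure can be cross-checked from the frequencies on this slide. The sketch below assumes that 890 MHz is the "No Adapting" baseline, which the flattened slide text suggests but does not state outright; `performance_gain` is a hypothetical helper name.

```python
def performance_gain(baseline_mhz: float, tuned_mhz: float) -> float:
    """Fractional speed-up of the tuned frequency over the baseline."""
    if baseline_mhz <= 0:
        raise ValueError("baseline frequency must be positive")
    return tuned_mhz / baseline_mhz - 1.0


# Assumption: 890 MHz is the untuned ("No Adapting") operating point.
NO_ADAPTING_MHZ = 890.0
ART_MAX_MHZ = 1192.0          # measured maximum reported on the slide

gain = performance_gain(NO_ADAPTING_MHZ, ART_MAX_MHZ)
```

Under that assumption the ratio 1192 / 890 is about 1.339, which is consistent with the reported gain of roughly 34%.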
Slide 25: Measured Results
~34% performance gain with a 47% power increase.
~17% performance gain with a 12% power reduction.
The method is applicable to performance-constrained logic where the performance cannot be obtained by any other means.
Power increases sharply at higher frequencies due to increased short-circuit current on the output inverter.
[Figure: power (mW) versus performance (MHz), showing minimum power at each frequency, the "No Adapting" point, and simulated power for standard domino; required T (V) versus performance (MHz) for T 1, T 2, T 3, T 4 and for T 1, T 2, T 3; ΔT/ΔT at each measured power-frequency point.]
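The two operating points on this slide can be compared in terms of energy per operation, since energy per operation scales as power divided by frequency. The sketch below is a derived back-of-the-envelope estimate, not a figure from the slides; it assumes each pair of percentages is quoted against the same baseline, and `relative_energy_per_op` is a hypothetical helper name.

```python
def relative_energy_per_op(perf_gain: float, power_change: float) -> float:
    """Energy per operation relative to the baseline (1.0 means unchanged).

    perf_gain and power_change are fractional changes, for example
    0.34 for +34% performance and -0.12 for a 12% power reduction.
    """
    return (1.0 + power_change) / (1.0 + perf_gain)


# Maximum-performance point: +34% performance, +47% power.
max_perf_point = relative_energy_per_op(0.34, 0.47)

# Power-saving point: +17% performance, -12% power.
power_saving_point = relative_energy_per_op(0.17, -0.12)
```

With these inputs the maximum-performance point costs about 1.10 times the baseline energy per operation, while the power-saving point needs about 0.75 times, which illustrates why the slide frames the top-speed setting as suited to logic that cannot reach its performance target any other way.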
Slide 26: ART Performance Gain
The measured average gain due to robustness tuning across 20 dies is ~28%.
The gain decreases at higher temperatures as gates become less robust.
[Figure: histogram of number of chips versus % performance gain (robustness speculation) for 20 dies at 27C, 1.2V; % performance gain (robustness speculation) versus temperature (C) for a slow die, a typical die, and a fast die.]
Slide 27: Overall Performance Gain
Total gains of 49% to 71% over conventionally margined designs.
[Figure: performance (MHz) for slow, nominal, and fast dies, broken down into worst process, voltage, temp*, process gain, temp gain, voltage gain, and ART gain (robustness tuning and timing speculation). Labeled factors include 1.04X, 1X, 1.12X, 1.07X, 1.17X, and 1.12X.]
* Worst process was set by the slowest die. Temperature was set to 85C and supply was degraded by 10% to 1.08V.
Slide 28: Error Rate
Error rate is more sensitive to T than T.
T tuning also affects the robustness of the following gate.
[Figure: % error (robustness failures) and % error (timing failures) versus operating frequency (MHz) and versus T (V), T (V)*, under T tuning and T tuning. Annotated settings: T[1-4] = 1.2V, T[1-4] = 0V; T 1-4 = 0.9V; T 1-3 = 0.15V, T 4 = 0.17V.]
* Δ T = 0.9V-T, Δ T = T-0.17V
Slide 29: Conclusion
A new high-speed design style called ART Domino was presented.
ART Domino provides an overall gain of up to 1.71x over conventional domino.
It is 3.2x faster than static CMOS.
Design overhead is amortized over the pipeline stage depth.
Slide 30: Thank You
Questions?
Slide 31: Backup Slides
Slide 32: ART Clock Generation
Example 4-stage pipeline operation phases.
Globally generated clocks with relaxed skew constraints: GCLK (from clock ring), SYSCLK (GCLK 2), STG_1, STG_2, STG_3 (=STG_1 CLK), STG_4 (=STG_2 CLK).
Locally generated clocking signals with stricter skew constraints: EC_N and ECB_N, derived from STG_N and SIG N.
[Figure: Ф1 (= STG_N), Ф2 (= STG_N+1), Ф3 (= EC_N), and Ф4 (= EC_N+1), generated from GCLK using D flip-flops with set and reset (pins SN, D, Q, CK, QN, RN).]
Slide 33: System Implementation, Conventional Domino
[Figure: conventional domino logic placed between flip-flops (FF), with other logic on either side.]
Slide 34: System Implementation, Conventional Domino
[Figure: STG1, evaluate and precharge.]
Slide 35: System Implementation, Conventional Domino
[Figure: STG2, precharge and evaluate.]
Slide 36: System Implementation, ART Domino
[Figure: ART domino logic placed between flip-flops (FF), with other logic on either side.]
Slides 37 to 40: System Implementation, ART Domino
[Figures: STG1, STG2, STG3, and STG4, one per slide.]
Slide 41: System Implementation, ART Domino
[Figure: timing of SYSCLK (GCLK 2), STG_1, STG_2, STG_3 (=STG_1 CLK), and STG_4 (=STG_2 CLK), with four "Error Detect STG1" windows marked.]
Slide 42: Architectural Replay
[Figure: timing of SYSCLK (GCLK 2), STG_1, STG_2, STG_3, and STG_4 over Cycle0, Cycle1, and Cycle2, with an error detected in STG2.]
Slide 43: Architectural Replay
[Figure: the same timing diagram, with the "copy safe value" step marked after the error is detected in STG2.]
Slide 45: Error Detection Scenarios
More informationTHE synchronous DRAM (SDRAM) has been widely
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 10, OCTOBER 1997 1597 A Study of Pipeline Architectures for High-Speed Synchronous DRAM s Hoi-Jun Yoo Abstract The performances of SDRAM s with different
More informationA 19.4 nj/decision 364K Decisions/s In-Memory Random Forest Classifier in 6T SRAM Array. Mingu Kang, Sujan Gonugondla, Naresh Shanbhag
A 19.4 nj/decision 364K Decisions/s In-Memory Random Forest Classifier in 6T SRAM Array Mingu Kang, Sujan Gonugondla, Naresh Shanbhag University of Illinois at Urbana Champaign Machine Learning under Resource
More information10/5/2016. Review of General Bit-Slice Model. ECE 120: Introduction to Computing. Initialization of a Serial Comparator
University of Illinois at Urbana-Champaign Dept. of Electrical and Computer Engineering ECE 120: Introduction to Computing Example of Serialization Review of General Bit-Slice Model General model parameters
More informationJan Rabaey Homework # 7 Solutions EECS141
UNIVERSITY OF CALIFORNIA College of Engineering Department of Electrical Engineering and Computer Sciences Last modified on March 30, 2004 by Gang Zhou (zgang@eecs.berkeley.edu) Jan Rabaey Homework # 7
More informationIntroduction to CMOS VLSI Design Lecture 13: SRAM
Introduction to CMOS VLSI Design Lecture 13: SRAM David Harris Harvey Mudd College Spring 2004 1 Outline Memory Arrays SRAM Architecture SRAM Cell Decoders Column Circuitry Multiple Ports Serial Access
More information16x16 Multiplier Design Using Asynchronous Pipeline Based On Constructed Critical Data Path
Volume 4 Issue 01 Pages-4786-4792 January-2016 ISSN (e): 2321-7545 Website: http://ijsae.in 16x16 Multiplier Design Using Asynchronous Pipeline Based On Constructed Critical Data Path Authors Channa.sravya
More informationEmerging DRAM Technologies
1 Emerging DRAM Technologies Michael Thiems amt051@email.mot.com DigitalDNA Systems Architecture Laboratory Motorola Labs 2 Motivation DRAM and the memory subsystem significantly impacts the performance
More informationGigascale Integration Design Challenges & Opportunities. Shekhar Borkar Circuit Research, Intel Labs October 24, 2004
Gigascale Integration Design Challenges & Opportunities Shekhar Borkar Circuit Research, Intel Labs October 24, 2004 Outline CMOS technology challenges Technology, circuit and μarchitecture solutions Integration
More informationAnalysis of Different Multiplication Algorithms & FPGA Implementation
IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 4, Issue 2, Ver. I (Mar-Apr. 2014), PP 29-35 e-issn: 2319 4200, p-issn No. : 2319 4197 Analysis of Different Multiplication Algorithms & FPGA
More informationReducing DRAM Latency at Low Cost by Exploiting Heterogeneity. Donghyuk Lee Carnegie Mellon University
Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity Donghyuk Lee Carnegie Mellon University Problem: High DRAM Latency processor stalls: waiting for data main memory high latency Major bottleneck
More informationVLSI Test Technology and Reliability (ET4076)
VLSI Test Technology and Reliability (ET4076) Lecture 8(2) I DDQ Current Testing (Chapter 13) Said Hamdioui Computer Engineering Lab Delft University of Technology 2009-2010 1 Learning aims Describe the
More informationSelecting PLLs for ASIC Applications Requires Tradeoffs
Selecting PLLs for ASIC Applications Requires Tradeoffs John G. Maneatis, Ph.., President, True Circuits, Inc. Los Altos, California October 7, 2004 Phase-Locked Loops (PLLs) are commonly used to perform
More information1. Designing a 64-word Content Addressable Memory Background
UNIVERSITY OF CALIFORNIA College of Engineering Department of Electrical Engineering and Computer Sciences Project Phase I Specification NTU IC541CA (Spring 2004) 1. Designing a 64-word Content Addressable
More informationChapter 9. Design for Testability
Chapter 9 Design for Testability Testability CUT = Circuit Under Test A design property that allows: cost-effective development of tests to be applied to the CUT determining the status of the CUT (normal
More informationVLSI microarchitecture. Scaling. Toni Juan
VLSI microarchitecture Scaling Toni Juan Short index Administrative glass schedule projects? Topic of the day: Scaling VLSI microarchitecture Toni Juan, Nov 2000 2 Administrative Class Schedule Day Topic
More informationReadings. H+P Appendix A, Chapter 2.3 This will be partly review for those who took ECE 152
Readings H+P Appendix A, Chapter 2.3 This will be partly review for those who took ECE 152 Recent Research Paper The Optimal Logic Depth Per Pipeline Stage is 6 to 8 FO4 Inverter Delays, Hrishikesh et
More informationASNT1016-PQA 16:1 MUX-CMU
16:1 MUX-CMU 16 to 1 multiplexer (MUX) with integrated CMU (clock multiplication unit). PLL-based architecture featuring both counter and forward clocking modes. Supports multiple data rates in the 9.8-12.5Gb/s
More informationDIVA: A Reliable Substrate for Deep Submicron Microarchitecture Design
DIVA: A Reliable Substrate for Deep Submicron Microarchitecture Design Or, how I learned to stop worrying and love complexity. http://www.eecs.umich.edu/~taustin Microprocessor Verification Task of determining
More informationMemory, Latches, & Registers
Memory, Latches, & Registers 1) Structured Logic Arrays 2) Memory Arrays 3) Transparent Latches 4) How to save a few bucks at toll booths 5) Edge-triggered Registers L13 Memory 1 General Table Lookup Synthesis
More information