Adaptive Robustness Tuning for High Performance Domino Logic

Size: px
Start display at page:

Download "Adaptive Robustness Tuning for High Performance Domino Logic"

Transcription

1 Adaptive Robustness Tuning for High Performance Domino Logic Bharan Giridhar 1, David Fick 1, Matthew Fojtik 1, Sudhir Satpathy 1, David Bull 2, Dennis Sylvester 1 and David Blaauw 1 1 niversity of Michigan, SA 2 ARM, nited Kingdom Symposia on VLSI Technology and Circuits

2 Motivation Targeted use of Domino logic remains the designer s choice to implement speed critical paths in power constrained designs Samsung Hummingbird, AMD Bulldozer Golden et al., ISSCC 2011 With increasing process variation, domino logic is getting less beneficial Increasing margins for leakage, charge sharing and noise Slide 1

3 Normalized Delay Normalized Keeper Size Motivation Margining the keeper for robustness under worstcase PVT can degrade performance by ~32% A0 Keeper Margining A Normalized Delay Normalized Keeper Size ~32% performance degradation CLK 0.77 Nominal Process Process, Temp Process, Temp, Noise ART Domino tracks PVT and trades extra robustness and timing margins for performance 1 Slide 2

4 ART Domino Gate Fast, speculative evaluation followed by fully margined (safe) evaluation to detect errors A B CLK Slide 3

5 ART Domino Gate Fast, speculative evaluation followed by fully margined (safe) evaluation to detect errors A B CLK Slide 4

6 ART Domino Gate Fast, speculative evaluation followed by fully margined (safe) evaluation to detect errors Headers/footers shared across gates to minimize overhead T V X EC ECB CLK Speculative Precharge () Speculative Evaluation () Checker Precharge () Checker Evaluation () A B EC CLK CHECK (ECB) V Y V X V Y ECB EC A V DYN T ( ) Y Slide 5

7 ART Domino Gate V X < and V Y > speed critical transitions at both nodes by reducing voltage swings V Y > speeds the following gate by trading its noise margin for speed T Remove margins EC V X (<) A CLK B ECB Leakier Stack CLK EC CHECK (ECB) Speculative Precharge () Speculative Evaluation () Checker Precharge () Checker Evaluation () V Y (>) V X V Y ECB EC A V DYN T ( ) Y Slide 6

8 ART Domino Gate V X < and V Y > speed critical transitions at both nodes by reducing voltage swings V Y > speeds the following gate by trading its noise margin for speed T Fast, speculative evaluation EC V X (<) A CLK B ECB Leakier Stack Y (Speculate) CLK EC CHECK (ECB) Speculative Precharge () Speculative Evaluation () Checker Precharge () Checker Evaluation () V Y (>) V X V Y ECB EC A V DYN T ( ) Y Slide 7

9 ART Domino Gate Slower, safe evaluation performed in the background No impact on computation latency T Restore margins EC V X (=) ECB Y (Speculate) CLK Speculative Precharge () Speculative Evaluation () Checker Precharge () Checker Evaluation () A CLK B Leakier Stack EC CHECK (ECB) V Y (=) V X V Y ECB EC A V DYN T ( ) Y Slide 8

10 ART Domino Gate Slower, safe evaluation performed in the background No impact on computation latency T Slower, safe evaluation EC V X (=) ECB Y (Speculate) CLK Speculative Precharge () Speculative Evaluation () Checker Precharge () Checker Evaluation () A CLK B Leakier Stack Y (Safe) EC CHECK (ECB) V Y (=) V X V Y ECB EC A V DYN T ( ) Y Slide 9

11 ART Domino Gate In case of errors, the errant computation is flushed from the pipeline The result of safe evaluation is propagated, guaranteeing forward progress T Error Detection EC ECB V X Y (Speculate) V Y A CLK B Y (Safe) Compare and Detect Errors ECB EC T ( ) Slide 10

12 ART Pipeline Overlapping clocks Latchless hand-off Pipe Stage N Pipe Stage N+1 Conventional Domino Pipeline Slide 11

13 ART Pipeline Pipe Stage N Pipe Stage N+1 Slide 12

14 ART Pipeline Pipe Stage N X DOMBF M X B F 1 X Pipe Stage N+1 DOMBF M X B F 1 Slide 13

15 ART Pipeline Speculative Evaluation () Pipe Stage N DOMBF M X B F 1 Stores speculative result of half-stage Overlapping clocks Latchless hand-off Pipe Stage N+1 DOMBF M X B F 1 During, DOMBF snoops on the value propagated forward through the mux Overlapping clocks eliminate latches between pipe stages, provide skew tolerance Slide 14

16 ART Pipeline Speculative Evaluation () Pipe Stage N DOMBF M X B F 1 Stores speculative result of half-stage Overlapping clocks Latchless hand-off Pipe Stage N+1 DOMBF M X B F 1 During, DOMBF snoops on the value propagated forward through the mux Overlapping clocks eliminate latches between pipe stages, provide skew tolerance Slide 15

17 ART Pipeline Checking Evaluation () Pipe Stage N DOMBF Pipe-stage Split: Parallel evaluations during phase M X B F 1 DOMBF value from propagated Latchless hand-off Pipe Stage N+1 DOMBF M X B F 1 To allow the slower safe evaluation to complete, each pipe stage is split in half during Value stored on DOMBF propagated forward cutting the stage depth by half Slide 16

18 ART Pipeline Checking Evaluation () Pipe Stage N DOMBF M X B F 1 Latchless hand-off Pipe Stage N+1 DOMBF M X B F 1 Both halves of each pipe stage perform safe evaluations simultaneously Slide 17

19 Checking Evaluation () Pipe Stage N DOMBF M X B F 1 ART Pipeline Latchless hand-off Pipe Stage N+1 Result Result B F 1 M X DOMBF Results from and to check for errors in the segment Both halves of each pipe stage perform safe evaluations simultaneously Error detector at each DOMBF checks the segment till the preceding DOMBF for errors Slide 18

20 DOMBF BF1 ART Pipeline Fully margined Domino XOR Error Detection Error BIT0 BF2 (identical to BF1) Bit1 Bit0 BitN Fully Margined Domino OR tree Stage_N error SYSCLK STG_1 Stage_1 error Error Detection in next clock cycle in parallel with next set of gate evaluations GATE1 GATE2 GATE3 GATE4 GATE5 GATE6 GATE7 GATE8 Pipe Stage N : Precharge error logic (in red) : Copy Gate4 TO BF2 and DOMBF to BF1 +: Evaluate Domino XOR, Domino OR tree Slide 19

21 ART Pipeline Pipe Stage N DOMBF M X B F 1 Pipe Stage N+1 DOMBF M X B F 1 SIG N T A0 SIG N V X A1 ECB_N V Y SIG_B_N BFIN BF1 EC_N BFOT DOMBF, BF1(2) are fully margined EC_N+1 V X V Y A0 STG_N+1 ECB_N+1 ECB_N+1 A1 EC_N+1 STG_N T ( ) Slide 20

22 ART Clock Generation GCLK ( from clock ring) Ф1 (= STG_N) Ф2 ( = STG_N+1) STG_1 STG_2 Domino gates clocked by the global clock generator Ф3 ( = EC_N) EC_N STG_N SIG N SN SN D Q D Q GCLK Ф1 Ф2 CK QN RN CK QN RN SN D Q CK QN RN Ф4 ( = EC_N+1) ECB_N STG_N SIG N Power gates and other locally derived signals clocked using locally generated clocks ECB_N STG_N EC_N STG_N SIG N SIG N Slide 21

23 System Implementation X[15:0] NEG[15:0] BOOTH ENCODER ZERO[15:0] SHIFT[15:0] (T 1,T 1 ) (T 2,T 2 ) (T 3,T 3 ) (T 4,T 4 ) Pipe Stage 1 Pipe Stage 2 Y[15:0] BOOTH SIGN[15:0] DECODER PARTIAL PRODCTS[15:0][32:0] TEST HARNESS A[63:0] MLTIPLIER ARRAY RADIX-2 KOGGE STONE PRODCT[63:0] 32 32b multiplier in 65nm CMOS 4 T/T voltage domains, 2 pipeline stages Slide 22

24 Die Micrograph MLTIPLIER ARRAY KOGGE STONE BOOTH DECODER BOOTH DECODER TEST HARNESS BOOTH ENCODER 1.5mm CLKGEN + CLK MESH BFFERS 1.5mm SCAN 65nm CMOS process Die Size: 1496μm X 1496μm Slide 23

25 1.2-T (V) Measured Results Measured Max. Performance: 1192 MHz Error region (Robustness failure) 1040 MHz 1072 MHz 1136 MHz 1104 MHz Performance with ART up by 34% by eliminating robustness margins at nominal PVT 0.1 No Adapting 890 MHz 944 MHz 976 MHz 1008 MHz MHz T (V) Slide 24

26 Required T (V) Power (mw) Required T (V) No Adapting Measured Results ~34% performance gain with 47% Power increase Simulated Power Standard Domino ~17% Performance gain with 12% Power reduction Performance (MHz) Minimum Power at each frequency Method applicable to performance constrained logic where performance cannot be obtained by any other means Power increases sharply at higher frequencies due to increased short circuit current on the output inverter T T 1, T 2, T 3, T 4 T 1, T 2, T 3 Performance (MHz) Performance (MHz) ΔT/ΔT at each measured power-frequency point Slide 25

27 Number of Chips % Performance Gain (Robustness speculation) ART Performance Gain 10 8 At 27C, 1.2V Slow Die Fast Die Typical Die 6 20 Dies % Performance Gain (Robustness Speculation) Temperature (C) Measured average % gain due to robustness tuning across 20 dies is ~28% % Gain decreases at higher temperatures as gates become less robust Slide 26

28 Performance (MHz) 1300 Overall Performance Gain Worst Process,Voltage,Temp* Process Gain Temp Gain Voltage Gain ART Gain X X X ART gain Robustness tuning X 1.04X 1X X 1.12X 1.07X X 1.17X 1.12X Voltage gain Temp gain Process gain Timing Speculation Total gains of 49% to 71% over conventionally margined designs 300 Slow Nominal Fast * Worst process was set by the slowest die. Temperature was set to 85C and Supply was degraded by 10% to 1.08V Slide 27

29 % Error (robustness failures) % Error (timing failures) Error Rate T tuning T tuning T[1-4] = 1.2V T[1-4] = 0V MHz T 1-4 = 0.9V T 1-3 = 0.15V, T 4 = 0.17V T (V), T (V)* * Δ T = 0.9V-T, Δ T = T-0.17V Operating Frequency (MHz) Error rate is more sensitive to T than T T tuning also affects robustness of following gate Slide 28

30 Conclusion A new high speed design style called ART Domino was presented ART Domino provides overall gain of up to 1.71x over conventional domino 3.2x faster than static CMOS Design overhead is amortized over pipeline stage depth Slide 29

31 Thank You Questions? Slide 30

32 BACKP SLIDES Symposia on VLSI Technology and Circuits Slide 31

33 ART Clock Generation Ф1 (= STG_N) Example 4-Stage Pipeline Operation Phases GCLK ( from clock ring) Ф2 ( = STG_N+1) SYSCLK (GCLK 2) STG_1 STG_2 Globally generated clocks with relaxed skew constraints STG_3 (=STG_1 CLK) STG_4 (=STG_2 CLK) Ф3 ( = EC_N) EC_N STG_N SIG N SN SN D Q D Q GCLK Ф1 Ф2 CK QN RN CK QN RN SN D Q CK QN RN Ф4 ( = EC_N+1) ECB_N STG_N SIG N Locally generated clocking signals with stricter skew constraints ECB_N STG_N EC_N STG_N SIG N SIG N Slide 32

34 System Implementation Conventional Domino Other Logic FF Conventional Domino Logic FF Other Logic Slide 33

35 System Implementation Conventional Domino STG1 Eval Prchg Slide 34

36 System Implementation Conventional Domino STG2 Prchg Eval Slide 35

37 System Implementation ART Domino Other Logic FF ART Domino Logic FF Other Logic Slide 36

38 System Implementation ART Domino STG1 Slide 37

39 System Implementation ART Domino STG2 Slide 38

40 System Implementation ART Domino STG3 Slide 39

41 System Implementation ART Domino STG4 Slide 40

42 System Implementation ART Domino SYSCLK (GCLK 2) STG_1 STG_2 STG_3 (=STG_1 CLK) Error Detect STG1 Error Detect STG1 Error Detect STG1 Error Detect STG1 STG_4 (=STG_2 CLK) Slide 41

43 Architectural Replay Cycle0 Cycle1 Cycle2 SYSCLK (GCLK 2) STG_1 Error detected in STG2 STG_2 STG_3 STG_4 Slide 42

44 Architectural Replay Cycle0 Cycle1 Cycle2 SYSCLK (GCLK 2) STG_1 Error detected in STG2 STG_2 STG_3 STG_4 Copy safe value Slide 43

45 Architectural Replay Cycle0 Cycle1 Cycle2 SYSCLK (GCLK 2) STG_1 Error detected in STG2 STG_2 STG_3 STG_4 Copy safe value Slide 44

46 Error Detection Scenarios Slide 45

Centip3De: A 64-Core, 3D Stacked, Near-Threshold System

Centip3De: A 64-Core, 3D Stacked, Near-Threshold System 1 1 1 Centip3De: A 64-Core, 3D Stacked, Near-Threshold System Ronald G. Dreslinski David Fick, Bharan Giridhar, Gyouho Kim, Sangwon Seo, Matthew Fojtik, Sudhir Satpathy, Yoonmyung Lee, Daeyeon Kim, Nurrachman

More information

Design of Low Power Wide Gates used in Register File and Tag Comparator

Design of Low Power Wide Gates used in Register File and Tag Comparator www..org 1 Design of Low Power Wide Gates used in Register File and Tag Comparator Isac Daimary 1, Mohammed Aneesh 2 1,2 Department of Electronics Engineering, Pondicherry University Pondicherry, 605014,

More information

OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions

OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions 04/15/14 1 Introduction: Low Power Technology Process Hardware Architecture Software Multi VTH Low-power circuits Parallelism

More information

Dynamic Logic Families

Dynamic Logic Families Dynamic Logic Families C.K. Ken Yang UCLA yangck@ucla.edu Courtesy of MAH,JR 1 Overview Reading Rabaey 6.3 (Dynamic), 7.5.2 (NORA) Overview This set of notes cover in greater detail Dynamic Logic Families

More information

Millimeter-Scale Nearly Perpetual Sensor System with Stacked Battery and Solar Cells

Millimeter-Scale Nearly Perpetual Sensor System with Stacked Battery and Solar Cells 1 Millimeter-Scale Nearly Perpetual Sensor System with Stacked Battery and Solar Cells Gregory Chen, Matthew Fojtik, Daeyeon Kim, David Fick, Junsun Park, Mingoo Seok, Mao-Ter Chen, Zhiyoong Foo, Dennis

More information

An Asynchronous Floating-Point Multiplier

An Asynchronous Floating-Point Multiplier An Asynchronous Floating-Point Multiplier Basit Riaz Sheikh and Rajit Manohar Computer Systems Lab Cornell University http://vlsi.cornell.edu/ The person that did the work! Motivation Fast floating-point

More information

THE latest generation of microprocessors uses a combination

THE latest generation of microprocessors uses a combination 1254 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 30, NO. 11, NOVEMBER 1995 A 14-Port 3.8-ns 116-Word 64-b Read-Renaming Register File Creigton Asato Abstract A 116-word by 64-b register file for a 154 MHz

More information

POWER consumption has become one of the most important

POWER consumption has become one of the most important 704 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 39, NO. 4, APRIL 2004 Brief Papers High-Throughput Asynchronous Datapath With Software-Controlled Voltage Scaling Yee William Li, Student Member, IEEE, George

More information

Power Analysis for CMOS based Dual Mode Logic Gates using Power Gating Techniques

Power Analysis for CMOS based Dual Mode Logic Gates using Power Gating Techniques Power Analysis for CMOS based Dual Mode Logic Gates using Power Gating Techniques S. Nand Singh Dr. R. Madhu M. Tech (VLSI Design) Assistant Professor UCEK, JNTUK. UCEK, JNTUK Abstract: Low power technology

More information

Lecture 5. Other Adder Issues

Lecture 5. Other Adder Issues Lecture 5 Other Adder Issues Mark Horowitz Computer Systems Laboratory Stanford University horowitz@stanford.edu Copyright 24 by Mark Horowitz with information from Brucek Khailany 1 Overview Reading There

More information

SEE Tolerant Self-Calibrating Simple Fractional-N PLL

SEE Tolerant Self-Calibrating Simple Fractional-N PLL SEE Tolerant Self-Calibrating Simple Fractional-N PLL Robert L. Shuler, Avionic Systems Division, NASA Johnson Space Center, Houston, TX 77058 Li Chen, Department of Electrical Engineering, University

More information

Z-RAM Ultra-Dense Memory for 90nm and Below. Hot Chips David E. Fisch, Anant Singh, Greg Popov Innovative Silicon Inc.

Z-RAM Ultra-Dense Memory for 90nm and Below. Hot Chips David E. Fisch, Anant Singh, Greg Popov Innovative Silicon Inc. Z-RAM Ultra-Dense Memory for 90nm and Below Hot Chips 2006 David E. Fisch, Anant Singh, Greg Popov Innovative Silicon Inc. Outline Device Overview Operation Architecture Features Challenges Z-RAM Performance

More information

CHAPTER 3 ASYNCHRONOUS PIPELINE CONTROLLER

CHAPTER 3 ASYNCHRONOUS PIPELINE CONTROLLER 84 CHAPTER 3 ASYNCHRONOUS PIPELINE CONTROLLER 3.1 INTRODUCTION The introduction of several new asynchronous designs which provides high throughput and low latency is the significance of this chapter. The

More information

DESIGN AND PERFORMANCE ANALYSIS OF CARRY SELECT ADDER

DESIGN AND PERFORMANCE ANALYSIS OF CARRY SELECT ADDER DESIGN AND PERFORMANCE ANALYSIS OF CARRY SELECT ADDER Bhuvaneswaran.M 1, Elamathi.K 2 Assistant Professor, Muthayammal Engineering college, Rasipuram, Tamil Nadu, India 1 Assistant Professor, Muthayammal

More information

Yield-driven Near-threshold SRAM Design

Yield-driven Near-threshold SRAM Design Yield-driven Near-threshold SRAM Design Gregory K. Chen, David Blaauw, Trevor Mudge, Dennis Sylvester Department of EECS University of Michigan Ann Arbor, MI 48109 grgkchen@umich.edu, blaauw@umich.edu,

More information

Very Large Scale Integration (VLSI)

Very Large Scale Integration (VLSI) Very Large Scale Integration (VLSI) Lecture 10 Dr. Ahmed H. Madian Ah_madian@hotmail.com Dr. Ahmed H. Madian-VLSI 1 Content Manufacturing Defects Wafer defects Chip defects Board defects system defects

More information

Unleashing the Power of Embedded DRAM

Unleashing the Power of Embedded DRAM Copyright 2005 Design And Reuse S.A. All rights reserved. Unleashing the Power of Embedded DRAM by Peter Gillingham, MOSAID Technologies Incorporated Ottawa, Canada Abstract Embedded DRAM technology offers

More information

POWER PERFORMANCE OPTIMIZATION METHODS FOR DIGITAL CIRCUITS

POWER PERFORMANCE OPTIMIZATION METHODS FOR DIGITAL CIRCUITS POWER PERFORMANCE OPTIMIZATION METHODS FOR DIGITAL CIRCUITS Radu Zlatanovici zradu@eecs.berkeley.edu http://www.eecs.berkeley.edu/~zradu Department of Electrical Engineering and Computer Sciences University

More information

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 14 EE141

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 14 EE141 EECS 151/251A Fall 2017 Digital Design and Integrated Circuits Instructor: John Wawrzynek and Nicholas Weaver Lecture 14 EE141 Outline Parallelism EE141 2 Parallelism Parallelism is the act of doing more

More information

Memory Design I. Array-Structured Memory Architecture. Professor Chris H. Kim. Dept. of ECE.

Memory Design I. Array-Structured Memory Architecture. Professor Chris H. Kim. Dept. of ECE. Memory Design I Professor Chris H. Kim University of Minnesota Dept. of ECE chriskim@ece.umn.edu Array-Structured Memory Architecture 2 1 Semiconductor Memory Classification Read-Write Wi Memory Non-Volatile

More information

Swizzle Switch: A Self-Arbitrating High-Radix Crossbar for NoC Systems

Swizzle Switch: A Self-Arbitrating High-Radix Crossbar for NoC Systems 1 Swizzle Switch: A Self-Arbitrating High-Radix Crossbar for NoC Systems Ronald Dreslinski, Korey Sewell, Thomas Manville, Sudhir Satpathy, Nathaniel Pinckney, Geoff Blake, Michael Cieslak, Reetuparna

More information

PICo Embedded High Speed Cache Design Project

PICo Embedded High Speed Cache Design Project PICo Embedded High Speed Cache Design Project TEAM LosTohmalesCalientes Chuhong Duan ECE 4332 Fall 2012 University of Virginia cd8dz@virginia.edu Andrew Tyler ECE 4332 Fall 2012 University of Virginia

More information

EE577b. Register File. By Joong-Seok Moon

EE577b. Register File. By Joong-Seok Moon EE577b Register File By Joong-Seok Moon Register File A set of registers that store data Consists of a small array of static memory cells Smallest size and fastest access time in memory hierarchy (Register

More information

Dynamic CMOS Logic Gate

Dynamic CMOS Logic Gate Dynamic CMOS Logic Gate In dynamic CMOS logic a single clock can be used to accomplish both the precharge and evaluation operations When is low, PMOS pre-charge transistor Mp charges Vout to Vdd, since

More information

Lecture 13: SRAM. Slides courtesy of Deming Chen. Slides based on the initial set from David Harris. 4th Ed.

Lecture 13: SRAM. Slides courtesy of Deming Chen. Slides based on the initial set from David Harris. 4th Ed. Lecture 13: SRAM Slides courtesy of Deming Chen Slides based on the initial set from David Harris CMOS VLSI Design Outline Memory Arrays SRAM Architecture SRAM Cell Decoders Column Circuitry Multiple Ports

More information

DDRO: A Novel Performance Monitoring Methodology Based on Design-Dependent Ring Oscillators

DDRO: A Novel Performance Monitoring Methodology Based on Design-Dependent Ring Oscillators DDRO: A Novel Performance Monitoring Methodology Based on Design-Dependent Ring Oscillators Tuck-Boon Chan, Puneet Gupta, Andrew B. Kahng and Liangzhen Lai UC San Diego ECE and CSE Departments, La Jolla,

More information

A Energy-Efficient Pipeline Templates for High-Performance Asynchronous Circuits

A Energy-Efficient Pipeline Templates for High-Performance Asynchronous Circuits A Energy-Efficient Pipeline Templates for High-Performance Asynchronous Circuits Basit Riaz Sheikh and Rajit Manohar, Cornell University We present two novel energy-efficient pipeline templates for high

More information

6T- SRAM for Low Power Consumption. Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1

6T- SRAM for Low Power Consumption. Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1 6T- SRAM for Low Power Consumption Mrs. J.N.Ingole 1, Ms.P.A.Mirge 2 Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1 PG Student [Digital Electronics], Dept. of ExTC, PRMIT&R,

More information

Week 7: Assignment Solutions

Week 7: Assignment Solutions Week 7: Assignment Solutions 1. In 6-bit 2 s complement representation, when we subtract the decimal number +6 from +3, the result (in binary) will be: a. 111101 b. 000011 c. 100011 d. 111110 Correct answer

More information

MTJ-Based Nonvolatile Logic-in-Memory Architecture

MTJ-Based Nonvolatile Logic-in-Memory Architecture 2011 Spintronics Workshop on LSI @ Kyoto, Japan, June 13, 2011 MTJ-Based Nonvolatile Logic-in-Memory Architecture Takahiro Hanyu Center for Spintronics Integrated Systems, Tohoku University, JAPAN Laboratory

More information

CENTIP3DE: A 64-CORE, 3DSTACKED NEAR-THRESHOLD SYSTEM

CENTIP3DE: A 64-CORE, 3DSTACKED NEAR-THRESHOLD SYSTEM ... CENTIP3DE: A 64-CORE, 3DSTACKED NEAR-THRESHOLD SYSTEM... Ronald G. Dreslinski David Fick Bharan Giridhar Gyouho Kim Sangwon Seo Matthew Fojtik Sudhir Satpathy Yoonmyung Lee Daeyeon Kim Nurrachman Liu

More information

A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications

A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications Ju-Ho Sohn, Jeong-Ho Woo, Min-Wuk Lee, Hye-Jung Kim, Ramchan Woo, Hoi-Jun Yoo Semiconductor System

More information

A 167-processor 65 nm Computational Platform with Per-Processor Dynamic Supply Voltage and Dynamic Clock Frequency Scaling

A 167-processor 65 nm Computational Platform with Per-Processor Dynamic Supply Voltage and Dynamic Clock Frequency Scaling A 167-processor 65 nm Computational Platform with Per-Processor Dynamic Supply Voltage and Dynamic Clock Frequency Scaling Dean Truong, Wayne Cheng, Tinoosh Mohsenin, Zhiyi Yu, Toney Jacobson, Gouri Landge,

More information

DRAM with Boosted 3T Gain Cell, PVT-tracking Read Reference Bias

DRAM with Boosted 3T Gain Cell, PVT-tracking Read Reference Bias ASub-0 Sub-0.9V Logic-compatible Embedded DRAM with Boosted 3T Gain Cell, Regulated Bit-line Write Scheme and PVT-tracking Read Reference Bias Ki Chul Chun, Pulkit Jain, Jung Hwa Lee*, Chris H. Kim University

More information

Implementation of Asynchronous Topology using SAPTL

Implementation of Asynchronous Topology using SAPTL Implementation of Asynchronous Topology using SAPTL NARESH NAGULA *, S. V. DEVIKA **, SK. KHAMURUDDEEN *** *(senior software Engineer & Technical Lead, Xilinx India) ** (Associate Professor, Department

More information

Embedded Systems Ch 15 ARM Organization and Implementation

Embedded Systems Ch 15 ARM Organization and Implementation Embedded Systems Ch 15 ARM Organization and Implementation Byung Kook Kim Dept of EECS Korea Advanced Institute of Science and Technology Summary ARM architecture Very little change From the first 3-micron

More information

Adaptive Voltage Scaling (AVS) Alex Vainberg October 13, 2010

Adaptive Voltage Scaling (AVS) Alex Vainberg   October 13, 2010 Adaptive Voltage Scaling (AVS) Alex Vainberg Email: alex.vainberg@nsc.com October 13, 2010 Agenda AVS Introduction, Technology and Architecture Design Implementation Hardware Performance Monitors Overview

More information

10. Interconnects in CMOS Technology

10. Interconnects in CMOS Technology 10. Interconnects in CMOS Technology 1 10. Interconnects in CMOS Technology Jacob Abraham Department of Electrical and Computer Engineering The University of Texas at Austin VLSI Design Fall 2017 October

More information

EE241 - Spring 2007 Advanced Digital Integrated Circuits. Announcements

EE241 - Spring 2007 Advanced Digital Integrated Circuits. Announcements EE241 - Spring 2007 Advanced Digital Integrated Circuits Lecture 22: SRAM Announcements Homework #4 due today Final exam on May 8 in class Project presentations on May 3, 1-5pm 2 1 Class Material Last

More information

MOS High Performance Arithmetic

MOS High Performance Arithmetic MOS High Performance Arithmetic Mark Horowitz Stanford University horowitz@ee.stanford.edu Arithmetic Is Important Then Now (TegraK1) 2 What Is Hard? 9999999 + 1 3 Proof: 1 st Gen 2 nd Gen 4 And Getting

More information

DYNAMIC ASYNCHRONOUS LOGIC FOR HIGH SPEED CMOS SYSTEMS. I. Introduction

DYNAMIC ASYNCHRONOUS LOGIC FOR HIGH SPEED CMOS SYSTEMS. I. Introduction DYNAMIC ASYNCHRONOUS LOGIC FOR HIGH SEED CMOS SYSTEMS Anthony J. McAuley Bellcore (Room MRE-2N263), 445 South Street, Morristown, NJ 07962-1910, USA hone: (201)-829-4698; E-mail: mcauley@bellcore.com Abstract

More information

Good Morning. I will be presenting the 2.25MByte cache on-board Hewlett-Packard s latest PA-RISC CPU.

Good Morning. I will be presenting the 2.25MByte cache on-board Hewlett-Packard s latest PA-RISC CPU. 1 A 900MHz 2.25MByte Cache with On Chip CPU - Now in SOI/Cu J. Michael Hill Jonathan Lachman Good Morning. I will be presenting the 2.25MByte cache on-board Hewlett-Packard s latest PA-RISC CPU. 2 Outline

More information

Minimizing Power Dissipation during. University of Southern California Los Angeles CA August 28 th, 2007

Minimizing Power Dissipation during. University of Southern California Los Angeles CA August 28 th, 2007 Minimizing Power Dissipation during Write Operation to Register Files Kimish Patel, Wonbok Lee, Massoud Pedram University of Southern California Los Angeles CA August 28 th, 2007 Introduction Outline Conditional

More information

Binary Arithmetic. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T.

Binary Arithmetic. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. Binary Arithmetic Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. MIT 6.004 Fall 2018 Reminder: Encoding Positive Integers Bit i in a binary representation (in right-to-left order)

More information

Last Time. Making correct concurrent programs. Maintaining invariants Avoiding deadlocks

Last Time. Making correct concurrent programs. Maintaining invariants Avoiding deadlocks Last Time Making correct concurrent programs Maintaining invariants Avoiding deadlocks Today Power management Hardware capabilities Software management strategies Power and Energy Review Energy is power

More information

EECS150 - Digital Design Lecture 09 - Parallelism

EECS150 - Digital Design Lecture 09 - Parallelism EECS150 - Digital Design Lecture 09 - Parallelism Feb 19, 2013 John Wawrzynek Spring 2013 EECS150 - Lec09-parallel Page 1 Parallelism Parallelism is the act of doing more than one thing at a time. Optimization

More information

Microprocessor and DSP Technologies for the Nanoscale Era

Microprocessor and DSP Technologies for the Nanoscale Era Microprocessor and DSP Technologies for the Nanoscale Era Seminar 1 Ram Kumar Krishnamurthy Microprocessor Research Labs Intel Corporation, Hillsboro, OR ram.krishnamurthy@intel.com 1 July 5, 2005 Intel

More information

Memory, Latches, & Registers

Memory, Latches, & Registers Memory, Latches, & Registers 1) Structured Logic Arrays 2) Memory Arrays 3) Transparent Latches 4) Saving a few bucks at toll booths 5) Edge-triggered Registers L14 Memory 1 General Table Lookup Synthesis

More information

SIDDHARTH INSTITUTE OF ENGINEERING AND TECHNOLOGY :: PUTTUR (AUTONOMOUS) Siddharth Nagar, Narayanavanam Road QUESTION BANK UNIT I

SIDDHARTH INSTITUTE OF ENGINEERING AND TECHNOLOGY :: PUTTUR (AUTONOMOUS) Siddharth Nagar, Narayanavanam Road QUESTION BANK UNIT I SIDDHARTH INSTITUTE OF ENGINEERING AND TECHNOLOGY :: PUTTUR (AUTONOMOUS) Siddharth Nagar, Narayanavanam Road 517583 QUESTION BANK Subject with Code : DICD (16EC5703) Year & Sem: I-M.Tech & I-Sem Course

More information

Deep Sub-Micron Cache Design

Deep Sub-Micron Cache Design Cache Design Challenges in Deep Sub-Micron Process Technologies L2 COE Carl Dietz May 25, 2007 Deep Sub-Micron Cache Design Agenda Bitcell Design Array Design SOI Considerations Surviving in the corporate

More information

VLSI Arithmetic Lecture 6

VLSI Arithmetic Lecture 6 VLSI Arithmetic Lecture 6 Prof. Vojin G. Oklobdzija University of California http://www.ece.ucdavis.edu/acsel Review Lecture 5 Prefix Adders and Parallel Prefix Adders from: Ercegovac-Lang Oklobdzija 2004

More information

ASNT2032-MBL Digital DMUX 12-to-24 with LVDS Interfaces

ASNT2032-MBL Digital DMUX 12-to-24 with LVDS Interfaces -MBL Digital DMUX 12-to-24 with LVDS Interfaces Digital demultiplexer (DMUX) 12-to-24 with LVDS output interface Programmable LVDS/CML/ECL input interface Supports data rates from 1.0Mbps to 3.6Gbps Preset

More information

A novel DRAM architecture as a low leakage alternative for SRAM caches in a 3D interconnect context.

A novel DRAM architecture as a low leakage alternative for SRAM caches in a 3D interconnect context. A novel DRAM architecture as a low leakage alternative for SRAM caches in a 3D interconnect context. Anselme Vignon, Stefan Cosemans, Wim Dehaene K.U. Leuven ESAT - MICAS Laboratory Kasteelpark Arenberg

More information

Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on. on-chip Architecture

Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on. on-chip Architecture Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on on-chip Architecture Avinash Kodi, Ashwini Sarathy * and Ahmed Louri * Department of Electrical Engineering and

More information

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 9 /Issue 3 / OCT 2017

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 9 /Issue 3 / OCT 2017 Design of Low Power Adder in ALU Using Flexible Charge Recycling Dynamic Circuit Pallavi Mamidala 1 K. Anil kumar 2 mamidalapallavi@gmail.com 1 anilkumar10436@gmail.com 2 1 Assistant Professor, Dept of

More information

Reliable Physical Unclonable Function based on Asynchronous Circuits

Reliable Physical Unclonable Function based on Asynchronous Circuits Reliable Physical Unclonable Function based on Asynchronous Circuits Kyung Ki Kim Department of Electronic Engineering, Daegu University, Gyeongbuk, 38453, South Korea. E-mail: kkkim@daegu.ac.kr Abstract

More information

Reduce Your System Power Consumption with Altera FPGAs Altera Corporation Public

Reduce Your System Power Consumption with Altera FPGAs Altera Corporation Public Reduce Your System Power Consumption with Altera FPGAs Agenda Benefits of lower power in systems Stratix III power technology Cyclone III power Quartus II power optimization and estimation tools Summary

More information

A 167-processor Computational Array for Highly-Efficient DSP and Embedded Application Processing

A 167-processor Computational Array for Highly-Efficient DSP and Embedded Application Processing A 167-processor Computational Array for Highly-Efficient DSP and Embedded Application Processing Dean Truong, Wayne Cheng, Tinoosh Mohsenin, Zhiyi Yu, Toney Jacobson, Gouri Landge, Michael Meeuwsen, Christine

More information

A 256-Radix Crossbar Switch Using Mux-Matrix-Mux Folded-Clos Topology

A 256-Radix Crossbar Switch Using Mux-Matrix-Mux Folded-Clos Topology http://dx.doi.org/10.5573/jsts.014.14.6.760 JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.14, NO.6, DECEMBER, 014 A 56-Radix Crossbar Switch Using Mux-Matrix-Mux Folded-Clos Topology Sung-Joon Lee

More information

Ultra Low Power (ULP) Challenge in System Architecture Level

Ultra Low Power (ULP) Challenge in System Architecture Level Ultra Low Power (ULP) Challenge in System Architecture Level - New architectures for 45-nm, 32-nm era ASP-DAC 2007 Designers' Forum 9D: Panel Discussion: Top 10 Design Issues Toshinori Sato (Kyushu U)

More information

ASNT Gbps 1:16 Digital Deserializer

ASNT Gbps 1:16 Digital Deserializer 12.5Gbps 1:16 Digital Deserializer Broadband up to 12.5Gbps (gigabits per second) 1:16 Deserializer High-speed Input Data Buffer with on-chip 100Ohm differential termination. Full-rate CML Input Clock

More information

High-speed Serial Interface

High-speed Serial Interface High-speed Serial Interface Lect. 16 Clock and Data Recovery 3 1 CDR Design Example ( 권대현 ) Clock and Data Recovery Circuits Transceiver PLL vs. CDR High-speed CDR Phase Detector Charge Pump Voltage Controlled

More information

DESIGN AND SIMULATION OF 1 BIT ARITHMETIC LOGIC UNIT DESIGN USING PASS-TRANSISTOR LOGIC FAMILIES

DESIGN AND SIMULATION OF 1 BIT ARITHMETIC LOGIC UNIT DESIGN USING PASS-TRANSISTOR LOGIC FAMILIES Volume 120 No. 6 2018, 4453-4466 ISSN: 1314-3395 (on-line version) url: http://www.acadpubl.eu/hub/ http://www.acadpubl.eu/hub/ DESIGN AND SIMULATION OF 1 BIT ARITHMETIC LOGIC UNIT DESIGN USING PASS-TRANSISTOR

More information

High-Performance Full Adders Using an Alternative Logic Structure

High-Performance Full Adders Using an Alternative Logic Structure Term Project EE619 High-Performance Full Adders Using an Alternative Logic Structure by Atulya Shivam Shree (10327172) Raghav Gupta (10327553) Department of Electrical Engineering, Indian Institure Technology,

More information

Near-Threshold Computing: Reclaiming Moore s Law

Near-Threshold Computing: Reclaiming Moore s Law 1 Near-Threshold Computing: Reclaiming Moore s Law Dr. Ronald G. Dreslinski Research Fellow Ann Arbor 1 1 Motivation 1000000 Transistors (100,000's) 100000 10000 Power (W) Performance (GOPS) Efficiency (GOPS/W)

More information

3. Implementing Logic in CMOS

3. Implementing Logic in CMOS 3. Implementing Logic in CMOS 3. Implementing Logic in CMOS Jacob Abraham Department of Electrical and Computer Engineering The University of Texas at Austin VLSI Design Fall 27 September, 27 ECE Department,

More information

2. Link and Memory Architectures and Technologies

2. Link and Memory Architectures and Technologies 2. Link and Memory Architectures and Technologies 2.1 Links, Thruput/Buffering, Multi-Access Ovrhds 2.2 Memories: On-chip / Off-chip SRAM, DRAM 2.A Appendix: Elastic Buffers for Cross-Clock Commun. Manolis

More information

SigmaRAM Echo Clocks

SigmaRAM Echo Clocks SigmaRAM Echo s AN002 Introduction High speed, high throughput cell processing applications require fast access to data. As clock rates increase, the amount of time available to access and register data

More information

A 32nm, 0.9V Supply-Noise Sensitivity Tracking PLL for Improved Clock Data Compensation Featuring a Deep Trench Capacitor Based Loop Filter

A 32nm, 0.9V Supply-Noise Sensitivity Tracking PLL for Improved Clock Data Compensation Featuring a Deep Trench Capacitor Based Loop Filter A 32nm, 0.9V Supply-Noise Sensitivity Tracking PLL for Improved Clock Data Compensation Featuring a Deep Trench Capacitor Based Loop Filter Bongjin Kim, Weichao Xu, and Chris H. Kim University of Minnesota,

More information

ISSN Vol.08,Issue.07, July-2016, Pages:

ISSN Vol.08,Issue.07, July-2016, Pages: ISSN 2348 2370 Vol.08,Issue.07, July-2016, Pages:1312-1317 www.ijatir.org Low Power Asynchronous Domino Logic Pipeline Strategy Using Synchronization Logic Gates H. NASEEMA BEGUM PG Scholar, Dept of ECE,

More information

The Design and Implementation of a Low-Latency On-Chip Network

The Design and Implementation of a Low-Latency On-Chip Network The Design and Implementation of a Low-Latency On-Chip Network Robert Mullins 11 th Asia and South Pacific Design Automation Conference (ASP-DAC), Jan 24-27 th, 2006, Yokohama, Japan. Introduction Current

More information

Optimization of Phase- Locked Loop Circuits via Geometric Programming

Optimization of Phase- Locked Loop Circuits via Geometric Programming Optimization of Phase- Locked Loop Circuits via Geometric Programming D. Colleran, C. Portmann, A. Hassibi, C. Crusius, S. S. Mohan, S. Boyd, T. H. Lee, and M. Hershenson Outline Motivation Geometric programming

More information

Latch Based Design (1A) Young Won Lim 2/18/15

Latch Based Design (1A) Young Won Lim 2/18/15 Latch Based Design (1A) Copyright (c) 2015 Young W. Lim. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any

More information

GHz Asynchronous SRAM in 65nm. Jonathan Dama, Andrew Lines Fulcrum Microsystems

GHz Asynchronous SRAM in 65nm. Jonathan Dama, Andrew Lines Fulcrum Microsystems GHz Asynchronous SRAM in 65nm Jonathan Dama, Andrew Lines Fulcrum Microsystems Context Three Generations in Production, including: Lowest latency 24-port 10G L2 Ethernet Switch Lowest Latency 24-port 10G

More information

Figure 1. An 8-bit Superset Adder.

Figure 1. An 8-bit Superset Adder. Improving the Adder: A Fault-tolerant, Reconfigurable Parallel Prefix Adder Kyle E. Powers Dar-Eaum A. Nam Eric A. Llana ECE 4332 Fall 2012 University of Virginia @virginia.edu ABSTRACT

More information

Frequency and Voltage Scaling Design. Ruixing Yang

Frequency and Voltage Scaling Design. Ruixing Yang Frequency and Voltage Scaling Design Ruixing Yang 04.12.2008 Outline Dynamic Power and Energy Voltage Scaling Approaches Dynamic Voltage and Frequency Scaling (DVFS) CPU subsystem issues Adaptive Voltages

More information

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Thoai Nam Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy & David a Patterson,

More information

High Performance and Low Power Design Techniques for ASIC and Custom in Nanometer Technologies. David Chinnery

High Performance and Low Power Design Techniques for ASIC and Custom in Nanometer Technologies. David Chinnery High Performance and Low Power Design Techniques for ASIC and Custom in Nanometer Technologies David Chinnery Outline Introduction Microarchitecture Clock distribution, clock gating, and registers Logic

More information

Near-Threshold Computing in FinFET Technologies: Opportunities for Improved Voltage Scalability

Near-Threshold Computing in FinFET Technologies: Opportunities for Improved Voltage Scalability Near-Threshold Computing in FinFET Technologies: Opportunities for Improved Voltage Scalability Nathaniel Pinckney 1, Lucian Shifren 2, Brian Cline 3, Saurabh Sinha 3, Supreet Jeloka 1, Ronald G. Dreslinski

More information

DYNAMIC CIRCUIT TECHNIQUE FOR LOW- POWER MICROPROCESSORS Kuruva Hanumantha Rao 1 (M.tech)

DYNAMIC CIRCUIT TECHNIQUE FOR LOW- POWER MICROPROCESSORS Kuruva Hanumantha Rao 1 (M.tech) DYNAMIC CIRCUIT TECHNIQUE FOR LOW- POWER MICROPROCESSORS Kuruva Hanumantha Rao 1 (M.tech) K.Prasad Babu 2 M.tech (Ph.d) hanumanthurao19@gmail.com 1 kprasadbabuece433@gmail.com 2 1 PG scholar, VLSI, St.JOHNS

More information

Spare Block Cache Architecture to Enable Low-Voltage Operation

Spare Block Cache Architecture to Enable Low-Voltage Operation Portland State University PDXScholar Dissertations and Theses Dissertations and Theses 1-1-2011 Spare Block Cache Architecture to Enable Low-Voltage Operation Nafiul Alam Siddique Portland State University

More information

THE synchronous DRAM (SDRAM) has been widely

THE synchronous DRAM (SDRAM) has been widely IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 10, OCTOBER 1997 1597 A Study of Pipeline Architectures for High-Speed Synchronous DRAM s Hoi-Jun Yoo Abstract The performances of SDRAM s with different

More information

A 19.4 nj/decision 364K Decisions/s In-Memory Random Forest Classifier in 6T SRAM Array. Mingu Kang, Sujan Gonugondla, Naresh Shanbhag

A 19.4 nj/decision 364K Decisions/s In-Memory Random Forest Classifier in 6T SRAM Array. Mingu Kang, Sujan Gonugondla, Naresh Shanbhag A 19.4 nj/decision 364K Decisions/s In-Memory Random Forest Classifier in 6T SRAM Array Mingu Kang, Sujan Gonugondla, Naresh Shanbhag University of Illinois at Urbana Champaign Machine Learning under Resource

More information

10/5/2016. Review of General Bit-Slice Model. ECE 120: Introduction to Computing. Initialization of a Serial Comparator

10/5/2016. Review of General Bit-Slice Model. ECE 120: Introduction to Computing. Initialization of a Serial Comparator University of Illinois at Urbana-Champaign Dept. of Electrical and Computer Engineering ECE 120: Introduction to Computing Example of Serialization Review of General Bit-Slice Model General model parameters

More information

Jan Rabaey Homework # 7 Solutions EECS141

Jan Rabaey Homework # 7 Solutions EECS141 UNIVERSITY OF CALIFORNIA College of Engineering Department of Electrical Engineering and Computer Sciences Last modified on March 30, 2004 by Gang Zhou (zgang@eecs.berkeley.edu) Jan Rabaey Homework # 7

More information

Introduction to CMOS VLSI Design Lecture 13: SRAM

Introduction to CMOS VLSI Design Lecture 13: SRAM Introduction to CMOS VLSI Design Lecture 13: SRAM David Harris Harvey Mudd College Spring 2004 1 Outline Memory Arrays SRAM Architecture SRAM Cell Decoders Column Circuitry Multiple Ports Serial Access

More information

16x16 Multiplier Design Using Asynchronous Pipeline Based On Constructed Critical Data Path

16x16 Multiplier Design Using Asynchronous Pipeline Based On Constructed Critical Data Path Volume 4 Issue 01 Pages-4786-4792 January-2016 ISSN (e): 2321-7545 Website: http://ijsae.in 16x16 Multiplier Design Using Asynchronous Pipeline Based On Constructed Critical Data Path Authors Channa.sravya

More information

Emerging DRAM Technologies

Emerging DRAM Technologies 1 Emerging DRAM Technologies Michael Thiems amt051@email.mot.com DigitalDNA Systems Architecture Laboratory Motorola Labs 2 Motivation DRAM and the memory subsystem significantly impacts the performance

More information

Gigascale Integration Design Challenges & Opportunities. Shekhar Borkar Circuit Research, Intel Labs October 24, 2004

Gigascale Integration Design Challenges & Opportunities. Shekhar Borkar Circuit Research, Intel Labs October 24, 2004 Gigascale Integration Design Challenges & Opportunities Shekhar Borkar Circuit Research, Intel Labs October 24, 2004 Outline CMOS technology challenges Technology, circuit and μarchitecture solutions Integration

More information

Analysis of Different Multiplication Algorithms & FPGA Implementation

Analysis of Different Multiplication Algorithms & FPGA Implementation IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 4, Issue 2, Ver. I (Mar-Apr. 2014), PP 29-35 e-issn: 2319 4200, p-issn No. : 2319 4197 Analysis of Different Multiplication Algorithms & FPGA

More information

Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity. Donghyuk Lee Carnegie Mellon University

Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity. Donghyuk Lee Carnegie Mellon University Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity Donghyuk Lee Carnegie Mellon University Problem: High DRAM Latency processor stalls: waiting for data main memory high latency Major bottleneck

More information

VLSI Test Technology and Reliability (ET4076)

VLSI Test Technology and Reliability (ET4076) VLSI Test Technology and Reliability (ET4076) Lecture 8(2) I DDQ Current Testing (Chapter 13) Said Hamdioui Computer Engineering Lab Delft University of Technology 2009-2010 1 Learning aims Describe the

More information

Selecting PLLs for ASIC Applications Requires Tradeoffs

Selecting PLLs for ASIC Applications Requires Tradeoffs Selecting PLLs for ASIC Applications Requires Tradeoffs John G. Maneatis, Ph.., President, True Circuits, Inc. Los Altos, California October 7, 2004 Phase-Locked Loops (PLLs) are commonly used to perform

More information

1. Designing a 64-word Content Addressable Memory Background

1. Designing a 64-word Content Addressable Memory Background UNIVERSITY OF CALIFORNIA College of Engineering Department of Electrical Engineering and Computer Sciences Project Phase I Specification NTU IC541CA (Spring 2004) 1. Designing a 64-word Content Addressable

More information

Chapter 9. Design for Testability

Chapter 9. Design for Testability Chapter 9 Design for Testability Testability CUT = Circuit Under Test A design property that allows: cost-effective development of tests to be applied to the CUT determining the status of the CUT (normal

More information

VLSI microarchitecture. Scaling. Toni Juan

VLSI microarchitecture. Scaling. Toni Juan VLSI microarchitecture Scaling Toni Juan Short index Administrative glass schedule projects? Topic of the day: Scaling VLSI microarchitecture Toni Juan, Nov 2000 2 Administrative Class Schedule Day Topic

More information

Readings. H+P Appendix A, Chapter 2.3 This will be partly review for those who took ECE 152

Readings. H+P Appendix A, Chapter 2.3 This will be partly review for those who took ECE 152 Readings H+P Appendix A, Chapter 2.3 This will be partly review for those who took ECE 152 Recent Research Paper The Optimal Logic Depth Per Pipeline Stage is 6 to 8 FO4 Inverter Delays, Hrishikesh et

More information

ASNT1016-PQA 16:1 MUX-CMU

ASNT1016-PQA 16:1 MUX-CMU 16:1 MUX-CMU 16 to 1 multiplexer (MUX) with integrated CMU (clock multiplication unit). PLL-based architecture featuring both counter and forward clocking modes. Supports multiple data rates in the 9.8-12.5Gb/s

More information

DIVA: A Reliable Substrate for Deep Submicron Microarchitecture Design

DIVA: A Reliable Substrate for Deep Submicron Microarchitecture Design DIVA: A Reliable Substrate for Deep Submicron Microarchitecture Design Or, how I learned to stop worrying and love complexity. http://www.eecs.umich.edu/~taustin Microprocessor Verification Task of determining

More information

Memory, Latches, & Registers

Memory, Latches, & Registers Memory, Latches, & Registers 1) Structured Logic Arrays 2) Memory Arrays 3) Transparent Latches 4) How to save a few bucks at toll booths 5) Edge-triggered Registers L13 Memory 1 General Table Lookup Synthesis

More information