Performance Optimization and Analysis of Blade Designs under Delay Variability
|
|
- Gervais Lewis
- 5 years ago
- Views:
Transcription
1 Performance Optimization and Analysis of Blade Designs under Delay Variability Dylan Hand, Hsin-Ho Huang, Benmao Chang, Yang Zhang *, Matheus Trevisan Moreira, Melvin Breuer, Ney Laert Vilar Calazans, and Peter A. Beerel May 5th, 2015 * University of Southern California, Los Angeles, CA Pontifi cia Universidade Cato lica do Rio Grande do Sul, Porto Alegre, Brazil Dept. of Automation, Tsinghua University, Beijing, China
2 The Context: Delay Overheads Cycle time of clocked logic Logic Time Logic gates PVT margin Clock margin Flip-flop alignment Traditional synchronous design suffers from increased margins Worse at low and near-threshold regions [Dreslinksi et al., IEEE Proc. 2010] MOTIVATION 2
3 Number of Operations Potential of Average-Case Data Cycle time of clocked logic Data delay in Plasma CPU Logic Time Logic gates Worst-Case Data Delay Slower Delay variation due to data is rarely exploited in synchronous designs MOTIVATION 3
4 Number of Operations Potential of Average-Case Data Cycle time of clocked logic Worst-Case Delay Data delay in Plasma CPU Logic Time Logic gates Worst-Case Data Delay Slower Delay variation due to data is rarely exploited in synchronous designs MOTIVATION 3
5 Number of Operations Potential of Average-Case Data Cycle time of clocked logic Average Delay Worst-Case Delay Data delay in Plasma CPU Logic Time Logic gates Worst-Case Data Delay Slower Delay variation due to data is rarely exploited in synchronous designs MOTIVATION 3
6 CLK Sample CLK Sample One Solution: Resiliency Controller A Asynchronous Controller Controller B Err Reconfigurable Delay Lines Err Combinational Logic Error Detection Logic Single Rail Datapath Error Detection Logic Error Detecting Latch Timing errors delay handshaking by the resiliency window RESILIENCY 4
7 CLK Sample CLK Sample One Solution: Resiliency Controller A Controller B Err Err Combinational Logic Error Detection Logic Error Detection Logic Timing errors delay handshaking by the resiliency window RESILIENCY 4
8 CLK Sample CLK Sample One Solution: Resiliency Controller A Controller B Err Err Combinational Logic Error Detection Logic Error Detection Logic Timing errors delay handshaking by the resiliency window RESILIENCY 4
9 CLK Sample CLK Sample One Solution: Resiliency Controller A Controller B Err Err Combinational Logic Error Detection Logic Error Detection Logic Timing errors delay handshaking by the resiliency window RESILIENCY 4
10 CLK Sample CLK Sample One Solution: Resiliency Controller A Controller B Err Err Combinational Logic Error Detection Logic Error Detection Logic Timing errors delay handshaking by the resiliency window RESILIENCY 4
11 CLK Sample CLK Sample One Solution: Resiliency Controller A Controller B Err = 1 Err Combinational Logic Error Detection Logic Error Detection Logic Timing errors delay handshaking by the resiliency window RESILIENCY 4
12 CLK Sample CLK Sample One Solution: Resiliency Controller A Controller B Err = 1 Err Combinational Logic Error Detection Logic Error Detection Logic Timing errors delay handshaking by the resiliency window RESILIENCY 4
13 CLK Sample CLK Sample One Solution: Resiliency Controller A Controller B Err Err Combinational Logic Error Detection Logic Error Detection Logic Timing errors delay handshaking by the resiliency window RESILIENCY 4
14 CLK Sample CLK Sample One Solution: Resiliency Controller A Controller B Err Err Combinational Logic Error Detection Logic Error Detection Logic Timing errors delay handshaking by the resiliency window RESILIENCY 4
15 Number of Operations Resiliency Performance Benefit Nominal Delay Worst-Case Delay Data delay in Plasma CPU Delay Key Question: How do we set to optimize performance RESILIENCY 5
16 Outline Performance Optimization Delay models Impact of delay line quantization Impact of metastability Comparison to Bubble Razor Case Study Analyze and optimize a 3-stage Blade CPU Conclusions and Future Work OUTLINE 6
17 Probability Probability Frequency Delay Models Normal Distribution Real World Distribution Logic Delay Logic Delay Our approach Log-Normal Distribution Logic Delay Analyze the performance of Blade for a variety of delay models DELAY MODELS 7
18 Probability Optimal Average-Case Performance σ μ + DELAY MODELS 8
19 Probability Probability Optimal Average-Case Performance P R ( d ) Probability of error (p) + : Average delay of Blade stage Definitions C : Clock Period / Cycle Time EC : Effective Clock Period p : Probability of error d : Average delay of Blade stage C = + d = + p 1-P R ( d ) + Optimal performance achieved by minimizing d Assumes backward latency is hidden via latch retiming DELAY MODELS 8
20 Optimal Probability of Error - p opt p opt observations Varies between 20% and 35% for lognormal distributions Significantly higher than in sync resiliency Higher Variance Constant for normal distributions! DELAY MODELS 9
21 Proof of constant p opt Assume worst case delay per stage is constant Worst case delay is set by mean, variance, and SER K = + K = μ + m σ Systematic Error Rate (ξ) sets the worst-case delay per stage, K ξ = 1 P R d C m = f(ξ) N DELAY MODELS 10
22 Proof of constant p opt Assume worst case delay per stage is constant Worst case delay is set by mean, variance, and SER K = + K = μ + m σ Recall: d = + p = 1 p + p K For Normal distribution: 1 p = 1 [1 + erf μ ] 2 2σ μ 1 2p = erf 2σ Taking inverse error function of both sides: erf 1 μ 1 2p = 2σ = 2σ[erf 1 1 2p ] + μ DELAY MODELS 10
23 Proof of constant p opt Assume worst case delay per stage is constant Worst case delay is set by mean, variance, and SER K = + K = μ + m σ Recall: Rewrite: d = + p = 1 p + p K d = 1 p [ 2σ[erf 1 1 2p ] + μ] + p K Minimize d by taking derivative and setting it equal to zero : d p = 1 + y 2 erf 1 y y 2 erf 1 y + m = 0 y = 1 2p Note y and p are independent of σ and μ! m depends only on ξ Implication: Tuning of delay line may target fixed probability! DELAY MODELS 10
24 (%) Optimal Size of Resiliency Window - Blade supports maximum of 50% of clock cycle Optimal is larger for designs with high-variance! DELAY LINE QUANTIZATION 11
25 Delay Delay Line Quantification Effects Recall: d = + p p * Quantification effects reduced due to inherent tradeoff between nominal delay and error penalty p * DELAY LINE QUANTIZATION 12
26 Delay Delay Line Quantification in BD Bundled Data: d = d Linear relationship between delay line quantization and average stage delay in Bundled Data DELAY LINE QUANTIZATION 13
27 No MS MS in (only) MS in Detection Logic Metastability Effects p : probability of timing violation t MS : time for metastability to resolve t MSE : MS in error-detecting latch t MSL : MS in detection logic E[delay] + + t MSL + t MSL METASTABILITY 14
28 No MS MS in (only) MS in Detection Logic Metastability Effects p : probability of timing violation t MS : time for metastability to resolve t MSE : MS in error-detecting latch t MSL : MS in detection logic Metastability resolution times most often hidden! E[delay] + + t MSL + t MSL METASTABILITY 14
29 Metastability Effects METASTABILITY 15
30 Metastability Effects METASTABILITY 15
31 Comparison to Sync Resiliency Synchronous EC set by systematic error rate N-Stage Rings F F F F Bubble Razor [Zhang,2014] EC = C 2 1 p 2N Blade EC = + p opt M S M S COMPARISON 16
32 Comparison to Sync Resiliency Synchronous EC set by systematic error rate Synchronous 35% 23% BR Bubble Razor [Zhang,2014] EC = C 2 1 p 2N Blade EC = + p opt Blade Normal Distribution COMPARISON 16
33 Comparison to Sync Resiliency Synchronous EC set by systematic error rate Bubble Razor [Zhang,2014] EC = C 2 1 p 2N Blade EC = + p opt Normal Distribution COMPARISON 16
34 Application to 3-Stage CPU Plasma MIPS OpenCore 28nm FDSOI Compare mathematical model of optimal resiliency window () with simulation results [1] CASE STUDY 17
35 Application to 3-Stage CPU 20 Distribution A Cycles (%) Delay (%) Distribution C Cycles (%) Delay (%) CASE STUDY 18
36 Application to 3-Stage CPU Distribution A Distribution B Distribution C Model Sim Model Sim Model Sim max 27% 35% 43% opt 26% 27% 34% 35% 39% 37% EC opt 74.8% 75.1% 71.3% 71.9% 75.1% 74.8% Model estimated optimal within 5.4% Optimal EC within 99% CASE STUDY 19
37 Application to 3-Stage CPU Distribution A Distribution B Distribution C Model Sim Model Sim Model Sim max 27% 35% 43% opt 26% 27% 34% 35% 39% 37% EC opt 74.8% 75.1% 71.3% 71.9% 75.1% 74.8% Ideal opt 48% 53% 46% Ideal EC opt 63.2% 67.8% 74.5% Model estimated optimal within 5.4% Optimal EC within 99% Model allows estimation of optimal w/o limitations of simulated design CASE STUDY 19
38 Summary and Conclusions Performance model Use either analytical and real world delay distributions Predicts performance within 99% accuracy Comparison to sync N-stage rings 23% better than Bubble Razor 35% better than traditional designs Several interesting conclusions Optimal error rate is relatively high and may be constant Programmable delay line need not be fine-grained Metastability impact is negligible Supporting larger resiliency windows may be useful CONCLUSIONS 20
39 Questions?
On the Reuse of Timing Resilient Architecture for Testing Path Delay Faults in Critical Paths
On the Reuse of Timing Resilient Architecture for Testing Path Delay Faults in Critical Paths Felipe A. Kuentzer, Leonardo R. Juracy and Alexandre M. Amory Faculty of Informatics, PUCRS University Av.
More informationArea Optimization of Resilient Designs Guided by a Mixed Integer Geometric Program
Area Optimization of Resilient Designs Guided by a Mixed Integer Geometric Program Hsin-Ho Huang 1 Huimei Cheng 1 Chris Chu 2 Peter A. Beerel 1 hsinhohu@usc.edu huimeich@usc.edu cnchu@iastate.edu pabeerel@usc.edu
More informationCHAPTER 3 ASYNCHRONOUS PIPELINE CONTROLLER
84 CHAPTER 3 ASYNCHRONOUS PIPELINE CONTROLLER 3.1 INTRODUCTION The introduction of several new asynchronous designs which provides high throughput and low latency is the significance of this chapter. The
More informationReliable Physical Unclonable Function based on Asynchronous Circuits
Reliable Physical Unclonable Function based on Asynchronous Circuits Kyung Ki Kim Department of Electronic Engineering, Daegu University, Gyeongbuk, 38453, South Korea. E-mail: kkkim@daegu.ac.kr Abstract
More informationPushPull: Short Path Padding for Timing Error Resilient Circuits YU-MING YANG IRIS HUI-RU JIANG SUNG-TING HO. IRIS Lab National Chiao Tung University
PushPull: Short Path Padding for Timing Error Resilient Circuits YU-MING YANG IRIS HUI-RU JIANG SUNG-TING HO IRIS Lab National Chiao Tung University Outline Introduction Problem Formulation Algorithm -
More informationChronos Latency - Pole Position Performance
WHITE PAPER Chronos Latency - Pole Position Performance By G. Rinaldi and M. T. Moreira, Chronos Tech 1 Introduction Modern SoC performance is often limited by the capability to exchange information at
More informationA Low-Power Field Programmable VLSI Based on Autonomous Fine-Grain Power Gating Technique
A Low-Power Field Programmable VLSI Based on Autonomous Fine-Grain Power Gating Technique P. Durga Prasad, M. Tech Scholar, C. Ravi Shankar Reddy, Lecturer, V. Sumalatha, Associate Professor Department
More informationEvaluation of Seed Selection Strategies for Vehicle to Vehicle Epidemic Information Dissemination
Evaluation of Seed Selection Strategies for Vehicle to Vehicle Epidemic Information Dissemination Richard Kershaw and Bhaskar Krishnamachari Ming Hsieh Department of Electrical Engineering, Viterbi School
More informationAIRbus Interface. Features Fixed width (8-, 16-, or 32-bit) data transfers (dependent on the width. Functional Description. General Arrangement
AIRbus Interface December 22, 2000; ver. 1.00 Functional Specification 9 Features Fixed width (8-, 16-, or 32-bit) data transfers (dependent on the width of the data bus) Read and write access Four-way
More informationFrequency and Voltage Scaling Design. Ruixing Yang
Frequency and Voltage Scaling Design Ruixing Yang 04.12.2008 Outline Dynamic Power and Energy Voltage Scaling Approaches Dynamic Voltage and Frequency Scaling (DVFS) CPU subsystem issues Adaptive Voltages
More informationSingle-Track Asynchronous Pipeline Templates Using 1-of-N Encoding
Single-Track Asynchronous Pipeline Templates Using 1-of-N Encoding Marcos Ferretti, Peter A. Beerel Department of Electrical Engineering Systems University of Southern California Los Angeles, CA 90089
More informationImplementing Tile-based Chip Multiprocessors with GALS Clocking Styles
Implementing Tile-based Chip Multiprocessors with GALS Clocking Styles Zhiyi Yu, Bevan Baas VLSI Computation Lab, ECE Department University of California, Davis, USA Outline Introduction Timing issues
More informationLecture 11 Logic Synthesis, Part 2
Lecture 11 Logic Synthesis, Part 2 Xuan Silvia Zhang Washington University in St. Louis http://classes.engineering.wustl.edu/ese461/ Write Synthesizable Code Use meaningful names for signals and variables
More informationUltra Low Power (ULP) Challenge in System Architecture Level
Ultra Low Power (ULP) Challenge in System Architecture Level - New architectures for 45-nm, 32-nm era ASP-DAC 2007 Designers' Forum 9D: Panel Discussion: Top 10 Design Issues Toshinori Sato (Kyushu U)
More informationISSN Vol.08,Issue.07, July-2016, Pages:
ISSN 2348 2370 Vol.08,Issue.07, July-2016, Pages:1312-1317 www.ijatir.org Low Power Asynchronous Domino Logic Pipeline Strategy Using Synchronization Logic Gates H. NASEEMA BEGUM PG Scholar, Dept of ECE,
More informationSystemVerilog Assertions for Clock-Domain-Crossing Data Paths. Don Mills Microchip Technology Inc.
SystemVerilog Assertions for Clock-Domain-Crossing Data Paths Don Mills Microchip Technology Inc. Outline Brief Review of CDC Concepts and Issues Basics of SystemVerilog Assertions Modeling Techniques
More informationGlobally Asynchronous Locally Synchronous FPGA Architectures
Globally Asynchronous Locally Synchronous FPGA Architectures Andrew Royal and Peter Y. K. Cheung Department of Electrical & Electronic Engineering, Imperial College, London, UK {a.royal, p.cheung}@imperial.ac.uk
More informationMinimizing Power Dissipation during. University of Southern California Los Angeles CA August 28 th, 2007
Minimizing Power Dissipation during Write Operation to Register Files Kimish Patel, Wonbok Lee, Massoud Pedram University of Southern California Los Angeles CA August 28 th, 2007 Introduction Outline Conditional
More informationLecture 41: Introduction to Reconfigurable Computing
inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 41: Introduction to Reconfigurable Computing Michael Le, Sp07 Head TA April 30, 2007 Slides Courtesy of Hayden So, Sp06 CS61c Head TA Following
More information10/5/2016. Review of General Bit-Slice Model. ECE 120: Introduction to Computing. Initialization of a Serial Comparator
University of Illinois at Urbana-Champaign Dept. of Electrical and Computer Engineering ECE 120: Introduction to Computing Example of Serialization Review of General Bit-Slice Model General model parameters
More informationL12: I/O Systems. Name: ID:
L12: I/O Systems Name: ID: Synchronous and Asynchronous Buses Synchronous Bus (e.g., processor-memory buses) Includes a clock in the control lines and has a fixed protocol for communication that is relative
More informationAn Asynchronous NoC Router in a 14nm FinFET Library: Comparison to an Industrial Synchronous Counterpart
An Asynchronous NoC Router in a 14nm FinFET Library: Comparison to an Industrial Synchronous Counterpart Weiwei Jiang Columbia University, USA Gabriele Miorandi University of Ferrara, Italy Wayne Burleson
More informationSynchronization In Digital Systems
2011 International Conference on Information and Network Technology IPCSIT vol.4 (2011) (2011) IACSIT Press, Singapore Synchronization In Digital Systems Ranjani.M. Narasimhamurthy Lecturer, Dr. Ambedkar
More informationLow Power Asynchronous Viterbi Decoder Using Hybrid Register Exchange Method
IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735.Volume 12, Issue 4, Ver. II (Jul.-Aug. 2017), PP 26-30 www.iosrjournals.org Low Power Asynchronous
More informationOptimization of Robust Asynchronous Circuits by Local Input Completeness Relaxation. Computer Science Department Columbia University
Optimization of Robust Asynchronous ircuits by Local Input ompleteness Relaxation heoljoo Jeong Steven M. Nowick omputer Science Department olumbia University Outline 1. Introduction 2. Background: Hazard
More informationBISTed cores and Test Time Minimization in NOC-based Systems
BISTed cores and Test Time Minimization in NOC-based Systems Érika Cota 1 Luigi Carro 1,2 Flávio Wagner 1 Marcelo Lubaszewski 1,2 1 PPGC - Instituto de Informática 2 PPGEE - Depto. Engenharia Elétrica
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationPOWER PERFORMANCE OPTIMIZATION METHODS FOR DIGITAL CIRCUITS
POWER PERFORMANCE OPTIMIZATION METHODS FOR DIGITAL CIRCUITS Radu Zlatanovici zradu@eecs.berkeley.edu http://www.eecs.berkeley.edu/~zradu Department of Electrical Engineering and Computer Sciences University
More informationImplementation of Asynchronous Topology using SAPTL
Implementation of Asynchronous Topology using SAPTL NARESH NAGULA *, S. V. DEVIKA **, SK. KHAMURUDDEEN *** *(senior software Engineer & Technical Lead, Xilinx India) ** (Associate Professor, Department
More informationOUTLINE Introduction Power Components Dynamic Power Optimization Conclusions
OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions 04/15/14 1 Introduction: Low Power Technology Process Hardware Architecture Software Multi VTH Low-power circuits Parallelism
More informationECE 485/585 Microprocessor System Design
Microprocessor System Design Lecture 4: Memory Hierarchy Memory Taxonomy SRAM Basics Memory Organization DRAM Basics Zeshan Chishti Electrical and Computer Engineering Dept Maseeh College of Engineering
More informationAdaptive Robustness Tuning for High Performance Domino Logic
Adaptive Robustness Tuning for High Performance Domino Logic Bharan Giridhar 1, David Fick 1, Matthew Fojtik 1, Sudhir Satpathy 1, David Bull 2, Dennis Sylvester 1 and David Blaauw 1 1 niversity of Michigan,
More informationMCD: A Multiple Clock Domain Microarchitecture
MCD: A Multiple Clock Domain Microarchitecture Dave Albonesi in collaboration with Greg Semeraro Grigoris Magklis Rajeev Balasubramonian Steve Dropsho Sandhya Dwarkadas Michael Scott Caveats We started
More informationA Low Power DDR SDRAM Controller Design P.Anup, R.Ramana Reddy
A Low Power DDR SDRAM Controller Design P.Anup, R.Ramana Reddy Abstract This paper work leads to a working implementation of a Low Power DDR SDRAM Controller that is meant to be used as a reference for
More informationPipelining. CSC Friday, November 6, 2015
Pipelining CSC 211.01 Friday, November 6, 2015 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory register file ALU data memory register file Not
More informationDYNAMICALLY SHIFTED SCRUBBING FOR FAST FPGA REPAIR. Leonardo P. Santos, Gabriel L. Nazar and Luigi Carro
DYNAMICALLY SHIFTED SCRUBBING FOR FAST FPGA REPAIR Leonardo P. Santos, Gabriel L. Nazar and Luigi Carro Instituto de Informática Universidade Federal do Rio Grande do Sul (UFRGS) Porto Alegre, RS - Brazil
More informationResearch Article Dynamic Reconfigurable Computing: The Alternative to Homogeneous Multicores under Massive Defect Rates
International Journal of Reconfigurable Computing Volume 2, Article ID 452589, 7 pages doi:.55/2/452589 Research Article Dynamic Reconfigurable Computing: The Alternative to Homogeneous Multicores under
More informationRe-Examining Conventional Wisdom for Networks-on-Chip in the Context of FPGAs
This work was funded by NSF. We thank Xilinx for their FPGA and tool donations. We thank Bluespec for their tool donations. Re-Examining Conventional Wisdom for Networks-on-Chip in the Context of FPGAs
More informationDDRO: A Novel Performance Monitoring Methodology Based on Design-Dependent Ring Oscillators
DDRO: A Novel Performance Monitoring Methodology Based on Design-Dependent Ring Oscillators Tuck-Boon Chan, Puneet Gupta, Andrew B. Kahng and Liangzhen Lai UC San Diego ECE and CSE Departments, La Jolla,
More informationPhysical Implementation of the DSPIN Network-on-Chip in the FAUST Architecture
1 Physical Implementation of the DSPI etwork-on-chip in the FAUST Architecture Ivan Miro-Panades 1,2,3, Fabien Clermidy 3, Pascal Vivet 3, Alain Greiner 1 1 The University of Pierre et Marie Curie, Paris,
More information8D-3. Experiences of Low Power Design Implementation and Verification. Shi-Hao Chen. Jiing-Yuan Lin
Experiences of Low Power Design Implementation and Verification Shi-Hao Chen Global Unichip Corp. Hsin-Chu Science Park, Hsin-Chu, Taiwan 300 +886-3-564-6600 hockchen@globalunichip.com Jiing-Yuan Lin Global
More informationPessimism Reduction In Static Timing Analysis Using Interdependent Setup and Hold Times
Pessimism Reduction In Static Timing Analysis Using Interdependent Setup and Hold Times Emre Salman, Ali Dasdan, Feroze Taraporevala, Kayhan Küçükçakar, and Eby G. Friedman Department of Electrical and
More informationOn Constructing Lower Power and Robust Clock Tree via Slew Budgeting
1 On Constructing Lower Power and Robust Clock Tree via Slew Budgeting Yeh-Chi Chang, Chun-Kai Wang and Hung-Ming Chen Dept. of EE, National Chiao Tung University, Taiwan 2012 年 3 月 29 日 Outline 2 Motivation
More informationEEL 4783: HDL in Digital System Design
EEL 4783: HDL in Digital System Design Lecture 10: Synthesis Optimization Prof. Mingjie Lin 1 What Can We Do? Trade-offs with speed versus area. Resource sharing for area optimization. Pipelining, retiming,
More informationINTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 9 /Issue 3 / OCT 2017
Design of Low Power Adder in ALU Using Flexible Charge Recycling Dynamic Circuit Pallavi Mamidala 1 K. Anil kumar 2 mamidalapallavi@gmail.com 1 anilkumar10436@gmail.com 2 1 Assistant Professor, Dept of
More informationSelecting PLLs for ASIC Applications Requires Tradeoffs
Selecting PLLs for ASIC Applications Requires Tradeoffs John G. Maneatis, Ph.., President, True Circuits, Inc. Los Altos, California October 7, 2004 Phase-Locked Loops (PLLs) are commonly used to perform
More informationAutomated versus Manual Design of Asynchronous Circuits in DSM Technologies
FACULDADE DE INFORMÁTICA PUCRS - Brazil http://www.inf.pucrs.br Automated versus Manual Design of Asynchronous Circuits in DSM Technologies Matheus Moreira, Bruno Oliveira, Julian Pontes, Ney Calazans
More informationBUS TIMING ANALYSIS. George E Hadley, Timothy Rogers, and David G Meyer 2018, Images Property of their Respective Owners
BUS TIMING ANALYSIS George E Hadley, Timothy Rogers, and David G Meyer 2018, Images Property of their Respective Owners LEARNING OBJECTIVES identify CPU and memory timing parameters draw a bus timing diagram
More informationCode No: R Set No. 1
Code No: R059210504 Set No. 1 II B.Tech I Semester Regular Examinations, November 2007 DIGITAL LOGIC DESIGN ( Common to Computer Science & Engineering, Information Technology and Computer Science & Systems
More informationLatch Based Design (1A) Young Won Lim 2/18/15
Latch Based Design (1A) Copyright (c) 2015 Young W. Lim. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any
More informationECE 636. Reconfigurable Computing. Lecture 2. Field Programmable Gate Arrays I
ECE 636 Reconfigurable Computing Lecture 2 Field Programmable Gate Arrays I Overview Anti-fuse and EEPROM-based devices Contemporary SRAM devices - Wiring - Embedded New trends - Single-driver wiring -
More informationProcessor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP
Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP Presenter: Course: EEC 289Q: Reconfigurable Computing Course Instructor: Professor Soheil Ghiasi Outline Overview of M.I.T. Raw processor
More informationHermes-GLP: A GALS Network on Chip Router with Power Control Techniques
IEEE Computer Society Annual Symposium on VLSI Hermes-GLP: A GALS Network on Chip Router with Power Control Techniques Julian Pontes 1, Matheus Moreira 2, Rafael Soares 3, Ney Calazans 4 Faculty of Informatics,
More informationStephen Churcher, Tom Kean, and Bill Wilkie Xilinx Inc. (presented by Raj Patel)
The XC6200 astmap TM Processor Interface Stephen Churcher, Tom Kean, and ill Wilkie Xilinx Inc. (presented by Raj Patel) bstract The Xilinx XC6200 is the first commercially available PG to be specifically
More informationAsynchronous Elastic Dataflows by Leveraging Clocked EDA
Optimised Synthesis of Asynchronous Elastic Dataflows by Leveraging Clocked EDA Mahdi Jelodari Mamaghani, Jim Garside, Will Toms, & Doug Edwards Verona, Italy 29 th August 2014 Motivation: Automatic GALSification
More informationThe Memory Hierarchy & Cache
Removing The Ideal Memory Assumption: The Memory Hierarchy & Cache The impact of real memory on CPU Performance. Main memory basic properties: Memory Types: DRAM vs. SRAM The Motivation for The Memory
More informationQUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose
QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose Department of Electrical and Computer Engineering University of California,
More informationMultimedia-Systems. Operating Systems. Prof. Dr.-Ing. Ralf Steinmetz Prof. Dr. rer. nat. Max Mühlhäuser Prof. Dr.-Ing. Wolfgang Effelsberg
Multimedia-Systems Operating Systems Prof. Dr.-Ing. Ralf Steinmetz Prof. Dr. rer. nat. Max Mühlhäuser Prof. Dr.-Ing. Wolfgang Effelsberg WE: University of Mannheim, Dept. of Computer Science Praktische
More informationA Novel Pseudo 4 Phase Dual Rail Asynchronous Protocol with Self Reset Logic & Multiple Reset
A Novel Pseudo 4 Phase Dual Rail Asynchronous Protocol with Self Reset Logic & Multiple Reset M.Santhi, Arun Kumar S, G S Praveen Kalish, Siddharth Sarangan, G Lakshminarayanan Dept of ECE, National Institute
More informationSymmetrical Buffered Clock-Tree Synthesis with Supply-Voltage Alignment
Symmetrical Buffered Clock-Tree Synthesis with Supply-Voltage Alignment Xin-Wei Shih, Tzu-Hsuan Hsu, Hsu-Chieh Lee, Yao-Wen Chang, Kai-Yuan Chao 2013.01.24 1 Outline 2 Clock Network Synthesis Clock network
More informationKyoung Hwan Lim and Taewhan Kim Seoul National University
Kyoung Hwan Lim and Taewhan Kim Seoul National University Table of Contents Introduction Motivational Example The Proposed Algorithm Experimental Results Conclusion In synchronous circuit design, all sequential
More informationA 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications
A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications Ju-Ho Sohn, Jeong-Ho Woo, Min-Wuk Lee, Hye-Jung Kim, Ramchan Woo, Hoi-Jun Yoo Semiconductor System
More informationAsynchronous Circuits: An Increasingly Practical Design Solution
Asynchronous Circuits: An Increasingly Practical Design Solution Peter A. Beerel Fulcrum Microsystems Calabasas Hills, CA 91301 and Electrical Engineering, Systems Division University of Southern California
More informationOutline EEL 5764 Graduate Computer Architecture. Chapter 3 Limits to ILP and Simultaneous Multithreading. Overcoming Limits - What do we need??
Outline EEL 7 Graduate Computer Architecture Chapter 3 Limits to ILP and Simultaneous Multithreading! Limits to ILP! Thread Level Parallelism! Multithreading! Simultaneous Multithreading Ann Gordon-Ross
More informationDYNAMIC CIRCUIT TECHNIQUE FOR LOW- POWER MICROPROCESSORS Kuruva Hanumantha Rao 1 (M.tech)
DYNAMIC CIRCUIT TECHNIQUE FOR LOW- POWER MICROPROCESSORS Kuruva Hanumantha Rao 1 (M.tech) K.Prasad Babu 2 M.tech (Ph.d) hanumanthurao19@gmail.com 1 kprasadbabuece433@gmail.com 2 1 PG scholar, VLSI, St.JOHNS
More informationA Framework for Systematic Evaluation and Exploration of Design Rules
A Framework for Systematic Evaluation and Exploration of Design Rules Rani S. Ghaida* and Prof. Puneet Gupta EE Dept., University of California, Los Angeles (rani@ee.ucla.edu), (puneet@ee.ucla.edu) Work
More informationChapter 8. Coping with Physical Failures, Soft Errors, and Reliability Issues. System-on-Chip EE141 Test Architectures Ch. 8 Physical Failures - P.
Chapter 8 Coping with Physical Failures, Soft Errors, and Reliability Issues System-on-Chip EE141 Test Architectures Ch. 8 Physical Failures - P. 1 1 What is this chapter about? Gives an Overview of and
More informationA Hybrid Load Balance Mechanism for Distributed Home Agents in Mobile IPv6
A Hybrid Load Balance Mechanism for Distributed Home Agents in Mobile IPv6 1 Hui Deng 2Xiaolong Huang 3Kai Zhang 3 Zhisheng Niu 1Masahiro Ojima 1R&D Center Hitachi (China) Ltd. Beijing 100004, China 2Dept.
More informationECE232: Hardware Organization and Design
ECE232: Hardware Organization and Design Lecture 21: Memory Hierarchy Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Overview Ideally, computer memory would be large and fast
More informationQuantifying the Dynamic Ocean Surface Using Underwater Radiometric Measurements
DISTRIBUTION STATEMENT A. Approved for public release; distribution is unlimited. Quantifying the Dynamic Ocean Surface Using Underwater Radiometric Measurements Dick K.P. Yue Center for Ocean Engineering
More informationECE 551: Digital System *
ECE 551: Digital System * Design & Synthesis Lecture Set 5 5.1: Verilog Behavioral Model for Finite State Machines (FSMs) 5.2: Verilog Simulation I/O and 2001 Standard (In Separate File) 3/4/2003 1 Explicit
More informationECE 752 Adv. Computer Architecture I
. UIVERSIY OF WISCOSI ECE 752 Adv. Computer Architecture I Midterm Exam 1 Held in class Wednesday, March 9, 2005 ame: his exam is open books, open notes, and open all handouts (including previous homeworks
More informationModeling Arbitrator Delay-Area Dependencies in Customizable Instruction Set Processors
Modeling Arbitrator Delay-Area Dependencies in Customizable Instruction Set Processors Siew-Kei Lam Centre for High Performance Embedded Systems, Nanyang Technological University, Singapore (assklam@ntu.edu.sg)
More informationFast Flexible FPGA-Tuned Networks-on-Chip
This work was funded by NSF. We thank Xilinx for their FPGA and tool donations. We thank Bluespec for their tool donations. Fast Flexible FPGA-Tuned Networks-on-Chip Michael K. Papamichael, James C. Hoe
More informationPOWER consumption has become one of the most important
704 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 39, NO. 4, APRIL 2004 Brief Papers High-Throughput Asynchronous Datapath With Software-Controlled Voltage Scaling Yee William Li, Student Member, IEEE, George
More informationThe Ohio State University Columbus, Ohio, USA Universidad Autónoma de Nuevo León San Nicolás de los Garza, Nuevo León, México, 66450
Optimization and Analysis of Variability in High Precision Injection Molding Carlos E. Castro 1, Blaine Lilly 1, José M. Castro 1, and Mauricio Cabrera Ríos 2 1 Department of Industrial, Welding & Systems
More informationLecture 24: Sequential Logic Design. Let s refresh our memory.
18 100 Lecture 24: equential Logic esign 15 L24 1 James C. Hoe ept of ECE, CMU April 21, 2015 Today s Goal: tart thinking about stateful stuff Announcements: Read Rizzoni 12.6 HW 9 due Exam 3 on April
More informationFlexible Architecture Research Machine (FARM)
Flexible Architecture Research Machine (FARM) RAMP Retreat June 25, 2009 Jared Casper, Tayo Oguntebi, Sungpack Hong, Nathan Bronson Christos Kozyrakis, Kunle Olukotun Motivation Why CPUs + FPGAs make sense
More informationHow to Synchronize a Pausible Clock to a Reference. Robert Najvirt, Andreas Steininger
How to Synchronize a Pausible Clock to a Reference Robert Najvirt, Andreas Steininger GALS Communication The communication between two (locally) synchronous modules has inevitable potential for metastable
More informationChallenges in Verification of Clock Domain Crossings
Challenges in Verification of Clock Domain Crossings Vishnu C. Vimjam and Al Joseph Real Intent Inc., Sunnyvale, CA, USA Notice of Copyright This material is protected under the copyright laws of the U.S.
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationIN this letter we focus on OpenFlow-based network state
1 On the Impact of Networ State Collection on the Performance of SDN Applications Mohamed Aslan, and Ashraf Matrawy Carleton University, ON, Canada Abstract Intelligent and autonomous SDN applications
More informationThe Impact of Wave Pipelining on Future Interconnect Technologies
The Impact of Wave Pipelining on Future Interconnect Technologies Jeff Davis, Vinita Deodhar, and Ajay Joshi School of Electrical and Computer Engineering Georgia Institute of Technology Atlanta, GA 30332-0250
More informationECE 669 Parallel Computer Architecture
ECE 669 Parallel Computer Architecture Lecture 9 Workload Evaluation Outline Evaluation of applications is important Simulation of sample data sets provides important information Working sets indicate
More informationSEE Tolerant Self-Calibrating Simple Fractional-N PLL
SEE Tolerant Self-Calibrating Simple Fractional-N PLL Robert L. Shuler, Avionic Systems Division, NASA Johnson Space Center, Houston, TX 77058 Li Chen, Department of Electrical Engineering, University
More information4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16
4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3 Emil Sekerinski, McMaster University, Fall Term 2015/16 Instruction Execution Consider simplified MIPS: lw/sw rt, offset(rs) add/sub/and/or/slt
More informationA Comparison of Alternative Encoding Mechanisms for Web Services 1
A Comparison of Alternative Encoding Mechanisms for Web Services 1 Min Cai, Shahram Ghandeharizadeh, Rolfe Schmidt, Saihong Song Computer Science Department University of Southern California Los Angeles,
More informationDepartment of Electrical Engineering and Computer Sciences Fall 2016 Instructors: Randy Katz, Bernhard Boser
University of California, Berkeley College of Engineering Department of Electrical Engineering and Computer Sciences Fall 2016 Instructors: Randy Katz, Bernhard Boser 2016-12-16 L CS61C FINAL J After the
More information3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?
CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:
More informationEECS150 - Digital Design Lecture 17 Memory 2
EECS150 - Digital Design Lecture 17 Memory 2 October 22, 2002 John Wawrzynek Fall 2002 EECS150 Lec17-mem2 Page 1 SDRAM Recap General Characteristics Optimized for high density and therefore low cost/bit
More informationOutline. Parity-based ECC and Mechanism for Detecting and Correcting Soft Errors in On-Chip Communication. Outline
Parity-based ECC and Mechanism for Detecting and Correcting Soft Errors in On-Chip Communication Khanh N. Dang and Xuan-Tu Tran Email: khanh.n.dang@vnu.edu.vn VNU Key Laboratory for Smart Integrated Systems
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationSequential Circuit Design: Principle
Sequential Circuit Design: Principle Chapter 8 1 Outline 1. Overview on sequential circuits 2. Synchronous circuits 3. Danger of synthesizing asynchronous circuit 4. Inference of basic memory elements
More informationThe Design of the KiloCore Chip
The Design of the KiloCore Chip Aaron Stillmaker*, Brent Bohnenstiehl, Bevan Baas DAC 2017: Design Challenges of New Processor Architectures University of California, Davis VLSI Computation Laboratory
More informationEECS Components and Design Techniques for Digital Systems. Lec 07 PLAs and FSMs 9/ Big Idea: boolean functions <> gates.
Review: minimum sum-of-products expression from a Karnaugh map EECS 5 - Components and Design Techniques for Digital Systems Lec 7 PLAs and FSMs 9/2- David Culler Electrical Engineering and Computer Sciences
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationBuilt-In Self-Test for Regular Structure Embedded Cores in System-on-Chip
Built-In Self-Test for Regular Structure Embedded Cores in System-on-Chip Srinivas Murthy Garimella Master s Thesis Defense Thesis Advisor: Dr. Charles E. Stroud Committee Members: Dr. Victor P. Nelson
More informationDNN ENGINE: A 16nm Sub-uJ DNN Inference Accelerator for the Embedded Masses
DNN ENGINE: A 16nm Sub-uJ DNN Inference Accelerator for the Embedded Masses Paul N. Whatmough 1,2 S. K. Lee 2, N. Mulholland 2, P. Hansen 2, S. Kodali 3, D. Brooks 2, G.-Y. Wei 2 1 ARM Research, Boston,
More informationAn A-FPGA Architecture for Relative Timing Based Asynchronous Designs
An A-FPGA Architecture for Relative Timing Based Asynchronous Designs Jotham Vaddaboina Manoranjan Kenneth S. Stevens Department of Electrical and Computer Engineering University of Utah Abstract This
More informationCOMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More information