Robust System Design with MPSoCs Unique Opportunities
|
|
- Marion Agnes Greene
- 5 years ago
- Views:
Transcription
1 Robust System Design with MPSoCs Unique Opportunities Subhasish Mitra Robust Systems Group Departments of Electrical Eng. & Computer Sc. Stanford University Acknowledgment: Stanford Robust Systems Group Students & Collaborators Robust Systems in Scaled CMOS Failure rate Future: Burn-in difficult Soft errors, erratic bits, process variation Future: NBTI aging, etc. Past Infant mortality Normal lifetime Wearout Time Radiation, Erratic bits Aging, Infant mortality, Process variation 2 Burn-in & Test Socket Workshop Inputs Design errors, Software failures Robust System Malicious attacks, Human errors Acceptable outputs Performance Power Availability, Security 2 Page 3-2-
2 Globally Optimized Robust System Design + Traditional mainframes Globally optimized robust system App. Failure Prediction Recognition Mining Synthesis (RMS) Cost Handheld, mobile devices, ASICs Our Goal Runtime Arch. Failure-Awareness / Resilience + Circuit BISER On-line Self Test 3 Multi-Core Robust System Opportunities Multi-core massive redundancy (e.g., TMR) Reconfigurable reliability turn protection on / off Power expensive, LOCAL gates inexpensive On-line self test unique multi-core opportunities Required for Circuit Failure Prediction Essential for intelligent sparing Asymmetric core-level reliability Globally optimized performance vs. resilience 4 Page
3 Outline Introduction Built-In Soft Error Resilience (BISER) Circuit failure prediction Error Resilient System Architecture (ERSA) Conclusion 5 Key Takeaway Logic soft errors extremely important Classical error detection + recovery Expensive & complex Built-In Soft Error Resilience (BISER) Best of all Efficient & practical Latch erratic errors also corrected CORRECT Errors DON T Detect!! 6 Page
4 Who Cares About Soft Errors? Transient errors: α-particles, neutrons 2K processor server farm Comb. logic major error every 2 days Silent data corruption (SDC) $ 2K $ 3,66 deposit Latch Un protected memory $ 2K $ 52,768 deposit Detected but uncorrected (DUE) Soft error rate contributions Downtime cost: $K - $M / hour 7 BISER for Latch Errors A B C-element (A, B) Previous value Previous value IN Comb. logic OUT Latch D C Weak keeper OUT D Clock C C-element Redundant Latch (Scan Test & Debug reuse) 8 Page
5 BISER Latch Error Correction Principle System Latch D C C- element Weak Keeper D C Redundant Latch (Test & Debug Reuse) Error Blocked System Output Key Observation: Latches vulnerable only in Opaque state (CLK = ) 9 BISER Scan Flip-flop + Reconfigurable Protection Design Reuse Enabled Scan Clock B = Scan Data Scan Clock A Capture = + & Scan / Checking Flip-flop D C 2D C2 D C Scan Output Keeper Update System Data System Clock D C D C 2D C2 System Flip- flop System Output C-element Page
6 Architecture-Aware BISER Insertion cumulative error coverage % 8% 6% 4% 2% X chip-level protection 2X 2.5% power penalty % % 2% 4% 6% 8% % cumulative latch coverage Alpha 2264 Fault Injection 9% chip-level power penalty Ack: Prof. S.J. Patel, UIUC for fault injector Optimized BISER insertion [Seshia, Li & Mitra, DATE 7] Maximize protection, minimize power penalty BISER: Latch + Combinational Logic Errors IN Comb. logic OUT Delay (~ 2ps) OUT2 Incremental power cost over latch error correction: minimal τ Latch D C D C Latch Clock C C- element Weak keeper OUT Required for latch error correction 2 Page
7 BISER vs. Traditional Techniques BISER DICE Duplicate Multi-thread Latch SER > 25X 25X Detected Detected Comb. logic SER 2-64X Ineffective Detected Detected Chip energy 6 9% > 5% 4-% ~ 4 % (?) Speed penalty Latch: % % Minimal ~ 5% Comb: 5% Die size increase ~ None ~ None Yes Unclear Downtime None None Yes Yes Recovery None None Complex Complex Configurability Yes No Yes Yes Applicability Unlimited Unlimited Unlimited Processor 3 Outline Introduction Built-In Soft Error Resilience (BISER) Circuit failure prediction Error Resilient System Architecture (ERSA) Conclusion 4 Page
8 Key Takeaway Failure rate Solution: Circuit Failure Prediction Future: Burn-in difficult Infant mortality Superior to error detection Normal lifetime Practical test chip prototype Future: NBTI aging Wearout Past Time Effective up to 4x aging guardband reduction MPSoC opportunities: On-line self-test & adaptation 5 Circuit Failure Prediction Predict failures BEFORE errors appear Collect data over time: on-chip sensors Applicability: Transistor aging, infant mortality Failure Prediction Before errors appear Error Detection After errors appear + No corrupt data & states Corrupt data & states + Self-diagnosis possible Limited diagnosis A little fire is quickly trodden out; Which, being suffer'd, rivers cannot quench. William Shakespeare King Henry the Sixth, Part III 6 Page
9 Failure Prediction for Transistor Aging Small speed guardband Tg (e.g., for 5 days) Predict aging: Special Sensors + On-line Self-Test Guaranteed correct outputs No End of 5 days Analyze prediction results Enough aging? Yes Self-correct / Self-adapt Special flip-flops with Aging Resistant Built-In Aging Sensor OpenRISC core (65nm technology) Ring osc. / Temp. sensors inadequate Speed penalty Chip-level power penalty Die size increase Design flow impact Adjust V t (Forward body bias?) Adjust V dd (Runaway?) Adjust speed? Core sparing < % <.4% Not expected Minimal 7 Aging Sensor Principle Clock Comb. logic Clock Delayed Clock OUT Clock D Tg Flip-Flop with Built-in Aging Sensor Delay Element Stability Checker Sticky Latch Stability Output OUT (Today) OUT (5 days later) clock D Flip-Flop Output Detect transitions during Tg No additional hold time constraint 8 Page
10 Circuit Failure Prediction Benefits Reduced guardband: close to best-case performance Clock Clock 7 yr. worst-case guardband Reduced initial guardband (Tg) Combinational logic output stable at time Clock speed GHz 3 GHz % Guardband Reduction 7-yr worst guardband (% clock period) Initial guardband (Tg) day None days None 2 48 CORRECT OUTPUTS GUARANTEED 9 Built-In Aging Sensor: Test Chip Prototype Clock Stability checker Test Chip Waveforms (Measured) Signal transition outside Tg, Stability checker output = Signal transition inside Tg, Stability checker output = voltage (au) voltage (au) time (au) time (au) 2 Page 3-2-
11 Outline Introduction Built-In Soft Error Resilience (BISER) Circuit failure prediction Error Resilient System Architecture (ERSA) Conclusion 2 Emerging Probabilistic Killer Applications Recognition, Mining, Synthesis (RMS) Cognitive, computational genomics, vision, Large data sets Highly parallel Core algorithms Probabilistic belief propagation, K-means clustering, Bayesian networks Our society is creating massive amounts of complex data There is a critical need to recognize, mine and synthesize all of this digital data. Pat Gelsinger Intel, Feb Page 3-2-
12 RMS Error Resilience Opportunities Algorithmic resilience probabilistic & iterative Low order bit-errors minimal effects Known for decades Cognitive resilience Acceptable results OK BAD NEWS RMS + unreliable H/W Doesn t work Control errors, High-order bit-errors 23 ERSA Hardware Prototype Results No ERSA: RMS + unreliable H/W Doesn t work Avg. GP + SP + BP errors to crash: FP errors: highly inaccurate results RMS + ERSA at 6 FITs (3, errors / sec.): No crashes, highly accurate results Useful throughput maintained Linear speedup 24 Page
13 ERSA: Asymmetric Reliability is Key Relaxed Reliability Cores (RRCs) Sequestered from OS I$ () Fetch & Decode Issue Logic FPU ALU D$ () Ld/St MMU Reliable Interconnect Strictly Reliable Core (SRC) OS visible 25 ERSA: Error Resilient System Architecture Relaxed Reliability Cores (RRCs) Sequestered from OS SRC specification Highly reliable (expensive) I$ () Fetch & Decode Issue Logic Execute OS + system calls Execute main thread FPU ALU D$ () Ld/St Assign worker threads to RRCs MMU RRC memory bounds spec. RRC Timeout Handle RRC exceptions Reliable Interconnect Strictly Reliable Core (SRC) OS visible 26 Page
14 ERSA: Error Resilient System Architecture Relaxed Reliability Cores (RRCs) Sequestered from OS I$ () Fetch & Decode Issue Logic FPU ALU D$ () Ld/St MMU RRC specification Cheap & unreliable Configurable reliability: general apps Execute worker threads Reliable Reliable Interconnect Memory bounds check Cache Context save & restore Strictly Reliable Core (SRC) Suspend & resume OS visible 27 RMS on ERSA Main Thread (SRC) Setup Work Assignment (Work, Memory bounds, Timeout) Worker thread + bounds check (RRC) Worker thread + bounds check (RRC) Barrier Iterations Timeout check Handle Calculate Calculate exceptions Data Reduction Done? Ignore diverging iteration Error handling on SRC Terminate erroneous thread Continue to next iteration 28 Page
15 ERSA Prototype Application Program Many-Core Runtime Error Injection into Hardware (Virtualization technology used) MISP Emulation Firmware SRC RRC RRC RRC RRC RRC RRC RRC 4-socket, 2.6 GHz, Dual-core IA-32 processors 2 GB main memory Multiple Instruction Stream Processor (MISP) infrastructure [Hankins ISCA 6] 29 ERSA Work Throughput Results Valid Codes Per Sec. (Measured) Correct Bayesian Graph Edge Manipulations Per Sec. (Measured) LDPC Decoder (Error-free: 6 / s.) Error rate (FITs) Bayesian Network (Error-free: 4 / s.) Correct Clustering 3 Runs Per 2 Sec. (Measured) K-Means (Error-free: 32 / s.) Error Rate (FITs) Conservative correctness criteria! Error rate (FITs) 3 Page
16 ERSA Results: High-order Bit Errors LDPC Decoder K-means Clustering % % Increase No ERSA Correctly 8 In Mean ERSA ( 6 FITs) Decoded 6 Point- ( Blocks 6 FITs) 4 Cluster No ERSA Centroid 2 ( 6 FITs) ERSA Distance (. 6 FITs) Error Bit Bayesian Network Error Bit % Degradation No ERSA 3 FITs: 98% accuracy, In Score ( 6 FITs).5% more iterations Of Learned Network 6 FITs: 96% accuracy, ERSA % more iterations. ( 6 FITs) Error Bit 3 ERSA Speedup Results Speedup over Optimized Single Core (Measured) 6 4 LDPC Decoder No ERSA (No Errors Allowed) 2 ERSA ( 3 FITs) No. of Cores Speedup over Optimized Single Core (Measured) 3 2 Speedup over Optimized Single Core (Measured) Bayesian Network No ERSA (No Errors Allowed) ERSA ( 3 FITs) K-Means Clustering No ERSA (No Errors Allowed) ERSA ( 3 FITs) No. of Cores 6 FITs measured results similar No. of Cores 32 Page
17 Outline Introduction Built-In Soft Error Resilience (BISER) Circuit failure prediction Error Resilient System Architecture (ERSA) Conclusion 33 Conclusion Robust system design Global Optimization Unique opportunities: New thinking required BISER unique soft error properties Cost-effective correction Circuit failure prediction unique failure modes Effective prediction possible ERSA unique future killer apps Resilient to extremely high error rates 34 Page
18 3-2-8
Introduction to Robust Systems
Introduction to Robust Systems Subhasish Mitra Stanford University Email: subh@stanford.edu 1 Objective of this Talk Brainstorm What is a robust system? How can we build robust systems? Robust systems
More informationPost-Silicon Validation of Robust Systems
Post-Silicon Validation of Robust Systems Subhasish Mitra Robust Systems Group Dept. of EE & Dept. of CS Stanford University Acknowledgment: Students & collaborators 1 Robust System Challenges Technology
More informationChapter 8. Coping with Physical Failures, Soft Errors, and Reliability Issues. System-on-Chip EE141 Test Architectures Ch. 8 Physical Failures - P.
Chapter 8 Coping with Physical Failures, Soft Errors, and Reliability Issues System-on-Chip EE141 Test Architectures Ch. 8 Physical Failures - P. 1 1 What is this chapter about? Gives an Overview of and
More informationParallel Streaming Computation on Error-Prone Processors. Yavuz Yetim, Margaret Martonosi, Sharad Malik
Parallel Streaming Computation on Error-Prone Processors Yavuz Yetim, Margaret Martonosi, Sharad Malik Upsets/B muons/mb Average Number of Dopant Atoms Hardware Errors on the Rise Soft Errors Due to Cosmic
More informationChip, Heal Thyself. The BulletProof Project
Chip, Heal Thyself Todd Austin Advanced Computer Architecture Lab University of Michigan With Prof. Valeria Bertacco, Prof. Scott Mahlke Kypros Constantinides, Smitha Shyam Mojtaba Mehrara, Mona Attariyan,
More informationARCHITECTURE DESIGN FOR SOFT ERRORS
ARCHITECTURE DESIGN FOR SOFT ERRORS Shubu Mukherjee ^ШВпШшр"* AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO T^"ТГПШГ SAN FRANCISCO SINGAPORE SYDNEY TOKYO ^ P f ^ ^ ELSEVIER Morgan
More informationOUTLINE Introduction Power Components Dynamic Power Optimization Conclusions
OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions 04/15/14 1 Introduction: Low Power Technology Process Hardware Architecture Software Multi VTH Low-power circuits Parallelism
More informationTransient Fault Detection and Reducing Transient Error Rate. Jose Lugo-Martinez CSE 240C: Advanced Microarchitecture Prof.
Transient Fault Detection and Reducing Transient Error Rate Jose Lugo-Martinez CSE 240C: Advanced Microarchitecture Prof. Steven Swanson Outline Motivation What are transient faults? Hardware Fault Detection
More informationEE382C Lecture 14. Reliability and Error Control 5/17/11. EE 382C - S11 - Lecture 14 1
EE382C Lecture 14 Reliability and Error Control 5/17/11 EE 382C - S11 - Lecture 14 1 Announcements Don t forget to iterate with us for your checkpoint 1 report Send time slot preferences for checkpoint
More informationSoft-error and Variability Resilience in Dependable VLSI Platform. Hidetoshi Onodera Kyoto University
Soft-error and Variability Resilience in Dependable VLSI Platform Hidetoshi Onodera Kyoto University Outline: Soft-error and Variability Resilience 1 Background Overview: Dependable VLSI Platform Circuit-level
More informationReliable Architectures
6.823, L24-1 Reliable Architectures Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology 6.823, L24-2 Strike Changes State of a Single Bit 10 6.823, L24-3 Impact
More information1. NoCs: What s the point?
1. Nos: What s the point? What is the role of networks-on-chip in future many-core systems? What topologies are most promising for performance? What about for energy scaling? How heavily utilized are Nos
More informationUsing Process-Level Redundancy to Exploit Multiple Cores for Transient Fault Tolerance
Using Process-Level Redundancy to Exploit Multiple Cores for Transient Fault Tolerance Outline Introduction and Motivation Software-centric Fault Detection Process-Level Redundancy Experimental Results
More informationOverview ECE 753: FAULT-TOLERANT COMPUTING 1/21/2014. Recap. Fault Modeling. Fault Modeling (contd.) Fault Modeling (contd.)
ECE 753: FAULT-TOLERANT COMPUTING Kewal K.Saluja Department of Electrical and Computer Engineering Fault Modeling Lectures Set 2 Overview Fault Modeling References Fault models at different levels (HW)
More informationUltra Low-Cost Defect Protection for Microprocessor Pipelines
Ultra Low-Cost Defect Protection for Microprocessor Pipelines Smitha Shyam Kypros Constantinides Sujay Phadke Valeria Bertacco Todd Austin Advanced Computer Architecture Lab University of Michigan Key
More informationDependable VLSI Platform using Robust Fabrics
Dependable VLSI Platform using Robust Fabrics Director H. Onodera, Kyoto Univ. Principal Researchers T. Onoye, Y. Mitsuyama, K. Kobayashi, H. Shimada, H. Kanbara, K. Wakabayasi Background: Overall Design
More informationMediaTek Overview AI/5G-enabled Systems Test Challenges Systems Orientation New Opportunities
MediaTek Overview AI/5G-enabled Systems Test Challenges Systems Orientation New Opportunities Source: www.datasciencecentral.com/profiles/blogs/artificial-intelligence-vs-machine-learning-vs-deep-learning
More informationEE434 ASIC & Digital Systems Testing
EE434 ASIC & Digital Systems Testing Spring 2015 Dae Hyun Kim daehyun@eecs.wsu.edu 1 Introduction VLSI realization process Verification and test Ideal and real tests Costs of testing Roles of testing A
More informationRadiation-Induced Error Criticality In Modern HPC Parallel Accelerators
Radiation-Induced Error Criticality In Modern HPC Parallel Accelerators Presented by: Christopher Boggs, Clayton Connors on 09/26/2018 Authored by: Daniel Oliveira, Laercio Pilla, Mauricio Hanzich, Vinicius
More informationECE 574 Cluster Computing Lecture 19
ECE 574 Cluster Computing Lecture 19 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 10 November 2015 Announcements Projects HW extended 1 MPI Review MPI is *not* shared memory
More informationAR-SMT: A Microarchitectural Approach to Fault Tolerance in Microprocessors
AR-SMT: A Microarchitectural Approach to Fault Tolerance in Microprocessors Computer Sciences Department University of Wisconsin Madison http://www.cs.wisc.edu/~ericro/ericro.html ericro@cs.wisc.edu High-Performance
More informationJeremy W. Sheaffer 1 David P. Luebke 2 Kevin Skadron 1. University of Virginia Computer Science 2. NVIDIA Research
A Hardware Redundancy and Recovery Mechanism for Reliable Scientific Computation on Graphics Processors Jeremy W. Sheaffer 1 David P. Luebke 2 Kevin Skadron 1 1 University of Virginia Computer Science
More informationR-CAR GEN3: COMPUTING PLATFORM FOR AUTONOMOUS DRIVING ERA
R-CAR GEN3: COMPUTING PLATFORM FOR AUTONOMOUS DRIVING ERA Mitsuhiko Igarashi, Toyokazu Hori, Yoshihiko Hotta, Kazuki Fukuoka and Hirotaka Hara AUTOMOTIVE SOLUTION BIZ. UNIT, RENESAS ELECTRONICS CORPORATION
More information416 Distributed Systems. Errors and Failures Oct 16, 2018
416 Distributed Systems Errors and Failures Oct 16, 2018 Types of Errors Hard errors: The component is dead. Soft errors: A signal or bit is wrong, but it doesn t mean the component must be faulty Note:
More informationRELIABILITY and RELIABLE DESIGN. Giovanni De Micheli Centre Systèmes Intégrés
RELIABILITY and RELIABLE DESIGN Giovanni Centre Systèmes Intégrés Outline Introduction to reliable design Design for reliability Component redundancy Communication redundancy Data encoding and error correction
More informationECE 259 / CPS 221 Advanced Computer Architecture II (Parallel Computer Architecture) Availability. Copyright 2010 Daniel J. Sorin Duke University
Advanced Computer Architecture II (Parallel Computer Architecture) Availability Copyright 2010 Daniel J. Sorin Duke University Definition and Motivation Outline General Principles of Available System Design
More informationVLSI Test Technology and Reliability (ET4076)
VLSI Test Technology and Reliability (ET4076) Lecture 4(part 2) Testability Measurements (Chapter 6) Said Hamdioui Computer Engineering Lab Delft University of Technology 2009-2010 1 Previous lecture What
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationError Resilience in Digital Integrated Circuits
Error Resilience in Digital Integrated Circuits Heinrich T. Vierhaus BTU Cottbus-Senftenberg Outline 1. Introduction 2. Faults and errors in nano-electronic circuits 3. Classical fault tolerant computing
More informationFinding Resilience-Friendly Compiler Optimizations using Meta-Heuristic Search
Finding Resilience-Friendly Compiler Optimizations using Meta-Heuristic Search Techniques Nithya Narayanamurthy Karthik Pattabiraman Matei Ripeanu University of British Columbia (UBC) 1 Motivation: Hardware
More informationUltra Depedable VLSI by Collaboration of Formal Verifications and Architectural Technologies
Ultra Depedable VLSI by Collaboration of Formal Verifications and Architectural Technologies CREST-DVLSI - Fundamental Technologies for Dependable VLSI Systems - Masahiro Fujita Shuichi Sakai Masahiro
More informationFAULT TOLERANT SYSTEMS
FAULT TOLERANT SYSTEMS http://www.ecs.umass.edu/ece/koren/faulttolerantsystems Part 18 Chapter 7 Case Studies Part.18.1 Introduction Illustrate practical use of methods described previously Highlight fault-tolerance
More information416 Distributed Systems. Errors and Failures Feb 1, 2016
416 Distributed Systems Errors and Failures Feb 1, 2016 Types of Errors Hard errors: The component is dead. Soft errors: A signal or bit is wrong, but it doesn t mean the component must be faulty Note:
More informationLecture 2 VLSI Testing Process and Equipment
Lecture 2 VLSI Testing Process and Equipment Motivation Types of Testing Test Specifications and Plan Test Programming Test Data Analysis Automatic Test Equipment Parametric Testing Summary VLSI Test:
More informationEITF35: Introduction to Structured VLSI Design
EITF35: Introduction to Structured VLSI Design Part 1.1.2: Introduction (Digital VLSI Systems) Liang Liu liang.liu@eit.lth.se 1 Outline Why Digital? History & Roadmap Device Technology & Platforms System
More informationOutline of Presentation Field Programmable Gate Arrays (FPGAs(
FPGA Architectures and Operation for Tolerating SEUs Chuck Stroud Electrical and Computer Engineering Auburn University Outline of Presentation Field Programmable Gate Arrays (FPGAs( FPGAs) How Programmable
More informationThe Implications of Multi-core
The Implications of Multi- What I want to do today Given that everyone is heralding Multi- Is it really the Holy Grail? Will it cure cancer? A lot of misinformation has surfaced What multi- is and what
More informationPushPull: Short Path Padding for Timing Error Resilient Circuits YU-MING YANG IRIS HUI-RU JIANG SUNG-TING HO. IRIS Lab National Chiao Tung University
PushPull: Short Path Padding for Timing Error Resilient Circuits YU-MING YANG IRIS HUI-RU JIANG SUNG-TING HO IRIS Lab National Chiao Tung University Outline Introduction Problem Formulation Algorithm -
More informationStochastic Processors (or processors that do not always compute correctly by design)
Stochastic Processors (or processors that do not always compute correctly by design) Rakesh Kumar Department of Electrical and Computer Engineering University of Illinois, Urbana-Champaign Insisting on
More informationTowards Approximate Computing: Programming with Relaxed Synchronization
Towards Approximate Computing: Programming with Relaxed Synchronization Lakshminarayanan Renganarayana Vijayalakshmi Srinivasan Ravi Nair (presenting) Dan Prener IBM T.J. Watson Research Center October
More informationCOMP 633 Parallel Computing.
COMP 633 Parallel Computing http://www.cs.unc.edu/~prins/classes/633/ Parallel computing What is it? multiple processors cooperating to solve a single problem hopefully faster than a single processor!
More informationFault Injection into GPGPU- Applications using GPU-Qin
Fault Injection into GPGPU- Applications using GPU-Qin Anne Gropler, Hasso-Plattner-Institute Prof. Dr. Andreas Polze, Lena Herscheid, M.Sc., Daniel Richter, M.Sc. Seminar: Fault Injection 25/06/15 Motivation
More informationEVALUATING OVERHEADS OF MULTIBIT SOFT-ERROR PROTECTION
[3B2-9] mmi2013040010.3d 11/7/013 17:9 Page 2... EVALUATING OVERHEADS OF MULTIBIT SOFT-ERROR PROTECTION IN THE PROCESSOR CORE... THE SVALINN FRAMEWORK PROVIDES COMPREHENSIVE ANALYSIS OF MULTIBIT ERROR
More informationTesting Principle Verification Testing
ECE 553: TESTING AND TESTABLE DESIGN OF DIGITAL SYSTES Test Process and Test Equipment Overview Objective Types of testing Verification testing Characterization testing Manufacturing testing Acceptance
More informationPOWER4 Systems: Design for Reliability. Douglas Bossen, Joel Tendler, Kevin Reick IBM Server Group, Austin, TX
Systems: Design for Reliability Douglas Bossen, Joel Tendler, Kevin Reick IBM Server Group, Austin, TX Microprocessor 2-way SMP system on a chip > 1 GHz processor frequency >1GHz Core Shared L2 >1GHz Core
More informationSOLVING THE DRAM SCALING CHALLENGE: RETHINKING THE INTERFACE BETWEEN CIRCUITS, ARCHITECTURE, AND SYSTEMS
SOLVING THE DRAM SCALING CHALLENGE: RETHINKING THE INTERFACE BETWEEN CIRCUITS, ARCHITECTURE, AND SYSTEMS Samira Khan MEMORY IN TODAY S SYSTEM Processor DRAM Memory Storage DRAM is critical for performance
More informationENG04057 Teste de Sistema Integrados. Prof. Eric Ericson Fabris (Marcelo Lubaszewski)
ENG04057 Teste de Sistema Integrados Prof. Eric Ericson Fabris (Marcelo Lubaszewski) Março 2011 Slides adapted from ABRAMOVICI, M.; BREUER, M.; FRIEDMAN, A. Digital Systems Testing and Testable Design.
More informationComputer Architecture
Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte,
More informationPART I - Fundamentals of Parallel Computing
PART I - Fundamentals of Parallel Computing Objectives What is scientific computing? The need for more computing power The need for parallel computing and parallel programs 1 What is scientific computing?
More informationOrganic Computing DISCLAIMER
Organic Computing DISCLAIMER The views, opinions, and/or findings contained in this article are those of the author(s) and should not be interpreted as representing the official policies, either expressed
More informationHOME :: FPGA ENCYCLOPEDIA :: ARCHIVES :: MEDIA KIT :: SUBSCRIBE
Page 1 of 8 HOME :: FPGA ENCYCLOPEDIA :: ARCHIVES :: MEDIA KIT :: SUBSCRIBE FPGA I/O When To Go Serial by Brock J. LaMeres, Agilent Technologies Ads by Google Physical Synthesis Tools Learn How to Solve
More informationA 1-GHz Configurable Processor Core MeP-h1
A 1-GHz Configurable Processor Core MeP-h1 Takashi Miyamori, Takanori Tamai, and Masato Uchiyama SoC Research & Development Center, TOSHIBA Corporation Outline Background Pipeline Structure Bus Interface
More informationOverview of the SPARC Enterprise Servers
Overview of the SPARC Enterprise Servers SPARC Enterprise Technologies for the Datacenter Ideal for Enterprise Application Deployments System Overview Virtualization technologies > Maximize system utilization
More informationHardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University
Hardware Design Environments Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Outline Welcome to COE 405 Digital System Design Design Domains and Levels of Abstractions Synthesis
More informationDIVA: A Reliable Substrate for Deep Submicron Microarchitecture Design
DIVA: A Reliable Substrate for Deep Submicron Microarchitecture Design Or, how I learned to stop worrying and love complexity. http://www.eecs.umich.edu/~taustin Microprocessor Verification Task of determining
More informationVery Large Scale Integration (VLSI)
Very Large Scale Integration (VLSI) Lecture 10 Dr. Ahmed H. Madian Ah_madian@hotmail.com Dr. Ahmed H. Madian-VLSI 1 Content Manufacturing Defects Wafer defects Chip defects Board defects system defects
More informationMulti-site Probing for Wafer-Level Reliability
Multi-site Probing for Wafer-Level Reliability Louis Solis De Ancona 1 Sharad Prasad 2, David Pachura 2 1 Agilent Technologies 2 LSI Logic Corporation s Outline Introduction Multi-Site Probing Challenges
More informationSPARC64 X: Fujitsu s New Generation 16 Core Processor for the next generation UNIX servers
X: Fujitsu s New Generation 16 Processor for the next generation UNIX servers August 29, 2012 Takumi Maruyama Processor Development Division Enterprise Server Business Unit Fujitsu Limited All Rights Reserved,Copyright
More informationPreface. Fig. 1 Solid-State-Drive block diagram
Preface Solid-State-Drives (SSDs) gained a lot of popularity in the recent few years; compared to traditional HDDs, SSDs exhibit higher speed and reduced power, thus satisfying the tough needs of mobile
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationIntel iapx 432-VLSI building blocks for a fault-tolerant computer
Intel iapx 432-VLSI building blocks for a fault-tolerant computer by DAVE JOHNSON, DAVE BUDDE, DAVE CARSON, and CRAIG PETERSON Intel Corporation Aloha, Oregon ABSTRACT Early in 1983 two new VLSI components
More informationReal Time: Understanding the Trade-offs Between Determinism and Throughput
Real Time: Understanding the Trade-offs Between Determinism and Throughput Roland Westrelin, Java Real-Time Engineering, Brian Doherty, Java Performance Engineering, Sun Microsystems, Inc TS-5609 Learn
More informationBuilding a Bridge: from Pre-Silicon Verification to Post-Silicon Validation
Building a Bridge: from Pre-Silicon Verification to Post-Silicon Validation FMCAD, 2008 Moshe Levinger 26/11/2008 Talk Outline Simulation-Based Functional Verification Pre-Silicon Technologies Random Test
More informationStream Processor Architecture. William J. Dally Stanford University August 22, 2003 Streaming Workshop
Stream Processor Architecture William J. Dally Stanford University August 22, 2003 Streaming Workshop Stream Arch: 1 August 22, 2003 Some Definitions A Stream Program expresses a computation as streams
More informationDistributed Systems (ICE 601) Fault Tolerance
Distributed Systems (ICE 601) Fault Tolerance Dongman Lee ICU Introduction Failure Model Fault Tolerance Models state machine primary-backup Class Overview Introduction Dependability availability reliability
More informationThe Evolving NAND Flash Business Model for SSD. Steffen Hellmold VP BD, SandForce
The Evolving NAND Flash Business Model for SSD Steffen Hellmold VP BD, SandForce Solid State Storage - Vision Solid State Storage in future Enterprise Compute Anything performance sensitive goes solid
More informationData Integrity in Stateful Services. Velocity, China, 2016
Data Integrity in Stateful Services Velocity, China, 2016 Data Integrity Bringing Sexy Back Protect the Data. -Every DBA who doesn t want to be fired Breaking Integrity Down Physical Integrity - Help,
More informationFABRICATION TECHNOLOGIES
FABRICATION TECHNOLOGIES DSP Processor Design Approaches Full custom Standard cell** higher performance lower energy (power) lower per-part cost Gate array* FPGA* Programmable DSP Programmable general
More informationOutline. Parity-based ECC and Mechanism for Detecting and Correcting Soft Errors in On-Chip Communication. Outline
Parity-based ECC and Mechanism for Detecting and Correcting Soft Errors in On-Chip Communication Khanh N. Dang and Xuan-Tu Tran Email: khanh.n.dang@vnu.edu.vn VNU Key Laboratory for Smart Integrated Systems
More informationVerification and Testing
Verification and Testing He s dead Jim... L15 Testing 1 Verification versus Manufacturing Test Design verification determines whether your design correctly implements a specification and hopefully that
More informationContinuous Reliability Monitoring Using Adaptive Critical Path Testing
Continuous Reliability Monitoring Using Adaptive Critical Path Testing Bardia Zandian*, Waleed Dweik, Suk Hun Kang, Thomas Punihaole, Murali Annavaram Electrical Engineering Department, University of Southern
More informationSingle Event Latchup Power Switch Cell Characterisation
Single Event Latchup Power Switch Cell Characterisation Vladimir Petrovic, Marko Ilic, Gunter Schoof Abstract - In this paper are described simulation and measurement processes of a power switch cell used
More informationNoCIC: A Spice-based Interconnect Planning Tool Emphasizing Aggressive On-Chip Interconnect Circuit Methods
1 NoCIC: A Spice-based Interconnect Planning Tool Emphasizing Aggressive On-Chip Interconnect Circuit Methods V. Venkatraman, A. Laffely, J. Jang, H. Kukkamalla, Z. Zhu & W. Burleson Interconnect Circuit
More informationParallel Methods for Verifying the Consistency of Weakly-Ordered Architectures. Adam McLaughlin, Duane Merrill, Michael Garland, and David A.
Parallel Methods for Verifying the Consistency of Weakly-Ordered Architectures Adam McLaughlin, Duane Merrill, Michael Garland, and David A. Bader Challenges of Design Verification Contemporary hardware
More informationDigital Design Methodology
Digital Design Methodology Prof. Soo-Ik Chae Digital System Designs and Practices Using Verilog HDL and FPGAs @ 2008, John Wiley 1-1 Digital Design Methodology (Added) Design Methodology Design Specification
More informationDeep Sub-Micron Cache Design
Cache Design Challenges in Deep Sub-Micron Process Technologies L2 COE Carl Dietz May 25, 2007 Deep Sub-Micron Cache Design Agenda Bitcell Design Array Design SOI Considerations Surviving in the corporate
More informationEfficient Evaluation and Management of Temperature and Reliability for Multiprocessor Systems
Efficient Evaluation and Management of Temperature and Reliability for Multiprocessor Systems Ayse K. Coskun Electrical and Computer Engineering Department Boston University http://people.bu.edu/acoskun
More informationSPARC64 X: Fujitsu s New Generation 16 core Processor for UNIX Server
SPARC64 X: Fujitsu s New Generation 16 core Processor for UNIX Server 19 th April 2013 Toshio Yoshida Processor Development Division Enterprise Server Business Unit Fujitsu Limited SPARC64 TM SPARC64 TM
More informationUnderstanding Sources of Inefficiency in General-Purpose Chips. Hameed, Rehan, et al. PRESENTED BY: XIAOMING GUO SIJIA HE
Understanding Sources of Inefficiency in General-Purpose Chips Hameed, Rehan, et al. PRESENTED BY: XIAOMING GUO SIJIA HE 1 Outline Motivation H.264 Basics Key ideas Implementation & Evaluation Summary
More informationNETWORKS on CHIP A NEW PARADIGM for SYSTEMS on CHIPS DESIGN
NETWORKS on CHIP A NEW PARADIGM for SYSTEMS on CHIPS DESIGN Giovanni De Micheli Luca Benini CSL - Stanford University DEIS - Bologna University Electronic systems Systems on chip are everywhere Technology
More informationTHE scaling of device geometries improves the performance
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 62, NO. 5, MAY 2015 471 An FPGA-Based Transient Error Simulator for Resilient Circuit and System Design and Evaluation Chia-Hsiang Chen,
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationA Dual-Core Multi-Threaded Xeon Processor with 16MB L3 Cache
A Dual-Core Multi-Threaded Xeon Processor with 16MB L3 Cache Stefan Rusu Intel Corporation Santa Clara, CA Intel and the Intel logo are registered trademarks of Intel Corporation or its subsidiaries in
More informationLast Time. Making correct concurrent programs. Maintaining invariants Avoiding deadlocks
Last Time Making correct concurrent programs Maintaining invariants Avoiding deadlocks Today Power management Hardware capabilities Software management strategies Power and Energy Review Energy is power
More informationHigh Speed Fault Injection Tool (FITO) Implemented With VHDL on FPGA For Testing Fault Tolerant Designs
Vol. 3, Issue. 5, Sep - Oct. 2013 pp-2894-2900 ISSN: 2249-6645 High Speed Fault Injection Tool (FITO) Implemented With VHDL on FPGA For Testing Fault Tolerant Designs M. Reddy Sekhar Reddy, R.Sudheer Babu
More informationMultiple Event Upsets Aware FPGAs Using Protected Schemes
Multiple Event Upsets Aware FPGAs Using Protected Schemes Costas Argyrides, Dhiraj K. Pradhan University of Bristol, Department of Computer Science Merchant Venturers Building, Woodland Road, Bristol,
More informationFAULT TOLERANT SYSTEMS
FAULT TOLERANT SYSTEMS http://www.ecs.umass.edu/ece/koren/faulttolerantsystems Part 5 Processor-Level Techniques & Byzantine Failures Chapter 2 Hardware Fault Tolerance Part.5.1 Processor-Level Techniques
More informationMicroprocessor and DSP Technologies for the Nanoscale Era
Microprocessor and DSP Technologies for the Nanoscale Era Seminar 1 Ram Kumar Krishnamurthy Microprocessor Research Labs Intel Corporation, Hillsboro, OR ram.krishnamurthy@intel.com 1 July 5, 2005 Intel
More informationLLFI: An Intermediate Code-Level Fault Injection Tool for Hardware Faults
LLFI: An Intermediate Code-Level Fault Injection Tool for Hardware Faults Qining Lu, Mostafa Farahani, Jiesheng Wei, Anna Thomas, and Karthik Pattabiraman Department of Electrical and Computer Engineering,
More informationA 1.5GHz Third Generation Itanium Processor
A 1.5GHz Third Generation Itanium Processor Jason Stinson, Stefan Rusu Intel Corporation, Santa Clara, CA 1 Outline Processor highlights Process technology details Itanium processor evolution Block diagram
More informationEvaluating and Exploiting Impacts of Dynamic Power Management Schemes on System Reliability
Evaluating and Exploiting Impacts of Dynamic Power Management Schemes on System Reliability Liangzhen Lai, Vikas Chandra* and Puneet Gupta UCLA Electrical Engineering Department ARM Research* Radiation-Induced
More informationSelf-Repair for Robust System Design. Yanjing Li Intel Labs Stanford University
Self-Repair for Robust System Design Yanjing Li Intel Labs Stanford University 1 Hardware Failures: Major Concern Permanent: our focus Temporary 2 Tolerating Permanent Hardware Failures Detection Diagnosis
More informationPuey Wei Tan. Danny Lee. IBM zenterprise 196
Puey Wei Tan Danny Lee IBM zenterprise 196 IBM zenterprise System What is it? IBM s product solutions for mainframe computers. IBM s product models: 700/7000 series System/360 System/370 System/390 zseries
More informationA large cluster architecture with Efficient Caching Coherency, Intelligent Management, and High Performance for a Low-Cost storage node
A large cluster architecture with Efficient Caching Coherency, Intelligent Management, and High Performance for a Low-Cost storage node Dr. M. K. Jibbe, Technical Director, NetApp Dean Lang DEV Architect,
More informationComputer Architecture
Informatics 3 Computer Architecture Dr. Boris Grot and Dr. Vijay Nagarajan Institute for Computing Systems Architecture, School of Informatics University of Edinburgh General Information Instructors: Boris
More informationCOEN-4730 Computer Architecture Lecture 12. Testing and Design for Testability (focus: processors)
1 COEN-4730 Computer Architecture Lecture 12 Testing and Design for Testability (focus: processors) Cristinel Ababei Dept. of Electrical and Computer Engineering Marquette University 1 Outline Testing
More informationOverview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips
Overview CSE372 Digital Systems Organization and Design Lab Prof. Milo Martin Unit 5: Hardware Synthesis CAD (Computer Aided Design) Use computers to design computers Virtuous cycle Architectural-level,
More informationOperating Systems. Introduction & Overview. Outline for today s lecture. Administrivia. ITS 225: Operating Systems. Lecture 1
ITS 225: Operating Systems Operating Systems Lecture 1 Introduction & Overview Jan 15, 2004 Dr. Matthew Dailey Information Technology Program Sirindhorn International Institute of Technology Thammasat
More informationEnergy Efficiency and Resilience in Future ICs
Energy Efficiency and Resilience in Future ICs Andrew B. Kahng UCSD VLSI CAD Laboratory abk@cs.ucsd.edu http://vlsicad.ucsd.edu CSE and ECE Departments University of California, San Diego Outline The Power
More informationActel s SX Family of FPGAs: A New Architecture for High-Performance Designs
Actel s SX Family of FPGAs: A New Architecture for High-Performance Designs A Technology Backgrounder Actel Corporation 955 East Arques Avenue Sunnyvale, California 94086 April 20, 1998 Page 2 Actel Corporation
More information