Self-Repair for Robust System Design. Yanjing Li Intel Labs Stanford University
|
|
- Lynne Griffin
- 5 years ago
- Views:
Transcription
1 Self-Repair for Robust System Design Yanjing Li Intel Labs Stanford University 1
2 Hardware Failures: Major Concern Permanent: our focus Temporary 2
3 Tolerating Permanent Hardware Failures Detection Diagnosis Recovery Self-repair 3
4 Tolerating Permanent Hardware Failures Detection Diagnosis Recovery Self-repair Concurrent error detection: expensive 4
5 Tolerating Permanent Hardware Failures Detection Diagnosis Online Self-Test and Diagnostics Non-continuous Recovery Low power Self-repair Concurrent No visible downtime 5
6 Tolerating Permanent Hardware Failures Detection Diagnosis Online Self-Test and Diagnostics Localize failures Recovery Diagnosis Self-repair 6
7 Tolerating Permanent Hardware Failures Detection Diagnosis Replace / bypass faulty component(s) Self-repair Recovery Self-repair 7
8 Tolerating Permanent Hardware Failures Detection Diagnosis Correction of corrupted data and states Checkpoint Failure occurs Failure detected Rollback Recovery Self-repair 8
9 This Lecture Detection Diagnosis Recovery Self-repair 9
10 Outline Introduction Self-repair techniques Memories Processor cores Uncore components Conclusion 10
11 Memory Organization en rd/wr logic addr Row decoder Column decoder Column multiplexers data_in Data registers data_out Sense amplifiers 11
12 Memory Functional Fault Models Memory cell array faults stuck-at, transition, coupling, etc. Address decode faults (AFs) no cell accessed, multiple cells accessed, etc. Read/write logic faults Equivalent to memory cell array faults Single bit / word / column / row faults dominate [Aitken 04, Kurdahi 06, Sridharan 12] 12
13 Spare Rows / Columns en rd/wr logic Spare column addr Row decoder Column decoder Column multiplexers Spare rows data_in Data registers data_out Sense amplifiers 13
14 Memory Self-Repair Using Spare Rows en rd/wr logic addr Row decoder Column decoder Column multiplexers data_in Data registers data_out Sense amplifiers Row decoder modified 14
15 Memory Self-Repair Using Spare Columns en rd/wr logic addr Row decoder Column decoder Multiplexers Column multiplexers Selfrepair control data_in Data registers data_out Sense amplifiers Additional multiplexers and select logic 15
16 Spare Words en rd/wr logic Same inputs as RAM addr Row decoder Column decoder Column multiplexers Spare words pool data_in Data registers Sense amplifiers data_out Self-repair control [Kim 98] 16
17 Redundancy in Memories Essential to improve yield and reliability Widely used in commercial RAMs Related topics How much redundancy? Built-in repair analysis: redundancy allocation 17
18 How Much Redundancy? Yield analysis and yield learning Example: negative binomial memory yield model YYYYY = (1 + YY Y) Y: defect desntiy, Y: memory area Y: defect clustering coefficient (measured to be 2 or 3) For a memory array with N rows and 1 spare row YYYYY = YYYYY+(Y + 1)(YYYYY)(1 YYYYY) 18
19 How Much Redundancy? Yield analysis and yield learning Example: negative binomial memory yield model 19
20 Built-In Repair Analysis Spare rows and columns NP complete Various algorithms and EDA tools exist 20
21 Special Self-Repair Techniques for Caches Caches: affects performance, NOT functionality Block address Tag Index Block offset V Tag Data / ECC Decoder =? Hit/miss Data out 21
22 Cache Line Disable/Delete Exist in commercial systems Intel [Chang 07], IBM [Sanda 08], etc. Block address Tag Decoder Index Block offset Fault-tolerance bit V FT Tag Data / ECC =? Hit/miss Data out 22
23 Setting the FT Bit Cache line read ECC calculate and check ECC error? No Yes Correct and flag error; log faulty location Done ECC error at same location already occurred within a time threshold? No Yes Set FT bit 23
24 PADded Cache Reconfigurable cache with programmable decoder Conventional decoder 24
25 PADded Cache Reconfigurable cache with programmable decoder Additional tag bits needed Programmable decoder ~ 5% area cost [Shirvani 99] 16KB, direct-mapped, 1-level programmability Reduce cost at the price of granularity 25
26 Cache Line Delete vs. PADded Cache 26
27 Outline Introduction Self-repair techniques Memories Processor cores Uncore components Conclusion 27
28 Core Sparing Utilize multi-/many-core architectures Core disabling also possible Already in commercial products IBM BlueGene/Q Nvidia Geforce Cisco Metro Fine-grained approaches? 28
29 Core Cannibalization Many-core designs with small in-order cores 3.5% area cost (OpenRISC 1200) [Romanescu 08] 29
30 Core Cannibalization: Discussion What about diagnosis? Routing performance impact Additional wire delay pipeline stages Modified branch resolution, bypass logic, etc. Decreased clock frequency Small vs. large number of faulty cores 30
31 Microarchitectural Block Disabling Disable half pipeline way in superscalar designs 12% area cost (includes diagnosis) Issue queue F D R Rd E M W C F D R Rd E M W C [Schuchman 05] 31
32 Microarchitectural Block Disabling Disable half pipeline way in superscalar designs 12% area cost (includes diagnosis) Issue queue F S D R S Rd E M W C * * * * * * * *modified stages F S D R S Rd E M W C * * * * * * * [Schuchman 05] 2 shift stages (S) added and lots design modifications 32
33 Microarchitectural Block Disabling Disable half pipeline way in superscalar designs 12% area cost (includes diagnosis) Issue queue, old half F S D R S Rd E M W C * * * * * * * *modified stages F S D R S Rd E M W C * * * * * * * Issue queue, new half Split issue queue (same for store buffer, not shown) [Schuchman 05] 33
34 Microarchitectural Block Disabling: Discussion Expensive, complex, intrusive Diagnosis logic Reconfiguration logic Coverage issues E.g., fetch stage not covered 34
35 Architectural Core Salvaging Many-core CISC processor designs Application 1 Application n Advanced ISA Basic ISA... Advanced ISA Basic ISA Core 1 Core n [Powell 09] 35
36 Architectural Core Salvaging Many-core CISC processor designs Application 1 Application n Basic uops Basic uops Advanced ISA Basic ISA... Advanced ISA Basic ISA Core 1 Core n [Powell 09] 36
37 Architectural Core Salvaging Many-core CISC processor designs Application 1 Application n Advanced uops Basic uops Advanced ISA Basic ISA... Advanced ISA Basic ISA Core 1 Core n [Powell 09] 37
38 Architectural Core Salvaging Many-core CISC processor designs Application 1 Thread swap Application n Advanced uops Basic uops Advanced ISA Advanced ISA Basic ISA... Basic ISA Core 1 Core n [Powell 09] 38
39 Architectural Core Salvaging Many-core CISC processor designs Application n Thread swap Application 1 Basic uops Advanced uops Advanced ISA Advanced ISA Basic ISA... Basic ISA Core 1 Core n [Powell 09] 39
40 Architectural Core Salvaging: Discussion What about diagnosis Applicability CISC-like architectures Performance Depends Coverage issues ~50% coverage of execution 40
41 Self-Repair for Processor Cores Which technique to choose? Software-assisted techniques? 41
42 Outline Introduction Self-repair techniques Memories Processor cores Uncore components Conclusion 42
43 Uncore Prevalent in SoCs Uncore examples IBM Power 7 Cache / DRAM controller Accelerators I/O interfaces NVIDIA Tegra Cisco Network Processor 43
44 Uncore Self-Repair Essential OpenSPARC T2 Uncore 12% Processor cores 12% Memories 76% Cores, memories, networks-on-chip Many existing techniques 44
45 Key Message [Li ITC13] 20 Existing sparing techniques Chip area impact (%) 0 Self-repair coverage (%) 98 45
46 Key Message [Li ITC13] 20 Existing sparing techniques Chip area impact (%) 7.5 Our techniques Self-repair coverage (%) 98 46
47 Key Message [Li ITC13] 20 Existing sparing techniques Chip area impact (%) 7.5 Performance: 0% (fault-free) 0.3% - 5% (faulty) Power: 3% Our techniques Self-repair coverage (%) 98 47
48 Existing Techniques Inadequate OpenSPARC T2 Uncore 48
49 Existing Techniques Inadequate OpenSPARC T2 Uncore Original Spare 49
50 Existing Techniques Inadequate OpenSPARC T2 Original Spare Steering logic 50
51 Existing Techniques Inadequate OpenSPARC T2 Original Power gating Spare Steering logic 20% chip area cost 51
52 New Uncore Self-Repair Techniques ERRS: Enhanced Resource Reallocation and Sharing SHE: Sparing through Hierarchical Exploration OpenSPARC T2 8 cores, 64 threads 500M transistors 7.5% chip area cost 52
53 Outline Introduction Self-repair for uncore ERRS SHE Conclusion 53
54 Basic Resource Reallocation & Sharing (RRS) Already-existing similar components No spares 3. Reroute Crossbar blocks 4 cores 4 cores Self-repair control L2 mem 0 L2 bank control 0 L2 bank control 1 L2 mem 1 1. Fault detected 8 cycle overhead (faulty case only) 2. Resource sharing 54
55 Basic RRS Performance Impact Fault-free scenario Crossbar blocks 4 cores 4 cores Self-repair control L2 mem 0 L2 bank MISS! control 0 L2 bank MISS! control 1 L2 mem 1 Fill buffer 55
56 Basic RRS Performance Impact Single faulty component: 70% impact Crossbar blocks 4 cores 4 cores Self-repair control L2 mem 0 L2 bank control 0 L2 bank control 1 L2 mem 1 Fill buffer DATADATA 56
57 Enhanced RRS (ERRS) Idea Identify Mitigate No Desirable tradeoff? Analyze Done Yes 57
58 ERRS Mitigates RRS Performance Impact Single faulty component: 3% impact Crossbar blocks 4 cores 4 cores Self-repair control L2 mem 0 L2 bank control 0 L2 bank control 1 L2 mem 1 Fill buffer Fill buffer 58
59 ERRS on OpenSPARC T2 DRAM data return: duplicated DRAM bank control FSM: entries doubled L2 hit processing: duplicated L2 miss fill buffer: entries doubled 3.2% area, 2.7% power 59
60 Performance Evaluation Setup Simulators GEM5 64 single-issue in-order processor cores Simulated CMP 1KB private L1 data and instruction caches 4MByte shared L2 cache (8 banks) 4 DRAM controllers 60
61 ERRS vs. Basic RRS: Performance Impact ERRS: 3% performance impact 3% 1 faulty L2 controller 1 faulty DRAM controller 0.7% Average CPI overhead 0% Basic RRS RRS ERRS 0.0% Basic RRS RRS ERRS 64-core CMP PARSEC benchmark programs 61
62 ERRS vs. Basic RRS: Performance Impact ERRS: 5% performance impact 1 faulty L2 controller 1 faulty DRAM controller 18% 25% Average CPI overhead 0% Basic RRS ERRS 0% 64-core CMP Stressed programs Basic RRS RRS ERRS 62
63 ERRS for Multiple Faulty Components 25 Graceful performance degradation Average CPI impact (%) 0 2 DRAM controllers 4 L2 bank controllers 4 L2 bank controllers + 2 DRAM controllers 2 L2 bank controllers 64-core CMP PARSEC benchmark programs Faulty components 63
64 Which Faults Repairable? delay bridging stuckat All faults inside a component 64
65 Which Faults Not Repairable? Single points of failure Primary inputs Primary outputs Steering logic Metric: self-repair coverage % non-single points of failure 65
66 ERRS Component Self-Repair Coverage 97% 97% 98% 97% 97% 97% 97% 96% 97% 97% 97% 97% 98% 97% 97% 66
67 ERRS Chip Area/Power Impact & Coverage 20 Existing sparing techniques Post-layout chip area impact (%) 3.2 Power: 3% ERRS 0 75 Self-repair coverage (%) 98 67
68 Outline Introduction Self-repair for uncore ERRS SHE Conclusion 68
69 SHE: Sparing through Hierarchical Exploration Component RTL SHE Sparing for component Minimize area cost Identical blocks spare shared Balanced coverage vs. area 69
70 The Design Hierarchy Network controller pio Level 1 rdmc Level 2 chnl_16 chnl_15 chnl_1 Level 3 Lowest level 70
71 Sparing in Different Levels: Coarse-Grained pio Network controller pio pio rdmc rdmc rdmc Original Spare / steering logic Level 2 71
72 Sparing in Different Levels: Mixed Levels Network controller pio rdmc pio pio chnl_16 chnl_15 chnl_1 chnl Level 2 / Level 3 Original Spare / steering logic 72
73 Sparing in Different Levels: Fine-Grained Network controller pio rdmc Original Spare / steering logic Lowest level 73
74 How to Obtain Sweet-Spot? Balanced tradeoff High Area overhead Self-repair coverage Low Deeper in hierarchy Algorithm details in paper 74
75 SHE Results 20 Existing sparing techniques Post-layout chip area impact (%) 3.2 ERRS 0 75 Self-repair coverage (%) 98 75
76 SHE Results 20 Existing sparing techniques Post-layout chip area impact (%) 7.5 SHE power impact: 0.2% SHE performance impact: 0% ERRS + SHE 0 Self-repair coverage (%) 98 76
77 What About Diagnosis? Traditional fault diagnosis difficult Effect-cause: infeasible RTL unavailable Cause-effect: impractical 7 petabyte fault dictionary [Beckler ITC12] 77
78 Fault Diagnosis Simplified CASP [Li DATE08, VTS10] Concurrent, Autonomous, Stored test Patterns 78
79 Fault Diagnosis Simplified CASP [Li DATE08, VTS10] Concurrent, Autonomous, Stored test Patterns Blocks = self-repair granularity Block 1 Block n 79
80 Fault Diagnosis Simplified CASP [Li DATE08, VTS10] Concurrent, Autonomous, Stored test Patterns Pass/fail 1 Local test logic Pass/fail n Fail self-repair needed Diagnosis = detection Block 1 Block n = self-repair granularity 80
81 Summary: Self-Repair of Uncore Components 20 Existing sparing techniques Post-layout chip area impact (%) 7.5 Performance: 0% (fault-free) 0.3% - 5% (faulty) Power: 3% ERRS + SHE 3.2 ERRS 0 75 Self-repair coverage (%) 98 81
82 References [Aitken 04] Aitken, R., A Modular Wrapper Enabling High Speed BIST and Repair for Small Wide Memories, Proc. Intl. Test Conf., pp , [Chang 07] Chang, J., et al., The 65-nm 16-MB Shared On-Die L3 Cache for the Dual-Core Intel Xeon Processor 7100 Series, IEEE Journal of Solid-State Circuits, vol. 42, no. 4, pp , [Kurdahi 06] Kurdahi, F.J., et al., System-Level SRAM Yield Enhancement, Proc. Intl. Symp. Quality Electronic Design,
83 References [Li VTS10] Li, Y., et al., Concurrent Autonomous Self-Test for Uncore Components in System-on-Chips, Proc. VLSI Test Symposium, [Li DATE08] Li, Y., S. Makar, and S. Mitra, CASP: Concurrent Autonomous Chip Self-Test using Stored Test Patterns, Proc. Design, Automation, and Test in Europe, pp , [Li ITC 13] Li, Y., et al., Self-Repair of Uncore Components in Robust System-on-Chips: An OpenSPARC T2 Case Study, Proc. IEEE Intl. Test Conf., pp. 1-10,
84 References [Powell 09] Powell, M.D., et al., Architectural Core Salvaging in a Multi-Core Processor for Hard-Error Tolerance, Proc. Intl. Symp. on Computer Architecture, pp , [Romanescu 08] Romanescu, B.F., and D.J. Sorin, Core Cannibalization Architecture: Improving Lifetime Chip Performance for Multicore Processors in the Presence of Hard Faults, Proc. Intl. Conf. on Parallel Architectures and Compilation Techniques, pp ,
85 References [Sanda 08] Sanda, P.N, et al., Fault-Tolerant Design of the IBM Power6 Microprocessor, IEEE Micro, vol. 28, no. 2, pp , [Schuchman 05] Schuchman, E., and T.N. Vijaykumar, Rescue: A Microarchitecture for Testability and Defect Tolerance, Proc. Intl. Symp. on Computer Architecture, pp , [Shirvani 99] Shirvani, P.P., and E.J. McCluskey, PADded Cache: A New Fault-Tolerance Technique for Cache Memories, Proc. VLSI Test Symp., pp ,
86 References [Shivakumar 03] Shivakumar, P., et al., Exploiting Microarchitectural Redundancy for Defect Tolerance, Proc. Intl. Conf. on Computer Design, pp , [Sridharan 12] Sridharan, V., et al., A Study of DRAM Failures in the Field, Proc. Intl. Conf. High Performance Computing, Networking, Storage and Analysis, pp. 1-11, [Schuchman 05] Schuchman, E., and T.N. Vijaykumar, Rescue: A Microarchitecture for Testability and Defect Tolerance, Proc. Intl. Symp. on Computer Architecture, pp ,
AR-SMT: A Microarchitectural Approach to Fault Tolerance in Microprocessors
AR-SMT: A Microarchitectural Approach to Fault Tolerance in Microprocessors Computer Sciences Department University of Wisconsin Madison http://www.cs.wisc.edu/~ericro/ericro.html ericro@cs.wisc.edu High-Performance
More informationPuey Wei Tan. Danny Lee. IBM zenterprise 196
Puey Wei Tan Danny Lee IBM zenterprise 196 IBM zenterprise System What is it? IBM s product solutions for mainframe computers. IBM s product models: 700/7000 series System/360 System/370 System/390 zseries
More informationChapter 5B. Large and Fast: Exploiting Memory Hierarchy
Chapter 5B Large and Fast: Exploiting Memory Hierarchy One Transistor Dynamic RAM 1-T DRAM Cell word access transistor V REF TiN top electrode (V REF ) Ta 2 O 5 dielectric bit Storage capacitor (FET gate,
More informationUltra Low-Cost Defect Protection for Microprocessor Pipelines
Ultra Low-Cost Defect Protection for Microprocessor Pipelines Smitha Shyam Kypros Constantinides Sujay Phadke Valeria Bertacco Todd Austin Advanced Computer Architecture Lab University of Michigan Key
More informationAN OPTIMAL APPROACH FOR TESTING EMBEDDED MEMORIES IN SOCS
International Journal of Engineering Inventions ISSN: 2278-7461, www.ijeijournal.com Volume 1, Issue 8 (October2012) PP: 76-80 AN OPTIMAL APPROACH FOR TESTING EMBEDDED MEMORIES IN SOCS B.Prathap Reddy
More informationHyperthreading Technology
Hyperthreading Technology Aleksandar Milenkovic Electrical and Computer Engineering Department University of Alabama in Huntsville milenka@ece.uah.edu www.ece.uah.edu/~milenka/ Outline What is hyperthreading?
More informationChapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY
Chapter Seven CACHE MEMORY AND VIRTUAL MEMORY 1 Memories: Review SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: value is stored
More informationEECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141
EECS151/251A Spring 2018 Digital Design and Integrated Circuits Instructors: John Wawrzynek and Nick Weaver Lecture 19: Caches Cache Introduction 40% of this ARM CPU is devoted to SRAM cache. But the role
More informationWrite only as much as necessary. Be brief!
1 CIS371 Computer Organization and Design Midterm Exam Prof. Martin Thursday, March 15th, 2012 This exam is an individual-work exam. Write your answers on these pages. Additional pages may be attached
More informationARCHITECTURAL APPROACHES TO REDUCE LEAKAGE ENERGY IN CACHES
ARCHITECTURAL APPROACHES TO REDUCE LEAKAGE ENERGY IN CACHES Shashikiran H. Tadas & Chaitali Chakrabarti Department of Electrical Engineering Arizona State University Tempe, AZ, 85287. tadas@asu.edu, chaitali@asu.edu
More informationVirtual Memory. Reading. Sections 5.4, 5.5, 5.6, 5.8, 5.10 (2) Lecture notes from MKP and S. Yalamanchili
Virtual Memory Lecture notes from MKP and S. Yalamanchili Sections 5.4, 5.5, 5.6, 5.8, 5.10 Reading (2) 1 The Memory Hierarchy ALU registers Cache Memory Memory Memory Managed by the compiler Memory Managed
More informationPOWER4 Systems: Design for Reliability. Douglas Bossen, Joel Tendler, Kevin Reick IBM Server Group, Austin, TX
Systems: Design for Reliability Douglas Bossen, Joel Tendler, Kevin Reick IBM Server Group, Austin, TX Microprocessor 2-way SMP system on a chip > 1 GHz processor frequency >1GHz Core Shared L2 >1GHz Core
More informationComputer Architecture
Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte,
More informationA Performance Degradation Tolerable Cache Design by Exploiting Memory Hierarchies
A Performance Degradation Tolerable Cache Design by Exploiting Memory Hierarchies Abstract: Performance degradation tolerance (PDT) has been shown to be able to effectively improve the yield, reliability,
More informationMultilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology
1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823
More informationBuilt-in Self-Test and Repair (BISTR) Techniques for Embedded RAMs
Built-in Self-Test and Repair (BISTR) Techniques for Embedded RAMs Shyue-Kung Lu and Shih-Chang Huang Department of Electronic Engineering Fu Jen Catholic University Hsinchuang, Taipei, Taiwan 242, R.O.C.
More informationModule 18: "TLP on Chip: HT/SMT and CMP" Lecture 39: "Simultaneous Multithreading and Chip-multiprocessing" TLP on Chip: HT/SMT and CMP SMT
TLP on Chip: HT/SMT and CMP SMT Multi-threading Problems of SMT CMP Why CMP? Moore s law Power consumption? Clustered arch. ABCs of CMP Shared cache design Hierarchical MP file:///e /parallel_com_arch/lecture39/39_1.htm[6/13/2012
More informationReducing Hit Times. Critical Influence on cycle-time or CPI. small is always faster and can be put on chip
Reducing Hit Times Critical Influence on cycle-time or CPI Keep L1 small and simple small is always faster and can be put on chip interesting compromise is to keep the tags on chip and the block data off
More informationudirec: Unified Diagnosis and Reconfiguration for Frugal Bypass of NoC Faults
1/45 1/22 MICRO-46, 9 th December- 213 Davis, California udirec: Unified Diagnosis and Reconfiguration for Frugal Bypass of NoC Faults Ritesh Parikh and Valeria Bertacco Electrical Engineering & Computer
More informationCOSC 6385 Computer Architecture - Memory Hierarchies (III)
COSC 6385 Computer Architecture - Memory Hierarchies (III) Edgar Gabriel Spring 2014 Memory Technology Performance metrics Latency problems handled through caches Bandwidth main concern for main memory
More informationYield Enhancement Considerations for a Single-Chip Multiprocessor System with Embedded DRAM
Yield Enhancement Considerations for a Single-Chip Multiprocessor System with Embedded DRAM Markus Rudack Dirk Niggemeyer Laboratory for Information Technology Division Design & Test University of Hannover
More informationOne-Level Cache Memory Design for Scalable SMT Architectures
One-Level Cache Design for Scalable SMT Architectures Muhamed F. Mudawar and John R. Wani Computer Science Department The American University in Cairo mudawwar@aucegypt.edu rubena@aucegypt.edu Abstract
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address space at any time Temporal locality Items accessed recently are likely to
More informationLecture-14 (Memory Hierarchy) CS422-Spring
Lecture-14 (Memory Hierarchy) CS422-Spring 2018 Biswa@CSE-IITK The Ideal World Instruction Supply Pipeline (Instruction execution) Data Supply - Zero-cycle latency - Infinite capacity - Zero cost - Perfect
More informationTest/Repair Area Overhead Reduction for Small Embedded SRAMs
Test/Repair Area Overhead Reduction for Small Embedded SRAMs Baosheng Wang and Qiang Xu ATI Technologies Inc., 1 Commerce Valley Drive East, Markham, ON, Canada L3T 7X6, bawang@ati.com Dept. of Computer
More informationMemory Technology. Chapter 5. Principle of Locality. Chapter 5 Large and Fast: Exploiting Memory Hierarchy 1
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface Chapter 5 Large and Fast: Exploiting Memory Hierarchy 5 th Edition Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic
More informationAn Area-Efficient BIRA With 1-D Spare Segments
206 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 26, NO. 1, JANUARY 2018 An Area-Efficient BIRA With 1-D Spare Segments Donghyun Kim, Hayoung Lee, and Sungho Kang Abstract The
More informationHardware Design I Chap. 10 Design of microprocessor
Hardware Design I Chap. 0 Design of microprocessor E-mail: shimada@is.naist.jp Outline What is microprocessor? Microprocessor from sequential machine viewpoint Microprocessor and Neumann computer Memory
More informationRescue: A Microarchitecture for Testability and Defect Tolerance
Rescue: A Microarchitecture for Testability and Defect Tolerance Ethan Schuchman and T. N. Vijaykumar School of Electrical and Computer Engineering, Purdue University {erys, vijay}@purdue.edu Abstract
More informationScalable Controller Based PMBIST Design For Memory Testability M. Kiran Kumar, G. Sai Thirumal, B. Nagaveni M.Tech (VLSI DESIGN)
Scalable Controller Based PMBIST Design For Memory Testability M. Kiran Kumar, G. Sai Thirumal, B. Nagaveni M.Tech (VLSI DESIGN) Abstract With increasing design complexity in modern SOC design, many memory
More informationReference. T1 Architecture. T1 ( Niagara ) Case Study of a Multi-core, Multithreaded
Reference Case Study of a Multi-core, Multithreaded Processor The Sun T ( Niagara ) Computer Architecture, A Quantitative Approach, Fourth Edition, by John Hennessy and David Patterson, chapter. :/C:8
More informationEE 457 Unit 7b. Main Memory Organization
1 EE 457 Unit 7b Main Memory Organization 2 Motivation Organize main memory to Facilitate byte-addressability while maintaining Efficient fetching of the words in a cache block Low order interleaving (L.O.I)
More informationCOSC 6385 Computer Architecture - Memory Hierarchies (II)
COSC 6385 Computer Architecture - Memory Hierarchies (II) Edgar Gabriel Spring 2018 Types of cache misses Compulsory Misses: first access to a block cannot be in the cache (cold start misses) Capacity
More informationLeveraging OpenSPARC. ESA Round Table 2006 on Next Generation Microprocessors for Space Applications EDD
Leveraging OpenSPARC ESA Round Table 2006 on Next Generation Microprocessors for Space Applications G.Furano, L.Messina TEC- OpenSPARC T1 The T1 is a new-from-the-ground-up SPARC microprocessor implementation
More informationVery Large Scale Integration (VLSI)
Very Large Scale Integration (VLSI) Lecture 10 Dr. Ahmed H. Madian Ah_madian@hotmail.com Dr. Ahmed H. Madian-VLSI 1 Content Manufacturing Defects Wafer defects Chip defects Board defects system defects
More informationOutline Marquette University
COEN-4710 Computer Hardware Lecture 1 Computer Abstractions and Technology (Ch.1) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations
More informationSimultaneous Multithreading on Pentium 4
Hyper-Threading: Simultaneous Multithreading on Pentium 4 Presented by: Thomas Repantis trep@cs.ucr.edu CS203B-Advanced Computer Architecture, Spring 2004 p.1/32 Overview Multiple threads executing on
More informationLecture 1: Introduction
Contemporary Computer Architecture Instruction set architecture Lecture 1: Introduction CprE 581 Computer Systems Architecture, Fall 2016 Reading: Textbook, Ch. 1.1-1.7 Microarchitecture; examples: Pipeline
More informationComputer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling)
18-447 Computer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling) Prof. Onur Mutlu Carnegie Mellon University Spring 2015, 2/13/2015 Agenda for Today & Next Few Lectures
More informationMemory. Lecture 22 CS301
Memory Lecture 22 CS301 Administrative Daily Review of today s lecture w Due tomorrow (11/13) at 8am HW #8 due today at 5pm Program #2 due Friday, 11/16 at 11:59pm Test #2 Wednesday Pipelined Machine Fetch
More informationSoft-Core Embedded Processor-Based Built-In Self- Test of FPGAs: A Case Study
Soft-Core Embedded Processor-Based Built-In Self- Test of FPGAs: A Case Study Bradley F. Dutton, Graduate Student Member, IEEE, and Charles E. Stroud, Fellow, IEEE Dept. of Electrical and Computer Engineering
More informationCHAPTER 12 ARRAY SUBSYSTEMS [ ] MANJARI S. KULKARNI
CHAPTER 2 ARRAY SUBSYSTEMS [2.4-2.9] MANJARI S. KULKARNI OVERVIEW Array classification Non volatile memory Design and Layout Read-Only Memory (ROM) Pseudo nmos and NAND ROMs Programmable ROMS PROMS, EPROMs,
More informationInternational Journal of Digital Application & Contemporary research Website: (Volume 1, Issue 7, February 2013)
Programmable FSM based MBIST Architecture Sonal Sharma sonal.sharma30@gmail.com Vishal Moyal vishalmoyal@gmail.com Abstract - SOCs comprise of wide range of memory modules so it is not possible to test
More informationFault Tolerance in Multicore Processors With Reconfigurable Hardware Unit
Fault Tolerance in Multicore Processors With Reconfigurable Hardware Unit Rajesh S, Vinoth Kumar C, Srivatsan R, Harini S and A.P. Shanthi Department of Computer Science & Engineering, College of Engineering
More informationECE 152 Introduction to Computer Architecture
Introduction to Computer Architecture Main Memory and Virtual Memory Copyright 2009 Daniel J. Sorin Duke University Slides are derived from work by Amir Roth (Penn) Spring 2009 1 Where We Are in This Course
More informationConstructing Virtual Architectures on Tiled Processors. David Wentzlaff Anant Agarwal MIT
Constructing Virtual Architectures on Tiled Processors David Wentzlaff Anant Agarwal MIT 1 Emulators and JITs for Multi-Core David Wentzlaff Anant Agarwal MIT 2 Why Multi-Core? 3 Why Multi-Core? Future
More informationCS650 Computer Architecture. Lecture 9 Memory Hierarchy - Main Memory
CS65 Computer Architecture Lecture 9 Memory Hierarchy - Main Memory Andrew Sohn Computer Science Department New Jersey Institute of Technology Lecture 9: Main Memory 9-/ /6/ A. Sohn Memory Cycle Time 5
More informationLecture 2: Snooping and Directory Protocols. Topics: Snooping wrap-up and directory implementations
Lecture 2: Snooping and Directory Protocols Topics: Snooping wrap-up and directory implementations 1 Split Transaction Bus So far, we have assumed that a coherence operation (request, snoops, responses,
More informationCS250 VLSI Systems Design Lecture 9: Memory
CS250 VLSI Systems esign Lecture 9: Memory John Wawrzynek, Jonathan Bachrach, with Krste Asanovic, John Lazzaro and Rimas Avizienis (TA) UC Berkeley Fall 2012 CMOS Bistable Flip State 1 0 0 1 Cross-coupled
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address
More informationDiagnostic Testing of Embedded Memories Using BIST
Diagnostic Testing of Embedded Memories Using BIST Timothy J. Bergfeld Dirk Niggemeyer Elizabeth M. Rudnick Center for Reliable and High-Performance Computing, University of Illinois 1308 West Main Street,
More informationThe PA 7300LC Microprocessor: A Highly Integrated System on a Chip
The PA 7300LC Microprocessor: A Highly Integrated System on a Chip A collection of design objectives targeted for low-end systems and the legacy of an earlier microprocessor, which was designed for high-volume
More informationDonn Morrison Department of Computer Science. TDT4255 Memory hierarchies
TDT4255 Lecture 10: Memory hierarchies Donn Morrison Department of Computer Science 2 Outline Chapter 5 - Memory hierarchies (5.1-5.5) Temporal and spacial locality Hits and misses Direct-mapped, set associative,
More informationLecture 11 Cache. Peng Liu.
Lecture 11 Cache Peng Liu liupeng@zju.edu.cn 1 Associative Cache Example 2 Associative Cache Example 3 Associativity Example Compare 4-block caches Direct mapped, 2-way set associative, fully associative
More informationCOEN-4730 Computer Architecture Lecture 12. Testing and Design for Testability (focus: processors)
1 COEN-4730 Computer Architecture Lecture 12 Testing and Design for Testability (focus: processors) Cristinel Ababei Dept. of Electrical and Computer Engineering Marquette University 1 Outline Testing
More informationThe University of Adelaide, School of Computer Science 13 September 2018
Computer Architecture A Quantitative Approach, Sixth Edition Chapter 2 Memory Hierarchy Design 1 Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive per
More informationIntroduction to Robust Systems
Introduction to Robust Systems Subhasish Mitra Stanford University Email: subh@stanford.edu 1 Objective of this Talk Brainstorm What is a robust system? How can we build robust systems? Robust systems
More informationEE282 Computer Architecture. Lecture 1: What is Computer Architecture?
EE282 Computer Architecture Lecture : What is Computer Architecture? September 27, 200 Marc Tremblay Computer Systems Laboratory Stanford University marctrem@csl.stanford.edu Goals Understand how computer
More informationLECTURE 5: MEMORY HIERARCHY DESIGN
LECTURE 5: MEMORY HIERARCHY DESIGN Abridged version of Hennessy & Patterson (2012):Ch.2 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive
More informationHY225 Lecture 12: DRAM and Virtual Memory
HY225 Lecture 12: DRAM and irtual Memory Dimitrios S. Nikolopoulos University of Crete and FORTH-ICS May 16, 2011 Dimitrios S. Nikolopoulos Lecture 12: DRAM and irtual Memory 1 / 36 DRAM Fundamentals Random-access
More informationInternal Memory. Computer Architecture. Outline. Memory Hierarchy. Semiconductor Memory Types. Copyright 2000 N. AYDIN. All rights reserved.
Computer Architecture Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr nizamettinaydin@gmail.com Internal Memory http://www.yildiz.edu.tr/~naydin 1 2 Outline Semiconductor main memory Random Access Memory
More informationMemory. From Chapter 3 of High Performance Computing. c R. Leduc
Memory From Chapter 3 of High Performance Computing c 2002-2004 R. Leduc Memory Even if CPU is infinitely fast, still need to read/write data to memory. Speed of memory increasing much slower than processor
More informationLecture 8: Virtual Memory. Today: DRAM innovations, virtual memory (Sections )
Lecture 8: Virtual Memory Today: DRAM innovations, virtual memory (Sections 5.3-5.4) 1 DRAM Technology Trends Improvements in technology (smaller devices) DRAM capacities double every two years, but latency
More informationECE331: Hardware Organization and Design
ECE331: Hardware Organization and Design Lecture 22: Direct Mapped Cache Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Intel 8-core i7-5960x 3 GHz, 8-core, 20 MB of cache, 140
More informationELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Memory Organization Part II
ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Organization Part II Ujjwal Guin, Assistant Professor Department of Electrical and Computer Engineering Auburn University, Auburn,
More informationCS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II
CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste
More informationCENG 3420 Computer Organization and Design. Lecture 08: Memory - I. Bei Yu
CENG 3420 Computer Organization and Design Lecture 08: Memory - I Bei Yu CEG3420 L08.1 Spring 2016 Outline q Why Memory Hierarchy q How Memory Hierarchy? SRAM (Cache) & DRAM (main memory) Memory System
More informationThe Nios II Family of Configurable Soft-core Processors
The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture
More informationAUTONOMOUS RECONFIGURATION OF IP CORE UNITS USING BLRB ALGORITHM
AUTONOMOUS RECONFIGURATION OF IP CORE UNITS USING BLRB ALGORITHM B.HARIKRISHNA 1, DR.S.RAVI 2 1 Sathyabama Univeristy, Chennai, India 2 Department of Electronics Engineering, Dr. M. G. R. Univeristy, Chennai,
More informationReview: Performance Latency vs. Throughput. Time (seconds/program) is performance measure Instructions Clock cycles Seconds.
Performance 980 98 982 983 984 985 986 987 988 989 990 99 992 993 994 995 996 997 998 999 2000 7/4/20 CS 6C: Great Ideas in Computer Architecture (Machine Structures) Caches Instructor: Michael Greenbaum
More informationARCHITECTURE DESIGN FOR SOFT ERRORS
ARCHITECTURE DESIGN FOR SOFT ERRORS Shubu Mukherjee ^ШВпШшр"* AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO T^"ТГПШГ SAN FRANCISCO SINGAPORE SYDNEY TOKYO ^ P f ^ ^ ELSEVIER Morgan
More informationSimultaneous Multithreading (SMT)
Simultaneous Multithreading (SMT) An evolutionary processor architecture originally introduced in 1995 by Dean Tullsen at the University of Washington that aims at reducing resource waste in wide issue
More informationand self-repair for memories, and (iii) support for
A BIST Implementation Framework for Supporting Field Testability and Configurability in an Automotive SOC Amit Dutta, Srinivasulu Alampally, Arun Kumar and Rubin A. Parekhji Texas Instruments, Bangalore,
More informationAdvanced Memory Organizations
CSE 3421: Introduction to Computer Architecture Advanced Memory Organizations Study: 5.1, 5.2, 5.3, 5.4 (only parts) Gojko Babić 03-29-2018 1 Growth in Performance of DRAM & CPU Huge mismatch between CPU
More informationComputer Architecture: Multi-Core Processors: Why? Prof. Onur Mutlu Carnegie Mellon University
Computer Architecture: Multi-Core Processors: Why? Prof. Onur Mutlu Carnegie Mellon University Moore s Law Moore, Cramming more components onto integrated circuits, Electronics, 1965. 2 3 Multi-Core Idea:
More informationKeywords and Review Questions
Keywords and Review Questions lec1: Keywords: ISA, Moore s Law Q1. Who are the people credited for inventing transistor? Q2. In which year IC was invented and who was the inventor? Q3. What is ISA? Explain
More informationA Comprehensive Analytical Performance Model of DRAM Caches
A Comprehensive Analytical Performance Model of DRAM Caches Authors: Nagendra Gulur *, Mahesh Mehendale *, and R Govindarajan + Presented by: Sreepathi Pai * Texas Instruments, + Indian Institute of Science
More informationRE-CONFIGURABLE BUILT IN SELF REPAIR AND REDUNDANCY MECHANISM FOR RAM S IN SOCS Ravichander Bogam 1, M.Srinivasa Reddy 2 1
RE-CONFIGURABLE BUILT IN SELF REPAIR AND REDUNDANCY MECHANISM FOR RAM S IN SOCS Ravichander Bogam 1, M.Srinivasa Reddy 2 1 Department of Electronics and Communication Engineering St. Martins Engineering
More informationCSE 431 Computer Architecture Fall Chapter 5A: Exploiting the Memory Hierarchy, Part 1
CSE 431 Computer Architecture Fall 2008 Chapter 5A: Exploiting the Memory Hierarchy, Part 1 Mary Jane Irwin ( www.cse.psu.edu/~mji ) [Adapted from Computer Organization and Design, 4 th Edition, Patterson
More informationCENG 3420 Computer Organization and Design. Lecture 08: Cache Review. Bei Yu
CENG 3420 Computer Organization and Design Lecture 08: Cache Review Bei Yu CEG3420 L08.1 Spring 2016 A Typical Memory Hierarchy q Take advantage of the principle of locality to present the user with as
More informationLECTURE 10: Improving Memory Access: Direct and Spatial caches
EECS 318 CAD Computer Aided Design LECTURE 10: Improving Memory Access: Direct and Spatial caches Instructor: Francis G. Wolff wolff@eecs.cwru.edu Case Western Reserve University This presentation uses
More informationLecture 28 Multicore, Multithread" Suggested reading:" (H&P Chapter 7.4)"
Lecture 28 Multicore, Multithread" Suggested reading:" (H&P Chapter 7.4)" 1" Processor components" Multicore processors and programming" Processor comparison" CSE 30321 - Lecture 01 - vs." Goal: Explain
More informationCS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS
CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS UNIT-I OVERVIEW & INSTRUCTIONS 1. What are the eight great ideas in computer architecture? The eight
More information18-447: Computer Architecture Lecture 25: Main Memory. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/3/2013
18-447: Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/3/2013 Reminder: Homework 5 (Today) Due April 3 (Wednesday!) Topics: Vector processing,
More informationChapter 5A. Large and Fast: Exploiting Memory Hierarchy
Chapter 5A Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) Fast, expensive Dynamic RAM (DRAM) In between Magnetic disk Slow, inexpensive Ideal memory Access time of SRAM
More informationPACE: Power-Aware Computing Engines
PACE: Power-Aware Computing Engines Krste Asanovic Saman Amarasinghe Martin Rinard Computer Architecture Group MIT Laboratory for Computer Science http://www.cag.lcs.mit.edu/ PACE Approach Energy- Conscious
More informationArchitecture as Interface
Architecture as Interface André DeHon Friday, June 21, 2002 Previously How do we build efficient, programmable machines How we mix Computational complexity W/ physical landscape
More informationLow-power Architecture. By: Jonathan Herbst Scott Duntley
Low-power Architecture By: Jonathan Herbst Scott Duntley Why low power? Has become necessary with new-age demands: o Increasing design complexity o Demands of and for portable equipment Communication Media
More informationCopyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more
More informationCS 33. Architecture and Optimization (3) CS33 Intro to Computer Systems XVI 1 Copyright 2018 Thomas W. Doeppner. All rights reserved.
CS 33 Architecture and Optimization (3) CS33 Intro to Computer Systems XVI 1 Copyright 2018 Thomas W. Doeppner. All rights reserved. Hyper Threading Instruction Control Instruction Control Retirement Unit
More informationFile Memory for Extended Storage Disk Caches
File Memory for Disk Caches A Master s Thesis Seminar John C. Koob January 16, 2004 John C. Koob, January 16, 2004 File Memory for Disk Caches p. 1/25 Disk Cache File Memory ESDC Design Experimental Results
More informationCache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals
Cache Memory COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline The Need for Cache Memory The Basics
More informationStructure of Computer Systems
222 Structure of Computer Systems Figure 4.64 shows how a page directory can be used to map linear addresses to 4-MB pages. The entries in the page directory point to page tables, and the entries in a
More informationCS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2017 Lecture 15
CS24: INTRODUCTION TO COMPUTING SYSTEMS Spring 2017 Lecture 15 LAST TIME: CACHE ORGANIZATION Caches have several important parameters B = 2 b bytes to store the block in each cache line S = 2 s cache sets
More informationEmbedded Systems. 7. System Components
Embedded Systems 7. System Components Lothar Thiele 7-1 Contents of Course 1. Embedded Systems Introduction 2. Software Introduction 7. System Components 10. Models 3. Real-Time Models 4. Periodic/Aperiodic
More informationDual-Core Execution: Building A Highly Scalable Single-Thread Instruction Window
Dual-Core Execution: Building A Highly Scalable Single-Thread Instruction Window Huiyang Zhou School of Computer Science University of Central Florida New Challenges in Billion-Transistor Processor Era
More informationAdvanced Computer Architecture
Advanced Computer Architecture Chapter 1 Introduction into the Sequential and Pipeline Instruction Execution Martin Milata What is a Processors Architecture Instruction Set Architecture (ISA) Describes
More informationEfficient BISR strategy for Embedded SRAM with Selectable Redundancy using MARCH SS algorithm. P. Priyanka 1 and J. Lingaiah 2
Proceedings of International Conference on Emerging Trends in Engineering & Technology (ICETET) 29th - 30 th September, 2014 Warangal, Telangana, India (SF0EC009) ISSN (online): 2349-0020 Efficient BISR
More informationCore Fusion: Accommodating Software Diversity in Chip Multiprocessors
Core Fusion: Accommodating Software Diversity in Chip Multiprocessors Authors: Engin Ipek, Meyrem Kırman, Nevin Kırman, and Jose F. Martinez Navreet Virk Dept of Computer & Information Sciences University
More information15-740/ Computer Architecture Lecture 10: Out-of-Order Execution. Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 10/3/2011
5-740/8-740 Computer Architecture Lecture 0: Out-of-Order Execution Prof. Onur Mutlu Carnegie Mellon University Fall 20, 0/3/20 Review: Solutions to Enable Precise Exceptions Reorder buffer History buffer
More information