Self-Repair for Robust System Design. Yanjing Li Intel Labs Stanford University

Size: px
Start display at page:

Download "Self-Repair for Robust System Design. Yanjing Li Intel Labs Stanford University"

Transcription

1 Self-Repair for Robust System Design Yanjing Li Intel Labs Stanford University 1

2 Hardware Failures: Major Concern Permanent: our focus Temporary 2

3 Tolerating Permanent Hardware Failures Detection Diagnosis Recovery Self-repair 3

4 Tolerating Permanent Hardware Failures Detection Diagnosis Recovery Self-repair Concurrent error detection: expensive 4

5 Tolerating Permanent Hardware Failures Detection Diagnosis Online Self-Test and Diagnostics Non-continuous Recovery Low power Self-repair Concurrent No visible downtime 5

6 Tolerating Permanent Hardware Failures Detection Diagnosis Online Self-Test and Diagnostics Localize failures Recovery Diagnosis Self-repair 6

7 Tolerating Permanent Hardware Failures Detection Diagnosis Replace / bypass faulty component(s) Self-repair Recovery Self-repair 7

8 Tolerating Permanent Hardware Failures Detection Diagnosis Correction of corrupted data and states Checkpoint Failure occurs Failure detected Rollback Recovery Self-repair 8

9 This Lecture Detection Diagnosis Recovery Self-repair 9

10 Outline Introduction Self-repair techniques Memories Processor cores Uncore components Conclusion 10

11 Memory Organization en rd/wr logic addr Row decoder Column decoder Column multiplexers data_in Data registers data_out Sense amplifiers 11

12 Memory Functional Fault Models Memory cell array faults stuck-at, transition, coupling, etc. Address decode faults (AFs) no cell accessed, multiple cells accessed, etc. Read/write logic faults Equivalent to memory cell array faults Single bit / word / column / row faults dominate [Aitken 04, Kurdahi 06, Sridharan 12] 12

13 Spare Rows / Columns en rd/wr logic Spare column addr Row decoder Column decoder Column multiplexers Spare rows data_in Data registers data_out Sense amplifiers 13

14 Memory Self-Repair Using Spare Rows en rd/wr logic addr Row decoder Column decoder Column multiplexers data_in Data registers data_out Sense amplifiers Row decoder modified 14

15 Memory Self-Repair Using Spare Columns en rd/wr logic addr Row decoder Column decoder Multiplexers Column multiplexers Selfrepair control data_in Data registers data_out Sense amplifiers Additional multiplexers and select logic 15

16 Spare Words en rd/wr logic Same inputs as RAM addr Row decoder Column decoder Column multiplexers Spare words pool data_in Data registers Sense amplifiers data_out Self-repair control [Kim 98] 16

17 Redundancy in Memories Essential to improve yield and reliability Widely used in commercial RAMs Related topics How much redundancy? Built-in repair analysis: redundancy allocation 17

18 How Much Redundancy? Yield analysis and yield learning Example: negative binomial memory yield model YYYYY = (1 + YY Y) Y: defect desntiy, Y: memory area Y: defect clustering coefficient (measured to be 2 or 3) For a memory array with N rows and 1 spare row YYYYY = YYYYY+(Y + 1)(YYYYY)(1 YYYYY) 18

19 How Much Redundancy? Yield analysis and yield learning Example: negative binomial memory yield model 19

20 Built-In Repair Analysis Spare rows and columns NP complete Various algorithms and EDA tools exist 20

21 Special Self-Repair Techniques for Caches Caches: affects performance, NOT functionality Block address Tag Index Block offset V Tag Data / ECC Decoder =? Hit/miss Data out 21

22 Cache Line Disable/Delete Exist in commercial systems Intel [Chang 07], IBM [Sanda 08], etc. Block address Tag Decoder Index Block offset Fault-tolerance bit V FT Tag Data / ECC =? Hit/miss Data out 22

23 Setting the FT Bit Cache line read ECC calculate and check ECC error? No Yes Correct and flag error; log faulty location Done ECC error at same location already occurred within a time threshold? No Yes Set FT bit 23

24 PADded Cache Reconfigurable cache with programmable decoder Conventional decoder 24

25 PADded Cache Reconfigurable cache with programmable decoder Additional tag bits needed Programmable decoder ~ 5% area cost [Shirvani 99] 16KB, direct-mapped, 1-level programmability Reduce cost at the price of granularity 25

26 Cache Line Delete vs. PADded Cache 26

27 Outline Introduction Self-repair techniques Memories Processor cores Uncore components Conclusion 27

28 Core Sparing Utilize multi-/many-core architectures Core disabling also possible Already in commercial products IBM BlueGene/Q Nvidia Geforce Cisco Metro Fine-grained approaches? 28

29 Core Cannibalization Many-core designs with small in-order cores 3.5% area cost (OpenRISC 1200) [Romanescu 08] 29

30 Core Cannibalization: Discussion What about diagnosis? Routing performance impact Additional wire delay pipeline stages Modified branch resolution, bypass logic, etc. Decreased clock frequency Small vs. large number of faulty cores 30

31 Microarchitectural Block Disabling Disable half pipeline way in superscalar designs 12% area cost (includes diagnosis) Issue queue F D R Rd E M W C F D R Rd E M W C [Schuchman 05] 31

32 Microarchitectural Block Disabling Disable half pipeline way in superscalar designs 12% area cost (includes diagnosis) Issue queue F S D R S Rd E M W C * * * * * * * *modified stages F S D R S Rd E M W C * * * * * * * [Schuchman 05] 2 shift stages (S) added and lots design modifications 32

33 Microarchitectural Block Disabling Disable half pipeline way in superscalar designs 12% area cost (includes diagnosis) Issue queue, old half F S D R S Rd E M W C * * * * * * * *modified stages F S D R S Rd E M W C * * * * * * * Issue queue, new half Split issue queue (same for store buffer, not shown) [Schuchman 05] 33

34 Microarchitectural Block Disabling: Discussion Expensive, complex, intrusive Diagnosis logic Reconfiguration logic Coverage issues E.g., fetch stage not covered 34

35 Architectural Core Salvaging Many-core CISC processor designs Application 1 Application n Advanced ISA Basic ISA... Advanced ISA Basic ISA Core 1 Core n [Powell 09] 35

36 Architectural Core Salvaging Many-core CISC processor designs Application 1 Application n Basic uops Basic uops Advanced ISA Basic ISA... Advanced ISA Basic ISA Core 1 Core n [Powell 09] 36

37 Architectural Core Salvaging Many-core CISC processor designs Application 1 Application n Advanced uops Basic uops Advanced ISA Basic ISA... Advanced ISA Basic ISA Core 1 Core n [Powell 09] 37

38 Architectural Core Salvaging Many-core CISC processor designs Application 1 Thread swap Application n Advanced uops Basic uops Advanced ISA Advanced ISA Basic ISA... Basic ISA Core 1 Core n [Powell 09] 38

39 Architectural Core Salvaging Many-core CISC processor designs Application n Thread swap Application 1 Basic uops Advanced uops Advanced ISA Advanced ISA Basic ISA... Basic ISA Core 1 Core n [Powell 09] 39

40 Architectural Core Salvaging: Discussion What about diagnosis Applicability CISC-like architectures Performance Depends Coverage issues ~50% coverage of execution 40

41 Self-Repair for Processor Cores Which technique to choose? Software-assisted techniques? 41

42 Outline Introduction Self-repair techniques Memories Processor cores Uncore components Conclusion 42

43 Uncore Prevalent in SoCs Uncore examples IBM Power 7 Cache / DRAM controller Accelerators I/O interfaces NVIDIA Tegra Cisco Network Processor 43

44 Uncore Self-Repair Essential OpenSPARC T2 Uncore 12% Processor cores 12% Memories 76% Cores, memories, networks-on-chip Many existing techniques 44

45 Key Message [Li ITC13] 20 Existing sparing techniques Chip area impact (%) 0 Self-repair coverage (%) 98 45

46 Key Message [Li ITC13] 20 Existing sparing techniques Chip area impact (%) 7.5 Our techniques Self-repair coverage (%) 98 46

47 Key Message [Li ITC13] 20 Existing sparing techniques Chip area impact (%) 7.5 Performance: 0% (fault-free) 0.3% - 5% (faulty) Power: 3% Our techniques Self-repair coverage (%) 98 47

48 Existing Techniques Inadequate OpenSPARC T2 Uncore 48

49 Existing Techniques Inadequate OpenSPARC T2 Uncore Original Spare 49

50 Existing Techniques Inadequate OpenSPARC T2 Original Spare Steering logic 50

51 Existing Techniques Inadequate OpenSPARC T2 Original Power gating Spare Steering logic 20% chip area cost 51

52 New Uncore Self-Repair Techniques ERRS: Enhanced Resource Reallocation and Sharing SHE: Sparing through Hierarchical Exploration OpenSPARC T2 8 cores, 64 threads 500M transistors 7.5% chip area cost 52

53 Outline Introduction Self-repair for uncore ERRS SHE Conclusion 53

54 Basic Resource Reallocation & Sharing (RRS) Already-existing similar components No spares 3. Reroute Crossbar blocks 4 cores 4 cores Self-repair control L2 mem 0 L2 bank control 0 L2 bank control 1 L2 mem 1 1. Fault detected 8 cycle overhead (faulty case only) 2. Resource sharing 54

55 Basic RRS Performance Impact Fault-free scenario Crossbar blocks 4 cores 4 cores Self-repair control L2 mem 0 L2 bank MISS! control 0 L2 bank MISS! control 1 L2 mem 1 Fill buffer 55

56 Basic RRS Performance Impact Single faulty component: 70% impact Crossbar blocks 4 cores 4 cores Self-repair control L2 mem 0 L2 bank control 0 L2 bank control 1 L2 mem 1 Fill buffer DATADATA 56

57 Enhanced RRS (ERRS) Idea Identify Mitigate No Desirable tradeoff? Analyze Done Yes 57

58 ERRS Mitigates RRS Performance Impact Single faulty component: 3% impact Crossbar blocks 4 cores 4 cores Self-repair control L2 mem 0 L2 bank control 0 L2 bank control 1 L2 mem 1 Fill buffer Fill buffer 58

59 ERRS on OpenSPARC T2 DRAM data return: duplicated DRAM bank control FSM: entries doubled L2 hit processing: duplicated L2 miss fill buffer: entries doubled 3.2% area, 2.7% power 59

60 Performance Evaluation Setup Simulators GEM5 64 single-issue in-order processor cores Simulated CMP 1KB private L1 data and instruction caches 4MByte shared L2 cache (8 banks) 4 DRAM controllers 60

61 ERRS vs. Basic RRS: Performance Impact ERRS: 3% performance impact 3% 1 faulty L2 controller 1 faulty DRAM controller 0.7% Average CPI overhead 0% Basic RRS RRS ERRS 0.0% Basic RRS RRS ERRS 64-core CMP PARSEC benchmark programs 61

62 ERRS vs. Basic RRS: Performance Impact ERRS: 5% performance impact 1 faulty L2 controller 1 faulty DRAM controller 18% 25% Average CPI overhead 0% Basic RRS ERRS 0% 64-core CMP Stressed programs Basic RRS RRS ERRS 62

63 ERRS for Multiple Faulty Components 25 Graceful performance degradation Average CPI impact (%) 0 2 DRAM controllers 4 L2 bank controllers 4 L2 bank controllers + 2 DRAM controllers 2 L2 bank controllers 64-core CMP PARSEC benchmark programs Faulty components 63

64 Which Faults Repairable? delay bridging stuckat All faults inside a component 64

65 Which Faults Not Repairable? Single points of failure Primary inputs Primary outputs Steering logic Metric: self-repair coverage % non-single points of failure 65

66 ERRS Component Self-Repair Coverage 97% 97% 98% 97% 97% 97% 97% 96% 97% 97% 97% 97% 98% 97% 97% 66

67 ERRS Chip Area/Power Impact & Coverage 20 Existing sparing techniques Post-layout chip area impact (%) 3.2 Power: 3% ERRS 0 75 Self-repair coverage (%) 98 67

68 Outline Introduction Self-repair for uncore ERRS SHE Conclusion 68

69 SHE: Sparing through Hierarchical Exploration Component RTL SHE Sparing for component Minimize area cost Identical blocks spare shared Balanced coverage vs. area 69

70 The Design Hierarchy Network controller pio Level 1 rdmc Level 2 chnl_16 chnl_15 chnl_1 Level 3 Lowest level 70

71 Sparing in Different Levels: Coarse-Grained pio Network controller pio pio rdmc rdmc rdmc Original Spare / steering logic Level 2 71

72 Sparing in Different Levels: Mixed Levels Network controller pio rdmc pio pio chnl_16 chnl_15 chnl_1 chnl Level 2 / Level 3 Original Spare / steering logic 72

73 Sparing in Different Levels: Fine-Grained Network controller pio rdmc Original Spare / steering logic Lowest level 73

74 How to Obtain Sweet-Spot? Balanced tradeoff High Area overhead Self-repair coverage Low Deeper in hierarchy Algorithm details in paper 74

75 SHE Results 20 Existing sparing techniques Post-layout chip area impact (%) 3.2 ERRS 0 75 Self-repair coverage (%) 98 75

76 SHE Results 20 Existing sparing techniques Post-layout chip area impact (%) 7.5 SHE power impact: 0.2% SHE performance impact: 0% ERRS + SHE 0 Self-repair coverage (%) 98 76

77 What About Diagnosis? Traditional fault diagnosis difficult Effect-cause: infeasible RTL unavailable Cause-effect: impractical 7 petabyte fault dictionary [Beckler ITC12] 77

78 Fault Diagnosis Simplified CASP [Li DATE08, VTS10] Concurrent, Autonomous, Stored test Patterns 78

79 Fault Diagnosis Simplified CASP [Li DATE08, VTS10] Concurrent, Autonomous, Stored test Patterns Blocks = self-repair granularity Block 1 Block n 79

80 Fault Diagnosis Simplified CASP [Li DATE08, VTS10] Concurrent, Autonomous, Stored test Patterns Pass/fail 1 Local test logic Pass/fail n Fail self-repair needed Diagnosis = detection Block 1 Block n = self-repair granularity 80

81 Summary: Self-Repair of Uncore Components 20 Existing sparing techniques Post-layout chip area impact (%) 7.5 Performance: 0% (fault-free) 0.3% - 5% (faulty) Power: 3% ERRS + SHE 3.2 ERRS 0 75 Self-repair coverage (%) 98 81

82 References [Aitken 04] Aitken, R., A Modular Wrapper Enabling High Speed BIST and Repair for Small Wide Memories, Proc. Intl. Test Conf., pp , [Chang 07] Chang, J., et al., The 65-nm 16-MB Shared On-Die L3 Cache for the Dual-Core Intel Xeon Processor 7100 Series, IEEE Journal of Solid-State Circuits, vol. 42, no. 4, pp , [Kurdahi 06] Kurdahi, F.J., et al., System-Level SRAM Yield Enhancement, Proc. Intl. Symp. Quality Electronic Design,

83 References [Li VTS10] Li, Y., et al., Concurrent Autonomous Self-Test for Uncore Components in System-on-Chips, Proc. VLSI Test Symposium, [Li DATE08] Li, Y., S. Makar, and S. Mitra, CASP: Concurrent Autonomous Chip Self-Test using Stored Test Patterns, Proc. Design, Automation, and Test in Europe, pp , [Li ITC 13] Li, Y., et al., Self-Repair of Uncore Components in Robust System-on-Chips: An OpenSPARC T2 Case Study, Proc. IEEE Intl. Test Conf., pp. 1-10,

84 References [Powell 09] Powell, M.D., et al., Architectural Core Salvaging in a Multi-Core Processor for Hard-Error Tolerance, Proc. Intl. Symp. on Computer Architecture, pp , [Romanescu 08] Romanescu, B.F., and D.J. Sorin, Core Cannibalization Architecture: Improving Lifetime Chip Performance for Multicore Processors in the Presence of Hard Faults, Proc. Intl. Conf. on Parallel Architectures and Compilation Techniques, pp ,

85 References [Sanda 08] Sanda, P.N, et al., Fault-Tolerant Design of the IBM Power6 Microprocessor, IEEE Micro, vol. 28, no. 2, pp , [Schuchman 05] Schuchman, E., and T.N. Vijaykumar, Rescue: A Microarchitecture for Testability and Defect Tolerance, Proc. Intl. Symp. on Computer Architecture, pp , [Shirvani 99] Shirvani, P.P., and E.J. McCluskey, PADded Cache: A New Fault-Tolerance Technique for Cache Memories, Proc. VLSI Test Symp., pp ,

86 References [Shivakumar 03] Shivakumar, P., et al., Exploiting Microarchitectural Redundancy for Defect Tolerance, Proc. Intl. Conf. on Computer Design, pp , [Sridharan 12] Sridharan, V., et al., A Study of DRAM Failures in the Field, Proc. Intl. Conf. High Performance Computing, Networking, Storage and Analysis, pp. 1-11, [Schuchman 05] Schuchman, E., and T.N. Vijaykumar, Rescue: A Microarchitecture for Testability and Defect Tolerance, Proc. Intl. Symp. on Computer Architecture, pp ,

AR-SMT: A Microarchitectural Approach to Fault Tolerance in Microprocessors

AR-SMT: A Microarchitectural Approach to Fault Tolerance in Microprocessors AR-SMT: A Microarchitectural Approach to Fault Tolerance in Microprocessors Computer Sciences Department University of Wisconsin Madison http://www.cs.wisc.edu/~ericro/ericro.html ericro@cs.wisc.edu High-Performance

More information

Puey Wei Tan. Danny Lee. IBM zenterprise 196

Puey Wei Tan. Danny Lee. IBM zenterprise 196 Puey Wei Tan Danny Lee IBM zenterprise 196 IBM zenterprise System What is it? IBM s product solutions for mainframe computers. IBM s product models: 700/7000 series System/360 System/370 System/390 zseries

More information

Chapter 5B. Large and Fast: Exploiting Memory Hierarchy

Chapter 5B. Large and Fast: Exploiting Memory Hierarchy Chapter 5B Large and Fast: Exploiting Memory Hierarchy One Transistor Dynamic RAM 1-T DRAM Cell word access transistor V REF TiN top electrode (V REF ) Ta 2 O 5 dielectric bit Storage capacitor (FET gate,

More information

Ultra Low-Cost Defect Protection for Microprocessor Pipelines

Ultra Low-Cost Defect Protection for Microprocessor Pipelines Ultra Low-Cost Defect Protection for Microprocessor Pipelines Smitha Shyam Kypros Constantinides Sujay Phadke Valeria Bertacco Todd Austin Advanced Computer Architecture Lab University of Michigan Key

More information

AN OPTIMAL APPROACH FOR TESTING EMBEDDED MEMORIES IN SOCS

AN OPTIMAL APPROACH FOR TESTING EMBEDDED MEMORIES IN SOCS International Journal of Engineering Inventions ISSN: 2278-7461, www.ijeijournal.com Volume 1, Issue 8 (October2012) PP: 76-80 AN OPTIMAL APPROACH FOR TESTING EMBEDDED MEMORIES IN SOCS B.Prathap Reddy

More information

Hyperthreading Technology

Hyperthreading Technology Hyperthreading Technology Aleksandar Milenkovic Electrical and Computer Engineering Department University of Alabama in Huntsville milenka@ece.uah.edu www.ece.uah.edu/~milenka/ Outline What is hyperthreading?

More information

Chapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY

Chapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY Chapter Seven CACHE MEMORY AND VIRTUAL MEMORY 1 Memories: Review SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: value is stored

More information

EECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141

EECS151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: John Wawrzynek and Nick Weaver. Lecture 19: Caches EE141 EECS151/251A Spring 2018 Digital Design and Integrated Circuits Instructors: John Wawrzynek and Nick Weaver Lecture 19: Caches Cache Introduction 40% of this ARM CPU is devoted to SRAM cache. But the role

More information

Write only as much as necessary. Be brief!

Write only as much as necessary. Be brief! 1 CIS371 Computer Organization and Design Midterm Exam Prof. Martin Thursday, March 15th, 2012 This exam is an individual-work exam. Write your answers on these pages. Additional pages may be attached

More information

ARCHITECTURAL APPROACHES TO REDUCE LEAKAGE ENERGY IN CACHES

ARCHITECTURAL APPROACHES TO REDUCE LEAKAGE ENERGY IN CACHES ARCHITECTURAL APPROACHES TO REDUCE LEAKAGE ENERGY IN CACHES Shashikiran H. Tadas & Chaitali Chakrabarti Department of Electrical Engineering Arizona State University Tempe, AZ, 85287. tadas@asu.edu, chaitali@asu.edu

More information

Virtual Memory. Reading. Sections 5.4, 5.5, 5.6, 5.8, 5.10 (2) Lecture notes from MKP and S. Yalamanchili

Virtual Memory. Reading. Sections 5.4, 5.5, 5.6, 5.8, 5.10 (2) Lecture notes from MKP and S. Yalamanchili Virtual Memory Lecture notes from MKP and S. Yalamanchili Sections 5.4, 5.5, 5.6, 5.8, 5.10 Reading (2) 1 The Memory Hierarchy ALU registers Cache Memory Memory Memory Managed by the compiler Memory Managed

More information

POWER4 Systems: Design for Reliability. Douglas Bossen, Joel Tendler, Kevin Reick IBM Server Group, Austin, TX

POWER4 Systems: Design for Reliability. Douglas Bossen, Joel Tendler, Kevin Reick IBM Server Group, Austin, TX Systems: Design for Reliability Douglas Bossen, Joel Tendler, Kevin Reick IBM Server Group, Austin, TX Microprocessor 2-way SMP system on a chip > 1 GHz processor frequency >1GHz Core Shared L2 >1GHz Core

More information

Computer Architecture

Computer Architecture Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte,

More information

A Performance Degradation Tolerable Cache Design by Exploiting Memory Hierarchies

A Performance Degradation Tolerable Cache Design by Exploiting Memory Hierarchies A Performance Degradation Tolerable Cache Design by Exploiting Memory Hierarchies Abstract: Performance degradation tolerance (PDT) has been shown to be able to effectively improve the yield, reliability,

More information

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology 1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823

More information

Built-in Self-Test and Repair (BISTR) Techniques for Embedded RAMs

Built-in Self-Test and Repair (BISTR) Techniques for Embedded RAMs Built-in Self-Test and Repair (BISTR) Techniques for Embedded RAMs Shyue-Kung Lu and Shih-Chang Huang Department of Electronic Engineering Fu Jen Catholic University Hsinchuang, Taipei, Taiwan 242, R.O.C.

More information

Module 18: "TLP on Chip: HT/SMT and CMP" Lecture 39: "Simultaneous Multithreading and Chip-multiprocessing" TLP on Chip: HT/SMT and CMP SMT

Module 18: TLP on Chip: HT/SMT and CMP Lecture 39: Simultaneous Multithreading and Chip-multiprocessing TLP on Chip: HT/SMT and CMP SMT TLP on Chip: HT/SMT and CMP SMT Multi-threading Problems of SMT CMP Why CMP? Moore s law Power consumption? Clustered arch. ABCs of CMP Shared cache design Hierarchical MP file:///e /parallel_com_arch/lecture39/39_1.htm[6/13/2012

More information

Reducing Hit Times. Critical Influence on cycle-time or CPI. small is always faster and can be put on chip

Reducing Hit Times. Critical Influence on cycle-time or CPI. small is always faster and can be put on chip Reducing Hit Times Critical Influence on cycle-time or CPI Keep L1 small and simple small is always faster and can be put on chip interesting compromise is to keep the tags on chip and the block data off

More information

udirec: Unified Diagnosis and Reconfiguration for Frugal Bypass of NoC Faults

udirec: Unified Diagnosis and Reconfiguration for Frugal Bypass of NoC Faults 1/45 1/22 MICRO-46, 9 th December- 213 Davis, California udirec: Unified Diagnosis and Reconfiguration for Frugal Bypass of NoC Faults Ritesh Parikh and Valeria Bertacco Electrical Engineering & Computer

More information

COSC 6385 Computer Architecture - Memory Hierarchies (III)

COSC 6385 Computer Architecture - Memory Hierarchies (III) COSC 6385 Computer Architecture - Memory Hierarchies (III) Edgar Gabriel Spring 2014 Memory Technology Performance metrics Latency problems handled through caches Bandwidth main concern for main memory

More information

Yield Enhancement Considerations for a Single-Chip Multiprocessor System with Embedded DRAM

Yield Enhancement Considerations for a Single-Chip Multiprocessor System with Embedded DRAM Yield Enhancement Considerations for a Single-Chip Multiprocessor System with Embedded DRAM Markus Rudack Dirk Niggemeyer Laboratory for Information Technology Division Design & Test University of Hannover

More information

One-Level Cache Memory Design for Scalable SMT Architectures

One-Level Cache Memory Design for Scalable SMT Architectures One-Level Cache Design for Scalable SMT Architectures Muhamed F. Mudawar and John R. Wani Computer Science Department The American University in Cairo mudawwar@aucegypt.edu rubena@aucegypt.edu Abstract

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address space at any time Temporal locality Items accessed recently are likely to

More information

Lecture-14 (Memory Hierarchy) CS422-Spring

Lecture-14 (Memory Hierarchy) CS422-Spring Lecture-14 (Memory Hierarchy) CS422-Spring 2018 Biswa@CSE-IITK The Ideal World Instruction Supply Pipeline (Instruction execution) Data Supply - Zero-cycle latency - Infinite capacity - Zero cost - Perfect

More information

Test/Repair Area Overhead Reduction for Small Embedded SRAMs

Test/Repair Area Overhead Reduction for Small Embedded SRAMs Test/Repair Area Overhead Reduction for Small Embedded SRAMs Baosheng Wang and Qiang Xu ATI Technologies Inc., 1 Commerce Valley Drive East, Markham, ON, Canada L3T 7X6, bawang@ati.com Dept. of Computer

More information

Memory Technology. Chapter 5. Principle of Locality. Chapter 5 Large and Fast: Exploiting Memory Hierarchy 1

Memory Technology. Chapter 5. Principle of Locality. Chapter 5 Large and Fast: Exploiting Memory Hierarchy 1 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface Chapter 5 Large and Fast: Exploiting Memory Hierarchy 5 th Edition Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic

More information

An Area-Efficient BIRA With 1-D Spare Segments

An Area-Efficient BIRA With 1-D Spare Segments 206 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 26, NO. 1, JANUARY 2018 An Area-Efficient BIRA With 1-D Spare Segments Donghyun Kim, Hayoung Lee, and Sungho Kang Abstract The

More information

Hardware Design I Chap. 10 Design of microprocessor

Hardware Design I Chap. 10 Design of microprocessor Hardware Design I Chap. 0 Design of microprocessor E-mail: shimada@is.naist.jp Outline What is microprocessor? Microprocessor from sequential machine viewpoint Microprocessor and Neumann computer Memory

More information

Rescue: A Microarchitecture for Testability and Defect Tolerance

Rescue: A Microarchitecture for Testability and Defect Tolerance Rescue: A Microarchitecture for Testability and Defect Tolerance Ethan Schuchman and T. N. Vijaykumar School of Electrical and Computer Engineering, Purdue University {erys, vijay}@purdue.edu Abstract

More information

Scalable Controller Based PMBIST Design For Memory Testability M. Kiran Kumar, G. Sai Thirumal, B. Nagaveni M.Tech (VLSI DESIGN)

Scalable Controller Based PMBIST Design For Memory Testability M. Kiran Kumar, G. Sai Thirumal, B. Nagaveni M.Tech (VLSI DESIGN) Scalable Controller Based PMBIST Design For Memory Testability M. Kiran Kumar, G. Sai Thirumal, B. Nagaveni M.Tech (VLSI DESIGN) Abstract With increasing design complexity in modern SOC design, many memory

More information

Reference. T1 Architecture. T1 ( Niagara ) Case Study of a Multi-core, Multithreaded

Reference. T1 Architecture. T1 ( Niagara ) Case Study of a Multi-core, Multithreaded Reference Case Study of a Multi-core, Multithreaded Processor The Sun T ( Niagara ) Computer Architecture, A Quantitative Approach, Fourth Edition, by John Hennessy and David Patterson, chapter. :/C:8

More information

EE 457 Unit 7b. Main Memory Organization

EE 457 Unit 7b. Main Memory Organization 1 EE 457 Unit 7b Main Memory Organization 2 Motivation Organize main memory to Facilitate byte-addressability while maintaining Efficient fetching of the words in a cache block Low order interleaving (L.O.I)

More information

COSC 6385 Computer Architecture - Memory Hierarchies (II)

COSC 6385 Computer Architecture - Memory Hierarchies (II) COSC 6385 Computer Architecture - Memory Hierarchies (II) Edgar Gabriel Spring 2018 Types of cache misses Compulsory Misses: first access to a block cannot be in the cache (cold start misses) Capacity

More information

Leveraging OpenSPARC. ESA Round Table 2006 on Next Generation Microprocessors for Space Applications EDD

Leveraging OpenSPARC. ESA Round Table 2006 on Next Generation Microprocessors for Space Applications EDD Leveraging OpenSPARC ESA Round Table 2006 on Next Generation Microprocessors for Space Applications G.Furano, L.Messina TEC- OpenSPARC T1 The T1 is a new-from-the-ground-up SPARC microprocessor implementation

More information

Very Large Scale Integration (VLSI)

Very Large Scale Integration (VLSI) Very Large Scale Integration (VLSI) Lecture 10 Dr. Ahmed H. Madian Ah_madian@hotmail.com Dr. Ahmed H. Madian-VLSI 1 Content Manufacturing Defects Wafer defects Chip defects Board defects system defects

More information

Outline Marquette University

Outline Marquette University COEN-4710 Computer Hardware Lecture 1 Computer Abstractions and Technology (Ch.1) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations

More information

Simultaneous Multithreading on Pentium 4

Simultaneous Multithreading on Pentium 4 Hyper-Threading: Simultaneous Multithreading on Pentium 4 Presented by: Thomas Repantis trep@cs.ucr.edu CS203B-Advanced Computer Architecture, Spring 2004 p.1/32 Overview Multiple threads executing on

More information

Lecture 1: Introduction

Lecture 1: Introduction Contemporary Computer Architecture Instruction set architecture Lecture 1: Introduction CprE 581 Computer Systems Architecture, Fall 2016 Reading: Textbook, Ch. 1.1-1.7 Microarchitecture; examples: Pipeline

More information

Computer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling)

Computer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling) 18-447 Computer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling) Prof. Onur Mutlu Carnegie Mellon University Spring 2015, 2/13/2015 Agenda for Today & Next Few Lectures

More information

Memory. Lecture 22 CS301

Memory. Lecture 22 CS301 Memory Lecture 22 CS301 Administrative Daily Review of today s lecture w Due tomorrow (11/13) at 8am HW #8 due today at 5pm Program #2 due Friday, 11/16 at 11:59pm Test #2 Wednesday Pipelined Machine Fetch

More information

Soft-Core Embedded Processor-Based Built-In Self- Test of FPGAs: A Case Study

Soft-Core Embedded Processor-Based Built-In Self- Test of FPGAs: A Case Study Soft-Core Embedded Processor-Based Built-In Self- Test of FPGAs: A Case Study Bradley F. Dutton, Graduate Student Member, IEEE, and Charles E. Stroud, Fellow, IEEE Dept. of Electrical and Computer Engineering

More information

CHAPTER 12 ARRAY SUBSYSTEMS [ ] MANJARI S. KULKARNI

CHAPTER 12 ARRAY SUBSYSTEMS [ ] MANJARI S. KULKARNI CHAPTER 2 ARRAY SUBSYSTEMS [2.4-2.9] MANJARI S. KULKARNI OVERVIEW Array classification Non volatile memory Design and Layout Read-Only Memory (ROM) Pseudo nmos and NAND ROMs Programmable ROMS PROMS, EPROMs,

More information

International Journal of Digital Application & Contemporary research Website: (Volume 1, Issue 7, February 2013)

International Journal of Digital Application & Contemporary research Website:   (Volume 1, Issue 7, February 2013) Programmable FSM based MBIST Architecture Sonal Sharma sonal.sharma30@gmail.com Vishal Moyal vishalmoyal@gmail.com Abstract - SOCs comprise of wide range of memory modules so it is not possible to test

More information

Fault Tolerance in Multicore Processors With Reconfigurable Hardware Unit

Fault Tolerance in Multicore Processors With Reconfigurable Hardware Unit Fault Tolerance in Multicore Processors With Reconfigurable Hardware Unit Rajesh S, Vinoth Kumar C, Srivatsan R, Harini S and A.P. Shanthi Department of Computer Science & Engineering, College of Engineering

More information

ECE 152 Introduction to Computer Architecture

ECE 152 Introduction to Computer Architecture Introduction to Computer Architecture Main Memory and Virtual Memory Copyright 2009 Daniel J. Sorin Duke University Slides are derived from work by Amir Roth (Penn) Spring 2009 1 Where We Are in This Course

More information

Constructing Virtual Architectures on Tiled Processors. David Wentzlaff Anant Agarwal MIT

Constructing Virtual Architectures on Tiled Processors. David Wentzlaff Anant Agarwal MIT Constructing Virtual Architectures on Tiled Processors David Wentzlaff Anant Agarwal MIT 1 Emulators and JITs for Multi-Core David Wentzlaff Anant Agarwal MIT 2 Why Multi-Core? 3 Why Multi-Core? Future

More information

CS650 Computer Architecture. Lecture 9 Memory Hierarchy - Main Memory

CS650 Computer Architecture. Lecture 9 Memory Hierarchy - Main Memory CS65 Computer Architecture Lecture 9 Memory Hierarchy - Main Memory Andrew Sohn Computer Science Department New Jersey Institute of Technology Lecture 9: Main Memory 9-/ /6/ A. Sohn Memory Cycle Time 5

More information

Lecture 2: Snooping and Directory Protocols. Topics: Snooping wrap-up and directory implementations

Lecture 2: Snooping and Directory Protocols. Topics: Snooping wrap-up and directory implementations Lecture 2: Snooping and Directory Protocols Topics: Snooping wrap-up and directory implementations 1 Split Transaction Bus So far, we have assumed that a coherence operation (request, snoops, responses,

More information

CS250 VLSI Systems Design Lecture 9: Memory

CS250 VLSI Systems Design Lecture 9: Memory CS250 VLSI Systems esign Lecture 9: Memory John Wawrzynek, Jonathan Bachrach, with Krste Asanovic, John Lazzaro and Rimas Avizienis (TA) UC Berkeley Fall 2012 CMOS Bistable Flip State 1 0 0 1 Cross-coupled

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address

More information

Diagnostic Testing of Embedded Memories Using BIST

Diagnostic Testing of Embedded Memories Using BIST Diagnostic Testing of Embedded Memories Using BIST Timothy J. Bergfeld Dirk Niggemeyer Elizabeth M. Rudnick Center for Reliable and High-Performance Computing, University of Illinois 1308 West Main Street,

More information

The PA 7300LC Microprocessor: A Highly Integrated System on a Chip

The PA 7300LC Microprocessor: A Highly Integrated System on a Chip The PA 7300LC Microprocessor: A Highly Integrated System on a Chip A collection of design objectives targeted for low-end systems and the legacy of an earlier microprocessor, which was designed for high-volume

More information

Donn Morrison Department of Computer Science. TDT4255 Memory hierarchies

Donn Morrison Department of Computer Science. TDT4255 Memory hierarchies TDT4255 Lecture 10: Memory hierarchies Donn Morrison Department of Computer Science 2 Outline Chapter 5 - Memory hierarchies (5.1-5.5) Temporal and spacial locality Hits and misses Direct-mapped, set associative,

More information

Lecture 11 Cache. Peng Liu.

Lecture 11 Cache. Peng Liu. Lecture 11 Cache Peng Liu liupeng@zju.edu.cn 1 Associative Cache Example 2 Associative Cache Example 3 Associativity Example Compare 4-block caches Direct mapped, 2-way set associative, fully associative

More information

COEN-4730 Computer Architecture Lecture 12. Testing and Design for Testability (focus: processors)

COEN-4730 Computer Architecture Lecture 12. Testing and Design for Testability (focus: processors) 1 COEN-4730 Computer Architecture Lecture 12 Testing and Design for Testability (focus: processors) Cristinel Ababei Dept. of Electrical and Computer Engineering Marquette University 1 Outline Testing

More information

The University of Adelaide, School of Computer Science 13 September 2018

The University of Adelaide, School of Computer Science 13 September 2018 Computer Architecture A Quantitative Approach, Sixth Edition Chapter 2 Memory Hierarchy Design 1 Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive per

More information

Introduction to Robust Systems

Introduction to Robust Systems Introduction to Robust Systems Subhasish Mitra Stanford University Email: subh@stanford.edu 1 Objective of this Talk Brainstorm What is a robust system? How can we build robust systems? Robust systems

More information

EE282 Computer Architecture. Lecture 1: What is Computer Architecture?

EE282 Computer Architecture. Lecture 1: What is Computer Architecture? EE282 Computer Architecture Lecture : What is Computer Architecture? September 27, 200 Marc Tremblay Computer Systems Laboratory Stanford University marctrem@csl.stanford.edu Goals Understand how computer

More information

LECTURE 5: MEMORY HIERARCHY DESIGN

LECTURE 5: MEMORY HIERARCHY DESIGN LECTURE 5: MEMORY HIERARCHY DESIGN Abridged version of Hennessy & Patterson (2012):Ch.2 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive

More information

HY225 Lecture 12: DRAM and Virtual Memory

HY225 Lecture 12: DRAM and Virtual Memory HY225 Lecture 12: DRAM and irtual Memory Dimitrios S. Nikolopoulos University of Crete and FORTH-ICS May 16, 2011 Dimitrios S. Nikolopoulos Lecture 12: DRAM and irtual Memory 1 / 36 DRAM Fundamentals Random-access

More information

Internal Memory. Computer Architecture. Outline. Memory Hierarchy. Semiconductor Memory Types. Copyright 2000 N. AYDIN. All rights reserved.

Internal Memory. Computer Architecture. Outline. Memory Hierarchy. Semiconductor Memory Types. Copyright 2000 N. AYDIN. All rights reserved. Computer Architecture Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr nizamettinaydin@gmail.com Internal Memory http://www.yildiz.edu.tr/~naydin 1 2 Outline Semiconductor main memory Random Access Memory

More information

Memory. From Chapter 3 of High Performance Computing. c R. Leduc

Memory. From Chapter 3 of High Performance Computing. c R. Leduc Memory From Chapter 3 of High Performance Computing c 2002-2004 R. Leduc Memory Even if CPU is infinitely fast, still need to read/write data to memory. Speed of memory increasing much slower than processor

More information

Lecture 8: Virtual Memory. Today: DRAM innovations, virtual memory (Sections )

Lecture 8: Virtual Memory. Today: DRAM innovations, virtual memory (Sections ) Lecture 8: Virtual Memory Today: DRAM innovations, virtual memory (Sections 5.3-5.4) 1 DRAM Technology Trends Improvements in technology (smaller devices) DRAM capacities double every two years, but latency

More information

ECE331: Hardware Organization and Design

ECE331: Hardware Organization and Design ECE331: Hardware Organization and Design Lecture 22: Direct Mapped Cache Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Intel 8-core i7-5960x 3 GHz, 8-core, 20 MB of cache, 140

More information

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Memory Organization Part II

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Memory Organization Part II ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Organization Part II Ujjwal Guin, Assistant Professor Department of Electrical and Computer Engineering Auburn University, Auburn,

More information

CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II

CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste

More information

CENG 3420 Computer Organization and Design. Lecture 08: Memory - I. Bei Yu

CENG 3420 Computer Organization and Design. Lecture 08: Memory - I. Bei Yu CENG 3420 Computer Organization and Design Lecture 08: Memory - I Bei Yu CEG3420 L08.1 Spring 2016 Outline q Why Memory Hierarchy q How Memory Hierarchy? SRAM (Cache) & DRAM (main memory) Memory System

More information

The Nios II Family of Configurable Soft-core Processors

The Nios II Family of Configurable Soft-core Processors The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture

More information

AUTONOMOUS RECONFIGURATION OF IP CORE UNITS USING BLRB ALGORITHM

AUTONOMOUS RECONFIGURATION OF IP CORE UNITS USING BLRB ALGORITHM AUTONOMOUS RECONFIGURATION OF IP CORE UNITS USING BLRB ALGORITHM B.HARIKRISHNA 1, DR.S.RAVI 2 1 Sathyabama Univeristy, Chennai, India 2 Department of Electronics Engineering, Dr. M. G. R. Univeristy, Chennai,

More information

Review: Performance Latency vs. Throughput. Time (seconds/program) is performance measure Instructions Clock cycles Seconds.

Review: Performance Latency vs. Throughput. Time (seconds/program) is performance measure Instructions Clock cycles Seconds. Performance 980 98 982 983 984 985 986 987 988 989 990 99 992 993 994 995 996 997 998 999 2000 7/4/20 CS 6C: Great Ideas in Computer Architecture (Machine Structures) Caches Instructor: Michael Greenbaum

More information

ARCHITECTURE DESIGN FOR SOFT ERRORS

ARCHITECTURE DESIGN FOR SOFT ERRORS ARCHITECTURE DESIGN FOR SOFT ERRORS Shubu Mukherjee ^ШВпШшр"* AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO T^"ТГПШГ SAN FRANCISCO SINGAPORE SYDNEY TOKYO ^ P f ^ ^ ELSEVIER Morgan

More information

Simultaneous Multithreading (SMT)

Simultaneous Multithreading (SMT) Simultaneous Multithreading (SMT) An evolutionary processor architecture originally introduced in 1995 by Dean Tullsen at the University of Washington that aims at reducing resource waste in wide issue

More information

and self-repair for memories, and (iii) support for

and self-repair for memories, and (iii) support for A BIST Implementation Framework for Supporting Field Testability and Configurability in an Automotive SOC Amit Dutta, Srinivasulu Alampally, Arun Kumar and Rubin A. Parekhji Texas Instruments, Bangalore,

More information

Advanced Memory Organizations

Advanced Memory Organizations CSE 3421: Introduction to Computer Architecture Advanced Memory Organizations Study: 5.1, 5.2, 5.3, 5.4 (only parts) Gojko Babić 03-29-2018 1 Growth in Performance of DRAM & CPU Huge mismatch between CPU

More information

Computer Architecture: Multi-Core Processors: Why? Prof. Onur Mutlu Carnegie Mellon University

Computer Architecture: Multi-Core Processors: Why? Prof. Onur Mutlu Carnegie Mellon University Computer Architecture: Multi-Core Processors: Why? Prof. Onur Mutlu Carnegie Mellon University Moore s Law Moore, Cramming more components onto integrated circuits, Electronics, 1965. 2 3 Multi-Core Idea:

More information

Keywords and Review Questions

Keywords and Review Questions Keywords and Review Questions lec1: Keywords: ISA, Moore s Law Q1. Who are the people credited for inventing transistor? Q2. In which year IC was invented and who was the inventor? Q3. What is ISA? Explain

More information

A Comprehensive Analytical Performance Model of DRAM Caches

A Comprehensive Analytical Performance Model of DRAM Caches A Comprehensive Analytical Performance Model of DRAM Caches Authors: Nagendra Gulur *, Mahesh Mehendale *, and R Govindarajan + Presented by: Sreepathi Pai * Texas Instruments, + Indian Institute of Science

More information

RE-CONFIGURABLE BUILT IN SELF REPAIR AND REDUNDANCY MECHANISM FOR RAM S IN SOCS Ravichander Bogam 1, M.Srinivasa Reddy 2 1

RE-CONFIGURABLE BUILT IN SELF REPAIR AND REDUNDANCY MECHANISM FOR RAM S IN SOCS Ravichander Bogam 1, M.Srinivasa Reddy 2 1 RE-CONFIGURABLE BUILT IN SELF REPAIR AND REDUNDANCY MECHANISM FOR RAM S IN SOCS Ravichander Bogam 1, M.Srinivasa Reddy 2 1 Department of Electronics and Communication Engineering St. Martins Engineering

More information

CSE 431 Computer Architecture Fall Chapter 5A: Exploiting the Memory Hierarchy, Part 1

CSE 431 Computer Architecture Fall Chapter 5A: Exploiting the Memory Hierarchy, Part 1 CSE 431 Computer Architecture Fall 2008 Chapter 5A: Exploiting the Memory Hierarchy, Part 1 Mary Jane Irwin ( www.cse.psu.edu/~mji ) [Adapted from Computer Organization and Design, 4 th Edition, Patterson

More information

CENG 3420 Computer Organization and Design. Lecture 08: Cache Review. Bei Yu

CENG 3420 Computer Organization and Design. Lecture 08: Cache Review. Bei Yu CENG 3420 Computer Organization and Design Lecture 08: Cache Review Bei Yu CEG3420 L08.1 Spring 2016 A Typical Memory Hierarchy q Take advantage of the principle of locality to present the user with as

More information

LECTURE 10: Improving Memory Access: Direct and Spatial caches

LECTURE 10: Improving Memory Access: Direct and Spatial caches EECS 318 CAD Computer Aided Design LECTURE 10: Improving Memory Access: Direct and Spatial caches Instructor: Francis G. Wolff wolff@eecs.cwru.edu Case Western Reserve University This presentation uses

More information

Lecture 28 Multicore, Multithread" Suggested reading:" (H&P Chapter 7.4)"

Lecture 28 Multicore, Multithread Suggested reading: (H&P Chapter 7.4) Lecture 28 Multicore, Multithread" Suggested reading:" (H&P Chapter 7.4)" 1" Processor components" Multicore processors and programming" Processor comparison" CSE 30321 - Lecture 01 - vs." Goal: Explain

More information

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS UNIT-I OVERVIEW & INSTRUCTIONS 1. What are the eight great ideas in computer architecture? The eight

More information

18-447: Computer Architecture Lecture 25: Main Memory. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/3/2013

18-447: Computer Architecture Lecture 25: Main Memory. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/3/2013 18-447: Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/3/2013 Reminder: Homework 5 (Today) Due April 3 (Wednesday!) Topics: Vector processing,

More information

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy Chapter 5A Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) Fast, expensive Dynamic RAM (DRAM) In between Magnetic disk Slow, inexpensive Ideal memory Access time of SRAM

More information

PACE: Power-Aware Computing Engines

PACE: Power-Aware Computing Engines PACE: Power-Aware Computing Engines Krste Asanovic Saman Amarasinghe Martin Rinard Computer Architecture Group MIT Laboratory for Computer Science http://www.cag.lcs.mit.edu/ PACE Approach Energy- Conscious

More information

Architecture as Interface

Architecture as Interface Architecture as Interface André DeHon Friday, June 21, 2002 Previously How do we build efficient, programmable machines How we mix Computational complexity W/ physical landscape

More information

Low-power Architecture. By: Jonathan Herbst Scott Duntley

Low-power Architecture. By: Jonathan Herbst Scott Duntley Low-power Architecture By: Jonathan Herbst Scott Duntley Why low power? Has become necessary with new-age demands: o Increasing design complexity o Demands of and for portable equipment Communication Media

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more

More information

CS 33. Architecture and Optimization (3) CS33 Intro to Computer Systems XVI 1 Copyright 2018 Thomas W. Doeppner. All rights reserved.

CS 33. Architecture and Optimization (3) CS33 Intro to Computer Systems XVI 1 Copyright 2018 Thomas W. Doeppner. All rights reserved. CS 33 Architecture and Optimization (3) CS33 Intro to Computer Systems XVI 1 Copyright 2018 Thomas W. Doeppner. All rights reserved. Hyper Threading Instruction Control Instruction Control Retirement Unit

More information

File Memory for Extended Storage Disk Caches

File Memory for Extended Storage Disk Caches File Memory for Disk Caches A Master s Thesis Seminar John C. Koob January 16, 2004 John C. Koob, January 16, 2004 File Memory for Disk Caches p. 1/25 Disk Cache File Memory ESDC Design Experimental Results

More information

Cache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals

Cache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals Cache Memory COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline The Need for Cache Memory The Basics

More information

Structure of Computer Systems

Structure of Computer Systems 222 Structure of Computer Systems Figure 4.64 shows how a page directory can be used to map linear addresses to 4-MB pages. The entries in the page directory point to page tables, and the entries in a

More information

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2017 Lecture 15

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2017 Lecture 15 CS24: INTRODUCTION TO COMPUTING SYSTEMS Spring 2017 Lecture 15 LAST TIME: CACHE ORGANIZATION Caches have several important parameters B = 2 b bytes to store the block in each cache line S = 2 s cache sets

More information

Embedded Systems. 7. System Components

Embedded Systems. 7. System Components Embedded Systems 7. System Components Lothar Thiele 7-1 Contents of Course 1. Embedded Systems Introduction 2. Software Introduction 7. System Components 10. Models 3. Real-Time Models 4. Periodic/Aperiodic

More information

Dual-Core Execution: Building A Highly Scalable Single-Thread Instruction Window

Dual-Core Execution: Building A Highly Scalable Single-Thread Instruction Window Dual-Core Execution: Building A Highly Scalable Single-Thread Instruction Window Huiyang Zhou School of Computer Science University of Central Florida New Challenges in Billion-Transistor Processor Era

More information

Advanced Computer Architecture

Advanced Computer Architecture Advanced Computer Architecture Chapter 1 Introduction into the Sequential and Pipeline Instruction Execution Martin Milata What is a Processors Architecture Instruction Set Architecture (ISA) Describes

More information

Efficient BISR strategy for Embedded SRAM with Selectable Redundancy using MARCH SS algorithm. P. Priyanka 1 and J. Lingaiah 2

Efficient BISR strategy for Embedded SRAM with Selectable Redundancy using MARCH SS algorithm. P. Priyanka 1 and J. Lingaiah 2 Proceedings of International Conference on Emerging Trends in Engineering & Technology (ICETET) 29th - 30 th September, 2014 Warangal, Telangana, India (SF0EC009) ISSN (online): 2349-0020 Efficient BISR

More information

Core Fusion: Accommodating Software Diversity in Chip Multiprocessors

Core Fusion: Accommodating Software Diversity in Chip Multiprocessors Core Fusion: Accommodating Software Diversity in Chip Multiprocessors Authors: Engin Ipek, Meyrem Kırman, Nevin Kırman, and Jose F. Martinez Navreet Virk Dept of Computer & Information Sciences University

More information

15-740/ Computer Architecture Lecture 10: Out-of-Order Execution. Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 10/3/2011

15-740/ Computer Architecture Lecture 10: Out-of-Order Execution. Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 10/3/2011 5-740/8-740 Computer Architecture Lecture 0: Out-of-Order Execution Prof. Onur Mutlu Carnegie Mellon University Fall 20, 0/3/20 Review: Solutions to Enable Precise Exceptions Reorder buffer History buffer

More information