Answers to comments from Reviewer 1

Question A-1: Though I have one correction to the authors' response to my Question A-1, "... parity protection needs a correction mechanism (e.g., checkpoints and roll-backward recovery), which can incur significant overheads in terms of runtime and energy consumption due to the recovery mechanism once an error is detected" is not accurate. I suggest the authors make it clear that the reliability mechanism supporting error recovery may incur significant performance and energy overheads, not the error recovery itself, since the actual soft error rate is extremely low and the error recovery is rarely invoked during execution.

Answer A-1: Yes, the reviewer is right. What we intended to point out in the last response was that parity protection needs a correction mechanism such as checkpoints and roll-backward recovery, and that it is this mechanism supporting error recovery, not the recovery itself, that may incur high overheads in terms of performance and energy consumption, as the reviewer noted.

Question A-2: One more comment: the comparison results in Section 7.2 and Section 7.3 show that PPC has difficulty competing with ECCs in terms of vulnerability reduction (around 50% for PPC vs. 99% for ECC), even though the incurred performance and energy overheads are much less. Will this kind of design trade-off be acceptable in real system design? Please justify the applicability of your proposed PPCs with potential applications.

Answer A-2: One potential application for PPCs in real system design is portable video surveillance in hazardous areas. Mobile embedded systems such as smart phones, PDAs, and other portable devices demand energy efficiency mainly because they run on limited battery power. Reliability is also becoming important, since these mobile devices operate close to humans and a functional failure may cause catastrophic results. Thus, designers of mobile embedded systems need a design space in which reliability can be traded for performance and energy costs, or vice versa. A portable video surveillance system installed in a hazardous area is one such example: since it is almost impossible to physically replace the system and it must keep running as long as possible, designers can trade reliability (in particular, by using a less reliable design for the parts of the architecture handling the video data itself) for lower performance and power overheads.

Answers to comments from Reviewer 2

Comment B-1: The authors have addressed most of my prior concerns satisfactorily, and thus I would like to recommend acceptance of this manuscript.

Answer B-1: Thank you for your comments and concerns.

Answers to comments from Reviewer 3

Question C-1: In my opinion, you emphasized not the methodological but the algorithmic aspect in this paper too much. The algorithmic part, however, is not novel, because your page assignment problem falls into the classical combinatorial assignment problem, for which many algorithms have been studied. Please try to emphasize the methodological aspect of your research, or compare your algorithm with the others comprehensively if you still want to emphasize the algorithmic aspect. More detail of the PPC architecture should be explained in this paper; this is more important than discussing the ad-hoc algorithms. Please show some quantitative evaluations of the cycle time of the PPC. More concretely, please show graphs of cache size vs. cycle time for both the protected part and the unprotected part of the PPC. I think your PPC architecture causes some performance overhead in the TLB; a quantitative evaluation of the TLB should also be shown in this paper. Without a quantitative discussion of the PPC, your paper would not be helpful to designers, because they could not decide the optimal sizes of the unprotected and protected caches for their design. Figure 53 in the following paper would be helpful in plotting graphs of cache size vs. cycle time: Steven J.E. Wilton and Norman P. Jouppi, "An Enhanced Access and Cycle Time Model for On-Chip Caches," WRL Research Report 93/5.

Answer C-1: Thank you for your suggestion. The main focus of this paper is to propose approaches that find interesting partitions of general applications between the unprotected cache and the protected cache of a PPC architecture; the PPC architecture itself was already published in our previous work [Lee et al. CASES 06 and TVLSI 09]. Our previous work focused on the architectural novelty and the tradeoffs among performance, energy consumption, and reliability for multimedia applications, where an obvious partitioning exists for PPC architectures, namely multimedia data vs. non-multimedia data. This paper, however, shows that the PPC architecture is effective not only for multimedia applications but also for general applications, using the proposed algorithms for data and instruction partitioning in PPCs. Since we have published comprehensive experimental results in terms of performance (runtime), power consumption overhead, reliability (failure rate), and area penalty in our previous papers and technical report, this article emphasizes the effectiveness of our proposed algorithms for partitioning data and instructions onto our previously proposed PPC architecture for general applications, including multimedia applications. Further, this article extends the effectiveness of PPC architectures to instruction caches as well. In this answering document, we provide all our experimental results and comparisons of the ECC (Error Correction Code)-protected cache, the unprotected (plain or normal) cache, and our PPC architecture in terms of cache access time, power consumption, and area for different cache sizes, as the reviewer suggested. Figure 1 shows the cache area in cm² for unprotected caches and ECC-protected caches. All results are from Cacti 3.2 with the following parameters: (i) the block size is set to 32 Byte for the unprotected cache and 38 Byte for the ECC-protected cache, (ii) the set-associativity is set to 4, (iii) the technology is set to 0.18 micrometer, and (iv) VDD is set to 1.7 Volt.

We considered a Hamming (32, 38) code as our ECC coding; it demands 6 extra check bits for every 32 data bits. This ratio is why we set the block size to 38 Byte for the ECC-protected cache, i.e., 6 Byte for check bits and 32 Byte for data bits. This 6 Byte overhead causes a 22% area increase for ECC-protected caches on average (up to 59%) compared to the area of unprotected caches, as shown in Figure 1. Note that these overheads result from storing the 6 Byte check codes, not from the encoder and decoder logic.

Figure 1 Cache areas of unprotected caches and protected caches

To estimate the area overhead of the ECC logic itself, we implemented a Hamming (32, 38) code in VHDL, synthesized it with the Synopsys Design Compiler using the lsi10k library in 0.5 micrometer technology, and scaled the result to 0.55 mm² for 0.18 micrometer technology. For the PPC architecture, we considered a 16:1 ratio between the unprotected cache and the protected cache so that its overhead is equal to or less than that of an ECC-only protected cache; for example, we selected a 32 KB unprotected cache and a 2 KB protected (ECC) cache. Figure 2 shows the overall comparison of cache areas for the unprotected cache, the ECC-protected cache, and the PPC architecture when the PPC and the ECC-protected cache include this area overhead. The ECC-protected caches incur high overheads, while the area of the PPC architecture lies between those of the ECC-protected and unprotected caches. For example, compared to a 32 KB unprotected cache, a 32 KB ECC-protected cache incurs about 22% area overhead, while the PPC (a 32 KB unprotected cache plus a 2 KB ECC-protected cache) incurs just 7%, which is about a 12% reduction relative to the area of the 32 KB ECC-protected cache.

Figure 2 Cache areas among unprotected caches, ECC-protected caches, and PPCs
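As a rough cross-check of the storage-overhead figures above, the following Python sketch computes the number of Hamming check bits required per 32-bit word and the resulting ECC block size; it is an illustrative calculation only, not the Cacti or VHDL flow used for the reported numbers.

```python
def hamming_check_bits(data_bits):
    """Smallest r such that 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r

data_bits = 32
check_bits = hamming_check_bits(data_bits)   # 6 check bits for 32 data bits
block_data_bytes = 32                        # unprotected block size in Byte
block_ecc_bytes = block_data_bytes * (data_bits + check_bits) // data_bits  # 38 Byte

print(f"check bits per 32-bit word : {check_bits}")
print(f"ECC block size             : {block_ecc_bytes} Byte "
      f"({100 * check_bits / data_bits:.2f}% storage overhead)")
```

The raw check-bit storage overhead is 6/32 = 18.75% per word; the larger area increase reported by Cacti (22% on average, up to 59%) reflects that cache array area does not scale exactly linearly with the number of stored bits.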

Figure 3 shows the access latencies, in ns, of the unprotected caches and the ECC-protected caches over various cache sizes. All parameters and configurations are the same as in the cache area evaluation. The larger block size of the ECC-protected caches (38 Byte instead of 32 Byte) causes about a 5% overhead on average in cache access latency compared to the unprotected caches. Note that the cache access latency of the ECC-protected caches (with the larger block size) is actually smaller than that of the unprotected caches at several sizes, such as 128 KB, 64 KB, 32 KB, and 2 KB. Also, a speculative implementation of the ECC operation does not increase the overall cache access latency. Thus, we consider the cache access latency of ECC-protected caches identical to that of unprotected caches, namely 1 cycle. Note that we assume the processor runs at 400 MHz with an Instructions Per Cycle (IPC) of 1.

Figure 3 Cache access latencies of the unprotected caches and ECC-protected caches

Figure 4 shows the power consumption (energy per access, in nJ) of the unprotected caches and the ECC-protected caches over various cache sizes. All parameters and configurations are the same as in the cache area evaluation. As in the cache access latency evaluation, the comparison with Cacti 3.2 does not show high overheads from the 6 Byte storage overhead in ECC-protected caches. However, ECC protection does incur high overheads in terms of energy consumption for encoding and decoding, even though the latency of the ECC coding and decoding can be optimized or hidden. Our experimental estimation of the ECC logic with the Synopsys Design Compiler shows 0.39 nJ for decoding and 0.22 nJ for encoding, as presented in the first-revision answering sheet. The resulting average energy overhead for the page partitions discovered by the approaches proposed in the article is about 7% for data PPCs and 13% for instruction PPCs, compared to unprotected caches, across the benchmarks.

Figure 4 Power consumption of unprotected caches and ECC-protected caches
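To make the ECC energy accounting concrete (and, relatedly, the point in Answer C-8 below that writes to a clean line still require a decode to detect soft errors), here is a minimal per-access energy sketch. The 0.39 nJ / 0.22 nJ decode/encode costs are the measured values above; the array-access energy and the read/write mix are placeholders for illustration, not numbers from the paper.

```python
# Illustrative per-access energy model for an ECC-protected cache
# (a sketch, not the exact model used in the paper).
E_DECODE = 0.39   # nJ, measured cost of ECC decoding (check + correct)
E_ENCODE = 0.22   # nJ, measured cost of ECC encoding (check-bit generation)

def access_energy(kind, e_array):
    """Energy of one cache access in nJ.

    kind    : 'read' or 'write'
    e_array : energy of the SRAM array access itself (placeholder value)
    """
    if kind == 'read':
        # every read is decoded to detect/correct a possible soft error
        return e_array + E_DECODE
    if kind == 'write':
        # a write re-encodes the new data; the existing line is also decoded,
        # even if it is clean, because a dirty bit cannot reveal a soft error
        return e_array + E_DECODE + E_ENCODE
    raise ValueError(kind)

# example: 70% reads, 30% writes, assumed 0.5 nJ array access energy
avg = 0.7 * access_energy('read', 0.5) + 0.3 * access_energy('write', 0.5)
print(f"average energy per protected-cache access: {avg:.2f} nJ")
```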

Question C-2: The reason why I asked Question C-3 is that the cache access time (ps/cycle) affects both the entire execution time and the vulnerability. I cannot understand the execution time exactly if you don't show any concrete values of cache access time (ps/cycle). As an answer to my question (C-3), you have shown Figure 2, in which the cache access time should be shown because it affects the entire runtime. It is an interesting result that Figure 2 in the reply letter shows that sizing the unprotected cache reduces vulnerability drastically; as a matter of fact, vulnerability seems to be linear in the size of the unprotected cache. Sizing the unprotected cache is an effective approach to reduce vulnerability, especially in a large-cache configuration. How do you justify your approach compared with sizing the unprotected cache?

Answer C-2: The reviewer is right. Figure 2 in the previous reply letter shows the interesting result that sizing the unprotected cache is an effective approach to reduce vulnerability. Indeed, Kim et al. [KimDATE06] presented the impacts of cache size on performance, energy consumption, and reliability. Cache sizing is an effective tradeoff technique: decreasing the (unprotected) cache size reduces the vulnerability but raises the cache miss rate, incurring high performance and energy overheads due to frequent off-chip memory accesses. Conversely, increasing the cache size raises the vulnerability while improving performance, with higher cache energy but lower memory-access energy. Our approach is orthogonal to sizing the unprotected cache, i.e., PPC and cache sizing can be combined to further trade off performance, energy consumption, and vulnerability.

Question C-3: Several definitions and lemmas are written on Page 50. These lemmas were introduced and proved not by you but by others, I think; they are actually found in statistics textbooks. If this is true, please refer to an appropriate reference. Even if these lemmas were proved by you, it is unclear what you want to discuss with them. You implicitly assumed that a binomial distribution is approximated by a normal distribution. Though it is well known that a normal distribution can approximate a binomial distribution for a large number of samples, you should mention that if you include the first lemma in your paper. In my opinion, the first lemma is unnecessary, though. More importantly, the masking probability becomes a normal distribution, which is quite important from the aspect of IC reliability. You must explain more about probability, convexity, and skewness if you discuss statistics on the masking effect. It is quite important what the real distribution of masking probability looks like, and how large the masking probability is for typical microprocessors and benchmark programs. If what you wanted to do was to estimate how many simulations are required, it would be meaningless without a real value of "p", the masking probability. The real distribution of masking upsets is much more interesting to readers than the well-known lemmas.

Answer C-3: The reviewer is right; we simply applied statistics lemmas from the textbook. The only reason for the existence of Section 4.1 was to give readers a sense of how computation-intensive simulation-based techniques for estimating reliability are. This motivates the use of architecture-level metrics for reliability estimation, even though these metrics are not ours; indeed, they were proposed at MICRO 03 by Mukherjee et al. We simply use the existing architecture-level reliability metric to evaluate page partitions on the previously proposed PPC (Partially Protected Caches) architecture. We understand the reviewer's concern that this is an easy extension of textbook material and is also intuitively clear. Thus, we have removed Section 4.1 from this article.
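For reference, the kind of back-of-the-envelope estimate that the removed Section 4.1 was meant to convey can be reproduced with the standard normal-approximation confidence interval for a binomial proportion; the masking probability and the target precision below are illustrative placeholders, not values from the paper.

```python
from math import ceil

def required_injections(p, eps, z=1.96):
    """Fault-injection runs needed to estimate a masking probability p
    within +/- eps at ~95% confidence (normal approximation to the binomial)."""
    return ceil(z * z * p * (1.0 - p) / (eps * eps))

# example: assumed masking probability 0.9, estimated to within +/- 1%
print(required_injections(p=0.9, eps=0.01))   # about 3,458 simulation runs
```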

Question C-4: From Pages 50 to 52, you have shown a vulnerability estimation algorithm which is basically the same as Asadi's work, except that you estimate byte-wise vulnerability. If you insist on the novelty of byte-wise vulnerability estimation, please compare byte-wise and word-wise vulnerability estimations with some quantitative evaluation; the extension from word-wise to byte-wise is not so novel. I wonder whether byte-wise estimation is better than word-wise, because an entire cache line, whose size is typically not a byte but four or eight bytes, is written out to the lower level of the memory hierarchy. Byte-wise estimation is effective under a write-through policy, but you seem not to assume write-through because you mentioned "dirty". I'm not sure whether or not you should include this text in the paper as if the work were done by you. If I were you, I would just summarize Asadi's work as their work (NOT MINE) in less space.

Answer C-4: We do not claim that byte-level vulnerability is novel. We added Fig. 3 and the full description of how vulnerability is estimated because we wanted to make our vulnerability metric self-contained, as one of the reviewers suggested in the prior revision. However, our byte-level vulnerability metric produces results closer to the failure rate than Asadi's word-level critical time. Figure 5 shows these results: the X-axis is the cache size, and the Y-axis shows Asadi's critical time, our vulnerability, and the failure rates. For small caches, word-level critical time tracks the failure rate well, but it loses accuracy for larger caches; for example, word-level critical time estimates only about half of the failure rate, whereas our byte-level vulnerability stays closer to it. This is because our byte-level vulnerability captures the vulnerable time of each individual byte and considers more comprehensive cases, bringing it closer to the observed failure rate.

Figure 5 Comparison of Vulnerability, Critical Time, and Failure Rate
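To illustrate the difference discussed in Answer C-4, the sketch below accumulates vulnerable time at byte granularity rather than word granularity; the event model (per-byte fill/read/write-back timestamps) is a simplified assumption for illustration, not the exact algorithm in the paper.

```python
# Minimal sketch of byte-level vulnerability accumulation (illustrative only).
# A byte is vulnerable from the moment it holds live data until it is either
# read by the CPU or written back to memory while dirty; a clean eviction
# discards the accumulated interval for that byte.

def byte_vulnerability(events, line_size=32):
    """events: list of (time, kind, offset, length) with kind in
    {'fill', 'read', 'write', 'writeback_dirty', 'evict_clean'}."""
    last_update = [None] * line_size   # when each byte last became live
    vulnerable = 0.0
    for time, kind, offset, length in sorted(events):
        for b in range(offset, offset + length):
            if kind in ('fill', 'write'):
                last_update[b] = time            # byte becomes live (and vulnerable)
            elif kind in ('read', 'writeback_dirty'):
                if last_update[b] is not None:   # consumed: corruption would matter
                    vulnerable += time - last_update[b]
                    last_update[b] = time if kind == 'read' else None
            elif kind == 'evict_clean':
                last_update[b] = None            # clean eviction: bytes were not critical
    return vulnerable

# example: fill a line at t=0, read 4 bytes at t=100, evict it clean at t=150
events = [(0, 'fill', 0, 32), (100, 'read', 0, 4), (150, 'evict_clean', 0, 32)]
print(byte_vulnerability(events))   # only the 4 read bytes contribute: 4 * 100
```

A word-level variant would charge the whole word (or line) for any access to it, which is one way to see why it can drift from the measured failure rate as cache size grows.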

9 Decryption" in order to justify your experiments. Assuming that the number of pages is N, there are 2^N page assignments for the PPC. There are about 50 pages in the example and 2^50 assignments theoretically. I can understand that you wanted to avoid the impossible number of simulations. However, you should mention the limitation of your experiments if you take approximate vulnerabilities. Answer C-5: Figures 6 to 14 below show profiling results with other benchmarks. As shown in Fig. 7. in the paper, the following figures present two observations: (i) the vulnerability decreases when each page is mapped from the unprotected cache to the protected cache in a PPC in the order of page vulnerability and (ii) page partitions significantly affect the performance. We believe that we mentioned the limitation of simulations for all possible combinations. This limitation applies for our approach as well. We did not claim that ours is the best out of all possible combinations of pages. The goal of our approaches is to efficiently figure out the interesting page assignments to two caches in a PPC in terms of vulnerability with least overheads of performance and energy consumption than those of random simulations and genetic algorithms. Figure 6 Tradeoffs among vulnerability and runtime when moving pages from unprotected cache to protected cache in a PPC (Blowfish Encryption)

Figure 7 Tradeoffs among vulnerability and runtime when moving pages from unprotected cache to protected cache in a PPC (CRC)

Figure 8 Tradeoffs among vulnerability and runtime when moving pages from unprotected cache to protected cache in a PPC (djpeg)

Figure 9 Tradeoffs among vulnerability and runtime when moving pages from unprotected cache to protected cache in a PPC (susan edges)

Figure 10 Tradeoffs among vulnerability and runtime when moving pages from unprotected cache to protected cache in a PPC (FFT)

Figure 11 Tradeoffs among vulnerability and runtime when moving pages from unprotected cache to protected cache in a PPC (Rijndael Decryption)

Figure 12 Tradeoffs among vulnerability and runtime when moving pages from unprotected cache to protected cache in a PPC (Rijndael Encryption)

Figure 13 Tradeoffs among vulnerability and runtime when moving pages from unprotected cache to protected cache in a PPC (SHA)

Figure 14 Tradeoffs among vulnerability and runtime when moving pages from unprotected cache to protected cache in a PPC (Stringsearch)

Question C-6: You showed several experimental results in which you compared your algorithms with the Monte Carlo method and a genetic algorithm. I think that is unfair and intentional, because it is obvious that the Monte Carlo method does not converge quickly; experimental results from the MC method are unnecessary. It is common to compare a newly proposed algorithm with simulated annealing and a genetic algorithm. To compare your algorithms fairly with the others, you should show results for simulated annealing, which is a common optimization metaheuristic. This kind of assignment problem can also be solved by integer linear programming. You should try to solve your problem as an ILP using a commercial solver such as ILOG CPLEX, Dash Optimization Xpress-MP, or LINDO Systems LINDO; your problem might even be solvable with the evaluation version of LINDO, which is available for free. GLPK and LPSOLVE are also available for free, although they are not so fast, so you might not be able to use them as a baseline.

Answer C-6: Thank you for your suggestion. However, we think the MC method and its evaluation should remain in the paper, since the MC method is a typical random approach and we show that such random exploration does not guarantee effective partitions for our PPC architecture. We also compare our approach with a genetic algorithm for finding the best partitions in terms of vulnerability with minimal performance and energy overheads, and our approach is more effective and efficient than the genetic algorithm. We are definitely interested in an ILP approach for finding interesting partitions in terms of vulnerability, performance, and energy consumption in future work; we already have some preliminary experimental results, for the instruction PPC architecture in particular, and an ILP approach will be evaluated as a main point of comparison there.
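As a concrete reference point for the comparison discussed in Answers C-5 and C-6, the following is a minimal sketch of a greedy, vulnerability-ordered page assignment of the kind described above: pages are moved from the unprotected to the protected cache in decreasing order of profiled vulnerability until a runtime-overhead budget is exhausted. The per-page profile numbers and the budget are hypothetical inputs for illustration; this is not the exact algorithm or data from the paper.

```python
# Greedy, vulnerability-ordered page partitioning for a PPC (illustrative sketch).
# Each page carries its profiled vulnerability and the runtime penalty expected
# when it is mapped to the small protected cache.

def partition_pages(pages, runtime_budget):
    """pages: list of (page_id, vulnerability, runtime_delta).
    Returns (protected_ids, unprotected_ids)."""
    protected, unprotected = [], []
    spent = 0.0
    # consider the most vulnerable pages first
    for page_id, vuln, dt in sorted(pages, key=lambda p: p[1], reverse=True):
        if spent + dt <= runtime_budget:
            protected.append(page_id)
            spent += dt
        else:
            unprotected.append(page_id)
    return protected, unprotected

# hypothetical profile: (page, vulnerability in byte-cycles, runtime delta in %)
profile = [(0, 9.1e6, 0.8), (1, 7.4e6, 2.5), (2, 1.2e6, 0.1), (3, 0.3e6, 3.0)]
print(partition_pages(profile, runtime_budget=3.0))   # -> ([0, 2], [1, 3])
```

A genetic algorithm or ILP explores the same 2^N assignment space; the point of the greedy ordering is that the profiled per-page vulnerability makes a small, targeted subset of that space sufficient in practice.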

Question C-7: You examined both vulnerability and runtime in a PPC structure. I think that you should show some circuit delay, i.e., cache access time in ps, as the cache parameters change; this is essential to justify the effectiveness of the PPC. I also think you should examine the vulnerability, runtime, and chip area of the other cache structures: (i) a SEC-DED-protected, non-hybrid cache for various cache sizes, (ii) a parity-protected, non-hybrid cache for various cache sizes, and (iii) a plain, non-hybrid cache for various cache sizes. It is quite important to justify the advantages of the PPC by showing quantitative values for the non-hybrid caches and comparing the vulnerability, runtime, and chip area of the PPC with those of the other cache structures.

Answer C-7: Thank you for your suggestion. The main focus of this paper is to propose approaches that find interesting partitions between the unprotected cache and the protected cache of a PPC architecture; the PPC architecture itself was already published in our previous work [Lee et al. CASES 06 and TVLSI 09]. We provide all our experimental results and comparisons of the ECC-protected cache, the unprotected (plain or normal) cache, and our PPC architecture in terms of cache access time, power consumption, and area for different cache sizes in Answer C-1. We excluded parity-protected caches mainly because parity only detects errors and cannot correct them.

Question C-8: Is the energy consumption model on Page 59 correct? Consider a write miss on which a clean line is overwritten with a datum. Writing a datum onto a clean cache line does not cause any eviction. Is decoding necessary when writing a datum onto a clean cache line? Only checking a dirty bit would seem to be enough. If the cost of writing a datum onto a clean cache line is negligible, please mention that.

Answer C-8: It is necessary to decode the data when writing a datum onto a clean cache line. Checking the dirty bit does not tell us whether a soft error has occurred in that cache line, whereas decoding the line with the Hamming code reveals whether a soft error has occurred whenever a write operation happens.

Question C-9: The following are minor comments. - "projected" in line 2 on page 42 should be followed by "to". - The period after "signal interference" on page 42 is unnecessary. - You should mention parity coding around the description of SEC-DED. - "Luccetti" should be "Lucchetti" on Page 47.

Answer C-9: We thank the reviewer for these corrections and have updated the manuscript accordingly.
