Answers to comments from Reviewer 1
Answers to comments from Reviewer 1

Question A-1: Though I have one correction to the authors' response to my Question A-1: "... parity protection needs a correction mechanism (e.g., checkpoints and roll-backward recovery), which can incur significant overheads in terms of runtime and energy consumption due to the recovery mechanism once an error is detected" is not accurate. I suggest the authors make it clear that the reliability mechanism supporting error recovery may incur significant performance and energy overheads, not the error recovery itself, since the actual soft error rate is extremely low and the error recovery is rarely invoked during execution.

Answer A-1: Yes, the reviewer is right. What we wanted to point out in the last response was that parity protection needs a correction mechanism, such as checkpoints and roll-backward recovery, and that the mechanism supporting error recovery, not the recovery itself, may incur high overheads in terms of performance and energy consumption, as the reviewer corrected.

Question A-2: One more comment: the comparison results in Section 7.2 and Section 7.3 show that PPC has difficulty competing with ECCs in terms of vulnerability reduction (around 50% for PPC vs. 99% for ECC), even though the incurred performance and energy overheads are much lower. Will this kind of design trade-off be acceptable in real system design? Please justify the applicability of your proposed PPCs with potential applications.

Answer A-2: One potential application for PPCs in real system design is portable video surveillance in hazardous areas. Mobile embedded systems such as smart phones, PDAs, and portable systems demand energy efficiency mainly because they run on a limited battery. Reliability is also becoming important, since these mobile devices operate in close proximity to people and a failure of functionality in them may cause catastrophic results. Thus, designers of mobile embedded systems need a design space in which to trade reliability for performance/energy costs or vice versa. One example is a portable video surveillance system installed in a hazardous area: since it is almost impossible to physically replace such a system, and it must keep running surveillance as long as possible, designers can trade reliability (in particular, a less reliable design for the architecture handling the video data itself) for reduced performance/power overheads.
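The distinction drawn in Answer A-1 can be illustrated with a toy model (our own sketch, not the paper's implementation): parity only *detects* an error, so a separate checkpoint/roll-backward mechanism must re-execute lost work, and at realistic soft-error rates it is this supporting mechanism, not the rare recovery itself, that carries the cost.

```python
import random

def run_with_rollback(n_steps, checkpoint_interval, error_rate, rng):
    """Toy model of parity-protected execution: a detected parity error
    forces a roll-back to the last checkpoint and re-execution.
    Returns (useful_steps, total_steps_executed); the ratio exposes the
    runtime overhead of the recovery mechanism."""
    total = 0            # steps actually executed, including re-execution
    step = 0             # useful forward progress
    last_checkpoint = 0
    while step < n_steps:
        total += 1
        if rng.random() < error_rate:   # parity flags a corrupted word
            step = last_checkpoint      # roll back; work since checkpoint is lost
            continue
        step += 1
        if step % checkpoint_interval == 0:
            last_checkpoint = step      # commit a new checkpoint

    return n_steps, total

# With a realistic (tiny) soft-error rate the recovery path is almost
# never taken, so the dominant cost is the checkpointing support itself.
useful, executed = run_with_rollback(10_000, 100, 1e-4, random.Random(42))
```

The `error_rate` here is per-step and purely illustrative; real soft-error rates are far lower, which is exactly the reviewer's point.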
Answers to comments from Reviewer 2

Comment B-1: The authors have addressed most of my prior concerns satisfactorily, and thus I would like to recommend acceptance of this manuscript.

Answer B-1: Thank you for your comments and concerns.
Answers to comments from Reviewer 3

Question C-1: In my opinion, you emphasized the algorithmic rather than the methodological aspect in this paper too much. The algorithmic part, however, is not novel, because your page assignment problem falls into the classical combinatorial assignment problem, for which many algorithms have been studied. Please try to emphasize the methodological aspect of your research, or compare your algorithm with the others comprehensively if you still want to emphasize the algorithmic aspect. More detail of the PPC architecture should be explained in this paper; this is more important than discussing the ad-hoc algorithms. Please show some quantitative evaluations of the cycle time of the PPC. More concretely, please show several graphs of cache size vs. cycle time for both the protected part and the unprotected part of the PPC. I think your PPC architecture causes some performance overhead in the TLB; a quantitative evaluation of the TLB should also be shown in this paper. Without a quantitative discussion of the PPC, your paper would not be helpful to designers, because they could not decide the optimal sizes of the unprotected and protected caches for their design. Figure 53 in the following paper would be helpful in plotting graphs of cache size vs. cycle time: Steven J.E. Wilton and Norman P. Jouppi, "An Enhanced Access and Cycle Time Model for On-Chip Caches," WRL Research Report 93/5.

Answer C-1: Thank you for your suggestion. The main focus of this paper is to propose approaches for finding interesting partitions of general applications between an unprotected cache and a protected cache in a PPC architecture, which has already been published in our previous work [Lee et al. CASES 06 and TVLSI 09].
Our previous work focused on the architectural novelty and the tradeoffs among performance, energy consumption, and reliability for multimedia applications, where an obvious partitioning approach exists for PPC architectures: multimedia data vs. non-multimedia data. This paper, however, shows that the PPC architecture is effective not only for multimedia applications but also for general applications, using the proposed algorithms for data and instruction partitioning in PPCs. Since we have published comprehensive experimental results in terms of performance (runtime), power consumption overhead, reliability (failure rate), and area penalty in our previous papers and technical report, this article emphasizes the effectiveness of our proposed algorithms for partitioning data and instructions in our previously proposed PPC architecture for general applications, including multimedia applications. Further, this article extends the effectiveness of PPC architectures to instruction caches as well.

In this answering document, we provide all our experimental results and comparisons of the ECC (Error Correction Code)-protected cache, the unprotected (plain or normal) cache, and our PPC architecture in terms of cache access time, power consumption, and area for different cache sizes, as the reviewer suggested. Figure 1 shows the cache area in cm² for unprotected caches and ECC-protected caches. All results are from Cacti 3.2 with the following parameters: (i) the block size is set to 32 bytes for the unprotected cache and 38 bytes for the ECC-protected cache, (ii) the set-associativity is set to 4, (iii) the technology is set to 0.18 micrometer, and (iv) VDD is set to 1.7 Volt. We considered a Hamming (38, 32) code as our ECC coding; it demands 6 extra bits to protect each 32 bits. This ratio is why we set a 38-byte block for the ECC-protected cache, i.e., 6 bytes are assigned to check bits and 32 bytes to data bits. This 6-byte overhead causes a 22% area increase for ECC-protected caches on average (up to 59%) compared to the area of unprotected caches, as shown in Figure 1. Note that these overheads result from storing the 6 bytes of check codes, not from the encoder and decoder logic.

Figure 1: Cache areas of unprotected caches and protected caches

To estimate the area overhead due to the ECC encoder/decoder logic, we implemented a Hamming (38, 32) code in VHDL, synthesized it using the Synopsys Design Compiler with lsi10k libraries in 0.5 micrometer technology, and scaled it to 0.55 mm² for 0.18 micrometer technology. For the PPC architecture, we considered a 16:1 ratio between the unprotected cache and the protected cache to keep the overhead equal to or less than that of an ECC-only protected cache; for example, we selected a 32 KB unprotected cache and a 2 KB protected cache (with ECC). Figure 2 shows the overall comparison of cache areas for the unprotected cache, the ECC-protected cache, and the PPC architecture when the PPC and the ECC-protected cache include this area overhead. The ECC-protected caches incur high overheads, while the area of the PPC architecture lies between those of the ECC-protected caches and the unprotected caches. For example, compared to a 32 KB unprotected cache, a 32 KB ECC-protected cache incurs about 22% area overhead, while the PPC (a 32 KB unprotected cache plus a 2 KB ECC-protected cache) incurs just 7%, about a 12% reduction relative to the area of the 32 KB ECC-protected cache.
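As a concrete illustration of the Hamming code over 32 data bits and 6 check bits discussed above, the sketch below (our illustration, not the VHDL implementation from the paper) places the check bits at the power-of-two positions of a 38-bit codeword; the syndrome of a corrupted codeword then directly names the position of a single flipped bit.

```python
def hamming_encode(data32):
    """Encode a 32-bit word into a 38-bit Hamming SEC codeword.
    Check bits sit at positions 1, 2, 4, 8, 16, 32 (1-indexed)."""
    assert 0 <= data32 < (1 << 32)
    codeword = [0] * 39                      # index 0 unused; positions 1..38
    data_positions = [p for p in range(1, 39) if p & (p - 1)]  # non-powers of two
    for i, pos in enumerate(data_positions):
        codeword[pos] = (data32 >> i) & 1
    for c in (1, 2, 4, 8, 16, 32):
        # parity over every position whose index has bit c set
        codeword[c] = sum(codeword[p] for p in range(1, 39) if p & c) % 2
    return codeword

def hamming_correct(codeword):
    """Return (corrected 32-bit data, error position or 0 if clean)."""
    syndrome = 0
    for c in (1, 2, 4, 8, 16, 32):
        if sum(codeword[p] for p in range(1, 39) if p & c) % 2:
            syndrome |= c
    if syndrome:
        codeword[syndrome] ^= 1              # flip the erroneous bit back
    data_positions = [p for p in range(1, 39) if p & (p - 1)]
    data = 0
    for i, pos in enumerate(data_positions):
        data |= codeword[pos] << i
    return data, syndrome
```

The 6-in-38 check-bit share is exactly the storage ratio behind the 38-byte vs. 32-byte block sizes above.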
Figure 2: Cache areas of unprotected caches, ECC-protected caches, and PPCs

Figure 3 shows the access latencies of the unprotected caches and ECC-protected caches in ns over varying cache sizes. All parameters and configurations are the same as in the cache area evaluation. The increased block size in ECC-protected caches (6 extra bytes, from 32 to 38 bytes) causes about 5% overhead on average in cache access latency compared to the unprotected caches. Note that the cache access latency of the ECC-protected caches (with the larger block size) is smaller than that of the unprotected caches at several sizes, such as 128 KB, 64 KB, 32 KB, and 2 KB. Also, a speculative implementation of the ECC check does not increase the overall cache access latency. Thus, we consider the cache access latency of ECC-protected caches identical to that of unprotected caches, i.e., 1 cycle. Note that we assume the processor sustains an IPC (Instructions Per Cycle) of 1 at 400 MHz.

Figure 3: Cache access latencies of unprotected caches and ECC-protected caches
Figure 4 shows the power consumption of the unprotected caches and ECC-protected caches in nJ over varying cache sizes. All parameters and configurations are the same as in the cache area evaluation. As in the cache access latency evaluation, the power consumption comparison with Cacti 3.2 does not show high overheads from the 6-byte storage overhead in ECC-protected caches. However, ECC protection does incur high overheads in terms of energy consumption, even though the access latency of the encoding and decoding for ECC protection can be optimized or minimized. The experimental estimation of the ECC logic with the Synopsys Design Compiler shows 0.39 nJ for decoding and 0.22 nJ for encoding, as presented in the first-revision answering sheet. The average energy consumption overhead for the page partitions discovered by the approaches proposed in the article is about 7% for data PPCs and 13% for instruction PPCs compared to the unprotected caches across the benchmarks.

Figure 4: Power consumption of unprotected caches and ECC-protected caches

Question C-2: The reason why I asked Question C-3 is that cache access time (ps/cycle) affects both the entire execution time and the vulnerability. I cannot understand the execution time exactly if you don't show any concrete values of cache access time (ps/cycle). As an answer to my question (C-3), you have shown Figure 2, in which cache access time should be shown, because cache access time affects the entire runtime. It is an interesting result that Figure 2 in the reply letter shows that sizing the unprotected cache reduces vulnerability drastically. As a matter of fact, vulnerability seems to be linear in the size of the unprotected cache. Sizing the unprotected cache is an effective approach to reducing vulnerability, especially in a large-cache configuration. How do you justify your approach compared with sizing the unprotected cache?

Answer C-2: The reviewer is right.
Figure 2 in the previous reply letter shows the interesting result that sizing the unprotected cache is an effective approach to reducing vulnerability. Indeed, Kim et al. [KimDATE06] presented the impacts of cache size on performance, energy consumption, and reliability. Sizing the (unprotected) cache is an effective tradeoff technique: decreasing the cache size can reduce the vulnerability, but it incurs a high cache miss rate, causing high performance and energy overheads due to frequent accesses to off-chip memory. Conversely, increasing the cache size raises the vulnerability while improving performance, with higher energy consumption in the caches and lower energy consumption for memory accesses. Our approach is orthogonal to sizing the unprotected cache, i.e., we can combine the PPC with the sizing approach for a further tradeoff among performance, energy consumption, and vulnerability.

Question C-3: Several definitions and lemmas are written on Page 50. These lemmas have been introduced and proved not by you but by others, I think; they actually appear in statistics textbooks. If this is true, please cite an appropriate reference. Even if these lemmas were proved by you, it is unclear what you want to discuss with them. You implicitly assumed that a binomial distribution is approximated by a normal distribution. Though it is well known that a normal distribution can approximate a binomial distribution for a large number of samples, you should mention that if you include the first lemma in your paper. In my opinion, the first lemma is unnecessary, though. More importantly, the masking probability becomes a normal distribution, which is quite important from the aspect of IC reliability. You must explain more about probability, convexity, and skewness if you discuss statistics on the masking effect. It is quite important what the real distribution of the masking probability is, and how large the masking probability is for typical microprocessors and benchmark programs.
If what you wanted to do was to estimate how many simulations were required, it would be meaningless without a real value of "p", the masking probability. The real distribution of masking upsets is much more interesting to readers than the well-known lemmas.

Answer C-3: The reviewer is right. We simply applied statistics lemmas from the textbook. The only reason for the existence of Section 4.1 was to give readers a sense of how compute-intensive simulation-based techniques for estimating reliability are. This motivates the use of architecture-level metrics for reliability estimation, even though the architecture-level metrics are not ours; indeed, they were proposed in MICRO 03 by Mukherjee et al. We just use the existing architecture-level reliability metric to evaluate page partitions onto the previously proposed PPC (Partially Protected Caches) architecture. We understand the reviewer's concern that this is an easy extension of textbook material and is also intuitively clear. Thus, we have removed Section 4.1 from this article.
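The point the removed Section 4.1 was making can be stated in one formula: under the normal approximation to the binomial, estimating a failure probability p within a relative error e at ~95% confidence needs roughly n = z²(1−p)/(e²p) independent trials, which explodes as p shrinks. A sketch (the reviewer's objection stands: the p below is purely illustrative, not a measured masking probability):

```python
from math import ceil

def required_trials(p, rel_err, z=1.96):
    """Normal approximation to the binomial: number of independent
    fault-injection runs needed to estimate an event probability p
    within +/- rel_err * p at ~95% confidence (z = 1.96).
    Derived from z * sqrt(p*(1-p)/n) <= rel_err * p."""
    return ceil(z * z * (1 - p) / (rel_err ** 2 * p))

# The rarer the failure, the more simulations are needed: an
# illustrative p of 1e-3 at 10% relative error already requires
# hundreds of thousands of runs.
n = required_trials(1e-3, 0.10)
```

This blow-up is what motivates architecture-level metrics over simulation-based estimation.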
Question C-4: From Pages 50 to 52, you have shown a vulnerability estimation algorithm that is basically the same as Asadi's work, except that you estimate byte-wise vulnerability. If you insist on the novelty of the byte-wise vulnerability estimation, please compare byte-wise and word-wise vulnerability estimations with some quantitative evaluation; the extension from word-wise to byte-wise is not so novel. I wonder whether byte-wise estimation is better than word-wise, because an entire cache line, whose size is typically not one byte but four or eight bytes, is written out to the lower level of the memory hierarchy. Byte-wise estimation is effective under the write-through policy, but you seem not to assume the write-through policy, because you mentioned "dirty". I am not sure whether you should include the text in this paper as if the work were done by you. If I were you, I would just summarize Asadi's work as their work (NOT MINE) in less space.

Answer C-4: We do not claim that byte-level vulnerability is novel. We added Fig. 3 and the complete procedure for estimating vulnerability to the paper because we wanted to make our vulnerability metric self-contained, as one of the reviewers suggested in the prior revision. However, our comprehensive byte-level vulnerability metric yields experimental results closer to the failure rate than Asadi's word-level critical time. Figure 5 shows these results: the X-axis represents the cache size, and the Y-axis represents the critical time from Asadi's work, our vulnerability, and the failure rates. For small cache sizes, the word-level critical time tracks the failure rate well, but it loses accuracy for larger caches; for example, the word-level critical time estimates only half of the failure rate, while our byte-level vulnerability remains closer to it. These results arise because our byte-level vulnerability captures each byte's vulnerable time (bringing it closer to the failure rate) and considers more comprehensive cases.
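One common way to do the byte-level bookkeeping behind such a metric (a sketch under our own simplified definition, not the paper's exact algorithm): a byte is vulnerable from a write until the last read before the next write, since a value that is overwritten without ever being read cannot propagate an error.

```python
def byte_level_vulnerability(trace):
    """Sum of per-byte vulnerable time over a time-ordered access trace.
    trace: list of (time, op, byte_addr) with op in {'W', 'R'}.
    Each byte accrues vulnerable time from a write to the last read
    before the next write of that byte (dead values are not counted)."""
    last_write, last_read, vulnerable = {}, {}, {}
    for t, op, a in trace:
        if op == 'W':
            # close the previous live interval of this byte, if it was read
            if a in last_write and a in last_read and last_read[a] > last_write[a]:
                vulnerable[a] = vulnerable.get(a, 0) + last_read[a] - last_write[a]
            last_write[a] = t
            last_read.pop(a, None)
        else:
            last_read[a] = t
    # close intervals still open at the end of the trace
    for a, tw in last_write.items():
        tr = last_read.get(a)
        if tr is not None and tr > tw:
            vulnerable[a] = vulnerable.get(a, 0) + tr - tw
    return sum(vulnerable.values())
```

Tracking each byte's own intervals, rather than one interval per word, is what lets such a metric count only the bytes that can actually corrupt an outcome.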
Figure 5: Comparison of vulnerability, critical time, and failure rate

Question C-5: I don't understand your profiling strategy on Page 53 very well. I think that the combination of pages assigned to a cache affects its behavior as well as the critical time. Is the interaction between pages negligible with regard to vulnerability? You should show profiling results other than "Blowfish Decryption" in order to justify your experiments. Assuming that the number of pages is N, there are 2^N page assignments for the PPC. There are about 50 pages in the example, and thus 2^50 assignments theoretically. I can understand that you wanted to avoid this impossible number of simulations. However, you should mention the limitation of your experiments if you use approximate vulnerabilities.

Answer C-5: Figures 6 to 14 below show profiling results for the other benchmarks. As with Fig. 7 in the paper, the following figures support two observations: (i) the vulnerability decreases as each page is moved from the unprotected cache to the protected cache in a PPC in the order of page vulnerability, and (ii) page partitions significantly affect the performance. We believe that we mentioned the limitation of simulating all possible combinations; this limitation applies to our approach as well. We did not claim that ours is the best of all possible combinations of pages. The goal of our approaches is to efficiently discover interesting page assignments to the two caches in a PPC in terms of vulnerability, with lower performance and energy overheads than random simulations and genetic algorithms.

Figure 6: Tradeoffs between vulnerability and runtime when moving pages from the unprotected cache to the protected cache in a PPC (Blowfish Encryption)
Figure 7: Tradeoffs between vulnerability and runtime when moving pages from the unprotected cache to the protected cache in a PPC (CRC)

Figure 8: Tradeoffs between vulnerability and runtime when moving pages from the unprotected cache to the protected cache in a PPC (djpeg)

Figure 9: Tradeoffs between vulnerability and runtime when moving pages from the unprotected cache to the protected cache in a PPC (susan edges)

Figure 10: Tradeoffs between vulnerability and runtime when moving pages from the unprotected cache to the protected cache in a PPC (FFT)

Figure 11: Tradeoffs between vulnerability and runtime when moving pages from the unprotected cache to the protected cache in a PPC (Rijndael Decryption)

Figure 12: Tradeoffs between vulnerability and runtime when moving pages from the unprotected cache to the protected cache in a PPC (Rijndael Encryption)

Figure 13: Tradeoffs between vulnerability and runtime when moving pages from the unprotected cache to the protected cache in a PPC (SHA)
Figure 14: Tradeoffs between vulnerability and runtime when moving pages from the unprotected cache to the protected cache in a PPC (Stringsearch)

Question C-6: You showed several experimental results in which you compared your algorithms with the Monte Carlo method and a genetic algorithm. I think that is unfair and intentional, because it is obvious that the Monte Carlo method does not converge quickly; experimental results from the MC method are unnecessary. It is common to compare a newly proposed algorithm with simulated annealing and a genetic algorithm. To compare your algorithms fairly with the others, you should show results from simulated annealing, which is a common optimization metaheuristic. This kind of assignment problem can also be solved by integer linear programming. You should try to solve your problem as an ILP using a commercial solver such as ILOG CPLEX, Dash Optimization Xpress-MP, or LINDO Systems LINDO. I think your problem might be solved with the evaluation version of LINDO, which is available for free. GLPK and LPSOLVE are also available for free, though they are not so fast, so you might not be able to use them as a baseline.

Answer C-6: Thank you for your suggestion. However, we think the MC method and its evaluation should remain in the paper: the MC method is a typical random approach, and we show that this typical random approach does not guarantee the effectiveness of randomly explored partitions for our PPC architecture. We also compare our approach with a genetic algorithm in finding the best partitions in terms of vulnerability with minimal performance and energy overheads, and our approach is more effective and efficient than the genetic algorithm. We are definitely interested in an ILP approach for finding interesting partitions in terms of vulnerability, performance, and energy consumption in future work. We already have some preliminary experimental results, for the instruction PPC architecture in particular, and an ILP approach will be evaluated as a main comparison there.
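For reference, the "MC method" discussed above amounts to a random-search baseline along the following lines (a sketch with illustrative names and objective, not the paper's code):

```python
import random

def monte_carlo_partition(page_sizes, protected_capacity, evaluate, trials, rng):
    """Random-search baseline: sample random page-to-cache assignments
    that fit the protected cache and keep the best one under a
    user-supplied evaluate() objective (e.g. weighted vulnerability
    plus runtime). page_sizes maps page_id -> size; an assignment maps
    page_id -> True if the page goes to the protected cache."""
    best, best_score = None, float('inf')
    for _ in range(trials):
        assignment = {p: rng.random() < 0.5 for p in page_sizes}
        used = sum(s for p, s in page_sizes.items() if assignment[p])
        if used > protected_capacity:
            continue                      # infeasible: protected cache too small
        score = evaluate(assignment)
        if score < best_score:
            best, best_score = assignment, score
    return best, best_score
```

Nothing steers such random exploration toward the interesting partitions, which is the letter's argument for keeping it as a baseline against the profile-driven approach.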
Question C-7: You examined both vulnerability and runtime in a PPC structure. I think that you should show circuit delay, i.e., cache access time in ps, as the cache parameters change; this is essential for justifying the effectiveness of the PPC. I think you should examine the vulnerability, runtime, and chip area of the other cache structures: (i) a SEC-DED-protected, non-hybrid cache for various cache sizes, (ii) a parity-protected, non-hybrid cache for various cache sizes, and (iii) a plain, non-hybrid cache for various cache sizes. It is quite important to justify the advantages of the PPC by showing quantitative values for the non-hybrid caches and comparing the vulnerability, runtime, and chip area of the PPC with those of the other cache structures.

Answer C-7: Thank you for your suggestion. The main focus of this paper is to propose approaches for finding interesting partitions between an unprotected cache and a protected cache in a PPC architecture, which has already been published in our previous work [Lee et al. CASES 06 and TVLSI 09]. We provide all our experimental results and comparisons of the ECC-protected cache, the unprotected (plain or normal) cache, and our PPC architecture in terms of cache access time, power consumption, and area for different cache sizes in Answer C-1. We excluded parity-protected caches mainly because parity only detects an error; it does not correct it.

Question C-8: Is the energy consumption model on Page 59 correct? Let's think about a write miss in which a clean line is overwritten with a datum. Writing a datum onto a clean cache line does not cause any eviction. Is decoding necessary when writing a datum onto a clean cache line? Only checking the dirty bit would be enough. If the cost of writing a datum onto a clean cache line is negligible, mention that.

Answer C-8: It is necessary to decode the data when writing a datum onto a clean cache line. Checking the dirty bit doesn't tell us whether a soft error has occurred on this cache line.
Decoding a cache line with a Hamming code tells us whether a soft error has occurred whenever a write operation happens.

Question C-9: The following are minor comments. - "projected" in line 2 on Page 42 should be followed by "to". - The period after "signal interference" on Page 42 is unnecessary. - You should mention parity coding near the description of SEC-DED. - "Luccetti" should be "Lucchetti" on Page 47.

Answer C-9: We thank the reviewer for these corrections. We have updated everything the reviewer pointed out.
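The argument in Answer C-8 can be made concrete with a sketch of the write path (our illustration: a per-byte parity bit stands in for the real Hamming check bits):

```python
class CacheLine:
    """Toy ECC-protected line: one parity bit per byte stands in for
    the Hamming check bits of a real implementation."""
    def __init__(self, data):
        self.data = list(data)
        self.parity = [bin(b).count('1') & 1 for b in self.data]
        self.dirty = False

def write_byte(line, offset, value):
    """Write path: even when the line is clean, the stored data must be
    decoded (checked) before merging the new byte, because the dirty
    bit says nothing about whether a soft error has already corrupted
    the line."""
    for b, p in zip(line.data, line.parity):
        if bin(b).count('1') & 1 != p:
            raise RuntimeError("soft error detected; trigger recovery")
    line.data[offset] = value                        # merge the new byte
    line.parity[offset] = bin(value).count('1') & 1  # re-encode check bits
    line.dirty = True
```

Skipping the check on a "clean" line would silently fold a latent upset into freshly re-encoded, apparently valid data.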
Paper under review: "Data Partitioning Techniques for Partially Protected Caches to Reduce Soft Error Induced Failures", Kyoungwoo Lee, Aviral Shrivastava, Nikil Dutt, and Nalini Venkatasubramanian.
More information250 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 19, NO. 2, FEBRUARY 2011
250 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 19, NO. 2, FEBRUARY 2011 Energy-Efficient Hardware Data Prefetching Yao Guo, Member, IEEE, Pritish Narayanan, Student Member,
More information80 IEEE TRANSACTIONS ON COMPUTERS, VOL. 60, NO. 1, JANUARY Flash-Aware RAID Techniques for Dependable and High-Performance Flash Memory SSD
80 IEEE TRANSACTIONS ON COMPUTERS, VOL. 60, NO. 1, JANUARY 2011 Flash-Aware RAID Techniques for Dependable and High-Performance Flash Memory SSD Soojun Im and Dongkun Shin, Member, IEEE Abstract Solid-state
More informationAchieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation
Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Kshitij Bhardwaj Dept. of Computer Science Columbia University Steven M. Nowick 2016 ACM/IEEE Design Automation
More informationHEURISTIC OPTIMIZATION USING COMPUTER SIMULATION: A STUDY OF STAFFING LEVELS IN A PHARMACEUTICAL MANUFACTURING LABORATORY
Proceedings of the 1998 Winter Simulation Conference D.J. Medeiros, E.F. Watson, J.S. Carson and M.S. Manivannan, eds. HEURISTIC OPTIMIZATION USING COMPUTER SIMULATION: A STUDY OF STAFFING LEVELS IN A
More informationLocality. Cache. Direct Mapped Cache. Direct Mapped Cache
Locality A principle that makes having a memory hierarchy a good idea If an item is referenced, temporal locality: it will tend to be referenced again soon spatial locality: nearby items will tend to be
More informationIntra-Task Dynamic Cache Reconfiguration *
Intra-Task Dynamic Cache Reconfiguration * Hadi Hajimiri, Prabhat Mishra Department of Computer & Information Science & Engineering University of Florida, Gainesville, Florida, USA {hadi, prabhat}@cise.ufl.edu
More informationMulti-Level Cache Hierarchy Evaluation for Programmable Media Processors. Overview
Multi-Level Cache Hierarchy Evaluation for Programmable Media Processors Jason Fritts Assistant Professor Department of Computer Science Co-Author: Prof. Wayne Wolf Overview Why Programmable Media Processors?
More informationErrors. Chapter Extension of System Model
Chapter 4 Errors In Chapter 2 we saw examples of how symbols could be represented by arrays of bits. In Chapter 3 we looked at some techniques of compressing the bit representations of such symbols, or
More informationExam-2 Scope. 3. Shared memory architecture, distributed memory architecture, SMP, Distributed Shared Memory and Directory based coherence
Exam-2 Scope 1. Memory Hierarchy Design (Cache, Virtual memory) Chapter-2 slides memory-basics.ppt Optimizations of Cache Performance Memory technology and optimizations Virtual memory 2. SIMD, MIMD, Vector,
More informationHPCA 18. Reliability-aware Data Placement for Heterogeneous memory Architecture
HPCA 18 Reliability-aware Data Placement for Heterogeneous memory Architecture Manish Gupta Ψ, Vilas Sridharan*, David Roberts*, Andreas Prodromou Ψ, Ashish Venkat Ψ, Dean Tullsen Ψ, Rajesh Gupta Ψ Ψ *
More informationAn Approach for Adaptive DRAM Temperature and Power Management
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS 1 An Approach for Adaptive DRAM Temperature and Power Management Song Liu, Yu Zhang, Seda Ogrenci Memik, and Gokhan Memik Abstract High-performance
More informationOn the Characterization of Data Cache Vulnerability in High-Performance Embedded Microprocessors
On the Characterization of Data Cache Vulnerability in High-Performance Embedded Microprocessors Shuai Wang, Jie Hu, and Sotirios G. Ziavras Department of Electrical and Computer Engineering New Jersey
More informationDuke University Department of Electrical and Computer Engineering
Duke University Department of Electrical and Computer Engineering Senior Honors Thesis Spring 2008 Proving the Completeness of Error Detection Mechanisms in Simple Core Chip Multiprocessors Michael Edward
More informationSustainable Computing: Informatics and Systems
Sustainable Computing: Informatics and Systems 2 (212) 71 8 Contents lists available at SciVerse ScienceDirect Sustainable Computing: Informatics and Systems j ourna l ho me page: www.elsevier.com/locate/suscom
More informationA Low Power Design of Gray and T0 Codecs for the Address Bus Encoding for System Level Power Optimization
A Low Power Design of Gray and T0 Codecs for the Address Bus Encoding for System Level Power Optimization Prabhat K. Saraswat, Ghazal Haghani and Appiah Kubi Bernard Advanced Learning and Research Institute,
More informationTowards Optimal Custom Instruction Processors
Towards Optimal Custom Instruction Processors Wayne Luk Kubilay Atasu, Rob Dimond and Oskar Mencer Department of Computing Imperial College London HOT CHIPS 18 Overview 1. background: extensible processors
More informationA Comparative Performance Evaluation of Different Application Domains on Server Processor Architectures
A Comparative Performance Evaluation of Different Application Domains on Server Processor Architectures W.M. Roshan Weerasuriya and D.N. Ranasinghe University of Colombo School of Computing A Comparative
More informationTail Latency in ZooKeeper and a Simple Reimplementation
Tail Latency in ZooKeeper and a Simple Reimplementation Michael Graczyk Abstract ZooKeeper [1] is a commonly used service for coordinating distributed applications. ZooKeeper uses leader-based atomic broadcast
More informationFast SEU Detection and Correction in LUT Configuration Bits of SRAM-based FPGAs
Fast SEU Detection and Correction in LUT Configuration Bits of SRAM-based FPGAs Hamid R. Zarandi,2, Seyed Ghassem Miremadi, Costas Argyrides 2, Dhiraj K. Pradhan 2 Department of Computer Engineering, Sharif
More informationLow-power Architecture. By: Jonathan Herbst Scott Duntley
Low-power Architecture By: Jonathan Herbst Scott Duntley Why low power? Has become necessary with new-age demands: o Increasing design complexity o Demands of and for portable equipment Communication Media
More informationGENETIC ALGORITHM BASED FPGA PLACEMENT ON GPU SUNDAR SRINIVASAN SENTHILKUMAR T. R.
GENETIC ALGORITHM BASED FPGA PLACEMENT ON GPU SUNDAR SRINIVASAN SENTHILKUMAR T R FPGA PLACEMENT PROBLEM Input A technology mapped netlist of Configurable Logic Blocks (CLB) realizing a given circuit Output
More informationUnderstanding The Effects of Wrong-path Memory References on Processor Performance
Understanding The Effects of Wrong-path Memory References on Processor Performance Onur Mutlu Hyesoon Kim David N. Armstrong Yale N. Patt The University of Texas at Austin 2 Motivation Processors spend
More informationCOSC 6385 Computer Architecture - Memory Hierarchies (III)
COSC 6385 Computer Architecture - Memory Hierarchies (III) Edgar Gabriel Spring 2014 Memory Technology Performance metrics Latency problems handled through caches Bandwidth main concern for main memory
More informationAddressing Verification Bottlenecks of Fully Synthesized Processor Cores using Equivalence Checkers
Addressing Verification Bottlenecks of Fully Synthesized Processor Cores using Equivalence Checkers Subash Chandar G (g-chandar1@ti.com), Vaideeswaran S (vaidee@ti.com) DSP Design, Texas Instruments India
More informationArchitecture Tuning Study: the SimpleScalar Experience
Architecture Tuning Study: the SimpleScalar Experience Jianfeng Yang Yiqun Cao December 5, 2005 Abstract SimpleScalar is software toolset designed for modeling and simulation of processor performance.
More informationThe Power and Bandwidth Advantage of an H.264 IP Core with 8-16:1 Compressed Reference Frame Store
The Power and Bandwidth Advantage of an H.264 IP Core with 8-16:1 Compressed Reference Frame Store Building a new class of H.264 devices without external DRAM Power is an increasingly important consideration
More informationFPGA Implementation of Double Error Correction Orthogonal Latin Squares Codes
FPGA Implementation of Double Error Correction Orthogonal Latin Squares Codes E. Jebamalar Leavline Assistant Professor, Department of ECE, Anna University, BIT Campus, Tiruchirappalli, India Email: jebilee@gmail.com
More informationA Formal Verification Methodology for Checking Data Integrity
A Formal Verification Methodology for ing Data Integrity Yasushi Umezawa, Takeshi Shimizu Fujitsu Laboratories of America, Inc., Sunnyvale, CA, USA yasushi.umezawa@us.fujitsu.com, takeshi.shimizu@us.fujitsu.com
More informationEfficient Implementation of Single Error Correction and Double Error Detection Code with Check Bit Precomputation
http://dx.doi.org/10.5573/jsts.2012.12.4.418 JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.12, NO.4, DECEMBER, 2012 Efficient Implementation of Single Error Correction and Double Error Detection
More informationHigh Performance SMIPS Processor
High Performance SMIPS Processor Jonathan Eastep 6.884 Final Project Report May 11, 2005 1 Introduction 1.1 Description This project will focus on producing a high-performance, single-issue, in-order,
More informationSupplementary Material for The Generalized PatchMatch Correspondence Algorithm
Supplementary Material for The Generalized PatchMatch Correspondence Algorithm Connelly Barnes 1, Eli Shechtman 2, Dan B Goldman 2, Adam Finkelstein 1 1 Princeton University, 2 Adobe Systems 1 Overview
More informationImplementing a Statically Adaptive Software RAID System
Implementing a Statically Adaptive Software RAID System Matt McCormick mattmcc@cs.wisc.edu Master s Project Report Computer Sciences Department University of Wisconsin Madison Abstract Current RAID systems
More informationSF-LRU Cache Replacement Algorithm
SF-LRU Cache Replacement Algorithm Jaafar Alghazo, Adil Akaaboune, Nazeih Botros Southern Illinois University at Carbondale Department of Electrical and Computer Engineering Carbondale, IL 6291 alghazo@siu.edu,
More informationMemory. Objectives. Introduction. 6.2 Types of Memory
Memory Objectives Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured. Master the concepts
More informationInternational Journal of Scientific & Engineering Research, Volume 4, Issue 5, May-2013 ISSN
255 CORRECTIONS TO FAULT SECURE OF MAJORITY LOGIC DECODER AND DETECTOR FOR MEMORY APPLICATIONS Viji.D PG Scholar Embedded Systems Prist University, Thanjuvr - India Mr.T.Sathees Kumar AP/ECE Prist University,
More informationLecture 1: Introduction
Contemporary Computer Architecture Instruction set architecture Lecture 1: Introduction CprE 581 Computer Systems Architecture, Fall 2016 Reading: Textbook, Ch. 1.1-1.7 Microarchitecture; examples: Pipeline
More informationPerformance-Aware Speculation Control Using Wrong Path Usefulness Prediction. Chang Joo Lee Hyesoon Kim Onur Mutlu Yale N. Patt
Performance-Aware Speculation Control Using Wrong Path Usefulness Prediction Chang Joo Lee Hyesoon Kim Onur Mutlu Yale N. Patt High Performance Systems Group Department of Electrical and Computer Engineering
More informationOn the Security of the 128-Bit Block Cipher DEAL
On the Security of the 128-Bit Block Cipher DAL Stefan Lucks Theoretische Informatik University of Mannheim, 68131 Mannheim A5, Germany lucks@th.informatik.uni-mannheim.de Abstract. DAL is a DS-based block
More informationHigh-Performance Parallel Accelerator for Flexible and Efficient Run-Time Monitoring
High-Performance Parallel Accelerator for Flexible and Efficient Run-Time Monitoring Daniel Y. Deng and G. Edward Suh Computer Systems Laboratory, Cornell University Ithaca, New York 14850 {deng, suh}@csl.cornell.edu
More informationEECS150 - Digital Design Lecture 24 - High-Level Design (Part 3) + ECC
EECS150 - Digital Design Lecture 24 - High-Level Design (Part 3) + ECC April 12, 2012 John Wawrzynek Spring 2012 EECS150 - Lec24-hdl3 Page 1 Parallelism Parallelism is the act of doing more than one thing
More informationPipelined processors and Hazards
Pipelined processors and Hazards Two options Processor HLL Compiler ALU LU Output Program Control unit 1. Either the control unit can be smart, i,e. it can delay instruction phases to avoid hazards. Processor
More informationQUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose
QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose Department of Electrical and Computer Engineering University of California,
More informationSurvey results. CS 6354: Memory Hierarchy I. Variety in memory technologies. Processor/Memory Gap. SRAM approx. 4 6 transitors/bit optimized for speed
Survey results CS 6354: Memory Hierarchy I 29 August 2016 1 2 Processor/Memory Gap Variety in memory technologies SRAM approx. 4 6 transitors/bit optimized for speed DRAM approx. 1 transitor + capacitor/bit
More informationComputer Architecture A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more
More informationDRAM Tutorial Lecture. Vivek Seshadri
DRAM Tutorial 18-447 Lecture Vivek Seshadri DRAM Module and Chip 2 Goals Cost Latency Bandwidth Parallelism Power Energy 3 DRAM Chip Bank I/O 4 Sense Amplifier top enable Inverter bottom 5 Sense Amplifier
More information120 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 61, NO. 2, FEBRUARY 2014
120 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 61, NO. 2, FEBRUARY 2014 VL-ECC: Variable Data-Length Error Correction Code for Embedded Memory in DSP Applications Jangwon Park,
More informationCOMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 1. Computer Abstractions and Technology
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology Classes of Computers Personal computers General purpose, variety of software
More informationA General Sign Bit Error Correction Scheme for Approximate Adders
A General Sign Bit Error Correction Scheme for Approximate Adders Rui Zhou and Weikang Qian University of Michigan-Shanghai Jiao Tong University Joint Institute Shanghai Jiao Tong University, Shanghai,
More informationA Comparison of Capacity Management Schemes for Shared CMP Caches
A Comparison of Capacity Management Schemes for Shared CMP Caches Carole-Jean Wu and Margaret Martonosi Princeton University 7 th Annual WDDD 6/22/28 Motivation P P1 P1 Pn L1 L1 L1 L1 Last Level On-Chip
More informationCS 31: Intro to Systems Caching. Kevin Webb Swarthmore College March 24, 2015
CS 3: Intro to Systems Caching Kevin Webb Swarthmore College March 24, 205 Reading Quiz Abstraction Goal Reality: There is no one type of memory to rule them all! Abstraction: hide the complex/undesirable
More informationUnderstanding the Relation between the Performance and Reliability of NAND Flash/SCM Hybrid Solid- State Drive
Understanding the Relation between the Performance and Reliability of NAND Flash/SCM Hybrid Solid- State Drive Abstract: A NAND flash memory/storage-class memory (SCM) hybrid solid-state drive (SSD) can
More informationCS3350B Computer Architecture CPU Performance and Profiling
CS3350B Computer Architecture CPU Performance and Profiling Marc Moreno Maza http://www.csd.uwo.ca/~moreno/cs3350_moreno/index.html Department of Computer Science University of Western Ontario, Canada
More informationECEC 355: Cache Design
ECEC 355: Cache Design November 28, 2007 Terminology Let us first define some general terms applicable to caches. Cache block or line. The minimum unit of information (in bytes) that can be either present
More informationAdapted from David Patterson s slides on graduate computer architecture
Mei Yang Adapted from David Patterson s slides on graduate computer architecture Introduction Ten Advanced Optimizations of Cache Performance Memory Technology and Optimizations Virtual Memory and Virtual
More informationAn Allocation Optimization Method for Partially-reliable Scratch-pad Memory in Embedded Systems
[DOI: 10.2197/ipsjtsldm.8.100] Short Paper An Allocation Optimization Method for Partially-reliable Scratch-pad Memory in Embedded Systems Takuya Hatayama 1,a) Hideki Takase 1 Kazuyoshi Takagi 1 Naofumi
More informationImproving the Fault Tolerance of a Computer System with Space-Time Triple Modular Redundancy
Improving the Fault Tolerance of a Computer System with Space-Time Triple Modular Redundancy Wei Chen, Rui Gong, Fang Liu, Kui Dai, Zhiying Wang School of Computer, National University of Defense Technology,
More informationLossless Compression using Efficient Encoding of Bitmasks
Lossless Compression using Efficient Encoding of Bitmasks Chetan Murthy and Prabhat Mishra Department of Computer and Information Science and Engineering University of Florida, Gainesville, FL 326, USA
More information/$ IEEE
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 56, NO. 1, JANUARY 2009 81 Bit-Level Extrinsic Information Exchange Method for Double-Binary Turbo Codes Ji-Hoon Kim, Student Member,
More informationVirtual Memory. Patterson & Hennessey Chapter 5 ELEC 5200/6200 1
Virtual Memory Patterson & Hennessey Chapter 5 ELEC 5200/6200 1 Virtual Memory Use main memory as a cache for secondary (disk) storage Managed jointly by CPU hardware and the operating system (OS) Programs
More informationSlide Set 5. for ENCM 501 in Winter Term, Steve Norman, PhD, PEng
Slide Set 5 for ENCM 501 in Winter Term, 2017 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary Winter Term, 2017 ENCM 501 W17 Lectures: Slide
More informationLow-Power Data Address Bus Encoding Method
Low-Power Data Address Bus Encoding Method Tsung-Hsi Weng, Wei-Hao Chiao, Jean Jyh-Jiun Shann, Chung-Ping Chung, and Jimmy Lu Dept. of Computer Science and Information Engineering, National Chao Tung University,
More informationTwo hours. No special instructions. UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE. Date. Time
Two hours No special instructions. UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE System Architecture Date Time Please answer any THREE Questions from the FOUR questions provided Use a SEPARATE answerbook
More informationReducing Instruction Fetch Cost by Packing Instructions into Register Windows
Reducing Instruction Fetch Cost by Packing Instructions into Register Windows Stephen Hines, Gary Tyson, David Whalley Computer Science Dept. Florida State University November 14, 2005 ➊ Introduction Reducing
More information