Bus Encoding Technique for hierarchical memory system Anne Pratoomtong and Weiping Liao
Abstract
In microprocessor-based systems, data and address buses are the core of the interface between a microprocessor and the external world. The growing performance gap between the processor and its interfaces has pushed CPU designers to increase data-transfer bandwidth, and modern software applications span very large address spaces. With very wide address and data buses, the power dissipated on bus interfaces is becoming a major concern. Large power savings can be achieved by reducing the transition activity on on-chip and off-chip buses, because the capacitance switched when a bus line changes voltage is usually much larger than the capacitive load charged or discharged when an internal node toggles. Encoding techniques are very effective at limiting the number of signal transitions on bus lines. Modern microprocessor-based systems incorporate hierarchical memory systems, and the addresses that appear on the buses between levels of the hierarchy can be highly irregular. Our goal is therefore to study the performance of two existing bus encoding techniques, the T0 code [1] and the Bus-Invert code [2], at different levels of the memory hierarchy (e.g., the L1 and L2 caches and main memory).

1 Introduction
Because of the intrinsic capacitance of bus lines, a considerable amount of power is required at the I/O pins of a microprocessor whenever data are transmitted over the bus. More specifically, the capacitance driven by an I/O node is usually much larger than that seen by the internal nodes of the microprocessor. As a consequence, average power consumption can be reduced dramatically by minimizing the number of transitions (i.e., the switching activity) on system-level buses. Encoding paradigms for reducing the switching activity on bus lines have been investigated recently.
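As a minimal illustration of the switching-activity metric (a sketch, not code from this project), the total number of transitions on a bus can be computed as the sum of Hamming distances between the words driven in consecutive time slots. The helper names below are hypothetical, and the model assumes every line has the same capacitance and ignores coupling between lines.

```python
def hamming(a: int, b: int) -> int:
    """Number of bus lines whose value differs between two bus words."""
    return bin(a ^ b).count("1")


def switching_activity(words) -> int:
    """Total transitions when `words` are driven on the bus in consecutive slots."""
    return sum(hamming(prev, cur) for prev, cur in zip(words, words[1:]))


# Eight consecutive word addresses (stride 4) on a plain binary bus.
stream = [0x1000 + 4 * i for i in range(8)]
print(switching_activity(stream))  # prints 11
```

Carries in the low-order bits make this strided stream toggle 11 lines over 7 address changes, even though each step is a simple increment; this is the redundancy that address-bus codes try to remove.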
In [4], the authors proposed a bit-level encoding approach to reduce the average number of transitions on a bus. The basic observation that motivated their work is that using transition-based encoding instead of level encoding can limit the number of transitions when the input lines are not equiprobable. The technique in [4] first encodes the data words so that the probability of each bit becomes as unbalanced as possible, and then applies transition encoding at the bit level.

In [2], the Bus-Invert code was proposed. This scheme uses redundancy to save power. If the Hamming distance between two successive patterns is larger than N/2, where N is the bus width, the new pattern is transmitted with inverted polarity, which limits the number of transitions on the data lines to at most N/2. An extra line, I, is needed to tell the receiving end of the bus which polarity is used for the incoming pattern. The Bus-Invert code works well when the patterns to be transmitted are randomly distributed in time, so it is appropriate for encoding the information traveling on data buses.

When the objective shifts to address bus encoding, a radically different behavior is observed. The addresses generated by a running microprocessor are often consecutive, since instructions are stored in adjacent sections of memory and structured data are stored in consecutive memory locations for better locality. To exploit this property, [5] proposed reducing the switching activity on address buses by adopting Gray code. Gray code is particularly attractive because it guarantees a single bit transition when consecutive addresses are accessed. However, Gray code does not achieve the minimum switching activity. The T0 code was therefore proposed in [1]. Its main idea is to avoid transferring consecutive addresses on the bus by using a redundant line, INC; the T0 code can achieve zero switching activity for consecutive addresses.

2 Previous Work
2.1 Bus-Invert Encoding
Bus-invert [2] is a method of coding I/O that lowers bus activity and thus decreases both the peak and the average I/O power dissipation.
This method is best applied to buses that are likely to have very large capacitances associated with them and, as a consequence, dissipate a lot of power. Activity on a typical data bus can be modeled as a random, uniformly distributed sequence of values: in any given time slot, the data on an n-bit bus can take any value with equal probability. The average number of transitions per time slot is then n/2, so the average I/O power dissipation is proportional to n/2. When all bus lines toggle at the same time, there are n transitions in a time slot, so the worst-case power dissipation is proportional to n.

The data value is the information to be transmitted over the bus in a given time slot; the bus value is the value actually driven on the bus. One control bit, called invert, is needed for the coding. If invert is zero, the bus value equals the data value; if invert is one, the bus value is the inverse of the data value. Invert is set to one if the Hamming distance (the number of differing bits) between the present bus value (also counting the present invert line) and the next data value is larger than n/2. Coding data values with this technique halves the worst-case power dissipation.

2.2 T0 code
The T0 code [3] exploits the property of consecutive addresses to reduce the switching activity of address buses. In the T0 code, a redundant line, INC, is added to the address bus. It signals, with the value one, that a consecutive stream of addresses is being output on the bus. If INC is high, all other lines on the bus are frozen. When the redundant line is driven to zero, the remaining bus lines carry the new address in standard binary code. If all addresses in an ideal stream are consecutive, the INC line is always high and the bus lines never switch. As a consequence, the asymptotic performance of the T0 code is zero transitions per emitted consecutive address. More formally, the encoding and decoding of the T0 code are described by Equations (1) and (2), where B(t) is the value on the encoded bus lines at time t, INC(t) is the value of the additional bus line, b(t) is the address value at time t, and S is a constant power of 2 called the stride.
$$\bigl(B(t), \mathit{INC}(t)\bigr) = \begin{cases} \bigl(B(t-1), 1\bigr) & \text{if } t > 0 \text{ and } b(t) = b(t-1) + S \\ \bigl(b(t), 0\bigr) & \text{otherwise} \end{cases} \qquad (1)$$

$$b(t) = \begin{cases} b(t-1) + S & \text{if } \mathit{INC}(t) = 1 \text{ and } t > 0 \\ B(t) & \text{if } \mathit{INC}(t) = 0 \end{cases} \qquad (2)$$

2.3 Hybrid Bus Encoding Technique
In [4], new bus encoding schemes were proposed that combine the properties of existing approaches, mainly the T0 code and the Bus-Invert code. In this section, we discuss the coding schemes proposed in [4]. In [3], the performance of the T0 and Bus-Invert techniques is compared using address traces generated by a RISC microprocessor. Three distinct cases are considered: an instruction address bus, a data address bus, and a multiplexed instruction/data address bus. The average percentage of sequential addresses is higher in instruction address streams than in data address streams, so the T0 code outperforms the Bus-Invert technique on instruction addresses. Conversely, when the probability of in-sequence addresses is very low, as for data addresses, the Bus-Invert technique outperforms the T0 code. When the address bus is multiplexed, as in the MIPS architecture, the sequential behavior is often interrupted whenever the selection signal switches between instruction and data addresses, so the multiplexed bus shows intermediate behavior. Hybrid methods were therefore proposed to exploit the best properties of each method.

Three hybrid methods are proposed in [4]: T0 BI, Dual T0, and Dual T0 BI encoding. T0 BI encoding requires two redundant lines, INC and INV. When both INC and INV are zero, the original address is sent without encoding. When INC is zero and INV is one, the inverted address is sent. When INC is one, the address bus content is frozen to avoid switching, and the decoder at the destination increments the previous address by the stride.

Dual T0 encoding requires one redundant line, INC. On a multiplexed address bus, the control signal SEL is asserted when an instruction address is transmitted and de-asserted when a data address is transmitted. When both SEL and INC are one, the address bus content is frozen and the decoder at the destination increments the address by the stride. When both INC and SEL are zero, the original address is sent without any encoding.
The corresponding decoder simply accepts the address on the bus when INC is zero, and adds the stride to the previously decoded address when INC is one. Dual T0 BI encoding combines the two previous methods and requires one redundant line, INCV. When both SEL and INCV are one, the address bus content is frozen and the decoder at the destination increments the address by the stride. When SEL is zero and the Hamming distance is greater than N/2, where N is the number of address bus lines, INCV is set to one and the inverted address is sent.
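The T0 rule of Equations (1) and (2) and the bus-invert rule of Section 2.1 can be sketched in a few lines of Python. This is an illustrative model, not the encoders used in this project: it assumes a 32-bit bus, a stride S = 4, and that all lines (including INC and invert) start low. The function names (t0_encode, bi_encode, and so on) are hypothetical.

```python
WIDTH = 32                 # address bus width n (assumed)
MASK = (1 << WIDTH) - 1
STRIDE = 4                 # T0 stride S (assumed: 4-byte words)


def popcount(x: int) -> int:
    return bin(x).count("1")


def t0_encode(addrs):
    """Equation (1): map addresses b(t) to (B(t), INC(t)) pairs."""
    out, prev_b, bus = [], None, 0
    for b in addrs:
        if prev_b is not None and b == (prev_b + STRIDE) & MASK:
            out.append((bus, 1))   # consecutive: freeze bus lines, raise INC
        else:
            bus = b                # otherwise: plain binary, INC low
            out.append((bus, 0))
        prev_b = b
    return out


def t0_decode(words):
    """Equation (2): recover b(t) from (B(t), INC(t)) pairs."""
    out = []
    for bus, inc in words:
        if inc and out:
            out.append((out[-1] + STRIDE) & MASK)
        else:
            out.append(bus)
    return out


def bi_encode(data):
    """Bus-invert [2]: map data words to (bus value, invert) pairs."""
    out, bus, inv = [], 0, 0
    for d in data:
        # Distance from the present bus state (invert line included) to (d, invert=0).
        if popcount(bus ^ d) + inv > WIDTH // 2:
            bus, inv = ~d & MASK, 1
        else:
            bus, inv = d, 0
        out.append((bus, inv))
    return out


def bi_decode(words):
    return [bus ^ MASK if inv else bus for bus, inv in words]


def bus_transitions(words):
    """Transitions on the data lines plus the one redundant line, from an all-low start."""
    total, prev_bus, prev_x = 0, 0, 0
    for bus, x in words:
        total += popcount(prev_bus ^ bus) + (prev_x ^ x)
        prev_bus, prev_x = bus, x
    return total


seq = [0x400000 + 4 * i for i in range(16)]     # sequential instruction fetches
alt = [0, MASK, 0, MASK]                         # worst case for a plain bus
for stream in (seq, alt):
    assert t0_decode(t0_encode(stream)) == stream
    assert bi_decode(bi_encode(stream)) == stream
    raw = [(a, 0) for a in stream]
    print(bus_transitions(raw), bus_transitions(t0_encode(stream)),
          bus_transitions(bi_encode(stream)))
# prints:
# 27 2 27
# 96 96 3
```

The two contrived streams show the complementary behavior described above: on the sequential stream T0 pays only for the first address and one INC toggle while bus-invert never fires, and on the alternating stream the roles reverse.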
3 Methodology
Figure 1: Overview of the project.
This project consists of four main components.
(1) Address trace generator: simulates the SPECint95 inputs for 10 million cycles and generates six binary address trace files:
- Instruction address stream from the CPU to the L1 I-cache
- Instruction address stream from the L1 I-cache to the L2 unified cache
- Instruction address stream from the L2 unified cache to memory
- Data address stream from the CPU to the L1 D-cache
- Data address stream from the L1 D-cache to the L2 unified cache
- Data address stream from the L2 unified cache to memory
(2) Transition counter: reads an input binary file and outputs the total number of bus transitions.
(3) Endian converter: converts big-endian binary files to little-endian and vice versa. The address trace generator runs on a SUN SPARC platform, so the trace files are big-endian; the other three components were developed and run on a Linux platform, where the binary format is little-endian.
(4) T0 and Bus-Invert encoders/decoders: encode an input binary file and write the encoded result to an output binary file. They also report encoder statistics and use the decoder as an error-checking mechanism.
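Components (2) and (3) above can be approximated by the following Python sketch. The trace file layout is not specified here, so the code assumes each trace is a flat, headerless sequence of 32-bit unsigned addresses; the function names are hypothetical.

```python
import struct


def swap_endian_32(raw: bytes) -> bytes:
    """Convert a buffer of 32-bit words between big- and little-endian."""
    if len(raw) % 4:
        raise ValueError("trace length is not a multiple of 4 bytes")
    n = len(raw) // 4
    return struct.pack(f"<{n}I", *struct.unpack(f">{n}I", raw))


def read_trace(raw: bytes, byteorder: str = "big") -> list:
    """Decode a headerless trace of 32-bit addresses ('big' for SPARC output)."""
    return [int.from_bytes(raw[i:i + 4], byteorder) for i in range(0, len(raw), 4)]


def count_transitions(addrs) -> int:
    """Total bus transitions: Hamming distance between consecutive addresses."""
    return sum(bin(a ^ b).count("1") for a, b in zip(addrs, addrs[1:]))


# A tiny big-endian trace, as the SPARC-hosted generator would write it.
big = struct.pack(">3I", 0x00400000, 0x00400004, 0x00400008)
little = swap_endian_32(big)
assert swap_endian_32(little) == big            # the swap is its own inverse
assert read_trace(little, "little") == read_trace(big, "big")
print(count_transitions(read_trace(big)))       # prints 3
```

With a real trace, the buffer would come from `open(path, "rb").read()`; decoding with an explicit byte order also makes a separate conversion pass optional.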
The following table shows the system configuration used to generate the address traces.

Parameter            Value
Issue width          4 inst/cycle
RUU size             16 entries
LSQ size             8 entries
L1 I-cache           16 KB, direct-mapped, 32 B blocks
L1 D-cache           4 KB, 4-way set-associative, 32 B blocks
L2 unified cache     64 KB, 4-way set-associative, 64 B blocks
Memory width         8 bytes

4 Simulation results
Figure 2: Percentage of bus transition reduction using the T0 bus encoding technique (y-axis: % reduction; x-axis: type of address trace, IL1, IL2, Imem; benchmarks: Go, Gcc, Vortex, Test-math).
For the instruction address streams in Figure 2, the T0 code always reduces switching activity, whereas Bus-Invert fails to do so even for the streams from L1 to L2 and from L2 to memory. The benefit of the T0 code decreases as the address stream moves farther from the CPU, because the addresses become less consecutive. For data address streams, the benefits of both techniques vary widely. For the data streams in Figure 3, the benefit of the T0 code in some cases increases as the stream moves farther from the CPU, the opposite of the trend for instruction streams. This might be because the L1 D-cache is four times smaller than the L1 I-cache, so the data cache has a higher miss rate and therefore sends more addresses down the hierarchy, which may increase the effect of the T0 technique. Although Test-math is a small benchmark, it still appears to send a stream of misses to memory. This might be because the program runs for a very short time, so most of its misses are cold misses. Vortex is a database application; it also appears to have many misses to memory because its large working set does not fit in the cache. In addition, the 10M-cycle window we simulated may happen to access data with a fixed-distance pattern, which would let T0 perform very well.
Figure 3: Percentage of bus transition reduction using the T0 and Bus-Invert encoding techniques (y-axis: % reduction; x-axis: type of address trace, DL1, DL2, Dmem; series: T0 and BI for Go, Gcc, Vortex, and Test-math).
In Figure 4, for Go, bus-invert encoding is activated for a substantial percentage of addresses, yet the resulting reduction is very small. This reflects a high probability that the Hamming distance is close to n/2 (16 in this case), where inverting saves only a few transitions. We expect this effect to be reduced if the bus is partitioned into eight 4-bit sub-buses. Figure 5 shows that the percentage of encoding activation for data address traces varies widely, which mirrors the irregular benefit seen in Figure 3. Again, for Vortex, the benefit of the bus-invert technique is degraded by the high probability of a Hamming distance near n/2.
Figure 4: Percentage of encoding activation of the T0 and Bus-Invert techniques for instruction address traces (y-axis: % encoded; x-axis: type of address trace, IL1, IL2, Imem; series: T0 and BI for Go, Gcc, Vortex, and Test-math).
Figure 5: Percentage of encoding activation of the T0 and Bus-Invert techniques for data address traces (y-axis: % encoded; x-axis: type of address trace, DL1, DL2, Dmem; series: T0 and BI for Go, Gcc, Vortex, and Test-math).
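The sub-bus partitioning proposed above for Go can be explored with a short Python model. This is an illustration under stated assumptions, not a result from this study: it assumes a 32-bit bus split into eight independent 4-bit sub-buses, each with its own invert line (eight redundant lines instead of one), all lines starting low, and the invert rule of Section 2.1 applied per sub-bus. The stream is contrived to show the mechanism and is not taken from the SPECint95 traces; the function names are hypothetical.

```python
def popcount(x: int) -> int:
    return bin(x).count("1")


def bi_step(bus: int, inv: int, d: int, width: int):
    """One bus-invert decision on a width-bit (sub-)bus; returns the new (bus, inv)."""
    mask = (1 << width) - 1
    if popcount(bus ^ d) + inv > width // 2:
        return ~d & mask, 1
    return d, 0


def partitioned_bi_transitions(data, width: int = 32, group: int = 4) -> int:
    """Transitions (data + invert lines) when each group-bit sub-bus is coded independently."""
    ngroups = width // group
    gmask = (1 << group) - 1
    state = [(0, 0)] * ngroups          # (sub-bus value, invert line) per sub-bus
    total = 0
    for d in data:
        for g in range(ngroups):
            sub = (d >> (g * group)) & gmask
            bus, inv = state[g]
            new_bus, new_inv = bi_step(bus, inv, sub, group)
            total += popcount(bus ^ new_bus) + (inv ^ new_inv)
            state[g] = (new_bus, new_inv)
    return total


# Successive words differ in exactly 16 of 32 bits (distance n/2), so a single
# invert line never fires, but each of the four low sub-buses sees all 4 bits change.
stream = [0x00000000, 0x0000FFFF, 0x00000000, 0x0000FFFF, 0x00000000]
print(partitioned_bi_transitions(stream, group=32))  # one invert line: prints 64
print(partitioned_bi_transitions(stream, group=4))   # eight invert lines: prints 16
```

Setting `group=32` reduces the model to ordinary bus-invert, so the same function compares both configurations. The cost is seven extra redundant lines, and any benefit on real traces would have to be measured rather than assumed.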
5 Conclusion and Future work
In most cases, the T0 code outperforms Bus-Invert encoding, so the T0 code still provides a benefit in a hierarchical memory system for both instruction and data streams. In some cases, however, Bus-Invert coding outperforms T0 coding, so it is useful to implement both techniques in a hierarchical memory system. The data address streams of a hierarchical memory system are highly irregular and hard to predict, and in many cases both techniques yield only a very small gain; it is therefore worth exploring other bus coding techniques, such as the T0-XOR code and the Offset code [4], to improve performance. Since we ran the experiments with only a single system configuration, it would be interesting to study how different cache and memory configurations and characteristics (e.g., miss/hit rate and latency) affect each bus encoding technique, to obtain a more complete picture of their performance. Moreover, most programs contain many phases with varying memory access characteristics, so a phase-change detection mechanism would be useful for deciding when to switch from one encoding technique to another, which could further improve performance. Phase-change detection could be implemented with a counter of bus switching activity, a reference history register, or a Hamming distance history register.

References
[1] L. Benini, G. De Micheli, E. Macii, D. Sciuto, and C. Silvano. Asymptotic zero-transition activity encoding for address busses in low-power microprocessor-based systems. In Proc. of GLS-VLSI-97, March 1997.
[2] M. R. Stan. Bus-invert coding for low-power I/O. IEEE Trans. on VLSI Systems, pp. 49-58, March 1995.
[3] L. Benini, G. De Micheli, E. Macii, D. Sciuto, and C. Silvano. Address bus encoding techniques for system-level power optimization. In Proc. of DATE-98, Feb 1998.
[4] Y. Aghaghiri, F. Fallah, and M. Pedram. Irredundant address bus encoding for low power. In Proc. of ISLPED-01, Aug.
[5] H. Mehta, R. M. Owens, and M. J. Irwin. Some issues in Gray code addressing. IEEE 6th Great Lakes Symposium on VLSI, p , March.
More informationThe Impact of Write Back on Cache Performance
The Impact of Write Back on Cache Performance Daniel Kroening and Silvia M. Mueller Computer Science Department Universitaet des Saarlandes, 66123 Saarbruecken, Germany email: kroening@handshake.de, smueller@cs.uni-sb.de,
More informationECE 571 Advanced Microprocessor-Based Design Lecture 13
ECE 571 Advanced Microprocessor-Based Design Lecture 13 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 21 March 2017 Announcements More on HW#6 When ask for reasons why cache
More informationDesign and Implementation of 5 Stages Pipelined Architecture in 32 Bit RISC Processor
Design and Implementation of 5 Stages Pipelined Architecture in 32 Bit RISC Processor Abstract The proposed work is the design of a 32 bit RISC (Reduced Instruction Set Computer) processor. The design
More informationThe University of Adelaide, School of Computer Science 13 September 2018
Computer Architecture A Quantitative Approach, Sixth Edition Chapter 2 Memory Hierarchy Design 1 Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive per
More informationINTEL Architectures GOPALAKRISHNAN IYER FALL 2009 ELEC : Computer Architecture and Design
INTEL Architectures GOPALAKRISHNAN IYER FALL 2009 GBI0001@AUBURN.EDU ELEC 6200-001: Computer Architecture and Design Silicon Technology Moore s law Moore's Law describes a long-term trend in the history
More informationThe Memory Component
The Computer Memory Chapter 6 forms the first of a two chapter sequence on computer memory. Topics for this chapter include. 1. A functional description of primary computer memory, sometimes called by
More informationAccuPower: An Accurate Power Estimation Tool for Superscalar Microprocessors*
Appears in the Proceedings of Design, Automation and Test in Europe Conference, March 2002 AccuPower: An Accurate Power Estimation Tool for Superscalar Microprocessors* Dmitry Ponomarev, Gurhan Kucuk and
More informationLow Power Set-Associative Cache with Single-Cycle Partial Tag Comparison
Low Power Set-Associative Cache with Single-Cycle Partial Tag Comparison Jian Chen, Ruihua Peng, Yuzhuo Fu School of Micro-electronics, Shanghai Jiao Tong University, Shanghai 200030, China {chenjian,
More informationPower Protocol: Reducing Power Dissipation on Off-Chip Data Buses
Power Protocol: Reducing Power Dissipation on Off-Chip Data Buses K. Basu, A. Choudhary, J. Pisharath ECE Department Northwestern University Evanston, IL 60208, USA fkohinoor,choudhar,jayg@ece.nwu.edu
More informationI/O Management and Disk Scheduling. Chapter 11
I/O Management and Disk Scheduling Chapter 11 Categories of I/O Devices Human readable used to communicate with the user video display terminals keyboard mouse printer Categories of I/O Devices Machine
More informationAlgorithms and Architecture. William D. Gropp Mathematics and Computer Science
Algorithms and Architecture William D. Gropp Mathematics and Computer Science www.mcs.anl.gov/~gropp Algorithms What is an algorithm? A set of instructions to perform a task How do we evaluate an algorithm?
More informationMultiprocessing and Scalability. A.R. Hurson Computer Science and Engineering The Pennsylvania State University
A.R. Hurson Computer Science and Engineering The Pennsylvania State University 1 Large-scale multiprocessor systems have long held the promise of substantially higher performance than traditional uniprocessor
More informationEncoding Scheme for Power Reduction in Network on Chip Links
RESEARCH ARICLE OPEN ACCESS Encoding Scheme for Power Reduction in Network on Chip Links Chetan S.Behere*, Somulu Gugulothu** *(Department of Electronics, YCCE, Nagpur-10 Email: chetanbehere@gmail.com)
More informationTypical Processor Execution Cycle
Typical Processor Execution Cycle Instruction Fetch Obtain instruction from program storage Instruction Decode Determine required actions and instruction size Operand Fetch Locate and obtain operand data
More information18-447: Computer Architecture Lecture 25: Main Memory. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/3/2013
18-447: Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/3/2013 Reminder: Homework 5 (Today) Due April 3 (Wednesday!) Topics: Vector processing,
More informationIntroduction to Microprocessor
Introduction to Microprocessor The microprocessor is a general purpose programmable logic device. It is the brain of the computer and it performs all the computational tasks, calculations data processing
More informationOne-Level Cache Memory Design for Scalable SMT Architectures
One-Level Cache Design for Scalable SMT Architectures Muhamed F. Mudawar and John R. Wani Computer Science Department The American University in Cairo mudawwar@aucegypt.edu rubena@aucegypt.edu Abstract
More informationLossless Compression using Efficient Encoding of Bitmasks
Lossless Compression using Efficient Encoding of Bitmasks Chetan Murthy and Prabhat Mishra Department of Computer and Information Science and Engineering University of Florida, Gainesville, FL 326, USA
More informationWrite only as much as necessary. Be brief!
1 CIS371 Computer Organization and Design Midterm Exam Prof. Martin Thursday, March 15th, 2012 This exam is an individual-work exam. Write your answers on these pages. Additional pages may be attached
More informationIntroduction. Stream processor: high computation to bandwidth ratio To make legacy hardware more like stream processor: We study the bandwidth problem
Introduction Stream processor: high computation to bandwidth ratio To make legacy hardware more like stream processor: Increase computation power Make the best use of available bandwidth We study the bandwidth
More informationReducing Data Cache Energy Consumption via Cached Load/Store Queue
Reducing Data Cache Energy Consumption via Cached Load/Store Queue Dan Nicolaescu, Alex Veidenbaum, Alex Nicolau Center for Embedded Computer Systems University of Cafornia, Irvine {dann,alexv,nicolau}@cecs.uci.edu
More informationA Scalable Multiprocessor for Real-time Signal Processing
A Scalable Multiprocessor for Real-time Signal Processing Daniel Scherrer, Hans Eberle Institute for Computer Systems, Swiss Federal Institute of Technology CH-8092 Zurich, Switzerland {scherrer, eberle}@inf.ethz.ch
More informationCS 24: INTRODUCTION TO. Spring 2015 Lecture 2 COMPUTING SYSTEMS
CS 24: INTRODUCTION TO Spring 2015 Lecture 2 COMPUTING SYSTEMS LAST TIME! Began exploring the concepts behind a simple programmable computer! Construct the computer using Boolean values (a.k.a. bits )
More informationCache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals
Cache Memory COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline The Need for Cache Memory The Basics
More informationComputer and Hardware Architecture I. Benny Thörnberg Associate Professor in Electronics
Computer and Hardware Architecture I Benny Thörnberg Associate Professor in Electronics Hardware architecture Computer architecture The functionality of a modern computer is so complex that no human can
More informationModule 10: "Design of Shared Memory Multiprocessors" Lecture 20: "Performance of Coherence Protocols" MOESI protocol.
MOESI protocol Dragon protocol State transition Dragon example Design issues General issues Evaluating protocols Protocol optimizations Cache size Cache line size Impact on bus traffic Large cache line
More informationMain Memory. Electrical and Computer Engineering Stephen Kim ECE/IUPUI RTOS & APPS 1
Main Memory Electrical and Computer Engineering Stephen Kim (dskim@iupui.edu) ECE/IUPUI RTOS & APPS 1 Main Memory Background Swapping Contiguous allocation Paging Segmentation Segmentation with paging
More informationCHAPTER 5 A Closer Look at Instruction Set Architectures
CHAPTER 5 A Closer Look at Instruction Set Architectures 5.1 Introduction 5.2 Instruction Formats 5.2.1 Design Decisions for Instruction Sets 5.2.2 Little versus Big Endian 5.2.3 Internal Storage in the
More informationChapter 11 I/O Management and Disk Scheduling
Operating Systems: Internals and Design Principles, 6/E William Stallings Chapter 11 I/O Management and Disk Scheduling Patricia Roy Manatee Community College, Venice, FL 2008, Prentice Hall 1 2 Differences
More informationArchitectural Differences nc. DRAM devices are accessed with a multiplexed address scheme. Each unit of data is accessed by first selecting its row ad
nc. Application Note AN1801 Rev. 0.2, 11/2003 Performance Differences between MPC8240 and the Tsi106 Host Bridge Top Changwatchai Roy Jenevein risc10@email.sps.mot.com CPD Applications This paper discusses
More informationIntroduction to Computers - Chapter 4
Introduction to Computers - Chapter 4 Since the invention of the transistor and the first digital computer of the 1940s, computers have been increasing in complexity and performance; however, their overall
More informationHybrid Signed Digit Representation for Low Power Arithmetic Circuits
Hybrid Signed Digit Representation for Low Power Arithmetic Circuits Dhananjay S. Phatak Steffen Kahle, Hansoo Kim and Jason Lue Electrical Engineering Department State University of New York Binghamton,
More informationDelay Optimised 16 Bit Twin Precision Baugh Wooley Multiplier
Delay Optimised 16 Bit Twin Precision Baugh Wooley Multiplier Vivek. V. Babu 1, S. Mary Vijaya Lense 2 1 II ME-VLSI DESIGN & The Rajaas Engineering College Vadakkangulam, Tirunelveli 2 Assistant Professor
More informationFast Design Space Subsetting. University of Florida Electrical and Computer Engineering Department Embedded Systems Lab
Fast Design Space Subsetting University of Florida Electrical and Computer Engineering Department Embedded Systems Lab Motivation & Greater Impact Energy & Data Centers Estimated¹ energy by servers data
More informationAlternate definition: Instruction Set Architecture (ISA) What is Computer Architecture? Computer Organization. Computer structure: Von Neumann model
What is Computer Architecture? Structure: static arrangement of the parts Organization: dynamic interaction of the parts and their control Implementation: design of specific building blocks Performance:
More informationChapter 13 Reduced Instruction Set Computers
Chapter 13 Reduced Instruction Set Computers Contents Instruction execution characteristics Use of a large register file Compiler-based register optimization Reduced instruction set architecture RISC pipelining
More informationControl Hazards. Prediction
Control Hazards The nub of the problem: In what pipeline stage does the processor fetch the next instruction? If that instruction is a conditional branch, when does the processor know whether the conditional
More information