Pipelined MIPS processor with cache controller using VHDL implementation for educational purpose
Journal From the SelectedWorks of Kirat Pal Singh, Winter, December 28, 2013
Hadeel Sh. Mahmood, College of Electrical and Electronic Techniques, Baghdad, Iraq
Safaa S. Omran, College of Electrical and Electronic Techniques, Baghdad, Iraq
This work is licensed under a Creative Commons CC BY-NC International License. Available at:
The First International Conference of Electrical, Communication, Computer, Power and Control Engineering ICECCPCE'13, December 7-8, 2013

Hadeel Sh. Mahmood, Computer Engineering Techniques, College of Electrical and Electronic Techniques, Baghdad, Iraq
Safaa S. Omran, Computer Engineering Techniques, College of Electrical and Electronic Techniques, Baghdad, Iraq

Abstract: This research presents the VHDL (Very High Speed Integrated Circuit Hardware Description Language) design of a direct-mapped cache controller for a pipelined MIPS (Microprocessor without Interlocked Pipeline Stages) processor. In this design, the instruction cache and the data cache are separate and are located in the CPU (Central Processing Unit) core. A write-back policy is used, and no replacement algorithm is required. After the cache controller design is completed, it is combined with a pipelined MIPS processor and used to execute programs. These designs are synthesized using Xilinx ISE Design Suite 13.4 and simulated using the Xilinx ISim simulator.

Keywords: VHDL, MIPS, instruction cache, data cache, CPU, write back.

I. INTRODUCTION
Computer pioneers correctly predicted that programmers would want unlimited amounts of fast memory. An economical solution to that desire is a memory hierarchy, which takes advantage of locality and of the trade-offs in the cost-performance of memory technologies. The principle of locality says that most programs do not access all code or data uniformly. Locality occurs in time (temporal locality) and in space (spatial locality) [1]. A typical memory hierarchy starts with a small, expensive, and relatively fast unit, called the cache, followed by a larger, less expensive, and relatively slow main memory unit [2].
Many previous works have presented the VHDL design of a pipelined MIPS processor that works with an ideal main memory that can be accessed in a single clock cycle [3-5]. However, this holds only for a very small memory or a very slow processor. Since VHDL is one of the methods used to describe a hardware design, this research uses it to implement a cache controller for the pipelined MIPS processor [6].

II. CACHE MEMORY PRINCIPLES
The cache contains a copy of portions of main memory. The processor requests a memory element by issuing its address. If the requested data is present in some block in the cache, the access is called a hit; if it is not found in the cache, the access is called a miss. On a miss, main memory is accessed to retrieve the block containing the requested data, and the whole pipelined MIPS must be stalled during that time [7].

III. CACHE DESIGN ELEMENTS
As shown in Fig. 1, the cache lies on the same chip as the processor, and its design involves the following choices:

A. Number of Caches
To avoid structural hazards, the cache is split into two: one dedicated to instructions and one dedicated to data. Both caches reside in the CPU core as two level-one (L1) caches. When the processor fetches an instruction, it first consults the L1 instruction cache; when it accesses data, it first consults the L1 data cache.

B. Cache Addresses
No Memory Management Unit (MMU) is needed in this design, because the caches receive physical addresses, rather than virtual addresses, directly from the MIPS processor.

C. Mapping Function
Because there are fewer cache lines than main memory blocks, a mapping function is needed. This design uses direct mapping, in which each main memory block maps to exactly one possible cache line.

D. Write Policy
This design minimizes main memory writes by using a write-back policy: the processor writes data only to the cache, not to main memory. When new data is written to a cache line, the dirty bit associated with that line is set. Later, when the block is replaced, it is written back to main memory if and only if its dirty bit is set.
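As a minimal sketch of direct mapping, the Python function below splits a 32-bit byte address into the fields implied by the cache geometry given in Section F (4-byte words, 4 words per line, 4 lines): a 2-bit byte offset, a 2-bit word select, a 2-bit line index, and the remaining 26-bit tag, which matches the 26 tag bits kept by the tag cache. The field order tag | index | word | byte is an assumption, since Fig. 2 is not reproduced here, and the names `split_address` and `is_hit` are illustrative rather than taken from the authors' VHDL.

```python
from typing import NamedTuple

# Geometry assumed from Section F: 4-byte words, 4 words per line, 4 lines.
BYTE_BITS = 2
WORD_BITS = 2
INDEX_BITS = 2
TAG_BITS = 32 - INDEX_BITS - WORD_BITS - BYTE_BITS  # 26 tag bits


class AddressFields(NamedTuple):
    tag: int
    index: int  # the only cache line this block may occupy
    word: int   # word within the line
    byte: int   # byte within the word


def split_address(addr: int) -> AddressFields:
    """Split a 32-bit byte address as tag | index | word | byte (assumed order)."""
    if not 0 <= addr < (1 << 32):
        raise ValueError("address must fit in 32 bits")
    byte = addr & ((1 << BYTE_BITS) - 1)
    word = (addr >> BYTE_BITS) & ((1 << WORD_BITS) - 1)
    index = (addr >> (BYTE_BITS + WORD_BITS)) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BYTE_BITS + WORD_BITS + INDEX_BITS)
    return AddressFields(tag, index, word, byte)


def is_hit(stored_tag: int, valid: bool, addr: int) -> bool:
    """Direct-mapped hit test: the indexed line is valid and its tag matches."""
    return valid and stored_tag == split_address(addr).tag
```

For example, `split_address(0x84)` gives `AddressFields(tag=2, index=0, word=1, byte=0)` and `0x80` differs only in the word field, so both locations lie in the same cache line, while `0xC0` has the same index but tag 3 and therefore competes for that line.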
Fig. 1. Complete design of the MIPS processor with cache memories
E. Block Replacement Algorithm
With direct mapping, there is only one possible line for any particular block, so no replacement choice is possible.

F. Cache Size
To keep the design simple, each cache is limited to 64 bytes, organized as 4 lines of 4 words each, with each word 4 bytes long. The main memory address is therefore organized as shown in Fig. 2.

Fig. 2. Address bit field format

IV. CACHE CONTROLLER
The cache memory is coordinated by a cache controller. The MIPS processor sends every address first to the cache controller, which decides whether the needed data is in the cache. If it is, no main memory access is needed and the data is provided to the processor directly from the cache; if not, the cache controller fetches several words from main memory consecutively to fill the corresponding cache line. The data cache and the instruction cache each have their own cache controller, which consists of:

1) Finite State Machine (FSM): the FSM of the instruction cache differs from that of the data cache because instructions are fetched by the processor and executed without modification, while data is accessed for both reads and writes. The FSM for the data cache is shown in Fig. 3; Table I summarizes its operation, and Table II describes the function of its nine output signals.

TABLE I. TRUTH TABLE OF DATA CACHE FSM
Inputs: hit, read, write, dirty
Outputs: stall, cachewr, cacherd, memrd, memwr, cache_data_src, mem_addr_src, rst_dly, wsel(1:0)
States: Start (st0), Write cache (st1), Read cache (st2), WW 0 (st3), WW 1 (st4), WW 2 (st5), WW 3 (st6), RW 0 (st7), RW 1 (st8), RW 2 (st9), RW 3 (st10)

TABLE II. THE EFFECT OF EACH OF THE NINE OUTPUT SIGNALS OF THE CACHE FSM
stall: when 1, main memory is accessed and the whole pipeline is stalled; when 0, cache memory is accessed and the pipeline registers are captured on the next falling edge.
cachewr: on a cache hit, data supplied by the processor is written into cache memory.
cacherd: on a cache hit, data is supplied to the processor from cache memory.
memrd: on a cache miss, data is supplied to cache memory from main memory.
memwr: on a cache miss with the dirty bit set, the data block supplied by cache memory is written into main memory.
cache_data_src: selects whether the value fed to the cache_data_in input of cache memory comes from the processor or from main memory.
mem_addr_src: selects whether the address fed to the amem input of main memory comes from the processor or equals (tag & I & 0).
rst_dly: indicates whether a main memory read or write is in progress or there is no main memory activity.
wsel(1:0): 00 selects the first (least significant) word of the memory block, 01 the second, 10 the third, and 11 the fourth (most significant).

Fig. 3. State machine for data cache
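Because Fig. 3 is not reproduced here, the transitions in the following Python sketch are inferred from the state names in Table I and should be read as hypothetical: on a hit the FSM goes from Start to Read cache or Write cache; on a miss with the dirty bit set it first writes the old block back in WW 0 to WW 3 and then fetches the new block in RW 0 to RW 3 before completing the access; a clean miss skips the WW states. The helper names (`data_cache_trace`, `stall`, `wsel`) are illustrative, and the stall and wsel values follow the descriptions in Table II.

```python
from enum import Enum
from typing import List, Optional


class DState(Enum):
    START = "st0"
    WRITE_CACHE = "st1"
    READ_CACHE = "st2"
    WW0 = "st3"
    WW1 = "st4"
    WW2 = "st5"
    WW3 = "st6"
    RW0 = "st7"
    RW1 = "st8"
    RW2 = "st9"
    RW3 = "st10"


WRITE_BACK = [DState.WW0, DState.WW1, DState.WW2, DState.WW3]
REFILL = [DState.RW0, DState.RW1, DState.RW2, DState.RW3]


def data_cache_trace(read: bool, write: bool, hit: bool, dirty: bool) -> List[DState]:
    """Hypothetical state sequence of the data-cache FSM for one request."""
    if read == write:
        raise ValueError("assert exactly one of read/write")
    trace = [DState.START]
    if not hit:
        if dirty:
            trace += WRITE_BACK  # copy the dirty victim block to main memory
        trace += REFILL          # fetch the requested block from main memory
    trace.append(DState.WRITE_CACHE if write else DState.READ_CACHE)
    return trace


def stall(state: DState) -> bool:
    """stall = 1 while main memory is accessed (WW and RW states)."""
    return state in WRITE_BACK or state in REFILL


def wsel(state: DState) -> Optional[int]:
    """wsel(1:0) = 0b00..0b11 selects the word moved in WWk/RWk; None otherwise."""
    return int(state.name[-1]) if stall(state) else None
```

A write miss on a dirty line, `data_cache_trace(read=False, write=True, hit=False, dirty=True)`, visits st0, st3 to st10, and finally st1. Under the same assumptions, the instruction-cache FSM of Table III would be this machine without the WW states and without Write cache.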
The state machine of the instruction cache is a sub-state machine of the data cache FSM that omits the states performing write actions. The FSM for the instruction cache is shown in Fig. 4, and Table III summarizes its operation.

Fig. 4. State machine for instruction cache

TABLE III. TRUTH TABLE OF INSTRUCTION CACHE FSM
Inputs: hit, mem_rdy
Outputs: stall, cachewr, cacherd, memrd, rst_dly, wsel
States: Start (st0), Read cache (st1), RW 0 (st2), RW 1 (st3), RW 2 (st4), RW 3 (st5)

2) Tag cache: the data tag cache contains 26 tag bits, a valid bit, and a dirty bit for each data cache line. The tag bits hold the 26 most significant bits of the address being accessed; the valid bit indicates whether the cache line is valid; and the dirty bit is set when the cache line is overwritten without updating the corresponding main memory block. All valid and dirty bits are reset when the machine restarts. The instruction tag cache is similar to the data tag cache but has no dirty bits.

Fig. 5. Complete data cache system

V. COMPLETE MEMORY SYSTEM
After the cache controllers are designed, the data cache controller is combined with the data cache and the instruction cache controller with the instruction cache. Both access a main memory of 512 bytes arranged as 2 segments, one for data and one for instructions; each segment has 16 blocks, each block consists of 4 words, and each word contains 4 bytes. Fig. 5 shows the connections between the data cache, the data cache controller, and main memory, while the connections between the instruction cache, the instruction cache controller, and main memory are shown in Fig. 6.

Fig. 6. Complete instruction cache system

VI. VHDL TOP-LEVEL IMPLEMENTATION
Fig. 7 shows the pipelined MIPS connected to instruction and data memories that can be accessed in four clock cycles.
The pipelined MIPS consists of five pipeline stages: fetch (F), decode (D), execute (EXE), memory (MEM), and write back (WB). Because the processor is faster than main memory, the data and instruction caches are placed inside the MIPS CPU core to narrow the gap between them, as shown in Fig. 8. The instruction cache is located and accessed in the fetch (F) stage, while the data cache is located and accessed in the memory (MEM) stage.
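The MEM-stage data cache, together with its controller and the 512-byte main memory of Section V, can be modelled behaviourally in a few lines. The Python sketch below is not the authors' VHDL: it assumes write-allocate on a write miss (the block is fetched into the cache before the word is written), which the text does not state explicitly, and the class and method names are hypothetical. A dirty victim is copied back word by word before the new block is fetched, loosely mirroring the WW and RW states of the data-cache FSM.

```python
from dataclasses import dataclass, field
from typing import Dict, List

NUM_LINES = 4        # Section F: 4 lines
WORDS_PER_LINE = 4   # 4 words per line
BYTES_PER_WORD = 4   # 4 bytes per word
BLOCK_BYTES = WORDS_PER_LINE * BYTES_PER_WORD
MAIN_MEMORY_BYTES = 2 * 16 * BLOCK_BYTES  # Section V: 2 segments x 16 blocks


@dataclass
class Line:
    valid: bool = False
    dirty: bool = False
    tag: int = 0
    words: List[int] = field(default_factory=lambda: [0] * WORDS_PER_LINE)


class WriteBackCache:
    """Behavioural sketch of a direct-mapped, write-back, write-allocate cache."""

    def __init__(self, memory: Dict[int, int]):
        self.memory = memory  # main memory: word byte-address -> 32-bit value
        self.lines = [Line() for _ in range(NUM_LINES)]
        self.misses = 0
        self.writebacks = 0

    def _locate(self, addr: int):
        if not 0 <= addr < MAIN_MEMORY_BYTES:
            raise ValueError("address outside the 512-byte main memory")
        block = addr // BLOCK_BYTES
        word = (addr // BYTES_PER_WORD) % WORDS_PER_LINE
        return block // NUM_LINES, block % NUM_LINES, word  # tag, index, word

    def _block_base(self, tag: int, index: int) -> int:
        return (tag * NUM_LINES + index) * BLOCK_BYTES

    def _access(self, addr: int):
        tag, index, word = self._locate(addr)
        line = self.lines[index]
        if not (line.valid and line.tag == tag):  # miss: the pipeline stalls
            self.misses += 1
            if line.valid and line.dirty:
                # Write the dirty victim back, one word at a time.
                base = self._block_base(line.tag, index)
                for w in range(WORDS_PER_LINE):
                    self.memory[base + w * BYTES_PER_WORD] = line.words[w]
                self.writebacks += 1
            # Fetch the requested block, one word at a time.
            base = self._block_base(tag, index)
            line.words = [self.memory.get(base + w * BYTES_PER_WORD, 0)
                          for w in range(WORDS_PER_LINE)]
            line.tag, line.valid, line.dirty = tag, True, False
        return line, word

    def read(self, addr: int) -> int:
        line, word = self._access(addr)
        return line.words[word]

    def write(self, addr: int, value: int) -> None:
        line, word = self._access(addr)
        line.words[word] = value
        line.dirty = True  # main memory is stale until this block is written back
```

For example, writing `0x2D0` to address `0x80` leaves main memory unchanged; only a later access to `0xC0`, which maps to the same line with a different tag, forces the dirty block to be written back. This is the saving the write-back policy aims for: repeated writes to a cached block cost no main memory traffic until eviction.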
VII. RESULTS
The procedure shown in Fig. 9 is stored in main memory. This procedure computes the factorial of 6; if all instructions run correctly, it writes (0)h to memory location (84)h and (2d0)h, that is 720 = 6!, to memory location (80)h. Any other result means the VHDL cache design is incorrect.

Fig. 7. Pipelined MIPS with memories that can be accessed in four clock cycles

In this design, several VHDL components are combined to form the pipelined MIPS processor, the data cache, the instruction cache, and main memory. All these components are connected as shown in Fig. 8 to compose the top level. A test bench is then written and used to execute a program.

Fig. 9. Top-level test procedure

Executing this procedure on the VHDL top level gives the results shown in Fig. 10, which indicate that the design is correct: while the memwrite signal is high, (000002d0)h is stored at memory location (80)h and ( )h is stored at memory location (84)h.

The two systems are then compared in terms of performance and speedup, as shown in Table IV. The CPI (clock cycles per instruction) metric is calculated from the equation

program execution time = instruction count × CPI × clock period,

where the instruction count of the test procedure is 90 instructions.

TABLE IV. PERFORMANCE COMPARISON BETWEEN MIPS SYSTEMS WITH REAL MEMORY
Columns: processor, program execution time, no. of clock cycles, clock period, CPI, speedup
Pipelined MIPS Pipelined MIPS with cache 5030 ns ns ns ns

Fig. 8. Pipelined MIPS with cache memories
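The CPI column of Table IV follows from rearranging the equation above: CPI = execution time / (instruction count × clock period), and speedup is the ratio of the two execution times for the same program. The Python helpers below apply these relations; only the instruction count of 90 comes from the text, while the execution times and the 20 ns clock period in the usage lines are made-up values for illustration, not the measurements of Table IV.

```python
def clock_cycles(execution_time_ns: float, clock_period_ns: float) -> float:
    """Number of clock cycles = execution time / clock period."""
    return execution_time_ns / clock_period_ns


def cpi(execution_time_ns: float, instruction_count: int, clock_period_ns: float) -> float:
    """From: execution time = instruction count x CPI x clock period."""
    return clock_cycles(execution_time_ns, clock_period_ns) / instruction_count


def speedup(time_without_cache_ns: float, time_with_cache_ns: float) -> float:
    """Speedup of the cached system over the uncached one for the same program."""
    return time_without_cache_ns / time_with_cache_ns


INSTRUCTIONS = 90  # instruction count of the test procedure (from the text)

# Hypothetical timings, not the measured values of Table IV.
print(cpi(3600.0, INSTRUCTIONS, 20.0))  # 2.0 (180 cycles / 90 instructions)
print(speedup(5400.0, 3600.0))          # 1.5
```

When both systems run at the same clock period, the speedup reduces to the ratio of their clock-cycle counts, so it can be read directly from the "No. of clock cycles" column.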
This design is configured on a Xilinx Spartan-3AN FPGA (Field Programmable Gate Array) starter kit, giving the results shown in Fig. 11.

Fig. 10. Test procedure simulation
Fig. 11. Results on Spartan-3AN

VIII. CONCLUSIONS
In this paper, the VHDL design of a cache system for the pipelined MIPS processor has been implemented. The design uses a direct mapping function and a write-back policy. The cache system consists of two separate on-chip caches, one for data and one for instructions, and each cache has an associated cache controller consisting of a finite state machine and a tag directory. After the cache system was designed, it was combined with a pipelined MIPS processor; several programs were then executed and simulated on both systems (the pipelined MIPS system and the pipelined MIPS with cache memories system), and the desired results were obtained, indicating the correctness of the design. Xilinx ISE Design Suite 13.4 is used for synthesis and the Xilinx ISim simulator for simulation, and the design is then configured on a Xilinx Spartan-3AN FPGA starter kit.

REFERENCES
[1] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 5th ed., San Francisco, USA: Morgan Kaufmann, 2012.
[2] L. Yin and L. Tingtao, MIPS CPU Design, CPU Design Report, School of Software, Shanghai Jiao Tong University.
[3] N. Alekya and P. G. Kumar, "Design of 32-Bit RISC CPU Based on MIPS," Journal of Global Research in Computer Science, vol. 2, no. 9, Sept. 2011.
[4] K. P. Singh and S. Parmar, "VHDL Implementation of a MIPS-32 Pipeline Processor," International Journal of Applied Engineering Research, vol. 7, 2012.
[5] S. P. Katke and G. P. Jain, "Design and Implementation of 5 Stages Pipelined Architecture in 32 Bit RISC Processor," International Journal of Emerging Technology and Advanced Engineering, vol. 2, no. 4, Apr.
[6] D. L. Perry, VHDL: Programming by Example, 4th ed., USA: McGraw-Hill.
[7] J. L. Hennessy and D. A. Patterson, Computer Organization and Design: The Hardware/Software Interface, 4th ed., Waltham, USA: Morgan Kaufmann, 2012.
CACHE MEMORIES Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 2 and Appendix B, John L. Hennessy and David A. Patterson, Morgan Kaufmann,
More informationAdvanced Computer Architecture
Advanced Computer Architecture Chapter 1 Introduction into the Sequential and Pipeline Instruction Execution Martin Milata What is a Processors Architecture Instruction Set Architecture (ISA) Describes
More informationCOSC3330 Computer Architecture Lecture 19. Cache
COSC3330 Computer Architecture Lecture 19 Cache Instructor: Weidong Shi (Larry), PhD Computer Science Department University of Houston Cache Topics 3 Cache Hardware Cost How many total bits are required
More informationFull Name: NetID: Midterm Summer 2017
Full Name: NetID: Midterm Summer 2017 OAKLAND UNIVERSITY, School of Engineering and Computer Science CSE 564: Computer Architecture Please write and/or mark your answers clearly and neatly; answers that
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address
More informationAdvanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017
Advanced Parallel Architecture Lessons 5 and 6 Annalisa Massini - Pipelining Hennessy, Patterson Computer architecture A quantitive approach Appendix C Sections C.1, C.2 Pipelining Pipelining is an implementation
More informationFundamentals of Computer Systems
Fundamentals of Computer Systems Caches Martha A. Kim Columbia University Fall 215 Illustrations Copyright 27 Elsevier 1 / 23 Computer Systems Performance depends on which is slowest: the processor or
More informationDesign and Functional Verification of Four Way Set Associative Cache Controller
International Journal of Research in Computer and Communication Technology, Vol 4, Issue 3, March -2015 ISSN (Online) 2278-5841 ISSN (Print) 2320-5156 Design and Functional Verification of Four Way Set
More information3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?
CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:
More informationCycle Time for Non-pipelined & Pipelined processors
Cycle Time for Non-pipelined & Pipelined processors Fetch Decode Execute Memory Writeback 250ps 350ps 150ps 300ps 200ps For a non-pipelined processor, the clock cycle is the sum of the latencies of all
More informationCSEE W4824 Computer Architecture Fall 2012
CSEE W4824 Computer Architecture Fall 2012 Lecture 8 Memory Hierarchy Design: Memory Technologies and the Basics of Caches Luca Carloni Department of Computer Science Columbia University in the City of
More informationECE 4750 Computer Architecture, Fall 2017 T05 Integrating Processors and Memories
ECE 4750 Computer Architecture, Fall 2017 T05 Integrating Processors and Memories School of Electrical and Computer Engineering Cornell University revision: 2017-10-17-12-06 1 Processor and L1 Cache Interface
More informationCaching Basics. Memory Hierarchies
Caching Basics CS448 1 Memory Hierarchies Takes advantage of locality of reference principle Most programs do not access all code and data uniformly, but repeat for certain data choices spatial nearby
More informationLECTURE 11. Memory Hierarchy
LECTURE 11 Memory Hierarchy MEMORY HIERARCHY When it comes to memory, there are two universally desirable properties: Large Size: ideally, we want to never have to worry about running out of memory. Speed
More informationCENG 3420 Computer Organization and Design. Lecture 08: Cache Review. Bei Yu
CENG 3420 Computer Organization and Design Lecture 08: Cache Review Bei Yu CEG3420 L08.1 Spring 2016 A Typical Memory Hierarchy q Take advantage of the principle of locality to present the user with as
More informationFPGA Implementation of A Pipelined MIPS Soft Core Processor
FPGA Implementation of A Pipelined MIPS Soft Core Processor Lakshmi S.S 1, Chandrasekhar N.S 2 P.G. Student, Department of Electronics and Communication Engineering, DBIT, Bangalore, India 1 Assistant
More informationThe CPU Design Kit: An Instructional Prototyping Platform. for Teaching Processor Design. Anujan Varma, Lampros Kalampoukas
The CPU Design Kit: An Instructional Prototyping Platform for Teaching Processor Design Anujan Varma, Lampros Kalampoukas Dimitrios Stiliadis, and Quinn Jacobson Computer Engineering Department University
More informationRECONFIGURABLE SPI DRIVER FOR MIPS SOFT-CORE PROCESSOR USING FPGA
RECONFIGURABLE SPI DRIVER FOR MIPS SOFT-CORE PROCESSOR USING FPGA 1 HESHAM ALOBAISI, 2 SAIM MOHAMMED, 3 MOHAMMAD AWEDH 1,2,3 Department of Electrical and Computer Engineering, King Abdulaziz University
More informationMemory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)
Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2011/12 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2011/12 1 2
More informationEEC 170 Computer Architecture Fall Cache Introduction Review. Review: The Memory Hierarchy. The Memory Hierarchy: Why Does it Work?
EEC 17 Computer Architecture Fall 25 Introduction Review Review: The Hierarchy Take advantage of the principle of locality to present the user with as much memory as is available in the cheapest technology
More informationc. What are the machine cycle times (in nanoseconds) of the non-pipelined and the pipelined implementations?
Brown University School of Engineering ENGN 164 Design of Computing Systems Professor Sherief Reda Homework 07. 140 points. Due Date: Monday May 12th in B&H 349 1. [30 points] Consider the non-pipelined
More informationMemory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)
Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2012/13 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2012/13 1 2
More informationLocality. Cache. Direct Mapped Cache. Direct Mapped Cache
Locality A principle that makes having a memory hierarchy a good idea If an item is referenced, temporal locality: it will tend to be referenced again soon spatial locality: nearby items will tend to be
More informationSISTEMI EMBEDDED. Computer Organization Memory Hierarchy, Cache Memory. Federico Baronti Last version:
SISTEMI EMBEDDED Computer Organization Memory Hierarchy, Cache Memory Federico Baronti Last version: 20160524 Ideal memory is fast, large, and inexpensive Not feasible with current memory technology, so
More informationCENG 3531 Computer Architecture Spring a. T / F A processor can have different CPIs for different programs.
Exam 2 April 12, 2012 You have 80 minutes to complete the exam. Please write your answers clearly and legibly on this exam paper. GRADE: Name. Class ID. 1. (22 pts) Circle the selected answer for T/F and
More informationMemory Hierarchy. Maurizio Palesi. Maurizio Palesi 1
Memory Hierarchy Maurizio Palesi Maurizio Palesi 1 References John L. Hennessy and David A. Patterson, Computer Architecture a Quantitative Approach, second edition, Morgan Kaufmann Chapter 5 Maurizio
More informationTopics. Digital Systems Architecture EECE EECE Need More Cache?
Digital Systems Architecture EECE 33-0 EECE 9-0 Need More Cache? Dr. William H. Robinson March, 00 http://eecs.vanderbilt.edu/courses/eece33/ Topics Cache: a safe place for hiding or storing things. Webster
More informationQ1: Finite State Machine (8 points)
Q1: Finite State Machine (8 points) Answer the questions below for the finite state machine in this diagram: 1. Complete the truth table shown below. (2 points) Input Output State In State Out S 0 = 00
More informationWhat is Pipelining? Time per instruction on unpipelined machine Number of pipe stages
What is Pipelining? Is a key implementation techniques used to make fast CPUs Is an implementation techniques whereby multiple instructions are overlapped in execution It takes advantage of parallelism
More informationComputer Systems Laboratory Sungkyunkwan University
Caches Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns
More informationPage 1. Memory Hierarchies (Part 2)
Memory Hierarchies (Part ) Outline of Lectures on Memory Systems Memory Hierarchies Cache Memory 3 Virtual Memory 4 The future Increasing distance from the processor in access time Review: The Memory Hierarchy
More informationDonn Morrison Department of Computer Science. TDT4255 Memory hierarchies
TDT4255 Lecture 10: Memory hierarchies Donn Morrison Department of Computer Science 2 Outline Chapter 5 - Memory hierarchies (5.1-5.5) Temporal and spacial locality Hits and misses Direct-mapped, set associative,
More informationEE414 Embedded Systems Ch 5. Memory Part 2/2
EE414 Embedded Systems Ch 5. Memory Part 2/2 Byung Kook Kim School of Electrical Engineering Korea Advanced Institute of Science and Technology Overview 6.1 introduction 6.2 Memory Write Ability and Storage
More informationTransistor: Digital Building Blocks
Final Exam Review Transistor: Digital Building Blocks Logically, each transistor acts as a switch Combined to implement logic functions (gates) AND, OR, NOT Combined to build higher-level structures Multiplexer,
More information