Application-Specific Design of Low Power Instruction Cache Hierarchy for Embedded Processors
Ji Gu
Onodera Laboratory, Department of Communications & Computer Engineering, Graduate School of Informatics, Kyoto University

Embedded Systems: What are they?
- An embedded system is a special-purpose computer system designed to perform one or a few dedicated functions [Wikipedia]
- Processor based: general processors, microcontrollers, DSPs
- A subsystem: part of a larger system which it controls
- With special purpose: in general, it does not provide programmability to users, as opposed to general-purpose computer systems like the PC

Embedded Systems: Where are they?
- Everywhere! Our daily lives depend on embedded systems

Why use microprocessors?
- Microprocessors are very efficient: they can use the same logic to perform many different functions
- Microprocessors simplify the design of families of products
- Sonicare Plus toothbrush: 8-bit Zilog Z8 microprocessor
- BMW 745i: 53 8-bit microprocessors, … -bit microprocessors, 7 16-bit microprocessors
- NASA's Mars Rover: 8-bit Intel 8085 microprocessor
- Canon EOS 3: 3 32-bit microprocessor

Some Common Characteristics (1/2)
While embedded systems cover a wide range of special-purpose systems, they share common characteristics:
- Low cost: memory is typically small compared to a general-purpose system; lightweight processors are used
- Low power: power consumption should be low, especially in portable devices; low-power processors are used
- High performance: when playing video on portable devices, audio should be in sync with video; gaming gadgets demand high performance
- Real-time property: a job should be done within a time limit; aerospace applications, car control systems, and medical gadgets are critical in terms of time constraints
Some Common Characteristics (2/2)
- It is challenging to satisfy all the characteristics
- You may not be able to achieve high performance while maintaining low power consumption and using cheap components
- This talk focuses on the low power design of embedded systems

Processor power
- Processors in data centers consume 1.5% of the worldwide energy
- Processors and memories are typical components in embedded systems: cores, control logic, on-chip caches, registers, off-chip caches/memories
- Caches are energy-consuming due to instruction/data supply; this is typical for the instruction cache because it is accessed so frequently

Research overview
[Figure: Target System Architecture. Pipeline IF, ID, EX, MEM, WB; on chip: CPU core with DLIC (3), I-cache (2), D-cache; off chip: IMem (1), DMem]
Processor power: instruction supply power
1. Off-chip instruction bus: SBI design
2. Instruction cache: ROBTIC design
3. Instruction fetch and decode: DLIC design

Bus Encoding for Switching Reduction
- Transition-Zero (T0) (Benini1997)
- Gray (Su1994)
- Bus-Invert (BI) (Stan1995)
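As an illustration of the Bus-Invert idea listed in the bus-encoding slide, the following is a minimal Python sketch, not the implementation used in the talk. It assumes a simplified variant in which only the data lines are compared against the previous bus state (the invert line itself is ignored when measuring the distance), and an initial bus state of all zeros. The function names `bus_invert_encode`, `bus_invert_decode`, and `count_transitions` are hypothetical.

```python
def bus_invert_encode(words, width):
    """Encode a word sequence with simplified Bus-Invert coding.

    Returns a list of (sent_word, invert_bit) pairs. A word is sent
    inverted when more than half of the data lines would otherwise toggle.
    Assumption: the bus starts at all zeros.
    """
    mask = (1 << width) - 1
    prev = 0
    encoded = []
    for word in words:
        word &= mask
        hamming = bin(prev ^ word).count("1")  # lines that would toggle
        if hamming > width / 2:
            sent, invert = (~word) & mask, 1
        else:
            sent, invert = word, 0
        encoded.append((sent, invert))
        prev = sent
    return encoded


def bus_invert_decode(encoded, width):
    """Recover the original words from (sent_word, invert_bit) pairs."""
    mask = (1 << width) - 1
    return [sent ^ mask if invert else sent for sent, invert in encoded]


def count_transitions(values):
    """Count bit toggles over a sequence, starting from an all-zero state."""
    prev, total = 0, 0
    for value in values:
        total += bin(prev ^ value).count("1")
        prev = value
    return total


if __name__ == "__main__":
    words = [0x00, 0xFF, 0x00, 0xFF]
    enc = bus_invert_encode(words, 8)
    raw = count_transitions(words)
    coded = (count_transitions([s for s, _ in enc])
             + count_transitions([i for _, i in enc]))
    print(raw, coded)
```

The extra invert line is the cost of the scheme: its own toggles have to be counted when comparing against the unencoded bus, which is why the usage example adds both counts.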
Recent work: Code Compression
- Lv2003: dictionary-based opcode compression
- Petrov2004: major loops compression
- Drawbacks: complex codec units, performance overhead

Probe: Instruction Memory Data Bus?
- Switching pattern: IMDB vs. IMAB vs. DMDB?
- [Figure: bit switching probability]
- IMDB: generally random, though not typical
- BI encoding potentially applicable

Probe: BI Encoding for IMDB
- BI encoding requirement: Hamming distance (number of bit switches) > bus bit width / 2
- [Figure: probability of BI being applicable (MiBench qsort) for NP [39..0], P1 [39..30], P2 [29..20], P3 [19..10], P4 [9..0]]

Probe: IMDB Partition for BI Encoding
- Instruction fields: Op [39..32], Rs [31..24], Rt [23..16], Imm [15..0]
- To apply BI on IMDB, partition first, based on the instruction fields and the instruction types of major loops
- Use the partition as the starting point when searching for the most active bus lines

Searching Bus Lines for BI
- High correlation coefficient based: proposed in Shin2001, Partial Bus-Invert (PBI) coding for the address bus
- [Figure: line-pair correlation, IMAB vs. IMDB]
- Alternative: explore the Hamming distance directly with window-based searching

Window-based Segment Search
- Search windows of different sizes and bit positions
- One segment for each partition (P i-1, P i, P i+1)
- For each window, use: δ = E(D) + Dev(D)
- Larger E(D) means more bus lines switch per bus cycle
- Larger δ => better switching reduction
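The window-based segment search can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not the SBI algorithm as published: D is taken to be the per-cycle Hamming distance restricted to the window's bus lines, Dev(D) is assumed to be the population standard deviation, and δ is compared without any normalization by window size (the slides do not say whether one is applied). The names `window_distances`, `delta`, and `best_window` are hypothetical.

```python
from statistics import mean, pstdev


def window_distances(trace, lo, size):
    """Per-cycle Hamming distance restricted to bus lines [lo, lo+size)."""
    mask = ((1 << size) - 1) << lo
    return [bin((a ^ b) & mask).count("1") for a, b in zip(trace, trace[1:])]


def delta(trace, lo, size):
    """delta = E(D) + Dev(D); Dev is assumed to be the population std. dev."""
    d = window_distances(trace, lo, size)
    return mean(d) + pstdev(d)


def best_window(trace, part_lo, part_hi, sizes):
    """Find the window with the largest delta inside one partition.

    The partition covers bus lines part_lo..part_hi (inclusive).
    Returns (delta, lo, size); ties keep the first window found.
    """
    best = None
    for size in sizes:
        for lo in range(part_lo, part_hi - size + 2):
            score = delta(trace, lo, size)
            if best is None or score > best[0]:
                best = (score, lo, size)
    return best


if __name__ == "__main__":
    # Toy 4-line bus trace: only the two low lines toggle.
    trace = [0b0000, 0b0011, 0b0000, 0b0011]
    print(best_window(trace, 0, 3, [2]))
```

In this toy trace the window over the two low lines wins because it captures all of the switching activity, which is the behaviour the δ metric is meant to reward.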
Results: Switching Reduction
[Figure: average switching reduction [%] and number of control lines for BI, PBI, MPBI, SBI]

Results: Power Evaluation
[Figure: power saving [%] and switching reduction [%]]

Introduction
- Observation: embedded applications consist of numerous basic loops whose executions are of high locality
- Such locality can be exploited for cache tag reduction
- Objective: reduced tag array and tag compare for low power
- A Reduced One-Bit Tag Instruction Cache (ROBTIC): power optimization without performance sacrifice

Design -- Tag size vs cache coverage
- Cache coverage, definition: the cache-mappable address space of the main memory
- A full-tag cache has full memory coverage
- A 1-bit tag cache has partial memory coverage: 1 or 2 or 3

Design -- Dynamic cache coverage control
- Overlapped cache coverage: avoids the ping-pong effect and exploits the locality of loops
- Regions are identified by a cache coverage index
- The coverage changes dynamically with the PC address during program execution
- Overlapped coverages 1 & 2: instructions for B2 can be retained when the cache coverage changes from 1 to …
Design -- ROBTIC architecture
Features:
- 1-bit tag for each cache entry
- Cache operational control unit: uses 2 bits of the PC and the current instruction type to control the cache operation
- Standard cache size of ROBTIC: 32
- Surveys of benchmarks: > 90% of loops contain no more than …

Design -- Cache coverage shift & detection
- Three consecutive coverages need to be differentiated for a Shift, which means 2 LSBs of the full tag (the coverage flops) are sufficient
- With Gray-encoded flops, adjacent coverages can be differentiated by 1 common value bit (V) and its bit position (VP) in a coverage

Design -- Dynamic cache coverage control
Three operation states:
- Normal cache access: like a traditional cache operation
- Cache flush: invalidates all cache entries
- Cache coverage shift (Shift): moves the cache coverage to its neighboring region by an offset of the cache size

Experiments
- Setup: processor MIP2 ISA; benchmarks: MiBench, Powerstone, some kernel-like programs
- Evaluation metrics: design cost, power, performance (hit rate)
- Comparison: conventional I-cache; 5-bit partial tag compare (PT-5)

Results (ROBTIC with standard size of 32)
- Reduction: 30.9% of area and 2.1% of delay (normalized to a full-tag traditional I-cache)
- Performance: for most applications, the same hit rate as the conventional cache can be achieved
- Power reduction: ROBTIC 25.8% on average and 27.8% at most; PT-5 1% on average and 2.18% at most
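The Gray-code property behind the coverage shift detection can be checked with a small Python sketch. It rests on one reading of the slide that should be treated as an assumption: a coverage index is taken to be the 2 LSBs of the full tag, Gray encoded, and (V, VP) is taken to be the value and position of the single bit that two adjacent coverages have in common. The function names `gray` and `shared_bit` are hypothetical.

```python
def gray(n):
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)


def shared_bit(index_a, index_b):
    """Return (V, VP) for two adjacent 2-bit coverage indices.

    V is the value of the bit the two Gray codes have in common and
    VP is its bit position (0 = LSB). Indices wrap modulo 4.
    """
    ga, gb = gray(index_a % 4), gray(index_b % 4)
    diff = ga ^ gb
    if bin(diff).count("1") != 1:
        raise ValueError("coverages are not adjacent")
    vp = 0 if diff == 0b10 else 1  # position of the bit that did not change
    v = (ga >> vp) & 1
    return v, vp


if __name__ == "__main__":
    for i in range(4):
        print(i, (i + 1) % 4, shared_bit(i, i + 1))
```

Under these assumptions each of the four adjacent pairs yields a distinct (V, VP) pair, so two bits are enough to tell which pair of overlapping coverages is currently active.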
Research Problem
- Caching decoded instructions for most loops, including large, complicated and nested loops, to avoid repeated instruction fetching and decoding operations as much as possible

Design Overview
[Figure: pipeline with DLIC. IF, IF/ID stall, load, DLIC, ID, MUX, EX, EX stall, EXsrc, MEM, WB]
Decoded Instruction Loop Cache (DLIC):
- Able to cache large, complicated loops
- Efficient: great energy savings and low overhead

Design Approach
Hardware/software co-design:
- Using software to control the operation of DLIC, to reduce the complexity of the hardware design
- Using customized hardware design, to reduce the area and power consumption overhead

Software Design
[Figure: control flow with MUX and inner loops]
Hardware Design: Hierarchical Cache Table
[Figure, panels (a) to (c): instruction format (opcode, operand); decoded instruction word format (opcode control word, operand); DLIC Index Table (flag, operand, c_index); Control Word Dictionary Table (control word); Branch Cache Target Table (dlic_index, branch cache target address, branch memory target address)]

DLIC Overall Architecture

Experimental Setup
[Figure: HW/SW co-design flow. ISA (PISA), application, ASIPmeister, SimpleScalar GCC, object code, I-cache and memory, CACTI, VHDL (Syn.), VHDL (Sim.), Synopsys Design Compiler, DLIC, ModelSim, execution trace; HW evaluation: area, energy, delay; SW evaluation: performance profiling]

Results: Reduction of Instr. fetch and decode
[Figure: DIB vs. DLIC for adpcm, bcnt, blowfish, crc32, des, jpeg, qsort, rawcaudio, rawdaudio, rc4, rijndael, salsa, sha, stringsearch, AVG]

Results: Energy consumption
[Figure: normalized energy consumption for the same benchmarks]
Results: Performance overhead
[Figure: normalized execution cycles, DIB vs. DLIC, for adpcm, bcnt, blowfish, crc32, des, jpeg, qsort, rawcaudio, rawdaudio, rc4, rijndael, salsa, sha, stringsearch, AVG]

Conclusions & Future work
- SBI: reducing instruction data bus switching power
  - Little randomness/correlation can be exploited by existing bus encodings
  - Profiling-based SBI gives more reduction and less overhead
- ROBTIC: reducing instruction cache power
  - A 1-bit tag cache for applications of high spatial/temporal locality
  - Similar performance to a full-tag cache, with power and area reduced by 26% and 31%
- DLIC: reducing instruction fetch/decode power
  - Software-controlled SPM-like structure for decoded instructions
  - 66% (up to 87%) energy saved with a performance overhead of 1.4%
- Potential extensions
  - Consider bus coupling in SBI coding
  - Loop cache design for loops having procedure/function calls
  - A framework that combines these low-power techniques for design automation

Thank you!
More informationCharacteristics. Microprocessor Design & Organisation HCA2102. Unit of Transfer. Location. Memory Hierarchy Diagram
Microprocessor Design & Organisation HCA2102 Cache Memory Characteristics Location Unit of transfer Access method Performance Physical type Physical Characteristics UTM-RHH Slide Set 5 2 Location Internal
More informationMath 230 Assembly Programming (AKA Computer Organization) Spring MIPS Intro
Math 230 Assembly Programming (AKA Computer Organization) Spring 2008 MIPS Intro Adapted from slides developed for: Mary J. Irwin PSU CSE331 Dave Patterson s UCB CS152 M230 L09.1 Smith Spring 2008 MIPS
More informationInitial Representation Finite State Diagram. Logic Representation Logic Equations
Control Implementation Alternatives Control may be designed using one of several initial representations. The choice of sequence control, and how logic is represented, can then be determined independently;
More informationPipelining. CSC Friday, November 6, 2015
Pipelining CSC 211.01 Friday, November 6, 2015 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory register file ALU data memory register file Not
More informationCS 251, Winter 2018, Assignment % of course mark
CS 251, Winter 2018, Assignment 5.0.4 3% of course mark Due Wednesday, March 21st, 4:30PM Lates accepted until 10:00am March 22nd with a 15% penalty 1. (10 points) The code sequence below executes on a
More informationCS146 Computer Architecture. Fall Midterm Exam
CS146 Computer Architecture Fall 2002 Midterm Exam This exam is worth a total of 100 points. Note the point breakdown below and budget your time wisely. To maximize partial credit, show your work and state
More informationRISC & Superscalar. COMP 212 Computer Organization & Architecture. COMP 212 Fall Lecture 12. Instruction Pipeline no hazard.
COMP 212 Computer Organization & Architecture Pipeline Re-Cap Pipeline is ILP -Instruction Level Parallelism COMP 212 Fall 2008 Lecture 12 RISC & Superscalar Divide instruction cycles into stages, overlapped
More informationLecture 10: Simple Data Path
Lecture 10: Simple Data Path Course so far Performance comparisons Amdahl s law ISA function & principles What do bits mean? Computer math Today Take QUIZ 6 over P&H.1-, before 11:59pm today How do computers
More informationWilliam Stallings Computer Organization and Architecture 8th Edition. Cache Memory
William Stallings Computer Organization and Architecture 8th Edition Chapter 4 Cache Memory Characteristics Location Capacity Unit of transfer Access method Performance Physical type Physical characteristics
More informationImproving the Energy and Execution Efficiency of a Small Instruction Cache by Using an Instruction Register File
Improving the Energy and Execution Efficiency of a Small Instruction Cache by Using an Instruction Register File Stephen Hines, Gary Tyson, David Whalley Computer Science Dept. Florida State University
More informationMath 230 Assembly Programming (AKA Computer Organization) Spring 2008
Math 230 Assembly Programming (AKA Computer Organization) Spring 2008 MIPS Intro II Lect 10 Feb 15, 2008 Adapted from slides developed for: Mary J. Irwin PSU CSE331 Dave Patterson s UCB CS152 M230 L10.1
More informationA Low Energy Clustered Instruction Memory Hierarchy for Long Instruction Word Processors
A Low Energy Clustered Instruction Memory Hierarchy for Long Instruction Word Processors Murali Jayapala 1, Francisco Barat 1, Pieter Op de Beeck 1, Francky Catthoor 2, Geert Deconinck 1 and Henk Corporaal
More informationComputer Organization
Computer Organization! Computer design as an application of digital logic design procedures! Computer = processing unit + memory system! Processing unit = control + datapath! Control = finite state machine
More informationProject 2: Pipelining (10%) Purpose. Pipelines. ENEE 646: Digital Computer Design, Fall 2017 Assigned: Wednesday, Sep 6; Due: Tuesday, Oct 3
Project 2: Pipelining (10%) ENEE 646: Digital Computer Design, Fall 2017 Assigned: Wednesday, Sep 6; Due: Tuesday, Oct 3 Purpose This project is intended to help you understand in detail how a pipelined
More information