CEC 450 Real-Time Systems


Slide 1: CEC 450 Real-Time Systems
Lecture 6 - Accounting for I/O Latency
September 28, 2015
Sam Siewert

Slide 2: A Service Release and Response
Response Time = Time of Actuation - Time Sensed (from release to response), for a service S_i with WCET C_i, including input/output latency and interference time.
[Timeline figure: Event Sensed -> Interrupt -> Dispatch -> Preemption -> Dispatch -> Completion (I/O Queued) -> Actuation (I/O Completion), with segments for input latency, dispatch latency, execution, interference time, execution, and output latency.]

Slide 3: Service Response Timeline (With Intermediate Blocking)
Response Time = Time of Actuation - Time Sensed (from event to response).
[Timeline figures: the upper timeline repeats the slide-2 sequence (Event Sensed -> Interrupt -> Dispatch -> Preemption -> Dispatch -> Completion (I/O Queued) -> Actuation (I/O Completion)) with input latency, dispatch latency, execution, interference, execution, and output latency; the lower timeline adds a Yield and a BLOCKING segment in the middle of execution.]

Slide 4: Service Response Timeline - WCET (Intermediate I/O Impacts BCET to Cause WCET)
Response Time = Time of Actuation - Time Sensed (from event to response).
[Timeline figures: the upper timeline is the unblocked case; the lower timeline inserts MMIO stall cycles in the middle of execution, lengthening the execution segments and pushing the best-case execution time toward the worst case.]

Slide 5: Services and I/O
- Initial input I/O from sensors
  - Low-rate input: byte or word input from memory-mapped registers or I/O ports
  - High-rate input: block input from DMA channels and block reads from FIFOs
- Final response I/O to actuators
  - Low-rate output: byte or word writes posted to an output FIFO or bus interface queue
  - High-rate output: block output on DMA channels and block writes
- Intermediate I/O during service execution
  - Memory-mapped register I/O
  - External memory I/O
  - Cache loads and write-backs
  - In-memory input/output from service to service

Slide 6: Service Integration Concepts
- Initial sensor input and final actuator response output
- Device interface drivers
- A simple service has no intermediate I/O latency and no synchronization
[Block diagram: Environment -> Sensors -> Sensor Electronics -> Sensor Input Interface -> Digital Control Law Service -> Actuator Output Interface -> Actuator Electronics -> Effectors -> Environment]
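The simple service above maps naturally onto a short control loop. Below is a minimal sketch, assuming hypothetical memory-mapped sensor and actuator register addresses and a placeholder control law (none of these names or addresses come from the slides):

```c
/* Minimal sketch of the slide-6 simple service: read a sensor, run a
 * control law, write an actuator - no intermediate I/O, no synchronization.
 * The register addresses and the gain are hypothetical. */
#include <stdint.h>

#define SENSOR_REG   ((volatile uint32_t *)0x40000000u)  /* hypothetical MMIO input  */
#define ACTUATOR_REG ((volatile uint32_t *)0x40000004u)  /* hypothetical MMIO output */

static uint32_t control_law(uint32_t sensed)
{
    /* placeholder digital control law, e.g. a simple proportional term */
    return sensed / 2u;
}

void simple_service_release(void)
{
    uint32_t sensed = *SENSOR_REG;          /* input latency: word read from MMIO */
    uint32_t cmd    = control_law(sensed);  /* execution: no intermediate I/O     */
    *ACTUATOR_REG   = cmd;                  /* output latency: word write posted  */
}
```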

Slide 7: Complex Multi-Service Systems
- Multiple services
- Synchronization between services
- Communication between services
- Multiple sensor input and actuator output interfaces
- Intermediate I/O, shared memory, messaging

Slide 8: Multi-Service Pipelines
[Pipeline diagram: a Bt878 PCI NTSC decoder (30 Hz) raises a framerdy interrupt; the control plane handles RT management and event releases / service control; the data plane pipelines tbtvid (30 Hz, RGB32 buffer) -> tfrmdisp (10 Hz, grayscale buffer) -> tfrmlnk (2 Hz, frame) -> tnet (2 Hz, TCP/IP packets) to a remote display over ISA Ethernet; additional services topnav (15 Hz), tthrustctl (15 Hz), ttlmlnk (5 Hz), and tcamctl (3 Hz, Serial A/B) run across P0-P3 and the SW/HW boundary.]

Slide 9: Pipelined Architecture Review
- Recall that a pipeline yields a CPI of 1 or less: an instruction is completed each CPU clock, unless the pipeline stalls!
[Pipeline diagram: successive instructions overlapping through IF -> ID -> Execute -> Write-Back stages.]

Slide 10: Service Execution
WCET = [(CPI_best-case x Longest_Path_Inst_Count) + Stall_Cycles] x Clk_Period
- Efficiency of execution
- Memory access for inter-service communication
- Bounded intermediate I/O
- Ideally lock data and code into cache, or use tightly coupled memory (scratch pad, zero wait state)
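As a worked illustration of the slide-10 formula, the small helper below computes a WCET estimate from a best-case CPI, a longest-path instruction count, stall cycles, and a clock period; the numeric inputs are hypothetical, not values from the course:

```c
/* Sketch of the slide-10 WCET estimate: best-case cycles from CPI and the
 * longest-path instruction count, plus stall cycles, scaled by the clock
 * period. Inputs would come from profiling or static analysis. */
#include <stdio.h>

static double wcet_seconds(double cpi_best, double longest_path_inst_count,
                           double stall_cycles, double clk_period_sec)
{
    return ((cpi_best * longest_path_inst_count) + stall_cycles) * clk_period_sec;
}

int main(void)
{
    /* hypothetical numbers: 1.2 CPI, 50,000 instructions, 5,000 stalls, 1 GHz clock */
    double wcet = wcet_seconds(1.2, 50000.0, 5000.0, 1.0e-9);
    printf("WCET estimate = %.1f microseconds\n", wcet * 1.0e6);
    return 0;
}
```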

Slide 11: Service Efficiency
- Path length for a release
  - Instruction count of the longest path given the algorithm
  - Branches, loop iterations - data driven?
- Path execution
  - Number of cycles to complete the path
  - Clocks per instruction (CPI)
  - Number of stall cycles for the pipeline:
    - Data dependencies (intermediate I/O)
    - Cache misses (intermediate memory access)
    - Branch mis-predictions (small penalty)

Slide 12: Hiding Intermediate I/O Latency (Overlapping CPU and Bus I/O)
- ICT = Instruction Count Time: time to execute a block of instructions with no stalls (CPU cycles x CPU clock period)
- IOT = Bus Interface I/O Time: bus I/O cycles x bus clock period
- OR = Overlap Required: percentage of CPU cycles that must be concurrent with I/O cycles
- NOA = Non-Overlap Allowable for S_i to meet D_i: percentage of CPU cycles that can be in addition to the I/O cycle time without missing the service deadline
- D_i = deadline for service S_i, relative to the release interrupt or system call that initiates the S_i request and S_i execution
- CPI = Clocks Per Instruction for a block of instructions; IPC (Instructions Per Clock) is also used - just the inverse (superscalar, pipelined)

Slide 13: Processing and I/O Time Overlap
- D_i >= IOT is required; otherwise, if D_i < IOT, S_i is I/O-bound
- D_i >= ICT is required; otherwise, if D_i < ICT, S_i is CPU-bound
- D_i >= (IOT + ICT) requires no overlap of IOT with ICT
- If D_i < (IOT + ICT) where D_i >= IOT and D_i >= ICT, overlap of IOT with ICT is required
- If D_i < (IOT + ICT) where D_i < IOT or D_i < ICT, deadline D_i can't be met regardless of overlap

Slide 14: Service Execution Efficiency - Waiting on I/O from MMIO, Bus, and Memory
- CPI_worst-case = (ICT + IOT) / ICT
- CPI_best-case = max(ICT, IOT) / ICT
- CPI_required = D_i / ICT
- OR = 1 - [(D_i - IOT) / ICT]
- CPI_required = [ICT x (1 - OR) + IOT] / ICT
- NOA = (D_i - IOT) / ICT; OR + NOA = 1 (by definition)
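The feasibility tests on slide 13 and the OR/NOA formulas above combine into one small calculation. The sketch below assumes D_i, ICT, and IOT are already known in the same time units; the example values are made up:

```c
/* Sketch of the slide-13/14 overlap math: classify the service against its
 * deadline and compute the required overlap (OR) and allowable non-overlap (NOA). */
#include <stdio.h>

static void overlap_analysis(double Di, double ICT, double IOT)
{
    if (Di < IOT) { printf("I/O-bound: D_i < IOT, deadline cannot be met\n"); return; }
    if (Di < ICT) { printf("CPU-bound: D_i < ICT, deadline cannot be met\n"); return; }

    if (Di >= ICT + IOT) {
        printf("No overlap of I/O with execution required\n");
        return;
    }

    double OR  = 1.0 - (Di - IOT) / ICT;   /* fraction of CPU cycles that must overlap I/O */
    double NOA = (Di - IOT) / ICT;         /* fraction that may add to the I/O cycle time  */
    printf("Overlap required: OR = %.2f, NOA = %.2f (OR + NOA = 1)\n", OR, NOA);
}

int main(void)
{
    overlap_analysis(10.0, 6.0, 7.0);   /* hypothetical: D_i = 10 ms, ICT = 6 ms, IOT = 7 ms */
    return 0;
}
```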

Slide 15: Synchronization and Message Passing
- Message queues provide communication and synchronization
- Traditional message queue: a fixed message size leads to internal fragmentation
- Heap message queue: messages are pointers, and the message data lives in the heap
[Diagram: with a traditional queue, messages pass from S1 to S2 through the queue itself; with a heap queue, S1 allocates a message from a heap buffer pool and enqueues only a pointer, and S2 de-allocates the message after use - an advertised advantage of the ENEA OSE message-passing RTOS.]
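A minimal sketch of the heap message-passing idea, shown here with a POSIX message queue used between threads of a single process (a raw pointer is only meaningful inside one address space); the queue name, message layout, and frame fields are illustrative, and this is not the ENEA OSE API:

```c
/* Heap message passing sketch: the queue carries only a pointer, the message
 * body lives in the heap, and the receiver frees it after use. */
#include <fcntl.h>
#include <mqueue.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct { int frame_id; unsigned char pixels[64]; } msg_t;

int main(void)
{
    struct mq_attr attr = { .mq_maxmsg = 8, .mq_msgsize = sizeof(msg_t *) };
    mqd_t q = mq_open("/heap_msgq", O_CREAT | O_RDWR, 0600, &attr);
    if (q == (mqd_t)-1) { perror("mq_open"); return 1; }

    /* "S1": allocate the message in the heap, enqueue only the pointer */
    msg_t *m = malloc(sizeof(*m));
    if (!m) { perror("malloc"); return 1; }
    m->frame_id = 42;
    memset(m->pixels, 0, sizeof(m->pixels));
    mq_send(q, (const char *)&m, sizeof(m), 0);

    /* "S2": dequeue the pointer, use the message, then free the heap block */
    msg_t *r = NULL;
    mq_receive(q, (char *)&r, sizeof(r), NULL);
    printf("received frame %d without copying the payload\n", r->frame_id);
    free(r);

    mq_close(q);
    mq_unlink("/heap_msgq");
    return 0;
}
```

On older glibc the mq_* calls live in librt, so linking with -lrt may be needed.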

Slide 16: What About Storage I/O?
- Much higher latency than off-chip memory or bus MMIO: milliseconds of latency compared to GHz core clock rates - roughly a million to one!
- Flash is better, but still slower than SRAM or DRAM
- Option #1: avoid disk drive and flash I/O
- Option #2: pre-buffer storage I/O (read-ahead cache), post write-backs (write-back cache), and keep MFU/MRU (most frequently used / most recently used) data in the read cache
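Option #2 can be approximated from user space with read-ahead hints to the kernel's page cache. A minimal sketch using posix_fadvise(), with an illustrative file name and prefetch size:

```c
/* Sketch of pre-buffering storage I/O: hint the kernel to read ahead so the
 * page cache is warm before the service needs the data. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("frames.raw", O_RDONLY);   /* illustrative file name */
    if (fd < 0) { perror("open"); return 1; }

    /* ask for aggressive sequential read-ahead over the whole file,
     * then request that the first 8 MB be fetched into the read cache now */
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    posix_fadvise(fd, 0, 8 * 1024 * 1024, POSIX_FADV_WILLNEED);

    /* ... reads issued later should hit the page cache instead of the disk ... */
    close(fd);
    return 0;
}
```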

Slide 17: Parallel Processing Speed-up (Makes I/O Latency Worse?)
- Grid data processing speed-up:
  1. Multi-core, multi-threaded, macro-blocks/frames
  2. SIMD, vector instructions operating over large words (many times the instruction-set word size)
  3. Co-processor operating in parallel to the CPU(s): SPMD GPU or GP-GPU co-processor; PCI-Express bus interfaces transfer the program and data to the co-processor; threads and blocks transform data concurrently
- Image data processing has few data dependencies, so it gets good speed-up by Amdahl's Law, less the overhead of co-processor I/O for co-processing
- Amdahl's Law, with P = parallel portion, (1 - P) = sequential portion, S = number of cores (concurrency):
  Max_Speed_Up = 1 / (1 - P)               (S is infinite here)
  Multicore_Speed_Up = 1 / [(1 - P) + P / S]
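The two Amdahl's Law expressions translate directly into code; the sketch below uses illustrative values of P and S:

```c
/* Amdahl's Law helpers: P is the parallel portion, S the number of cores;
 * the maximum speed-up is the limit as S goes to infinity. */
#include <stdio.h>

static double multicore_speedup(double P, double S)
{
    return 1.0 / ((1.0 - P) + P / S);
}

static double max_speedup(double P)
{
    return 1.0 / (1.0 - P);   /* S -> infinity */
}

int main(void)
{
    printf("P = 0.95, 8 cores : %.2fx\n", multicore_speedup(0.95, 8.0));
    printf("P = 0.95, infinite: %.2fx\n", max_speedup(0.95));   /* 20x, as on slide 18 */
    return 0;
}
```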

Slide 18: Amdahl's Law
- With infinite cores, the maximum speed-up is driven by the sequential and parallel portions of the program: P = parallel portion, (1 - P) = sequential portion
- The speed-up for a given multi-core architecture is a function of the number of cores (speed-up applies only to the parallel portion)
[Plot: maximum algorithm speed-up vs. sequential portion (% of computation in sequential vs. parallel execution), for any number of processor cores - all code parallel gives unbounded speed-up, 95% parallel gives 20x speed-up, and no parallel portion (all sequential) gives no speed-up.]

Slide 19: Multi-Core Speed-Up
[Plot: Amdahl's Law speed-up vs. sequential portion of the algorithm for a percent-parallel program, with curves for 2, 4, 8, 12, and 32 cores and for the maximum (infinite-core) speed-up.]

Slide 20: Hiding Storage I/O Latency - Overlapping with Processing
- Simple design: each thread executes READ F(n), PROCESS F(n), WRITE-BACK F(n), then READ F(n+1), ...
- The frame rate is set by the READ + PROCESS + WRITE-BACK latency, e.g. 10 fps for 100 milliseconds per frame
- If READ is 70 msec, PROCESS is 10 msec, and WRITE-BACK is 20 msec, the predominant time is I/O time, not processing
- A disk drive with a 100 MB/sec READ rate can only read 16 fps (62.5 msec READ latency per frame)

Slide 21: Hiding Storage I/O Latency - Schedule Multiple Overlapping Threads?
[Schedule diagram: on each core (Core #1 and Core #2), three threads stagger READ F(n), PROCESS F(n), and WRITE-BACK F(n) so that while one frame is processed, the next is being read and the previous is being written back - continuous processing after start-up.]
- Requires N_threads = N_stages x N_cores
- Use 1.5x to 2x that number of threads for SMT (hyper-threading)
- This balance assumes the I/O stage duration is similar to the processing time; use more threads if the I/O time (READ + WB + READ) >> 3 x processing time
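A minimal pthread sketch of this design, with N_stages x N_cores identical worker threads and placeholder read/process/write-back stages (the stage functions, core count, and frame counts are illustrative):

```c
/* Overlapping-thread sketch: each worker does READ -> PROCESS -> WRITE-BACK
 * per frame, so the scheduler can overlap one thread's blocking I/O with
 * another thread's processing. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define N_CORES   2
#define N_STAGES  3
#define N_THREADS (N_STAGES * N_CORES)
#define N_FRAMES  60

static atomic_int next_frame = 0;

static void read_frame(int f)       { (void)f; /* blocking storage read of frame f  */ }
static void process_frame(int f)    { (void)f; /* CPU-only transform of frame f     */ }
static void write_back_frame(int f) { (void)f; /* blocking storage write of frame f */ }

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        int f = atomic_fetch_add(&next_frame, 1);
        if (f >= N_FRAMES)
            break;
        read_frame(f);        /* while this thread blocks on I/O ...       */
        process_frame(f);     /* ... another thread's PROCESS uses the CPU */
        write_back_frame(f);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[N_THREADS];
    for (int i = 0; i < N_THREADS; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(tid[i], NULL);
    printf("processed %d frames with %d threads\n", N_FRAMES, N_THREADS);
    return 0;
}
```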

Slide 22: Hiding Latency with Dedicated I/O - Schedule Reads Ahead of Processing
[Schedule diagram: a dedicated reader thread reads F1..F8 ahead; two cores process odd and even frames concurrently after an initial wait; a dedicated write-back thread writes F1..F6 back as they complete - start-up, dual-core concurrent processing, completion.]
- Requires N_threads = 2 + N_cores
- Synchronize on frame-ready and write-back completion
- Balance the read/write-back stage latency to the processing time
- Use 1.5x to 2x the number of threads for SMT (hyper-threading)
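A simplified sketch of the dedicated-I/O design: one reader thread fills a small ring of buffers ahead of a consumer that processes and writes back, with counting semaphores providing the frame-ready / slot-free synchronization. The slides show separate write-back threads and two processing cores; this collapses them to keep the pattern visible, and the buffer and frame counts are illustrative:

```c
/* Dedicated-reader sketch: reads run ahead of processing, bounded by the
 * number of pre-allocated ring buffers. */
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define RING_SLOTS 4
#define N_FRAMES   32

static unsigned char ring[RING_SLOTS][64];   /* pre-allocated frame buffers            */
static sem_t frames_ready;                   /* posted by reader, waited by processor  */
static sem_t slots_free;                     /* posted by processor, waited by reader  */

static void read_frame_into(unsigned char *buf, int f)        { (void)buf; (void)f; }
static void process_and_write_back(unsigned char *buf, int f) { (void)buf; (void)f; }

static void *reader(void *arg)
{
    (void)arg;
    for (int f = 0; f < N_FRAMES; f++) {
        sem_wait(&slots_free);                     /* don't run more than RING_SLOTS ahead        */
        read_frame_into(ring[f % RING_SLOTS], f);  /* storage read overlaps earlier-frame compute */
        sem_post(&frames_ready);
    }
    return NULL;
}

static void *processor(void *arg)
{
    (void)arg;
    for (int f = 0; f < N_FRAMES; f++) {
        sem_wait(&frames_ready);                   /* block only if the reader fell behind */
        process_and_write_back(ring[f % RING_SLOTS], f);
        sem_post(&slots_free);
    }
    return NULL;
}

int main(void)
{
    pthread_t rd, pr;
    sem_init(&frames_ready, 0, 0);
    sem_init(&slots_free, 0, RING_SLOTS);
    pthread_create(&rd, NULL, reader, NULL);
    pthread_create(&pr, NULL, processor, NULL);
    pthread_join(rd, NULL);
    pthread_join(pr, NULL);
    printf("done\n");
    return 0;
}
```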

Slide 23: Processing Latency Alone
- Write code with memory-resident frames: load the frames in advance, process the in-memory frames over and over, and do no I/O during processing
- Provides a baseline measurement of processing latency per frame alone
- Provides a method of optimizing processing without I/O latency
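A minimal sketch of the baseline measurement using clock_gettime() on CLOCK_MONOTONIC; transform_frame() stands in for the real per-frame processing, and the frame and iteration counts are illustrative:

```c
/* Processing-latency-alone baseline: frames are memory resident and the
 * transform runs repeatedly with no I/O in the timed loop. */
#include <stdio.h>
#include <time.h>

#define N_FRAMES   30
#define ITERATIONS 1000

static unsigned char frames[N_FRAMES][64];   /* loaded in advance, before timing */

static void transform_frame(unsigned char *f) { (void)f; /* real per-frame processing goes here */ }

int main(void)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ITERATIONS; i++)
        transform_frame(frames[i % N_FRAMES]);   /* in-memory frames, over and over */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("processing latency per frame: %.3f msec\n", 1000.0 * sec / ITERATIONS);
    return 0;
}
```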

Slide 24: I/O Latency Alone
- Comment out the frame transformation code, or call a stubbed NULL function
- Provides a measurement of the I/O frame rate alone
- The transform has essentially zero latency - no change between input frames and output frames
- Allows tuning of the I/O scheduler and threading
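A minimal sketch of the I/O-only measurement with a stubbed (null) transform; the file names and frame size are illustrative, and partial reads simply end the loop here rather than being retried:

```c
/* I/O-latency-alone measurement: read and write frames with a zero-latency
 * stub in place of the transform, and report the resulting frame rate. */
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define FRAME_BYTES (640 * 480)
#define N_FRAMES    100

static void null_transform(unsigned char *f) { (void)f; }   /* zero-latency stub */

int main(void)
{
    static unsigned char frame[FRAME_BYTES];
    int in  = open("input.raw", O_RDONLY);
    int out = open("output.raw", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (in < 0 || out < 0) { perror("open"); return 1; }

    int done = 0;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int f = 0; f < N_FRAMES; f++) {
        if (read(in, frame, FRAME_BYTES) != FRAME_BYTES)    /* I/O in  */
            break;
        null_transform(frame);                              /* no processing latency */
        if (write(out, frame, FRAME_BYTES) != FRAME_BYTES)  /* I/O out */
            break;
        done++;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("I/O-only frame rate: %.1f fps over %d frames\n", done / sec, done);
    close(in);
    close(out);
    return 0;
}
```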

Slide 25: Tips for Linux I/O Scheduling
- blockdev --getra /dev/sda should return 256, meaning reads read ahead up to 128K
- Calls to read() and fread() should request as much as possible; check the actual bytes read and re-read as needed in a loop
- blockdev --setra /dev/sda (8 MB) raises the read-ahead (the value is given in 512-byte sectors)
- Switch CFQ to Deadline:
  - Use lsscsi to verify your disk is /dev/sda; substitute the block device used for your file system if it is not sda
  - cat /sys/block/sda/queue/scheduler
  - echo deadline > /sys/block/sda/queue/scheduler
  - Options are noop, cfq, and deadline
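The "check actual bytes read, re-read as needed in a loop" advice corresponds to a standard full-read loop; the sketch below shows one, where read_full() is a hypothetical helper name rather than anything from the slides:

```c
/* Read exactly 'count' bytes unless EOF or an error occurs; returns the
 * number of bytes actually read, or -1 on error. */
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

ssize_t read_full(int fd, void *buf, size_t count)
{
    size_t done = 0;
    while (done < count) {
        ssize_t n = read(fd, (char *)buf + done, count - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal - retry        */
            return -1;           /* real error                             */
        }
        if (n == 0)
            break;               /* EOF - return the short count           */
        done += (size_t)n;       /* partial read - ask for the remainder   */
    }
    return (ssize_t)done;
}
```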
