Statically Calculating Secondary Thread Performance in ASTI Systems


1 Statically Calculating Secondary Thread Performance in ASTI Systems Siddhartha Shivshankar, Sunil Vangara and Alex Guimarães Dean Center for Embedded Systems Research Department of Electrical and Computer Engineering North Carolina State University 1

2 Overview ASTI: Asynchronous Software Thread Integration Register File Partitioning Experiments Conclusions 2

3 Basic Idea of ASTI Goal: recover fine-grain idle time for use by other threads. Examine the program to find a function f with significant internal idle time; the idle time is imposed by instruction-level timing requirements (e.g. for input and output instructions). If an idle-time piece n is coarse-grain (T_Idle(f,n) >> 2*T_ContextSwitch), then we can recover it efficiently with context switching. If it is fine-grain (T_Idle(f,n) not >> 2*T_ContextSwitch), then apply ASTI (Asynchronous Software Thread Integration). Details of ASTI in LCTES 2004, CASES
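A minimal sketch of this decision rule in C (the function, the names, and the 10x margin standing in for ">>" are illustrative assumptions, not taken from the slides):

    /* Hypothetical helper: decide how to recover idle-time piece n of function f. */
    enum recovery { USE_CONTEXT_SWITCH, USE_ASTI };

    enum recovery choose_recovery(unsigned long t_idle_cycles,
                                  unsigned long t_context_switch_cycles)
    {
        /* Coarse-grain: the fragment dwarfs two context switches, so switching pays off. */
        if (t_idle_cycles >= 10UL * 2UL * t_context_switch_cycles)
            return USE_CONTEXT_SWITCH;
        /* Fine-grain: switching would eat most of the fragment; integrate statically. */
        return USE_ASTI;
    }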

4 ASTI Applied to Communication Protocols [Figure: primary thread (Executive, ReceiveMessage, ReceiveBit, linked by subroutine calls), original secondary thread, and integrated secondary thread. The protocol code prepares the message buffer, reads each bit from the bus 3 times and votes, checks for errors, saves the bit, updates the CRC, and samples the bus for resynchronization, with idle time interspersed. With plain coroutine calls, some idle time is wasted because it is too short for a cocall; with ASTI, only the first and last coroutine calls are needed and even the short idle time is recovered.] 4

5 Protocol Controller Options [Figure: three system structures, each with analog & digital I/O, a system MCU, a physical layer transceiver, and a connection to the communication network: (1) discrete protocol controller with MCU, (2) MCU with on-board protocol controller, and (3) generic MCU with ASTI software protocol controller (with an optional I/O expander).] 5

6 But what about Caches? Deep instruction pipelines? Branch prediction? Superscalar instruction execution? Speculative execution? The reorder buffer? Page faults? Forwarding paths? Load queues? Data prefetching? Predicated execution? Branch delay slots? Instruction prefetching? Store forwarding? R-ops? Dynamic optimization? The phase of the moon? Wind direction? Et cetera, et cetera. 6

7 Register File Partitioning A single register file must support both the primary and secondary threads. There are three ways to use a register: for the primary thread exclusively, for the secondary thread exclusively, or shared between the two and swapped on coroutine calls. The register file may not be homogeneous (pointer/address registers, immediate-operand capable registers, ...), so we need to pick the best partition for each register type. How? 7

8 Primary and Secondary Thread Performance Impact of register file partitioning: more registers for the primary thread means less spilling and filling, so primary code takes fewer cycles and leaves more idle time (more cycles for the secondary thread); but fewer registers for the secondary thread means more spilling and filling, so the secondary thread requires more cycles and its response time rises. More registers for the secondary thread: the similar case in reverse. More registers swapped: both threads require fewer cycles to execute, but the coroutine call takes longer, so more cycles are wasted switching between threads and the coroutine call no longer fits into the shorter idle-time fragments, reducing the cycles available for the secondary thread. How do we find the best register file partitioning? It is too complex to compute everything analytically, so instead we compile and analyze iteratively to perform design space exploration (a sketch of such a loop follows). 8
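A minimal sketch of the exploration loop, assuming hypothetical helpers that stand in for the gcc/Thrint steps shown on the next slide (compile_with_partition(), thrint_worst_case_cycles(), and is_schedulable() are illustrative names, not real tool interfaces):

    #include <limits.h>

    struct partition { int primary_only; int secondary_only; int swapped; };

    /* Stand-ins for the toolchain: rebuild with the given register split, then
       ask Thrint for the segmented/partitioned secondary's worst-case cycles. */
    extern void compile_with_partition(const struct partition *p);
    extern unsigned long thrint_worst_case_cycles(void);   /* T_Sec-Seg-Part */
    extern int is_schedulable(const struct partition *p);  /* cocall + primary I/O deadlines */

    struct partition explore(const struct partition *candidates, int n)
    {
        struct partition best = candidates[0];
        unsigned long best_cycles = ULONG_MAX;
        for (int i = 0; i < n; i++) {
            compile_with_partition(&candidates[i]);
            if (!is_schedulable(&candidates[i]))
                continue;                         /* infeasible partition, skip */
            unsigned long c = thrint_worst_case_cycles();
            if (c < best_cycles) {
                best_cycles = c;
                best = candidates[i];
            }
        }
        return best;
    }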

9 Thrint [Figure: Thrint, the thread-integrating compiler back-end, takes foo.s through control-flow analysis, data-flow analysis, static timing analysis, integration analysis, and integration, producing foo.int.s and foo.id, with interfaces to GProf, XVCG, and GnuPlot.] We have enhanced Thrint to integrate threads using ASTI methods (it previously supported only STI) and to measure the best- and worst-case performance of the secondary thread. 9

10 Iterative Partition Analysis Toolchain [Figure: register file partitioning decisions feed three gcc + Thrint compilation/analysis paths. The primary thread sources (s_m.c, r_m.c, s_b.c, r_b.c) pass through gcc and Thrint's ICTA to yield T_SegmentIdle; the secondary thread alone passes through gcc and Thrint to yield T_Sec, the original performance of the secondary thread; the segmented, partitioned secondary thread passes through gcc and Thrint to yield T_Sec-Seg-Part. Comparing these gives the performance comparison: slowdown vs. a dedicated MCU.] 10

11 Experiments Atmel AVR: an 8-bit load/store architecture for microcontrollers with a 32-entry register file (6 pointer + immediate registers, 10 immediate registers, 16 other). Protocol controllers in C: CAN at 62.5 kbps and MIL-STD-1553 at 1 Mbps. Secondary threads in C: a network-RS232 bridge and a PID controller. Compiled with AVR-GCC, -O3. [Figure: bus bridge MCU running the ASTI software primary thread (J1850) and secondary thread (interface), connected through message queues, digital in/out, and a UART to the system MCU's UART.] 11

12 Performance Evaluation Measure the slowdown of the integrated secondary thread (worst-case execution path) with a partitioned register file, compared with its original full-register-file performance. We need to evaluate and schedule for the worst case to ensure the system always meets its deadlines. How much performance do we give up by partitioning the register file? Not all partitions are schedulable: some leave not enough time for the coroutine call, or not enough time for the primary thread to meet its I/O instruction deadlines. 12
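Using the quantities from the toolchain slide, a plausible reading of the slowdown metric (my assumption; the slides do not spell out the formula) is Slowdown = (T_Sec-Seg-Part - T_Sec) / T_Sec, i.e. the extra worst-case cycles the segmented, partitioned secondary thread needs relative to the same thread running alone with the full register file on a dedicated MCU.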

13 Results I: Average Performance [Chart: average slowdown (0% to 60%) of the Bridge (Host Interface) and PID Controller secondary threads for the 1553Send, 1553Receive, CANSend, and CANReceive primary threads.] Average performance over all feasible partitioning approaches. 13

14 Results II: Best Performance [Chart: best-case slowdown (-0.5% to 2.0%) of the Bridge (Host Interface) and PID Controller secondary threads for 1553Send, 1553Receive, CANSend, and CANReceive.] Taking the best (least slowdown) of all feasible partitionings shows that the AVR register file is adequate to handle the register pressure of both threads, or the idle time is adequate for coroutine calls. 14

15 Detailed Analysis Example Case 1 - primary: 1553 send, secondary: PID controller, varying immediate registers: the secondary is sensitive to the number of immediate registers. Case 2 - primary: CAN send, secondary: RS232-CAN bridge, varying other registers: the cocall must be brief for schedulability; the best result is a 1.5% slowdown at the 10,6 to 14,2 partition with no swapped registers. 15

16 Conclusions and Future Work Conclusions: performance varies significantly for the AVR architecture; the average case is poor, but the best case is close to the non-partitioned register file. Future work: derive and evaluate heuristics to search efficiently through the partitioning design space, and replace the coroutine call with a dispatcher to support multiple secondary threads. 16

17 Questions? Have you applied this to SPEC? No, that's not representative of embedded software-implemented communication protocols. Don't caches break the timing predictability you need? The processors we use run at under 50 MHz, so we don't have a memory wall. Why not use a multithreaded processor? They're too expensive, too rare, and businesses prefer familiar processors. Why not just design an ASIC to do it? Too expensive to get the first one. Why not program an FPGA to do it? Too expensive to get the rest of them. 17

18 Appendices 18

19 Why Network Communication Protocol Controllers? Multiple threads must be able to make progress, even with a fully-loaded bus, and the idle time is very fine grain (under one bit time). Each application domain customizes its protocols: wireless sensor networks tweak the medium access control, etc. for minimal energy use, while automotive optimizes for guaranteed (hard real-time) delivery. This creates a chicken-and-egg problem: a protocol controller chip won't appear until an adequate market is anticipated, chip costs remain high until volumes amortize design costs, and there is a delay until the protocol controller appears as a peripheral on cheap MCUs. MCUs are a good fit for many embedded protocols if concurrency is cheap: 10 to 200 cycles of processing are needed per bit at 1 kbps to 1 Mbps bus speeds, MCUs are temporally predictable, and they are cheap and flexible: 1 MHz for $ MHz for $5-$10 (but you pay in increased energy use and other issues). 19
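As a quick illustration of the cycles-per-bit figure (the 8 MHz clock is my assumption, not from the slide): at 62.5 kbps each bit lasts 16 us, so an 8 MHz MCU has 8,000,000 / 62,500 = 128 cycles per bit, within the 10-200 cycles of processing quoted above; at 1 Mbps the same clock would leave only 8 cycles per bit, so faster clocks are needed as bus speed rises.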

20 Assumptions - Small Embedded Systems Processors: not practical to design a custom processor or to use a fast processor (e.g. raise clock speed by 10x or more); can handle some code explosion (e.g. up to 3x); using a generic microcontroller (e.g. 4, 8 or 16 bit) without memory protection, virtual memory, or caches. Workload: at most a few threads need to make asynchronous progress, others can wait; one hard-real-time thread with tight deadlines, while other threads may have deadlines which are significantly longer; interrupts are delayed or handled with polling servers (CASES 2003); subroutine calls are cloned or inlined (CASES 2004). [Chart: 8 bit 57%, 4 bit 12%, 16 bit 12%, DSP 11%, 32 bit 8%.] 20

21 Control-Flow View of ASTI [Figure: control flow of the primary function with several idle-time regions, alongside the secondary function.] Break the secondary thread into segments, each lasting approximately as long as the total idle time; integrate the intervening primary code into each segment; insert coroutine calls at the start of each idle time and at the end of each segment. 21
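A rough sketch of the resulting shape in C (purely illustrative; every name here is hypothetical):

    /* One integrated secondary segment: the primary cocalls in at the start of an
       idle-time region; intervening primary work is statically placed inside the
       segment so it still executes at the required cycle; at the end of the
       segment, control cocalls back to the primary. */
    extern void secondary_work_part_a(void);
    extern void secondary_work_part_b(void);
    extern void primary_timed_event(void);   /* e.g. a bus sample with a cycle deadline */
    extern void cocall_primary(void);

    void secondary_segment_1(void)
    {
        secondary_work_part_a();   /* fills idle cycles before the primary event */
        primary_timed_event();     /* integrated primary code, placed to meet its deadline */
        secondary_work_part_b();   /* fills the remaining idle cycles */
        cocall_primary();          /* end of segment: hand control back */
    }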

22 Big Picture How do we efficiently allocate N concurrent (potentially real-time) tasks onto fewer than N processors? This spans compilation and scheduling for concurrent/parallel/distributed systems, real-time systems, and hardware/software cosynthesis. The bottlenecks are scheduling each context switch and performing each context switch. We focus on one processor, and that processor is generic (low-cost) with no special features for accelerating the context-switch bottlenecks. Note: threads must be able to make independent (asynchronous) progress. 22

23 Steps in STI: Source Code Preparation Structure the program (C) to accumulate work to perform in the integrated functions; write the functions (C) to be integrated; compile to assembly code, partitioning the register file for the functions to be integrated (gcc's -ffixed option; see the sketch below). 23
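For example, a hedged sketch of what the compile step might look like (the register choices and file names are arbitrary illustrations, not the project's real build commands; -ffixed-<reg> is gcc's option for removing a register from the allocator):

    # Primary thread: keep the registers reserved for the secondary out of reach (illustrative)
    avr-gcc -O3 -S -ffixed-r2 -ffixed-r3 -ffixed-r4 -ffixed-r5 primary.c
    # Secondary thread: likewise exclude the registers held by the primary (illustrative)
    avr-gcc -O3 -S -ffixed-r16 -ffixed-r17 -ffixed-r26 -ffixed-r27 secondary.c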

24 Control Dependence Graph Thread Representation [Figure: example CDG with procedure, code, conditional, and loop nodes; the vertical axis is conditional nesting, the horizontal axis is execution order.] The CDG's hierarchical structure simplifies integration: vertical = conditional nesting, horizontal = execution order, with summary information at each level. Our Thrint back-end compiler operates on CDGs of the host and guest threads: it annotates the host with execution time predictions, moves guest code into the host while enforcing control/data/time dependencies (find a gap, or else descend into the subgraph), and has code transformations to handle conditionals and loops. 24

25 Thrint Overview [Flowchart: parse asm; form CFG/CDG; read integration directives; static timing analysis; node labelling; then either the STI path or the ASTI path; followed by idle time analysis, temporal determinism analysis, data-flow analysis, register virtualization, integration, register reallocation, static timing analysis, timing verification, and code regeneration. STI path: pad timing jitter in the secondary thread; plan integration; pad excess timing jitter; for each guest and each host, do host loop transformations, pad excess timing jitter, clone and insert guest node(s), and if a fused loop, add fused-loop control test code; delete original guests. ASTI path: pad timing jitter in the message-level and bit-level functions; plan integration in the secondary thread; pad jitter in predicate nodes and blocking I/O loops; integrate cocalls within the secondary thread; integrate intervening guest code at appropriate locations; delete original guests.] 25

26 Steps in STI: Analysis and Integration Planning Parse the assembly code to form the CFG and then the CDG. Perform tree-based static timing analysis. Pad away timing variations from conditionals with nops or nop loops. Perform basic data-flow analysis to identify loop-control variables and, where possible, iteration counts. Compare the duration of primary functions with the maximum allowed latency for ISRs and other short-laxity tasks, and create polling servers to handle these as needed. Compare the duration of secondary functions with the amount of idle time in the primary functions, considering the minimum period of the primary function. Break long secondary functions into segments which fit into the primary functions' idle time minus polling servers minus two context switch times; also end segments when reaching a loop with an unknown iteration count. Define target times for regions in primary code which are time-critical. 26
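Expressed as a budget, the segment-sizing rule above reads (symbol names are mine, with T_SegmentIdle taken from the toolchain slide): T_Segment <= T_SegmentIdle - T_PollingServers - 2*T_ContextSwitch, i.e. each secondary segment must fit into the primary function's idle time after subtracting polling-server work and the two context switches that bracket the segment.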

27 Steps in STI: Integration Note: conditionals have been padded away previously. Single primary events: move primary code to execute at the proper times within secondary code, replicate primary code into conditionals, split and peel loops and insert primary code, and guard primary code within a loop to trigger on a given iteration (sketched below). Looping primary events: peel off primary-function loop iterations which don't overlap with secondary loops and integrate them as single primary events; fuse loop iterations which do overlap, fusing the loop control tests; unroll loops to match the idle time in the primary loop to the work in the secondary loop; and create clean-up loops to perform the remaining iterations. Finally, redo static timing analysis and verify correct timing, recreate the assembly file, then compile, link, download and run! 27
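A tiny illustration in C of the "guard primary code within loop" transformation (all names and the trigger iteration are hypothetical):

    /* After integration, the single timed primary event lives inside the secondary
       loop, guarded so it fires on exactly the iteration whose start time matches
       the event's deadline (as chosen by static timing analysis). */
    #define PRIMARY_EVENT_ITERATION 3   /* assumed value for illustration */

    extern void secondary_body(int i);
    extern void primary_timed_event(void);

    void integrated_loop(int n)
    {
        for (int i = 0; i < n; i++) {
            secondary_body(i);
            if (i == PRIMARY_EVENT_ITERATION)
                primary_timed_event();
        }
    }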

28 Protocol Software Structure [Figure: protocol_executive() dispatches between send, idle, and receive; send_message() calls send_bit(), and receive_message() calls receive_bit(). Most idle time is located in these message- and bit-level functions.] 28
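A hypothetical C skeleton matching this structure (the predicate names are made up; only the function names on the slide are from the talk):

    extern int  message_pending(void);     /* hypothetical */
    extern int  bus_has_traffic(void);     /* hypothetical */
    extern void send_message(void);        /* loops over send_bit() per bit */
    extern void receive_message(void);     /* loops over receive_bit() per bit */

    void protocol_executive(void)
    {
        for (;;) {
            if (message_pending())
                send_message();
            else if (bus_has_traffic())
                receive_message();
            /* otherwise: idle until the next bus event */
        }
    }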

29 What about Interrupts? What about Frequent Primaries and Long Secondaries? Interrupts: STI disables interrupts while integrated threads run; in STIGLitz, interrupts were disabled for one field of video ( ms). Frequent primaries and long secondaries: the primary thread needs to run again before the integrated version would finish. Solutions: use polling servers to service each non-deferrable thread (e.g. UART), and break the secondary into segments and integrate the primary in multiple times. [Timing diagram: laxity for the secondary thread (max. latency allowed), worst-case execution time of the integrated thread, minimum primary thread period minus maximum primary thread work, and WCET for the secondary thread.] 29
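One plausible reading of that timing diagram (my interpretation; the slide does not state it explicitly): each primary period contributes at most (minimum primary thread period - maximum primary thread work) cycles to the secondary, so the integrated secondary's WCET is spread over roughly ceil(WCET_Secondary / (T_PrimaryPeriod_min - C_Primary_max)) primary periods, and that many periods must fit within the secondary thread's laxity (its maximum allowed latency).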

30 Detail: Register File Partitioning vs. Performance Problem: STI requires that integrated threads share the register file. Trade-off: code compiled to fit into fewer registers switches contexts faster (the dispatcher switches contexts roughly every 900 cycles, and two context switches for one register take 12 cycles), but code compiled to fit into fewer registers runs slower because more variables must remain in memory. Goal: squeeze pre-integrated threads into as few registers as practical. Method: determine the sensitivity of the host threads' execution time to the number of registers available; divide the AVR registers into three classes: pointer registers (r26-r31), immediate-operand capable registers (r16-r25), and other registers (r0-r15); analyze the DrawSprite, DrawLine, and DrawCircle functions; limit the registers available to the register allocator through gcc's -ffixed option; measure execution time using an on-chip timer/counter. 30

31 Measurements Results DrawLine and DrawCircle are not very sensitive; DrawSprite is very sensitive, with a strange speed-up when excluding one pointer register. Design decisions: for DrawLine and DrawCircle, exclude eight "other" registers and two pointer registers, using 22 registers, so each context switch takes 132 cycles; for DrawSprite, exclude only one "other" register and two pointer registers, using 29 registers, so each context switch takes 174 cycles. [Charts: normalized run time vs. total registers excluded for DrawCircle, DrawLine, and DrawSprite, each with separate curves for immediate, pointer, and other registers.] 31
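These context-switch costs are consistent with the 12-cycles-per-register round trip quoted on the previous slide: at 6 cycles per register per switch, 22 registers give 22 * 6 = 132 cycles and 29 registers give 29 * 6 = 174 cycles per context switch.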

32 To Do Remove intervening code from primary code - animate 32
