Tracing mfence White Paper


Doug Deao, Texas Instruments
Texas Instruments. All rights reserved.

Document History

Revision  Modifications
0.4       Added Alert Appendix to the end of the document. This section provides guidance for dealing with mfence instruction alerts with regard to trace.
0.4       Updated the Trace Triggers Required section to include an example of setting up workaround properties for Event Trace.
0.5       Added a section with instructions for setting up the Trace Job workaround with AETLib.
0.5       Updated for CCS 5.x and later.

The Issue

Trace data generation for the mfence instruction, added to Keystone devices, is incorrect. The mfence instruction stalls the instruction pipeline until all outstanding CPU-triggered memory transactions have completed. To determine whether they are complete, the instruction checks an internal busy flag. The mfence instruction always waits at least 5 clock cycles before checking the busy flag in order to account for pipeline delays. While an mfence operation is executing, any enabled interrupts are still serviced.

While the mfence is waiting on the busy flag, the Trace PC stream continues to advance, incorrectly indicating that the instruction pipeline is advancing. This causes any branch data between the mfence and the next trace sync point to be reconstructed incorrectly in the Trace Viewer or by TD.EXE, and can cause bad Trace Status column messages.

Workaround Overview

The workaround requires three components:

1. CCS v5.x (and earlier CCS releases) must be updated with Emupack or later.
2. In your code, every mfence instruction must be followed immediately by a nop and a mark instruction.
3. An additional trace trigger is required to Don't Sample PC on a Mark.

The Don't Sample PC on Mark trigger causes a new sync point in the trace stream. The Emupack update causes all cycles between the mfence and the new sync point to be associated with the mfence instruction, rather than being attributed in error to the instructions after the mfence. The following sections provide details on implementing the workaround.

Validation Discussion

Validation of the workaround used the TSCL counter to confirm the trace timing data. Validation confirmed that all interrupts and branches that occur after the new sync point behave correctly. Validation also included generating an interrupt during the first cycle of an mfence, which behaved as expected: the return from interrupt went back to the mfence.

We also tested the case where the interrupt occurred immediately after the mfence. In this case the interrupt returned to the nop instruction after the mfence as expected, but the trace timing data was one cycle less than the TSCL count. There are potential boundary-condition cases caused by interrupts during an mfence instruction that our testing may not have covered. We encourage you to check your specific mfence trace cases and confirm proper return from any interrupts that occurred during the mfence instruction.
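The TSCL comparison described above can be sketched roughly as follows. This is a minimal illustration, not the validation code used for this paper. It assumes the TI C6000 compiler (detected through its predefined _TMS320C6X macro) and the TSCL declaration from c6x.h. The names START_TSCL, READ_TSCL, TRACED_MFENCE, cycles_between, and measure_mfence_cycles are hypothetical, and the #else branch holds host stand-ins so the sketch also builds off-target.

```c
#include <stdint.h>

#ifdef _TMS320C6X
#include <c6x.h>                      /* TI header declaring TSCL */
#define START_TSCL()     (TSCL = 0)   /* any write starts the counter */
#define READ_TSCL()      ((uint32_t)TSCL)
#define TRACED_MFENCE()  asm("\tmfence\n\tnop\n\tmark 0")
#else
/* Host stand-ins (hypothetical) so the sketch builds off-target. */
static uint32_t fake_tscl;
#define START_TSCL()     (fake_tscl = 0u)
#define READ_TSCL()      (++fake_tscl)
#define TRACED_MFENCE()  ((void)0)
#endif

/* Elapsed cycles between two TSCL reads. Unsigned subtraction stays
   correct across a single 32-bit wraparound of the counter. */
uint32_t cycles_between(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Bracket one traced mfence with TSCL reads. The result includes the
   overhead of the reads themselves, so treat it as an approximation
   when comparing it with the trace timing for the same span. */
uint32_t measure_mfence_cycles(void)
{
    uint32_t start, end;

    START_TSCL();
    start = READ_TSCL();
    TRACED_MFENCE();
    end = READ_TSCL();
    return cycles_between(start, end);
}
```

On the target, the value returned by measure_mfence_cycles() would be compared with the cycle count that the trace display reports for the mfence sample.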

Code Changes Required

Every occurrence of the mfence instruction must be followed by a nop and a mark instruction. Methods to include the nop/mark code:

1. For C code, use the preprocessor to update the code:

    #define _mfence() asm("\tmfence\n\tnop\n\tmark 0")

   Note that we do not recommend using the compiler _mfence() and _mark() intrinsics in this case, because the compiler can schedule code between the intrinsics.

2. For assembly:

    mfence
    nop
    mark 0
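If mark 0 is already used for another trace purpose, the same three-instruction sequence can be built for a different mark number. The sketch below is one possible way to do that with preprocessor stringification. The macro names MFENCE_SEQ_TEXT and TRACED_MFENCE_MARK are hypothetical, and the sketch assumes the compiler accepts an asm() argument formed from adjacent string literals, which should be checked against your compiler version. The argument must be a literal digit, because the # operator does not expand macros.

```c
#include <string.h>

/* Build the mfence/nop/mark sequence as a single string literal.
   n must be a literal mark number (0 to 3), not another macro. */
#define MFENCE_SEQ_TEXT(n)  "\tmfence\n\tnop\n\tmark " #n

#ifdef _TMS320C6X
/* On the target, emit the sequence as one asm statement so that the
   compiler cannot schedule code between the three instructions. */
#define TRACED_MFENCE_MARK(n)  asm(MFENCE_SEQ_TEXT(n))
#else
/* Host stand-in (hypothetical) so the sketch builds off-target. */
#define TRACED_MFENCE_MARK(n)  ((void)0)
#endif

/* Returns nonzero when the generated text for mark 1 matches the
   hand-written sequence; usable as an off-target sanity check. */
int mfence_seq_text_is_consistent(void)
{
    return strcmp(MFENCE_SEQ_TEXT(1), "\tmfence\n\tnop\n\tmark 1") == 0;
}
```

A call such as TRACED_MFENCE_MARK(1) would then replace each mfence in the C source, with the trace trigger set to the matching mark instruction.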

Trace Triggers Required

A Don't Store Sample trace trigger with the following properties must be enabled, along with your normal trace triggers. For PC Trace use cases, CCS v5.4 and later provide a predefined Workaround (Don't Store Sample on Mark 0) trace trigger automatically. The Workaround trigger is not added automatically for Custom Core Trace use cases and must be added by the user. Also, for the Custom Core Trace use case, the Don't Store Sample triggers differ between Standard Trace and Event Trace.

The following shows the properties for a Standard Trace Don't Store Sample trace trigger:

[Figure: Standard Trace Don't Store Sample trigger properties]

The following shows the properties for an Event Trace Don't Store Sample trace trigger:

[Figure: Event Trace Don't Store Sample trigger properties]

Note that you must select the specific mark instruction used in your code for this purpose. If you are using the mark 0 instruction for some other trace purpose, then for this workaround you must use one of the other mark instructions (mark 1, 2, or 3) in your code and in the Don't Store Sample trigger to avoid conflicts.

If using AETLib, add the following to your code:

    /* Set up AET trigger for the Trace workaround */
    AET_jobParams MarkTraceParams;

    MarkTraceParams = AET_JOBPARAMS; /* Initialize Job Parameter Structure */
    MarkTraceParams.eventNumber[0] = AET_EVT_MISC_MARK_INS_0;
    MarkTraceParams.triggerType = AET_TRIG_TRACE_PCSUSPEND;

    /* Set up the desired job; a nonzero return value is an error code */
    if ((err = AET_setupJob(AET_JOB_TRIG_ON_EVENTS, &MarkTraceParams))) {
        printf("error setting up AET resources for mark job [error = 0x%X]\n", err);
        return err;
    }

How do I know the workaround is functional?

At cycle 802 (the MARK 0 sample) the Trace Status column contains Pc_Off Timing_On, indicating that the PC stream has been turned off on that cycle, which causes the new sync point. At cycle 803 the Trace Status column contains Pc_On Timing_On, indicating that the PC data stream has been turned back on. In this case the number of cycles read from the TSCL was 49, which is the same as the highlighted trace timing. Any additional post-processing you do with the data will work as normal.
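If you post-process exported trace data, the Pc_Off/Pc_On pattern described above can also be checked in code. The following is only a sketch under stated assumptions: struct trace_row and find_mark_sync are hypothetical names, and the in-memory row layout is an illustration rather than the format produced by the Trace Viewer or TD.EXE. Only the two status strings are taken from the text above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of one exported trace row. */
struct trace_row {
    unsigned long cycle;    /* cycle number shown for the sample */
    const char   *status;   /* text of the Trace Status column */
};

/* Return the index of the first row whose status contains
   "Pc_Off Timing_On" and whose next row contains "Pc_On Timing_On",
   or -1 if no such adjacent pair exists. */
long find_mark_sync(const struct trace_row *rows, size_t count)
{
    size_t i;

    for (i = 0; i + 1 < count; ++i) {
        if (strstr(rows[i].status, "Pc_Off Timing_On") != NULL &&
            strstr(rows[i + 1].status, "Pc_On Timing_On") != NULL) {
            return (long)i;
        }
    }
    return -1;
}
```

For the capture described above, the returned index would point at the cycle 802 row. A result of -1 would suggest that the Don't Store Sample trigger did not fire for that mfence.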

Mfence Alert Appendix

Single Issue: This alert addresses an issue with a store instruction that directly precedes an mfence instruction. The solution requires two mfence instructions back-to-back after the store instruction. For the trace workaround to function properly, both mfence instructions must be followed immediately by a nop and a mark instruction.

Change:

    STORE_A
    MFENCE
    MFENCE
    TRANSACTION_B

To:

    STORE_A
    MFENCE
    NOP
    MARK 0
    MFENCE
    NOP
    MARK 0
    TRANSACTION_B

End of Document


More information

Lecture: Static ILP, Branch Prediction

Lecture: Static ILP, Branch Prediction Lecture: Static ILP, Branch Prediction Topics: compiler-based ILP extraction, branch prediction, bimodal/global/local/tournament predictors (Section 3.3, notes on class webpage) 1 Problem 1 Use predication

More information

Visual Profiler. User Guide

Visual Profiler. User Guide Visual Profiler User Guide Version 3.0 Document No. 06-RM-1136 Revision: 4.B February 2008 Visual Profiler User Guide Table of contents Table of contents 1 Introduction................................................

More information

Performance analysis tools: Intel VTuneTM Amplifier and Advisor. Dr. Luigi Iapichino

Performance analysis tools: Intel VTuneTM Amplifier and Advisor. Dr. Luigi Iapichino Performance analysis tools: Intel VTuneTM Amplifier and Advisor Dr. Luigi Iapichino luigi.iapichino@lrz.de Which tool do I use in my project? A roadmap to optimisation After having considered the MPI layer,

More information

Unresolved data hazards. CS2504, Spring'2007 Dimitris Nikolopoulos

Unresolved data hazards. CS2504, Spring'2007 Dimitris Nikolopoulos Unresolved data hazards 81 Unresolved data hazards Arithmetic instructions following a load, and reading the register updated by the load: if (ID/EX.MemRead and ((ID/EX.RegisterRt = IF/ID.RegisterRs) or

More information

XDS560 Trace. Advanced Use Cases for Profiling. Daniel Rinkes Texas Instruments

XDS560 Trace. Advanced Use Cases for Profiling. Daniel Rinkes Texas Instruments XDS560 Trace Advanced Use Cases for Profiling Daniel Rinkes Texas Instruments Agenda AET / XDS560Trace Overview Interrupt Profiling Statistical Profiling Thread Aware Profiling Thread Aware Dynamic Call

More information

Lecture 2: Pipelining Basics. Today: chapter 1 wrap-up, basic pipelining implementation (Sections A.1 - A.4)

Lecture 2: Pipelining Basics. Today: chapter 1 wrap-up, basic pipelining implementation (Sections A.1 - A.4) Lecture 2: Pipelining Basics Today: chapter 1 wrap-up, basic pipelining implementation (Sections A.1 - A.4) 1 Defining Fault, Error, and Failure A fault produces a latent error; it becomes effective when

More information

CY7C Errata Revision: *A. June 25, 2004 Errata Document for CY7C Part Numbers Affected. CY7C67200 Qualification Status

CY7C Errata Revision: *A. June 25, 2004 Errata Document for CY7C Part Numbers Affected. CY7C67200 Qualification Status Errata Revision: *A June 25, 2004 for This document describes the errata for the. Details include errata trigger conditions, available workarounds, and silicon revision applicability. This document should

More information

MCUXpresso IDE Instruction Trace Guide. Rev May, 2018 User guide

MCUXpresso IDE Instruction Trace Guide. Rev May, 2018 User guide MCUXpresso IDE Instruction Trace Guide User guide 14 May, 2018 Copyright 2018 NXP Semiconductors All rights reserved. ii 1. Trace Overview... 1 1.1. Instruction Trace Overview... 1 1.1.1. Supported Targets...

More information

TMS320C674x/OMAP-L1x Processor General-Purpose Input/Output (GPIO) User's Guide

TMS320C674x/OMAP-L1x Processor General-Purpose Input/Output (GPIO) User's Guide TMS320C674x/OMAP-L1x Processor General-Purpose Input/Output (GPIO) User's Guide Literature Number: SPRUFL8B June 2010 2 Preface... 7 1 Introduction... 9 1.1 Purpose of the Peripheral... 9 1.2 Features...

More information

Module 2: Computer-System Structures. Computer-System Architecture

Module 2: Computer-System Structures. Computer-System Architecture Module 2: Computer-System Structures Computer-System Operation I/O Structure Storage Structure Storage Hierarchy Hardware Protection General System Architecture Operating System Concepts 2.1 Silberschatz

More information

UNIT- 5. Chapter 12 Processor Structure and Function

UNIT- 5. Chapter 12 Processor Structure and Function UNIT- 5 Chapter 12 Processor Structure and Function CPU Structure CPU must: Fetch instructions Interpret instructions Fetch data Process data Write data CPU With Systems Bus CPU Internal Structure Registers

More information

The Processor: Instruction-Level Parallelism

The Processor: Instruction-Level Parallelism The Processor: Instruction-Level Parallelism Computer Organization Architectures for Embedded Computing Tuesday 21 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy

More information

Debugging Guide. Mixed expressions that produce non desired results are used. Example (both a, b are real):

Debugging Guide. Mixed expressions that produce non desired results are used. Example (both a, b are real): 1 Debugging Guide The following is a compilation of common, trivial but invisible errors that may have catastrophic results. The erroneous action is marked with red. Common Errors: Precision, Expressions

More information

The CPU Pipeline. MIPS R4000 Microprocessor User's Manual 43

The CPU Pipeline. MIPS R4000 Microprocessor User's Manual 43 The CPU Pipeline 3 This chapter describes the basic operation of the CPU pipeline, which includes descriptions of the delay instructions (instructions that follow a branch or load instruction in the pipeline),

More information

Pipelining and Vector Processing

Pipelining and Vector Processing Chapter 8 Pipelining and Vector Processing 8 1 If the pipeline stages are heterogeneous, the slowest stage determines the flow rate of the entire pipeline. This leads to other stages idling. 8 2 Pipeline

More information

ECE 505 Computer Architecture

ECE 505 Computer Architecture ECE 505 Computer Architecture Pipelining 2 Berk Sunar and Thomas Eisenbarth Review 5 stages of RISC IF ID EX MEM WB Ideal speedup of pipelining = Pipeline depth (N) Practically Implementation problems

More information

What is Pipelining? RISC remainder (our assumptions)

What is Pipelining? RISC remainder (our assumptions) What is Pipelining? Is a key implementation techniques used to make fast CPUs Is an implementation techniques whereby multiple instructions are overlapped in execution It takes advantage of parallelism

More information

Chapter 4 The Processor (Part 4)

Chapter 4 The Processor (Part 4) Department of Electr rical Eng ineering, Chapter 4 The Processor (Part 4) 王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw ncku edu Depar rtment of Electr rical Engineering, Feng-Chia Unive ersity Outline

More information

COSC 6385 Computer Architecture - Memory Hierarchy Design (III)

COSC 6385 Computer Architecture - Memory Hierarchy Design (III) COSC 6385 Computer Architecture - Memory Hierarchy Design (III) Fall 2006 Reducing cache miss penalty Five techniques Multilevel caches Critical word first and early restart Giving priority to read misses

More information

Chapter 3. Pipelining. EE511 In-Cheol Park, KAIST

Chapter 3. Pipelining. EE511 In-Cheol Park, KAIST Chapter 3. Pipelining EE511 In-Cheol Park, KAIST Terminology Pipeline stage Throughput Pipeline register Ideal speedup Assume The stages are perfectly balanced No overhead on pipeline registers Speedup

More information