Exercise: OpenRISC Programming


1 Exercise: OpenRISC Programming
Increasing efficiency of the OpenRISC core with simple instruction extensions
Michael Gautschi, Antonio Pullini

2 Introduction
All exercises will be performed on the following configuration:
- 4 OpenRISC or10n cores with private instruction cache (I$)
- 8 banks of shared level 1 memory (TCDM)
- Large L2 memory for instructions and data
Since the focus of this exercise is on the core, only one core is used and the others are put to sleep.
[Figure legend: or10n processor cores; not used / sleeping]

3 Exercise Overview
1. Introduction example: compile & execute Helloworld on the Instruction Set Simulator (ISS)
2. Simulator basics: analyze ISS output and debug
3. Benchmarking: analyze performance improvements of the new instructions using the simulator
4. RTL simulations and benchmarking: compare to the simulator
5. Unaligned memory accesses: program a simple stencil and show the benefit of unaligned memory accesses
6. Auto vectorization: analyze the improvements of vector operations on a matrix addition
7. Interrupts and events: compare interrupt and event response times

4 Getting Started 1/2
Copy data from the master account:
$ mkdir 2_OpenRISC
$ cp /home/soc_master/2_openrisc/openrisc.tar.gz 2_OpenRISC/.
$ tar xzf OpenRISC.tar.gz    (in the 2_OpenRISC directory)
Source the instruction set simulator:
$ source /home/soc_master/germain/pulp-sdk ub14.10_2/env/setup.sh
We will be working in the software (sw) and build directories:
- sw-dir: contains the application source code
- build-dir: holds compiler and simulator output; simulator and RTL simulations are run here
- sim-dir: contains precompiled code for RTL simulation

5 Getting Started 2/2
We will be working in /scratch because we are going to generate some data.
1. Create a build directory and set up the compiler:
   $ mkdir /scratch/soc_xx/build_id
2. Start the compiler in a new shell:
   $ or1k v1.7.3 xterm
   $ tcsh    (switch to C shell)
3. Configure the build directory in the new shell:
   $ cd /scratch/soc_xx/build_id
   $ cp /home/soc_xx/2_openrisc/build/configure_template.tcl /scratch/soc_xx/build_id/configure.tcl
   In the configure script, set the path to your exercise folder:
   OPENRISC_DIR= /home/soc_xx/2_openrisc
   $ ./configure.tcl
You have successfully set up the build directory! Let's get started with exercise 1!

6 Exercise 1 Introduction
a) The build directory is created, the compiler is configured, and the simulator is set up. We are ready to start with a simple helloworld application.
b) Compile Helloworld.c
   Helloworld.c is located in sw/apps/sequential_tests/helloworld/. To compile the application, enter the build folder and run the makefile:
   $ cd build
   $ make helloworld            : to generate an executable for the simulator
   $ make helloworld.read       : to generate the assembly
   $ make helloworld.slm.cmd    : to generate input data for RTL simulations
c) Run Helloworld.c
   The simulator is called pulp-run and can be started like this:
   $ pulp-run --load-binary=./apps/sequential_tests/helloworld/helloworld:0xf
   The console should output helloworld. The output is also written to the file stdout/stdout_pe

7 Exercise 2 Simulator basics 1/4
Simulator commands to run an application
- Run an application:
  $ pulp-run --load-binary=./applicationname:0xf
- Get assembly traces:
  $ pulp-run --load-binary=./applicationname:0xf --iss-trace
Enter debug mode
- Force entering the debugger at the beginning:
  $ pulp-run --load-binary=./applicationname:0xf --pdb-break
- Or press Ctrl+C during execution
- Check the current status of the core:
  (Cmd) state    : bootaddress = 0x1c
Breakpoints and inspection of memory (in debug mode)
- Set a breakpoint on memory access:
  (Cmd) bkp address region rw    : address = 32-bit memory address; default region = 0x4 (1 word); rw = read or write access (default rw)
- Display all breakpoints:
  (Cmd) bkp_list
- Remove a breakpoint:
  (Cmd) bkp_dis ID    : ID shown in bkp_list
- Inspect memory:
  (Cmd) mem_dump address size    : 32-bit addresses; default size = 0x4 (1 word)

8 Exercise 2 Simulator basics 2/4
Analyze the rijndael application (AES en/decryption)
- Generate the assembly:
  $ make rijndael.read
- Have a look at the assembly file rijndael.read
- The most important functions are: main, compute_aes, encrypt, encfile, decrypt, decfile
[Figure: annotated assembly listing; labels: function header, PC, instruction in big- and little-endian format, disassembled instructions, absolute and relative jump/branch targets]

9 Exercise 2 Simulator basics 3/4
Simulator traces show:
- PC
- Latency of the instruction
- New register values, effective addresses
- Disassembled instructions
To trace an application:
- Run the application with the option --iss-trace
- Four traces are generated, one for each core, because we are using a four-core configuration of which three cores are sleeping
- trace_cluster0_core0.log shows the interesting traces!

10 Exercise 2 Simulator basics 4/4
Tasks:
- Run the rijndael application with traces and make use of the debugger and breakpoints
- Which instructions take more than one cycle to execute?
- Why are some load/store instructions (l.lwz, l.sw) taking more than one cycle?
- Do you see an example of a data hazard?
- How many times is address 0x1c read?
- What is the encryption key?

11 Exercise 3 Benchmarking 1/5
Intro:
The timer can be used to count execution time (in cycles). Check matrixmul.c for an example.
Include <timer.h> and use the functions:
- reset_timer()
- start_timer()
- stop_timer()
- get_time()
Hardware loops
To run the application with different hardware loop settings, the compiler needs to be reconfigured. There are two options:
1. Create a new build folder with a new configure.tcl script
2. Keep the current folder and modify its configuration script, configure.tcl
To support different numbers of hardware loops, change TARGET_C_FLAGS in configure.tcl:
- remove the option -mno-hwlp to enable hwlps
- add -mmax-hwloops=<number_of_hwlp>, with number_of_hwlp in [0,4]
Reconfigure the build folder:
$ ./configure.tcl
The compiler will generate the following hwloop instructions to produce efficient loops:
- lp.start
- lp.end
- lp.count
- lp.counti
- lp.setup
- lp.setupi
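
The timer functions listed above can be wrapped around the code under test. The following is only a host-side sketch: the real signatures in <timer.h> are not shown on the slide, so the four functions are replaced by stand-ins that take no arguments and return an int, and a hypothetical tick() helper fakes one cycle per inner-loop iteration. On the target, the stand-ins and tick() would be dropped in favor of the real header.

```c
#include <assert.h>

/* Host-side stand-ins for the four functions the slide lists from <timer.h>.
 * ASSUMPTION: none of them takes arguments and get_time() returns the
 * elapsed cycle count as an int; the real signatures are not shown. */
static int timer_running = 0;
static int timer_cycles = 0;

static void reset_timer(void) { timer_cycles = 0; }
static void start_timer(void) { timer_running = 1; }
static void stop_timer(void)  { timer_running = 0; }
static int  get_time(void)    { return timer_cycles; }

/* Hypothetical helper: fakes one elapsed cycle while the timer runs. */
static void tick(void) { if (timer_running) timer_cycles++; }

#define N 4  /* matrix dimension, chosen only for illustration */

/* c = a * b for N x N matrices stored row by row. */
void matmul(const int *a, const int *b, int *c) {
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            int acc = 0;
            for (int k = 0; k < N; k++) {
                acc += a[i * N + k] * b[k * N + j];
                tick();  /* stand-in for time passing; not needed on hardware */
            }
            c[i * N + j] = acc;
        }
    }
}

/* Measurement pattern: reset, start, run the kernel, stop, read. */
int measure_matmul(const int *a, const int *b, int *c) {
    assert(a && b && c);
    reset_timer();
    start_timer();
    matmul(a, b, c);
    stop_timer();
    return get_time();
}
```

Keeping the timer calls in a small wrapper such as measure_matmul() means the kernel itself stays unchanged when it is rebuilt with different compiler options.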

12 Exercise 3 Benchmarking 2/5
Tasks:
- Check if hardware loops are generated (in the matrixmul.read file)
- How many instructions are actually saved? Compare matrixmul.read with and without hwloops.
- Estimate the speedup with 1, 2, and 4 hardware loops
- Do your measurements match your estimations?
- Why is the benefit of the first one higher?
Execution time (# cycles / % improvement), to be filled in for:
- Baseline
- Hardware loop (1 register set)
- 2 register sets
- 3 register sets
- 4 register sets

13 Exercise 3 Benchmarking 3/5
Pre/post increment immediate:
- Activated by default!
- Deactivate with -mno-idxls
Pre/post increment register:
- From a hardware perspective, what is the drawback of this instruction?
- Deactivate with -mno-rrls
Multiply-accumulate instruction:
- Old architecture: the accumulation register is a special register which is not in the register file; the accumulation result can be accessed in two cycles
- New architecture: accumulates directly on the register file
- Enable the new MAC with -mmac3
[Figure panels: Old MAC, New MAC]
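
As an illustration of the multiply-accumulate pattern discussed above, the sketch below shows the kind of C loop in which each iteration is one multiplication added to a running sum. Whether the compiler actually emits a MAC instruction for it depends on the toolchain and flags and is not verified here; dot_product is a name chosen for this example and does not come from the exercise sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Dot product of two vectors of length n.
 * The statement acc += a[i] * b[i] is the multiply-accumulate pattern;
 * keeping acc in a local variable lets it live in a general-purpose register. */
int32_t dot_product(const int32_t *a, const int32_t *b, size_t n) {
    assert(n == 0 || (a != NULL && b != NULL));
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += a[i] * b[i];
    }
    return acc;
}
```

The inner loop of a matrix multiplication has the same shape, so inspecting the generated .read file for this pattern is one way to see whether the MAC option took effect.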

14 Exercise 3 Benchmarking 4/5
Vector instructions:
- Add, sub, mul, mac, and comparisons are all supported in vector mode
- In parallel one can compute: one word, two halfwords, or four bytes
- Enable the vector extensions with the compiler options:
  -mlv32 for vector generation
  -munaligned-ls for unaligned memory accesses
- Check in the .read file if vector code is generated. Vector instructions have the format: lv.{mul,mac,sub,add,...}
Tasks:
- Run the matrixmul application with the different compiler options
- Summarize your results in the first row of the table on the next page
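
To make the "four bytes in parallel" idea concrete, the following sketch emulates a packed byte addition on a 32-bit word in plain C. It is not the or10n instruction itself: the function names are invented for this example, and per-lane wraparound (no saturation) is an assumption, since the slides do not state the overflow behavior of the vector add.

```c
#include <assert.h>
#include <stdint.h>

/* Adds four packed 8-bit lanes of x and y with per-lane wraparound.
 * The low 7 bits of each lane are added first, so no carry can cross a
 * lane boundary; the top bit of each lane is then fixed up with XOR. */
uint32_t add4x8(uint32_t x, uint32_t y) {
    uint32_t low7 = (x & 0x7F7F7F7Fu) + (y & 0x7F7F7F7Fu);
    uint32_t msb  = (x ^ y) & 0x80808080u;
    return low7 ^ msb;
}

/* Reference version: one lane at a time, as scalar code would do it. */
uint32_t add4x8_scalar(uint32_t x, uint32_t y) {
    uint32_t result = 0;
    for (unsigned lane = 0; lane < 4; lane++) {
        unsigned shift = 8u * lane;
        uint32_t sum = ((x >> shift) + (y >> shift)) & 0xFFu;
        result |= sum << shift;
    }
    return result;
}
```

The scalar version needs four additions plus shifting and masking per word, which is the work a single packed instruction would replace.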

15 Exercise 3 Benchmarking 5/5
Execution time (# cycles / % improvement), to be filled in:
             Baseline | Hardware-loop | Pre/post incr. imm. | Pre/post incr. reg | mac | Vector | Unaligned access
Simulator:
RTL-SIM:
Questions:
- Can you explain your results?
- Are the results you obtained optimal? What can be done better?
- Try to improve the matrix multiplication such that vectors are used more efficiently
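
The "% improvement" entries in the table can be computed from the measured cycle counts. The helpers below are a small sketch with names invented for this example; they assume improvement is defined relative to the baseline cycle count, which the slides do not spell out.

```c
#include <assert.h>

/* Percentage of cycles saved relative to the baseline:
 * 100 * (baseline - optimized) / baseline. */
double improvement_percent(unsigned long baseline_cycles,
                           unsigned long optimized_cycles) {
    assert(baseline_cycles > 0);
    return 100.0 * ((double)baseline_cycles - (double)optimized_cycles)
                 / (double)baseline_cycles;
}

/* Speedup factor: baseline / optimized. */
double speedup(unsigned long baseline_cycles,
               unsigned long optimized_cycles) {
    assert(optimized_cycles > 0);
    return (double)baseline_cycles / (double)optimized_cycles;
}
```

With made-up numbers, a baseline of 1000 cycles and an optimized run of 750 cycles correspond to a 25 % improvement; a negative value indicates that the "optimized" build was slower.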

16 Exercise 4 RTL Simulation 1/2
The simulator is not 100% accurate because:
- It is still under development
- Not all components are modeled (no caches, no memory contention, simplified DMA)
But it is a lot faster than RTL simulation.
Start RTL simulations with the Makefile in your build folders:
$ make testname_vsim
Tasks:
- Rerun matrixmul in the RTL simulation and complete the table
- If you have a build folder for each configuration, you can run the simulations in parallel (each RTL simulation will take ~5 min)
- Are you able to reproduce the previously obtained results?
- Do you observe any differences? Do you have an explanation?

17 Exercise 4 RTL Simulation 2/2
In matrixmul.c, the function computegold(), which actually computes the multiplication, is called two times with the same inputs and outputs!
[Figure: code snippet of matrixmul.c]
- Why do you think this is the case?
- Compare the execution time of the first iteration between the RTL simulation and the simulator! What do you observe?

18 Exercise 5 Unaligned Memory Access
Unaligned memory access
- If data is not aligned, for example when computing a stencil over an image, many memory accesses and shifts are required.
- If data can be accessed in an unaligned fashion, the shifting is not required anymore.
- The or10n core supports unaligned accesses in two consecutive cycles!
[Figure: image with 1 stencil; pixel = 1 char]
Tasks:
- Complete the stencil template program (sw/apps/sequential_tests/stencil/stencil.c), which computes four stencils in parallel over a 32x32 pixel image
- Do you observe a speedup with unaligned memory access enabled?
- Compare to a vector-only implementation
[Figure: 4 parallel stencils computed with the vectorial unit]
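
The extra work caused by unaligned data, as described above, can be sketched in portable C. The first function builds a word that starts at an arbitrary byte offset from two aligned words plus shifts; the second models what a single unaligned access would return. Function names are invented for this example, and big-endian byte order is assumed because that is the usual OpenRISC configuration; this is not code from stencil.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads the aligned 32-bit word with index word_index (big-endian). */
static uint32_t load_aligned_word(const uint8_t *base, size_t word_index) {
    const uint8_t *p = base + 4 * word_index;
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Without unaligned support: two aligned loads, two shifts, and an OR. */
uint32_t load_word_via_aligned(const uint8_t *base, size_t byte_offset) {
    assert(base != NULL);
    size_t word_index = byte_offset / 4;
    unsigned shift = 8u * (unsigned)(byte_offset % 4);
    uint32_t w0 = load_aligned_word(base, word_index);
    if (shift == 0) {
        return w0;  /* already aligned; also avoids a shift by 32 */
    }
    uint32_t w1 = load_aligned_word(base, word_index + 1);
    return (w0 << shift) | (w1 >> (32u - shift));
}

/* Models the result of one unaligned access at byte_offset. */
uint32_t load_word_direct(const uint8_t *base, size_t byte_offset) {
    assert(base != NULL);
    const uint8_t *p = base + byte_offset;
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

In a stencil that slides over an image one pixel at a time, three out of four window positions start at an unaligned offset, so the shift-and-merge path would be taken for most accesses.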

19 Exercise 6 Auto Vectorization
In the matrixmul example, the automatic vector support did not bring the desired speedup. If the matrices are accessed in a more regular fashion, this is expected to change!
Run the matrixadd{8,16,32} applications in RTL simulation and measure the speedup:
- matrixadd8 is based on characters
- matrixadd16 on short integers
- matrixadd32 on integers
Run the motion_detection application and first compare the execution time of the baseline and the optimized architecture without vector support. Then add unaligned memory access, and finally vector instructions.
Execution time (# cycles / % improvement), to be filled in:
                   Baseline | Without vector | Unaligned access | With vector
matrixadd32:
matrixadd16:
matrixadd8:
motion_detection:
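
The regular access pattern mentioned above can be illustrated with an element-wise addition over 8-bit data. This is a sketch in the spirit of matrixadd8, not the exercise source: the function name, the use of unsigned char elements, and the restrict qualifiers are choices made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* out[i] = a[i] + b[i] for n contiguous 8-bit elements.
 * All three arrays are walked linearly with stride 1, and restrict
 * promises they do not overlap, so four neighboring elements can in
 * principle be processed as one 32-bit packed operation. */
void matrix_add8(const uint8_t *restrict a, const uint8_t *restrict b,
                 uint8_t *restrict out, size_t n) {
    assert(n == 0 || (a != NULL && b != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) {
        out[i] = (uint8_t)(a[i] + b[i]);  /* wraps around modulo 256 */
    }
}
```

A matrix stored row by row can be passed as a flat array with n = rows * columns. By contrast, the inner loop of a matrix multiplication walks one operand column-wise, which is a less regular pattern for a vectorizer.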

20 Exercise 7 Interrupts & Events 1/2
In the following, core 3 is used to model an external interface.
Receiving and processing of data is completed as follows:
1. Core 0 configures its interrupt or event mask before completing other tasks or going to sleep
2. Core 3 generates data and places it in a buffer (here in L2 memory)
3. Core 3 sends commands to the global event unit to generate an event for core 0
4. Core 0 receives an interrupt or wakes up and processes the data in L2 memory
5. Continue from the beginning

21 Exercise 7 Interrupts & Events 2/2
Task 1: Interrupts
- Check out the interrupt/event template: sw/apps/sequential_tests/event_interrupt.c
- Define an interrupt handler: copy data from receive_buffer to your memory (received_data)
- Initialize interrupts:
  - Add the interrupt handler on interrupt GP0 (see sw/libs/sys_lib/src/int.c)
  - Set the interrupt mask (1 = interrupt is not masked)
Task 2: Events
- In the function wait_and_proc_data(): go to sleep and wait for an event
- When an event is received: copy data from receive_buffer to your memory (received_data)
Task 3: Comparison
- Compare the execution time of the two solutions
- Measure the time required to process one packet in modelsim:
  $ make event_interrupt_vsim_debug
- Hints to measure time:
  - Events: observe the clock of core 0
  - Interrupts: observe the state of the exception routine (exc_running_p)
- What are the benefits of using events?

22 Questions & Answers
You have successfully completed the exercise.
If you are interested in a mini-project, we can offer you:
- Implementing and optimizing a benchmark using the multicore PULP environment (see the last exercise about the PULP architecture)
- Comparing the simulator and RTL simulations and coming up with examples of how to improve the simulator
- Adding performance counters at RTL level
- Comparing the OR10N core to a state-of-the-art microcontroller with DSP functionality, like the ARM Cortex-M4
- Adding a small unit which observes the stack pointer and issues an exception if a stack overflow has been detected
We are open to your own ideas!

Exercise: RISC Programming

Exercise: RISC Programming Exercise: RISC Programming Increasing efficiency of a RISC-core with simple instruction extensions 04.04.2016 Michael Gautschi Introduction The exercises in today will be performed on the Pulpino platform

More information

Exercise 6: PULP Programming

Exercise 6: PULP Programming Exercise 6: PULP Programming Introduction to the PULP Computing Platform 24.05.2016 Antonio Pullini Michael Gautschi Davide Schiavone Integrated Systems Laboratory How efficient do we need to be? Integrated

More information

L2 - C language for Embedded MCUs

L2 - C language for Embedded MCUs Formation C language for Embedded MCUs: Learning how to program a Microcontroller (especially the Cortex-M based ones) - Programmation: Langages L2 - C language for Embedded MCUs Learning how to program

More information

Changing the Embedded World TM. Module 3: Getting Started Debugging

Changing the Embedded World TM. Module 3: Getting Started Debugging Changing the Embedded World TM Module 3: Getting Started Debugging Module Objectives: Section 1: Introduce Debugging Techniques Section 2: PSoC In-Circuit Emulator (ICE) Section 3: Hands on Debugging a

More information

This section covers the MIPS instruction set.

This section covers the MIPS instruction set. This section covers the MIPS instruction set. 1 + I am going to break down the instructions into two types. + a machine instruction which is directly defined in the MIPS architecture and has a one to one

More information

ELC4438: Embedded System Design ARM Cortex-M Architecture II

ELC4438: Embedded System Design ARM Cortex-M Architecture II ELC4438: Embedded System Design ARM Cortex-M Architecture II Liang Dong Electrical and Computer Engineering Baylor University Memory system The memory systems in microcontrollers often contain two or more

More information

RM3 - Cortex-M4 / Cortex-M4F implementation

RM3 - Cortex-M4 / Cortex-M4F implementation Formation Cortex-M4 / Cortex-M4F implementation: This course covers both Cortex-M4 and Cortex-M4F (with FPU) ARM core - Processeurs ARM: ARM Cores RM3 - Cortex-M4 / Cortex-M4F implementation This course

More information

Assembling and Debugging VPs of Complex Cycle Accurate Multicore Systems. July 2009

Assembling and Debugging VPs of Complex Cycle Accurate Multicore Systems. July 2009 Assembling and Debugging VPs of Complex Cycle Accurate Multicore Systems July 2009 Model Requirements in a Virtual Platform Control initialization, breakpoints, etc Visibility PV registers, memories, profiling

More information

RISC-V CUSTOMIZATION WITH STUDIO 8

RISC-V CUSTOMIZATION WITH STUDIO 8 RISC-V CUSTOMIZATION WITH STUDIO 8 Zdeněk Přikryl CTO, Codasip GmbH WHO IS CODASIP Leading provider of RISC-V processor IP Introduced its first RISC-V processor in November 2015 Offers its own portfolio

More information

DSP ISA Extensions for an Open-Source RISC-V Implementation

DSP ISA Extensions for an Open-Source RISC-V Implementation DSP ISA Extensions for an Open-Source RISC-V Implementation Davide Schiavone Davide Rossi Michael Gautschi Eric Flamand Andreas Traber Luca Benini Integrated Systems Laboratory Introduction: a typical

More information

CSIS1120A. 10. Instruction Set & Addressing Mode. CSIS1120A 10. Instruction Set & Addressing Mode 1

CSIS1120A. 10. Instruction Set & Addressing Mode. CSIS1120A 10. Instruction Set & Addressing Mode 1 CSIS1120A 10. Instruction Set & Addressing Mode CSIS1120A 10. Instruction Set & Addressing Mode 1 Elements of a Machine Instruction Operation Code specifies the operation to be performed, e.g. ADD, SUB

More information

Laboratory Exercise 3 Comparative Analysis of Hardware and Emulation Forms of Signed 32-Bit Multiplication

Laboratory Exercise 3 Comparative Analysis of Hardware and Emulation Forms of Signed 32-Bit Multiplication Laboratory Exercise 3 Comparative Analysis of Hardware and Emulation Forms of Signed 32-Bit Multiplication Introduction All processors offer some form of instructions to add, subtract, and manipulate data.

More information

RI5CY Core: Datasheet

RI5CY Core: Datasheet RI5CY Core: Datasheet Instruction Interface rdata addr Data Interface addr wdata rdata 2 Prefetch Buffer Decoder 0 ALU LSU IF ID GPR ID EX CSR EX WB MULT Andreas Traber atraber@iis.ee.ethz.ch February

More information

Cortex-R5 Software Development

Cortex-R5 Software Development Cortex-R5 Software Development Course Description Cortex-R5 software development is a three days ARM official course. The course goes into great depth, and provides all necessary know-how to develop software

More information

a) Do exercise (5th Edition Patterson & Hennessy). Note: Branches are calculated in the execution stage.

a) Do exercise (5th Edition Patterson & Hennessy). Note: Branches are calculated in the execution stage. CS3410 Spring 2015 Problem Set 2 (version 3) Due Saturday, April 25, 11:59 PM (Due date for Problem-5 is April 20, 11:59 PM) NetID: Name: 200 points total. Start early! This is a big problem set. Problem

More information

The PULP Cores: A Set of Open-Source Ultra-Low- Power RISC-V Cores for Internet-of-Things Applications

The PULP Cores: A Set of Open-Source Ultra-Low- Power RISC-V Cores for Internet-of-Things Applications The PULP Cores: A Set of Open-Source Ultra-Low- Power RISC-V Cores for Internet-of-Things Applications 29.11.2017 Pasquale Davide Schiavone, Florian Zaruba Davide Rossi, Igor Loi, Antonio Pullini, Francesco

More information

CSX600 Runtime Software. User Guide

CSX600 Runtime Software. User Guide CSX600 Runtime Software User Guide Version 3.0 Document No. 06-UG-1345 Revision: 3.D January 2008 Table of contents Table of contents 1 Introduction................................................ 7 2

More information

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 12 Processor Structure and Function

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 12 Processor Structure and Function William Stallings Computer Organization and Architecture 8 th Edition Chapter 12 Processor Structure and Function CPU Structure CPU must: Fetch instructions Interpret instructions Fetch data Process data

More information

ARM Cortex core microcontrollers 3. Cortex-M0, M4, M7

ARM Cortex core microcontrollers 3. Cortex-M0, M4, M7 ARM Cortex core microcontrollers 3. Cortex-M0, M4, M7 Scherer Balázs Budapest University of Technology and Economics Department of Measurement and Information Systems BME-MIT 2018 Trends of 32-bit microcontrollers

More information

NEW CEIBO DEBUGGER. Menus and Commands

NEW CEIBO DEBUGGER. Menus and Commands NEW CEIBO DEBUGGER Menus and Commands Ceibo Debugger Menus and Commands D.1. Introduction CEIBO DEBUGGER is the latest software available from Ceibo and can be used with most of Ceibo emulators. You will

More information

An Introduction to Komodo

An Introduction to Komodo An Introduction to Komodo The Komodo debugger and simulator is the low-level debugger used in the Digital Systems Laboratory. Like all debuggers, Komodo allows you to run your programs under controlled

More information

Computer Science 2500 Computer Organization Rensselaer Polytechnic Institute Spring Topic Notes: C and Unix Overview

Computer Science 2500 Computer Organization Rensselaer Polytechnic Institute Spring Topic Notes: C and Unix Overview Computer Science 2500 Computer Organization Rensselaer Polytechnic Institute Spring 2009 Topic Notes: C and Unix Overview This course is about computer organization, but since most of our programming is

More information

Performance Optimization for an ARM Cortex-A53 System Using Software Workloads and Cycle Accurate Models. Jason Andrews

Performance Optimization for an ARM Cortex-A53 System Using Software Workloads and Cycle Accurate Models. Jason Andrews Performance Optimization for an ARM Cortex-A53 System Using Software Workloads and Cycle Accurate Models Jason Andrews Agenda System Performance Analysis IP Configuration System Creation Methodology: Create,

More information

ConnX D2 DSP Engine. A Flexible 2-MAC DSP. Dual-MAC, 16-bit Fixed-Point Communications DSP PRODUCT BRIEF FEATURES BENEFITS. ConnX D2 DSP Engine

ConnX D2 DSP Engine. A Flexible 2-MAC DSP. Dual-MAC, 16-bit Fixed-Point Communications DSP PRODUCT BRIEF FEATURES BENEFITS. ConnX D2 DSP Engine PRODUCT BRIEF ConnX D2 DSP Engine Dual-MAC, 16-bit Fixed-Point Communications DSP FEATURES BENEFITS Both SIMD and 2-way FLIX (parallel VLIW) operations Optimized, vectorizing XCC Compiler High-performance

More information

ARM Processors for Embedded Applications

ARM Processors for Embedded Applications ARM Processors for Embedded Applications Roadmap for ARM Processors ARM Architecture Basics ARM Families AMBA Architecture 1 Current ARM Core Families ARM7: Hard cores and Soft cores Cache with MPU or

More information

ENHANCED TOOLS FOR RISC-V PROCESSOR DEVELOPMENT

ENHANCED TOOLS FOR RISC-V PROCESSOR DEVELOPMENT ENHANCED TOOLS FOR RISC-V PROCESSOR DEVELOPMENT THE FREE AND OPEN RISC INSTRUCTION SET ARCHITECTURE Codasip is the leading provider of RISC-V processor IP Codasip Bk: A portfolio of RISC-V processors Uniquely

More information

Computer Science 322 Operating Systems Mount Holyoke College Spring Topic Notes: C and Unix Overview

Computer Science 322 Operating Systems Mount Holyoke College Spring Topic Notes: C and Unix Overview Computer Science 322 Operating Systems Mount Holyoke College Spring 2010 Topic Notes: C and Unix Overview This course is about operating systems, but since most of our upcoming programming is in C on a

More information

Lab 03 - x86-64: atoi

Lab 03 - x86-64: atoi CSCI0330 Intro Computer Systems Doeppner Lab 03 - x86-64: atoi Due: October 1, 2017 at 4pm 1 Introduction 1 2 Assignment 1 2.1 Algorithm 2 3 Assembling and Testing 3 3.1 A Text Editor, Makefile, and gdb

More information

Cortex-M3/M4 Software Development

Cortex-M3/M4 Software Development Cortex-M3/M4 Software Development Course Description Cortex-M3/M4 software development is a 3 days ARM official course. The course goes into great depth and provides all necessary know-how to develop software

More information

SD Card Controller IP Specification

SD Card Controller IP Specification SD Card Controller IP Specification Marek Czerski Friday 30 th August, 2013 1 List of Figures 1 SoC with SD Card IP core................................ 4 2 Wishbone SD Card Controller IP Core interface....................

More information

Conventions in this tutorial

Conventions in this tutorial This document provides an exercise using Digi JumpStart for Windows Embedded CE 6.0. This document shows how to develop, run, and debug a simple application on your target hardware platform. This tutorial

More information

Chapter 15 ARM Architecture, Programming and Development Tools

Chapter 15 ARM Architecture, Programming and Development Tools Chapter 15 ARM Architecture, Programming and Development Tools Lesson 07 ARM Cortex CPU and Microcontrollers 2 Microcontroller CORTEX M3 Core 32-bit RALU, single cycle MUL, 2-12 divide, ETM interface,

More information

Introducing the Latest SiFive RISC-V Core IP Series

Introducing the Latest SiFive RISC-V Core IP Series Introducing the Latest SiFive RISC-V Core IP Series Drew Barbier DAC, June 2018 1 SiFive RISC-V Core IP Product Offering SiFive RISC-V Core IP Industry leading 32-bit and 64-bit Embedded Cores High performance

More information

Introduction. Overview and Getting Started. CS 161 Computer Security Lab 1 Buffer Overflows v.01 Due Date: September 17, 2012 by 11:59pm

Introduction. Overview and Getting Started. CS 161 Computer Security Lab 1 Buffer Overflows v.01 Due Date: September 17, 2012 by 11:59pm Dawn Song Fall 2012 CS 161 Computer Security Lab 1 Buffer Overflows v.01 Due Date: September 17, 2012 by 11:59pm Introduction In this lab, you will get a hands-on approach to circumventing user permissions

More information

Introduction to the ARM Architecture. or: a loose set of random facts blatantly copied from tech sheets and the Architecture Ref.

Introduction to the ARM Architecture. or: a loose set of random facts blatantly copied from tech sheets and the Architecture Ref. Introduction to the ARM Architecture or: a loose set of random facts blatantly copied from tech sheets and the Architecture Ref. Manual Glance into the past Initial ARM Processor developed by Acorn Computers,

More information

Contents. Slide Set 1. About these slides. Outline of Slide Set 1. Typographical conventions: Italics. Typographical conventions. About these slides

Contents. Slide Set 1. About these slides. Outline of Slide Set 1. Typographical conventions: Italics. Typographical conventions. About these slides Slide Set 1 for ENCM 369 Winter 2014 Lecture Section 01 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary Winter Term, 2014 ENCM 369 W14 Section

More information

CMSC 611: Advanced Computer Architecture

CMSC 611: Advanced Computer Architecture CMSC 611: Advanced Computer Architecture Compilers Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Science

More information

5. ARM 기반모니터프로그램사용. Embedded Processors. DE1-SoC 보드 (IntelFPGA) Application Processors. Development of the ARM Architecture.

5. ARM 기반모니터프로그램사용. Embedded Processors. DE1-SoC 보드 (IntelFPGA) Application Processors. Development of the ARM Architecture. Embedded Processors 5. ARM 기반모니터프로그램사용 DE1-SoC 보드 (IntelFPGA) 2 Application Processors Development of the ARM Architecture v4 v5 v6 v7 Halfword and signed halfword / byte support System mode Thumb instruction

More information

Hercules ARM Cortex -R4 System Architecture. Processor Overview

Hercules ARM Cortex -R4 System Architecture. Processor Overview Hercules ARM Cortex -R4 System Architecture Processor Overview What is Hercules? TI s 32-bit ARM Cortex -R4/R5 MCU family for Industrial, Automotive, and Transportation Safety Hardware Safety Features

More information

Copyright 2014 Xilinx

Copyright 2014 Xilinx IP Integrator and Embedded System Design Flow Zynq Vivado 2014.2 Version This material exempt per Department of Commerce license exception TSU Objectives After completing this module, you will be able

More information

DSP Mapping, Coding, Optimization

DSP Mapping, Coding, Optimization DSP Mapping, Coding, Optimization On TMS320C6000 Family using CCS (Code Composer Studio) ver 3.3 Started with writing a simple C code in the class, from scratch Project called First, written for C6713

More information

Evaluating RISC-V Cores for PULP

Evaluating RISC-V Cores for PULP Evaluating RISC-V Cores for PULP An Open Parallel Ultra-Low-Power Platform www.pulp.ethz.ch 30 June 2015 Sven Stucki Antonio Pullini Michael Gautschi Frank K. Gürkaynak Andrea Marongiu Igor Loi Davide

More information

Codewarrior for ColdFire (Eclipse) 10.0 Setup

Codewarrior for ColdFire (Eclipse) 10.0 Setup Codewarrior for ColdFire (Eclipse) 10.0 Setup 1. Goal This document is designed to ensure that your Codewarrior for Coldfire v10.0 environment is correctly setup and to orient you to it basic functionality

More information

Cortex-A9 MPCore Software Development

Cortex-A9 MPCore Software Development Cortex-A9 MPCore Software Development Course Description Cortex-A9 MPCore software development is a 4 days ARM official course. The course goes into great depth and provides all necessary know-how to develop

More information

The Next Steps in the Evolution of ARM Cortex-M

The Next Steps in the Evolution of ARM Cortex-M The Next Steps in the Evolution of ARM Cortex-M Joseph Yiu Senior Embedded Technology Manager CPU Group ARM Tech Symposia China 2015 November 2015 Trust & Device Integrity from Sensor to Server 2 ARM 2015

More information

Main Points of the Computer Organization and System Software Module

Main Points of the Computer Organization and System Software Module Main Points of the Computer Organization and System Software Module You can find below the topics we have covered during the COSS module. Reading the relevant parts of the textbooks is essential for a

More information

ARM CORTEX-R52. Target Audience: Engineers and technicians who develop SoCs and systems based on the ARM Cortex-R52 architecture.

ARM CORTEX-R52. Target Audience: Engineers and technicians who develop SoCs and systems based on the ARM Cortex-R52 architecture. ARM CORTEX-R52 Course Family: ARMv8-R Cortex-R CPU Target Audience: Engineers and technicians who develop SoCs and systems based on the ARM Cortex-R52 architecture. Duration: 4 days Prerequisites and related

More information

Using the KD30 Debugger

Using the KD30 Debugger ELEC3730 Embedded Systems Tutorial 3 Using the KD30 Debugger 1 Introduction Overview The KD30 debugger is a powerful software tool that can greatly reduce the time it takes to develop complex programs

More information

Project #1 Exceptions and Simple System Calls

Project #1 Exceptions and Simple System Calls Project #1 Exceptions and Simple System Calls Introduction to Operating Systems Assigned: January 21, 2004 CSE421 Due: February 17, 2004 11:59:59 PM The first project is designed to further your understanding

More information

ECE 206, Fall 2001: Lab 3
