Exercise: OpenRISC Programming
Transcription
1 Exercise: OpenRISC Programming
Increasing efficiency of the OpenRISC core with simple instruction extensions
Michael Gautschi, Antonio Pullini
2 Introduction
All exercises will be performed on the following configuration:
- 4 OpenRISC or10n cores with private instruction cache (I$)
- 8 banks of shared level 1 memory (TCDM)
- Large L2 memory for instructions and data
Since the focus of this exercise is on the core, only one core is used and the others are put to sleep.
[Figure labels: or10n processor cores; Not used / sleeping]
3 Exercise Overview
1. Introduction example: compile and execute Helloworld on the Instruction Set Simulator (ISS)
2. Simulator basics: analyze ISS output and debug
3. Benchmarking: analyze performance improvements of the new instructions using the simulator
4. RTL simulations and benchmarking: compare to the simulator
5. Unaligned memory accesses: program a simple stencil and show the benefit of unaligned memory accesses
6. Auto vectorization: analyze the improvements of vector operations on a matrix addition
7. Interrupts and events: compare interrupt and event response times
4 Getting Started 1/2
Copy data from the master account:
$ mkdir 2_OpenRISC
$ cp /home/soc_master/2_openrisc/openrisc.tar.gz 2_OpenRISC/.
$ tar xzf OpenRISC.tar.gz
[Figure label: 2_OpenRISC directory]
Source the instruction set simulator:
$ source /home/soc_master/germain/pulp-sdk ub14.10_2/env/setup.sh
We will be working in the software (sw) and build directories:
- sw-dir: contains the application source code
- build-dir: holds compiler and simulator output; simulator and RTL simulations are run here
- sim-dir: contains precompiled code for RTL simulation
5 Getting Started 2/2
We will be working on the scratch because we are going to generate some data.
1. Create a build directory and set up the compiler:
$ mkdir /scratch/soc_xx/build_id
2. Start the compiler in a new shell:
$ or1k v1.7.3 xterm
$ tcsh (switch to C shell)
3. Configure the build directory in the new shell:
$ cd /scratch/soc_xx/build_id
$ cp /home/soc_xx/2_openrisc/build/configure_template.tcl /scratch/soc_xx/build_id/configure.tcl
In the configure script, set the path to your exercise folder: OPENRISC_DIR= /home/soc_xx/2_openrisc
$ ./configure.tcl
You have successfully set up the build directory! Let's get started with exercise 1!
6 Exercise 1 Introduction
a) The build directory is created, the compiler is configured, and the simulator is set up. We are ready to start with a simple helloworld application.
b) Compile Helloworld.c
Helloworld.c is located in sw/apps/sequential_tests/helloworld/. To compile the application, enter the build folder and run the makefile:
$ cd build
$ make helloworld : to generate an executable for the simulator
$ make helloworld.read : to generate the assembly
$ make helloworld.slm.cmd : to generate input data for RTL simulations
c) Run Helloworld.c
The simulator is called pulp-run and can be started like this:
$ pulp-run --load-binary=./apps/sequential_tests/helloworld/helloworld:0xf
The console should output helloworld. The output is also written to the file stdout/stdout_pe
7 Exercise 2 Simulator basics 1/4
Simulator commands to run an application
- Run an application:
$ pulp-run --load-binary=./applicationname:0xf
- Get assembly traces:
$ pulp-run --load-binary=./applicationname:0xf --iss-trace
Enter debug mode
- Force entering the debugger at the beginning:
$ pulp-run --load-binary=./applicationname:0xf --pdb-break
- Or press Ctrl+C during execution
- Check the current status of the core:
(Cmd) state
Note: bootaddress = 0x1c
Breakpoints and inspection of memory (in debug mode)
- Set a breakpoint on a memory access:
(Cmd) bkp address region rw
address = 32-bit memory address; default region = 0x4 (1 word); rw = read or write access (default rw)
- Display all breakpoints:
(Cmd) bkp_list
- Remove a breakpoint:
(Cmd) bkp_dis ID
ID as shown in bkp_list
- Inspect memory:
(Cmd) mem_dump address size
32-bit addresses; default size = 0x4 (1 word)
8 Exercise 2 Simulator basics 2/4
Analyze the rijndael application (AES en/decryption)
Generate the assembly:
$ make rijndael.read
Have a look at the assembly file rijndael.read. The most important functions are:
- main
- compute_aes
- encrypt
- encfile
- decrypt
- decfile
[Figure: annotated assembly listing showing the function header, the instruction in big and little endian format, the PC, the disassembled instructions, and absolute and relative jump/branch targets]
9 Exercise 2 Simulator basics 3/4
Simulator traces
To trace an application, run it with the option --iss-trace
Four traces are generated, one for each core. We are actually using a four-core configuration of which three cores are sleeping, so trace_cluster0_core0.log shows the interesting traces!
[Figure: annotated trace showing the PC, the latency of the instruction, new register values and effective addresses, and the disassembled instructions]
10 Exercise 2 Simulator basics 4/4
Tasks:
Run the rijndael application with traces and make use of the debugger and breakpoints.
- Which instructions take more than one cycle to execute?
- Why are some load/store instructions (l.lwz, l.sw) taking more than one cycle?
- Do you see an example of a data hazard?
- How many times is address 0x1c read?
- What is the encryption key?
11 Exercise 3 Benchmarking 1/5
Intro: The timer can be used to count execution time (in cycles)
- Check matrixmul.c for an example
- Include <timer.h> and use the functions reset_timer(), start_timer(), stop_timer(), get_time()
Hardware loops
To run the application with different hardware loop settings, the compiler needs to be reconfigured. There are two options:
1. Create a new build folder with a new configure.tcl script
2. Keep the current folder and modify its configuration script configure.tcl
To support different numbers of hardware loops, change TARGET_C_FLAGS in configure.tcl:
- remove the option -mno-hwlp to enable hardware loops
- add -mmax-hwloops=<number_of_hwlp>, with number_of_hwlp in [0,4]
Reconfigure the build folder:
$ ./configure.tcl
The compiler will generate the following hwloop instructions to produce efficient loops:
- lp.start
- lp.end
- lp.count
- lp.counti
- lp.setup
- lp.setupi
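The measurement pattern named on this slide (reset, start, run, stop, read) can be sketched as below. This is only an illustration under assumptions: the slides give the four function names but not their signatures, so the versions here are host stand-ins built on the C library clock() and count clock ticks rather than core cycles. The matmul kernel, N and time_matmul are hypothetical names, not taken from matrixmul.c.

```c
#include <assert.h>
#include <time.h>

/* Host stand-ins (assumption): on the target these four functions come from
 * <timer.h>. Their real signatures are not given in the slides. */
static clock_t t_start, t_accum;
static int t_running;

static void reset_timer(void) { t_accum = 0; t_running = 0; }
static void start_timer(void) { t_start = clock(); t_running = 1; }
static void stop_timer(void)
{
    assert(t_running);              /* stopping a timer that never started is a usage error */
    t_accum += clock() - t_start;
    t_running = 0;
}
static long get_time(void) { return (long)t_accum; }

enum { N = 8 };                     /* hypothetical matrix dimension */

/* Hypothetical kernel under test: C = A * B for N x N matrices stored row by row. */
static void matmul(const int *a, const int *b, int *c)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int sum = 0;
            for (int k = 0; k < N; k++)
                sum += a[i * N + k] * b[k * N + j];
            c[i * N + j] = sum;
        }
}

/* Times exactly one call of the kernel and returns the elapsed timer value. */
static long time_matmul(const int *a, const int *b, int *c)
{
    reset_timer();
    start_timer();
    matmul(a, b, c);
    stop_timer();
    return get_time();
}
```

Keeping only the kernel call between start_timer() and stop_timer() keeps initialization and printing out of the measured region.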
12 Exercise 3 Benchmarking 2/5
Tasks:
- Check if hardware loops are generated (in the matrixmul.read file)
- How many instructions are actually saved? Compare the matrixmul.read with and without hwloops.
- Estimate the speedup with 1, 2, 4 hardware loops
- Do your measurements match your estimations? Why is the benefit of the first one higher?
Results table to fill in (execution time: # cycles / % improvement), one entry per configuration:
- Baseline
- Hardware loop (1 register set)
- 2 register sets
- 3 register sets
- 4 register sets
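Each table entry asks for a cycle count and a percentage improvement over the baseline. A minimal sketch of that arithmetic follows; the function names are hypothetical, and improvement is taken here to mean the relative reduction in cycles, which is an assumption about how the table is meant to be filled.

```c
#include <assert.h>

/* Relative reduction in cycles compared to the baseline, in percent.
 * Positive means the configuration is faster than the baseline. */
static double improvement_percent(long baseline_cycles, long cycles)
{
    assert(baseline_cycles > 0);
    return 100.0 * (double)(baseline_cycles - cycles) / (double)baseline_cycles;
}

/* Speedup factor: how many times faster than the baseline. */
static double speedup(long baseline_cycles, long cycles)
{
    assert(cycles > 0);
    return (double)baseline_cycles / (double)cycles;
}
```

The two numbers are not interchangeable: halving the cycle count is a 50 % improvement but a speedup of 2.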
13 Exercise 3 Benchmarking 3/5
Pre/post increment immediate:
- Activated by default!
- Deactivate with -mno-idxls
Pre/post increment register:
- From a hardware perspective, what is the drawback of this instruction?
- Deactivate with -mno-rrls
Multiply-accumulate instruction:
- Old architecture: the accumulation register is a special register which is not in the register file; the accumulation result can be accessed in two cycles
- New architecture: accumulates directly on the register file
- Enable the new MAC with -mmac3
[Figure: code comparison, old MAC vs. new MAC]
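To see where these extensions could apply, it helps to look at a loop that contains all three patterns. The dot product below is a hypothetical example, not taken from the exercise sources. Whether the compiler actually emits post-increment loads or a MAC instruction for it depends on the flags above and should be checked in the .read file.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of two integer vectors of length n. */
static int dot_product(const int *a, const int *b, size_t n)
{
    int acc = 0;
    assert(a != NULL && b != NULL);
    while (n-- > 0) {
        /* *a++ and *b++ : load followed by a pointer bump, a candidate for a
         *                 load with post-increment
         * acc += x * y  : a candidate for a multiply-accumulate on a register */
        acc += *a++ * *b++;
    }
    return acc;
}
```

The loop body is also short and has a known trip count, which makes the loop itself a candidate for a hardware loop.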
14 Exercise 3 Benchmarking 4/5
Vector instructions:
Add, sub, mul, mac, and comparisons are all supported in vector mode. In parallel one can compute:
- one word,
- two halfwords, or
- four bytes
Enable the vector extensions with the compiler options:
- -mlv32 for vector generation
- -munaligned-ls for unaligned memory accesses
Check in the .read file whether vector code is generated. Vector instructions have the format lv.{mul,mac,sub,add, ...}
Tasks:
- Run the matrixmul application with the different compiler options
- Summarize your results in the first column on the next page
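What "four bytes in parallel" means can be illustrated in portable C without any vector hardware. The sketch below emulates a lane-wise addition on one 32-bit word; it only shows the arithmetic a byte or halfword vector add performs (each lane wraps around on its own, no carry crosses a lane boundary). The function names are hypothetical and the code says nothing about how the or10n core implements it.

```c
#include <stdint.h>

/* Adds four 8-bit lanes packed in a and b. The low 7 bits of each lane are
 * added normally; the top bit of each lane is computed separately with XOR
 * so that a carry never spills into the neighbouring lane. */
static uint32_t add4x8(uint32_t a, uint32_t b)
{
    const uint32_t low7 = 0x7F7F7F7Fu;
    const uint32_t msb  = 0x80808080u;
    return ((a & low7) + (b & low7)) ^ ((a ^ b) & msb);
}

/* Same idea for two 16-bit lanes. */
static uint32_t add2x16(uint32_t a, uint32_t b)
{
    const uint32_t low15 = 0x7FFF7FFFu;
    const uint32_t msb   = 0x80008000u;
    return ((a & low15) + (b & low15)) ^ ((a ^ b) & msb);
}
```

For example, adding 0x01FF10F0 and 0x01011010 byte-wise gives 0x02002000: the lanes 0xFF + 0x01 and 0xF0 + 0x10 each wrap to 0x00 without disturbing their neighbours.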
15 Exercise 3 Benchmarking 5/5
Results table to fill in (execution time: # cycles / % improvement)
Columns: Simulator, RTL-SIM
Rows:
- Baseline
- Hardware-loop
- Pre/post incr. imm.
- Pre/post incr. reg
- mac
- Vector
- Unaligned access
Questions:
- Can you explain your results?
- Are the results you obtained optimal? What can be done better?
- Try to improve the matrix multiplication such that vectors are used more efficiently
16 Exercise 4 RTL Simulation 1/2
The simulator is not 100% accurate because:
- It is still under development
- Not all components are modeled (no caches, no memory contention, simplified DMA)
- But it is a lot faster than RTL simulation
Start RTL simulations with the Makefile in your build folders:
$ make testname_vsim
Rerun matrixmul in the RTL simulation and complete the table. If you have a build folder for each configuration, you can run the simulations in parallel (each RTL simulation will take ~5 min).
- Are you able to reproduce the previously obtained results?
- Do you observe any differences? Do you have an explanation?
17 Exercise 4 RTL Simulation 2/2
In matrixmul.c the function computegold(), which actually computes the multiplication, is called twice with the same inputs and outputs!
[Figure: code snippet of matrixmul.c]
- Why do you think this is the case?
- Compare the execution time of the first iteration between the RTL simulation and the simulator! What do you observe?
18 Exercise 5 Unaligned Memory Access
Unaligned memory access
- If data is not aligned, for example when computing a stencil over an image, a lot of memory accesses and shifting are required.
- If data can be accessed in an unaligned fashion, the shifting is not required anymore.
- The or10n core supports unaligned accesses in two consecutive cycles!
[Figure: image with 1 stencil; pixel = 1 char]
[Figure: 4 parallel stencils computed with the vectorial unit]
Tasks:
- Complete the stencil template program (sw/apps/sequential_tests/stencil/stencil.c), which computes four stencils in parallel over a 32x32 pixel image
- Do you observe a speedup with unaligned memory access enabled? Compare to a vector-only implementation
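The extra accesses and shifts mentioned above can be made concrete with a small sketch. It fetches four consecutive pixels starting at an arbitrary column x in two ways: directly, as a single unaligned word access would, and using only word-aligned accesses plus shifting and merging. Assumptions: pixels are packed with the first pixel in the most significant byte (big-endian order), the row starts on a word boundary, and the function names are hypothetical. Byte-wise packing stands in for a real word load so the code runs on any host.

```c
#include <stddef.h>
#include <stdint.h>

/* Packs the four pixels at p[0..3] into one word, first pixel in the most
 * significant byte. Models one word load from address p. */
static uint32_t load4_direct(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns the same four pixels row[x..x+3], but touches memory only at
 * word-aligned offsets: two loads, two shifts and one OR when x is unaligned. */
static uint32_t load4_aligned_only(const uint8_t *row, size_t x)
{
    size_t   base = x & ~(size_t)3;   /* aligned offset at or below x */
    unsigned k    = (unsigned)(x & 3);/* byte position inside that word */
    uint32_t w0   = load4_direct(row + base);
    if (k == 0)
        return w0;                    /* already aligned, one load suffices */
    uint32_t w1 = load4_direct(row + base + 4);
    return (w0 << (8 * k)) | (w1 >> (32 - 8 * k));
}
```

With unaligned access support, the second function collapses into the first for every x, which is where the saved shifts come from.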
19 Exercise 6 Auto Vectorization
In the matrixmul example, the automatic vector support did not bring the desired speedup. If the matrices are accessed in a more regular fashion, this is expected to change!
Run the matrixadd{8,16,32} applications in RTL simulation and measure the speedup:
- matrixadd8 is based on characters
- matrixadd16 on short integers
- matrixadd32 on integers
Run the motion_detection application and first compare the execution time of the baseline and the optimized architecture without vector support. Then add unaligned memory access first, and vector instructions after that.
Results table to fill in (execution time: # cycles / % improvement)
Columns: Baseline, Without vector, Unaligned access, With vector
Rows:
- matrixadd32
- matrixadd16
- matrixadd8
- motion_detection
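As an illustration of a "regular" access pattern, an element-wise addition walks all three arrays with unit stride, whereas a textbook matrix multiplication reads one operand column by column. The sketch below is hypothetical and not the code of the matrixadd applications; lanes_per_word only restates the element widths listed above as the number of elements that fit in one 32-bit word.

```c
#include <stddef.h>
#include <stdint.h>

/* Element-wise c = a + b over n characters stored contiguously.
 * Every access is unit-stride, so four neighbouring elements share a word. */
static void matrixadd8(const int8_t *a, const int8_t *b, int8_t *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = (int8_t)(a[i] + b[i]);
}

/* Elements that fit in one 32-bit word: 4 for chars, 2 for shorts, 1 for ints. */
static unsigned lanes_per_word(unsigned element_bits)
{
    return 32u / element_bits;
}
```

The lane count is only an upper bound on what vector operations alone could gain on the addition itself; loads, stores and loop overhead are not covered by it.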
20 Exercise 7 Interrupts & Events 1/2
In the following, core 3 is used to model an external interface. Receiving and processing of data is completed as follows:
1. Core 0 configures its interrupt or event mask before completing other tasks or going to sleep
2. Core 3 generates data and places it in a buffer (here in L2 memory)
3. Core 3 sends commands to the global event unit to generate an event for core 0
4. Core 0 receives an interrupt or wakes up and processes the data in L2 memory
5. Continue from the beginning
21 Exercise 7 Interrupts & Events 2/2
Task 1: Interrupts
- Check out the interrupt/event template: sw/apps/sequential_tests/event_interrupt.c
- Define an interrupt handler: copy data from receive_buffer to your memory (received_data)
- Initialize interrupts: add the interrupt handler on interrupt GP0 (see sw/libs/sys_lib/src/int.c) and set the interrupt mask (1 = interrupt is not masked)
Task 2: Events
- In the function wait_and_proc_data(): go to sleep and wait for an event
- When an event is received: copy data from receive_buffer to your memory (received_data)
Task 3: Comparison
- Compare the execution time of the two solutions
- Measure the time required to process one packet in ModelSim:
$ make event_interrupt_vsim_debug
- Hints to measure time: for events, observe the clock of core 0; for interrupts, observe the state of the exception routine (exc_running_p)
- What are the benefits of using events?
22 Questions & Answers
You have successfully completed the exercise.
If you are interested in a mini-project, we can offer you:
- Implementing and optimizing a benchmark using the multicore PULP environment (see the last exercise about the PULP architecture)
- Comparing the simulator and RTL simulations and coming up with examples of how to improve the simulator
- Adding performance counters at RTL level
- Comparing the OR10N core to a state-of-the-art microcontroller with DSP functionality, like the ARM Cortex M4
- Adding a small unit which observes the stack pointer and issues an exception if a stack overflow has been detected
We are open to your own ideas!
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationRed Suite 4 Getting Started. Applies to Red Suite 4.22 or greater
Red Suite 4 Getting Started Applies to Red Suite 4.22 or greater March 26, 2012 Table of Contents 1 ABOUT THIS GUIDE... 3 1.1 WHO SHOULD USE IT... 3 2 RED SUITE 4... 4 2.1 NEW FEATURES IN RED SUITE 4...
More information_ V Renesas R8C In-Circuit Emulation. Contents. Technical Notes
_ V9.12. 225 Technical Notes Renesas R8C In-Circuit Emulation This document is intended to be used together with the CPU reference manual provided by the silicon vendor. This document assumes knowledge
More informationThe Embedded computing platform. Four-cycle handshake. Bus protocol. Typical bus signals. Four-cycle example. CPU bus.
The Embedded computing platform CPU bus. Memory. I/O devices. CPU bus Connects CPU to: memory; devices. Protocol controls communication between entities. Bus protocol Determines who gets to use the bus
More informationECEN 449: Microprocessor System Design Department of Electrical and Computer Engineering Texas A&M University
ECEN 449: Microprocessor System Design Department of Electrical and Computer Engineering Texas A&M University Prof. Sunil P Khatri (Lab exercise created and tested by Ramu Endluri, He Zhou, Andrew Douglass
More informationResource 2 Embedded computer and development environment
Resource 2 Embedded computer and development environment subsystem The development system is a powerful and convenient tool for embedded computing applications. As shown below, the development system consists
More informationEmbedded Seminar in Shenzhen
in Shenzhen 1 hello world PC HELLO WORLD IDE Simulator - C 2 2 3 3 Architecture 6 Halfword and signed halfword / byte support System mode Thumb instruction set 4 4T Improved /Thumb Interworking CLZ Saturated
More informationChapter 12. CPU Structure and Function. Yonsei University
Chapter 12 CPU Structure and Function Contents Processor organization Register organization Instruction cycle Instruction pipelining The Pentium processor The PowerPC processor 12-2 CPU Structures Processor
More informationCS 61C: Great Ideas in Computer Architecture Intro to Assembly Language, MIPS Intro
CS 61C: Great Ideas in Computer Architecture Intro to Assembly Language, MIPS Intro 1 Levels of Representation/Interpretation Machine Interpretation High Level Language Program (e.g., C) Compiler Assembly
More informationECE 154A Introduction to. Fall 2012
ECE 154A Introduction to Computer Architecture Fall 2012 Dmitri Strukov Lecture 4: Arithmetic and Data Transfer Instructions Agenda Review of last lecture Logic and shift instructions Load/store instructionsi
More informationMachine Language Instructions Introduction. Instructions Words of a language understood by machine. Instruction set Vocabulary of the machine
Machine Language Instructions Introduction Instructions Words of a language understood by machine Instruction set Vocabulary of the machine Current goal: to relate a high level language to instruction
More informationImplementing Secure Software Systems on ARMv8-M Microcontrollers
Implementing Secure Software Systems on ARMv8-M Microcontrollers Chris Shore, ARM TrustZone: A comprehensive security foundation Non-trusted Trusted Security separation with TrustZone Isolate trusted resources
More informationCPS104 Computer Organization and Programming Lecture 17: Interrupts and Exceptions. Interrupts Exceptions and Traps. Visualizing an Interrupt
CPS104 Computer Organization and Programming Lecture 17: Interrupts and Exceptions Robert Wagner cps 104 Int.1 RW Fall 2000 Interrupts Exceptions and Traps Interrupts, Exceptions and Traps are asynchronous
More informationMulti-core microcontroller design with Cortex-M processors and CoreSight SoC
Multi-core microcontroller design with Cortex-M processors and CoreSight SoC Joseph Yiu, ARM Ian Johnson, ARM January 2013 Abstract: While the majority of Cortex -M processor-based microcontrollers are
More informationF28HS Hardware-Software Interface: Systems Programming
F28HS Hardware-Software Interface: Systems Programming Hans-Wolfgang Loidl School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh Semester 2 2017/18 0 No proprietary software has
More informationInstructions: Language of the Computer
Instructions: Language of the Computer Tuesday 22 September 15 Many slides adapted from: and Design, Patterson & Hennessy 5th Edition, 2014, MK and from Prof. Mary Jane Irwin, PSU Summary Previous Class
More informationTools Basics. Getting Started with Renesas Development Tools R8C/3LX Family
Getting Started with Renesas Development Tools R8C/3LX Family Description: The purpose of this lab is to allow a user new to the Renesas development environment to quickly come up to speed on the basic
More informationECE 471 Embedded Systems Lecture 5
ECE 471 Embedded Systems Lecture 5 Vince Weaver http://www.eece.maine.edu/ vweaver vincent.weaver@maine.edu 17 September 2013 HW#1 is due Thursday Announcements For next class, at least skim book Chapter
More informationSpeeding AM335x Programmable Realtime Unit (PRU) Application Development Through Improved Debug Tools
Speeding AM335x Programmable Realtime Unit (PRU) Application Development Through Improved Debug Tools The hardware modules and descriptions referred to in this document are *NOT SUPPORTED* by Texas Instruments
More informationEECS 678: Intro to Operating Systems Programming Assignment 3: Virtual Memory in Nachos
EECS 678: Intro to Operating Systems Programming Assignment 3: Virtual Memory in Nachos 1. Introduction 2. Background 3. Assignment 4. Implementation Details 5. Implementation Overview 6. Testing and Validation
More informationARM Processor. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
ARM Processor Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu CPU Architecture CPU & Memory address Memory data CPU 200 ADD r5,r1,r3 PC ICE3028:
More informationA superscalar machine is one in which multiple instruction streams allow completion of more than one instruction per cycle.
CS 320 Ch. 16 SuperScalar Machines A superscalar machine is one in which multiple instruction streams allow completion of more than one instruction per cycle. A superpipelined machine is one in which a
More informationembos Real-Time Operating System embos plug-in for IAR C-Spy Debugger Document: UM01025 Software Version: 3.0 Revision: 0 Date: September 18, 2017
embos Real-Time Operating System embos plug-in for IAR C-Spy Debugger Document: UM01025 Software Version: 3.0 Revision: 0 Date: September 18, 2017 A product of SEGGER Microcontroller GmbH & Co. KG www.segger.com
More informationARM Cortex-M4 Architecture and Instruction Set 1: Architecture Overview
ARM Cortex-M4 Architecture and Instruction Set 1: Architecture Overview M J Brockway January 25, 2016 UM10562 All information provided in this document is subject to legal disclaimers. NXP B.V. 2014. All
More information