Exercise: RISC Programming


1 Exercise: RISC Programming
Increasing efficiency of a RISC-core with simple instruction extensions
Michael Gautschi

2 Introduction
Today's exercises will be performed on the Pulpino platform:
- Open source platform
- OpenRISC / RISC-V core
- 32kB Instruction memory
- 32kB Data memory
- SPI (load/unload data)
- UART (for printf)
- Small event unit

3 Exercise Overview
1. Introduction example: Compile & execute Helloworld
2. RTL Simulator basics: Run motion_detection application [perf counters, traces, read]
3. Benchmarking: Analyze performance improvements of the new instructions
4. Efficient matrix multiplications and convolutions with Dot-product: Program a convolution and show the benefit of the dot product
5. Motion detection with efficient convolution: Plug the optimized convolution into the application and observe the speedup
6. Compressed instructions on RISC-V: Coremark analysis

4 Getting Started 1/2
Copy data from master account:
  $ mkdir 2_OpenRISC
  $ cp /home/soc_master/2_openrisc/pulpino.tar.gz 2_OpenRISC/.
  $ tar xzf pulpino.tar.gz
2_OpenRISC directory:
- rtl/ips-dir: Contains HDL source code
- sw-dir: Contains application source code (in apps)
- build-dir: Contains compiler and simulator outputs; RTL simulations will be run here
- vsim-dir: Contains all scripts for RTL compilation
We will be working in the software (sw) and build directories.

5 Getting Started 2/2
We will be working on the scratch because we are going to generate some data.
1. Create a build directory and set up the compiler
  $ mkdir /scratch/soc_xx/build_or10n
2. Configure the build directory
  $ cd /scratch/soc_xx/build_or10n
  $ cp ~soc_master/2_openrisc/pulpino/sw/cmake_configure.or1k.gcc.sh .
  In the configure script, set the path to your exercise folder:
  PULP_GIT_DIRECTORY= /home/soc_xx/2_openrisc/pulpino
  $ or1k -g /cmake_configure.or1k.gcc.sh
  You have successfully set up the build directory!
3. Compile the RTL code
  $ make vcompile
Let's get started with exercise 1!

6 Exercise 1 Introduction
a) The build directory is created, the compiler is configured, and the RTL is compiled. We are ready to start with a simple helloworld.
b) Compile helloworld
helloworld.c is located in sw/apps/helloworld/. To compile the application, enter the build folder and run the makefile:
  $ cd /scratch/soc_xx/build_or10n
  $ make helloworld.read      : to generate the assembly
  $ make helloworld.slm.cmd   : to generate input data for RTL simulations
c) Compile & Run helloworld
The application can be run in modelsim (gui) or in batch mode:
  $ make helloworld.vsim      : to start modelsim (+type run -all)
  $ make helloworld.vsimc     : to run in batch mode
The console should output helloworld. The output is also written to the file: apps/helloworld/stdout/uart

7 Exercise 2 Basic Tests 1/4
We are now looking at a more complicated application: the motion_detection application.
To compile & run the application:
  $ make motion_detection.vsimc
- A timer tracks how many cycles were required to compute the image
- The printf output is sent over UART, and the testbench dumps the received data to the file: build_or10n/apps/sequential_tests/motion_detection/stdout/uart
- The testbench also outputs a trace file that shows the sequence in which the instructions were executed: build_or10n/apps/sequential_tests/motion_detection/trace_core00.log

8 Exercise 2 Basic Tests 2/4
To better understand what the compiler generated, you can have a look at the disassembled code:
  $ make motion_detection.read
Disassembled code (figure): PC; disassembled instructions; absolute and relative jump/branch targets
Trace file (figure): Time; Cycle; PC; Instruction encoding; ALU register update, load data to register, write to memory

9 Exercise 2 Basic Tests 3/4
Performance counters:
- In order to profile an application, the core supports several performance counters.
- Only one counter exists in the micro-architecture to keep the area overhead small
- To count multiple events the program has to be run in sequence with different events configured

The following events are of interest:
  Name                 ID   Counts
  SPR_PCER_CYCLES      0    # cycles
  SPR_PCER_INSTR       1    # instructions
  SPR_PCER_LD_STALL    2    # load hazards
  SPR_PCER_LD          7    # load insn.
  SPR_PCER_ST          8    # store insn.
  SPR_PCER_JUMP        9    # jumps
  SPR_PCER_BRANCH      10   # branches
  SPR_PCER_DELAY_NOP   11   # delay nops

Functions to set up performance counters:
  perf_reset()         : to reset counters
  perf_enable_id(id)   : start count event ID
  perf_stop()          : stop counting
  cpu_perf_get(id)     : read counter
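The one-counter, one-run-per-event pattern described above can be sketched on a host machine. This is only an illustration: the four perf_* functions below are host-side stand-ins whose signatures are assumed, not the Pulpino library code, and sim_event(), toy_kernel() and profile_event() are hypothetical helpers introduced for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Event IDs taken from the table above. */
enum { SPR_PCER_LD = 7, SPR_PCER_ST = 8, SPR_PCER_JUMP = 9 };

/* ---- Host-side stand-ins (assumed signatures, NOT the real library). ----
 * They model a single physical counter that counts only the enabled event. */
static int      active_id = -1;  /* event selected by perf_enable_id() */
static int      counting  = 0;   /* 1 while the counter is running      */
static uint32_t counter   = 0;

static void     perf_reset(void)       { counter = 0; }
static void     perf_enable_id(int id) { active_id = id; counting = 1; }
static void     perf_stop(void)        { counting = 0; }
static uint32_t cpu_perf_get(int id)   { return id == active_id ? counter : 0; }

/* Hypothetical helper: emits an event the way the core would observe it. */
static void sim_event(int id) {
    if (counting && id == active_id) counter++;
}

/* Toy workload standing in for the real application: 4 loads, 2 stores. */
static void toy_kernel(void) {
    for (int i = 0; i < 4; i++) sim_event(SPR_PCER_LD);
    for (int i = 0; i < 2; i++) sim_event(SPR_PCER_ST);
}

/* Measure one event; the kernel is re-run for every event of interest
 * because only one counter is available. */
static uint32_t profile_event(int id, void (*kernel)(void)) {
    assert(kernel != NULL);
    perf_reset();
    perf_enable_id(id);
    kernel();
    perf_stop();
    return cpu_perf_get(id);
}
```

Calling profile_event(SPR_PCER_LD, toy_kernel) and then profile_event(SPR_PCER_ST, toy_kernel) gives the two counts in two separate runs. On the target, the stand-ins would be replaced by the functions provided with the platform.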

10 Exercise 2 Basic Tests 4/4
Tasks:
- How many kB is the binary?
- How big is the convolution_rect function?
- Profile the motion_detection algorithm:
  - How many instructions are executed?
  - How many load/stores were used?
  - How many cycles were counted?
  - What is the IPC (# instructions per cycle)?
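Once the counter values have been read, the IPC is a plain ratio. A minimal sketch follows; the function names ipc() and mem_fraction() are hypothetical, and the numbers in the usage note are made up, not measurements.

```c
#include <assert.h>
#include <stdint.h>

/* Instructions per cycle from two counter readings. */
static double ipc(uint32_t instructions, uint32_t cycles) {
    assert(cycles > 0);
    return (double)instructions / (double)cycles;
}

/* Share of executed instructions that are loads or stores. */
static double mem_fraction(uint32_t loads, uint32_t stores,
                           uint32_t instructions) {
    assert(instructions > 0);
    return ((double)loads + (double)stores) / (double)instructions;
}
```

For example, 750 instructions in 1000 cycles give ipc(750, 1000) = 0.75. The remaining cycles go to stalls such as the load hazards and delay nops listed in the counter table.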

11 Exercise 3 Benchmarking 1/6
We will benchmark a simple matrix multiplication:
  sw/apps/sequential_tests/matrixmul8/matrixmul.c
  sw/apps/sequential_tests/matrixmul8/matmul_kernels.c
To get quick cycle count feedback, the timer is used. Include timer.h and use the functions:
  reset_timer() start_timer() stop_timer() get_time()
Hardware loops
- Hardware loops are enabled by default
- To prevent the use of hardware loops in your application a flag has to be set (-mnohwloop). Open ../matrixMul8/CMakeLists.txt and remove the flag: -mnohwloop
- If you recompile the application, the flags from CMakeLists.txt will be used for compilation automatically
- The compiler will generate the following hwloop instructions to produce efficient loops:
  - lp.start
  - lp.end
  - lp.count
  - lp.counti
  - lp.setup
  - lp.setupi

12 Exercise 3 Benchmarking 2/6
Tasks:
- Check if hardware loops are generated (in the matrixmul8.read file)
- What speedup do you expect when enabling hardware loops? How many instructions are actually saved? Compare the matrixmul8.read with and w/o hwloops.
- Do your measurements match your estimations?
- How do your results change if you set N, M to a constant? (in matmul8())
    int M = SIZE;
    int N = SIZE;

                                   Execution time (# cycles / % improvement)   Codesize [B]
  Baseline                         -
  Hardware loop (2 register set)

13 Exercise 3 Benchmarking 3/6
Post increment immediate:
- Activated by default!
- Deactivate with -mnopostmod
Post increment register:
- From a hardware perspective, what is the drawback of this instruction?
Multiply-accumulate instruction:
- Old architecture (figure: Old MAC): Accumulation register stored in a special register. Accumulation result can be accessed in two cycles
- New architecture (figure: New MAC): Enabled by default! Accumulates directly on the register file. Disable with -mnomac

14 Exercise 3 Benchmarking 4/6
Vector Instructions:
- Add, sub, comparisons are all supported in vector mode
- It is possible to process in parallel: one word, two halfwords, or four bytes
- Check in the matrixmul.read if vector code is generated.
- Vector instructions have the format: lv.{sub,add,dotp, }
Tasks: Run the matrixmul application with the different compiler options
1. no extensions: -mnohwloop -mnopostmod -mnomac
2. with hardware loops: -mnopostmod -mnomac
3. with post increment: -mnomac
4. with register mac:
Summarize your results in the first table on the next page

15 Exercise 3 Benchmarking 5/6

                     Instructions             Cycles                 Codesize
                     Total   Reduction [%]    Total   Speedup [%]    Total [B]
  Baseline                   -                        -
  +Hardware-loop
  +Post increment
  +mac
  +Dot product

- Use constant values for N, M to get a fair comparison
- What can be done better? Try to improve the matrix multiplication by using dot product operations (see next slide)
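The percentage columns of the table can be filled in with a small helper. This is a sketch with hypothetical function names. Whether the speedup is reported as a percentage or as a factor is a convention the slide leaves open, so both are shown.

```c
#include <assert.h>
#include <stdint.h>

/* Relative saving in percent, e.g. for instruction counts or cycles. */
static double reduction_percent(uint32_t baseline, uint32_t optimized) {
    assert(baseline > 0);
    return 100.0 * ((double)baseline - (double)optimized) / (double)baseline;
}

/* Speedup as a factor: how many times faster the optimized version runs. */
static double speedup_factor(uint32_t baseline_cycles,
                             uint32_t optimized_cycles) {
    assert(optimized_cycles > 0);
    return (double)baseline_cycles / (double)optimized_cycles;
}
```

With made-up numbers, going from 2000 to 1500 cycles is a 25 % reduction, and going from 2000 to 1000 cycles is a speedup factor of 2.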

16 Exercise 3 Benchmarking 6/6
- In order to speed up the multiplication with dot products, we first transpose matrix B (this leads to more efficient access patterns when loading vectors in the multiplication)
- In the second step we can load vectors of 4 chars, and use the Dot-product and Sum of Dot-product instructions to compute one output pixel
- How many cycles are required to compute one output pixel?
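The two steps above (transpose B, then accumulate 4-element dot products) can be sketched in portable C. The helper dotp4() only emulates what a dot-product instruction would compute. Its name, the flat row-major layout and the MAX_N limit are assumptions of this sketch, not the code in matmul_kernels.c.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_N 16  /* assumed upper bound for the local transpose buffer */

/* Portable stand-in for a dot product of two vectors of 4 bytes. */
static int32_t dotp4(const int8_t *a, const int8_t *b) {
    return (int32_t)a[0] * b[0] + (int32_t)a[1] * b[1]
         + (int32_t)a[2] * b[2] + (int32_t)a[3] * b[3];
}

/* C = A * B for n x n row-major byte matrices; n must be a multiple of 4. */
static void matmul_dotp(const int8_t *A, const int8_t *B, int32_t *C, int n) {
    int8_t Bt[MAX_N * MAX_N];
    assert(n > 0 && n <= MAX_N && n % 4 == 0);

    /* Step 1: transpose B so every column of B is contiguous in memory. */
    for (int r = 0; r < n; r++)
        for (int c = 0; c < n; c++)
            Bt[c * n + r] = B[r * n + c];

    /* Step 2: each output element is the sum of n/4 four-element
     * dot products of a row of A with a row of the transposed B. */
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            int32_t acc = 0;
            for (int k = 0; k < n; k += 4)
                acc += dotp4(&A[i * n + k], &Bt[j * n + k]);
            C[i * n + j] = acc;
        }
}
```

After the transpose, both operands of every dot product are four consecutive bytes, so each can be fetched with a single word load.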

17 Exercise 4: Efficient Convolutions (1/4)
- Convolutions are important kernels in image processing
- Convolutions are defined as: [formula not preserved]
- Let us consider a 5x5 window to compute the convolution
- For each output pixel we need 25 multiplications and 24 additions, or 1 multiplication and 24 mac operations
- The Dot product instruction can do 4 multiplications and 3 additions in a single cycle
- Hence, 1 Dot Product and 6 Sum of Dot Product instructions are sufficient
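For reference, one common way to write the 5x5 windowed sum, and the instruction count that follows from it, is given below. The notation (x for the input image, k for the kernel, y for the output) and the index convention are assumptions, since the formula on the slide is not available here.

```latex
% Assumed notation: x = input image, k = 5x5 kernel, y = output image.
y[i,j] = \sum_{m=0}^{4} \sum_{n=0}^{4} k[m,n] \, x[i+m,\, j+n]

% 25 products per output pixel, 4 products per dot-product instruction:
\left\lceil \frac{25}{4} \right\rceil = 7
  = 1\ \text{(dot product)} + 6\ \text{(sum of dot products)}
```

Seven instructions provide 28 multiplier slots, so three of them are left unused, for example by padding with zero coefficients.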

18 Exercise 4: Efficient Convolutions (2/4)
- Look at the code given in (appname = convolution) apps/sequential_tests/convolution/conv_kernels.c
- The 5x5 convolution exists in 2 versions: conv5x5_byte() and conv5x5_scalar(). Check the difference in execution time
- In order to keep the complexity under control we will now look at a 3x3 kernel
- The scalar version conv3x3_scalar() is already functional
- The vector version conv3x3_byte() needs to be completed
Task:
- Compare the two 5x5 convolution kernels
- Complete the 3x3 convolution kernel (see also next slide)

19 Exercise 4: Efficient Convolutions (3/4)
The idea of the vector 3x3 convolution is:
1. Load vectors instead of bytes
2. Process one output pixel in each iteration
3. Use Dotp to maximize the throughput
For each vertical column of the image:
- Initialize the vectors V1, V2
- One iteration:
  - Move V2 -> V1
  - Move V1 -> V0
  - Load V2 (fresh data)
  - Compute the convolution with three dot product instructions
  - Move kernel 1 pixel down
- Switch to next vertical column
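The sliding-vector scheme above can be emulated in portable C. This sketch is not the conv3x3_byte() solution: the v4s type, load_row() and dotp4() are hypothetical stand-ins for vector loads and the dot-product instruction, and the fourth lane is zero-padded because a 3x3 row fills only three of the four bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { int8_t b[4]; } v4s;  /* hypothetical vector of 4 bytes */

/* Emulated dot product of two 4-byte vectors. */
static int32_t dotp4(v4s a, v4s b) {
    int32_t acc = 0;
    for (int i = 0; i < 4; i++) acc += (int32_t)a.b[i] * b.b[i];
    return acc;
}

/* Load three neighbouring bytes; the unused fourth lane is set to zero. */
static v4s load_row(const int8_t *p) {
    v4s v;
    memcpy(v.b, p, 3);
    v.b[3] = 0;
    return v;
}

/* img is h x w (row-major), kernel is 3x3, out is (h-2) x (w-2). */
static void conv3x3_vec(const int8_t *img, int w, int h,
                        const int8_t *kernel, int32_t *out) {
    assert(w >= 3 && h >= 3);
    v4s k0 = load_row(kernel), k1 = load_row(kernel + 3),
        k2 = load_row(kernel + 6);

    for (int x = 0; x + 2 < w; x++) {            /* next vertical column */
        v4s v0;
        v4s v1 = load_row(img + 0 * w + x);      /* initialize V1, V2    */
        v4s v2 = load_row(img + 1 * w + x);
        for (int y = 0; y + 2 < h; y++) {        /* move window down     */
            v0 = v1;                             /* old V1 becomes V0    */
            v1 = v2;                             /* old V2 becomes V1    */
            v2 = load_row(img + (y + 2) * w + x);/* fresh data into V2   */
            out[y * (w - 2) + x] =
                dotp4(v0, k0) + dotp4(v1, k1) + dotp4(v2, k2);
        }
    }
}
```

Only one new row vector is loaded per output pixel. The other two are reused from the previous iteration, which is where the saving in load operations comes from.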

20 Exercise 4: Efficient Convolutions (4/4)
Tasks:
- What speedup do you expect?
- Complete the table using the performance counters
- How many cycles are required to compute one output pixel?

                             Total instructions       Cycles                Load operations
                             Total   Reduction [x]    Total   Speedup [x]   Total   Reduction [x]
  5x5: w/o dot product
  5x5: With dot product
  3x3: w/o dot product
  3x3: With dot product

Discuss your results with an assistant

21 Exercise 5: Motion detection with fast convolution (1/3)
In this exercise we will focus again on the motion detection algorithm:
  apps/sequential_tests/motion_detection/motion_detection.c
- The algorithm performs several image processing steps: dilation, erosion, convolution, etc.
- The computationally heaviest part is the convolution
- It uses a 3x3 convolution with a Sobel filter
- Datatypes are shorts (not bytes!)
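A scalar 3x3 convolution on shorts is a useful reference for checking an optimized version. The sketch below is not the code from motion_detection.c. The function name, the array layout and the choice of the horizontal Sobel kernel are assumptions (the slide does not say which Sobel orientation is used).

```c
#include <assert.h>
#include <stdint.h>

/* Horizontal Sobel kernel (standard coefficients), row-major. */
static const int16_t SOBEL_X[9] = { -1, 0, 1,
                                    -2, 0, 2,
                                    -1, 0, 1 };

/* img is h x w shorts (row-major), out is (h-2) x (w-2). */
static void conv3x3_short(const int16_t *img, int w, int h,
                          const int16_t *kernel, int32_t *out) {
    assert(w >= 3 && h >= 3);
    for (int y = 0; y + 2 < h; y++)
        for (int x = 0; x + 2 < w; x++) {
            int32_t acc = 0;
            /* 9 multiply-accumulate steps per output pixel. */
            for (int m = 0; m < 3; m++)
                for (int n = 0; n < 3; n++)
                    acc += (int32_t)kernel[m * 3 + n]
                         * img[(y + m) * w + (x + n)];
            out[y * (w - 2) + x] = acc;
        }
}
```

With shorts, only two elements fit into one 32-bit vector instead of four, so a vectorized version needs more vectors per kernel row than the byte version.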

22 Exercise 5: Motion detection with fast convolution (2/3)
Tasks:
- Modify the convolution of exercise 4 in order to work with shorts
- See conv_fast.c
Hints:
- Define 5 vectors V0-V4
- Initialize V1-V4 in the beginning of a new column
- Use the shuffle instruction to combine V3 and V4 into V

23 Exercise 5: Motion detection with fast convolution (3/3)
Tasks:
- Complete the table below (use performance counters to get the instructions/load operations)
- How do you expect your performance to change if you increase the image size?
  - You can include the header img_40_40.h to see the difference
  - Runtime will increase! Make sure debug outputs are deactivated!

                               Total instructions       Cycles                Load operations
                               Total   Reduction [%]    Total   Speedup [%]   Total   Reduction [%]
  10x10: w/o dot product
  10x10: With dot product
  40x40: w/o dot product
  40x40: With dot product

24 Exercise 6 RISC-V compressed Instructions (1/2)
In this exercise we are going to use the new RISC-V core.
- Not all instructions have been ported yet
- The core supports 32 bit and compressed 16 bit instructions
Create a build folder for RISC-V:
  $ cd /scratch/soc_xx/build_riscv
Configure the build folder:
  $ cp ~soc_master/2_openrisc/pulpino/sw/cmake_configure.riscv.gcc.sh .
  In the configure script, set the path to your exercise folder:
  PULP_GIT_DIRECTORY= /home/soc_xx/2_openrisc/pulpino
  $ riscv -g2.2.8 ./cmake_configure.riscv.gcc.sh
To switch between compressed and uncompressed instructions, set the RVC flag:
- Set RVC=1 in cmake_configure.riscv.gcc.sh to enable compressed instructions
- Source the configure script again
Compile the RTL:
  $ make vcompile    : compiles Pulpino with the RISC-V core

25 Exercise 6 RISC-V compressed Instructions (2/2)
Coremark is a core comparison benchmark:
- Independent of frequency
- Coremark/MHz score = 10^6 / (#ticks)
- The higher the better
Tasks:
- Run coremark on RISC-V and compute the score (make coremark.vsimc)
- Run coremark with compressed instructions
- Go to the ARM homepage and compare it to your results

  RISC-V          RISC-V (Compressed)   Cortex M0   Cortex M4
  Score   Size    Score   Size          Score       Score
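The score formula above is a one-line computation. In this sketch the function name is hypothetical and the tick count in the usage note is made up. How #ticks is obtained from the benchmark output is left as on the slide.

```c
#include <assert.h>
#include <stdint.h>

/* Coremark/MHz score = 10^6 / (#ticks), as given on the slide. */
static double coremark_per_mhz(uint32_t ticks) {
    assert(ticks > 0);
    return 1.0e6 / (double)ticks;
}
```

For example, 400000 ticks would correspond to a score of 2.5. Fewer ticks give a higher score, which matches "the higher the better".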

26 Questions & Answers
You have successfully completed the exercise.
You can find sample solutions under: (after the exercise) ~soc_master/2_openrisc/solutions
If you are interested in a mini-project we can offer you:
- Implement a program on Pulpino (e.g. a game); use the LCD display of the Zedboard
- Implementation and optimization of a benchmark using the multicore pulp environment (see last exercise about the pulp architecture)
- RISC-V core architecture development. Analysis of: Mini core; VLIW architecture
- We are open to your own ideas!

