DSP Core Instruction Set Architecture Design. Shih-Chieh Chang

Size: px
Start display at page:

Download "DSP Core Instruction Set Architecture Design. Shih-Chieh Chang"

Transcription

1 DSP Core Instruction Set Architecture Design Shih-Chieh Chang

2 Overview of Proposed Architecture Modified Harvard architecture (PM, XDM, YDM) Two parallel instructions per cycle Five-stage pipeline Zero-overhead Loop Dual-ALU and Dual-MAC Sixteen address generation units(agu) Compound Instructions

3 Propose Architecture

4 Experimental Results We have implemented the proposed RISC+DSP core in synthesizable Verilog HDL and targeted towards TSMC 0.35 µm cell library. 100Kgates (ex memory); 120MHz

5 Previous Results Experimental results show that our design gives reasonable computational power and provide higher code density. The results demonstrate that the compound instruction is useful to improve the performance and code density. However, compound instructions make the control logic more complex. = Control logic designed by Dynamic PLAs.

6 Dynamic PLA φ φ 1 a b AND plane O1 P1 P2 P3 P4 φ d1 OR plane O2 O3

7 Dynamic PLAs vs Standard Cell Static Standard Cell Dynamic PLA Performance Tools Capacity Slow, variation in path delay Synthesis/placemen t/routing Limited by tool Fast, all paths delays similar Layout generator, not easy to migrate to new tech. Limited by maximum size.

8 Dynamic PLAs vs Standard Cell Static Standard Cell Dynamic PLA Predictabili ty Unknown delay and area until before routing. Potentially large change with small logic modification. Wire delay is predictable Area and delay can be easily predicted. Very slightly change with logic modification.

9 Dynamic PLAs Previous design examples Dynamic PLA style is adopted in IBM 1 GHz processor [1998] Arrays of dynamic PLAs vs Standard cell style: 15% less delay [2000,Berkeley] Dynamic PlA vs standard cell style: 50% less delay in a real industrial control unit design in a microprocessor

10 Dynamic PLA compiler Architecture selection Floor plan design Leaf Cell design Buffer resizing) Layout generation by Skill code

11 SOP => PLA Layout

12 Compiler Optimizations with DSP-Specific Semantic Descriptions Yung-Chia Lin, NTHU, Taiwan Yuan-Shin Hwang, NTOU, Taiwan Jenq Kuen Lee, NTHU, Taiwan (Also in LCPC 2002)

13 DSPI: Digital Signal Processing Interface The design of Digital Signal Processing Interface ( DSPI ) is to provide: DSP-specific operation semantics Architectural preference for code generation Additional information for directive-based optimizations to work with compilers. DSPI is like the method used in OpenMP and HPF, but specialized in DSP-related domain.

14 Primitive Design of DSPI Using C/C++ standard compiler directive description: #pragma dspi directive-name [clause [,clause] ] Directives are categorized into: Environment Depiction Operation Intention Data Storage Characterization Parallelism Commentary

15 Environment Depiction To specify compiler optimization directives and user control options, like global data format, data precision, To control portability #pragma dspi algorithm_type fxp /* fixed-point computation */ #pragma dspi int_type 32bit /* integer is 32bits fixed-point number */

16 Operation Intention To specify DSPspecific computations To provide multilevel semantics Basic math operators, vector operators, matrix operators, etc. #pragma dspi fxp a:=<16:16> #pragma dspi fxp b:=<16:16> #pragma dspi fxp sadd:=<16:16>,saturated #pragma dspi op sadd:=a.add<b> Int sadd(int a, int b) /* saturated addition of two 32bits fixedpoint number */ { } Operator Examples: Basic Operator.ADD Addition Vector Operator Matrix Operator.CONV.KRONT Convolution Tensor product

17 Data Storage Characterization To provide manipulation knowledge for DSPcustomary data types. To specify accessing features that utilize specific architectural ability, like circular buffer #pragma dspi fxp a:=<24:16> int a; /* 24bits fixed-point number, 16 bits integer part */ #pragma dspi fxp d:=<24:16>,circular int d[160]; /* circular buffer of above type */

18 Parallelism Commentary To provide data parallelism constructs and simultaneous operation semantics To assist compiler in ILP, SIMD, and multiprocessor code generation To match synonymy in OpenMP #pragma dspi parallel_region [shared:var1 ] { } #pragma dspi parallel_for [schedule:=chunksize] for( ) { }

19 On-going Retargetable Compiler Construction We are developing the infrastructure based on SUIF and SPAM. We are working on retargetable code generation with ADL. Source Program Directive Preprocessor SUIF Front-end S U IF IR Architectural Code Generator Low-level IR Architecture-Reform -Interphase S p ecified Transformations Target Machine Specification Optim ization Layers Target Code

20 Preliminary Experiments Due to that the compiler construction is still in progress, we use a semi-manual code generation to evaluate some early experiments. We did only limited transformations of DSPI directives in our two test-suites; one is adopted from DSPstone, the other is the reference G codec from ITU-T. Target processor in this experiment is TI TMS320 C6

21 Experiments of Small Kernels Name conv mat1x3 lms fir iir n_complex matrix multiplication least mean square FIR filter Description convolution IIR filter in biquads complex number computation DSPI.CONV, parallelism, parallelism,.firf, parallelism,.firf, circular buffer,.iirf, circular buffer, complex data,

22 Results Illustration % Benchmark Testsuite 1 (32bit) original DSPI % Benchmark Testsuite 2 (16bit) original DSPI % % Execution Time 80.00% 60.00% 40.00% Execution Time 80.00% 60.00% 40.00% 20.00% 20.00% 0.00% conv mat1x3 lms fir iir 0.00% mat1x3 iir n_complex

23 Experiments of Large Applications G reference codec program is far larger than small kernels in test-suite1. The entire program is loaded into first-level off-core memory and on-core memory is enabled as cache when executing Cycles Program Coder 6.3K Coder 5.3K Decoder 6.3K Decoder 5.3K Original DSPI

24 Results Illustration Cycles Coder 6.3K Coder 5.3K Original DSPI Cycles Decoder 6.3K Decoder 5.3K Original DSPI

25 Decomposition of Instruction Decoder for Low Power Design TingTing Hwang

26 Motivation Execution-frequency of instructions is uneven Decomposition of instruction decoder Most of time, only a small component of decoder is activated

27 Instruction Occurrence Instruction execution frequency ARM 7TDMI instruction set (130 instructions) Statistics from DSPStone ADC(3) ADD(3) AND(3) B(2) BIC(3) CMN(3) CMP(3) EOR(3) LDM(12) LDR(30) MUL(6) MOV(3) MVN(3) ORR(3) RSB(3) RSC(3) SBC(3) STR(18) SUB(3) TEQ(3) TST(3) STM(12) SWP(2) Instruction Group (instruction count)

28 Model for Decomposed Decoder Activate Control Instruction Decoder... FF1 FF2 FF32 Instruction Control FSM Instr. Decoder instruction... Instr. Decoder_1... Instr. Decoder_2 Control Signals Decoder Intermediate code Activate ControlControl Signal Decoder... Intermediate code Other Information Instruction State Registers Control Signals Registers Control Signals Decoder_1 Control Signals Decoder_

29 Methods Instruction Decoder From instruction decode-tree (op-code) Control Signal Decoder From output control signals Intermediate code re-assignment

30 Benchmarking Process Target on ARM 7tdmi Apply to control path Instruction Decoder Control Signal Decoder Power estimated on DSPStone and Powerstone with block activated

31 Results-DSPStone

32 Results-Powerstone

A Bit of History. Program Mem Data Memory. CPU (Central Processing Unit) I/O (Input/Output) Von Neumann Architecture. CPU (Central Processing Unit)

A Bit of History. Program Mem Data Memory. CPU (Central Processing Unit) I/O (Input/Output) Von Neumann Architecture. CPU (Central Processing Unit) Memory COncepts Address Contents Memory is divided into addressable units, each with an address (like an array with indices) Addressable units are usually larger than a bit, typically 8, 16, 32, or 64

More information

Chapter 15. ARM Architecture, Programming and Development Tools

Chapter 15. ARM Architecture, Programming and Development Tools Chapter 15 ARM Architecture, Programming and Development Tools Lesson 4 ARM CPU 32 bit ARM Instruction set 2 Basic Programming Features- ARM code size small than other RISCs 32-bit un-segmented memory

More information

ARM Processors ARM ISA. ARM 1 in 1985 By 2001, more than 1 billion ARM processors shipped Widely used in many successful 32-bit embedded systems

ARM Processors ARM ISA. ARM 1 in 1985 By 2001, more than 1 billion ARM processors shipped Widely used in many successful 32-bit embedded systems ARM Processors ARM Microprocessor 1 ARM 1 in 1985 By 2001, more than 1 billion ARM processors shipped Widely used in many successful 32-bit embedded systems stems 1 2 ARM Design Philosophy hl h Low power

More information

CS 310 Embedded Computer Systems CPUS. Seungryoul Maeng

CS 310 Embedded Computer Systems CPUS. Seungryoul Maeng 1 EMBEDDED SYSTEM HW CPUS Seungryoul Maeng 2 CPUs Types of Processors CPU Performance Instruction Sets Processors used in ES 3 Processors used in ES 4 Processors used in Embedded Systems RISC type ARM

More information

Writing ARM Assembly. Steven R. Bagley

Writing ARM Assembly. Steven R. Bagley Writing ARM Assembly Steven R. Bagley Hello World B main hello DEFB Hello World\n\0 goodbye DEFB Goodbye Universe\n\0 ALIGN main ADR R0, hello ; put address of hello string in R0 SWI 3 ; print it out ADR

More information

STEVEN R. BAGLEY ARM: PROCESSING DATA

STEVEN R. BAGLEY ARM: PROCESSING DATA STEVEN R. BAGLEY ARM: PROCESSING DATA INTRODUCTION CPU gets instructions from the computer s memory Each instruction is encoded as a binary pattern (an opcode) Assembly language developed as a human readable

More information

Chapter 2 Instructions Sets. Hsung-Pin Chang Department of Computer Science National ChungHsing University

Chapter 2 Instructions Sets. Hsung-Pin Chang Department of Computer Science National ChungHsing University Chapter 2 Instructions Sets Hsung-Pin Chang Department of Computer Science National ChungHsing University Outline Instruction Preliminaries ARM Processor SHARC Processor 2.1 Instructions Instructions sets

More information

General Purpose Signal Processors

General Purpose Signal Processors General Purpose Signal Processors First announced in 1978 (AMD) for peripheral computation such as in printers, matured in early 80 s (TMS320 series). General purpose vs. dedicated architectures: Pros:

More information

ECE 571 Advanced Microprocessor-Based Design Lecture 3

ECE 571 Advanced Microprocessor-Based Design Lecture 3 ECE 571 Advanced Microprocessor-Based Design Lecture 3 Vince Weaver http://www.eece.maine.edu/ vweaver vincent.weaver@maine.edu 22 January 2013 The ARM Architecture 1 Brief ARM History ACORN Wanted a chip

More information

Processor Status Register(PSR)

Processor Status Register(PSR) ARM Registers Register internal CPU hardware device that stores binary data; can be accessed much more rapidly than a location in RAM ARM has 13 general-purpose registers R0-R12 1 Stack Pointer (SP) R13

More information

Digital Signal Processor Core Technology

Digital Signal Processor Core Technology The World Leader in High Performance Signal Processing Solutions Digital Signal Processor Core Technology Abhijit Giri Satya Simha November 4th 2009 Outline Introduction to SHARC DSP ADSP21469 ADSP2146x

More information

ARM Cortex-A9 ARM v7-a. A programmer s perspective Part 2

ARM Cortex-A9 ARM v7-a. A programmer s perspective Part 2 ARM Cortex-A9 ARM v7-a A programmer s perspective Part 2 ARM Instructions General Format Inst Rd, Rn, Rm, Rs Inst Rd, Rn, #0ximm 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7

More information

Chapters 3. ARM Assembly. Embedded Systems with ARM Cortext-M. Updated: Wednesday, February 7, 2018

Chapters 3. ARM Assembly. Embedded Systems with ARM Cortext-M. Updated: Wednesday, February 7, 2018 Chapters 3 ARM Assembly Embedded Systems with ARM Cortext-M Updated: Wednesday, February 7, 2018 Programming languages - Categories Interpreted based on the machine Less complex, not as efficient Efficient,

More information

Basic ARM InstructionS

Basic ARM InstructionS Basic ARM InstructionS Instructions include various fields that encode combinations of Opcodes and arguments special fields enable extended functions (more in a minute) several 4-bit OPERAND fields, for

More information

LAB 1 Using Visual Emulator. Download the emulator https://salmanarif.bitbucket.io/visual/downloads.html Answer to questions (Q)

LAB 1 Using Visual Emulator. Download the emulator https://salmanarif.bitbucket.io/visual/downloads.html Answer to questions (Q) LAB 1 Using Visual Emulator Download the emulator https://salmanarif.bitbucket.io/visual/downloads.html Answer to questions (Q) Load the following Program in Visual Emulator ; The purpose of this program

More information

ARM Assembly Language. Programming

ARM Assembly Language. Programming Outline: ARM Assembly Language the ARM instruction set writing simple programs examples Programming hands-on: writing simple ARM assembly programs 2005 PEVE IT Unit ARM System Design ARM assembly language

More information

ECE 471 Embedded Systems Lecture 5

ECE 471 Embedded Systems Lecture 5 ECE 471 Embedded Systems Lecture 5 Vince Weaver http://www.eece.maine.edu/ vweaver vincent.weaver@maine.edu 17 September 2013 HW#1 is due Thursday Announcements For next class, at least skim book Chapter

More information

ARM Instruction Set Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

ARM Instruction Set Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University ARM Instruction Set Architecture Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Condition Field (1) Most ARM instructions can be conditionally

More information

ARM. Assembly Language and Machine Code. Goal: Blink an LED

ARM. Assembly Language and Machine Code. Goal: Blink an LED ARM Assembly Language and Machine Code Goal: Blink an LED Review Turning on an LED Connect LED to GPIO 20 3.3V 1k GND 1 -> 3.3V 0 -> 0.0V (GND) Two Steps 1. Configure GPIO20 to be an OUTPUT 2. "Set" GPIO20

More information

Microprocessors vs. DSPs (ESC-223)

Microprocessors vs. DSPs (ESC-223) Insight, Analysis, and Advice on Signal Processing Technology Microprocessors vs. DSPs (ESC-223) Kenton Williston Berkeley Design Technology, Inc. Berkeley, California USA +1 (510) 665-1600 info@bdti.com

More information

An introduction to DSP s. Examples of DSP applications Why a DSP? Characteristics of a DSP Architectures

An introduction to DSP s. Examples of DSP applications Why a DSP? Characteristics of a DSP Architectures An introduction to DSP s Examples of DSP applications Why a DSP? Characteristics of a DSP Architectures DSP example: mobile phone DSP example: mobile phone with video camera DSP: applications Why a DSP?

More information

ARM. Assembly Language and Machine Code. Goal: Blink an LED

ARM. Assembly Language and Machine Code. Goal: Blink an LED ARM Assembly Language and Machine Code Goal: Blink an LED Memory Map 100000000 16 4 GB Peripheral registers are mapped into address space Memory-Mapped IO (MMIO) MMIO space is above physical memory 020000000

More information

Independent DSP Benchmarks: Methodologies and Results. Outline

Independent DSP Benchmarks: Methodologies and Results. Outline Independent DSP Benchmarks: Methodologies and Results Berkeley Design Technology, Inc. 2107 Dwight Way, Second Floor Berkeley, California U.S.A. +1 (510) 665-1600 info@bdti.com http:// Copyright 1 Outline

More information

ARM Processor. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

ARM Processor. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University ARM Processor Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu CPU Architecture CPU & Memory address Memory data CPU 200 ADD r5,r1,r3 PC ICE3028:

More information

ARM Architecture and Instruction Set

ARM Architecture and Instruction Set AM Architecture and Instruction Set Ingo Sander ingo@imit.kth.se AM Microprocessor Core AM is a family of ISC architectures, which share the same design principles and a common instruction set AM does

More information

Instruction Set Architecture (ISA)

Instruction Set Architecture (ISA) Instruction Set Architecture (ISA) Encoding of instructions raises some interesting choices Tradeoffs: performance, compactness, programmability Uniformity. Should different instructions Be the same size

More information

ARM Instruction Set. Computer Organization and Assembly Languages Yung-Yu Chuang. with slides by Peng-Sheng Chen

ARM Instruction Set. Computer Organization and Assembly Languages Yung-Yu Chuang. with slides by Peng-Sheng Chen ARM Instruction Set Computer Organization and Assembly Languages g Yung-Yu Chuang with slides by Peng-Sheng Chen Introduction The ARM processor is easy to program at the assembly level. (It is a RISC)

More information

ARM Instruction Set. Introduction. Memory system. ARM programmer model. The ARM processor is easy to program at the

ARM Instruction Set. Introduction. Memory system. ARM programmer model. The ARM processor is easy to program at the Introduction ARM Instruction Set The ARM processor is easy to program at the assembly level. (It is a RISC) We will learn ARM assembly programming at the user level l and run it on a GBA emulator. Computer

More information

Separating Reality from Hype in Processors' DSP Performance. Evaluating DSP Performance

Separating Reality from Hype in Processors' DSP Performance. Evaluating DSP Performance Separating Reality from Hype in Processors' DSP Performance Berkeley Design Technology, Inc. +1 (51) 665-16 info@bdti.com Copyright 21 Berkeley Design Technology, Inc. 1 Evaluating DSP Performance! Essential

More information

Configurable and Extensible Processors Change System Design. Ricardo E. Gonzalez Tensilica, Inc.

Configurable and Extensible Processors Change System Design. Ricardo E. Gonzalez Tensilica, Inc. Configurable and Extensible Processors Change System Design Ricardo E. Gonzalez Tensilica, Inc. Presentation Overview Yet Another Processor? No, a new way of building systems Puts system designers in the

More information

Vector Architectures Vs. Superscalar and VLIW for Embedded Media Benchmarks

Vector Architectures Vs. Superscalar and VLIW for Embedded Media Benchmarks Vector Architectures Vs. Superscalar and VLIW for Embedded Media Benchmarks Christos Kozyrakis Stanford University David Patterson U.C. Berkeley http://csl.stanford.edu/~christos Motivation Ideal processor

More information

ECE 498 Linux Assembly Language Lecture 5

ECE 498 Linux Assembly Language Lecture 5 ECE 498 Linux Assembly Language Lecture 5 Vince Weaver http://www.eece.maine.edu/ vweaver vincent.weaver@maine.edu 29 November 2012 Clarifications from Lecture 4 What is the Q saturate status bit? Some

More information

04 - DSP Architecture and Microarchitecture

04 - DSP Architecture and Microarchitecture September 11, 2014 Conclusions - Instruction set design An assembly language instruction set must be more efficient than Junior Accelerations shall be implemented at arithmetic and algorithmic levels.

More information

Lecture 15 ARM Processor A RISC Architecture

Lecture 15 ARM Processor A RISC Architecture CPE 390: Microprocessor Systems Fall 2017 Lecture 15 ARM Processor A RISC Architecture Bryan Ackland Department of Electrical and Computer Engineering Stevens Institute of Technology Hoboken, NJ 07030

More information

Modern Processor Architectures. L25: Modern Compiler Design

Modern Processor Architectures. L25: Modern Compiler Design Modern Processor Architectures L25: Modern Compiler Design The 1960s - 1970s Instructions took multiple cycles Only one instruction in flight at once Optimisation meant minimising the number of instructions

More information

Design of Embedded DSP Processors Unit 2: Design basics. 9/11/2017 Unit 2 of TSEA H1 1

Design of Embedded DSP Processors Unit 2: Design basics. 9/11/2017 Unit 2 of TSEA H1 1 Design of Embedded DSP Processors Unit 2: Design basics 9/11/2017 Unit 2 of TSEA26-2017 H1 1 ASIP/ASIC design flow We need to have the flow in mind, so that we will know what we are talking about in later

More information

Architectures & instruction sets R_B_T_C_. von Neumann architecture. Computer architecture taxonomy. Assembly language.

Architectures & instruction sets R_B_T_C_. von Neumann architecture. Computer architecture taxonomy. Assembly language. Architectures & instruction sets Computer architecture taxonomy. Assembly language. R_B_T_C_ 1. E E C E 2. I E U W 3. I S O O 4. E P O I von Neumann architecture Memory holds data and instructions. Central

More information

CS 310 Embedded Computer Systems CPUS. Seungryoul Maeng

CS 310 Embedded Computer Systems CPUS. Seungryoul Maeng 1 EMBEDDED SYSTEM HW CPUS Seungryoul Maeng 2 CPUs Types of Processors CPU Performance Instruction Sets Processors used in ES 3 Processors Single Purpose ( Hardware ) General Purpose ( Software ) Application

More information

ARM Assembly Programming

ARM Assembly Programming Introduction ARM Assembly Programming The ARM processor is very easy to program at the assembly level. (It is a RISC) We will learn ARM assembly programming at the user level and run it on a GBA emulator.

More information

Advance CPU Design. MMX technology. Computer Architectures. Tien-Fu Chen. National Chung Cheng Univ. ! Basic concepts

Advance CPU Design. MMX technology. Computer Architectures. Tien-Fu Chen. National Chung Cheng Univ. ! Basic concepts Computer Architectures Advance CPU Design Tien-Fu Chen National Chung Cheng Univ. Adv CPU-0 MMX technology! Basic concepts " small native data types " compute-intensive operations " a lot of inherent parallelism

More information

Vertex Shader Design I

Vertex Shader Design I The following content is extracted from the paper shown in next page. If any wrong citation or reference missing, please contact ldvan@cs.nctu.edu.tw. I will correct the error asap. This course used only

More information

Modern Processor Architectures (A compiler writer s perspective) L25: Modern Compiler Design

Modern Processor Architectures (A compiler writer s perspective) L25: Modern Compiler Design Modern Processor Architectures (A compiler writer s perspective) L25: Modern Compiler Design The 1960s - 1970s Instructions took multiple cycles Only one instruction in flight at once Optimisation meant

More information

3. The Instruction Set

3. The Instruction Set 3. The Instruction Set We now know what the ARM provides by way of memory and registers, and the sort of instructions to manipulate them.this chapter describes those instructions in great detail. As explained

More information

18-349: Introduction to Embedded Real- Time Systems Lecture 3: ARM ASM

18-349: Introduction to Embedded Real- Time Systems Lecture 3: ARM ASM 18-349: Introduction to Embedded Real- Time Systems Lecture 3: ARM ASM Anthony Rowe Electrical and Computer Engineering Carnegie Mellon University Lecture Overview Exceptions Overview (Review) Pipelining

More information

Developing an environment for embedded software energy estimation

Developing an environment for embedded software energy estimation Computer Standards & Interfaces 28 (25) 15 158 www.elsevier.com/locate/csi Developing an environment for embedded software energy estimation S. Nikolaidis a, A. Chatzigeorgiou b, T. Laopoulos a, T a Department

More information

Chapter 15. ARM Architecture, Programming and Development Tools

Chapter 15. ARM Architecture, Programming and Development Tools Chapter 15 ARM Architecture, Programming and Development Tools Lesson 5 ARM 16-bit Thumb Instruction Set 2 - Thumb 16 bit subset Better code density than 32-bit architecture instruction set 3 Basic Programming

More information

Better sharc data such as vliw format, number of kind of functional units

Better sharc data such as vliw format, number of kind of functional units Better sharc data such as vliw format, number of kind of functional units Pictures of pipe would help Build up zero overhead loop example better FIR inner loop in coldfire Mine more material from bsdi.com

More information

Research Progress on Compilers for DSP Cores with Specifications

Research Progress on Compilers for DSP Cores with Specifications Research Progress on Compilers for DSP Cores with Specifications Ching-ren Lee, Jenq-Kuen Lee crlee@pllab.cs.nthu.edu.tw, jklee@cs.nthu.edu.tw Programming Language Research Lab. Department of Computer

More information

CSE 410. Operating Systems

CSE 410. Operating Systems CSE 410 Operating Systems Handout: syllabus 1 Today s Lecture Course organization Computing environment Overview of course topics 2 Course Organization Course website http://www.cse.msu.edu/~cse410/ Syllabus

More information

ARM Assembly Language

ARM Assembly Language ARM Assembly Language Introduction to ARM Basic Instruction Set Microprocessors and Microcontrollers Course Isfahan University of Technology, Dec. 2010 1 Main References The ARM Architecture Presentation

More information

Enabling the design of multicore SoCs with ARM cores and programmable accelerators

Enabling the design of multicore SoCs with ARM cores and programmable accelerators Enabling the design of multicore SoCs with ARM cores and programmable accelerators Target Compiler Technologies www.retarget.com Sol Bergen-Bartel China Business Development 03 Target Compiler Technologies

More information

Embedded C for High Performance DSP Programming with the CoSy Compiler Development System

Embedded C for High Performance DSP Programming with the CoSy Compiler Development System Embedded C for High Performance DSP Programming with the CoSy Compiler Development System Marcel Beemster/Yoichi Sugiyama ACE Associated Compiler Experts/Japan Novel Corporation contact: yo_sugi@jnovel.co.jp

More information

REAL TIME DIGITAL SIGNAL PROCESSING

REAL TIME DIGITAL SIGNAL PROCESSING REAL TIME DIGITAL SIGNAL PROCESSING UTN-FRBA 2010 Introduction Why Digital? A brief comparison with analog. Advantages Flexibility. Easily modifiable and upgradeable. Reproducibility. Don t depend on components

More information

Worksheet #4 Condition Code Flag and Arithmetic Operations

Worksheet #4 Condition Code Flag and Arithmetic Operations Name: Student ID: Date: Name: Student ID: Objectives Worksheet #4 Condition Code Flag and Arithmetic Operations To understand the arithmetic operations of numeric values To comprehend the usage of arithmetic

More information

C6000 Compiler Roadmap

C6000 Compiler Roadmap C6000 Compiler Roadmap CGT v7.4 CGT v7.3 CGT v7. CGT v8.0 CGT C6x v8. CGT Longer Term In Development Production Early Adopter Future CGT v7.2 reactive Current 3H2 4H 4H2 H H2 Future CGT C6x v7.3 Control

More information

Impact of Source-Level Loop Optimization on DSP Architecture Design

Impact of Source-Level Loop Optimization on DSP Architecture Design Impact of Source-Level Loop Optimization on DSP Architecture Design Bogong Su Jian Wang Erh-Wen Hu Andrew Esguerra Wayne, NJ 77, USA bsuwpc@frontier.wilpaterson.edu Wireless Speech and Data Nortel Networks,

More information

Introduction to Digital Logic Missouri S&T University CPE 2210 Hardware Implementations

Introduction to Digital Logic Missouri S&T University CPE 2210 Hardware Implementations Introduction to Digital Logic Missouri S&T University CPE 2210 Hardware Implementations Egemen K. Çetinkaya Egemen K. Çetinkaya Department of Electrical & Computer Engineering Missouri University of Science

More information

An Optimizing Compiler for the TMS320C25 DSP Chip

An Optimizing Compiler for the TMS320C25 DSP Chip An Optimizing Compiler for the TMS320C25 DSP Chip Wen-Yen Lin, Corinna G Lee, and Paul Chow Published in Proceedings of the 5th International Conference on Signal Processing Applications and Technology,

More information

The ARM processor. Morgan Kaufman ed Overheads for Computers as Components

The ARM processor. Morgan Kaufman ed Overheads for Computers as Components The ARM processor Born in Acorn on 1983, after the success achieved by the BBC Micro released on 1982. Acorn is a really smaller company than most of the USA competitors, therefore it initially develops

More information

Typical DSP application

Typical DSP application DSP markets DSP markets Typical DSP application TI DSP History: Modem applications 1982 TMS32010, TI introduces its first programmable general-purpose DSP to market Operating at 5 MIPS. It was ideal for

More information

Evolution of Computers & Microprocessors. Dr. Cahit Karakuş

Evolution of Computers & Microprocessors. Dr. Cahit Karakuş Evolution of Computers & Microprocessors Dr. Cahit Karakuş Evolution of Computers First generation (1939-1954) - vacuum tube IBM 650, 1954 Evolution of Computers Second generation (1954-1959) - transistor

More information

VLSI Signal Processing

VLSI Signal Processing VLSI Signal Processing Programmable DSP Architectures Chih-Wei Liu VLSI Signal Processing Lab Department of Electronics Engineering National Chiao Tung University Outline DSP Arithmetic Stream Interface

More information

Pipelining, Branch Prediction, Trends

Pipelining, Branch Prediction, Trends Pipelining, Branch Prediction, Trends 10.1-10.4 Topics 10.1 Quantitative Analyses of Program Execution 10.2 From CISC to RISC 10.3 Pipelining the Datapath Branch Prediction, Delay Slots 10.4 Overlapping

More information

MNEMONIC OPERATION ADDRESS / OPERAND MODES FLAGS SET WITH S suffix ADC

MNEMONIC OPERATION ADDRESS / OPERAND MODES FLAGS SET WITH S suffix ADC ECE425 MNEMONIC TABLE MNEMONIC OPERATION ADDRESS / OPERAND MODES FLAGS SET WITH S suffix ADC Adds operands and Carry flag and places value in destination register ADD Adds operands and places value in

More information

Evaluation of Static and Dynamic Scheduling for Media Processors. Overview

Evaluation of Static and Dynamic Scheduling for Media Processors. Overview Evaluation of Static and Dynamic Scheduling for Media Processors Jason Fritts Assistant Professor Department of Computer Science Co-Author: Wayne Wolf Overview Media Processing Present and Future Evaluation

More information

EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design

EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown

More information

Preliminary Performance Evaluation of Application Kernels using ARM SVE with Multiple Vector Lengths

Preliminary Performance Evaluation of Application Kernels using ARM SVE with Multiple Vector Lengths Preliminary Performance Evaluation of Application Kernels using ARM SVE with Multiple Vector Lengths Y. Kodama, T. Odajima, M. Matsuda, M. Tsuji, J. Lee and M. Sato RIKEN AICS (Advanced Institute for Computational

More information

ARM Processors for Embedded Applications

ARM Processors for Embedded Applications ARM Processors for Embedded Applications Roadmap for ARM Processors ARM Architecture Basics ARM Families AMBA Architecture 1 Current ARM Core Families ARM7: Hard cores and Soft cores Cache with MPU or

More information

Digital Electronics & Computer Engineering (E85) Harris Spring 2018

Digital Electronics & Computer Engineering (E85) Harris Spring 2018 Digital Electronics & Computer Engineering (E85) Harris Spring 2018 Final Exam This is a closed-book take-home exam. Electronic devices including calculators are not allowed, except on the computer question

More information

Lode DSP Core. Features. Overview

Lode DSP Core. Features. Overview Features Two multiplier accumulator units Single cycle 16 x 16-bit signed and unsigned multiply - accumulate 40-bit arithmetic logical unit (ALU) Four 40-bit accumulators (32-bit + 8 guard bits) Pre-shifter,

More information

Distributed Vision Processing in Smart Camera Networks

Distributed Vision Processing in Smart Camera Networks Distributed Vision Processing in Smart Camera Networks CVPR-07 Hamid Aghajan, Stanford University, USA François Berry, Univ. Blaise Pascal, France Horst Bischof, TU Graz, Austria Richard Kleihorst, NXP

More information

VLIW DSP Processor Design for Mobile Communication Applications. Contents crafted by Dr. Christian Panis Catena Radio Design

VLIW DSP Processor Design for Mobile Communication Applications. Contents crafted by Dr. Christian Panis Catena Radio Design VLIW DSP Processor Design for Mobile Communication Applications Contents crafted by Dr. Christian Panis Catena Radio Design Agenda Trends in mobile communication Architectural core features with significant

More information

Lecture 4 (part 2): Data Transfer Instructions

Lecture 4 (part 2): Data Transfer Instructions Lecture 4 (part 2): Data Transfer Instructions CSE 30: Computer Organization and Systems Programming Diba Mirza Dept. of Computer Science and Engineering University of California, San Diego Assembly Operands:

More information

DSP Platforms Lab (AD-SHARC) Session 05

DSP Platforms Lab (AD-SHARC) Session 05 University of Miami - Frost School of Music DSP Platforms Lab (AD-SHARC) Session 05 Description This session will be dedicated to give an introduction to the hardware architecture and assembly programming

More information

CPU1. D $, 16-K Dual Ported South UPA

CPU1. D $, 16-K Dual Ported South UPA MAJC-5200: A High Performance Microprocessor for Multimedia Computing Subramania Sudharsanan Sun Microsystems, Inc., Palo Alto, CA 94303, USA Abstract. The newly introduced Microprocessor Architecture

More information

VE7104/INTRODUCTION TO EMBEDDED CONTROLLERS UNIT III ARM BASED MICROCONTROLLERS

VE7104/INTRODUCTION TO EMBEDDED CONTROLLERS UNIT III ARM BASED MICROCONTROLLERS VE7104/INTRODUCTION TO EMBEDDED CONTROLLERS UNIT III ARM BASED MICROCONTROLLERS Introduction to 32 bit Processors, ARM Architecture, ARM cortex M3, 32 bit ARM Instruction set, Thumb Instruction set, Exception

More information

IMAGINE: Signal and Image Processing Using Streams

IMAGINE: Signal and Image Processing Using Streams IMAGINE: Signal and Image Processing Using Streams Brucek Khailany William J. Dally, Scott Rixner, Ujval J. Kapasi, Peter Mattson, Jinyung Namkoong, John D. Owens, Brian Towles Concurrent VLSI Architecture

More information

Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware-Assisted Virtualization

Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware-Assisted Virtualization Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware-Assisted Virtualization Mian-Muhammad Hamayun, Frédéric Pétrot and Nicolas Fournel System Level Synthesis

More information

ENGN1640: Design of Computing Systems Topic 03: Instruction Set Architecture Design

ENGN1640: Design of Computing Systems Topic 03: Instruction Set Architecture Design ENGN1640: Design of Computing Systems Topic 03: Instruction Set Architecture Design Professor Sherief Reda http://scale.engin.brown.edu School of Engineering Brown University Spring 2016 1 ISA is the HW/SW

More information

Simultaneous OPC- and CMP-Aware Routing Based on Accurate Closed-Form Modeling

Simultaneous OPC- and CMP-Aware Routing Based on Accurate Closed-Form Modeling Simultaneous OPC- and CMP-Aware Routing Based on Accurate Closed-Form Modeling Shao-Yun Fang, Chung-Wei Lin, Guang-Wan Liao, and Yao-Wen Chang March 26, 2013 Graduate Institute of Electronics Engineering

More information

ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design

ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University

More information

A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications

A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications Ju-Ho Sohn, Jeong-Ho Woo, Min-Wuk Lee, Hye-Jung Kim, Ramchan Woo, Hoi-Jun Yoo Semiconductor System

More information

The ARM10 Family of Advanced Microprocessor Cores

The ARM10 Family of Advanced Microprocessor Cores The ARM10 Family of Advanced Microprocessor Cores Stephen Hill ARM Austin Design Center 1 Agenda Design overview Microarchitecture ARM10 o o Memory System Interrupt response 3. Power o o 4. VFP10 ETM10

More information

Processors. Young W. Lim. May 12, 2016

Processors. Young W. Lim. May 12, 2016 Processors Young W. Lim May 12, 2016 Copyright (c) 2016 Young W. Lim. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version

More information

Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference

Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference The 2017 IEEE International Symposium on Workload Characterization Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference Shin-Ying Lee

More information

ARM Cortex-M4 Architecture and Instruction Set 2: General Data Processing Instructions

ARM Cortex-M4 Architecture and Instruction Set 2: General Data Processing Instructions ARM Cortex-M4 Architecture and Instruction Set 2: General Data Processing Instructions M J Brockway January 31, 2016 Cortex-M4 Machine Instructions - simple example... main FUNCTION ; initialize registers

More information

DUE to the high computational complexity and real-time

DUE to the high computational complexity and real-time IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 15, NO. 3, MARCH 2005 445 A Memory-Efficient Realization of Cyclic Convolution and Its Application to Discrete Cosine Transform Hun-Chen

More information

CprE 488 Embedded Systems Design. Lecture 3 Processors and Memory

CprE 488 Embedded Systems Design. Lecture 3 Processors and Memory CprE 488 Embedded Systems Design Lecture 3 Processors and Memory Joseph Zambreno Electrical and Computer Engineering Iowa State University www.ece.iastate.edu/~zambreno rcl.ece.iastate.edu Although computer

More information

Hi Hsiao-Lung Chan, Ph.D. Dept Electrical Engineering Chang Gung University, Taiwan

Hi Hsiao-Lung Chan, Ph.D. Dept Electrical Engineering Chang Gung University, Taiwan ARM Programmers Model Hi Hsiao-Lung Chan, Ph.D. Dept Electrical Engineering Chang Gung University, Taiwan chanhl@maili.cgu.edu.twcgu Current program status register (CPSR) Prog Model 2 Data processing

More information

Benchmarking Processors for DSP Applications

Benchmarking Processors for DSP Applications Insight, Analysis, and Advice on Signal Processing Technology Benchmarking Processors for DSP Applications Berkeley Design Technology, Inc. 2107 Dwight Way, Second Floor Berkeley, California 94704 USA

More information

Basic Computer Architecture

Basic Computer Architecture Basic Computer Architecture CSCE 496/896: Embedded Systems Witawas Srisa-an Review of Computer Architecture Credit: Most of the slides are made by Prof. Wayne Wolf who is the author of the textbook. I

More information

One instruction specifies multiple operations All scheduling of execution units is static

One instruction specifies multiple operations All scheduling of execution units is static VLIW Architectures Very Long Instruction Word Architecture One instruction specifies multiple operations All scheduling of execution units is static Done by compiler Static scheduling should mean less

More information

On the Portability and Performance of Message-Passing Programs on Embedded Multicore Platforms

On the Portability and Performance of Message-Passing Programs on Embedded Multicore Platforms On the Portability and Performance of Message-Passing Programs on Embedded Multicore Platforms Shih-Hao Hung, Po-Hsun Chiu, Chia-Heng Tu, Wei-Ting Chou and Wen-Long Yang Graduate Institute of Networking

More information

MetaRTL: Raising the Abstraction Level of RTL Design

MetaRTL: Raising the Abstraction Level of RTL Design MetaRTL: Raising the Abstraction Level of RTL Design Jianwen Zhu Electrical and Computer Engineering University of Toronto March 16, 2001 zhu@eecg.toronto.edu http://www.eecg.toronto.edu/ zhu DATE 2001,

More information

The ARM Instruction Set

The ARM Instruction Set The ARM Instruction Set Minsoo Ryu Department of Computer Science and Engineering Hanyang University msryu@hanyang.ac.kr Topics Covered Data Processing Instructions Branch Instructions Load-Store Instructions

More information

Versal: AI Engine & Programming Environment

Versal: AI Engine & Programming Environment Engineering Director, Xilinx Silicon Architecture Group Versal: Engine & Programming Environment Presented By Ambrose Finnerty Xilinx DSP Technical Marketing Manager October 16, 2018 MEMORY MEMORY MEMORY

More information

F28HS Hardware-Software Interface: Systems Programming

F28HS Hardware-Software Interface: Systems Programming F28HS Hardware-Software Interface: Systems Programming Hans-Wolfgang Loidl School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh Semester 2 2017/18 0 No proprietary software has

More information

EN164: Design of Computing Systems Lecture 24: Processor / ILP 5

EN164: Design of Computing Systems Lecture 24: Processor / ILP 5 EN164: Design of Computing Systems Lecture 24: Processor / ILP 5 Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University

More information

Using Intel Streaming SIMD Extensions for 3D Geometry Processing

Using Intel Streaming SIMD Extensions for 3D Geometry Processing Using Intel Streaming SIMD Extensions for 3D Geometry Processing Wan-Chun Ma, Chia-Lin Yang Dept. of Computer Science and Information Engineering National Taiwan University firebird@cmlab.csie.ntu.edu.tw,

More information

Evaluating Inter-cluster Communication in Clustered VLIW Architectures

Evaluating Inter-cluster Communication in Clustered VLIW Architectures Evaluating Inter-cluster Communication in Clustered VLIW Architectures Anup Gangwar Embedded Systems Group, Department of Computer Science and Engineering, Indian Institute of Technology Delhi September

More information