DSP Core Instruction Set Architecture Design. Shih-Chieh Chang
- Prosper Maxwell
1 DSP Core Instruction Set Architecture Design Shih-Chieh Chang
2 Overview of Proposed Architecture
- Modified Harvard architecture (PM, XDM, YDM)
- Two parallel instructions per cycle
- Five-stage pipeline
- Zero-overhead loop
- Dual ALU and dual MAC
- Sixteen address generation units (AGUs)
- Compound instructions
3 Proposed Architecture
4 Experimental Results
- We implemented the proposed RISC+DSP core in synthesizable Verilog HDL, targeting the TSMC 0.35 µm cell library.
- 100K gates (excluding memory); 120 MHz
5 Previous Results
- Experimental results show that our design gives reasonable computational power and provides higher code density.
- The results demonstrate that compound instructions improve both performance and code density.
- However, compound instructions make the control logic more complex.
- Consequently, the control logic is designed with dynamic PLAs.
6 Dynamic PLA
[Figure: dynamic PLA schematic with clock φ, inputs a and b, an AND plane with product terms P1 to P4, and an OR plane with outputs O1 to O3]
7 Dynamic PLAs vs Standard Cell
- Performance. Static standard cell: slow, with variation in path delay. Dynamic PLA: fast, with all path delays similar.
- Tools. Static standard cell: synthesis, placement, and routing. Dynamic PLA: layout generator, not easy to migrate to a new technology.
- Capacity. Static standard cell: limited by the tool. Dynamic PLA: limited by maximum size.
8 Dynamic PLAs vs Standard Cell
- Predictability. Static standard cell: delay and area are unknown until routing, and a small logic modification can cause a large change. Dynamic PLA: wire delay is predictable, area and delay can be easily predicted, and a logic modification changes them only slightly.
9 Dynamic PLAs: Previous Design Examples
- The dynamic PLA style was adopted in the IBM 1 GHz processor [1998].
- Arrays of dynamic PLAs vs standard cell style: 15% less delay [2000, Berkeley].
- Dynamic PLA vs standard cell style: 50% less delay in a real industrial control unit design in a microprocessor.
10 Dynamic PLA Compiler
- Architecture selection
- Floor plan design
- Leaf cell design
- Buffer resizing
- Layout generation by SKILL code
11 SOP => PLA Layout
12 Compiler Optimizations with DSP-Specific Semantic Descriptions Yung-Chia Lin, NTHU, Taiwan Yuan-Shin Hwang, NTOU, Taiwan Jenq Kuen Lee, NTHU, Taiwan (Also in LCPC 2002)
13 DSPI: Digital Signal Processing Interface
The Digital Signal Processing Interface (DSPI) is designed to provide:
- DSP-specific operation semantics
- Architectural preferences for code generation
- Additional information that lets directive-based optimizations work with compilers
DSPI resembles the directive approach used in OpenMP and HPF, but is specialized for the DSP domain.
14 Primitive Design of DSPI
DSPI uses the standard C/C++ compiler directive form:
#pragma dspi directive-name [clause [,clause] ]
Directives are categorized into:
- Environment Depiction
- Operation Intention
- Data Storage Characterization
- Parallelism Commentary
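As an illustration of the directive form above, a directive preprocessor first has to recognize `#pragma dspi` lines and pull out the directive name. The following is a minimal sketch only: the helper `dspi_directive_name` and its buffer sizes are hypothetical and are not taken from the DSPI implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of DSPI itself.
 * Parses "#pragma dspi directive-name [clause[,clause]]" and copies the
 * directive name into `name` (capacity `cap`, including the terminator).
 * Returns 1 for a DSPI directive, 0 for anything else. */
int dspi_directive_name(const char *line, char *name, size_t cap)
{
    char tag[16];
    char buf[64];

    assert(line != NULL && name != NULL && cap > 0);

    /* Two whitespace-delimited words must follow "#pragma". */
    if (sscanf(line, " #pragma %15s %63s", tag, buf) != 2)
        return 0;
    if (strcmp(tag, "dspi") != 0)   /* e.g. "#pragma omp ..." */
        return 0;
    if (strlen(buf) >= cap)         /* name would not fit */
        return 0;
    strcpy(name, buf);
    return 1;
}
```

Called on `#pragma dspi algorithm_type fxp`, the helper reports the directive name `algorithm_type`; clause parsing is left out to keep the sketch short.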
15 Environment Depiction
- To specify compiler optimization directives and user control options, such as global data format and data precision
- To control portability
#pragma dspi algorithm_type fxp /* fixed-point computation */
#pragma dspi int_type 32bit /* integer is a 32-bit fixed-point number */
16 Operation Intention
- To specify DSP-specific computations
- To provide multilevel semantics: basic math operators, vector operators, matrix operators, etc.
#pragma dspi fxp a:=<16:16>
#pragma dspi fxp b:=<16:16>
#pragma dspi fxp sadd:=<16:16>,saturated
#pragma dspi op sadd:=a.add<b>
int sadd(int a, int b) /* saturated addition of two 32-bit fixed-point numbers */ { }
Operator examples:
- Basic operator: .ADD (addition)
- Vector operator: .CONV (convolution)
- Matrix operator: .KRONT (tensor product)
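The slide leaves the body of `sadd` empty. For readers unfamiliar with saturated arithmetic, the sketch below shows one common way to write it in portable C. It assumes that `<16:16>` means 16 integer bits and 16 fractional bits in a 32-bit word; the names `sadd_q16_16` and `q16_16_from_int` are hypothetical, and this is not claimed to be the code the DSPI compiler generates.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed format: Q16.16, i.e. value = raw / 65536. */
int32_t q16_16_from_int(int whole)
{
    assert(whole >= -32768 && whole <= 32767);
    return (int32_t)whole * 65536;
}

/* Saturated addition: clamp to the 32-bit range instead of wrapping. */
int32_t sadd_q16_16(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;  /* cannot overflow in 64 bits */

    if (sum > INT32_MAX)
        return INT32_MAX;
    if (sum < INT32_MIN)
        return INT32_MIN;
    return (int32_t)sum;
}
```

On a DSP with a saturating adder the whole function maps to a single instruction, which is the kind of knowledge the `saturated` clause is meant to pass to the compiler.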
17 Data Storage Characterization
- To provide manipulation knowledge for DSP-customary data types
- To specify access features that use specific architectural capabilities, such as circular buffers
#pragma dspi fxp a:=<24:16>
int a; /* 24-bit fixed-point number, 16-bit integer part */
#pragma dspi fxp d:=<24:16>,circular
int d[160]; /* circular buffer of the above type */
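Without hardware support, a circular buffer like `d[160]` is usually emulated in C with modulo indexing, roughly as sketched below. The type `circ_buf` and the functions `circ_push` and `circ_read` are hypothetical names chosen for illustration; a DSP address generation unit would typically perform the wrap-around at no extra cost, which is what the `circular` clause is meant to expose.

```c
#include <assert.h>
#include <stddef.h>

#define CIRC_LEN 160            /* matches `int d[160]` on the slide */

typedef struct {
    int data[CIRC_LEN];
    size_t head;                /* index of the next slot to write */
} circ_buf;

/* Stores a sample and advances the write index with wrap-around. */
void circ_push(circ_buf *cb, int sample)
{
    cb->data[cb->head] = sample;
    cb->head = (cb->head + 1) % CIRC_LEN;
}

/* Reads the sample written `age` pushes ago (age 0 is the newest). */
int circ_read(const circ_buf *cb, size_t age)
{
    assert(age < CIRC_LEN);
    return cb->data[(cb->head + CIRC_LEN - 1 - age) % CIRC_LEN];
}
```

A zero-initialized `circ_buf` is a valid empty buffer; after more than 160 pushes the oldest samples are overwritten.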
18 Parallelism Commentary
- To provide data parallelism constructs and simultaneous operation semantics
- To assist the compiler in ILP, SIMD, and multiprocessor code generation
- To mirror the equivalent constructs in OpenMP
#pragma dspi parallel_region [shared:var1 ] { }
#pragma dspi parallel_for [schedule:=chunksize] for( ) { }
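For comparison, the OpenMP construct that `parallel_for` mirrors looks like the sketch below. This is plain OpenMP, not DSPI; the function `vec_add` and the chunk size of 4 are hypothetical choices for illustration. The directive is guarded so the code also builds as ordinary sequential C when OpenMP is not enabled.

```c
#include <assert.h>

/* Element-wise vector addition. The iterations are independent, so a
 * compiler may distribute them across cores or SIMD lanes. */
void vec_add(const int *x, const int *y, int *out, int n)
{
    int i;

    assert(n >= 0);

#ifdef _OPENMP
    /* Static schedule: iterations are handed out in chunks of 4. */
#pragma omp parallel for schedule(static, 4)
#endif
    for (i = 0; i < n; ++i)
        out[i] = x[i] + y[i];
}
```

The result is the same with or without the directive; only the mapping of iterations to execution resources changes.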
19 On-going Retargetable Compiler Construction
- We are developing the infrastructure based on SUIF and SPAM.
- We are working on retargetable code generation with an ADL.
[Figure: compiler flow. Blocks: Source Program; Directive Preprocessor; SUIF Front-end; SUIF IR; Architectural Code Generator; Low-level IR; Architecture-Reform-Interphase Specified Transformations; Target Machine Specification; Optimization Layers; Target Code]
20 Preliminary Experiments
- Because the compiler construction is still in progress, we use semi-manual code generation for these early experiments.
- We applied only limited transformations of DSPI directives to our two test suites: one is adopted from DSPstone, the other is the reference G codec from ITU-T.
- The target processor in this experiment is the TI TMS320 C6
21 Experiments of Small Kernels
Name: description; DSPI features used
- conv: convolution; .CONV, parallelism
- mat1x3: matrix multiplication; parallelism
- lms: least mean square; .firf, parallelism
- fir: FIR filter; .firf, circular buffer
- iir: IIR filter in biquads; .iirf, circular buffer
- n_complex: complex number computation; complex data
22 Results Illustration
[Figure: two bar charts of execution time (%), original vs DSPI. Benchmark Testsuite 1 (32bit): conv, mat1x3, lms, fir, iir. Benchmark Testsuite 2 (16bit): mat1x3, iir, n_complex]
23 Experiments of Large Applications
- The G reference codec program is far larger than the small kernels in test suite 1.
- The entire program is loaded into first-level off-core memory, and on-core memory is enabled as cache during execution.
[Table: cycle counts for Coder 6.3K, Coder 5.3K, Decoder 6.3K, and Decoder 5.3K, Original vs DSPI]
24 Results Illustration
[Figure: two charts of cycle counts, Original vs DSPI. Left: Coder 6.3K and Coder 5.3K. Right: Decoder 6.3K and Decoder 5.3K]
25 Decomposition of Instruction Decoder for Low Power Design TingTing Hwang
26 Motivation
- The execution frequency of instructions is uneven.
- Decompose the instruction decoder.
- Most of the time, only a small component of the decoder is activated.
27 Instruction Occurrence
- Instruction execution frequency
- ARM7TDMI instruction set (130 instructions)
- Statistics from DSPStone
[Figure: execution frequency by instruction group. Groups (instruction count): ADC(3), ADD(3), AND(3), B(2), BIC(3), CMN(3), CMP(3), EOR(3), LDM(12), LDR(30), MUL(6), MOV(3), MVN(3), ORR(3), RSB(3), RSC(3), SBC(3), STR(18), SUB(3), TEQ(3), TST(3), STM(12), SWP(2)]
28 Model for Decomposed Decoder
[Figure: block diagram. Labels: Activate Control; Instruction Control FSM; Instruction Decoder (Instr. Decoder_1, Instr. Decoder_2, ...); Intermediate code; Other Information; Control Signals Decoder (Control Signals Decoder_1, ...); Instruction State Registers; Control Signals Registers; FF1, FF2, ..., FF32]
29 Methods
- Instruction decoder: from the instruction decode tree (op-code)
- Control signal decoder: from the output control signals
- Intermediate code re-assignment
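The payoff of the decomposition can be estimated from an instruction trace: if the frequently executed groups are assigned to one small sub-decoder, the share of decodes that touch only that sub-decoder shows how often the rest can stay idle. The sketch below is a simplified model under that assumption; the group numbering, `NUM_GROUPS`, and `hot_decoder_share` are hypothetical and do not reproduce the authors' power estimation flow.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_GROUPS = 4 };    /* hypothetical number of instruction groups */

/* Returns the fraction of decoded instructions handled by the small
 * ("hot") sub-decoder. trace[i] is the group of the i-th instruction;
 * hot[g] is nonzero when group g is assigned to the hot sub-decoder. */
double hot_decoder_share(const int *trace, size_t n, const int hot[NUM_GROUPS])
{
    size_t hits = 0;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; ++i) {
        assert(trace[i] >= 0 && trace[i] < NUM_GROUPS);
        if (hot[trace[i]])
            ++hits;
    }
    return (double)hits / (double)n;
}
```

With a trace in which group 0 accounts for five of eight instructions and only group 0 marked hot, the function returns 0.625, meaning the larger sub-decoder would need to be activated for the remaining 37.5% of decodes.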
30 Benchmarking Process
- Target: ARM7TDMI
- Applied to the control path: instruction decoder and control signal decoder
- Power estimated on DSPStone and Powerstone with block activation
31 Results-DSPStone
32 Results-Powerstone