An introduction to DSPs
Examples of DSP applications. Why a DSP? Characteristics of a DSP. Architectures.


DSP example: mobile phone

DSP example: mobile phone with video camera

DSP: applications

Why a DSP?
It's easy: we want an architecture optimized for digital signal processing. Some versions are further optimized for specific applications, e.g. very low power consumption for mobile phones.

What is the difference between a DSP and a general-purpose processor? (1/4)
Memory architecture and buses
The first processors (in the 1940s) had a Harvard architecture: separate memories for program and data. But it is complex, so it was soon replaced by the von Neumann architecture: program and data share the same memory, with no real distinction between them (an instruction has two fields: operation and data).
Problem: the processor cannot access instructions and data simultaneously.
To improve performance: the Harvard architecture again! In particular:
- separate memories and buses for program and data,
- possibly, another separate bus for DMA.
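Why separate buses help can be seen with a rough count of memory accesses per filter tap. The sketch below is a simplified model, not taken from any datasheet: it assumes one tap costs one instruction fetch plus two data reads (coefficient and sample), that each bus carries one access per cycle, and that accesses on different buses overlap perfectly. The function names and the two-data-bus variant are illustrative assumptions.

```c
#include <stddef.h>

/* Memory accesses needed for one FIR tap (one MAC instruction). */
enum { FETCHES_PER_TAP = 1, DATA_READS_PER_TAP = 2 };

/* Each bus carries at most one access per cycle and different buses work
 * in parallel, so the busiest bus sets the cycles needed per tap. */
static int busiest_bus(const int *accesses_per_bus, size_t n_buses)
{
    int worst = 0;
    for (size_t i = 0; i < n_buses; i++)
        if (accesses_per_bus[i] > worst)
            worst = accesses_per_bus[i];
    return worst;
}

/* von Neumann: instructions and data share a single bus. */
int cycles_von_neumann(void)
{
    const int bus[1] = { FETCHES_PER_TAP + DATA_READS_PER_TAP };
    return busiest_bus(bus, 1);
}

/* Harvard: one program bus and one data bus. */
int cycles_harvard(void)
{
    const int bus[2] = { FETCHES_PER_TAP, DATA_READS_PER_TAP };
    return busiest_bus(bus, 2);
}

/* Hypothetical variant: program bus plus two data buses
 * (one for coefficients, one for samples). */
int cycles_harvard_dual_data(void)
{
    const int bus[3] = { FETCHES_PER_TAP, 1, 1 };
    return busiest_bus(bus, 3);
}
```

Under these assumptions the per-tap cost drops from 3 cycles (shared bus) to 2 (Harvard) and 1 (two data buses), which is the motivation for the extra buses.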

What is the difference between a DSP and a general-purpose processor? (2/4)
A DSP is often used to implement a linear filter. In discrete time, the convolution integral becomes a sum:

$$y[n] = \sum_{i} x[n-i]\, h_i$$

- if the number of terms is finite: FIR (finite impulse response) filter,
- otherwise: IIR (infinite impulse response) filter,
- which can be realized using two finite sums:

$$y[n] = \sum_{i=0}^{M} x[n-i]\, b_i + \sum_{i=1}^{N} y[n-i]\, a_i$$
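Both sums translate directly into loops of multiply-accumulate operations. The following is a minimal floating-point sketch of the two equations (direct form), assuming samples before x[0] and y[0] are zero; the function names are illustrative. It keeps the plus sign on the feedback term exactly as written above, whereas many textbooks write that term with a minus sign.

```c
#include <stddef.h>

/* FIR: y[n] = sum_{i=0}^{taps-1} x[n-i] * h[i]; samples before x[0] are 0. */
void fir_filter(const double *x, double *y, size_t len,
                const double *h, size_t taps)
{
    for (size_t n = 0; n < len; n++) {
        double acc = 0.0;                    /* accumulator for the MAC chain */
        for (size_t i = 0; i < taps && i <= n; i++)
            acc += x[n - i] * h[i];          /* one multiply-accumulate */
        y[n] = acc;
    }
}

/* IIR (direct form I):
 * y[n] = sum_{i=0}^{M} x[n-i] * b[i] + sum_{i=1}^{N} y[n-i] * a[i].
 * a[0] is never read, so the array indices match the equation. */
void iir_filter(const double *x, double *y, size_t len,
                const double *b, size_t M, const double *a, size_t N)
{
    for (size_t n = 0; n < len; n++) {
        double acc = 0.0;
        for (size_t i = 0; i <= M && i <= n; i++)
            acc += x[n - i] * b[i];          /* feed-forward (finite) sum */
        for (size_t i = 1; i <= N && i <= n; i++)
            acc += y[n - i] * a[i];          /* feedback on past outputs */
        y[n] = acc;
    }
}
```

Fed with a unit impulse, the FIR output is just the coefficients h followed by zeros, while a single feedback coefficient a[1] = 0.5 already produces the never-ending response 1, 0.5, 0.25, ... that gives IIR filters their name.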

What is the difference between a DSP and a general-purpose processor? (3/4)
A common operation in an FIR or IIR filter is A = B·C + D, so we need:
- a hardware multiplier (introduced in DSPs in the 1970s),
- a multiply-and-accumulate in a single clock cycle: the MAC instruction.
The MAC actually sits inside a loop, so we also need a zero-overhead loop:
- hardware for address generation (memory is accessed in regular, predictable patterns, not randomly),
- loop management,
- auto-increment and circular addressing.
Other possible hardware:
- saturation,
- instructions to perform a division quickly,
- bit-reversed addressing for the FFT.
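The features listed above can be emulated in portable C to show what the hardware does for free. This is a sketch under stated assumptions, not DSP vendor code: it assumes 16-bit Q15 fixed-point data, a hypothetical 4-tap filter, a 64-bit accumulator standing in for the DSP's wide accumulator, and modulo arithmetic on an index standing in for a circular-addressing register. `fir_q15_step`, `sat16` and `bit_reverse` are illustrative names.

```c
#include <stddef.h>
#include <stdint.h>

#define TAPS 4   /* hypothetical filter length, for illustration only */

/* Saturate to the int16_t range instead of letting the value wrap around. */
static int16_t sat16(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Delay line kept in a circular buffer: `head` plays the role of an
 * address register with modulo (circular) addressing. */
typedef struct {
    int16_t delay[TAPS];
    size_t head;                 /* index of the newest sample */
} fir_state;

/* Push one Q15 input sample and return one Q15 output sample. */
int16_t fir_q15_step(fir_state *s, const int16_t h[TAPS], int16_t in)
{
    s->head = (s->head + 1) % TAPS;          /* circular increment */
    s->delay[s->head] = in;                  /* overwrite the oldest sample */

    int64_t acc = 0;                         /* wide accumulator */
    size_t idx = s->head;
    for (size_t i = 0; i < TAPS; i++) {
        acc += (int32_t)h[i] * s->delay[idx];    /* MAC: Q15 x Q15 = Q30 */
        idx = (idx + TAPS - 1) % TAPS;           /* step back, wrapping at 0 */
    }
    /* Q30 -> Q15; >> on a negative value is an arithmetic shift on common
     * compilers (implementation-defined in ISO C). */
    return sat16(acc >> 15);
}

/* Reverse the low `bits` bits of i: the index permutation an FFT needs. */
unsigned bit_reverse(unsigned i, unsigned bits)
{
    unsigned r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1u);   /* move the lowest bit of i into r */
        i >>= 1;
    }
    return r;
}
```

On a DSP, the modulo index update, the saturation and the bit reversal are handled by the address generator and the ALU, so the inner loop reduces to one MAC per cycle. In C each of them costs extra instructions, which is exactly the overhead the dedicated hardware removes.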

What is the difference between a DSP and a general-purpose processor? (4/4)
Other possible features:
- data are often 16 or 8 bits wide (e.g., audio or images): a 32-bit ALU can be split into two 16-bit ALUs or four 8-bit ALUs -> 2 or 4 operations in parallel,
- several ALUs working in parallel,
- fixed-point ALUs, or 16-bit ALUs, to reduce power consumption and cost,
- optimized versions:
  - cost: for consumer applications,
  - power: for mobile applications,
  - for specific applications, e.g. electric motor control.
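A splittable ALU breaks the carry chain at lane boundaries. The same effect can be imitated in software ("SIMD within a register"), which makes the idea concrete. The sketch below is this software trick, not a description of how DSP hardware implements it: it adds four unsigned 8-bit lanes packed into 32-bit words, wrapping modulo 256 in each lane. The function name is illustrative.

```c
#include <stdint.h>

/* Add four unsigned 8-bit lanes packed in 32-bit words, wrapping modulo 256
 * in each lane, with no carry propagating from one lane into the next. */
uint32_t add_u8x4(uint32_t a, uint32_t b)
{
    const uint32_t low7 = 0x7F7F7F7Fu;   /* low 7 bits of every byte */
    const uint32_t high = 0x80808080u;   /* bit 7 of every byte */
    /* With bit 7 cleared, each byte sum fits in 8 bits: no cross-lane carry. */
    uint32_t sum = (a & low7) + (b & low7);
    /* Bit 7 of each lane is the XOR of both top bits and the carry into it;
     * the carry out of the lane is simply dropped. */
    return sum ^ ((a ^ b) & high);
}
```

For example, adding 0x01010101 to 0x01FF7F80 gives 0x02008081: the 0xFF lane wraps to 0x00 without disturbing its neighbours, which is what four independent 8-bit ALUs would produce in one operation.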

Example: C30 (Texas Instruments TMS320C30, late 1980s)

Example: FIR filter using a C30

Note: several of these features, which originated in DSPs, have since been adopted by general-purpose processors. For example, the cache in the Pentium processor is Harvard-like (separate instruction and data caches).

Another example: several units working in parallel, and splittable ALUs (see the MMX extensions) in the Pentium 4 processor.

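Pipeline
Example of a 4-stage pipeline (TI C30): each instruction takes 4 clock cycles to execute, but (normally) the next instruction can enter the pipeline just 1 cycle after the previous one (its data are needed only 3 cycles later). Once the pipeline is full, one instruction completes every cycle.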
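The throughput gain can be checked with the usual timing formula for an ideal pipeline: with k stages and no stalls, n instructions finish in k + n - 1 cycles instead of n·k. The sketch below assumes exactly that ideal case (no hazards, one instruction entering per cycle) and also prints the occupancy diagram; the function names are illustrative.

```c
#include <stdio.h>

/* Ideal k-stage pipeline, no stalls: instruction i (0-based) occupies
 * stage s (0-based) during cycle i + s. */
unsigned long pipeline_cycles(unsigned long n_instr, unsigned stages)
{
    return n_instr == 0 ? 0 : stages + n_instr - 1;
}

/* Same work without pipelining: each instruction uses all stages alone. */
unsigned long sequential_cycles(unsigned long n_instr, unsigned stages)
{
    return n_instr * stages;
}

/* Print which instruction occupies each stage in each cycle. */
void print_pipeline(unsigned n_instr, unsigned stages)
{
    unsigned total = (unsigned)pipeline_cycles(n_instr, stages);
    for (unsigned c = 0; c < total; c++) {
        printf("cycle %2u:", c + 1);
        for (unsigned s = 0; s < stages; s++) {
            /* the instruction in stage s during cycle c is number c - s */
            if (c >= s && c - s < n_instr)
                printf(" I%-2u", c - s + 1);
            else
                printf("  - ");
        }
        printf("\n");
    }
}
```

With 4 stages and 10 instructions this gives 13 cycles instead of 40; for long instruction streams the cost approaches one cycle per instruction, as stated above.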

Pipeline: branch (e.g. on the C30)
Standard branch: the pipeline is flushed so that the PC is handled correctly -> 4 cycles.
Delayed branch: the pipeline is not flushed; the 3 instructions following the branch are fetched and executed before the PC is modified -> only 1 cycle needed!

        BRD   label   ; delayed branch
        MPYF          ; executed
        ADDF          ; executed
        SUBF          ; executed
        AND           ; not executed
label   MPYF          ; fetched after SUBF
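The execution order of the listing above can be reproduced with a toy interpreter that only models the control flow of a delayed branch: after `BRD`, the next 3 instructions run, and only then does the PC take the target. This is a sketch, not a C30 simulator; the `instr` type, `run_delayed` and the assumption that no second branch appears inside the delay slots are all illustrative simplifications.

```c
#include <stddef.h>
#include <string.h>

#define DELAY_SLOTS 3   /* instructions executed after BRD before the jump */

/* Toy instruction: a mnemonic and, for BRD, the index of its target. */
typedef struct {
    const char *op;
    int target;         /* used only when op is "BRD" */
} instr;

/* Run `prog` with delayed-branch semantics, recording the indices of the
 * executed instructions in `trace` (at most max_steps of them).
 * Assumes no branch is placed inside a delay slot.
 * Returns the number of executed instructions. */
size_t run_delayed(const instr *prog, size_t len, size_t *trace, size_t max_steps)
{
    size_t pc = 0, steps = 0;
    int pending = -1;       /* branch target waiting to be taken, or -1 */
    int slots_left = 0;     /* delay slots still to execute */

    while (pc < len && steps < max_steps) {
        trace[steps++] = pc;
        if (strcmp(prog[pc].op, "BRD") == 0) {
            pending = prog[pc].target;   /* remember the target ... */
            slots_left = DELAY_SLOTS;    /* ... but keep fetching in order */
            pc++;
            continue;
        }
        pc++;
        if (pending >= 0 && --slots_left == 0) {
            pc = (size_t)pending;        /* the PC is modified only now */
            pending = -1;
        }
    }
    return steps;
}
```

For the program BRD, MPYF, ADDF, SUBF, AND, label: MPYF, the trace is indices 0, 1, 2, 3, 5: the three delay-slot instructions run and AND is skipped, matching the comments in the listing. The compiler or programmer must find useful instructions to fill the slots; otherwise NOPs go there and the gain is lost.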

Two architectures
To exploit instruction-level parallelism (ILP), there are two possible architectures:
- Superscalar: the parallelism is managed dynamically by the hardware,
- Very Long Instruction Word (VLIW): the parallelism is managed statically by the compiler.
What is the problem? Data or control dependences can generate conflicts:
- on data: an instruction needs the result of a previous instruction, but the result is not ready yet,
- on control: a conditional jump, but the condition is not ready yet.
-> pipeline stall
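A data conflict can be made concrete by counting stall cycles. The sketch below assumes an in-order, single-issue pipeline in which a result becomes usable a fixed number of cycles (`latency`) after its instruction issues, and registers are numbered 0 to 31; it models only read-after-write data hazards, not control hazards. The `op` type and `schedule_in_order` are illustrative names.

```c
#include <stddef.h>

#define NO_REG  (-1)
#define N_REGS  32      /* assumption: registers numbered 0..31 */

/* Hypothetical 3-operand instruction: dst = f(src1, src2). */
typedef struct {
    int dst, src1, src2;
    int latency;        /* cycles until the result can be used */
} op;

/* Each instruction issues one cycle after the previous one unless a source
 * register is still being produced (a data hazard), in which case it stalls.
 * Returns the issue cycle of the last instruction and stores the total
 * number of stall cycles in *stalls. */
long schedule_in_order(const op *code, size_t n, long *stalls)
{
    long ready[N_REGS] = {0};   /* cycle from which each register is valid */
    long cycle = -1;
    *stalls = 0;
    for (size_t i = 0; i < n; i++) {
        long issue = cycle + 1;
        const int srcs[2] = { code[i].src1, code[i].src2 };
        for (int k = 0; k < 2; k++)
            if (srcs[k] != NO_REG && ready[srcs[k]] > issue)
                issue = ready[srcs[k]];          /* wait for the operand */
        *stalls += issue - (cycle + 1);
        cycle = issue;
        if (code[i].dst != NO_REG)
            ready[code[i].dst] = issue + code[i].latency;
    }
    return cycle;
}
```

With R1 = R2·R3 (latency 3) immediately followed by R4 = R1 + R5, the second instruction stalls for 2 cycles. Moving an independent instruction between them leaves only 1 stall cycle: this reordering is what superscalar hardware does dynamically and what a VLIW compiler must do statically.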

Superscalar
The analysis of independent instructions is done dynamically by the hardware (which is complex!). Instructions can be executed out of order; their completion (commit) is then done in order, so that the state of the CPU is updated correctly.
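The split between out-of-order execution and in-order commit can be shown with a small timing model. It is an idealized sketch, not a description of any real core: it assumes unlimited functional units (an instruction starts as soon as its single producer has finished), that each instruction depends on at most one earlier one, and that at most one instruction commits per cycle, strictly in program order, as with a reorder buffer. `uop` and `ooo_model` are illustrative names.

```c
#include <stddef.h>

#define NO_DEP (-1)

/* Hypothetical instruction: depends on at most one earlier instruction. */
typedef struct {
    int dep;            /* index of the producer (must be < own index), or NO_DEP */
    long latency;       /* execution cycles */
} uop;

/* done[i]:   cycle at which instruction i finishes executing (any order).
 * commit[i]: cycle at which it updates the architectural state (in order,
 *            at most one commit per cycle). */
void ooo_model(const uop *prog, size_t n, long *done, long *commit)
{
    for (size_t i = 0; i < n; i++) {
        long start = prog[i].dep == NO_DEP ? 0 : done[prog[i].dep];
        done[i] = start + prog[i].latency;                /* out of order */
        long earliest = i == 0 ? done[i] : commit[i - 1] + 1;
        commit[i] = done[i] > earliest ? done[i] : earliest;  /* in order */
    }
}
```

With a 4-cycle multiply first and a short independent instruction second, the second one finishes at cycle 1 but commits only at cycle 5, after the multiply: execution is out of order, while the visible state of the CPU still changes in program order.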

VLIW
Very Long Instruction Word (VLIW): the parallelism is managed statically by the compiler.
The analysis of independent instructions is performed statically, during compilation:
- the instructions that can be executed in parallel are assembled into long instructions and sent, in order, to the various functional units.
A convenient solution for DSP programs (fixed-length loops, few conditional operations); less convenient for general-purpose applications.
Simpler hardware! But a specific compilation is needed for each platform.
Deterministic behaviour -> execution times can be computed exactly.
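The compiler's job of "assembling independent instructions into long instructions" can be sketched with a deliberately simple greedy packer. This is an illustrative stand-in for a real VLIW scheduler, which would also reorder operations and model unit types and latencies: it keeps program order and starts a new bundle when the current one is full or when an operation reads or writes a register written earlier in the same bundle. `vop` and `pack_bundles` are hypothetical names.

```c
#include <stddef.h>

#define MAX_SRC 2
#define NO_REG  (-1)

/* Hypothetical operation: dst = f(src[0], src[1]). */
typedef struct {
    int dst;
    int src[MAX_SRC];
} vop;

/* Greedy in-order packing into bundles of at most `width` operations.
 * bundle_of[i] receives the bundle index of operation i.
 * Returns the number of bundles, i.e. the issue cycles, known at compile
 * time (ignoring memory stalls). */
size_t pack_bundles(const vop *ops, size_t n, size_t width, size_t *bundle_of)
{
    size_t bundles = 0, in_bundle = 0, first = 0;  /* first op of current bundle */
    for (size_t i = 0; i < n; i++) {
        int conflict = (in_bundle == width);       /* bundle already full */
        for (size_t j = first; j < i && !conflict; j++) {
            int w = ops[j].dst;                    /* written in this bundle */
            if (w == NO_REG) continue;
            if (ops[i].dst == w) conflict = 1;     /* two writes to one register */
            for (int k = 0; k < MAX_SRC; k++)
                if (ops[i].src[k] == w) conflict = 1;   /* needs a result not yet produced */
        }
        if (i == 0 || conflict) {                  /* open a new bundle */
            bundles++;
            first = i;
            in_bundle = 0;
        }
        bundle_of[i] = bundles - 1;
        in_bundle++;
    }
    return bundles;
}
```

Because the bundles are fixed before the program runs, the cycle count is known from the compiler output alone, which is the deterministic behaviour mentioned above. The price is that the packing depends on the bundle width and unit mix, hence the need to recompile for each platform.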