An introduction to Digital Signal Processors (DSP) Using the C55xx family


Group status (~2 minutes each)
Five groups stand up. What processor(s) are you using? Wireless? If so, what technologies/chips are you using? If not, what is your primary communication bus/scheme? What roadblocks, concerns, or issues are you currently worried about?
We're going to jump back to switching regulators later. Bring slides for next time. I wanted to get through all of DSP/specialized processors today.

There are different kinds of embedded processors
There are a fair number of different kinds of microprocessors used in embedded systems.
Microcontrollers: small, fairly simple devices with non-volatile storage and generally a fair bit of basic I/O (GPIO, SPI, etc.).
Processors: more or less a desktop processor with favorable power numbers (Atom, ARM A8, etc.).
Systems on a chip: generally more CPU power than a microcontroller, plus lots of add-ons, perhaps including analog I/O and specialized devices (Ethernet controller, LCD controller, FPGA, etc.).

Digital Signal Processor (DSP)
DSP chips are optimized for high performance and low power on very specific types of computation.
Power: the C5515 hits 22 mW at 100 MHz. The quoted figures are 0.22 mW/MHz at 100 or 120 MHz and 0.15 mW/MHz at 60 or 75 MHz.
Tasks: filtering and FFTs are the big ones.

Fixed point vs. floating point
A reasonable way to break down DSPs is into those with floating point and those without (fixed point). Floating point makes things a lot easier for the programmer. With fixed point, a good DSP programmer can often get better power numbers, but it can be a ton of work.

Basic fixed point
Qn is a naming scheme used to describe fixed-point numbers: n is the number of bits to the right of the radix point, so a normal integer is Q0.
Examples: 0110 is 6 in binary; 0110 as a Q2 is 01.10, which is 1.5. Numbers are generally two's complement: 1100 is -4, and 1100 as a Q3 is 1.100, which is -0.5.
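
The Qn examples above can be checked with a few lines of C. This is only a sketch: the helper names `sign_extend` and `q_to_double` are made up for illustration and are not from the slides or any TI library.

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of a word, e.g. the 4-bit pattern 1100.
   Assumes 1 <= bits <= 31. */
int32_t sign_extend(uint32_t word, unsigned bits)
{
    uint32_t sign = 1u << (bits - 1);   /* weight of the sign bit */
    word &= (sign << 1) - 1u;           /* keep only the low `bits` bits */
    return (int32_t)(word ^ sign) - (int32_t)sign;
}

/* Interpret a two's-complement integer as a Qn value: n bits lie to the
   right of the radix point, so the real value is raw / 2^n.
   Assumes n <= 31. */
double q_to_double(int32_t raw, unsigned n)
{
    return (double)raw / (double)(1u << n);
}
```

For instance, `q_to_double(sign_extend(0x6, 4), 2)` gives 1.5 and `q_to_double(sign_extend(0xC, 4), 3)` gives -0.5, matching the slide.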

Factoids
Signed x-bit Q(x-1) numbers represent values from -1 to (almost) 1. This is the form typically used because two numbers in that range multiplied by each other are still in that range (the one exception is -1 times -1, which gives +1 and has to be saturated). Multiplying two 16-bit Q15 numbers yields?
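
A sketch of the answer in C: the product of two 16-bit Q15 values has 30 fractional bits, so it is a Q30 number held in 32 bits. The function names below are hypothetical, and the right shift of a negative value is assumed to be arithmetic, as it is on common compilers.

```c
#include <stdint.h>

/* Q15 x Q15 -> Q30: 15 + 15 fractional bits, held in a 32-bit word. */
int32_t q15_mul_q30(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* Round a Q30 value back to Q15 and saturate to the 16-bit range.
   Adding 0x4000 (half of 2^15) rounds instead of truncating. */
int16_t q30_to_q15(int32_t p)
{
    int32_t r = (p + 0x4000) >> 15;   /* assumes arithmetic right shift */
    if (r > 32767)  r = 32767;        /* -1 * -1 would otherwise give +1 */
    if (r < -32768) r = -32768;
    return (int16_t)r;
}
```

With 0.5 stored as 16384, `q15_mul_q30(16384, 16384)` is 268435456 (0.25 in Q30) and `q30_to_q15` turns that into 8192 (0.25 in Q15).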

And this is important

Lowpass filter template (figure not reproduced)

FIR filter
The basic idea is to take an input, x, and put it into a big (and wide) shift register. Multiply each of the x values (old and new) by some constant, then sum up those product terms:
$y[n] = \sum_{k=0}^{M} b_k \, x[n-k]$
Example: say b_0 = 0.5, b_1 = 0.75, and b_2 = 0.25, and x is 1, -1, 0, 1, -1, 0, etc. forever. What is the output?
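
The FIR sum can be worked through directly in floating-point C. This is a minimal sketch, not the slides' implementation; the name `fir_direct` is made up, and samples before the start of the input are assumed to be zero.

```c
#include <stddef.h>

/* Direct-form FIR: y[n] = sum over k of b[k] * x[n-k],
   with x[m] taken as 0 for m < 0 (an assumption about start-up). */
void fir_direct(const double *b, size_t taps,
                const double *x, double *y, size_t len)
{
    for (size_t n = 0; n < len; n++) {
        double acc = 0.0;
        /* Stop at k == n so we never read before the start of x. */
        for (size_t k = 0; k < taps && k <= n; k++)
            acc += b[k] * x[n - k];
        y[n] = acc;
    }
}
```

With b = {0.5, 0.75, 0.25} and x = {1, -1, 0, 1, -1, 0}, the output is 0.5, 0.25, -0.5, 0.25, 0.25, -0.5. After the first two start-up samples it settles into the repeating pattern -0.5, 0.25, 0.25.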

Automatic tools: Matlab
The figure directly above is from Matlab. You specify the various parameters (f_pass, f_stop, p, A_p, A_s, etc.) and it will generate the b_k values needed. If the parameters are too demanding, the filter could be huge (500+ z^-1 blocks). In that case, we may want to use an IIR filter. Those are feedback-based and are a bit more touchy (prone to being unstable, etc.).

Consider a traditional RISC CPU
For a reasonably large filter, the coefficient array b doesn't fit in the register file.
    top: LD   x++
         LD   b++
         MULT a, x, b
         ADD  accum, accum, a
         goto top
(++ indicates auto-increment.)
That's a lot of instructions, plus we need to shift the x values around, and there is also a loop. Depending on how you count it, this could be 8-10 instructions per z^-1 block.

Some FIR tricks
The most obvious is to use a circular buffer for the x values (the slide shows a six-entry buffer with slots 0 through 5). The problem with this is that you need more instructions to see whether you've fallen off the end of the buffer and need to wrap around. And it's a branch, which is mildly annoying due to predictors, etc.

A slightly different version
(The slide shows the X and B arrays side by side, each with slots 0 through 5.)
    Int16 FIR(Uint16 i)
    {
        Int32 sum;
        Uint16 j, index;
        sum = 0;
        // The actual filter work
        for (j = 0; j < lpl; j++) {
            // This part is icky: wrap the index around the circular buffer.
            if (i >= j)
                index = i - j;
            else
                index = ASIZE + i - j;
            sum += (Int32)in[index] * (Int32)LP[j];
        }
        sum = sum + 0x00004000;    // So we round rather than truncate.
        return (Int16)(sum >> 15); // Conversion from 32 Q30 to 16 Q15.
    }
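
One common software workaround for the wraparound branch, not taken from the slides, is to make the buffer length a power of two and mask the index instead of testing it. The sketch below assumes illustrative values for `ASIZE`, `LPL`, and the coefficients, and the names `in_buf` and `fir_q15_masked` are made up.

```c
#include <stdint.h>

#define ASIZE 8u   /* circular buffer length; must be a power of two here */
#define LPL   3u   /* number of taps (illustrative) */

static int16_t in_buf[ASIZE];                          /* Q15 input samples */
static const int16_t LP[LPL] = {16384, 8192, 8192};    /* 0.5, 0.25, 0.25   */

/* i is the index of the newest sample in in_buf. */
int16_t fir_q15_masked(uint16_t i)
{
    int32_t sum = 0;
    for (uint16_t j = 0; j < LPL; j++) {
        /* i - j may go negative; converting to uint16_t and masking with
           ASIZE - 1 wraps it back into 0..ASIZE-1 without a branch. */
        uint16_t index = (uint16_t)(i - j) & (ASIZE - 1u);
        sum += (int32_t)in_buf[index] * (int32_t)LP[j];
    }
    sum += 0x00004000;            /* round rather than truncate */
    return (int16_t)(sum >> 15);  /* Q30 in 32 bits -> Q15 in 16 bits */
}
```

The trade-off is that the buffer may have to be larger than the filter needs, since its length is rounded up to a power of two.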

How fast could one do it?
Well, I suppose we could try one instruction:
    MAC y, x++, z++
That's got lots of problems. No register use for the arrays, so very heavy memory use: 2 data elements from memory/cache and 3 register file changes (pointers, accumulator). Plus we need to do a MAC, and mults are already slow, which hurts the clock period. Plus we need to worry about wrapping around in the circular buffer. Oh yeah, and we need to know when to stop.

Data
I need a lot of ports to memory: instruction fetch plus 2 data elements. I need a lot of ports to the register file, or at least banked registers.

C55xx data buses (cont.)
Twelve independent buses: three data read buses, two data write buses, five data address buses, one program read bus, and one program address bus. So yeah, we can move data. Registers appear to go on the same buses; registers are memory mapped.

OK, so data seems doable
Well, sort of. We're still worried about updating pointers: with 2 data reads, 1 data write, and 2 pointer updates, we're running out of buses.

MAC?
Most CPUs don't have a multiply-and-accumulate instruction. It's too slow and hurts the clock period, so unless we use the MAC a LOT, it hurts. But for a DSP this is our bread and butter, so we'll take the 10% clock period hit (or whatever) so we don't have to use two separate instructions.

Wrapping around?
Seems possible. Imagine a fairly smart memory. You can tell it the address to start at, the end-of-buffer address, and the start-of-buffer address. It knows enough to be able to generate the next address, even with wraparound. This also takes care of our pointer problem.

Circular Buffer Start Address Registers (BSA01, BSA23, BSA45, BSA67, BSAC)
The CPU includes five 16-bit circular buffer start address registers. Each buffer start address register is associated with a particular pointer. A buffer start address is added to the pointer only when the pointer is configured for circular addressing in status register ST2_55.

Circular Buffer Size Registers (BK03, BK47, BKC)
Three 16-bit circular buffer size registers specify the number of words (up to 65535) in a circular buffer. Each buffer size register is associated with particular pointers. In the TMS320C54x-compatible mode (C54CM = 1), BK03 is used for all the auxiliary registers, and BK47 is not used.
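
A loose software model of what the start-address and size registers buy us is sketched below. It is only an illustration of the idea, not a register-accurate description of the C55x: the struct `circ_ptr` and function `circ_post_inc` are hypothetical names, and the hardware does this work alongside the memory access rather than as separate instructions.

```c
#include <stdint.h>

/* Hypothetical model of one circular pointer. */
typedef struct {
    uint16_t start;   /* buffer start address (the role of a BSAxx register) */
    uint16_t size;    /* buffer length in words (the role of a BKxx register) */
    uint16_t index;   /* pointer value, kept as an offset within the buffer  */
} circ_ptr;

/* Return the effective address for this access, then post-increment the
   pointer, wrapping to the start of the buffer when it runs off the end. */
uint16_t circ_post_inc(circ_ptr *p)
{
    uint16_t addr = (uint16_t)(p->start + p->index);  /* start is added in */
    p->index++;
    if (p->index >= p->size)
        p->index = 0;                                 /* wrap around */
    return addr;
}
```

Starting from `{100, 3, 1}`, successive calls return 101, 102, 100, 101, and so on, which is the wraparound the filter loop would otherwise have to code by hand.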

By the way
If we know the start and end of the buffer, we know the length of the loop. So we're pretty much down to one instruction once we get going. The TI optimized FIR filter takes 25 cycles to set things up and then takes 1 cycle per MAC.

FFTs
Another common thing we want to do is an FFT. It tells you about the frequency content of a signal by breaking the signal down into sinusoid (frequency) bins. This is useful in a lot of applications.

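Discrete Fourier Transform (DFT)
The DFT is commonly written as:
$X[k] = \sum_{n=0}^{N-1} x[n] \, e^{-j 2\pi k n / N}, \quad k = 0, 1, \ldots, N-1$
One might also use:
$X[k] = \sum_{n=0}^{N-1} x[n] \, W_N^{kn}, \quad W_N = e^{-j 2\pi / N}$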

The Fast Fourier Transform (FFT) algorithm
There are many fast algorithms (FFTs) that can be used to compute the Discrete Fourier Transform (DFT). Since the DFT is defined as
$X[k] = \sum_{n=0}^{N-1} x[n] \, e^{-j 2\pi k n / N}$
how many MACs do we need? Real or complex? Any algorithm which reduces this can be said to be fast.
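
As a rough count: evaluating the DFT sum directly takes N complex MACs for each of the N outputs, so N^2 in total, and each complex MAC costs four real multiplies. A radix-2 FFT, used here only as the textbook example of a fast algorithm, needs (N/2) log2(N) butterflies with one complex multiply each. The helper names below are made up for illustration.

```c
/* Complex MACs for a direct DFT: N terms for each of N outputs. */
unsigned long dft_complex_macs(unsigned long n)
{
    return n * n;
}

/* Butterflies (one complex multiply each) in a radix-2 FFT.
   Assumes n is a power of two. */
unsigned long fft_radix2_butterflies(unsigned long n)
{
    unsigned long stages = 0;
    for (unsigned long m = n; m > 1; m >>= 1)
        stages++;                 /* stages = log2(n) */
    return (n / 2) * stages;
}
```

For N = 1024 that is 1048576 complex MACs for the direct sum against 5120 butterflies for the radix-2 FFT.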

$W_N = e^{-j 2\pi / N}$

FFT support
FFTs typically take an array in normal order and return the output in bit-reversed order, or the other way around (as on the previous page). Hardware is often able to swap the order of the address bits, which makes it (much) faster to deal with the bit-reversed data.
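
For comparison, this is what bit reversal costs in software without addressing support: a loop per index. The sketch assumes an FFT length of 2^bits, and the name `bit_reverse` is made up for illustration.

```c
#include <stdint.h>

/* Reverse the low `bits` bits of idx, e.g. 001 -> 100 for bits = 3.
   Assumes bits <= 32 and that idx fits in `bits` bits. */
uint32_t bit_reverse(uint32_t idx, unsigned bits)
{
    uint32_t r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (idx & 1u);   /* move the lowest bit of idx into r */
        idx >>= 1;
    }
    return r;
}
```

For an 8-point FFT (bits = 3), indices 0 through 7 map to 0, 4, 2, 6, 1, 5, 3, 7. Doing this in the address generator removes the loop entirely.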

And a bit more
Other support? Viterbi is an algorithm commonly used for error correction in communication. The DSP provides special instructions for it, mainly data movement, pointer, and compare instructions.
Overflow is a constant worry in filters. TI's 40-bit accumulators provide 8 guard bits for detection. That's unheard of in a mainstream processor. It saves instructions for checking for overflow.
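
The value of guard bits can be sketched in C by accumulating into a word wider than the final result and checking for overflow once at the end, instead of after every MAC. Here `int64_t` merely stands in for a wider hardware accumulator, and the function name is hypothetical.

```c
#include <stdint.h>

/* Sum Q15 x Q15 products (each Q30) in a wide accumulator, then saturate
   to 32 bits once at the end. *overflowed is set to 1 if the true sum did
   not fit in 32 bits. The extra high-order bits play the role of guard
   bits: intermediate sums may exceed 32 bits without being lost. */
int32_t mac_block_saturate(const int16_t *x, const int16_t *b,
                           unsigned n, int *overflowed)
{
    int64_t acc = 0;   /* stand-in for an accumulator with guard bits */
    for (unsigned k = 0; k < n; k++)
        acc += (int32_t)x[k] * (int32_t)b[k];

    *overflowed = (acc > INT32_MAX || acc < INT32_MIN);
    if (acc > INT32_MAX) return INT32_MAX;
    if (acc < INT32_MIN) return INT32_MIN;
    return (int32_t)acc;
}
```

Without the extra bits, the loop body would need a compare and a branch (or a saturating add) on every iteration.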

Why do I care again?
To be clear, the point of this slide set is that for certain special-purpose tasks, there are processors dedicated to doing those tasks. Those processors can be both more powerful and lower-power than a generic CPU at doing that task. The trick is that they are designed to do tasks that are common in that field. Here we see they have optimized a common DSP task (an FIR filter stage) from about 8-10 assembly instructions down to 1!
Other such devices?
GPUs: for graphics. They move large amounts of data with simple operations (SIMD).
Network processors [1]:
Pattern matching - the ability to find specific patterns of bits or bytes within packets in a packet stream.
Key lookup - the ability to quickly undertake a database lookup using a key (typically an address in a packet) to find a result, typically routing information.
Data bitfield manipulation - the ability to change certain data fields contained in the packet as it is being processed.
Etc.
[1] See https://en.wikipedia.org/wiki/network_processor for more details. List taken from there.