INTRODUCTION TO DIGITAL SIGNAL PROCESSORS

Size: px
Start display at page:

Download "INTRODUCTION TO DIGITAL SIGNAL PROCESSORS"

Transcription

1 INTRODUCTION TO DIGITAL SIGNAL PROCESSORS Accumulator architecture Memoryregister architecture Prof. Brian L. Evans in collaboration with Niranjan DameraVenkata and Magesh Valliappan Loadstore architecture Embedded Signal Processing Laboratory The University of Texas at Austin Austin, T

2 Outline n Signal processing applications n Conventional DSP architecture n Pipelining in DSP processors n RISC vs. DSP processor architectures n TI TMS320C6x VLIW DSP architecture n Signal and image processing applications n Signal processing on generalpurpose processors n Conclusion 2

3 Signal Processing Applications n Lowcost embedded systems 4 Modems, cellular telephones, disk drives, printers n Highthroughput applications 4 Halftoning, radar, highresolution sonar, tomography n PC based multimedia 4 Compression/decompression of audio, graphics, video n Embedded processor requirements 4 Inexpensive with small area and volume 4 Deterministic interrupt service routine latency 4 Low power: ~50 mw (TMS320C5402/20: 0.32 ma/mip) 3

4 Conventional DSP Architecture n High data throughput 4 Harvard architecture n n Separate data memory/bus and program memory/bus Three reads and one or two writes per instruction cycle 4 Short deterministic interrupt service routine latency 4 Multiplyaccumulate (MAC) in a single instruction cycle 4 Special addressing modes supported in hardware n n Modulo addressing for circular buffers (e.g. FIR filters) Bitreversed addressing (e.g. fast Fourier transforms) 4 Instructions to keep the 34 stages of the pipeline full n n Zerooverhead looping (one pipeline flush to set up) Delayed branches 4

5 Conventional DSP Architecture (con t) n Modulo addressing 4 implementing circular buffers and delay lines Datashifting Time Buffer contents Next sample n=n x NK+1 x NK+1 x N1 x N n=n+1 x NK+2 x NK+3 x N x N+1 n=n+2 x NK+3 x NK+4 x N+1 x N+2 x N+1 x N+2 x N+3 Modulo addressing n Bit reversed addressing 4 used to implement the radix2 FFT Time Buffer contents Next sample n=n x N2 x N1 x N x NK+1 x NK+2 n=n+1 x N2 x N1 x N x N+1 x NK+2 n=n+2 x N2 x N1 x N x N+1 x N+2 x NK+3 x NK+3 x NK+4 x NK+4 x N+1 x N+2 x N+3 5

6 Conventional DSP Architecture (con t) FixedPoint FloatingPoint Cost/Unit $5 $10 Architecture accumulator loadstore or memoryregister Registers 2 4 data 8 address 8 or 16 data 8 or 16 address Data Words 16 or 24 bit integer and fixedpoint 32 bit integer and fixed/floatingpoint OnChip Memory 2 64 kwords data * 2 64 kwords program 8 64 kwords data 8 64 kwords program Address Space 16 kw 8 Mw data kw program ** 16 Mw 4Gw data 16 Mw 4 Gw program Compilers C compilers; poor code generation C, C++ compilers; better code generation Examples TI TMS320C5x; Motorola TI TMS320C3x; Analog Devices SHARC 6

7 Conventional DSP Architecture (con t) n Market share: 95% fixedpoint, 5% floatingpoint n Each processor has dozens of configurations 4 Size and map of data and program memory 4 A/D, input/output buffers, interfaces, timers, and D/A n Drawbacks to conventional DSP processors 4 No byte addressing (desirable for image and video) 4 Limited onchip memory 4 Limited addressable memory on fixedpoint DSPs: except Motorola (16 Mw data; 32 Mw program) and C548/C549/C54xx (8 Mw data; 256 kw program) 4 Nonstandard C extensions to support fixedpoint data 7

8 Pipelining Sequential (Motorola 56000) Fetch Decode Read Execute Pipelined (Most conventional DSP processors) Fetch Decode Read Execute Superscalar (Pentium, MIPS) Fetch Decode Read Execute Superpipelined (CDC7600) Managing Pipelines compiler or programmer interlocking hardware instruction scheduling Fetch Decode Read Execute 8

9 Pipelining: Operation n Timestationary pipeline model 4 Programmer controls each cycle 4 Motorola DSP56001 MAC 0,Y0,A :(R0)+,0 Y:(R4),Y0 n Datastationary pipeline model 4 Programmer specifies data operations 4 TMS320C30/40 MPYF *++AR0(1),*++AR1(IR0),R0 n Interlocked pipeline 4 Programmer is protected from pipeline effects Fetch F D E F G H I J K L L Decode D R E C D E F G H I J K L B C D E F G H I J K L Read Execute A B C D E F G H I J K L 9

10 Pipelining: Hazards n A control hazard occurs when a branch instruction is decoded 4 Flush the pipeline 4 or: Delayed branch (expose pipeline) n A data hazard occurs because an operand cannot be read yet 4 Intended by programmer 4 or: Interlock hardware inserts bubble TMS320C5x example LAC #064h SAMM AR2 NOP LACC * LAR AR2, DATA LACC * Fetch F D R E D E F br G Y Y Z Decode C D E F br Y Z B C D E F br Y Z Read Execute A B C D E F br Y Z 10

11 Pipelining: Avoiding Control Hazards A key factor in the numeric performance of DSPs is the provision of special hardware to perform looping. RPT COUNT TBLR *+ n A repeat instruction repeats one instruction or a block of instructions after repeat n The pipeline is filled with repeated instruction (or block of instructions) n Cost: one pipeline flush only Fetch F D R E D E F rpt Decode C D E F rpt Read B C D E F rpt A B C D E F rpt Execute 11

12 RISC vs. DSP: Instruction Encoding n RISC: Superscalar Reorder Load/store FP Unit n DSP: Horizontal microcode Integer Unit Load/store Load/store ALU Multiplier Address 12

13 RISC vs. DSP: Memory Hierarchy n RISC Registers Out of order I/D Cache Physical memory TLB TLB: Translation Lookaside Buffer n DSP I Cache Internal memories Registers External memories DMA Controller DMA: Direct Memory Access 13

14 TI TMS320C6x VLIW DSP Architecture Simplified Architecture External Memory Sync Async Addr Data Program RAM Data RAM or Cache Internal Buses Regs (A0A15).D1.D2.M1.M2.L1.L2.S1.S2 Control Regs CPU Regs (B0B15) DMA Serial Port Host Port Boot Load Timers Pwr Down 14

15 TI TMS320C6x VLIW DSP Architecture n One instruction cycle per clock cycle n Two parallel data paths with singlecycle units: 4 Data unit 32bit address calculations (modulo, linear) 4 Multiplier unit 16 bit x 16 bit with 32bit result 4 Logical unit 40bit (saturation) arithmetic & compares 4 Shifter unit 32bit integer ALU and 40bit shifter n 16 32bit general purpose registers in each path 4 40 bits can be stored in adjacent even/odd registers n 32bit addressing of 8/16/32 bit data n Fixedpoint (C62x) and floatingpoint (C67x) n C67x computes floatingpoint multiply in 4 cycles 15

16 TI TMS320C6x VLIW DSP Architecture n TMS320C6211: $21 in volume MHz, 300 million MACs/sec, 1200 RISC MIPS 4 onchip: 4k x 8 bits program and 4k x 8 bits data (plus 64k x 8 bits L2 cache) n Deep pipeline stages in C62x: fetch 4, decode 2, execute stages in C67x: fetch 4, decode 2, execute If a branch is in the pipeline, interrupts are disabled (the latency of a branch instruction is 5 cycles) 4 Avoid branches by using conditional execution n No hardware protection against pipeline hazards 4 Compiler and assembler must prevent pipeline hazards 16

17 C5x and C6x Addressing Modes n Immediate 4 The operand is part of the instruction n Register 4 The operand is specified in a register n Direct 4 The address of the operand is part of the instruction (added to imply memory page) n Indirect 4 The address of the operand is stored in a register TMS320C5x TMS320C6x ADD #0FFh add.l1 13,A1,A6 (implied) add.l1 A7,A6,A7 ADD 010h not supported ADD * ldw.l1 *A5++[8],A1 17

18 TMS320C6x vs. Pentium MM Processor Peak MIPS BDTI marks ISR latency Power Unit Price Area Volume Pentium MM 233 Pentium MM 266 C62x 150 MHz C62x 200 MHz µs 4.25 W $ x in µs 4.85 W $ x in µs 1.45 W $ x in µs 1.94 W $ x in 3 BDTImarks: Berkeley Design Technology Inc. DSP benchmark results (larger means better)

19 n Each tap requires Application: FIR Filter 4 Fetching one data sample 4 Fetching one operand 4 Multiplying two numbers 4 Accumulating multiplication result 4 Shifting one sample in the delay line z 1 z 1 z 1 n Computing an FIR tap in one instruction cycle 4 Three data memory accesses 4 Autoincrement or decrement addressing modes 4 Modulo addressing to implement delay line as circular buffer 4 Eleven RISC instructions 19

20 Application: FIR Filter on a TMS320C5x Coefficients Data COEFFP.set 02000h ; Program mem address.set 037Fh ; Newest data sample LASTAP.set 037FH ; Oldest data sample LAR AR3, #LASTAP ; Point to oldest sample RPT #127 MACD COEFFP, * ; Do the thing APAC SACH Y,1 ; Store result note shift 20

21 Application: FIR Filter on a TMS320C62x Coefficients Data SingleCycle Loop... C7: ldh.d1 *A1++, A2 ; Read coefficient ldh.d2 *B1++, B2 ; Read data [B0] sub.l2 B0, 1, B0 ; Decrement counter [B0] B.S2 c7 ; Branch if not zero mpy.m1x A2, B2, A3 ; Form product add.l1 A4, A3, A4 ; Accumulate result... 21

22 Ordered Dithering on a TMS320C62x 1/8 5/8 7/8 3/8 SingleCycle Loop Array of thresholds... C7: ldb.d1 *A1++, A2 ; Read pixel ldb.d2 *B1++, B2 ; Read threshold [B0] sub.l2 B0, 1, B0 ; Decrement counter [B0] B.S2 c7 ; Branch if not zero cmpgtu.l1x A2, B2, A3 ; Threshold and store stb.d1 A3, *A5++ ; Accumulate result... 22

23 DSP Cores n ASIC with: 4 Programmable DSP 4 RAM 4 ROM 4 Standard cells 4 Codec 4 Peripherals 4 Gate array 4 Microcontroller 23

24 DSP on General Purpose Processors n Multimedia applications on PCs 4 Video, audio, graphics and animation 4 Repetitive parallel sequences of instructions n Native signal processing examples 4 Sun Visual Instruction Set (UltraSPARC 1/2) 4 Intel MM (Pentium I/II/III) 4 Intel Concurrent SIMDFP (Pentium III) n Single Instruction Multiple Data (SIMD) 4 One instruction acts on multiple data in parallel 4 Wellsuited for graphics 24

25 DSP on General Purpose Processors (con t) n Programming is considerably tougher 4 C/C++ compilers do not generate native signal processing code except Metrowerks CodeWarrior 4 gives MM code 4 Libraries of routines using native signal processing 4 Hand code using inline assembly for best performance 4 Pack/unpack data not aligned on SIMD word boundaries 4 50cycle penalty to switch out of MM; 0 penalty for VIS 4 Saturation arithmetic in MM; not supported in VIS 4 Extendedprecision accumulation in MM; none in VIS n Speedup for applications 4 Signal and image processing 1.5:1 to 2:1 4 Graphics 4:1 to 6:1 (no packing/unpacking) 25

26 Intel MM Instruction Set n 64bit SIMD register (4 data types) 4 64bit quad word 4 Packed byte (8 bytes packed into 64 bits) 4 Packed word (4 16bit words packed into 64 bits) 4 Packed double word (2 double words packed into 64 bits) n 57 new instructions 4 Pack and unpack 4 Add, subtract, multiply, and multiply/accumulate n Saturation and wraparound arithmetic n Maximum parallelism possible 4 8:1 for 8bit additions 4 4:1 for 8 x 16 multiplication or 16bit additions 26

27 Concluding Remarks n Conventional digital signal processors 4 High performance vs. power consumption/cost/volume 4 Excel at onedimensional processing 4 Have instructions tailored to specific applications n TMS320C6x VLIW DSP 4 High performance vs. cost/volume 4 Excel at multidimensional signal processing 4 A maximum of 22 RISC instructions per cycle n Native Signal Processing 4 Available on desktop computers 4 Excels at graphics 4 A maximum of 8 RISC instructions per cycle n Inline assembly code for best performance 27

28 Concluding Remarks n Digital signal processor market 4 40% annual growth rate since $3.5 billion revenue in % TI, 25% Lucent, 10% Motorola, 8% Analog Devices n Independent benchmarking by industry 4 Berkeley Design Technology Inc. 4 EDN Embedded Microprocessor Benchmark Consortium n Web resources 4 comp.dsp newsgroup: FAQ 4 embedded processors and systems: 4 online courses and DSP boards: 28

29 References 1 G. E. Allen, B. L. Evans, and D. C. Schanbacher, RealTime Sonar Beamforming on a Unix Workstation, Proc. IEEE Asilomar Conf. On Signals, Systems, and Computers, pp , R. Bhargava, R. Radhakrishnan, B. L. Evans, and L. K. John, Evaluating MM Technology Using DSP and Multimedia Applications, Proc. IEEE Sym. On Microarchitecture, pp. 3746, W. Chen, H. J. Reekie, S. Bhave, and E. A. Lee, Native Signal Processing on the UltraSPARC in the Ptolemy Environment, Proc. IEEE Asilomar Conf. On Signals, Systems, and Computers, B. L. Evans, EE379K17 RealTime DSP Laboratory, UT Austin. 5 B. L. Evans, EE382C Embedded Software Systems, UT Austin. 6 A. Kulkarni and A. Dube, Evaluation of the Code Generation Domain in Ptolemy, 7 P. Lapsley, J. Bier, A. Shoham, and E. A. Lee, DSP Processor Fundamentals, IEEE Press,

INTRODUCTION TO DIGITAL SIGNAL PROCESSORS

INTRODUCTION TO DIGITAL SIGNAL PROCESSORS INTRODUCTION TO DIGITAL SIGNAL PROCESSORS Accumulator architecture Memoryregister architecture Prof. Brian L. Evans in collaboration with Niranjan DameraVenkata and Magesh Valliappan Loadstore architecture

More information

General Purpose Signal Processors

General Purpose Signal Processors General Purpose Signal Processors First announced in 1978 (AMD) for peripheral computation such as in printers, matured in early 80 s (TMS320 series). General purpose vs. dedicated architectures: Pros:

More information

Independent DSP Benchmarks: Methodologies and Results. Outline

Independent DSP Benchmarks: Methodologies and Results. Outline Independent DSP Benchmarks: Methodologies and Results Berkeley Design Technology, Inc. 2107 Dwight Way, Second Floor Berkeley, California U.S.A. +1 (510) 665-1600 info@bdti.com http:// Copyright 1 Outline

More information

Storage I/O Summary. Lecture 16: Multimedia and DSP Architectures

Storage I/O Summary. Lecture 16: Multimedia and DSP Architectures Storage I/O Summary Storage devices Storage I/O Performance Measures» Throughput» Response time I/O Benchmarks» Scaling to track technological change» Throughput with restricted response time is normal

More information

Microprocessors vs. DSPs (ESC-223)

Microprocessors vs. DSPs (ESC-223) Insight, Analysis, and Advice on Signal Processing Technology Microprocessors vs. DSPs (ESC-223) Kenton Williston Berkeley Design Technology, Inc. Berkeley, California USA +1 (510) 665-1600 info@bdti.com

More information

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:

More information

Evaluating MMX Technology Using DSP and Multimedia Applications

Evaluating MMX Technology Using DSP and Multimedia Applications Evaluating MMX Technology Using DSP and Multimedia Applications Ravi Bhargava * Lizy K. John * Brian L. Evans Ramesh Radhakrishnan * November 22, 1999 The University of Texas at Austin Department of Electrical

More information

An introduction to DSP s. Examples of DSP applications Why a DSP? Characteristics of a DSP Architectures

An introduction to DSP s. Examples of DSP applications Why a DSP? Characteristics of a DSP Architectures An introduction to DSP s Examples of DSP applications Why a DSP? Characteristics of a DSP Architectures DSP example: mobile phone DSP example: mobile phone with video camera DSP: applications Why a DSP?

More information

Typical DSP application

Typical DSP application DSP markets DSP markets Typical DSP application TI DSP History: Modem applications 1982 TMS32010, TI introduces its first programmable general-purpose DSP to market Operating at 5 MIPS. It was ideal for

More information

DSP Processors Lecture 13

DSP Processors Lecture 13 DSP Processors Lecture 13 Ingrid Verbauwhede Department of Electrical Engineering University of California Los Angeles ingrid@ee.ucla.edu 1 References The origins: E.A. Lee, Programmable DSP Processors,

More information

Evaluating Signal Processing and Multimedia Applications on SIMD, VLIW and Superscalar Architectures

Evaluating Signal Processing and Multimedia Applications on SIMD, VLIW and Superscalar Architectures Evaluating Signal Processing and Multimedia Applications on SIMD, VLIW and Superscalar Architectures Deependra Talla, Lizy K. John, Viktor Lapinskii, and Brian L. Evans Department of Electrical and Computer

More information

Optimization of Vertical and Horizontal Beamforming Kernels on the PowerPC G4 Processor with AltiVec Technology

Optimization of Vertical and Horizontal Beamforming Kernels on the PowerPC G4 Processor with AltiVec Technology Optimization of Vertical and Horizontal Beamforming Kernels on the PowerPC G4 Processor with AltiVec Technology EE382C: Embedded Software Systems Final Report David Brunke Young Cho Applied Research Laboratories:

More information

INTRODUCTION TO DIGITAL SIGNAL PROCESSOR

INTRODUCTION TO DIGITAL SIGNAL PROCESSOR INTRODUCTION TO DIGITAL SIGNAL PROCESSOR By, Snehal Gor snehalg@embed.isquareit.ac.in 1 PURPOSE Purpose is deliberately thought-through goal-directedness. - http://en.wikipedia.org/wiki/purpose This document

More information

Processors for Embedded Digital Signal Processing

Processors for Embedded Digital Signal Processing Insight, Analysis, and Advice on Signal Processing Technology Processors for Embedded Jeff Bier President BDTI Oakland, California USA +1 (510) 451-1800 info@bdti.com http://www.bdti.com 2008 Berkeley

More information

TMS320C3X Floating Point DSP

TMS320C3X Floating Point DSP TMS320C3X Floating Point DSP Microcontrollers & Microprocessors Undergraduate Course Isfahan University of Technology Oct 2010 By : Mohammad 1 DSP DSP : Digital Signal Processor Why A DSP? Example Voice

More information

Real-Time High-Throughput Sonar Beamforming Kernels Using Native Signal Processing and Memory Latency Hiding Techniques

Real-Time High-Throughput Sonar Beamforming Kernels Using Native Signal Processing and Memory Latency Hiding Techniques Real-Time High-Throughput Sonar Beamforming Kernels Using Native Signal Processing and Memory Latency Hiding Techniques Gregory E. Allen Applied Research Laboratories: The University of Texas at Austin

More information

The Processor: Instruction-Level Parallelism

The Processor: Instruction-Level Parallelism The Processor: Instruction-Level Parallelism Computer Organization Architectures for Embedded Computing Tuesday 21 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy

More information

Processor (IV) - advanced ILP. Hwansoo Han

Processor (IV) - advanced ILP. Hwansoo Han Processor (IV) - advanced ILP Hwansoo Han Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel To increase ILP Deeper pipeline Less work per stage shorter clock cycle

More information

REAL TIME DIGITAL SIGNAL PROCESSING

REAL TIME DIGITAL SIGNAL PROCESSING REAL TIME DIGITAL SIGNAL PROCESSING UTN - FRBA 2011 www.electron.frba.utn.edu.ar/dplab Introduction Why Digital? A brief comparison with analog. Advantages Flexibility. Easily modifiable and upgradeable.

More information

LOW-COST SIMD. Considerations For Selecting a DSP Processor Why Buy The ADSP-21161?

LOW-COST SIMD. Considerations For Selecting a DSP Processor Why Buy The ADSP-21161? LOW-COST SIMD Considerations For Selecting a DSP Processor Why Buy The ADSP-21161? The Analog Devices ADSP-21161 SIMD SHARC vs. Texas Instruments TMS320C6711 and TMS320C6712 Author : K. Srinivas Introduction

More information

Characterization of Native Signal Processing Extensions

Characterization of Native Signal Processing Extensions Characterization of Native Signal Processing Extensions Jason Law Department of Electrical and Computer Engineering University of Texas at Austin Austin, TX 78712 jlaw@mail.utexas.edu Abstract Soon if

More information

DSP Platforms Lab (AD-SHARC) Session 05

DSP Platforms Lab (AD-SHARC) Session 05 University of Miami - Frost School of Music DSP Platforms Lab (AD-SHARC) Session 05 Description This session will be dedicated to give an introduction to the hardware architecture and assembly programming

More information

Separating Reality from Hype in Processors' DSP Performance. Evaluating DSP Performance

Separating Reality from Hype in Processors' DSP Performance. Evaluating DSP Performance Separating Reality from Hype in Processors' DSP Performance Berkeley Design Technology, Inc. +1 (51) 665-16 info@bdti.com Copyright 21 Berkeley Design Technology, Inc. 1 Evaluating DSP Performance! Essential

More information

VIII. DSP Processors. Digital Signal Processing 8 December 24, 2009

VIII. DSP Processors. Digital Signal Processing 8 December 24, 2009 Digital Signal Processing 8 December 24, 2009 VIII. DSP Processors 2007 Syllabus: Introduction to programmable DSPs: Multiplier and Multiplier-Accumulator (MAC), Modified bus structures and memory access

More information

EECS 322 Computer Architecture Superpipline and the Cache

EECS 322 Computer Architecture Superpipline and the Cache EECS 322 Computer Architecture Superpipline and the Cache Instructor: Francis G. Wolff wolff@eecs.cwru.edu Case Western Reserve University This presentation uses powerpoint animation: please viewshow Summary:

More information

ASSEMBLY LANGUAGE MACHINE ORGANIZATION

ASSEMBLY LANGUAGE MACHINE ORGANIZATION ASSEMBLY LANGUAGE MACHINE ORGANIZATION CHAPTER 3 1 Sub-topics The topic will cover: Microprocessor architecture CPU processing methods Pipelining Superscalar RISC Multiprocessing Instruction Cycle Instruction

More information

omputer Design Concept adao Nakamura

omputer Design Concept adao Nakamura omputer Design Concept adao Nakamura akamura@archi.is.tohoku.ac.jp akamura@umunhum.stanford.edu 1 1 Pascal s Calculator Leibniz s Calculator Babbage s Calculator Von Neumann Computer Flynn s Classification

More information

Embedded Systems Development

Embedded Systems Development Embedded Systems Development Lecture 8 Code Generation for Embedded Processors Daniel Kästner AbsInt Angewandte Informatik GmbH kaestner@absint.com 2 Life Range and Register Interference A symbolic register

More information

Evaluating the DSP Processor Options (DSP-522)

Evaluating the DSP Processor Options (DSP-522) Insight, Analysis, and Advice on Signal Processing Technology Evaluating the DSP Processor Options (DSP-522) Jeff Bier Berkeley Design Technology, Inc. Berkeley, California USA +1 (510) 665-1600 info@bdti.com

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

Embedded Systems. 7. System Components

Embedded Systems. 7. System Components Embedded Systems 7. System Components Lothar Thiele 7-1 Contents of Course 1. Embedded Systems Introduction 2. Software Introduction 7. System Components 10. Models 3. Real-Time Models 4. Periodic/Aperiodic

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Better sharc data such as vliw format, number of kind of functional units

Better sharc data such as vliw format, number of kind of functional units Better sharc data such as vliw format, number of kind of functional units Pictures of pipe would help Build up zero overhead loop example better FIR inner loop in coldfire Mine more material from bsdi.com

More information

Universität Dortmund. ARM Architecture

Universität Dortmund. ARM Architecture ARM Architecture The RISC Philosophy Original RISC design (e.g. MIPS) aims for high performance through o reduced number of instruction classes o large general-purpose register set o load-store architecture

More information

Chapter 4 The Processor (Part 4)

Chapter 4 The Processor (Part 4) Department of Electr rical Eng ineering, Chapter 4 The Processor (Part 4) 王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw ncku edu Depar rtment of Electr rical Engineering, Feng-Chia Unive ersity Outline

More information

Digital Signal Processor Core Technology

Digital Signal Processor Core Technology The World Leader in High Performance Signal Processing Solutions Digital Signal Processor Core Technology Abhijit Giri Satya Simha November 4th 2009 Outline Introduction to SHARC DSP ADSP21469 ADSP2146x

More information

Real Processors. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University

Real Processors. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Real Processors Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined

More information

04 - DSP Architecture and Microarchitecture

04 - DSP Architecture and Microarchitecture September 11, 2015 Memory indirect addressing (continued from last lecture) ; Reality check: Data hazards! ; Assembler code v3: repeat 256,endloop load r0,dm1[dm0[ptr0++]] store DM0[ptr1++],r0 endloop:

More information

COMPUTER ORGANIZATION AND DESI

COMPUTER ORGANIZATION AND DESI COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler

More information

M.Tech. credit seminar report, Electronic Systems Group, EE Dept, IIT Bombay, Submitted: November Evolution of DSPs

M.Tech. credit seminar report, Electronic Systems Group, EE Dept, IIT Bombay, Submitted: November Evolution of DSPs M.Tech. credit seminar report, Electronic Systems Group, EE Dept, IIT Bombay, Submitted: November 2002 Evolution of DSPs Author: Kartik Kariya (Roll No. 02307923) Supervisor: Prof. Vikram M. Gadre, Associate

More information

Embedded Computation

Embedded Computation Embedded Computation What is an Embedded Processor? Any device that includes a programmable computer, but is not itself a general-purpose computer [W. Wolf, 2000]. Commonly found in cell phones, automobiles,

More information

Digital Signal Processors: fundamentals & system design. Lecture 1. Maria Elena Angoletta CERN

Digital Signal Processors: fundamentals & system design. Lecture 1. Maria Elena Angoletta CERN Digital Signal Processors: fundamentals & system design Lecture 1 Maria Elena Angoletta CERN Topical CAS/Digital Signal Processing Sigtuna, June 1-9, 2007 Lectures plan Lecture 1 (now!) introduction, evolution,

More information

Computer and Hardware Architecture I. Benny Thörnberg Associate Professor in Electronics

Computer and Hardware Architecture I. Benny Thörnberg Associate Professor in Electronics Computer and Hardware Architecture I Benny Thörnberg Associate Professor in Electronics Hardware architecture Computer architecture The functionality of a modern computer is so complex that no human can

More information

Computer Architecture 2/26/01 Lecture #

Computer Architecture 2/26/01 Lecture # Computer Architecture 2/26/01 Lecture #9 16.070 On a previous lecture, we discussed the software development process and in particular, the development of a software architecture Recall the output of the

More information

Chapter 2 Lecture 1 Computer Systems Organization

Chapter 2 Lecture 1 Computer Systems Organization Chapter 2 Lecture 1 Computer Systems Organization This chapter provides an introduction to the components Processors: Primary Memory: Secondary Memory: Input/Output: Busses The Central Processing Unit

More information

Microprocessor Architecture Dr. Charles Kim Howard University

Microprocessor Architecture Dr. Charles Kim Howard University EECE416 Microcomputer Fundamentals Microprocessor Architecture Dr. Charles Kim Howard University 1 Computer Architecture Computer System CPU (with PC, Register, SR) + Memory 2 Computer Architecture ALU

More information

LECTURE 10. Pipelining: Advanced ILP

LECTURE 10. Pipelining: Advanced ILP LECTURE 10 Pipelining: Advanced ILP EXCEPTIONS An exception, or interrupt, is an event other than regular transfers of control (branches, jumps, calls, returns) that changes the normal flow of instruction

More information

Cache Justification for Digital Signal Processors

Cache Justification for Digital Signal Processors Cache Justification for Digital Signal Processors by Michael J. Lee December 3, 1999 Cache Justification for Digital Signal Processors By Michael J. Lee Abstract Caches are commonly used on general-purpose

More information

Intel s MMX. Why MMX?

Intel s MMX. Why MMX? Intel s MMX Dr. Richard Enbody CSE 820 Why MMX? Make the Common Case Fast Multimedia and Communication consume significant computing resources. Providing specific hardware support makes sense. 1 Goals

More information

Advanced Instruction-Level Parallelism

Advanced Instruction-Level Parallelism Advanced Instruction-Level Parallelism Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu EEE3050: Theory on Computer Architectures, Spring 2017, Jinkyu

More information

Architectures & instruction sets R_B_T_C_. von Neumann architecture. Computer architecture taxonomy. Assembly language.

Architectures & instruction sets R_B_T_C_. von Neumann architecture. Computer architecture taxonomy. Assembly language. Architectures & instruction sets Computer architecture taxonomy. Assembly language. R_B_T_C_ 1. E E C E 2. I E U W 3. I S O O 4. E P O I von Neumann architecture Memory holds data and instructions. Central

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

CPE300: Digital System Architecture and Design

CPE300: Digital System Architecture and Design CPE300: Digital System Architecture and Design Fall 2011 MW 17:30-18:45 CBC C316 Pipelining 11142011 http://www.egr.unlv.edu/~b1morris/cpe300/ 2 Outline Review I/O Chapter 5 Overview Pipelining Pipelining

More information

EC 513 Computer Architecture

EC 513 Computer Architecture EC 513 Computer Architecture Complex Pipelining: Superscalar Prof. Michel A. Kinsy Summary Concepts Von Neumann architecture = stored-program computer architecture Self-Modifying Code Princeton architecture

More information

Pipelining and Vector Processing

Pipelining and Vector Processing Pipelining and Vector Processing Chapter 8 S. Dandamudi Outline Basic concepts Handling resource conflicts Data hazards Handling branches Performance enhancements Example implementations Pentium PowerPC

More information

Processor Applications. The Processor Design Space. World s Cellular Subscribers. Nov. 12, 1997 Bob Brodersen (http://infopad.eecs.berkeley.

Processor Applications. The Processor Design Space. World s Cellular Subscribers. Nov. 12, 1997 Bob Brodersen (http://infopad.eecs.berkeley. Processor Applications CS 152 Computer Architecture and Engineering Introduction to Architectures for Digital Signal Processing Nov. 12, 1997 Bob Brodersen (http://infopad.eecs.berkeley.edu) 1 General

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

Chapter 9. Pipelining Design Techniques

Chapter 9. Pipelining Design Techniques Chapter 9 Pipelining Design Techniques 9.1 General Concepts Pipelining refers to the technique in which a given task is divided into a number of subtasks that need to be performed in sequence. Each subtask

More information

The Evolution of DSP Processors

The Evolution of DSP Processors Berkeley Design Technology, Inc. Optimized DSP Software Independent DSP Analysis A BDTI White Paper The Evolution of DSP Processors By Jennifer Eyre and Jeff Bier, Berkeley Design Technology, Inc. (BDTI)

More information

The Nios II Family of Configurable Soft-core Processors

The Nios II Family of Configurable Soft-core Processors The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture

More information

Chapter 1 Introduction

Chapter 1 Introduction Chapter 1 Introduction The Motorola DSP56300 family of digital signal processors uses a programmable, 24-bit, fixed-point core. This core is a high-performance, single-clock-cycle-per-instruction engine

More information

7/7/2013 TIFAC CORE IN NETWORK ENGINEERING

7/7/2013 TIFAC CORE IN NETWORK ENGINEERING TMS320C50 Architecture 1 OVERVIEW OF DSP PROCESSORS BY Dr. M.Pallikonda Rajasekaran, Professor/ECE 2 3 4 Short history of DSPs 1960 DSP hardware using discrete components 1970 Monolithic components for

More information

TMS320C54x DSP Programming Environment APPLICATION BRIEF: SPRA182

TMS320C54x DSP Programming Environment APPLICATION BRIEF: SPRA182 TMS320C54x DSP Programming Environment APPLICATION BRIEF: SPRA182 M. Tim Grady Senior Member, Technical Staff Texas Instruments April 1997 IMPORTANT NOTICE Texas Instruments (TI) reserves the right to

More information

Chapter 13 Reduced Instruction Set Computers

Chapter 13 Reduced Instruction Set Computers Chapter 13 Reduced Instruction Set Computers Contents Instruction execution characteristics Use of a large register file Compiler-based register optimization Reduced instruction set architecture RISC pipelining

More information

Micro-programmed Control Ch 17

Micro-programmed Control Ch 17 Micro-programmed Control Ch 17 Micro-instructions Micro-programmed Control Unit Sequencing Execution Characteristics Course Summary 1 Hardwired Control (4) Complex Fast Difficult to design Difficult to

More information

E0-243: Computer Architecture

E0-243: Computer Architecture E0-243: Computer Architecture L1 ILP Processors RG:E0243:L1-ILP Processors 1 ILP Architectures Superscalar Architecture VLIW Architecture EPIC, Subword Parallelism, RG:E0243:L1-ILP Processors 2 Motivation

More information

Micro-programmed Control Ch 15

Micro-programmed Control Ch 15 Micro-programmed Control Ch 15 Micro-instructions Micro-programmed Control Unit Sequencing Execution Characteristics 1 Hardwired Control (4) Complex Fast Difficult to design Difficult to modify Lots of

More information

SH4 RISC Microprocessor for Multimedia

SH4 RISC Microprocessor for Multimedia SH4 RISC Microprocessor for Multimedia Fumio Arakawa, Osamu Nishii, Kunio Uchiyama, Norio Nakagawa Hitachi, Ltd. 1 Outline 1. SH4 Overview 2. New Floating-point Architecture 3. Length-4 Vector Instructions

More information

Machine Instructions vs. Micro-instructions. Micro-programmed Control Ch 15. Machine Instructions vs. Micro-instructions (2) Hardwired Control (4)

Machine Instructions vs. Micro-instructions. Micro-programmed Control Ch 15. Machine Instructions vs. Micro-instructions (2) Hardwired Control (4) Micro-programmed Control Ch 15 Micro-instructions Micro-programmed Control Unit Sequencing Execution Characteristics 1 Machine Instructions vs. Micro-instructions Memory execution unit CPU control memory

More information

Micro-programmed Control Ch 15

Micro-programmed Control Ch 15 Micro-programmed Control Ch 15 Micro-instructions Micro-programmed Control Unit Sequencing Execution Characteristics 1 Hardwired Control (4) Complex Fast Difficult to design Difficult to modify Lots of

More information

Hardwired Control (4) Micro-programmed Control Ch 17. Micro-programmed Control (3) Machine Instructions vs. Micro-instructions

Hardwired Control (4) Micro-programmed Control Ch 17. Micro-programmed Control (3) Machine Instructions vs. Micro-instructions Micro-programmed Control Ch 17 Micro-instructions Micro-programmed Control Unit Sequencing Execution Characteristics Course Summary Hardwired Control (4) Complex Fast Difficult to design Difficult to modify

More information

Computer Systems Architecture I. CSE 560M Lecture 10 Prof. Patrick Crowley

Computer Systems Architecture I. CSE 560M Lecture 10 Prof. Patrick Crowley Computer Systems Architecture I CSE 560M Lecture 10 Prof. Patrick Crowley Plan for Today Questions Dynamic Execution III discussion Multiple Issue Static multiple issue (+ examples) Dynamic multiple issue

More information

Instruction Set Principles and Examples. Appendix B

Instruction Set Principles and Examples. Appendix B Instruction Set Principles and Examples Appendix B Outline What is Instruction Set Architecture? Classifying ISA Elements of ISA Programming Registers Type and Size of Operands Addressing Modes Types of

More information

ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design

ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University

More information

Native Signal Processing on the UltraSparc in the Ptolemy Environment

Native Signal Processing on the UltraSparc in the Ptolemy Environment Presented at the Thirtieth Annual Asilomar Conference on Signals, Systems, and Computers - November 1996 Native Signal Processing on the UltraSparc in the Ptolemy Environment William Chen, H. John Reekie,

More information

Towards a Java-enabled 2Mbps wireless handheld device

Towards a Java-enabled 2Mbps wireless handheld device Towards a Java-enabled 2Mbps wireless handheld device John Glossner 1, Michael Schulte 2, and Stamatis Vassiliadis 3 1 Sandbridge Technologies, White Plains, NY 2 Lehigh University, Bethlehem, PA 3 Delft

More information

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version MIPS Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

GENERAL-PURPOSE MICROPROCESSOR PERFORMANCE FOR DSP APPLICATIONS. University of Utah. Salt Lake City, UT USA

GENERAL-PURPOSE MICROPROCESSOR PERFORMANCE FOR DSP APPLICATIONS. University of Utah. Salt Lake City, UT USA GENERAL-PURPOSE MICROPROCESSOR PERFORMANCE FOR DSP APPLICATIONS J.N. Barkdull and S.C. Douglas Department of Electrical Engineering University of Utah Salt Lake City, UT 84112 USA ABACT Digital signal

More information

,1752'8&7,21. Figure 1-0. Table 1-0. Listing 1-0.

,1752'8&7,21. Figure 1-0. Table 1-0. Listing 1-0. ,1752'8&7,21 Figure 1-0. Table 1-0. Listing 1-0. The ADSP-21065L SHARC is a high-performance, 32-bit digital signal processor for communications, digital audio, and industrial instrumentation applications.

More information

Uniprocessors. HPC Fall 2012 Prof. Robert van Engelen

Uniprocessors. HPC Fall 2012 Prof. Robert van Engelen Uniprocessors HPC Fall 2012 Prof. Robert van Engelen Overview PART I: Uniprocessors and Compiler Optimizations PART II: Multiprocessors and Parallel Programming Models Uniprocessors Processor architectures

More information

Advanced Computer Architecture

Advanced Computer Architecture Advanced Computer Architecture Chapter 1 Introduction into the Sequential and Pipeline Instruction Execution Martin Milata What is a Processors Architecture Instruction Set Architecture (ISA) Describes

More information

Module 1. Introduction. Version 2 EE IIT, Kharagpur 1

Module 1. Introduction. Version 2 EE IIT, Kharagpur 1 Module 1 Introduction Version 2 EE IIT, Kharagpur 1 Lesson 4 Embedded Systems Components Part II Version 2 EE IIT, Kharagpur 2 Overview on Components Instructional Objectives After going through this lesson

More information

Evolution of Computers & Microprocessors. Dr. Cahit Karakuş

Evolution of Computers & Microprocessors. Dr. Cahit Karakuş Evolution of Computers & Microprocessors Dr. Cahit Karakuş Evolution of Computers First generation (1939-1954) - vacuum tube IBM 650, 1954 Evolution of Computers Second generation (1954-1959) - transistor

More information

Advance CPU Design. MMX technology. Computer Architectures. Tien-Fu Chen. National Chung Cheng Univ. ! Basic concepts

Advance CPU Design. MMX technology. Computer Architectures. Tien-Fu Chen. National Chung Cheng Univ. ! Basic concepts Computer Architectures Advance CPU Design Tien-Fu Chen National Chung Cheng Univ. Adv CPU-0 MMX technology! Basic concepts " small native data types " compute-intensive operations " a lot of inherent parallelism

More information

Embedded Systems: Hardware Components (part I) Todor Stefanov

Embedded Systems: Hardware Components (part I) Todor Stefanov Embedded Systems: Hardware Components (part I) Todor Stefanov Leiden Embedded Research Center Leiden Institute of Advanced Computer Science Leiden University, The Netherlands Outline Generic Embedded System

More information

Control unit. Input/output devices provide a means for us to make use of a computer system. Computer System. Computer.

Control unit. Input/output devices provide a means for us to make use of a computer system. Computer System. Computer. Lecture 6: I/O and Control I/O operations Control unit Microprogramming Zebo Peng, IDA, LiTH 1 Input/Output Devices Input/output devices provide a means for us to make use of a computer system. Computer

More information

4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16

4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16 4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3 Emil Sekerinski, McMaster University, Fall Term 2015/16 Instruction Execution Consider simplified MIPS: lw/sw rt, offset(rs) add/sub/and/or/slt

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction

More information

The ARM10 Family of Advanced Microprocessor Cores

The ARM10 Family of Advanced Microprocessor Cores The ARM10 Family of Advanced Microprocessor Cores Stephen Hill ARM Austin Design Center 1 Agenda Design overview Microarchitecture ARM10 o o Memory System Interrupt response 3. Power o o 4. VFP10 ETM10

More information

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor. COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction

More information

C66x CorePac: Achieving High Performance

C66x CorePac: Achieving High Performance C66x CorePac: Achieving High Performance Agenda 1. CorePac Architecture 2. Single Instruction Multiple Data (SIMD) 3. Memory Access 4. Pipeline Concept CorePac Architecture 1. CorePac Architecture 2. Single

More information

Transparent Data-Memory Organizations for Digital Signal Processors

Transparent Data-Memory Organizations for Digital Signal Processors Transparent Data-Memory Organizations for Digital Signal Processors Sadagopan Srinivasan, Vinodh Cuppu, and Bruce Jacob Dept. of Electrical & Computer Engineering University of Maryland at College Park

More information

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM Computer Architecture Computer Science & Engineering Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware

More information

ARM Processors for Embedded Applications

ARM Processors for Embedded Applications ARM Processors for Embedded Applications Roadmap for ARM Processors ARM Architecture Basics ARM Families AMBA Architecture 1 Current ARM Core Families ARM7: Hard cores and Soft cores Cache with MPU or

More information

Ti Parallel Computing PIPELINING. Michał Roziecki, Tomáš Cipr

Ti Parallel Computing PIPELINING. Michał Roziecki, Tomáš Cipr Ti5317000 Parallel Computing PIPELINING Michał Roziecki, Tomáš Cipr 2005-2006 Introduction to pipelining What is this What is pipelining? Pipelining is an implementation technique in which multiple instructions

More information

Latches. IT 3123 Hardware and Software Concepts. Registers. The Little Man has Registers. Data Registers. Program Counter

Latches. IT 3123 Hardware and Software Concepts. Registers. The Little Man has Registers. Data Registers. Program Counter IT 3123 Hardware and Software Concepts Notice: This session is being recorded. CPU and Memory June 11 Copyright 2005 by Bob Brown Latches Can store one bit of data Can be ganged together to store more

More information

Introduction to Microprocessor

Introduction to Microprocessor Introduction to Microprocessor Slide 1 Microprocessor A microprocessor is a multipurpose, programmable, clock-driven, register-based electronic device That reads binary instructions from a storage device

More information

2 MARKS Q&A 1 KNREDDY UNIT-I

2 MARKS Q&A 1 KNREDDY UNIT-I 2 MARKS Q&A 1 KNREDDY UNIT-I 1. What is bus; list the different types of buses with its function. A group of lines that serves as a connecting path for several devices is called a bus; TYPES: ADDRESS BUS,

More information

SA-1500: A 300 MHz RISC CPU with Attached Media Processor*

SA-1500: A 300 MHz RISC CPU with Attached Media Processor* and Bridges Division SA-1500: A 300 MHz RISC CPU with Attached Media Processor* Prashant P. Gandhi, Ph.D. and Bridges Division Computing Enhancement Group Intel Corporation Santa Clara, CA 95052 Prashant.Gandhi@intel.com

More information