INTRODUCTION TO DIGITAL SIGNAL PROCESSORS
|
|
- Evan Quinn
- 5 years ago
- Views:
Transcription
1 INTRODUCTION TO DIGITAL SIGNAL PROCESSORS Accumulator architecture Memoryregister architecture Prof. Brian L. Evans in collaboration with Niranjan DameraVenkata and Magesh Valliappan Loadstore architecture Embedded Signal Processing Laboratory The University of Texas at Austin Austin, T
2 Outline n Signal processing applications n Conventional DSP architecture n Pipelining in DSP processors n RISC vs. DSP processor architectures n TI TMS320C6x VLIW DSP architecture n Signal and image processing applications n Signal processing on generalpurpose processors n Conclusion 2
3 Signal Processing Applications n Lowcost embedded systems 4 Modems, cellular telephones, disk drives, printers n Highthroughput applications 4 Halftoning, radar, highresolution sonar, tomography n PC based multimedia 4 Compression/decompression of audio, graphics, video n Embedded processor requirements 4 Inexpensive with small area and volume 4 Deterministic interrupt service routine latency 4 Low power: ~50 mw (TMS320C5402/20: 0.32 ma/mip) 3
4 Conventional DSP Architecture n High data throughput 4 Harvard architecture n n Separate data memory/bus and program memory/bus Three reads and one or two writes per instruction cycle 4 Short deterministic interrupt service routine latency 4 Multiplyaccumulate (MAC) in a single instruction cycle 4 Special addressing modes supported in hardware n n Modulo addressing for circular buffers (e.g. FIR filters) Bitreversed addressing (e.g. fast Fourier transforms) 4 Instructions to keep the 34 stages of the pipeline full n n Zerooverhead looping (one pipeline flush to set up) Delayed branches 4
5 Conventional DSP Architecture (con t) n Modulo addressing 4 implementing circular buffers and delay lines Datashifting Time Buffer contents Next sample n=n x NK+1 x NK+1 x N1 x N n=n+1 x NK+2 x NK+3 x N x N+1 n=n+2 x NK+3 x NK+4 x N+1 x N+2 x N+1 x N+2 x N+3 Modulo addressing n Bit reversed addressing 4 used to implement the radix2 FFT Time Buffer contents Next sample n=n x N2 x N1 x N x NK+1 x NK+2 n=n+1 x N2 x N1 x N x N+1 x NK+2 n=n+2 x N2 x N1 x N x N+1 x N+2 x NK+3 x NK+3 x NK+4 x NK+4 x N+1 x N+2 x N+3 5
6 Conventional DSP Architecture (con t) FixedPoint FloatingPoint Cost/Unit $5 $10 Architecture accumulator loadstore or memoryregister Registers 2 4 data 8 address 8 or 16 data 8 or 16 address Data Words 16 or 24 bit integer and fixedpoint 32 bit integer and fixed/floatingpoint OnChip Memory 2 64 kwords data * 2 64 kwords program 8 64 kwords data 8 64 kwords program Address Space 16 kw 8 Mw data kw program ** 16 Mw 4Gw data 16 Mw 4 Gw program Compilers C compilers; poor code generation C, C++ compilers; better code generation Examples TI TMS320C5x; Motorola TI TMS320C3x; Analog Devices SHARC 6
7 Conventional DSP Architecture (con t) n Market share: 95% fixedpoint, 5% floatingpoint n Each processor has dozens of configurations 4 Size and map of data and program memory 4 A/D, input/output buffers, interfaces, timers, and D/A n Drawbacks to conventional DSP processors 4 No byte addressing (desirable for image and video) 4 Limited onchip memory 4 Limited addressable memory on fixedpoint DSPs: except Motorola (16 Mw data; 32 Mw program) and C548/C549/C54xx (8 Mw data; 256 kw program) 4 Nonstandard C extensions to support fixedpoint data 7
8 Pipelining Sequential (Motorola 56000) Fetch Decode Read Execute Pipelined (Most conventional DSP processors) Fetch Decode Read Execute Superscalar (Pentium, MIPS) Fetch Decode Read Execute Superpipelined (CDC7600) Managing Pipelines compiler or programmer interlocking hardware instruction scheduling Fetch Decode Read Execute 8
9 Pipelining: Operation n Timestationary pipeline model 4 Programmer controls each cycle 4 Motorola DSP56001 MAC 0,Y0,A :(R0)+,0 Y:(R4),Y0 n Datastationary pipeline model 4 Programmer specifies data operations 4 TMS320C30/40 MPYF *++AR0(1),*++AR1(IR0),R0 n Interlocked pipeline 4 Programmer is protected from pipeline effects Fetch F D E F G H I J K L L Decode D R E C D E F G H I J K L B C D E F G H I J K L Read Execute A B C D E F G H I J K L 9
10 Pipelining: Hazards n A control hazard occurs when a branch instruction is decoded 4 Flush the pipeline 4 or: Delayed branch (expose pipeline) n A data hazard occurs because an operand cannot be read yet 4 Intended by programmer 4 or: Interlock hardware inserts bubble TMS320C5x example LAC #064h SAMM AR2 NOP LACC * LAR AR2, DATA LACC * Fetch F D R E D E F br G Y Y Z Decode C D E F br Y Z B C D E F br Y Z Read Execute A B C D E F br Y Z 10
11 Pipelining: Avoiding Control Hazards A key factor in the numeric performance of DSPs is the provision of special hardware to perform looping. RPT COUNT TBLR *+ n A repeat instruction repeats one instruction or a block of instructions after repeat n The pipeline is filled with repeated instruction (or block of instructions) n Cost: one pipeline flush only Fetch F D R E D E F rpt Decode C D E F rpt Read B C D E F rpt A B C D E F rpt Execute 11
12 RISC vs. DSP: Instruction Encoding n RISC: Superscalar Reorder Load/store FP Unit n DSP: Horizontal microcode Integer Unit Load/store Load/store ALU Multiplier Address 12
13 RISC vs. DSP: Memory Hierarchy n RISC Registers Out of order I/D Cache Physical memory TLB TLB: Translation Lookaside Buffer n DSP I Cache Internal memories Registers External memories DMA Controller DMA: Direct Memory Access 13
14 TI TMS320C6x VLIW DSP Architecture Simplified Architecture External Memory Sync Async Addr Data Program RAM Data RAM or Cache Internal Buses Regs (A0A15).D1.D2.M1.M2.L1.L2.S1.S2 Control Regs CPU Regs (B0B15) DMA Serial Port Host Port Boot Load Timers Pwr Down 14
15 TI TMS320C6x VLIW DSP Architecture n One instruction cycle per clock cycle n Two parallel data paths with singlecycle units: 4 Data unit 32bit address calculations (modulo, linear) 4 Multiplier unit 16 bit x 16 bit with 32bit result 4 Logical unit 40bit (saturation) arithmetic & compares 4 Shifter unit 32bit integer ALU and 40bit shifter n 16 32bit general purpose registers in each path 4 40 bits can be stored in adjacent even/odd registers n 32bit addressing of 8/16/32 bit data n Fixedpoint (C62x) and floatingpoint (C67x) n C67x computes floatingpoint multiply in 4 cycles 15
16 TI TMS320C6x VLIW DSP Architecture n TMS320C6211: $21 in volume MHz, 300 million MACs/sec, 1200 RISC MIPS 4 onchip: 4k x 8 bits program and 4k x 8 bits data (plus 64k x 8 bits L2 cache) n Deep pipeline stages in C62x: fetch 4, decode 2, execute stages in C67x: fetch 4, decode 2, execute If a branch is in the pipeline, interrupts are disabled (the latency of a branch instruction is 5 cycles) 4 Avoid branches by using conditional execution n No hardware protection against pipeline hazards 4 Compiler and assembler must prevent pipeline hazards 16
17 C5x and C6x Addressing Modes n Immediate 4 The operand is part of the instruction n Register 4 The operand is specified in a register n Direct 4 The address of the operand is part of the instruction (added to imply memory page) n Indirect 4 The address of the operand is stored in a register TMS320C5x TMS320C6x ADD #0FFh add.l1 13,A1,A6 (implied) add.l1 A7,A6,A7 ADD 010h not supported ADD * ldw.l1 *A5++[8],A1 17
18 TMS320C6x vs. Pentium MM Processor Peak MIPS BDTI marks ISR latency Power Unit Price Area Volume Pentium MM 233 Pentium MM 266 C62x 150 MHz C62x 200 MHz µs 4.25 W $ x in µs 4.85 W $ x in µs 1.45 W $ x in µs 1.94 W $ x in 3 BDTImarks: Berkeley Design Technology Inc. DSP benchmark results (larger means better)
19 n Each tap requires Application: FIR Filter 4 Fetching one data sample 4 Fetching one operand 4 Multiplying two numbers 4 Accumulating multiplication result 4 Shifting one sample in the delay line z 1 z 1 z 1 n Computing an FIR tap in one instruction cycle 4 Three data memory accesses 4 Autoincrement or decrement addressing modes 4 Modulo addressing to implement delay line as circular buffer 4 Eleven RISC instructions 19
20 Application: FIR Filter on a TMS320C5x Coefficients Data COEFFP.set 02000h ; Program mem address.set 037Fh ; Newest data sample LASTAP.set 037FH ; Oldest data sample LAR AR3, #LASTAP ; Point to oldest sample RPT #127 MACD COEFFP, * ; Do the thing APAC SACH Y,1 ; Store result note shift 20
21 Application: FIR Filter on a TMS320C62x Coefficients Data SingleCycle Loop... C7: ldh.d1 *A1++, A2 ; Read coefficient ldh.d2 *B1++, B2 ; Read data [B0] sub.l2 B0, 1, B0 ; Decrement counter [B0] B.S2 c7 ; Branch if not zero mpy.m1x A2, B2, A3 ; Form product add.l1 A4, A3, A4 ; Accumulate result... 21
22 Ordered Dithering on a TMS320C62x 1/8 5/8 7/8 3/8 SingleCycle Loop Array of thresholds... C7: ldb.d1 *A1++, A2 ; Read pixel ldb.d2 *B1++, B2 ; Read threshold [B0] sub.l2 B0, 1, B0 ; Decrement counter [B0] B.S2 c7 ; Branch if not zero cmpgtu.l1x A2, B2, A3 ; Threshold and store stb.d1 A3, *A5++ ; Accumulate result... 22
23 DSP Cores n ASIC with: 4 Programmable DSP 4 RAM 4 ROM 4 Standard cells 4 Codec 4 Peripherals 4 Gate array 4 Microcontroller 23
24 DSP on General Purpose Processors n Multimedia applications on PCs 4 Video, audio, graphics and animation 4 Repetitive parallel sequences of instructions n Native signal processing examples 4 Sun Visual Instruction Set (UltraSPARC 1/2) 4 Intel MM (Pentium I/II/III) 4 Intel Concurrent SIMDFP (Pentium III) n Single Instruction Multiple Data (SIMD) 4 One instruction acts on multiple data in parallel 4 Wellsuited for graphics 24
25 DSP on General Purpose Processors (con t) n Programming is considerably tougher 4 C/C++ compilers do not generate native signal processing code except Metrowerks CodeWarrior 4 gives MM code 4 Libraries of routines using native signal processing 4 Hand code using inline assembly for best performance 4 Pack/unpack data not aligned on SIMD word boundaries 4 50cycle penalty to switch out of MM; 0 penalty for VIS 4 Saturation arithmetic in MM; not supported in VIS 4 Extendedprecision accumulation in MM; none in VIS n Speedup for applications 4 Signal and image processing 1.5:1 to 2:1 4 Graphics 4:1 to 6:1 (no packing/unpacking) 25
26 Intel MM Instruction Set n 64bit SIMD register (4 data types) 4 64bit quad word 4 Packed byte (8 bytes packed into 64 bits) 4 Packed word (4 16bit words packed into 64 bits) 4 Packed double word (2 double words packed into 64 bits) n 57 new instructions 4 Pack and unpack 4 Add, subtract, multiply, and multiply/accumulate n Saturation and wraparound arithmetic n Maximum parallelism possible 4 8:1 for 8bit additions 4 4:1 for 8 x 16 multiplication or 16bit additions 26
27 Concluding Remarks n Conventional digital signal processors 4 High performance vs. power consumption/cost/volume 4 Excel at onedimensional processing 4 Have instructions tailored to specific applications n TMS320C6x VLIW DSP 4 High performance vs. cost/volume 4 Excel at multidimensional signal processing 4 A maximum of 22 RISC instructions per cycle n Native Signal Processing 4 Available on desktop computers 4 Excels at graphics 4 A maximum of 8 RISC instructions per cycle n Inline assembly code for best performance 27
28 Concluding Remarks n Digital signal processor market 4 40% annual growth rate since $3.5 billion revenue in % TI, 25% Lucent, 10% Motorola, 8% Analog Devices n Independent benchmarking by industry 4 Berkeley Design Technology Inc. 4 EDN Embedded Microprocessor Benchmark Consortium n Web resources 4 comp.dsp newsgroup: FAQ 4 embedded processors and systems: 4 online courses and DSP boards: 28
29 References 1 G. E. Allen, B. L. Evans, and D. C. Schanbacher, RealTime Sonar Beamforming on a Unix Workstation, Proc. IEEE Asilomar Conf. On Signals, Systems, and Computers, pp , R. Bhargava, R. Radhakrishnan, B. L. Evans, and L. K. John, Evaluating MM Technology Using DSP and Multimedia Applications, Proc. IEEE Sym. On Microarchitecture, pp. 3746, W. Chen, H. J. Reekie, S. Bhave, and E. A. Lee, Native Signal Processing on the UltraSPARC in the Ptolemy Environment, Proc. IEEE Asilomar Conf. On Signals, Systems, and Computers, B. L. Evans, EE379K17 RealTime DSP Laboratory, UT Austin. 5 B. L. Evans, EE382C Embedded Software Systems, UT Austin. 6 A. Kulkarni and A. Dube, Evaluation of the Code Generation Domain in Ptolemy, 7 P. Lapsley, J. Bier, A. Shoham, and E. A. Lee, DSP Processor Fundamentals, IEEE Press,
INTRODUCTION TO DIGITAL SIGNAL PROCESSORS
INTRODUCTION TO DIGITAL SIGNAL PROCESSORS Accumulator architecture Memoryregister architecture Prof. Brian L. Evans in collaboration with Niranjan DameraVenkata and Magesh Valliappan Loadstore architecture
More informationGeneral Purpose Signal Processors
General Purpose Signal Processors First announced in 1978 (AMD) for peripheral computation such as in printers, matured in early 80 s (TMS320 series). General purpose vs. dedicated architectures: Pros:
More informationIndependent DSP Benchmarks: Methodologies and Results. Outline
Independent DSP Benchmarks: Methodologies and Results Berkeley Design Technology, Inc. 2107 Dwight Way, Second Floor Berkeley, California U.S.A. +1 (510) 665-1600 info@bdti.com http:// Copyright 1 Outline
More informationStorage I/O Summary. Lecture 16: Multimedia and DSP Architectures
Storage I/O Summary Storage devices Storage I/O Performance Measures» Throughput» Response time I/O Benchmarks» Scaling to track technological change» Throughput with restricted response time is normal
More informationMicroprocessors vs. DSPs (ESC-223)
Insight, Analysis, and Advice on Signal Processing Technology Microprocessors vs. DSPs (ESC-223) Kenton Williston Berkeley Design Technology, Inc. Berkeley, California USA +1 (510) 665-1600 info@bdti.com
More informationAdvanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University
Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:
More informationEvaluating MMX Technology Using DSP and Multimedia Applications
Evaluating MMX Technology Using DSP and Multimedia Applications Ravi Bhargava * Lizy K. John * Brian L. Evans Ramesh Radhakrishnan * November 22, 1999 The University of Texas at Austin Department of Electrical
More informationAn introduction to DSP s. Examples of DSP applications Why a DSP? Characteristics of a DSP Architectures
An introduction to DSP s Examples of DSP applications Why a DSP? Characteristics of a DSP Architectures DSP example: mobile phone DSP example: mobile phone with video camera DSP: applications Why a DSP?
More informationTypical DSP application
DSP markets DSP markets Typical DSP application TI DSP History: Modem applications 1982 TMS32010, TI introduces its first programmable general-purpose DSP to market Operating at 5 MIPS. It was ideal for
More informationDSP Processors Lecture 13
DSP Processors Lecture 13 Ingrid Verbauwhede Department of Electrical Engineering University of California Los Angeles ingrid@ee.ucla.edu 1 References The origins: E.A. Lee, Programmable DSP Processors,
More informationEvaluating Signal Processing and Multimedia Applications on SIMD, VLIW and Superscalar Architectures
Evaluating Signal Processing and Multimedia Applications on SIMD, VLIW and Superscalar Architectures Deependra Talla, Lizy K. John, Viktor Lapinskii, and Brian L. Evans Department of Electrical and Computer
More informationOptimization of Vertical and Horizontal Beamforming Kernels on the PowerPC G4 Processor with AltiVec Technology
Optimization of Vertical and Horizontal Beamforming Kernels on the PowerPC G4 Processor with AltiVec Technology EE382C: Embedded Software Systems Final Report David Brunke Young Cho Applied Research Laboratories:
More informationINTRODUCTION TO DIGITAL SIGNAL PROCESSOR
INTRODUCTION TO DIGITAL SIGNAL PROCESSOR By, Snehal Gor snehalg@embed.isquareit.ac.in 1 PURPOSE Purpose is deliberately thought-through goal-directedness. - http://en.wikipedia.org/wiki/purpose This document
More informationProcessors for Embedded Digital Signal Processing
Insight, Analysis, and Advice on Signal Processing Technology Processors for Embedded Jeff Bier President BDTI Oakland, California USA +1 (510) 451-1800 info@bdti.com http://www.bdti.com 2008 Berkeley
More informationTMS320C3X Floating Point DSP
TMS320C3X Floating Point DSP Microcontrollers & Microprocessors Undergraduate Course Isfahan University of Technology Oct 2010 By : Mohammad 1 DSP DSP : Digital Signal Processor Why A DSP? Example Voice
More informationReal-Time High-Throughput Sonar Beamforming Kernels Using Native Signal Processing and Memory Latency Hiding Techniques
Real-Time High-Throughput Sonar Beamforming Kernels Using Native Signal Processing and Memory Latency Hiding Techniques Gregory E. Allen Applied Research Laboratories: The University of Texas at Austin
More informationThe Processor: Instruction-Level Parallelism
The Processor: Instruction-Level Parallelism Computer Organization Architectures for Embedded Computing Tuesday 21 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy
More informationProcessor (IV) - advanced ILP. Hwansoo Han
Processor (IV) - advanced ILP Hwansoo Han Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel To increase ILP Deeper pipeline Less work per stage shorter clock cycle
More informationREAL TIME DIGITAL SIGNAL PROCESSING
REAL TIME DIGITAL SIGNAL PROCESSING UTN - FRBA 2011 www.electron.frba.utn.edu.ar/dplab Introduction Why Digital? A brief comparison with analog. Advantages Flexibility. Easily modifiable and upgradeable.
More informationLOW-COST SIMD. Considerations For Selecting a DSP Processor Why Buy The ADSP-21161?
LOW-COST SIMD Considerations For Selecting a DSP Processor Why Buy The ADSP-21161? The Analog Devices ADSP-21161 SIMD SHARC vs. Texas Instruments TMS320C6711 and TMS320C6712 Author : K. Srinivas Introduction
More informationCharacterization of Native Signal Processing Extensions
Characterization of Native Signal Processing Extensions Jason Law Department of Electrical and Computer Engineering University of Texas at Austin Austin, TX 78712 jlaw@mail.utexas.edu Abstract Soon if
More informationDSP Platforms Lab (AD-SHARC) Session 05
University of Miami - Frost School of Music DSP Platforms Lab (AD-SHARC) Session 05 Description This session will be dedicated to give an introduction to the hardware architecture and assembly programming
More informationSeparating Reality from Hype in Processors' DSP Performance. Evaluating DSP Performance
Separating Reality from Hype in Processors' DSP Performance Berkeley Design Technology, Inc. +1 (51) 665-16 info@bdti.com Copyright 21 Berkeley Design Technology, Inc. 1 Evaluating DSP Performance! Essential
More informationVIII. DSP Processors. Digital Signal Processing 8 December 24, 2009
Digital Signal Processing 8 December 24, 2009 VIII. DSP Processors 2007 Syllabus: Introduction to programmable DSPs: Multiplier and Multiplier-Accumulator (MAC), Modified bus structures and memory access
More informationEECS 322 Computer Architecture Superpipline and the Cache
EECS 322 Computer Architecture Superpipline and the Cache Instructor: Francis G. Wolff wolff@eecs.cwru.edu Case Western Reserve University This presentation uses powerpoint animation: please viewshow Summary:
More informationASSEMBLY LANGUAGE MACHINE ORGANIZATION
ASSEMBLY LANGUAGE MACHINE ORGANIZATION CHAPTER 3 1 Sub-topics The topic will cover: Microprocessor architecture CPU processing methods Pipelining Superscalar RISC Multiprocessing Instruction Cycle Instruction
More informationomputer Design Concept adao Nakamura
omputer Design Concept adao Nakamura akamura@archi.is.tohoku.ac.jp akamura@umunhum.stanford.edu 1 1 Pascal s Calculator Leibniz s Calculator Babbage s Calculator Von Neumann Computer Flynn s Classification
More informationEmbedded Systems Development
Embedded Systems Development Lecture 8 Code Generation for Embedded Processors Daniel Kästner AbsInt Angewandte Informatik GmbH kaestner@absint.com 2 Life Range and Register Interference A symbolic register
More informationEvaluating the DSP Processor Options (DSP-522)
Insight, Analysis, and Advice on Signal Processing Technology Evaluating the DSP Processor Options (DSP-522) Jeff Bier Berkeley Design Technology, Inc. Berkeley, California USA +1 (510) 665-1600 info@bdti.com
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationEmbedded Systems. 7. System Components
Embedded Systems 7. System Components Lothar Thiele 7-1 Contents of Course 1. Embedded Systems Introduction 2. Software Introduction 7. System Components 10. Models 3. Real-Time Models 4. Periodic/Aperiodic
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationBetter sharc data such as vliw format, number of kind of functional units
Better sharc data such as vliw format, number of kind of functional units Pictures of pipe would help Build up zero overhead loop example better FIR inner loop in coldfire Mine more material from bsdi.com
More informationUniversität Dortmund. ARM Architecture
ARM Architecture The RISC Philosophy Original RISC design (e.g. MIPS) aims for high performance through o reduced number of instruction classes o large general-purpose register set o load-store architecture
More informationChapter 4 The Processor (Part 4)
Department of Electr rical Eng ineering, Chapter 4 The Processor (Part 4) 王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw ncku edu Depar rtment of Electr rical Engineering, Feng-Chia Unive ersity Outline
More informationDigital Signal Processor Core Technology
The World Leader in High Performance Signal Processing Solutions Digital Signal Processor Core Technology Abhijit Giri Satya Simha November 4th 2009 Outline Introduction to SHARC DSP ADSP21469 ADSP2146x
More informationReal Processors. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University
Real Processors Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel
More informationCOMPUTER ORGANIZATION AND DESIGN
COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined
More information04 - DSP Architecture and Microarchitecture
September 11, 2015 Memory indirect addressing (continued from last lecture) ; Reality check: Data hazards! ; Assembler code v3: repeat 256,endloop load r0,dm1[dm0[ptr0++]] store DM0[ptr1++],r0 endloop:
More informationCOMPUTER ORGANIZATION AND DESI
COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler
More informationM.Tech. credit seminar report, Electronic Systems Group, EE Dept, IIT Bombay, Submitted: November Evolution of DSPs
M.Tech. credit seminar report, Electronic Systems Group, EE Dept, IIT Bombay, Submitted: November 2002 Evolution of DSPs Author: Kartik Kariya (Roll No. 02307923) Supervisor: Prof. Vikram M. Gadre, Associate
More informationEmbedded Computation
Embedded Computation What is an Embedded Processor? Any device that includes a programmable computer, but is not itself a general-purpose computer [W. Wolf, 2000]. Commonly found in cell phones, automobiles,
More informationDigital Signal Processors: fundamentals & system design. Lecture 1. Maria Elena Angoletta CERN
Digital Signal Processors: fundamentals & system design Lecture 1 Maria Elena Angoletta CERN Topical CAS/Digital Signal Processing Sigtuna, June 1-9, 2007 Lectures plan Lecture 1 (now!) introduction, evolution,
More informationComputer and Hardware Architecture I. Benny Thörnberg Associate Professor in Electronics
Computer and Hardware Architecture I Benny Thörnberg Associate Professor in Electronics Hardware architecture Computer architecture The functionality of a modern computer is so complex that no human can
More informationComputer Architecture 2/26/01 Lecture #
Computer Architecture 2/26/01 Lecture #9 16.070 On a previous lecture, we discussed the software development process and in particular, the development of a software architecture Recall the output of the
More informationChapter 2 Lecture 1 Computer Systems Organization
Chapter 2 Lecture 1 Computer Systems Organization This chapter provides an introduction to the components Processors: Primary Memory: Secondary Memory: Input/Output: Busses The Central Processing Unit
More informationMicroprocessor Architecture Dr. Charles Kim Howard University
EECE416 Microcomputer Fundamentals Microprocessor Architecture Dr. Charles Kim Howard University 1 Computer Architecture Computer System CPU (with PC, Register, SR) + Memory 2 Computer Architecture ALU
More informationLECTURE 10. Pipelining: Advanced ILP
LECTURE 10 Pipelining: Advanced ILP EXCEPTIONS An exception, or interrupt, is an event other than regular transfers of control (branches, jumps, calls, returns) that changes the normal flow of instruction
More informationCache Justification for Digital Signal Processors
Cache Justification for Digital Signal Processors by Michael J. Lee December 3, 1999 Cache Justification for Digital Signal Processors By Michael J. Lee Abstract Caches are commonly used on general-purpose
More informationIntel s MMX. Why MMX?
Intel s MMX Dr. Richard Enbody CSE 820 Why MMX? Make the Common Case Fast Multimedia and Communication consume significant computing resources. Providing specific hardware support makes sense. 1 Goals
More informationAdvanced Instruction-Level Parallelism
Advanced Instruction-Level Parallelism Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu EEE3050: Theory on Computer Architectures, Spring 2017, Jinkyu
More informationArchitectures & instruction sets R_B_T_C_. von Neumann architecture. Computer architecture taxonomy. Assembly language.
Architectures & instruction sets Computer architecture taxonomy. Assembly language. R_B_T_C_ 1. E E C E 2. I E U W 3. I S O O 4. E P O I von Neumann architecture Memory holds data and instructions. Central
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationCPE300: Digital System Architecture and Design
CPE300: Digital System Architecture and Design Fall 2011 MW 17:30-18:45 CBC C316 Pipelining 11142011 http://www.egr.unlv.edu/~b1morris/cpe300/ 2 Outline Review I/O Chapter 5 Overview Pipelining Pipelining
More informationEC 513 Computer Architecture
EC 513 Computer Architecture Complex Pipelining: Superscalar Prof. Michel A. Kinsy Summary Concepts Von Neumann architecture = stored-program computer architecture Self-Modifying Code Princeton architecture
More informationPipelining and Vector Processing
Pipelining and Vector Processing Chapter 8 S. Dandamudi Outline Basic concepts Handling resource conflicts Data hazards Handling branches Performance enhancements Example implementations Pentium PowerPC
More informationProcessor Applications. The Processor Design Space. World s Cellular Subscribers. Nov. 12, 1997 Bob Brodersen (http://infopad.eecs.berkeley.
Processor Applications CS 152 Computer Architecture and Engineering Introduction to Architectures for Digital Signal Processing Nov. 12, 1997 Bob Brodersen (http://infopad.eecs.berkeley.edu) 1 General
More informationCOMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationChapter 9. Pipelining Design Techniques
Chapter 9 Pipelining Design Techniques 9.1 General Concepts Pipelining refers to the technique in which a given task is divided into a number of subtasks that need to be performed in sequence. Each subtask
More informationThe Evolution of DSP Processors
Berkeley Design Technology, Inc. Optimized DSP Software Independent DSP Analysis A BDTI White Paper The Evolution of DSP Processors By Jennifer Eyre and Jeff Bier, Berkeley Design Technology, Inc. (BDTI)
More informationThe Nios II Family of Configurable Soft-core Processors
The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture
More informationChapter 1 Introduction
Chapter 1 Introduction The Motorola DSP56300 family of digital signal processors uses a programmable, 24-bit, fixed-point core. This core is a high-performance, single-clock-cycle-per-instruction engine
More information7/7/2013 TIFAC CORE IN NETWORK ENGINEERING
TMS320C50 Architecture 1 OVERVIEW OF DSP PROCESSORS BY Dr. M.Pallikonda Rajasekaran, Professor/ECE 2 3 4 Short history of DSPs 1960 DSP hardware using discrete components 1970 Monolithic components for
More informationTMS320C54x DSP Programming Environment APPLICATION BRIEF: SPRA182
TMS320C54x DSP Programming Environment APPLICATION BRIEF: SPRA182 M. Tim Grady Senior Member, Technical Staff Texas Instruments April 1997 IMPORTANT NOTICE Texas Instruments (TI) reserves the right to
More informationChapter 13 Reduced Instruction Set Computers
Chapter 13 Reduced Instruction Set Computers Contents Instruction execution characteristics Use of a large register file Compiler-based register optimization Reduced instruction set architecture RISC pipelining
More informationMicro-programmed Control Ch 17
Micro-programmed Control Ch 17 Micro-instructions Micro-programmed Control Unit Sequencing Execution Characteristics Course Summary 1 Hardwired Control (4) Complex Fast Difficult to design Difficult to
More informationE0-243: Computer Architecture
E0-243: Computer Architecture L1 ILP Processors RG:E0243:L1-ILP Processors 1 ILP Architectures Superscalar Architecture VLIW Architecture EPIC, Subword Parallelism, RG:E0243:L1-ILP Processors 2 Motivation
More informationMicro-programmed Control Ch 15
Micro-programmed Control Ch 15 Micro-instructions Micro-programmed Control Unit Sequencing Execution Characteristics 1 Hardwired Control (4) Complex Fast Difficult to design Difficult to modify Lots of
More informationSH4 RISC Microprocessor for Multimedia
SH4 RISC Microprocessor for Multimedia Fumio Arakawa, Osamu Nishii, Kunio Uchiyama, Norio Nakagawa Hitachi, Ltd. 1 Outline 1. SH4 Overview 2. New Floating-point Architecture 3. Length-4 Vector Instructions
More informationMachine Instructions vs. Micro-instructions. Micro-programmed Control Ch 15. Machine Instructions vs. Micro-instructions (2) Hardwired Control (4)
Micro-programmed Control Ch 15 Micro-instructions Micro-programmed Control Unit Sequencing Execution Characteristics 1 Machine Instructions vs. Micro-instructions Memory execution unit CPU control memory
More informationMicro-programmed Control Ch 15
Micro-programmed Control Ch 15 Micro-instructions Micro-programmed Control Unit Sequencing Execution Characteristics 1 Hardwired Control (4) Complex Fast Difficult to design Difficult to modify Lots of
More informationHardwired Control (4) Micro-programmed Control Ch 17. Micro-programmed Control (3) Machine Instructions vs. Micro-instructions
Micro-programmed Control Ch 17 Micro-instructions Micro-programmed Control Unit Sequencing Execution Characteristics Course Summary Hardwired Control (4) Complex Fast Difficult to design Difficult to modify
More informationComputer Systems Architecture I. CSE 560M Lecture 10 Prof. Patrick Crowley
Computer Systems Architecture I CSE 560M Lecture 10 Prof. Patrick Crowley Plan for Today Questions Dynamic Execution III discussion Multiple Issue Static multiple issue (+ examples) Dynamic multiple issue
More informationInstruction Set Principles and Examples. Appendix B
Instruction Set Principles and Examples Appendix B Outline What is Instruction Set Architecture? Classifying ISA Elements of ISA Programming Registers Type and Size of Operands Addressing Modes Types of
More informationENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design
ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University
More informationNative Signal Processing on the UltraSparc in the Ptolemy Environment
Presented at the Thirtieth Annual Asilomar Conference on Signals, Systems, and Computers - November 1996 Native Signal Processing on the UltraSparc in the Ptolemy Environment William Chen, H. John Reekie,
More informationTowards a Java-enabled 2Mbps wireless handheld device
Towards a Java-enabled 2Mbps wireless handheld device John Glossner 1, Michael Schulte 2, and Stamatis Vassiliadis 3 1 Sandbridge Technologies, White Plains, NY 2 Lehigh University, Bethlehem, PA 3 Delft
More informationDetermined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version
MIPS Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationGENERAL-PURPOSE MICROPROCESSOR PERFORMANCE FOR DSP APPLICATIONS. University of Utah. Salt Lake City, UT USA
GENERAL-PURPOSE MICROPROCESSOR PERFORMANCE FOR DSP APPLICATIONS J.N. Barkdull and S.C. Douglas Department of Electrical Engineering University of Utah Salt Lake City, UT 84112 USA ABACT Digital signal
More information,1752'8&7,21. Figure 1-0. Table 1-0. Listing 1-0.
,1752'8&7,21 Figure 1-0. Table 1-0. Listing 1-0. The ADSP-21065L SHARC is a high-performance, 32-bit digital signal processor for communications, digital audio, and industrial instrumentation applications.
More informationUniprocessors. HPC Fall 2012 Prof. Robert van Engelen
Uniprocessors HPC Fall 2012 Prof. Robert van Engelen Overview PART I: Uniprocessors and Compiler Optimizations PART II: Multiprocessors and Parallel Programming Models Uniprocessors Processor architectures
More informationAdvanced Computer Architecture
Advanced Computer Architecture Chapter 1 Introduction into the Sequential and Pipeline Instruction Execution Martin Milata What is a Processors Architecture Instruction Set Architecture (ISA) Describes
More informationModule 1. Introduction. Version 2 EE IIT, Kharagpur 1
Module 1 Introduction Version 2 EE IIT, Kharagpur 1 Lesson 4 Embedded Systems Components Part II Version 2 EE IIT, Kharagpur 2 Overview on Components Instructional Objectives After going through this lesson
More informationEvolution of Computers & Microprocessors. Dr. Cahit Karakuş
Evolution of Computers & Microprocessors Dr. Cahit Karakuş Evolution of Computers First generation (1939-1954) - vacuum tube IBM 650, 1954 Evolution of Computers Second generation (1954-1959) - transistor
More informationAdvance CPU Design. MMX technology. Computer Architectures. Tien-Fu Chen. National Chung Cheng Univ. ! Basic concepts
Computer Architectures Advance CPU Design Tien-Fu Chen National Chung Cheng Univ. Adv CPU-0 MMX technology! Basic concepts " small native data types " compute-intensive operations " a lot of inherent parallelism
More informationEmbedded Systems: Hardware Components (part I) Todor Stefanov
Embedded Systems: Hardware Components (part I) Todor Stefanov Leiden Embedded Research Center Leiden Institute of Advanced Computer Science Leiden University, The Netherlands Outline Generic Embedded System
More informationControl unit. Input/output devices provide a means for us to make use of a computer system. Computer System. Computer.
Lecture 6: I/O and Control I/O operations Control unit Microprogramming Zebo Peng, IDA, LiTH 1 Input/Output Devices Input/output devices provide a means for us to make use of a computer system. Computer
More information4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16
4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3 Emil Sekerinski, McMaster University, Fall Term 2015/16 Instruction Execution Consider simplified MIPS: lw/sw rt, offset(rs) add/sub/and/or/slt
More informationCOMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction
More informationThe ARM10 Family of Advanced Microprocessor Cores
The ARM10 Family of Advanced Microprocessor Cores Stephen Hill ARM Austin Design Center 1 Agenda Design overview Microarchitecture ARM10 o o Memory System Interrupt response 3. Power o o 4. VFP10 ETM10
More informationChapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction
More informationC66x CorePac: Achieving High Performance
C66x CorePac: Achieving High Performance Agenda 1. CorePac Architecture 2. Single Instruction Multiple Data (SIMD) 3. Memory Access 4. Pipeline Concept CorePac Architecture 1. CorePac Architecture 2. Single
More informationTransparent Data-Memory Organizations for Digital Signal Processors
Transparent Data-Memory Organizations for Digital Signal Processors Sadagopan Srinivasan, Vinodh Cuppu, and Bruce Jacob Dept. of Electrical & Computer Engineering University of Maryland at College Park
More informationComputer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM
Computer Architecture Computer Science & Engineering Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware
More informationARM Processors for Embedded Applications
ARM Processors for Embedded Applications Roadmap for ARM Processors ARM Architecture Basics ARM Families AMBA Architecture 1 Current ARM Core Families ARM7: Hard cores and Soft cores Cache with MPU or
More informationTi Parallel Computing PIPELINING. Michał Roziecki, Tomáš Cipr
Ti5317000 Parallel Computing PIPELINING Michał Roziecki, Tomáš Cipr 2005-2006 Introduction to pipelining What is this What is pipelining? Pipelining is an implementation technique in which multiple instructions
More informationLatches. IT 3123 Hardware and Software Concepts. Registers. The Little Man has Registers. Data Registers. Program Counter
IT 3123 Hardware and Software Concepts Notice: This session is being recorded. CPU and Memory June 11 Copyright 2005 by Bob Brown Latches Can store one bit of data Can be ganged together to store more
More informationIntroduction to Microprocessor
Introduction to Microprocessor Slide 1 Microprocessor A microprocessor is a multipurpose, programmable, clock-driven, register-based electronic device That reads binary instructions from a storage device
More information2 MARKS Q&A 1 KNREDDY UNIT-I
2 MARKS Q&A 1 KNREDDY UNIT-I 1. What is bus; list the different types of buses with its function. A group of lines that serves as a connecting path for several devices is called a bus; TYPES: ADDRESS BUS,
More informationSA-1500: A 300 MHz RISC CPU with Attached Media Processor*
and Bridges Division SA-1500: A 300 MHz RISC CPU with Attached Media Processor* Prashant P. Gandhi, Ph.D. and Bridges Division Computing Enhancement Group Intel Corporation Santa Clara, CA 95052 Prashant.Gandhi@intel.com
More information