Design of Embedded DSP Processors Unit 8: Firmware design and benchmarking. 9/27/2017 Unit 8 of TSEA H1 1

2 Contents
Introduction to FW and its coding flow
1. Application modeling under HW constraints
2. Stream-kernel (master/slave) programming
3. Programming algorithm / computing kernels
4. Assembly code implementation
5. Code benchmarking and integration

3 FW design flow

4 Firmware
Firmware (FW) is software with fixed functions that is firmly installed in a system but is not yet hardware.
FW is permanently installed in non-volatile memory and is rarely changed.
Typical examples: baseband firmware in an SDR processor; video CODEC firmware in a TV or a surveillance camera.

5 FW coding / implementation flow
[Flow chart; box labels: documents and standards (STD), high-level behavior modeling, code inspection, HW constraints, HW-related C modeling, assembly programming, C compiler, source xxx.c, source xx.asm, assembler, object files xxx.bin, LIB, object linker, simulator/debugger.]

6 The role of programmer / compiler
1. Programmer: partitions the application and assigns the parts to different instruction domains/streams, codes and debugs each domain, and integrates the heterogeneous code. Within an instruction stream, the programmer writes kernel code to approach the best performance.
2. Compiler: translates C into its machine language and optimizes the translation.
3. The API is finally added by a programming model.

7 Understand applications
Products: portable audio player; DTV and video player
Application components: RTOS, audio decoder, voice encoder, DVB modem, video decoder
Function kernels: filter, (I)DCT, Huffman decoder, waveform generator, (I)FFT
Innermost loop design

8 Task partition, allocation, and scheduling before coding / compiling
Mostly done by hand; tools are rarely available.
Based on computing cost prediction (code profile), algorithm features, and HW constraints.
Partition objectives differ: highest performance; lowest power (lower speed, less communication); lowest memory cost; job balancing.

9 FW design flow
[Figure: simplified firmware design flow.
Steps: understanding applications; HW-aware algorithm selection; high-level language modeling; finite-length design; coding finite-length firmware; exposing memory costs; coding FW with memory costs; run-time budget; coding cycle-accurate FW; re-allocatable assembly coding; binary machine code.
Stages: behavior modeling; bit-accurate modeling; memory-accurate modeling; timing budget; assembly coding.
Design entries 1, 2, and 3 are marked in the figure.]
Copyright of Linköping University, all rights reserved

10 High level FW design

11 Algorithm selection
Function! Do not forget your function!
Select algorithms for the architecture: adapt to (1) the advanced features and (2) the constraints of the HW.
Reuse available algorithms (SW reuse).
Minimize computing cost (innermost loop).
Minimize code cost (of high-level code).
Minimize data accesses (the main focus today).

12 Stream-kernel based programming
Stream
The main program consists of an FSM that prepares and uses subroutines.
Prolog (start a subroutine in a device)
Epilog (finish the subroutine in the device, hand over results)
API insertion: CUDA, OpenCL, OpenGL, OpenMP
Kernel
Interwork, task/resource management, and function call
Speed up innermost loops by assembly-level coding
That is what we are going to do today!

13 Assembly kernel coding

14 Finite length
Finite length: integer/fractional data with limited dynamic range; low cost and power with acceptable quantization noise.
Techniques:
Integer/fractional guard bits for iterations
Scaling, and rounding before truncation
Saturation instead of exception
Block floating point, half-precision floating point
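The rounding and saturation techniques listed on this slide can be sketched in C. This is a minimal illustration under stated assumptions, not the course's reference code: the helper names `round_shift`, `sat16`, and `acc_to_q15` are hypothetical, the data format is assumed to be Q15, and a 64-bit integer stands in for an accumulator register with guard bits.

```c
#include <assert.h>
#include <stdint.h>

/* Round to nearest by adding half an LSB before the truncating shift.
   Assumes an arithmetic right shift for negative values, which common
   compilers provide (it is implementation-defined in ISO C). */
static int64_t round_shift(int64_t acc, int shift) {
    assert(shift >= 1 && shift <= 31);
    return (acc + ((int64_t)1 << (shift - 1))) >> shift;
}

/* Saturate to the 16-bit native range instead of wrapping or trapping. */
static int16_t sat16(int64_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Convert a Q30 accumulator (a sum of Q15 x Q15 products) to a Q15 result. */
static int16_t acc_to_q15(int64_t acc_q30) {
    return sat16(round_shift(acc_q30, 15));
}
```

With these helpers, a Q30 accumulator holding 1.0 (that is, `1 << 30`) saturates to 32767 instead of wrapping to -32768.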

15 Added quality control codes
[Figure: main task flow from A/D through several DSP tasks to D/A, with scaling stages attached to the tasks, coefficients, and parameters, and MAX/AVG counters for measurement. Block labels: Filter, DEC, DSP, A/D, D/A, Scaling.]
Scaling flow tasks are executed only after running the measurement flow.
Measurement flow tasks are executed only when needed.

16 Firmware in fixed-point processing
Start: program booting and parameter initialization
Loading inputs and pre-processing
Main task flow: executing the kernel-part algorithms
Data quality control flow: by default, no operation; when needed, the measurement flow; after measurement, the scaling flow
Post-processing and result storing

17 Bit-accurate behavior coding
Fractional vs. integer: A = 0.25 is coded as 8192 = 0.25 * 32768.
Mask including guard bits: A = (long)(int)A & 0x0001FFFF
Arithmetic, for example: yn = yn + ((long)(int)a * xn >> 15)
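The fractional coding and the multiply-accumulate expression on this slide can be written out as a small C sketch. The function names `to_q15` and `mac_q15` are hypothetical, the format is assumed to be Q15 (scale factor 32768, as in the slide's 0.25 example), and the guard-bit mask is not modeled here.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a real value in [-1, 1) to a Q15 integer (scale by 32768). */
static int16_t to_q15(double x) {
    assert(x >= -1.0 && x < 1.0);
    double scaled = x * 32768.0;
    long v = (long)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
    if (v > 32767) v = 32767;   /* values just below 1.0 can round up to 32768 */
    return (int16_t)v;
}

/* One bit-accurate MAC step, yn = yn + ((a * xn) >> 15), with the product
   formed in 32 bits. Assumes an arithmetic right shift for negative products. */
static int32_t mac_q15(int32_t yn, int16_t a, int16_t xn) {
    return yn + (((int32_t)a * xn) >> 15);
}
```

For example, `to_q15(0.25)` gives 8192, and accumulating 0.25 * 0.5 with `mac_q15(0, 8192, 16384)` gives 4096, which is 0.125 in Q15.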

18 Bit-accurate specification
[Figure: signal level diagram with the labels HW ceiling, headroom, ADC resolution, MAX gain result, 0 dB, and footroom.]
Scale up to avoid accumulated quantization errors.

19 Measuring data quality
$$D_{RMS} = \sqrt{\frac{(R_1 - r_1)^2 + (R_2 - r_2)^2 + \dots + (R_n - r_n)^2}{N}}$$
$$D_{ABSMAX} = \mathrm{MAX}\{\, |R_1 - r_1|,\ |R_2 - r_2|,\ \dots,\ |R_{n-1} - r_{n-1}|,\ |R_n - r_n| \,\}$$
$$SNR = 20 \log_{10} \frac{MAX - headroom}{D_{RMS}} \ \text{(dBV)}$$
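As a rough illustration of these quality measures, the sketch below computes D_ABSMAX and the mean squared difference (D_RMS squared) for two 16-bit sequences. The function names are hypothetical, and the slide does not say which sequence is the reference, so the code treats `R` and `r` symmetrically. The square root and the SNR logarithm are deliberately left to the host tool (for example `sqrt` and `log10` from `<math.h>`).

```c
#include <assert.h>
#include <stdint.h>

/* D_ABSMAX: largest absolute difference between two sequences of length n. */
static int32_t d_absmax(const int16_t *R, const int16_t *r, int n) {
    assert(n > 0);
    int32_t max = 0;
    for (int i = 0; i < n; i++) {
        int32_t d = (int32_t)R[i] - r[i];
        if (d < 0) d = -d;
        if (d > max) max = d;
    }
    return max;
}

/* Mean squared difference over n samples, i.e. D_RMS squared. */
static double d_mean_square(const int16_t *R, const int16_t *r, int n) {
    assert(n > 0);
    int64_t sum = 0;
    for (int i = 0; i < n; i++) {
        int64_t d = (int64_t)R[i] - r[i];
        sum += d * d;
    }
    return (double)sum / n;
}
```

For R = {10, 20, 30, 40} and r = {9, 22, 30, 36}, the differences are 1, -2, 0, and 4, so D_ABSMAX is 4 and the mean squared difference is 21 / 4 = 5.25.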

20 Memory and memory access
Use SPM (scratchpad memory) instead of cache.
Expose flexibilities for data access.
Minimize memory cost or access cost?
Memory hardware constraints may induce extra execution time: code loading, loading/storing data, and swapping data when the memory size is not sufficient.
Adapt your implementation to the memory HW.

21 Memory efficiency
1. Minimize memory costs: low program memory cost, low data memory cost.
2. Minimize memory access costs: minimize on-chip/off-chip swapping (SPM efficiency?); multiple tasks/threads sharing data; memory block reconnection (sharing out/in FIFO).

22 Memory efficient
Select algorithms with fully predictable memory access. Much of the data can then be stored in off-chip memory and prefetched when needed.

23 Reduce register cost
[Figure: number of registers required over cycles; variables a, b, c, d, s, t, u, v, x, y; registers ACR0, ACR1, and R0 to R5.]

24 Real-time firmware implementation
Correct = correct result + result available in time.
Find the critical path and time constraints, analyze the WCET, and minimize memory uncertainty.

25 Real time
Cycle true: based on a known cycle count.
Short distance between WCET (worst-case execution time) and BCET (best-case execution time).
Dynamic / static run-time analysis.
Quality coding of innermost loops.

26 Code compiling
The closer the C code is to the HW, the better the C compiler result can be.
Understand the compiler in detail. Annotate enough.
Compiler known
Do we trust the compiler?
Functional verification of compiled code.

27 Low cycle cost assembly kernels
Focus on the low cycle cost of innermost loops!
Use REPEAT instead of conditional jumps.
Loop unrolling and low cycle cost scheduling!
Do not care much about the code cost of the inner loop!
Use as many vector instructions as possible.
Keep useful data in the RF as long as possible.
C Algorithms for Real-Time DSP, Prentice Hall
Hacker's Delight, Addison-Wesley
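Loop unrolling, one of the tips on this slide, can be illustrated at the C level before moving to assembly. The sketch below is only an illustration with hypothetical function names: it compares a plain dot product with a version unrolled by four, which performs one loop test per four multiply-accumulates and keeps four partial sums in separate variables so that a scheduler can overlap them. The actual cycle gain depends on the target processor and is not claimed here.

```c
#include <assert.h>
#include <stdint.h>

/* Plain dot product: one loop test per multiply-accumulate. */
static int64_t dot_rolled(const int16_t *a, const int16_t *b, int n) {
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int32_t)a[i] * b[i];
    return acc;
}

/* Unrolled by 4: one loop test per four MACs; n must be a multiple of 4. */
static int64_t dot_unrolled4(const int16_t *a, const int16_t *b, int n) {
    assert(n >= 0 && n % 4 == 0);
    int64_t acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    for (int i = 0; i < n; i += 4) {
        acc0 += (int32_t)a[i]     * b[i];
        acc1 += (int32_t)a[i + 1] * b[i + 1];
        acc2 += (int32_t)a[i + 2] * b[i + 2];
        acc3 += (int32_t)a[i + 3] * b[i + 3];
    }
    return acc0 + acc1 + acc2 + acc3;
}
```

Both functions return the same result; the unrolled version trades a larger loop body (higher code cost) for fewer loop-control operations, which matches the slide's advice not to worry much about inner-loop code size.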

28 Low cycle cost assembly kernels
[Table: implementation models per function. Function labels: matrix, basic, video, baseband, HPC, large matrix, transform, larger size T, filter, ISP, CODEC, post process, coding, searching, sorting, FSM, storage, channel, decoding, FEC, Taylor series. Implementation model labels: task partition, data partition, grouping, pipeline, recursive, SPMD, master-slave, fork-join, BSPM, data sharing.]
Reading: A Pattern Language for Parallel Programming

29 Reading: A Pattern Language for Parallel Programming

30 Kernel programming tips
CISC (if available) vs. RISC (always there)
RISC: memory -> RF -> computing -> RF -> memory
DSP loop: memory -> computing -> RF
Trade-off 10% - 90%: prolog, epilog, iterations
Minimize cycle cost by acceleration / quality coding
Amdahl's law: minimize the parts that cannot run in parallel
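Amdahl's law, mentioned on this slide, can be made concrete with a one-line formula: if a fraction p of the run time is accelerated by a factor k, the overall speedup is 1 / ((1 - p) + p / k). The sketch below uses a hypothetical function name, and reading the slide's "10% - 90%" as "a small part of the code accounts for about 90% of the run time" is an assumption made for the example.

```c
#include <assert.h>

/* Overall speedup when a fraction p of the run time is accelerated by a
   factor k and the remaining (1 - p) is left unchanged. */
static double amdahl_speedup(double p, double k) {
    assert(p >= 0.0 && p <= 1.0 && k > 0.0);
    return 1.0 / ((1.0 - p) + p / k);
}
```

With p = 0.9 and k = 10, the overall speedup is about 5.26, and it can never exceed 1 / (1 - p) = 10 however fast the kernel becomes. This is why the non-accelerated part (prolog, epilog, control) deserves attention.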

31 Code integration
Oh my god! Where are the cycles consumed?
Extra cycles are needed during SW integration.
Be sure you have predicted and accounted for these cycles during the early SW plan / design phases.
Extra cost can come from (but is not limited to):
Control: prolog/epilog, asynchronous operation, synchronization
Data dependencies: loading, waiting for data to become available
Communications: master/device (slave, I/O)

32 Assembly-level release
WCET (worst-case execution time) should be analyzed using static timing analysis.
Remove paths that can never be taken.
Avoid releasing code based on dynamic timing (code simulation).
Check for stack overflow if multiple tasks run simultaneously and involve many interrupts and subroutine calls.

33 Benchmark

34 Benchmark
A benchmark is a program used to measure the performance of a processor.
Benchmarking is the execution of such programs, which lets processor users measure the machine clock cycles consumed by a specific section of code.

35 ASIP design flow
Source code analysis; decision on the ISA of the ASIP
Design the instruction set and toolchain for prototyping
Benchmark (kernels), evaluate the microarchitecture
[Decision points in the flow chart: "Change ISA?" and "Satisfied?", with yes/no branches]
Microarchitecture design, VLSI design, verification

36 Third-party benchmarks
BDTI (Berkeley Design Technology, Inc.): hand-written assembly by professional engineers.
EEMBC (the EDN Embedded Microprocessor Benchmark Consortium), five classes: automotive/industrial, consumer, networking, office automation, and telecommunication.

37 Benchmark example: for a simple DSP
[Table; numeric entries not preserved.
Columns: algorithm kernel, number of samples, taps, total cycle cost, kernel cycle cost, P-mem cost, D-mem cost.
Rows: block transfer; point complex FFT; single data sample FIR; frame FIR (multi samples); complex FIR; IIR biquad type I; LMS adaptive FIR; bit division; vector add; vector dot; vector max; floating to fixed; fixed to floating; X8DCT; FSM (packet classification).]

38 How to write a benchmark
All operations, operands, and results are of native length.
Try to keep high precision in the MAC.
Round and saturate before storing data from the MAC (after truncation) to memory or registers.
All programs are implemented by experienced DSP firmware engineers.
Use the complete program, including loop prolog and epilog, program initialization, and wrap-up.
All related memory access costs shall be included.

39 An example: FIR benchmark
A FIR filter is a weighted sum of a finite set of inputs:
$$y(n) = \sum_{k=0}^{m-1} a_k \, x(n-k)$$
x(n) is the input, y(n) is the output, and a_k is the vector of filter coefficients.

40 An example: FIR benchmark
[Figure: FIR filter structure. The input x(n) passes through a chain of unit delays T; the taps are weighted by a_0, a_1, ..., a_n and summed to give y(n).]

41 An example: FIR benchmark
Behavior-level code (single-sample FIR):
{
  Reset ACR
  DM(DP) <= the latest sample
  DP <= DP + 1     /* Store the latest sample in the computing buffer, then load the oldest sample, using the same pointer. */
  For i = 0 to 15 do {
    ACR <= ACR + DM(DP) * TM(TP)     /* 16-tap convolution for a sample */
    DP <= DP + 1                     /* implied modulo DP */
    TP <= TP + 1
  }
  Round and Sat ACR
  Output result
  Store the data pointer DP
}
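The behavior-level code on this slide can be turned into a bit-accurate C model. The following is a sketch under stated assumptions rather than the course's implementation: samples and coefficients are assumed to be Q15, the names `fir_state` and `fir_sample` are hypothetical, a 64-bit integer models the accumulator with guard bits, and the coefficient array `tm` is assumed to be stored oldest-tap first because the data pointer starts at the oldest sample.

```c
#include <assert.h>
#include <stdint.h>

#define TAPS 16

typedef struct {
    int16_t dm[TAPS];  /* circular sample buffer (the slide's DM) */
    int     dp;        /* data pointer DP, always in 0..TAPS-1 */
} fir_state;

/* Filter one sample. tm[0] weights x(n-15), tm[TAPS-1] weights x(n). */
static int16_t fir_sample(fir_state *s, const int16_t tm[TAPS], int16_t x) {
    assert(s->dp >= 0 && s->dp < TAPS);
    int64_t acr = 0;                        /* Reset ACR */
    s->dm[s->dp] = x;                       /* newest sample replaces the oldest */
    s->dp = (s->dp + 1) % TAPS;             /* DP now points at the oldest sample */
    for (int i = 0; i < TAPS; i++) {
        acr += (int32_t)s->dm[s->dp] * tm[i];
        s->dp = (s->dp + 1) % TAPS;         /* modulo addressing */
    }
    acr = (acr + (1 << 14)) >> 15;          /* round Q30 -> Q15 */
    if (acr > INT16_MAX) acr = INT16_MAX;   /* saturate */
    if (acr < INT16_MIN) acr = INT16_MIN;
    return (int16_t)acr;                    /* DP is kept in the state struct */
}
```

After 16 iterations the pointer has wrapped around to the oldest remaining sample, so the next call overwrites exactly that entry. With `tm[15] = 16384` (0.5) and all other coefficients zero, an input of 1000 produces 500.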

42 An example: FIR benchmark
The first part of the program:
  Set AP1, $SEG_FIR        -- load segment (block) address to the DM1 pointer
  Set LoopR, N             -- load the loop counter
                           -- filter program parameters are stored in DM1
  Set R15, $Resultpt       -- result pointer to R15
  Set AP0, $Datapt         -- data pointer to AP0
  Set BTR, $Bottom         -- FIFO bottom pointer
  Set TPR, $Top            -- FIFO top pointer
  Set AP1, $Coeffpt        -- coefficient pointer to AP1
The prolog consumes 7 cycles.
  Repeat N                 -- number of samples; for every data sample
  Store DM0(AP0++), R1     -- store a sample from R1 to DM0 (DM0 pointer)
  CLR ACR1                 -- clear the accumulator ACR1

43 An example: FIR benchmark
The second part of the program:
  CONV ACR1 SSF 16 DM0(AP0) DM1(AP1)     -- signed fractional convolution
                                         -- the iteration uses N+1 = 16 (17) clock cycles
The convolution iteration consumes 16 cycles if the following instruction does not use ACR1.

44 An example: FIR benchmark
The third part of the program:
  PostOP R1, ACR1          -- Sat Round(ACR), store result in ACRH and R1
  Store DM1(R15), R1       -- store the result in R1 to DM1(GRX++)
  INC R15                  -- position to the next result
  End repeat
  Store DM1(AP1++), R15    -- store the Y pointer after updating result Y
  Store DM1(AP1), AP1      -- store the X pointer of the FIFO filter
The epilog consumes 6 cycles.

45 The data memory space: the FIFO buffer
[Figure: a 16-entry FIFO buffer X(0) ... X(15) in data memory between the Bottom pointer (MIN address, Btm + 0) and the Top pointer (MAX address, Btm + 15). Panel (a) shows the FIFO behavior; panel (b) shows the FIFO implementation in states 0 to 3 with registers R0, R5, and R7.]
Read a new value to replace the oldest value in the buffer, x(n-15).
Increase the address counter R0. It points to the (next) oldest value in the FIFO.
Replace the (next) oldest value x(n-15) with the new incoming value.
Example: frame sample FIR, C code: 40 samples filtered by a 16-tap FIR.
Push new data once per FIR tap.
Load each data item once for the signal processing of a FIR tap.

46 Example: frame sample FIR
C code: a 16-tap FIR filter runs over 40 samples.
Kernel cycle cost: 17 x 40 = 680 cycles
Prolog and epilog of the inner loop: 40 x 5 = 200 cycles
Prolog and epilog of the top loop: 9 cycles
[Table: typical BDTI benchmarking for the 40-sample 16-tap FIR. Innermost loop prolog/epilog: 5 x 40 = 200; kernel cycle cost: 17 x 40 = 680; the entries for total code cost and DM cost are not preserved.]
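The cycle figures on this slide add up as follows. The sketch below only restates the slide's arithmetic in C; the struct and function names are hypothetical, and the per-sample costs (17 kernel cycles, 5 overhead cycles) and the 9-cycle top-loop overhead are taken directly from the slide.

```c
#include <assert.h>

typedef struct {
    long kernel;          /* cycles spent in the convolution kernel */
    long inner_overhead;  /* prolog/epilog of the inner loop, all samples */
    long top_overhead;    /* prolog/epilog of the top loop */
    long total;           /* sum of the three parts */
} cycle_budget;

/* Cycle budget for filtering a frame of samples. */
static cycle_budget frame_fir_cycles(int samples, int kernel_per_sample,
                                     int overhead_per_sample, int top_overhead) {
    assert(samples > 0);
    cycle_budget b;
    b.kernel = (long)samples * kernel_per_sample;
    b.inner_overhead = (long)samples * overhead_per_sample;
    b.top_overhead = top_overhead;
    b.total = b.kernel + b.inner_overhead + b.top_overhead;
    return b;
}
```

Calling `frame_fir_cycles(40, 17, 5, 9)` gives 680 + 200 + 9 = 889 cycles in total, so the kernel accounts for about 76% of the frame and the loop overhead for the rest.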

47 Review of today's discussion
Quality firmware design is based on rich FW experience and a deep understanding of applications and HW. A formal design will never offer quality code.
Firmware design can be divided into three steps: algorithm selection and behavior modeling, C coding under hardware constraints, and assembly language coding.
Benchmark fundamentals.
Learn the heterogeneous programming model in other courses.

48 Summarize what/how to learn
[Diagram labels: concepts; skills; system understanding; FW coding; integration; assembly coding tools (further understanding of the tools after reading chapter 18); debug skill; verification; firmware plan & design; skills to select algorithms; bit accurate; memory accurate; cycle accurate; plan vs. code (to find extra cycle cost that could not be found while coding subroutines).]

49 Self reading after the lecture
Your hardware knowledge will help you design quality firmware; try to summarize it yourself.
Read chapter 18 and chapter 9:
1. Collect experience in designing quality innermost loop code.
2. How to accelerate the innermost loop in HW.

50 Exciting time now! Let us discuss
Whatever you want to discuss that is related to HW.
You will have the chance after each lecture (Fö); do take the chance!
Prepare your questions for next time.

51 You are welcome to ask any questions you want; I can answer, or we can discuss together. I want to know what you want.
Dake Liu, Room 556, corridor B, Hus-B, dake.liu@liu.se


More information

CS450/550 Operating Systems

CS450/550 Operating Systems CS450/550 Operating Systems Lecture 4 memory Palden Lama Department of Computer Science CS450/550 Memory.1 Review: Summary of Chapter 3 Deadlocks and its modeling Deadlock detection Deadlock recovery Deadlock

More information

EMBEDDED SYSTEM BASICS AND APPLICATION

EMBEDDED SYSTEM BASICS AND APPLICATION EMBEDDED SYSTEM BASICS AND APPLICATION Dr.Syed Ajmal IIT- Robotics TOPICS TO BE DISCUSSED System Embedded System Components Classifications Processors Other Hardware Software Applications 2 INTRODUCTION

More information

CHAPTER 4 MARIE: An Introduction to a Simple Computer

CHAPTER 4 MARIE: An Introduction to a Simple Computer CHAPTER 4 MARIE: An Introduction to a Simple Computer 4.1 Introduction 177 4.2 CPU Basics and Organization 177 4.2.1 The Registers 178 4.2.2 The ALU 179 4.2.3 The Control Unit 179 4.3 The Bus 179 4.4 Clocks

More information

Versal: AI Engine & Programming Environment

Versal: AI Engine & Programming Environment Engineering Director, Xilinx Silicon Architecture Group Versal: Engine & Programming Environment Presented By Ambrose Finnerty Xilinx DSP Technical Marketing Manager October 16, 2018 MEMORY MEMORY MEMORY

More information

ECE 486/586. Computer Architecture. Lecture # 7

ECE 486/586. Computer Architecture. Lecture # 7 ECE 486/586 Computer Architecture Lecture # 7 Spring 2015 Portland State University Lecture Topics Instruction Set Principles Instruction Encoding Role of Compilers The MIPS Architecture Reference: Appendix

More information

Advanced Parallel Architecture Lesson 3. Annalisa Massini /2015

Advanced Parallel Architecture Lesson 3. Annalisa Massini /2015 Advanced Parallel Architecture Lesson 3 Annalisa Massini - Von Neumann Architecture 2 Two lessons Summary of the traditional computer architecture Von Neumann architecture http://williamstallings.com/coa/coa7e.html

More information

ENHANCED TOOLS FOR RISC-V PROCESSOR DEVELOPMENT

ENHANCED TOOLS FOR RISC-V PROCESSOR DEVELOPMENT ENHANCED TOOLS FOR RISC-V PROCESSOR DEVELOPMENT THE FREE AND OPEN RISC INSTRUCTION SET ARCHITECTURE Codasip is the leading provider of RISC-V processor IP Codasip Bk: A portfolio of RISC-V processors Uniquely

More information

Head, Dept of Electronics & Communication National Institute of Technology Karnataka, Surathkal, India

Head, Dept of Electronics & Communication National Institute of Technology Karnataka, Surathkal, India Mapping Signal Processing Algorithms to Architecture Sumam David S Head, Dept of Electronics & Communication National Institute of Technology Karnataka, Surathkal, India sumam@ieee.org Objectives At the

More information

Chapter 4. Chapter 4 Objectives. MARIE: An Introduction to a Simple Computer

Chapter 4. Chapter 4 Objectives. MARIE: An Introduction to a Simple Computer Chapter 4 MARIE: An Introduction to a Simple Computer Chapter 4 Objectives Learn the components common to every modern computer system. Be able to explain how each component contributes to program execution.

More information

Characterization of Native Signal Processing Extensions

Characterization of Native Signal Processing Extensions Characterization of Native Signal Processing Extensions Jason Law Department of Electrical and Computer Engineering University of Texas at Austin Austin, TX 78712 jlaw@mail.utexas.edu Abstract Soon if

More information

ARM Processors for Embedded Applications

ARM Processors for Embedded Applications ARM Processors for Embedded Applications Roadmap for ARM Processors ARM Architecture Basics ARM Families AMBA Architecture 1 Current ARM Core Families ARM7: Hard cores and Soft cores Cache with MPU or

More information

VICP Signal Processing Library. Further extending the performance and ease of use for VICP enabled devices

VICP Signal Processing Library. Further extending the performance and ease of use for VICP enabled devices Signal Processing Library Further extending the performance and ease of use for enabled devices Why is library effective for customer application? Get to market faster with ready-to-use signal processing

More information

Processing Unit CS206T

Processing Unit CS206T Processing Unit CS206T Microprocessors The density of elements on processor chips continued to rise More and more elements were placed on each chip so that fewer and fewer chips were needed to construct

More information

Microprocessors, Lecture 1: Introduction to Microprocessors

Microprocessors, Lecture 1: Introduction to Microprocessors Microprocessors, Lecture 1: Introduction to Microprocessors Computing Systems General-purpose standalone systems (سيستم ھای نھفته ( systems Embedded 2 General-purpose standalone systems Stand-alone computer

More information

An introduction to Digital Signal Processors (DSP) Using the C55xx family

An introduction to Digital Signal Processors (DSP) Using the C55xx family An introduction to Digital Signal Processors (DSP) Using the C55xx family Group status (~2 minutes each) 5 groups stand up What processor(s) you are using Wireless? If so, what technologies/chips are you

More information

OPERATING SYSTEMS. After A.S.Tanenbaum, Modern Operating Systems 3rd edition Uses content with permission from Assoc. Prof. Florin Fortis, PhD

OPERATING SYSTEMS. After A.S.Tanenbaum, Modern Operating Systems 3rd edition Uses content with permission from Assoc. Prof. Florin Fortis, PhD OPERATING SYSTEMS #8 After A.S.Tanenbaum, Modern Operating Systems 3rd edition Uses content with permission from Assoc. Prof. Florin Fortis, PhD MEMORY MANAGEMENT MEMORY MANAGEMENT The memory is one of

More information

Evaluating MMX Technology Using DSP and Multimedia Applications

Evaluating MMX Technology Using DSP and Multimedia Applications Evaluating MMX Technology Using DSP and Multimedia Applications Ravi Bhargava * Lizy K. John * Brian L. Evans Ramesh Radhakrishnan * November 22, 1999 The University of Texas at Austin Department of Electrical

More information

MIPS Technologies MIPS32 M4K Synthesizable Processor Core By the staff of

MIPS Technologies MIPS32 M4K Synthesizable Processor Core By the staff of An Independent Analysis of the: MIPS Technologies MIPS32 M4K Synthesizable Processor Core By the staff of Berkeley Design Technology, Inc. OVERVIEW MIPS Technologies, Inc. is an Intellectual Property (IP)

More information

Modeling and Simulation of System-on. Platorms. Politecnico di Milano. Donatella Sciuto. Piazza Leonardo da Vinci 32, 20131, Milano

Modeling and Simulation of System-on. Platorms. Politecnico di Milano. Donatella Sciuto. Piazza Leonardo da Vinci 32, 20131, Milano Modeling and Simulation of System-on on-chip Platorms Donatella Sciuto 10/01/2007 Politecnico di Milano Dipartimento di Elettronica e Informazione Piazza Leonardo da Vinci 32, 20131, Milano Key SoC Market

More information

Main Points of the Computer Organization and System Software Module

Main Points of the Computer Organization and System Software Module Main Points of the Computer Organization and System Software Module You can find below the topics we have covered during the COSS module. Reading the relevant parts of the textbooks is essential for a

More information

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 22 Title: and Extended

More information

ELC4438: Embedded System Design Embedded Processor

ELC4438: Embedded System Design Embedded Processor ELC4438: Embedded System Design Embedded Processor Liang Dong Electrical and Computer Engineering Baylor University 1. Processor Architecture General PC Von Neumann Architecture a.k.a. Princeton Architecture

More information

THE OPTIUM MICROPROCESSOR AN FPGA-BASED IMPLEMENTATION

THE OPTIUM MICROPROCESSOR AN FPGA-BASED IMPLEMENTATION THE OPTIUM MICROPROCESSOR AN FPGA-BASED IMPLEMENTATION Radu Balaban Computer Science student, Technical University of Cluj Napoca, Romania horizon3d@yahoo.com Horea Hopârtean Computer Science student,

More information

Course web site: teaching/courses/car. Piazza discussion forum:

Course web site:   teaching/courses/car. Piazza discussion forum: Announcements Course web site: http://www.inf.ed.ac.uk/ teaching/courses/car Lecture slides Tutorial problems Courseworks Piazza discussion forum: http://piazza.com/ed.ac.uk/spring2018/car Tutorials start

More information

RISC-V CUSTOMIZATION WITH STUDIO 8

RISC-V CUSTOMIZATION WITH STUDIO 8 RISC-V CUSTOMIZATION WITH STUDIO 8 Zdeněk Přikryl CTO, Codasip GmbH WHO IS CODASIP Leading provider of RISC-V processor IP Introduced its first RISC-V processor in November 2015 Offers its own portfolio

More information

CS146 Computer Architecture. Fall Midterm Exam

CS146 Computer Architecture. Fall Midterm Exam CS146 Computer Architecture Fall 2002 Midterm Exam This exam is worth a total of 100 points. Note the point breakdown below and budget your time wisely. To maximize partial credit, show your work and state

More information

Anand Raghunathan

Anand Raghunathan ECE 695R: SYSTEM-ON-CHIP DESIGN Module 2: HW/SW Partitioning Lecture 2.15: ASIP: Approaches to Design Anand Raghunathan raghunathan@purdue.edu ECE 695R: System-on-Chip Design, Fall 2014 Fall 2014, ME 1052,

More information

55:132/22C:160, HPCA Spring 2011

55:132/22C:160, HPCA Spring 2011 55:132/22C:160, HPCA Spring 2011 Second Lecture Slide Set Instruction Set Architecture Instruction Set Architecture ISA, the boundary between software and hardware Specifies the logical machine that is

More information

Embedded Systems. 7. System Components

Embedded Systems. 7. System Components Embedded Systems 7. System Components Lothar Thiele 7-1 Contents of Course 1. Embedded Systems Introduction 2. Software Introduction 7. System Components 10. Models 3. Real-Time Models 4. Periodic/Aperiodic

More information

Embedded Systems Design (630414) Lecture 1 Introduction to Embedded Systems Prof. Kasim M. Al-Aubidy Computer Eng. Dept.

Embedded Systems Design (630414) Lecture 1 Introduction to Embedded Systems Prof. Kasim M. Al-Aubidy Computer Eng. Dept. Embedded Systems Design (630414) Lecture 1 Introduction to Embedded Systems Prof. Kasim M. Al-Aubidy Computer Eng. Dept. Definition of an E.S. It is a system whose principal function is not computational,

More information

Low-Power Processor Solutions for Always-on Devices

Low-Power Processor Solutions for Always-on Devices Low-Power Processor Solutions for Always-on Devices Pieter van der Wolf MPSoC 2014 July 7 11, 2014 2014 Synopsys, Inc. All rights reserved. 1 Always-on Mobile Devices Mobile devices on the move Mobile

More information

Porting LLVM to a Next Generation DSP

Porting LLVM to a Next Generation DSP Porting LLVM to a Next Generation DSP Presented by: L. Taylor Simpson LLVM Developers Meeting: 11/18/2011 PAGE 1 Agenda Hexagon DSP Initial porting Performance improvement Future plans PAGE 2 Hexagon DSP

More information

Lecture 4: Instruction Set Architecture

Lecture 4: Instruction Set Architecture Lecture 4: Instruction Set Architecture ISA types, register usage, memory addressing, endian and alignment, quantitative evaluation Reading: Textbook (5 th edition) Appendix A Appendix B (4 th edition)

More information

RUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC. Zoltan Baruch

RUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC. Zoltan Baruch RUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC Zoltan Baruch Computer Science Department, Technical University of Cluj-Napoca, 26-28, Bariţiu St., 3400 Cluj-Napoca,

More information

Case study: Performance-efficient Implementation of Robust Header Compression (ROHC) using an Application-Specific Processor

Case study: Performance-efficient Implementation of Robust Header Compression (ROHC) using an Application-Specific Processor Case study: Performance-efficient Implementation of Robust Header Compression (ROHC) using an Application-Specific Processor Gert Goossens, Patrick Verbist, Erik Brockmeyer, Luc De Coster Synopsys 1 Agenda

More information

MICROPROCESSOR BASED SYSTEM DESIGN

MICROPROCESSOR BASED SYSTEM DESIGN MICROPROCESSOR BASED SYSTEM DESIGN Lecture 5 Xmega 128 B1: Architecture MUHAMMAD AMIR YOUSAF VON NEUMAN ARCHITECTURE CPU Memory Execution unit ALU Registers Both data and instructions at the same system

More information

Instruction-set Design Issues: what is the ML instruction format(s) ML instruction Opcode Dest. Operand Source Operand 1...

Instruction-set Design Issues: what is the ML instruction format(s) ML instruction Opcode Dest. Operand Source Operand 1... Instruction-set Design Issues: what is the format(s) Opcode Dest. Operand Source Operand 1... 1) Which instructions to include: How many? Complexity - simple ADD R1, R2, R3 complex e.g., VAX MATCHC substrlength,

More information

URL: Offered by: Should already know: Will learn: 01 1 EE 4720 Computer Architecture

URL:   Offered by: Should already know: Will learn: 01 1 EE 4720 Computer Architecture 01 1 EE 4720 Computer Architecture 01 1 URL: https://www.ece.lsu.edu/ee4720/ RSS: https://www.ece.lsu.edu/ee4720/rss home.xml Offered by: David M. Koppelman 3316R P. F. Taylor Hall, 578-5482, koppel@ece.lsu.edu,

More information

14.1 Control Path in General

14.1 Control Path in General AGU PC FSM Configuration and status Program address Instruction Instruction decoder DM Operand & result control Exec unit ALU/MAC Results RF Control Path Design Hardware organization and micro architecture

More information

Computer Architecture. Fall Dongkun Shin, SKKU

Computer Architecture. Fall Dongkun Shin, SKKU Computer Architecture Fall 2018 1 Syllabus Instructors: Dongkun Shin Office : Room 85470 E-mail : dongkun@skku.edu Office Hours: Wed. 15:00-17:30 or by appointment Lecture notes nyx.skku.ac.kr Courses

More information

Altera SDK for OpenCL

Altera SDK for OpenCL Altera SDK for OpenCL A novel SDK that opens up the world of FPGAs to today s developers Altera Technology Roadshow 2013 Today s News Altera today announces its SDK for OpenCL Altera Joins Khronos Group

More information

Lecture Topics. Principle #1: Exploit Parallelism ECE 486/586. Computer Architecture. Lecture # 5. Key Principles of Computer Architecture

Lecture Topics. Principle #1: Exploit Parallelism ECE 486/586. Computer Architecture. Lecture # 5. Key Principles of Computer Architecture Lecture Topics ECE 486/586 Computer Architecture Lecture # 5 Spring 2015 Portland State University Quantitative Principles of Computer Design Fallacies and Pitfalls Instruction Set Principles Introduction

More information

Hardware/Software Co-design

Hardware/Software Co-design Hardware/Software Co-design Zebo Peng, Department of Computer and Information Science (IDA) Linköping University Course page: http://www.ida.liu.se/~petel/codesign/ 1 of 52 Lecture 1/2: Outline : an Introduction

More information

When addressing VLSI design most books start from a welldefined

When addressing VLSI design most books start from a welldefined Objectives An ASIC application MSDAP Analyze the application requirement System level setting of an application Define operation mode Define signals and pins Top level model Write a specification When

More information

Fixed-Point Math and Other Optimizations

Fixed-Point Math and Other Optimizations Fixed-Point Math and Other Optimizations Embedded Systems 8-1 Fixed Point Math Why and How Floating point is too slow and integers truncate the data Floating point subroutines: slower than native, overhead

More information

Chapter 4. Advanced Pipelining and Instruction-Level Parallelism. In-Cheol Park Dept. of EE, KAIST

Chapter 4. Advanced Pipelining and Instruction-Level Parallelism. In-Cheol Park Dept. of EE, KAIST Chapter 4. Advanced Pipelining and Instruction-Level Parallelism In-Cheol Park Dept. of EE, KAIST Instruction-level parallelism Loop unrolling Dependence Data/ name / control dependence Loop level parallelism

More information