VLSI Programming 2016: Lecture 6


1 VLSI Programming 2016: Lecture 6. Course: 2IMN35. Teachers: Kees van Berkel c.h.v.berkel@tue.nl, Rudolf Mak r.h.mak@tue.nl. Lab: Kees van Berkel, Rudolf Mak, Alok Lele. www: http://www.win.tue.nl/~wsinmak/education/2imn35/ Lecture 6: T3, T4, digital signal processors

2 VLSI Programming (2IMN35): time table. Tue: h5-h8; MF.07. Thu: h1-h4; Gemini-Z3A-08/10/13.
19-Apr introduction, DSP graphs, bounds; 21-Apr pipelining, retiming, transposition, J-slow, unfolding; 26-Apr tools, introductions to L1: audio filter; 28-Apr T1 unfolding, look-ahead, L1 cntd, installed FPGA and Verilog simulation, L2 + T2 strength reduction; 3-May folding, L2: audio filter; 5-May on XUP board; 10-May T3 + T4 DSP processors, L2 cntd, L3; 12-May L3: sequential FIR + strength-reduced FIR; 17-May L3 cntd; 19-May L3 cntd, L4; 24-May systolic computation, T5; 26-May L3, L4; 31-May T5; 2-Jun L4 cntd, L5: audio sample rate convertor; 7-Jun L5: 1024x audio sample rate convertor; 9-Jun L4, L5 cntd; 14-Jun; 16-Jun L5 deadline report, T1 + T2, T3 + T4

3 Outline Lecture 6: T3, T4; the SW-HW performance spectrum; an architecture-morphing exercise. Mandatory reading (reminder): Edward A. Lee and David G. Messerschmitt. Synchronous Data Flow. Proc. of the IEEE, Vol. 75, No. 9, Sept. 1987.

4 T3: Parallel IIR assignment
Consider IIR: y(n) = x(n) + a*y(n-2). Assume add and multiply times: 2 and 5 nsec resp.
1. Derive the parallel look-ahead IIR, L=4.
2. Pipeline and retime for maximal throughput using a minimum number of D-elements.
3. Include throughput and latency calculation.
Return deadline: Tuesday May 10

5 IIR assignment 4 (from lecture 3)
4. Pipeline and retime the unfolded IIR; draw the DFG; throughput?
[DFG: two parallel branches compute y(2k) and y(2k+1) from x(2k) and x(2k+1), each through an adder, a coefficient multiplier a, and a delay D.]
No pipelining possible: same DFG, same throughput!
f_sample = 2/(T_M + T_A) = 2/7 ns = 286 MHz

6 Parallel IIR assignment: unrolling 3x (n -> n+1); note: rewrite a la Parhi: u(n) = x(n+2)
y(n+2) = a*y(n) + u(n)
y(n+3) = a*y(n+1) + u(n+1)
y(n+4) = a*y(n+2) + u(n+2) = a^2*y(n) + a*u(n) + u(n+2)
y(n+5) = a*y(n+3) + u(n+3) = a^2*y(n+1) + a*u(n+1) + u(n+3)
unfolding (L=4: n -> 4k)
y(4k+2) = a*y(4k) + u(4k)
y(4k+4) = a^2*y(4k) + a*u(4k) + u(4k+2)
y(4k+3) = a*y(4k+1) + u(4k+1)
y(4k+5) = a^2*y(4k+1) + a*u(4k+1) + u(4k+3)

7 Parallel IIR assignment
[DFG: four parallel branches compute y(4k), y(4k+1), y(4k+2), y(4k+3), y(4k+5) from u(4k), u(4k+1), u(4k+2), u(4k+3) using coefficient multipliers a and a^2 and adders.]
f_sample = 4/(T_M + 2*T_A) = 4/9 ns = 444 MHz

8 Parallel IIR assignment
[Pipelined DFG: same structure with coefficients a and a^2; outputs delayed to y(4k-4), y(4k+1-4), y(4k+2-4), y(4k+3-4); +4 D-elements.]
f_sample = 4/(T_M + T_A) = 4/7 ns = 571 MHz

9 Parallel IIR assignment: 2-slow!
[DFG: inputs u(4k), u(4k+1), u(4k+2), u(4k+3) time-multiplexed through coefficient multipliers a and a^2; outputs y(4k), y(4k+1), y(4k+2), y(4k+3).]
f_sample = 2/(T_M + 0*T_A) = 2/5 ns = 400 MHz

10 T4: Strength-reduced FIR assignment
Consider FIR: y(n) = a*x(n) + b*x(n-1) + c*x(n-6) + d*x(n-7). Assume add and multiply times: 2 and 5 nsec resp.
1. Draw the DFG of the FIR, calculate throughput.
2. Apply strength reduction, L=2.
3. Pipeline and retime for maximal throughput using a minimum number of D-elements.
4. Include throughput and latency calculation.
Return deadline: Tuesday May 10

11 Assignment T4: Strength-reduced FIR 1a
Consider FIR: y(n) = a*x(n) + b*x(n-1) + c*x(n-6) + d*x(n-7). Assume add and multiply times: 2 and 5 nsec resp.
1. Draw the DFG of the FIR, calculate throughput.
[Transposed form (for high throughput): coefficient multipliers d, c, b, a feeding an adder chain, with a 5D delay between the d,c taps and the b,a taps.]
f_sample = 1/(T_M + T_A) = 1/(5+2) = 1/7 ns = 143 MHz

12 Assignment T4: Strength-reduced FIR 1b
2. Pipeline and retime the FIR for maximal throughput.
[Same transposed DFG with pipelined multipliers; output y(n-1); +4 D-elements.]
f_sample = 1/(T_M + 0*T_A) = 1/5 ns = 200 MHz

13 Assignment T4: Strength-reduced FIR 2
y(n) = a*x(n) + b*x(n-1) + c*x(n-6) + d*x(n-7)
y(2k) = a*x(2k) + b*x(2k-1) + c*x(2k-6) + d*x(2k-7)
      = a*x(2k) + b*x(2(k-1)+1) + c*x(2(k-3)) + d*x(2(k-4)+1)
y(2k+1) = a*x(2k+1) + b*x(2k) + c*x(2k-5) + d*x(2k-6)
        = a*x(2k+1) + b*x(2k) + c*x(2(k-3)+1) + d*x(2(k-3))
y(2k)   = (a+b)*x(2k) + (c+d)*x(2(k-3)) + b*[x(2(k-1)+1) - x(2k)] + d*[x(2(k-4)+1) - x(2(k-3))]
y(2k+1) = (a+b)*x(2k) + (c+d)*x(2(k-3)) + a*[x(2k+1) - x(2k)] + c*[x(2(k-3)+1) - x(2(k-3))]

14 Assignment T4: Strength-reduced FIR 2
y(2k)   = (a+b)*x(2k) + (c+d)*x(2(k-3)) + b*[x(2(k-1)+1) - x(2k)] + d*[x(2(k-4)+1) - x(2(k-3))]
y(2k+1) = (a+b)*x(2k) + (c+d)*x(2(k-3)) + a*[x(2k+1) - x(2k)] + c*[x(2(k-3)+1) - x(2(k-3))]
When we assume (a+b) and (c+d) are pre-computed constants:
3 x 2 multipliers for the sub-FIRs
3 adders for the sub-FIRs
2 adders + 2 subtractors strength-reduction overhead
= 6 multipliers + 7 adds/subs (versus 8 multipliers + 6 adds/subs)

15 Assignment T4: Strength-reduced FIR 2
[DFG: x(2k+1) and x(2k) feed adder/subtractor pairs; multipliers a, c, a+b, c+d, b, d; delays D and 3D; adder trees produce y(2k+1) and y(2k).]
f_sample = 2/(T_M + 3*T_A) = 2/11 ns = 182 MHz

16 Assignment T4: Strength-reduced FIR 3,4
[Same DFG, pipelined: +9 D-elements; outputs y(2k+1-4) and y(2k-4).]
f_sample = 2/(T_M + 0*T_A) = 2/5 ns = 400 MHz

17 Assignment T4: Strength-reduced FIR 3,4
[Transposed DFG: transposition costs 12 D-elements; outputs y(2k+1-6) and y(2k-6).]

18 DIGITAL SIGNAL PROCESSORS

19 The SW-HW Spectrum: outline FIR on a microprocessor (MIPS, DLX) FIR on a DSP DSP Arithmetic DSP memory addressing and organization DSP control Zürich zip and Eindhoven zip REAL and Motorola DSP programming FIR on a Vector DSP FIR in VLSI or on an FPGA 19

20 Typical DSP algorithms: FIR Filters
Filters reduce signal noise and enhance image or signal quality by removing unwanted frequencies. A Finite Impulse Response (FIR) filter computes y(i):
y(i) = sum_{k=0}^{N-1} h(k) * x(i-k), i.e. y(n) = h(n) * x(n) (convolution)
where x is the input sequence, y is the output sequence, h is the impulse response (filter coefficients), and N is the number of taps (coefficients) in the filter. The output sequence depends only on the input sequence and the impulse response.

21 FIR filter in ANSI C
#define N 16
int X[N];
int C[N];
int sum;

int FIRstep(b)          /* b = index of the oldest tap */
int b;
{ int s=0; int i;
  for (i=0; i<N; i++){ s = s + C[i]*X[(b+i)%N]; }
  return s;
}

main()
{ int base=0;
  Cinit();
  while (1){ scanf("%d", &X[base]);
             sum = FIRstep(base);
             printf("%d\n", sum);
             base = (base+1)%N; }
}

22 FIRSTEP in MIPS assembler (from C-compiler)
# 16  s=s + C[i]*X[(b+i)%N];
lw $14, 0($sp)
addu $15, $4, $14
rem $24, $15, 8
mul $25, $24, 4
la $8, X
addu $9, $25, $8
lw $10, 0($9)
mul $11, $14, 4
la $12, C
addu $13, $11, $12
lw $15, 0($13)
mul $24, $10, $15
lw $25, 4($sp)
addu $8, $24, $25
sw $8, 4($sp)
# 15  for (i=0; i<N; i++){
sw $0, 0($sp)
lw $9, 0($sp)
addu $14, $9, 1
sw $14, 0($sp)
blt $14, 8, $33
# 16  s=s + C[i]*X[(b+i)%N];
# 17  }
# 18  return s;
lw $2, 4($sp)
addu $sp, 8

23 FIR filter on a MIPS micro processor
15 instructions per tap per sample, of which 7 are 2-cycle load/store instructions: 22 clock cycles/tap.
What does this mean? How to appreciate this? A brush-up on computer architecture: Computer Architecture, Hennessy and Patterson

24 Instruction Set Architecture An Instruction Set Architecture (ISA) = interface between hardware and software. Hence, a good ISA: allows easy programming (compilers, OS,..); allows efficient implementations (hardware); has a long lifetime (survives many HW generations); is general purpose. 24

25 Reduced Instruction Set Computer
1980: Patterson and Ditzel: The Case for RISC.
Fixed 32-bit instruction set, with few formats; load-store architecture; large register bank (32 registers), all general purpose.
On processor organization: hard-wired decode logic; pipelined execution; single clock-cycle execution.

26 DLX [MIPS-like] instruction formats (bits 31-26 | 25-21 | 20-16 | 15-11 | 10-0):
R-type: Opcode | rs1 | rs2 | rd | function : reg-reg ALU operations
I-type: Opcode | rs1 | rd | immediate : loads, stores, conditional branch, ...
J-type: Opcode | offset : jump, jump and link, trap, return from exception

27 Example DLX instructions (format, name, meaning):
LW R1, 30(R2) [I-type] Load Word: Reg[R1] := Mem[30 + Reg[R2]]
SW 500(R4), R3 [I-type] Store Word: Mem[500 + Reg[R4]] := Reg[R3]
ADD R1, R2, R3 [R-type] Add: Reg[R1] := Reg[R2] + Reg[R3]
ADDI R1, R2, #3 [I-type] Add Immediate: Reg[R1] := Reg[R2] + 3
BEQZ R4, imm. [I-type] Branch Equal Zero: if Reg[R4] = 0 then pc := imm. fi
J offset [J-type] Jump: pc := offset

28 DLX instruction mixes [from H&P, Figs 2.26, 2.27]
SPECint92 average freq. [%]: load 26, cond branch 16, add 14, compare 13, store 9, or 5, shift 4, load imm. 3; subtotal 90
SPECfp92 average freq. [%]: load FP 23, mul FP 13, add 14, store FP 9, cond branch 8, add FP 8, sub FP 6, compare FP 6; subtotal 84

29 DLX interface, state
[Block diagram: DLX CPU with pc and register file Reg (r0, r1, r2, ..., r31); instruction memory (address in, instruction out); data memory Mem (address, data, r/w); clock and interrupt inputs.]

30 DLX: 5-step sequential execution
IF (any instruction): read instruction; update PC (depending on branch condition)
ID (any instruction): read register values, sign-extend immediate
EX: ALU instruction: do ALU operation; load/store: compute address; branch: compute branch condition
MM: read data memory or write data memory
WB: write back in Reg: the ALU result (ALU instruction) or the loaded value (load)

31 DLX: 5-step sequential execution
[Datapath: IF: pc, instruction memory, ir, npc = pc + 4; ID: Reg read into A and B, sign-extended Imm; EX: ALU output aluo, branch test 0? / cond; MM: data memory Mem, load result lmd; WB: write back into Reg.]

32 DLX: pipelined execution
Program execution [instructions] versus time [in clock cycles]:
i1: IF ID EX MM WB
i2:    IF ID EX MM WB
i3:       IF ID EX MM WB
i4:          IF ID EX MM WB
i5:             IF ID EX MM
i6:                IF ID EX
NB. This ignores dependencies among successive instructions.

33 DLX: pipelined execution
[Pipelined datapath: Instruction Fetch (pc + 4, instruction memory), Instruction Decode (Reg), EXecute (ALU, 0? branch test), Memory (Mem), Write Back; pipeline registers between the stages.]

34 FIRSTEP in MIPS assembler
# 16  s=s + C[i]*X[(b+i)%N];
lw $14, 0($sp)
addu $15, $4, $14
rem $24, $15, 8
mul $25, $24, 4
la $8, X
addu $9, $25, $8
lw $10, 0($9)
mul $11, $14, 4
la $12, C
addu $13, $11, $12
lw $15, 0($13)
mul $24, $10, $15
lw $25, 4($sp)
addu $8, $24, $25
sw $8, 4($sp)
# 15  for (i=0; i<N; i++){
sw $0, 0($sp)
lw $9, 0($sp)
addu $14, $9, 1
sw $14, 0($sp)
blt $14, 8, $33
# 16  s=s + C[i]*X[(b+i)%N];
# 17  }
# 18  return s;
lw $2, 4($sp)
addu $sp, 8

35 FIR on DLX: manually optimized assembly
# 16  s=s + C[i]*X[(b+i)%N];          pseudo assembly
$33: addu $12, $8, $10    ; ca := &C + ci
     addu $13, $9, $11    ; xa := &X + xi
     lw   $14, 0($12)     ; cv := M[ca]
     lw   $15, 0($13)     ; xv := M[xa]
     mul  $16, $14, $15   ; p := cv * xv
     addu $17, $17, $16   ; s := s + p
     add  $10, $10, 4     ; ci := ci + 4
     add  $11, $11, 4     ; xi := xi + 4
     blt  $11, 40, $34    ; if xi < 40 goto $34
     addi $11, $0, 0      ; xi := 0
$34: blt  $10, 40, $33    ; if ci < 40 goto $33
     sw   $17, 4($sp)     ; M[$sp+4] := s

36 FIR on DLX: manually optimized assembly
11 instructions per tap; 2 loads, 0 stores (a load/store takes 2 cycles); 2 branches (branch-delay slots can be used); from 22 clock cycles down to 13 clock cycles per tap.
Aim of classical DSPs: reduce the cycle count to 1 clock cycle per tap on a fully programmable processor.

37 Basic features of DSP processors
Compared to a micro processor, a DSP has: fixed/floating-point arithmetic; fast multiply-accumulate; specialized addressing modes; a multiple-access memory architecture; specialized program control.
DSP ICs (versus embedded DSPs) also have: on-chip peripherals and I/O devices.

38 Fixed-point arithmetic
Representation for fractions in the range [-1 .. 1): a sign bit with weight -1, followed by the radix point and bits with weights 1/2, 1/4, 1/8, ...
Examples (4-bit words): 0.100 = 1/2; 0.110 = 3/4; 0.111 = 7/8; 1.000 = -1; 1.100 = -1/2; 1.111 = -1/8

39 Fixed-point arithmetic (cntd) Fixed point word size: typically 16 or 24 bits. Hardware for integer and fixed-point arithmetic is very similar (details on multiplication differ). Support for saturation, rounding, scaling, etc. Most DSPs support both integer and fixed-point; some also support floating point arithmetic. Fixed-point is the norm for consumer applications; algorithms are often converted from floating to fixed point. 39

40 Multiply Accumulate instruction
[Datapath: operand registers x0, x1, y0, y1 feed a multiplier; shifter; ALU; accumulators A and B.]
The instruction A := A + x*y executes in a single clock cycle.

41 Multiply-Accumulate:
FS:  s := 0
     ci := 0
     xi := 4*b
$33: ca := &C + ci
     xa := &X + xi
     cv := M[ca]
     xv := M[xa]
     s := s + cv*xv
     ci := ci + 4
     xi := xi + 4
     if xi < 40 then goto $34
     xi := 0
$34: if ci < 40 then goto $33

42 Memory addressing
Specialized addressing modes to support: register-indirect addressing with post-increment / post-decrement; modulo-N addressing (the start address must be a power of 2; N must be stored in a dedicated modulo register).

43 Memory addressing: 16-bit words, post increment, modulo
xa := &X   {must be a 16-fold}
xm := 10   {modulo register}
FS:  s := 0
     ci := 0
     ca := &C
$33: cv := M[ca], ca := ca+1
     xv := M[xa], xa := (xa+1) mod xm
     s := s + cv*xv
     ci := ci + 1
$34: if ci < 10 then goto $33
     xa := (xa+1) mod xm

44 Memory Organizations
Von Neumann architecture: processor connected through one address bus and one shared data/instruction bus to a single memory.
Harvard architecture: processor with address bus 1 + data bus to a D(ata) memory, and address bus 2 + instruction bus to an I(nstruction) memory.

45 Memory Organizations (cntd) Dual Harvard (or Modified Harvard): 1 instruction bus + 2 data buses (named X, Y). X,Y buses connected to separate memories (RAM). More difficult to program: programmer must specify which RAM for each variable. Instruction bus to program memory (ROM, Flash). 45

46 Dual Harvard Memory: 7 -> 4
xa := &X   {must be a 16-fold}
xm := 10   {modulo register}
FS:  ci := 0
     ca := &C
$33: cv := Y[ca], ca := ca+1, xv := X[xa], xa := (xa+1) mod xm
     s := s + cv*xv
     ci := ci + 1
$34: if ci < 10 then goto $33
     xa := (xa+1) mod xm

47 Control: zero-overhead looping
Zero-overhead looping (hardware looping): repeat L instructions N times; sometimes L is restricted to 1; usually N < 2^16; usually some form of nesting is allowed; N must be stored in a dedicated register.

48 Zero-overhead looping: 4 -> 2
xa := &X   {must be a 16-fold}
xm := 10   {modulo register}
FS:  s := 0
     ca := &C
     repeat 10 times {
       cv := Y[ca], ca := ca+1, xv := X[xa], xa := (xa+1) mod xm
       s := s + cv*xv
     }
     xa := (xa+1) mod xm

49 Pipelining: 2 -> 1
xa := &X   {must be a 16-fold}
xm := 10   {modulo register}
FS:  s := 0
     ca := &C
     cv := Y[ca], ca := ca+1, xv := X[xa], xa := (xa+1) mod xm
     repeat 9 times {
       s := s + cv*xv, cv := Y[ca], ca := ca+1, xv := X[xa], xa := (xa+1) mod xm
     }
     s := s + cv*xv

50 Programming a DSP Today: usually in assembler. Most popular high-level language: C. However, C lacks common DSP data types (fixed point, complex numbers). Compilers are rather inefficient (see next slide). Emerging practice: use optimized hand-coded kernels (e.g. from libraries); use compiled code for non-critical parts; use Real-Time Operating Systems to schedule multiple tasks. 50

51 DSPs and compilers
DSP architectures are compiler-unfriendly: multiple memory spaces, a small number of dedicated registers, non-orthogonal instruction sets, no hardware support for stacks. Furthermore, it is hard for a compiler to make good use of: multi-operation instructions, parallel data moves, hardware looping, small local data memories. Newer DSPs, based on Very Long Instruction Words (VLIW), are more compiler-friendly, at the expense of area and code size.

52 SIMD parallelism (Vector processing)
One vector ADD computes A1+B1, A2+B2, ..., AN+BN from vectors A1..AN and B1..BN in a single instruction.
SIMD = Single Instruction stream, Multiple Data stream
+ 1 {program memory, instruction decoder, L1 controller} (no/less SRAM fragmentation)
+ simple single-thread program model (e.g. task switch)
? less general (how much SIMD parallelism in the application?)
? suffers from Amdahl's Law
Efficient! Flexible?

53 Amdahl's Law
Overall speedup S = (1 - f + f/P)^(-1), where f is the fraction vectorized and P the vector width.
[Plot of S versus f for P = 32: small f is easy but yields small benefit; large f is desired, but is it feasible?]

54 EVP = (scalar + vector) VLIW
[Figure: vector memory; VLIW parallelism across functional units (mul, shuffle, ...); scalar and vector parallelism.]
Operations per clock cycle: 50 typical, 100 max.

55 EVP architecture [ST-Ericsson]
[Block diagram: program memory, ACU, VLIW controller; 16-word-wide vector memory; vector registers [16]; vector units: Load/Store Unit, ALU, MAC/Shift Unit, Shuffle Unit, Intra-Vector Unit, Code-Generation Unit, AXU; scalar side: 1-word-wide scalar regs [32], Load/Store Unit, ALU, MAC Unit.]
EVP16 in 90 nm CMOS: 600 kgates, 2.5 mm2, ... MHz (worst case); 0.5 mW/MHz (core only), 1 mW/MHz (typical memory configuration)

56 Many algorithms can be vectorized
Communication algorithms: rake receiver, UMTS acquisition, cordic, (I)FFT, Fast Hadamard Transform, OFDM symbol (de)mapping, 16-QAM equalization, symbol-timing estimation, interference cancellation, Viterbi decoder, etc.
Media algorithms: DCT, SAD (incl. bilinear interpolation), motion estimation (feasible), video scaling, vertical peaking, disparity matching for mobile, RGB2YUV, RGB rendering, color segmentation, noise filtering (morphology), object filtering (i.e. aspect ratio), color interpolation, etc.
Performance typically scales well with vector size.

57 hardware FIR filters
[Figure: N-tap FIR in hardware: input samples x1..xN, coefficients c1..cN, a chain of multipliers, D registers, and adders producing y.]
Depending on performance requirements: (up to) N multiplier-accumulators, hence N taps per clock cycle.

58 hardware FIR filters
[Same N-tap FIR figure.]
Depending on performance requirements: compute M output samples in parallel (block processing): M outputs per clock cycle, i.e. M x N taps per clock cycle.

59 FIR taps per clock cycle
[Chart, from (hardwired) specific to (programmable) generic: VLSI, FPGA / reconfigurable HW, vector DSP, conventional DSP, embedded CPU, micro processor; taps per clock cycle range from many (VLSI) down to 0.01 (micro processor).]

60 System = HW + vector processor + DSP + GP
[Chart: load [ops] (100M .. 100G) versus code size [Bytes] (100 .. 100M) for hardware, vector processor, DSP, and micro-controller; hardware offers efficiency (area, power), processors offer application scope and generality; functions migrate over time between these implementations.]

61 DSP references
Keshab K. Parhi. VLSI Digital Signal Processing Systems: Design and Implementation. Wiley-Interscience.
Richard G. Lyons. Understanding Digital Signal Processing (2nd edition). Prentice Hall.
John G. Proakis and Dimitris K. Manolakis. Digital Signal Processing (4th edition). Prentice Hall.
Simon Haykin. Neural Networks: a Comprehensive Foundation (2nd edition). Prentice Hall.

62 Computer Architecture and DSP references
John L. Hennessy and David A. Patterson. Computer Architecture: A Quantitative Approach (6th edition). Morgan Kaufmann.
Phil Lapsley, Jeff Bier, Amit Shoham, Edward A. Lee. DSP Processor Fundamentals. Berkeley Design Technology, Inc.
Jennifer Eyre and Jeff Bier. The Evolution of DSP Processors. IEEE Signal Processing Magazine.
Kees van Berkel et al. Vector Processing as an Enabler for Software-Defined Radio in Handheld Devices. EURASIP Journal on Applied Signal Processing 2005:16.

63 2IN35: reporting guidelines 2013 (1)
1. Submit one report per team (1 or 2 students).
2. Respect deadlines: assignment L3: Tuesday June 4, 2013; assignment L4: Tuesday June 11, 2013; assignment L5: Tuesday June 18, 2013.
3. Make sure that assignments L3, L4, and L5 are demonstrated to and signed off by Alok, Hrishikesh, Rudolf, or Kees.
4. Submit two printed copies on paper (electronic copies will not be accepted).
5. Report on lab assignments L3, L4, and L5.

64 2IN35: reporting guidelines 2013 (2)
General guidelines (each assignment), to be followed strictly:
6. Analyze the specifications and requirements.
7. Present/motivate key ideas/decisions, design options, alternatives, trade-offs.
8. Draw an architecture block diagram (= picture!).
9. Explain the functional correctness of your Verilog programs (include your complete Verilog programs in an appendix).
10. Report, analyze & explain FPGA-resource usage and utilization {#multipliers, #BRAMs, #LUTs} in relation to your design.
11. Report, analyze & explain (min) sample time T_s and (max) sample frequency f_s, both after synthesis and after placement & routing.
12. Include simulation results: both waveforms in the time domain and in the frequency domain (apply FFT) (assignments 3 and 4 only).
13. Include answers to the inline questions.

65 THANK YOU


More information

Head, Dept of Electronics & Communication National Institute of Technology Karnataka, Surathkal, India

Head, Dept of Electronics & Communication National Institute of Technology Karnataka, Surathkal, India Mapping Signal Processing Algorithms to Architecture Sumam David S Head, Dept of Electronics & Communication National Institute of Technology Karnataka, Surathkal, India sumam@ieee.org Objectives At the

More information

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control ELEC 52/62 Computer Architecture and Design Spring 217 Lecture 4: Datapath and Control Ujjwal Guin, Assistant Professor Department of Electrical and Computer Engineering Auburn University, Auburn, AL 36849

More information

Control Instructions

Control Instructions Control Instructions Tuesday 22 September 15 Many slides adapted from: and Design, Patterson & Hennessy 5th Edition, 2014, MK and from Prof. Mary Jane Irwin, PSU Summary Previous Class Instruction Set

More information

COSC 6385 Computer Architecture - Pipelining

COSC 6385 Computer Architecture - Pipelining COSC 6385 Computer Architecture - Pipelining Fall 2006 Some of the slides are based on a lecture by David Culler, Instruction Set Architecture Relevant features for distinguishing ISA s Internal storage

More information

COE608: Computer Organization and Architecture

COE608: Computer Organization and Architecture Add on Instruction Set Architecture COE608: Computer Organization and Architecture Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer Engineering Ryerson University Overview More

More information

Computers and Microprocessors. Lecture 34 PHYS3360/AEP3630

Computers and Microprocessors. Lecture 34 PHYS3360/AEP3630 Computers and Microprocessors Lecture 34 PHYS3360/AEP3630 1 Contents Computer architecture / experiment control Microprocessor organization Basic computer components Memory modes for x86 series of microprocessors

More information

Implementation of DSP Algorithms

Implementation of DSP Algorithms Implementation of DSP Algorithms Main frame computers Dedicated (application specific) architectures Programmable digital signal processors voice band data modem speech codec 1 PDSP and General-Purpose

More information

CSE A215 Assembly Language Programming for Engineers

CSE A215 Assembly Language Programming for Engineers CSE A215 Assembly Language Programming for Engineers Lecture 7 MIPS vs. ARM (COD Chapter 2 and Exam #1 Review) October 12, 2012 Sam Siewert Comparison of MIPS32 and ARM Instruction Formats and Addressing

More information

The Single Cycle Processor

The Single Cycle Processor EECS 322 Computer Architecture The Single Cycle Processor Instructor: Francis G. Wolff wolff@eecs.cwru.edu Case Western Reserve University This presentation uses powerpoint animation: please viewshow CWRU

More information

Systems Architecture I

Systems Architecture I Systems Architecture I Topics Assemblers, Linkers, and Loaders * Alternative Instruction Sets ** *This lecture was derived from material in the text (sec. 3.8-3.9). **This lecture was derived from material

More information

CS3350B Computer Architecture Quiz 3 March 15, 2018

CS3350B Computer Architecture Quiz 3 March 15, 2018 CS3350B Computer Architecture Quiz 3 March 15, 2018 Student ID number: Student Last Name: Question 1.1 1.2 1.3 2.1 2.2 2.3 Total Marks The quiz consists of two exercises. The expected duration is 30 minutes.

More information

The Nios II Family of Configurable Soft-core Processors

The Nios II Family of Configurable Soft-core Processors The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture

More information

Modern Computer Architecture

Modern Computer Architecture Modern Computer Architecture Lecture2 Pipelining: Basic and Intermediate Concepts Hongbin Sun 国家集成电路人才培养基地 Xi an Jiaotong University Pipelining: Its Natural! Laundry Example Ann, Brian, Cathy, Dave each

More information

EITF20: Computer Architecture Part2.1.1: Instruction Set Architecture

EITF20: Computer Architecture Part2.1.1: Instruction Set Architecture EITF20: Computer Architecture Part2.1.1: Instruction Set Architecture Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Instruction Set Principles The Role of Compilers MIPS 2 Main Content Computer

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Embedded Systems. 7. System Components

Embedded Systems. 7. System Components Embedded Systems 7. System Components Lothar Thiele 7-1 Contents of Course 1. Embedded Systems Introduction 2. Software Introduction 7. System Components 10. Models 3. Real-Time Models 4. Periodic/Aperiodic

More information

ECE232: Hardware Organization and Design. Computer Organization - Previously covered

ECE232: Hardware Organization and Design. Computer Organization - Previously covered ECE232: Hardware Organization and Design Part 6: MIPS Instructions II http://www.ecs.umass.edu/ece/ece232/ Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Computer Organization

More information

ECE331: Hardware Organization and Design

ECE331: Hardware Organization and Design ECE331: Hardware Organization and Design Lecture 15: Midterm 1 Review Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Basics Midterm to cover Book Sections (inclusive) 1.1 1.5

More information

Control Instructions. Computer Organization Architectures for Embedded Computing. Thursday, 26 September Summary

Control Instructions. Computer Organization Architectures for Embedded Computing. Thursday, 26 September Summary Control Instructions Computer Organization Architectures for Embedded Computing Thursday, 26 September 2013 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition,

More information

CS152 Computer Architecture and Engineering CS252 Graduate Computer Architecture. VLIW, Vector, and Multithreaded Machines

CS152 Computer Architecture and Engineering CS252 Graduate Computer Architecture. VLIW, Vector, and Multithreaded Machines CS152 Computer Architecture and Engineering CS252 Graduate Computer Architecture VLIW, Vector, and Multithreaded Machines Assigned 3/24/2019 Problem Set #4 Due 4/5/2019 http://inst.eecs.berkeley.edu/~cs152/sp19

More information

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14 MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK

More information

Processor (I) - datapath & control. Hwansoo Han

Processor (I) - datapath & control. Hwansoo Han Processor (I) - datapath & control Hwansoo Han Introduction CPU performance factors Instruction count - Determined by ISA and compiler CPI and Cycle time - Determined by CPU hardware We will examine two

More information

An introduction to DSP s. Examples of DSP applications Why a DSP? Characteristics of a DSP Architectures

An introduction to DSP s. Examples of DSP applications Why a DSP? Characteristics of a DSP Architectures An introduction to DSP s Examples of DSP applications Why a DSP? Characteristics of a DSP Architectures DSP example: mobile phone DSP example: mobile phone with video camera DSP: applications Why a DSP?

More information

Design of Embedded DSP Processors Unit 2: Design basics. 9/11/2017 Unit 2 of TSEA H1 1

Design of Embedded DSP Processors Unit 2: Design basics. 9/11/2017 Unit 2 of TSEA H1 1 Design of Embedded DSP Processors Unit 2: Design basics 9/11/2017 Unit 2 of TSEA26-2017 H1 1 ASIP/ASIC design flow We need to have the flow in mind, so that we will know what we are talking about in later

More information

Programmable Machines

Programmable Machines Programmable Machines Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T. Quiz 1: next week Covers L1-L8 Oct 11, 7:30-9:30PM Walker memorial 50-340 L09-1 6.004 So Far Using Combinational

More information

Programmable Machines

Programmable Machines Programmable Machines Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T. Quiz 1: next week Covers L1-L8 Oct 11, 7:30-9:30PM Walker memorial 50-340 L09-1 6.004 So Far Using Combinational

More information

Chapter 4. The Processor. Computer Architecture and IC Design Lab

Chapter 4. The Processor. Computer Architecture and IC Design Lab Chapter 4 The Processor Introduction CPU performance factors CPI Clock Cycle Time Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS

More information

Instr. execution impl. view

Instr. execution impl. view Pipelining Sangyeun Cho Computer Science Department Instr. execution impl. view Single (long) cycle implementation Multi-cycle implementation Pipelined implementation Processing an instruction Fetch instruction

More information

Cache Justification for Digital Signal Processors

Cache Justification for Digital Signal Processors Cache Justification for Digital Signal Processors by Michael J. Lee December 3, 1999 Cache Justification for Digital Signal Processors By Michael J. Lee Abstract Caches are commonly used on general-purpose

More information

General Purpose Processors

General Purpose Processors Calcolatori Elettronici e Sistemi Operativi Specifications Device that executes a program General Purpose Processors Program list of instructions Instructions are stored in an external memory Stored program

More information

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation

More information

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science Pipeline Overview Dr. Jiang Li Adapted from the slides provided by the authors Outline MIPS An ISA for Pipelining 5 stage pipelining Structural and Data Hazards Forwarding Branch Schemes Exceptions and

More information

Lecture 6 MIPS R4000 and Instruction Level Parallelism. Computer Architectures S

Lecture 6 MIPS R4000 and Instruction Level Parallelism. Computer Architectures S Lecture 6 MIPS R4000 and Instruction Level Parallelism Computer Architectures 521480S Case Study: MIPS R4000 (200 MHz, 64-bit instructions, MIPS-3 instruction set) 8 Stage Pipeline: first half of fetching

More information

Instructions: Language of the Computer

Instructions: Language of the Computer CS359: Computer Architecture Instructions: Language of the Computer Yanyan Shen Department of Computer Science and Engineering 1 The Language a Computer Understands Word a computer understands: instruction

More information

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture The Processor Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut CSE3666: Introduction to Computer Architecture Introduction CPU performance factors Instruction count

More information

Efficient filtering with the Co-Vector Processor

Efficient filtering with the Co-Vector Processor Efficient filtering with the Co-Vector Processor BL Dang yy, Nur Engin y, GN Gaydadjiev yy y Philips Research Laboratories, Eindhoven,The Netherlands yy EEMCS Faculty, Delft University of Technology, The

More information

Anne Bracy CS 3410 Computer Science Cornell University. See P&H Chapter: , , Appendix B

Anne Bracy CS 3410 Computer Science Cornell University. See P&H Chapter: , , Appendix B Anne Bracy CS 3410 Computer Science Cornell University The slides are the product of many rounds of teaching CS 3410 by Professors Weatherspoon, Bala, Bracy, and Sirer. See P&H Chapter: 2.16-2.20, 4.1-4.4,

More information

REAL TIME DIGITAL SIGNAL PROCESSING

REAL TIME DIGITAL SIGNAL PROCESSING REAL TIME DIGITAL SIGNAL PROCESSING UTN-FRBA 2010 Introduction Why Digital? A brief comparison with analog. Advantages Flexibility. Easily modifiable and upgradeable. Reproducibility. Don t depend on components

More information

REAL TIME DIGITAL SIGNAL PROCESSING

REAL TIME DIGITAL SIGNAL PROCESSING REAL TIME DIGITAL SIGNAL PROCESSING SASE 2010 Universidad Tecnológica Nacional - FRBA Introduction Why Digital? A brief comparison with analog. Advantages Flexibility. Easily modifiable and upgradeable.

More information

4. What is the average CPI of a 1.4 GHz machine that executes 12.5 million instructions in 12 seconds?

4. What is the average CPI of a 1.4 GHz machine that executes 12.5 million instructions in 12 seconds? Chapter 4: Assessing and Understanding Performance 1. Define response (execution) time. 2. Define throughput. 3. Describe why using the clock rate of a processor is a bad way to measure performance. Provide

More information

Introduction to Field Programmable Gate Arrays

Introduction to Field Programmable Gate Arrays Introduction to Field Programmable Gate Arrays Lecture 2/3 CERN Accelerator School on Digital Signal Processing Sigtuna, Sweden, 31 May 9 June 2007 Javier Serrano, CERN AB-CO-HT Outline Digital Signal

More information

Computer Science 324 Computer Architecture Mount Holyoke College Fall Topic Notes: MIPS Instruction Set Architecture

Computer Science 324 Computer Architecture Mount Holyoke College Fall Topic Notes: MIPS Instruction Set Architecture Computer Science 324 Computer Architecture Mount Holyoke College Fall 2009 Topic Notes: MIPS Instruction Set Architecture vonneumann Architecture Modern computers use the vonneumann architecture. Idea:

More information

Computer Architecture

Computer Architecture CS3350B Computer Architecture Winter 2015 Lecture 4.2: MIPS ISA -- Instruction Representation Marc Moreno Maza www.csd.uwo.ca/courses/cs3350b [Adapted from lectures on Computer Organization and Design,

More information

EC-801 Advanced Computer Architecture

EC-801 Advanced Computer Architecture EC-801 Advanced Computer Architecture Lecture 5 Instruction Set Architecture I Dr Hashim Ali Fall 2018 Department of Computer Science and Engineering HITEC University Taxila!1 Instruction Set Architecture

More information

Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Processor Architecture Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Moore s Law Gordon Moore @ Intel (1965) 2 Computer Architecture Trends (1)

More information

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:

More information

CS3350B Computer Architecture Winter 2015

CS3350B Computer Architecture Winter 2015 CS3350B Computer Architecture Winter 2015 Lecture 5.5: Single-Cycle CPU Datapath Design Marc Moreno Maza www.csd.uwo.ca/courses/cs3350b [Adapted from lectures on Computer Organization and Design, Patterson

More information

COMPUTER ORGANIZATION AND DESI

COMPUTER ORGANIZATION AND DESI COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler

More information

04 - DSP Architecture and Microarchitecture

04 - DSP Architecture and Microarchitecture September 11, 2015 Memory indirect addressing (continued from last lecture) ; Reality check: Data hazards! ; Assembler code v3: repeat 256,endloop load r0,dm1[dm0[ptr0++]] store DM0[ptr1++],r0 endloop:

More information

Chapter 4 The Processor 1. Chapter 4A. The Processor

Chapter 4 The Processor 1. Chapter 4A. The Processor Chapter 4 The Processor 1 Chapter 4A The Processor Chapter 4 The Processor 2 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware

More information

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version MIPS Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

ECE 1160/2160 Embedded Systems Design. Midterm Review. Wei Gao. ECE 1160/2160 Embedded Systems Design

ECE 1160/2160 Embedded Systems Design. Midterm Review. Wei Gao. ECE 1160/2160 Embedded Systems Design ECE 1160/2160 Embedded Systems Design Midterm Review Wei Gao ECE 1160/2160 Embedded Systems Design 1 Midterm Exam When: next Monday (10/16) 4:30-5:45pm Where: Benedum G26 15% of your final grade What about:

More information

LECTURE 10. Pipelining: Advanced ILP

LECTURE 10. Pipelining: Advanced ILP LECTURE 10 Pipelining: Advanced ILP EXCEPTIONS An exception, or interrupt, is an event other than regular transfers of control (branches, jumps, calls, returns) that changes the normal flow of instruction

More information

Instruction Set Architecture. "Speaking with the computer"

Instruction Set Architecture. Speaking with the computer Instruction Set Architecture "Speaking with the computer" The Instruction Set Architecture Application Compiler Instr. Set Proc. Operating System I/O system Instruction Set Architecture Digital Design

More information

CS31001 COMPUTER ORGANIZATION AND ARCHITECTURE. Debdeep Mukhopadhyay, CSE, IIT Kharagpur. Instructions and Addressing

CS31001 COMPUTER ORGANIZATION AND ARCHITECTURE. Debdeep Mukhopadhyay, CSE, IIT Kharagpur. Instructions and Addressing CS31001 COMPUTER ORGANIZATION AND ARCHITECTURE Debdeep Mukhopadhyay, CSE, IIT Kharagpur Instructions and Addressing 1 ISA vs. Microarchitecture An ISA or Instruction Set Architecture describes the aspects

More information

Chapter 13 Reduced Instruction Set Computers

Chapter 13 Reduced Instruction Set Computers Chapter 13 Reduced Instruction Set Computers Contents Instruction execution characteristics Use of a large register file Compiler-based register optimization Reduced instruction set architecture RISC pipelining

More information

CN310 Microprocessor Systems Design

CN310 Microprocessor Systems Design CN310 Microprocessor Systems Design Micro Architecture Nawin Somyat Department of Electrical and Computer Engineering Thammasat University 28 August 2018 Outline Course Contents 1 Introduction 2 Simple

More information