VLSI Programming 2016: Lecture 6
1 VLSI Programming 2016: Lecture 6. Course: 2IMN35. Teachers: Kees van Berkel c.h.v.berkel@tue.nl, Rudolf Mak r.h.mak@tue.nl. Lab: Kees van Berkel, Rudolf Mak, Alok Lele. www: http://www.win.tue.nl/~wsinmak/Education/2IMN35/ Lecture 6: T3, T4, digital signal processors
2 VLSI Programming (2IMN35): time table. Tue: h5-h8, MF.07; Thu: h1-h4, Gemini-Z3A-08/10/13. 19-Apr: introduction, DSP graphs, bounds. 21-Apr: pipelining, retiming, transposition, J-slow, unfolding; tools. 26-Apr: introductions to L1 (audio filter). 28-Apr: T1 unfolding, look-ahead; L1 cntd; FPGA and Verilog simulation; L2 + T2 strength reduction. 3-May: folding; L2 (audio filter on XUP board). 5-May: L2 cntd. 10-May: T3 + T4, DSP processors; L3. 12-May: L3 (sequential FIR + strength-reduced FIR). 17-May: L3 cntd. 19-May: L3 cntd; L4. 24-May: systolic computation; T5. 26-May: L3; L4. 31-May: T5; L4 (audio sample rate convertor). 2-Jun: L4 cntd; L5. 7-Jun: L5 (1024x audio sample rate convertor). 9-Jun: L4; L5 cntd. 14-Jun: L5 cntd. 16-Jun: L5 deadline report.
3 Outline Lecture 6: T3, T4; the SW-HW performance spectrum; an architecture-morphing exercise. Mandatory reading (reminder): Edward A. Lee and David G. Messerschmitt. Synchronous Data Flow. Proc. of the IEEE, Vol. 75, No. 9, Sept 1987.
4 T3: Parallel IIR assignment. Consider IIR: y(n) = x(n) + a*y(n-2). Assume add and multiply times: 2 and 5 ns, respectively. 1. Derive the parallel look-ahead IIR, L=4. 2. Pipeline and retime for maximal throughput using a minimum number of D-elements. 3. Include throughput and latency calculations. Return deadline: Tuesday May 10.
5 IIR assignment 4 (from lecture 3). 4. Pipeline and retime the unfolded IIR; draw DFG; throughput? No pipelining possible: same DFG, same throughput. f_sample = 2/(T_M + T_A) = 2/7 ns^-1 ≈ 286 MHz. (DFG: x(2k) and x(2k+1) each feed an adder whose outputs y(2k), y(2k+1) are fed back through a D-element and multiplier a as y(2(k-1)), y(2(k-1)+1).)
6 Parallel IIR assignment: unrolling 3x (n → n+1, n+2, n+3). Note: rewrite à la Parhi with u(n) = x(n+2): y(n+2) = a*y(n) + u(n); y(n+3) = a*y(n+1) + u(n+1); y(n+4) = a*y(n+2) + u(n+2) = a^2*y(n) + a*u(n) + u(n+2); y(n+5) = a*y(n+3) + u(n+3) = a^2*y(n+1) + a*u(n+1) + u(n+3). Unfolding (L=4: n → 4k): y(4k+2) = a*y(4k) + u(4k); y(4k+3) = a*y(4k+1) + u(4k+1); y(4k+4) = a^2*y(4k) + a*u(4k) + u(4k+2); y(4k+5) = a^2*y(4k+1) + a*u(4k+1) + u(4k+3).
7 Parallel IIR assignment. f_sample = 4/(T_M + 2T_A) = 4/9 ns^-1 ≈ 444 MHz. (DFG: inputs u(4k), u(4k+1), u(4k+2), u(4k+3); coefficients a and a^2; outputs y(4k), y(4k+1), y(4k+2), y(4k+3), y(4k+5); critical path: one multiply plus two adds.)
8 Parallel IIR assignment. Pipelined: f_sample = 4/(T_M + T_A) = 4/7 ns^-1 ≈ 571 MHz; +4 D-elements. (Same DFG with pipeline registers inserted; outputs delayed to y(4k-4), y(4k+1-4), y(4k+2-4), y(4k+3-4).)
9 Parallel IIR assignment: 2-slow! f_sample = 2/(T_M + 0*T_A) = 2/5 ns^-1 = 400 MHz. (DFG: inputs u(4k), u(4k+1), u(4k+2), u(4k+3); coefficients a and a^2; outputs y(4k), y(4k+1), y(4k+2), y(4k+3).)
10 T4: Strength-reduced FIR assignment. Consider FIR: y(n) = a*x(n) + b*x(n-1) + c*x(n-6) + d*x(n-7). Assume add and multiply times: 2 and 5 ns, respectively. 1. Draw the DFG of the FIR, calculate throughput. 2. Apply strength reduction, L=2. 3. Pipeline and retime for maximal throughput using a minimum number of D-elements. 4. Include throughput and latency calculations. Return deadline: Tuesday May 10.
11 Assignment T4: Strength-reduced FIR 1a. Consider FIR: y(n) = a*x(n) + b*x(n-1) + c*x(n-6) + d*x(n-7). Assume add and multiply times: 2 and 5 ns, respectively. 1. Draw the DFG of the FIR, calculate throughput. Transposed form (for high throughput), with a 5D delay between the b and c taps: f_sample = 1/(T_M + T_A) = 1/(5+2) = 1/7 ns^-1 ≈ 143 MHz.
12 Assignment T4: Strength-reduced FIR 1b. 2. Pipeline and retime the FIR for maximal throughput: f_sample = 1/(T_M + 0*T_A) = 1/5 ns^-1 = 200 MHz; +4 D-elements. (Transposed DFG with coefficients d, c, b, a, the 5D delay, and the output delayed to y(n-1).)
13 Assignment T4: Strength-reduced FIR 2. y(n) = a*x(n) + b*x(n-1) + c*x(n-6) + d*x(n-7). y(2k) = a*x(2k) + b*x(2k-1) + c*x(2k-6) + d*x(2k-7) = a*x(2k) + b*x(2(k-1)+1) + c*x(2(k-3)) + d*x(2(k-4)+1). y(2k+1) = a*x(2k+1) + b*x(2k) + c*x(2k-5) + d*x(2k-6) = a*x(2k+1) + b*x(2k) + c*x(2(k-3)+1) + d*x(2(k-3)). Hence: y(2k) = (a+b)*x(2k) + (c+d)*x(2(k-3)) + b*[x(2(k-1)+1) - x(2k)] + d*[x(2(k-4)+1) - x(2(k-3))]; y(2k+1) = (a+b)*x(2k) + (c+d)*x(2(k-3)) + a*[x(2k+1) - x(2k)] + c*[x(2(k-3)+1) - x(2(k-3))].
14 Assignment T4: Strength-reduced FIR 2. y(2k) = (a+b)*x(2k) + (c+d)*x(2(k-3)) + b*[x(2(k-1)+1) - x(2k)] + d*[x(2(k-4)+1) - x(2(k-3))]; y(2k+1) = (a+b)*x(2k) + (c+d)*x(2(k-3)) + a*[x(2k+1) - x(2k)] + c*[x(2(k-3)+1) - x(2(k-3))]. When we assume (a+b) and (c+d) are pre-computed constants: 3 × 2 multipliers for the sub-FIRs, 3 adders for the sub-FIRs, plus 2 adders + 2 subtractors of strength-reduction overhead. Total: 6 multipliers + 7 adds/subs (versus 8 multipliers + 6 adds/subs).
15 Assignment T4: Strength-reduced FIR 2. f_sample = 2/(T_M + 3T_A) = 2/11 ns^-1 ≈ 182 MHz. (DFG: inputs x(2k), x(2k+1); subtractors feeding multipliers a, b, c, d; a shared branch through a+b and c+d with delays D and 3D; adders producing y(2k) and y(2k+1).)
16 Assignment T4: Strength-reduced FIR 3,4. Pipelining: +9 D-elements. f_sample = 2/(T_M + 0*T_A) = 2/5 ns^-1 = 400 MHz. (Same DFG, with outputs delayed to y(2k-4) and y(2k+1-4).)
17 Assignment T4: Strength-reduced FIR 3,4 y(2k+1-6) c a 3D x(2k+1) y(2k-6) + + D - c+d a+b + 3D + Transposition: 12 D-elements d b 3D x(2k) 17
18 DIGITAL SIGNAL PROCESSORS 18
19 The SW-HW Spectrum: outline FIR on a microprocessor (MIPS, DLX) FIR on a DSP DSP Arithmetic DSP memory addressing and organization DSP control Zürich zip and Eindhoven zip REAL and Motorola DSP programming FIR on a Vector DSP FIR in VLSI or on an FPGA 19
20 Typical DSP algorithms: FIR Filters. Filters reduce signal noise and enhance image or signal quality by removing unwanted frequencies. Finite Impulse Response (FIR) filters compute y[i]: y(i) = sum_{k=0}^{N-1} h(k) x(i-k) = h(n) * x(n), where x is the input sequence, y is the output sequence, h is the impulse response (filter coefficients), and N is the number of taps (coefficients) in the filter. The output sequence depends only on the input sequence and the impulse response.
21 FIR filter in ANSI C:
#define N 16
int X[N];      /* circular buffer of input samples */
int C[N];      /* filter coefficients */
int sum;

int FIRstep(int b)      /* b = base index into the tap buffer */
{
    int s = 0;
    int i;
    for (i = 0; i < N; i++) { s = s + C[i]*X[(b+i)%N]; }
    return s;
}

main()
{
    int base = 0;
    Cinit();
    while (1) {
        scanf("%d", &X[base]);
        sum = FIRstep(base);
        printf("%d\n", sum);
        base = (base+1)%N;
    }
}
22 FIRSTEP in MIPS assembler (from C-compiler):
# 16  s = s + C[i]*X[(b+i)%N];
  lw   $14, 0($sp)
  addu $15, $4, $14
  rem  $24, $15, 8
  mul  $25, $24, 4
  la   $8, X
  addu $9, $25, $8
  lw   $10, 0($9)
  mul  $11, $14, 4
  la   $12, C
  addu $13, $11, $12
  lw   $15, 0($13)
  mul  $24, $10, $15
  lw   $25, 4($sp)
  addu $8, $24, $25
  sw   $8, 4($sp)
# 15  for (i=0; i<N; i++) {
  sw   $0, 0($sp)
  lw   $9, 0($sp)
  addu $14, $9, 1
  sw   $14, 0($sp)
  blt  $14, 8, $33
# 17  }
# 18  return s;
  lw   $2, 4($sp)
  addu $sp, 8
23 FIR filter on a MIPS micro processor: 15 instructions per tap per sample, of which 7 are load/store; with 2-cycle load/store instructions this gives 22 clock cycles/tap. What does this mean? How to appreciate this? A brush-up on computer architecture: Computer Architecture, Hennessy and Patterson.
24 Instruction Set Architecture An Instruction Set Architecture (ISA) = interface between hardware and software. Hence, a good ISA: allows easy programming (compilers, OS,..); allows efficient implementations (hardware); has a long lifetime (survives many HW generations); is general purpose. 24
25 Reduced Instruction Set Computer 1980: Patterson and Ditzel: The Case for RISC fixed 32-bit instruction set, with few formats load-store architecture large register bank (32 registers), all general purpose On processor organization: hard-wired decode logic pipelined execution single clock-cycle execution 25
26 DLX [MIPS-like] instruction formats (bit fields 31-26, 25-21, 20-16, 15-11, 10-0): R-type: Opcode | rs1 | rs2 | rd | function — reg-reg ALU operations. I-type: Opcode | rs1 | rd | Immediate — loads, stores, conditional branch, ... J-type: Opcode | offset — jump, jump and link, trap, return from exception.
27 Example DLX instructions (instruction, type, name, meaning): LW R1, 30(R2) — I, Load Word: Reg[R1] := Mem[30 + Reg[R2]]. SW 500(R4), R3 — I, Store Word: Mem[500 + Reg[R4]] := Reg[R3]. ADD R1, R2, R3 — R, Add: Reg[R1] := Reg[R2] + Reg[R3]. ADDI R1, R2, #3 — I, Add Immediate: Reg[R1] := Reg[R2] + 3. BEQZ R4, imm. — I, Branch Equal Zero: if Reg[R4] = 0 then pc := imm. fi. J offset — J, Jump: pc := offset.
28 DLX instruction mixes [from H&P, Figs 2.26, 2.27]. SPECint92 average freq. [%]: load 26, cond branch 16, add 14, compare 13, store 9, or 5, shift 4, load imm. 3 — subtotal 90. SPECfp92 average freq. [%]: load FP 23, add 14, mul FP 13, store FP 9, cond branch 8, add FP 8, sub FP 6, compare FP 6 — subtotal 84.
29 DLX interface, state. (Diagram: DLX CPU with pc and register file r0-r31, connected to an instruction memory (address/instruction) and a data memory (address/data/r-w), plus clock and interrupt inputs.)
30 DLX: 5-step sequential execution. IF (any instruction): read instruction, update PC (depending on branch condition). ID (any instruction): read register values, sign-extend immediate. EX: ALU instruction — do ALU operation; load/store — compute address; branch — compute branch condition. MM: read data memory (load) or write data memory (store). WB: write back into Reg the ALU result (ALU instruction) or the loaded value (load).
31 DLX: 5-step sequential execution. (Datapath diagram: pc and npc, instruction memory, ir, register file read values A and B, sign-extended Imm, ALU output aluo, branch condition cond, data memory, load memory data lmd, write-back to Reg.)
32 DLX: pipelined execution. (Diagram, program execution [instructions] versus time [clock cycles]: successive instructions enter the pipeline one cycle apart, each passing through IF, ID, EX, MM, WB in consecutive cycles.) NB: this ignores dependencies among successive instructions.
33 DLX: pipelined execution. (Pipelined datapath diagram: Instruction Fetch, Instruction Decode, EXecute, Memory, Write Back stages, with pipeline registers between the instruction memory, register file, ALU, and data memory.)
34 FIRSTEP in MIPS assembler:
# 16  s = s + C[i]*X[(b+i)%N];
  lw   $14, 0($sp)
  addu $15, $4, $14
  rem  $24, $15, 8
  mul  $25, $24, 4
  la   $8, X
  addu $9, $25, $8
  lw   $10, 0($9)
  mul  $11, $14, 4
  la   $12, C
  addu $13, $11, $12
  lw   $15, 0($13)
  mul  $24, $10, $15
  lw   $25, 4($sp)
  addu $8, $24, $25
  sw   $8, 4($sp)
# 15  for (i=0; i<N; i++) {
  sw   $0, 0($sp)
  lw   $9, 0($sp)
  addu $14, $9, 1
  sw   $14, 0($sp)
  blt  $14, 8, $33
# 17  }
# 18  return s;
  lw   $2, 4($sp)
  addu $sp, 8
35 FIR on DLX: manually optimized assembly.
# 16  s = s + C[i]*X[(b+i)%N];   (pseudo assembly)
$33: addu $12, $8, $10      ca := &C + ci
     addu $13, $9, $11      xa := &X + xi
     lw   $14, 0($12)       cv := M[ca]
     lw   $15, 0($13)       xv := M[xa]
     mul  $16, $14, $15     p  := cv * xv
     addu $17, $17, $16     s  := s + p
     add  $10, $10, 4       ci := ci + 4
     add  $11, $11, 4       xi := xi + 4
     blt  $11, 40, $34      if xi < 40 goto $34
     addi $11, $0, 0        xi := 0
$34: blt  $10, 40, $33      if ci < 40 goto $33
     sw   $17, 4($sp)       M[$sp+4] := s
36 FIR on DLX: manually optimized assembly. 11 instructions per tap; 2 loads, 0 stores (a load/store takes 2 cycles); 2 branches (branch-delay slots can be used). From 22 clock cycles down to 13 clock cycles per tap. Aim of classical DSPs: reduce the cycle count to 1 clock cycle per tap on a fully programmable processor.
37 Basic features of DSP processors. Compared to a micro processor, a DSP has: fixed/floating-point arithmetic; fast multiply-accumulate; specialized addressing modes; a multiple-access memory architecture; specialized program control. DSP ICs (versus embedded DSPs) also have: on-chip peripherals and I/O devices.
38 Fixed-point arithmetic. Representation for fractions in the range [-1 .. 1): a sign bit, then the radix point, then bits with weights 2^-1, 2^-2, ... (the slide's worked bit-pattern examples were lost in transcription).
39 Fixed-point arithmetic (cntd) Fixed point word size: typically 16 or 24 bits. Hardware for integer and fixed-point arithmetic is very similar (details on multiplication differ). Support for saturation, rounding, scaling, etc. Most DSPs support both integer and fixed-point; some also support floating point arithmetic. Fixed-point is the norm for consumer applications; algorithms are often converted from floating to fixed point. 39
40 Multiply-Accumulate instruction. Operand registers x0, x1, y0, y1 feed a multiplier; a shifter and ALU accumulate into Accumulator A or Accumulator B. The instruction A := A + x*y executes in a single clock cycle.
41 Multiply-Accumulate:
FS:  s  := 0
     ci := 0
     xi := 4*b
$33: ca := &C + ci
     xa := &X + xi
     cv := M[ca]
     xv := M[xa]
     s  := s + cv*xv
     ci := ci + 4
     xi := xi + 4
     if xi < 40 then goto $34
     xi := 0
$34: if ci < 40 then goto $33
42 Memory addressing. Specialized addressing modes to support: register-indirect addressing with post-increment / post-decrement; modulo-N addressing (the start address must be a power of 2; N must be stored in a dedicated modulo register).
43 Memory addressing: post-increment, modulo.
     xa := &X    {must be a 16-fold}
     xm := 10    {modulo register}
FS:  s  := 0
     ci := 0
     ca := &C
$33: cv := M[ca], ca := ca+1
     xv := M[xa], xa := (xa+1) mod xm
     s  := s + cv*xv
     ci := ci + 1
$34: if ci < 10 then goto $33
     xa := (xa+1) mod xm
44 Memory organizations. Von Neumann architecture: processor connected to a single memory by one address bus and one data/instruction bus. Harvard architecture: processor with separate address and data/instruction buses to a D(ata) memory and an I(nstruction) memory.
45 Memory Organizations (cntd) Dual Harvard (or Modified Harvard): 1 instruction bus + 2 data buses (named X, Y). X,Y buses connected to separate memories (RAM). More difficult to program: programmer must specify which RAM for each variable. Instruction bus to program memory (ROM, Flash). 45
46 Dual Harvard memory: 7 → 4.
     xa := &X    {must be a 16-fold}
     xm := 10    {modulo register}
FS:  ci := 0
     ca := &C
$33: cv := Y[ca], ca := ca+1, xv := X[xa], xa := (xa+1) mod xm
     s  := s + cv*xv
     ci := ci + 1
$34: if ci < 10 then goto $33
     xa := (xa+1) mod xm
47 Control: zero-overhead looping. Zero-overhead (hardware) looping: repeat L instructions N times. Sometimes L is restricted to 1; usually N < 2^16; usually some form of nesting is allowed; N must be stored in a dedicated register.
48 Zero-overhead looping: 4 → 2.
     xa := &X    {must be a 16-fold}
     xm := 10    {modulo register}
FS:  s  := 0
     ca := &C
     repeat 10 times {
        cv := Y[ca], ca := ca+1, xv := X[xa], xa := (xa+1) mod xm
        s  := s + cv*xv
     }
     xa := (xa+1) mod xm
49 Pipelining: 2 → 1.
     xa := &X    {must be a 16-fold}
     xm := 10    {modulo register}
FS:  s  := 0
     ca := &C
     cv := Y[ca], ca := ca+1, xv := X[xa], xa := (xa+1) mod xm
     repeat 9 times {
        s := s + cv*xv, cv := Y[ca], ca := ca+1, xv := X[xa], xa := (xa+1) mod xm
     }
     s := s + cv*xv
50 Programming a DSP. Today: usually in assembler. Most popular high-level language: C. However, C lacks common DSP data types (fixed point, complex numbers), and compilers are rather inefficient (see next slide). Emerging practice: use optimized hand-coded kernels (e.g. from libraries); use compiled code for non-critical parts; use real-time operating systems to schedule multiple tasks.
51 DSPs and compilers. DSP architectures are compiler-unfriendly: multiple memory spaces, a small number of dedicated registers, non-orthogonal instruction sets, no hardware support for stacks. Furthermore, it is hard for a compiler to make good use of: multi-operation instructions, parallel data moves, hardware looping, small local data memories. Newer DSPs, based on Very Long Instruction Words (VLIW), are more compiler-friendly, at the expense of area and code size.
52 SIMD parallelism (vector processing). A single ADD instruction computes A_1+B_1, A_2+B_2, ..., A_N+B_N element-wise on vectors A and B. SIMD = Single Instruction stream, Multiple Data stream. + only 1 {program memory, instruction decoder, L1 controller}; + no/less SRAM fragmentation; + simple single-thread program model (e.g. task switch); ? less general (how much SIMD parallelism is in the application?); ? suffers from Amdahl's Law. Efficient! Flexible?
53 Amdahl's Law. Overall speedup S = (1 - f + f/P)^-1, where f is the fraction vectorized and P the number of lanes. (Plot of S versus f for P = 32: small f is easy but gives a small benefit; large f is desired — but is it feasible?)
54 EVP = (scalar + vector) VLIW. (Diagram: VLIW parallelism across functional units — mul, shuffle — combined with vector parallelism over the vector memory, plus scalar operations.) Operations/clock cycle: 50 typical, 100 max.
55 EVP architecture [ST-Ericsson]. Program memory, ACU, VLIW controller; 16-words-wide vector memory; vector registers [16] with Load/Store Unit, ALU, MAC/Shift Unit, Shuffle Unit, Intra-Vector Unit, Code-Generation Unit, AXU; 1-word-wide scalar regs [32] with Load/Store Unit, ALU, MAC Unit. EVP16 in 90 nm CMOS: 600 k gates, 2.5 mm2, ... MHz (worst case), 0.5 mW/MHz core only, 1 mW/MHz typical memory configuration.
56 Many algorithms can be vectorized. Communication algorithms: rake receiver, UMTS acquisition, cordic, (I)FFT, Fast Hadamard Transform, OFDM symbol (de)mapping, 16-QAM equalization, symbol-timing estimation, interference cancellation, Viterbi decoder, etc. Media algorithms: DCT, SAD (incl. bilinear interpolation), motion estimation (feasible), video scaling, vertical peaking, disparity matching for mobile, RGB2YUV, RGB rendering, color segmentation, noise filtering (morphology), object filtering (i.e. aspect ratio), color interpolation, etc. Performance typically scales well with vector size.
57 Hardware FIR filters. (Diagram: tap chain with inputs x1 ... xN, coefficients c1 ... cN, delay elements D, and an adder chain producing y.) Depending on performance requirements: (up to) N multiplier-accumulators, hence N taps per clock cycle.
58 Hardware FIR filters. (Same tap-chain diagram.) Depending on performance requirements: compute M output samples in parallel (block processing) — M outputs per clock cycle, i.e. M × N taps per clock cycle.
59 FIR taps per clock cycle. (Chart, ordered from specific/hardwired to generic/programmable: VLSI, FPGA / reconfigurable HW, vector DSP, conventional DSP, embedded CPU, micro processor — down to 0.01 taps per clock cycle at the generic end.)
60 System = HW + vector processor + DSP + GP. (Chart: load [ops], 100M-100G, versus code size [Bytes], 100-100M. Hardware has the best efficiency (area, power); vector processor, DSP, and micro-controller offer increasing application scope and generality; functions migrate over time toward the generic end.)
61 DSP references. Keshab K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, Wiley Inter-Science. Richard G. Lyons, Understanding Digital Signal Processing (2nd edition), Prentice Hall. John G. Proakis and Dimitris K. Manolakis, Digital Signal Processing (4th edition), Prentice Hall. Simon Haykin, Neural Networks: a Comprehensive Foundation (2nd edition), Prentice Hall.
62 Computer Architecture and DSP references. Hennessy and Patterson, Computer Architecture: a Quantitative Approach (6th edition), Morgan Kaufmann. Phil Lapsley, Jeff Bier, Amit Shoham, Edward Lee, DSP Processor Fundamentals, Berkeley Design Technology, Inc. Jennifer Eyre and Jeff Bier, The Evolution of DSP Processors, IEEE Signal Processing Magazine. Kees van Berkel et al., Vector Processing as an Enabler for Software-Defined Radio in Handheld Devices, EURASIP Journal on Applied Signal Processing 2005:16.
63 2IN35: reporting guidelines 2013 (1). 1. Submit one report per team (1 or 2 students). 2. Respect deadlines: assignment L3: Tuesday June 4, 2013; assignment L4: Tuesday June 11, 2013; assignment L5: Tuesday June 18, 2013. 3. Make sure that assignments L3, L4, and L5 are demonstrated to and signed off by Alok, Hrishikesh, Rudolf, or Kees. 4. Submit two printed copies on paper (electronic copies will not be accepted). 5. Report on lab assignments L3, L4, and L5.
64 2IN35: reporting guidelines 2013 (2). General guidelines (each assignment), to be followed strictly: 6. Analyze the specifications and requirements. 7. Present/motivate key ideas/decisions, design options, alternatives, trade-offs. 8. Draw an architecture block diagram (= picture!). 9. Explain the functional correctness of your Verilog programs (include your complete Verilog programs in an appendix). 10. Report, analyze & explain FPGA-resource usage and utilization {#multipliers, #BRAMs, #LUTs} in relation to your design. 11. Report, analyze & explain (min) sample time T_s and (max) sample frequency f_s, both after synthesis and after placement & routing. 12. Include simulation results: both waveforms in the time domain and in the frequency domain (apply FFT) (assignments 3 and 4 only). 13. Include answers to the inline questions.
65 THANK YOU
More informationCS3350B Computer Architecture Quiz 3 March 15, 2018
CS3350B Computer Architecture Quiz 3 March 15, 2018 Student ID number: Student Last Name: Question 1.1 1.2 1.3 2.1 2.2 2.3 Total Marks The quiz consists of two exercises. The expected duration is 30 minutes.
More informationThe Nios II Family of Configurable Soft-core Processors
The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture
More informationModern Computer Architecture
Modern Computer Architecture Lecture2 Pipelining: Basic and Intermediate Concepts Hongbin Sun 国家集成电路人才培养基地 Xi an Jiaotong University Pipelining: Its Natural! Laundry Example Ann, Brian, Cathy, Dave each
More informationEITF20: Computer Architecture Part2.1.1: Instruction Set Architecture
EITF20: Computer Architecture Part2.1.1: Instruction Set Architecture Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Instruction Set Principles The Role of Compilers MIPS 2 Main Content Computer
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationEmbedded Systems. 7. System Components
Embedded Systems 7. System Components Lothar Thiele 7-1 Contents of Course 1. Embedded Systems Introduction 2. Software Introduction 7. System Components 10. Models 3. Real-Time Models 4. Periodic/Aperiodic
More informationECE232: Hardware Organization and Design. Computer Organization - Previously covered
ECE232: Hardware Organization and Design Part 6: MIPS Instructions II http://www.ecs.umass.edu/ece/ece232/ Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Computer Organization
More informationECE331: Hardware Organization and Design
ECE331: Hardware Organization and Design Lecture 15: Midterm 1 Review Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Basics Midterm to cover Book Sections (inclusive) 1.1 1.5
More informationControl Instructions. Computer Organization Architectures for Embedded Computing. Thursday, 26 September Summary
Control Instructions Computer Organization Architectures for Embedded Computing Thursday, 26 September 2013 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition,
More informationCS152 Computer Architecture and Engineering CS252 Graduate Computer Architecture. VLIW, Vector, and Multithreaded Machines
CS152 Computer Architecture and Engineering CS252 Graduate Computer Architecture VLIW, Vector, and Multithreaded Machines Assigned 3/24/2019 Problem Set #4 Due 4/5/2019 http://inst.eecs.berkeley.edu/~cs152/sp19
More informationMIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14
MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK
More informationProcessor (I) - datapath & control. Hwansoo Han
Processor (I) - datapath & control Hwansoo Han Introduction CPU performance factors Instruction count - Determined by ISA and compiler CPI and Cycle time - Determined by CPU hardware We will examine two
More informationAn introduction to DSP s. Examples of DSP applications Why a DSP? Characteristics of a DSP Architectures
An introduction to DSP s Examples of DSP applications Why a DSP? Characteristics of a DSP Architectures DSP example: mobile phone DSP example: mobile phone with video camera DSP: applications Why a DSP?
More informationDesign of Embedded DSP Processors Unit 2: Design basics. 9/11/2017 Unit 2 of TSEA H1 1
Design of Embedded DSP Processors Unit 2: Design basics 9/11/2017 Unit 2 of TSEA26-2017 H1 1 ASIP/ASIC design flow We need to have the flow in mind, so that we will know what we are talking about in later
More informationProgrammable Machines
Programmable Machines Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T. Quiz 1: next week Covers L1-L8 Oct 11, 7:30-9:30PM Walker memorial 50-340 L09-1 6.004 So Far Using Combinational
More informationProgrammable Machines
Programmable Machines Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T. Quiz 1: next week Covers L1-L8 Oct 11, 7:30-9:30PM Walker memorial 50-340 L09-1 6.004 So Far Using Combinational
More informationChapter 4. The Processor. Computer Architecture and IC Design Lab
Chapter 4 The Processor Introduction CPU performance factors CPI Clock Cycle Time Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS
More informationInstr. execution impl. view
Pipelining Sangyeun Cho Computer Science Department Instr. execution impl. view Single (long) cycle implementation Multi-cycle implementation Pipelined implementation Processing an instruction Fetch instruction
More informationCache Justification for Digital Signal Processors
Cache Justification for Digital Signal Processors by Michael J. Lee December 3, 1999 Cache Justification for Digital Signal Processors By Michael J. Lee Abstract Caches are commonly used on general-purpose
More informationGeneral Purpose Processors
Calcolatori Elettronici e Sistemi Operativi Specifications Device that executes a program General Purpose Processors Program list of instructions Instructions are stored in an external memory Stored program
More informationInstruction Level Parallelism. Appendix C and Chapter 3, HP5e
Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation
More informationPipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science
Pipeline Overview Dr. Jiang Li Adapted from the slides provided by the authors Outline MIPS An ISA for Pipelining 5 stage pipelining Structural and Data Hazards Forwarding Branch Schemes Exceptions and
More informationLecture 6 MIPS R4000 and Instruction Level Parallelism. Computer Architectures S
Lecture 6 MIPS R4000 and Instruction Level Parallelism Computer Architectures 521480S Case Study: MIPS R4000 (200 MHz, 64-bit instructions, MIPS-3 instruction set) 8 Stage Pipeline: first half of fetching
More informationInstructions: Language of the Computer
CS359: Computer Architecture Instructions: Language of the Computer Yanyan Shen Department of Computer Science and Engineering 1 The Language a Computer Understands Word a computer understands: instruction
More informationThe Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture
The Processor Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut CSE3666: Introduction to Computer Architecture Introduction CPU performance factors Instruction count
More informationEfficient filtering with the Co-Vector Processor
Efficient filtering with the Co-Vector Processor BL Dang yy, Nur Engin y, GN Gaydadjiev yy y Philips Research Laboratories, Eindhoven,The Netherlands yy EEMCS Faculty, Delft University of Technology, The
More informationAnne Bracy CS 3410 Computer Science Cornell University. See P&H Chapter: , , Appendix B
Anne Bracy CS 3410 Computer Science Cornell University The slides are the product of many rounds of teaching CS 3410 by Professors Weatherspoon, Bala, Bracy, and Sirer. See P&H Chapter: 2.16-2.20, 4.1-4.4,
More informationREAL TIME DIGITAL SIGNAL PROCESSING
REAL TIME DIGITAL SIGNAL PROCESSING UTN-FRBA 2010 Introduction Why Digital? A brief comparison with analog. Advantages Flexibility. Easily modifiable and upgradeable. Reproducibility. Don t depend on components
More informationREAL TIME DIGITAL SIGNAL PROCESSING
REAL TIME DIGITAL SIGNAL PROCESSING SASE 2010 Universidad Tecnológica Nacional - FRBA Introduction Why Digital? A brief comparison with analog. Advantages Flexibility. Easily modifiable and upgradeable.
More information4. What is the average CPI of a 1.4 GHz machine that executes 12.5 million instructions in 12 seconds?
Chapter 4: Assessing and Understanding Performance 1. Define response (execution) time. 2. Define throughput. 3. Describe why using the clock rate of a processor is a bad way to measure performance. Provide
More informationIntroduction to Field Programmable Gate Arrays
Introduction to Field Programmable Gate Arrays Lecture 2/3 CERN Accelerator School on Digital Signal Processing Sigtuna, Sweden, 31 May 9 June 2007 Javier Serrano, CERN AB-CO-HT Outline Digital Signal
More informationComputer Science 324 Computer Architecture Mount Holyoke College Fall Topic Notes: MIPS Instruction Set Architecture
Computer Science 324 Computer Architecture Mount Holyoke College Fall 2009 Topic Notes: MIPS Instruction Set Architecture vonneumann Architecture Modern computers use the vonneumann architecture. Idea:
More informationComputer Architecture
CS3350B Computer Architecture Winter 2015 Lecture 4.2: MIPS ISA -- Instruction Representation Marc Moreno Maza www.csd.uwo.ca/courses/cs3350b [Adapted from lectures on Computer Organization and Design,
More informationEC-801 Advanced Computer Architecture
EC-801 Advanced Computer Architecture Lecture 5 Instruction Set Architecture I Dr Hashim Ali Fall 2018 Department of Computer Science and Engineering HITEC University Taxila!1 Instruction Set Architecture
More informationProcessor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
Processor Architecture Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Moore s Law Gordon Moore @ Intel (1965) 2 Computer Architecture Trends (1)
More informationAdvanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University
Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:
More informationCS3350B Computer Architecture Winter 2015
CS3350B Computer Architecture Winter 2015 Lecture 5.5: Single-Cycle CPU Datapath Design Marc Moreno Maza www.csd.uwo.ca/courses/cs3350b [Adapted from lectures on Computer Organization and Design, Patterson
More informationCOMPUTER ORGANIZATION AND DESI
COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler
More information04 - DSP Architecture and Microarchitecture
September 11, 2015 Memory indirect addressing (continued from last lecture) ; Reality check: Data hazards! ; Assembler code v3: repeat 256,endloop load r0,dm1[dm0[ptr0++]] store DM0[ptr1++],r0 endloop:
More informationChapter 4 The Processor 1. Chapter 4A. The Processor
Chapter 4 The Processor 1 Chapter 4A The Processor Chapter 4 The Processor 2 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware
More informationDetermined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version
MIPS Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationECE 1160/2160 Embedded Systems Design. Midterm Review. Wei Gao. ECE 1160/2160 Embedded Systems Design
ECE 1160/2160 Embedded Systems Design Midterm Review Wei Gao ECE 1160/2160 Embedded Systems Design 1 Midterm Exam When: next Monday (10/16) 4:30-5:45pm Where: Benedum G26 15% of your final grade What about:
More informationLECTURE 10. Pipelining: Advanced ILP
LECTURE 10 Pipelining: Advanced ILP EXCEPTIONS An exception, or interrupt, is an event other than regular transfers of control (branches, jumps, calls, returns) that changes the normal flow of instruction
More informationInstruction Set Architecture. "Speaking with the computer"
Instruction Set Architecture "Speaking with the computer" The Instruction Set Architecture Application Compiler Instr. Set Proc. Operating System I/O system Instruction Set Architecture Digital Design
More informationCS31001 COMPUTER ORGANIZATION AND ARCHITECTURE. Debdeep Mukhopadhyay, CSE, IIT Kharagpur. Instructions and Addressing
CS31001 COMPUTER ORGANIZATION AND ARCHITECTURE Debdeep Mukhopadhyay, CSE, IIT Kharagpur Instructions and Addressing 1 ISA vs. Microarchitecture An ISA or Instruction Set Architecture describes the aspects
More informationChapter 13 Reduced Instruction Set Computers
Chapter 13 Reduced Instruction Set Computers Contents Instruction execution characteristics Use of a large register file Compiler-based register optimization Reduced instruction set architecture RISC pipelining
More informationCN310 Microprocessor Systems Design
CN310 Microprocessor Systems Design Micro Architecture Nawin Somyat Department of Electrical and Computer Engineering Thammasat University 28 August 2018 Outline Course Contents 1 Introduction 2 Simple
More information