Digital Logic. Ch. 4 and Appendix C
- Barbara Armstrong
1 Digital Logic Ch. 4 and Appendix C
2 Gates. The most obvious gates are AND and OR. Together with NOT, we can combine them to implement any logic function.
3 Conventions. Zero volts is logic 0 and 5 volts is logic 1, unless we use negative logic. Most computers use smaller voltages now: DDR3 memories use 1.5 volts, in which case 1.5 volts is logic 1. Due to electrical noise, the logic levels are defined by a range.
4 Other gates. The little circle means NOT. NOR gate (not OR). NAND gate (not AND). WHAT??? WHAT???
5 Truth Tables. It is the opposite of an AND gate: it is a NAND gate.
6 Example. Try to figure out what this does. It is a one-bit adder with carry in.
7 Simpler Drawing
8 Programmable Logic Arrays (PLA for short). The dots are really fuses inside a chip. Fuses can be programmed once. A PLA can implement any logic function. Modern fuses can be programmed many times. PLAs on hormones are called Field Programmable Gate Arrays (FPGAs).
9 PLAs. An AND gate array feeding an OR gate array.
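The two-array structure of a PLA can be sketched in a few lines of Python. This is only an illustrative model, not how a real device is programmed: the names AND_PLANE, OR_PLANE and pla_eval are hypothetical, and the "fuses" are simply dictionary and list entries. The example programs the PLA as a half adder.

```python
from itertools import product

# Hypothetical PLA model. Each product term (one row of the AND array) maps an
# input name to the value it must have; inputs not listed are "don't care".
AND_PLANE = [
    {"a": 1, "b": 0},   # term 0: a AND (NOT b)
    {"a": 0, "b": 1},   # term 1: (NOT a) AND b
    {"a": 1, "b": 1},   # term 2: a AND b
]

# Each output (one column of the OR array) lists the product terms it collects.
OR_PLANE = {
    "sum": [0, 1],      # sum   = term 0 OR term 1
    "carry": [2],       # carry = term 2
}


def pla_eval(inputs):
    """Evaluate every PLA output for one assignment of input values (0 or 1)."""
    terms = [all(inputs[name] == want for name, want in term.items())
             for term in AND_PLANE]
    return {out: int(any(terms[i] for i in idxs))
            for out, idxs in OR_PLANE.items()}


for a, b in product((0, 1), repeat=2):
    print(a, b, pla_eval({"a": a, "b": b}))
```

Changing the behavior means changing only the two tables, which is the software analogue of programming the fuses.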
10 Standard Components. Decoders, multiplexers, ROM.
11 Decoder
12 Multiplexer
13 ROM
14 Boolean Algebra Laws
    Identity Law: A+0=A, A*1=A
    Zero & One Law: A+1=1, A*0=0
    Existence of inverse: A+A' = 1, A*A' = 0
    Commutative Law: A+B=B+A, A*B=B*A
    Associative Law: A+(B+C)=(A+B)+C, A*(B*C)=(A*B)*C
    Distributive Law: A*(B+C)=A*B+A*C, A+(B*C)=(A+B)*(A+C)
15 De Morgan's Law
    (A+B)' = A' * B'
    (A*B)' = A' + B'
    Principle of Duality: AND and OR are symmetric, and so are 0 and 1.
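Since each law involves at most three variables, it can be checked by brute force over all eight input combinations. The sketch below does that for the distributive and De Morgan laws; the helper names NOT, LAWS and holds are made up for this illustration.

```python
from itertools import product


def NOT(x):
    """Complement of a single bit (0 or 1)."""
    return 1 - x


# Each law is a pair of functions of (A, B, C) that should agree on all inputs.
# + is written as | (OR) and * as & (AND).
LAWS = {
    "distributive, AND over OR": (lambda A, B, C: A & (B | C),
                                  lambda A, B, C: (A & B) | (A & C)),
    "distributive, OR over AND": (lambda A, B, C: A | (B & C),
                                  lambda A, B, C: (A | B) & (A | C)),
    "De Morgan, (A+B)'": (lambda A, B, C: NOT(A | B),
                          lambda A, B, C: NOT(A) & NOT(B)),
    "De Morgan, (A*B)'": (lambda A, B, C: NOT(A & B),
                          lambda A, B, C: NOT(A) | NOT(B)),
}


def holds(lhs, rhs):
    """True if both sides agree for every combination of A, B, C."""
    return all(lhs(*v) == rhs(*v) for v in product((0, 1), repeat=3))


for name, (lhs, rhs) in LAWS.items():
    print(name, holds(lhs, rhs))
```

Note how each De Morgan law is the dual of the other: swapping AND with OR turns one into the other.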
16 Optimization. Two different logic expressions can have exactly the same behavior. Two different expressions with identical behavior may have different implementation costs. Choosing the cheapest is optimization. We may have to satisfy other criteria too: propagation delay, no glitches, etc.
17 Optimization
    AB + AB' = A(B+B') = A*1 = A
    A'B'C + ABC = (A'B' + AB)C = ((A'B' + A)(A'B' + B))C = (B' + A)(A' + B)C
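A small sketch of "same behavior, different cost": it compares the truth tables of AB + AB' and A, and counts variable occurrences as a crude cost. The literal count is only an assumed stand-in for real cost (gates, area, delay), and the function names are hypothetical.

```python
from itertools import product
import re


def truth_table(expr, names):
    """Evaluate a Python boolean expression string on every input combination."""
    rows = []
    for values in product((False, True), repeat=len(names)):
        env = dict(zip(names, values))
        rows.append(bool(eval(expr, {"__builtins__": {}}, env)))
    return rows


def literal_count(expr, names):
    """Crude cost proxy (an assumption): number of variable occurrences."""
    return sum(len(re.findall(rf"\b{name}\b", expr)) for name in names)


names = ["A", "B"]
original = "(A and B) or (A and not B)"   # AB + AB'
optimized = "A"                           # after A(B + B') = A*1 = A

print(truth_table(original, names) == truth_table(optimized, names))
print(literal_count(original, names), literal_count(optimized, names))
```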
18 Half Adder
    S = A'B + AB'
    C = AB
    [Truth table: inputs A, B; outputs S, C]
19 Full Adder
    S = A'B'C + A'BC' + AB'C' + ABC
    Cout = ABC + A'BC + ABC' + AB'C
    Cout = AB + BC + CA (optimized)
    [Truth table: inputs A, B, C; outputs S, Cout]
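The full adder equations can be checked against ordinary arithmetic, and chained into a multi-bit adder. This is a sketch; full_adder and ripple_add are illustrative names, and the 4-bit default width is an arbitrary choice.

```python
from itertools import product


def full_adder(a, b, c):
    """One-bit full adder built directly from the sum-of-products equations."""
    na, nb, nc = 1 - a, 1 - b, 1 - c
    s = (na & nb & c) | (na & b & nc) | (a & nb & nc) | (a & b & c)
    cout = (a & b) | (b & c) | (c & a)          # optimized carry
    return s, cout


def ripple_add(x, y, width=4):
    """Add two unsigned numbers by chaining full adders from the LSB upward."""
    carry, result = 0, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


# Every row of the truth table must match a + b + c written in binary.
for a, b, c in product((0, 1), repeat=3):
    total = a + b + c
    assert full_adder(a, b, c) == (total & 1, total >> 1)

print(ripple_add(3, 4))   # 7 fits in 4 bits, so the carry out is 0
print(ripple_add(9, 8))   # 17 overflows 4 bits: low bits 1, carry out 1
```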
20 Verilog. A hardware description language. Can be used to design, optimize and simulate hardware. Started in the mid-1980s as a hardware simulation system; hardware synthesis was added later. Its main competitor is VHDL.
21 What can Verilog do? Describe a circuit for simulation purposes. Many of the Verilog constructs are synthesizable. Allows the designer to specify behavior and/or structure.
22 Structure of a Verilog Module. A module contains:
    initial constructs
    parallel blocks called always constructs
    continuous assignments to specify combinational circuits (gates w/o memory)
    instances of other modules
23 Elements of Verilog. Wire: a mathematical abstraction of a real wire. Can have 4 possible values!!
    True or 1
    False or 0
    X: unknown (not yet defined, unconnected, etc.)
    Z: high impedance. Electrically disconnected. A smart trick electronics engineers have invented.
24 Elements of Verilog. Registers (reg) are memory elements. The Verilog compiler may map them to actual memory elements (flip flops). They have the same set of possible values.
25 Elements of Verilog. Constants can be specified as plain constants like 3 or 15. Often we want to specify the bit width of a constant:
    4'b0011 is a 4 bit representation of 3
    5'b00011 is a 5 bit representation of 3
    -4'b0011 is a 4 bit representation of -3 (2's compl.)
    4'hF is a 4 bit representation of 15
26 Operators in Verilog
    +, -, *, / like C
    &, |, ~, ^ again like C
    ==, !=, <, >, <=, >= like C
    <<, >> like C
    cond ? expr1 : expr2 like C
27 Operators in Verilog. But Verilog adds to C:
    Unary &, |, ^ apply the operator on all bits of the operand
    {A,B} is the bits of A followed by the bits of B
    {x{const}} is {const,const... x times}
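The operators Verilog adds to C have no direct Python equivalent, but their meaning can be modeled on plain integers when the bit width is passed explicitly. This is a sketch of the semantics only; the function names are made up, and X and Z values are ignored.

```python
def reduce_and(value, width):
    """Model of unary &: 1 only if all `width` bits of value are 1."""
    mask = (1 << width) - 1
    return int((value & mask) == mask)


def reduce_or(value, width):
    """Model of unary |: 1 if any of the `width` bits is 1."""
    return int((value & ((1 << width) - 1)) != 0)


def reduce_xor(value, width):
    """Model of unary ^: parity of the `width` bits."""
    return bin(value & ((1 << width) - 1)).count("1") & 1


def concat(*parts):
    """Model of {A,B,...}: parts are (value, width) pairs, leftmost is the MSB."""
    result = 0
    for value, width in parts:
        result = (result << width) | (value & ((1 << width) - 1))
    return result


def replicate(times, value, width):
    """Model of {x{const}}: the same bits repeated `times` times."""
    return concat(*[(value, width)] * times)


print(reduce_and(0b1111, 4), reduce_or(0b0000, 4), reduce_xor(0b1011, 4))
print(bin(concat((0b10, 2), (0b011, 3))))   # {2'b10, 3'b011}
print(bin(replicate(3, 0b01, 2)))           # {3{2'b01}}
```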
28 Combinational Circuits. A network of gates forming a directed graph. There should be no cycles. The output is determined exclusively by the inputs. They implement logic functions.
29 Memory elements
30 Memory Elements. We can think of memory elements as combinational circuits with feedback. We would rather think of them as little black boxes. Sometimes memory is implemented using other technologies (capacitors for DRAM).
31 Combinational Circuits
    module half_adder(A, B, Sum, Carry);
      input A, B;
      output Sum, Carry;
      assign Sum = A ^ B;     // sum is the XOR of the inputs
      assign Carry = A & B;   // carry is the AND of the inputs
    endmodule
32 Combinational Circuits. Use the assign keyword. Continuous assignments represent permanent connections. The assign keyword can specify only combinational circuits. Combinational circuits can be specified with the always construct as well, and the always construct can also specify sequential circuits.
33 The always construct
    module half_adder(A, B, S, C);
      input A, B;
      output reg S, C;
      always @(A, B) begin          // re-evaluate whenever an input changes
        case ({A, B})
          2'b00: begin S = 0; C = 0; end
          2'b01: begin S = 1; C = 0; end
          2'b10: begin S = 1; C = 0; end
          2'b11: begin S = 0; C = 1; end
        endcase
      end
    endmodule
34 Combinational with always. The previous example used always to implement a half adder. It uses blocking assignments, pretty much the same as C. If properly defined, most compilers will not use flip flops to implement it. Properly defined means that all input signals are on the sensitivity list and every execution path assigns a value to the same bits.
35 Sequential Circuits. Any circuit that contains memory. If it contains memory then it has state. If it has state then the state changes, so it goes through a sequence of states. Hence the name sequential.
36 Sequential Circuits
37 Sequential Circuits. How come signals don't rush around the loop uncontrollably? This is where the clock comes in. It is the same clock you see on the specs of your CPU. With every clock pulse the signal goes around once. These are called synchronous sequential circuits. There are also asynchronous ones.
38 Typical Latch
39 Still... Unless the width of the clock pulse is wisely selected, the signal will travel around more than once. These latches are useful in some cases, but not good enough for our current task.
40 Falling edge triggered FF
41 Edge triggered D Flip Flop
    module DFF(clock, D, Q, Qb);
      input clock, D;
      output reg Q;
      output Qb;
      assign Qb = ~Q;            // Qb is always the complement of Q
      always @(posedge clock)    // rising edge assumed; use negedge for a falling edge triggered FF
        Q <= D;
    endmodule
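The difference between an edge triggered flip flop and a transparent latch is that the output can only change at a clock edge. A minimal Python model, assuming a rising-edge trigger (the class and method names are hypothetical):

```python
class DFlipFlop:
    """Behavioral model of a D flip flop that samples D on the rising clock edge."""

    def __init__(self):
        self.q = 0
        self._prev_clock = 0

    def step(self, clock, d):
        """Apply new clock and D levels; return the outputs (Q, Qb)."""
        if self._prev_clock == 0 and clock == 1:   # rising edge detected
            self.q = d
        self._prev_clock = clock
        return self.q, 1 - self.q


ff = DFlipFlop()
# (clock, D) pairs: D changes while the clock is high and is ignored until the next edge.
for clock, d in [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]:
    print(clock, d, ff.step(clock, d))
```

In the third step D drops to 0 while the clock is still high, yet Q stays 1 until the next rising edge. A level-sensitive latch would have followed D immediately.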
42 Timings. Timing is complex, so we use a simplified model. Setup time: the time the input to the FF has to be stable before the clock edge. Hold time: the time the input has to be stable after the clock edge.
43 Multibit Wires and Registers
    reg [31:0] rega;            // rega[0] is the LSB
    wire [31:0] ALUout;
    reg [31:0] regfile[0:31];   // regfile[0] is the first register in the register file
44 MIPS ALU
    module MIPSALU (ALUctl, A, B, ALUOut, Zero);
      input [3:0] ALUctl;
      input [31:0] A, B;
      output reg [31:0] ALUOut;
      output Zero;
      assign Zero = (ALUOut == 0);     // Zero is true if ALUOut is 0
      always @(ALUctl, A, B) begin     // reevaluate if these change
        case (ALUctl)
          0: ALUOut = A & B;
          1: ALUOut = A | B;
          2: ALUOut = A + B;
          6: ALUOut = A - B;
          7: ALUOut = A < B ? 1 : 0;
          12: ALUOut = ~(A | B);       // result is nor
          default: ALUOut = 0;
        endcase
      end
    endmodule
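A software model of the same control encoding can be handy for checking expected ALU results before simulating the hardware. This Python sketch assumes 32-bit unsigned operands, so the "less than" case compares unsigned values, as Verilog does for plain vectors; mips_alu is a hypothetical name.

```python
MASK = 0xFFFFFFFF   # keep every result within 32 bits


def mips_alu(alu_ctl, a, b):
    """Return (ALUOut, Zero) for the control codes used on the slide."""
    a &= MASK
    b &= MASK
    if alu_ctl == 0:
        out = a & b
    elif alu_ctl == 1:
        out = a | b
    elif alu_ctl == 2:
        out = (a + b) & MASK          # addition wraps around at 32 bits
    elif alu_ctl == 6:
        out = (a - b) & MASK          # subtraction wraps around as well
    elif alu_ctl == 7:
        out = int(a < b)              # unsigned comparison (assumption)
    elif alu_ctl == 12:
        out = ~(a | b) & MASK         # nor
    else:
        out = 0
    return out, int(out == 0)


print(mips_alu(2, 0xFFFFFFFF, 1))     # wraps to 0, so Zero is set
print(hex(mips_alu(6, 5, 7)[0]))      # 5 - 7 in 32-bit two's complement
```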
45 Register File
46 Register File: read
47 Register File: write
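48 Register File: Verilog
    module rfile(R1, R2, W, WD, Wctl, RD1, RD2, clock);
      input [4:0] R1, R2, W;       // 5 bits select one of 32 registers
      input [31:0] WD;
      input Wctl, clock;
      output [31:0] RD1, RD2;
      reg [31:0] RF[31:0];         // 32 registers of 32 bits each
      assign RD1 = RF[R1];         // reads are combinational
      assign RD2 = RF[R2];
      always @(posedge clock)      // rising edge assumed
        if (Wctl) RF[W] <= WD;     // write only when the write control is asserted
    endmodule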
49 Specifying Gates. Verilog allows the designer to specify individual gates. This can be bulky. Similar syntax can be used for user defined modules.
50 Half Adder
    module HA(A, B, S, C);
      input A, B;
      output S, C;
      wire Bn, An, ABn, AnB;
      not N1(An, A);
      not N2(Bn, B);
      and (ABn, A, Bn);
      and (AnB, An, B);
      or (S, ABn, AnB);
      and (C, A, B);
    endmodule
51 Speeding Up Addition. Carry propagation is what slows down addition. Sometimes the LSB of the input will affect the MSB of the output. We design for the worst case scenario. The simpler adders are called ripple adders.
52 Carry LookAhead. a0, a1, a2, etc. and b0, b1, b2, etc. are the inputs; c0, c1, c2 are the carries.
    c1 = b0 c0 + a0 c0 + a0 b0
    c1 = a0 b0 + c0 (a0 + b0)
    c1 = g0 + c0 p0
    g0 = a0 b0; p0 = a0 + b0
53 Carry LookAhead. Define
    g_i = a_i b_i
    p_i = a_i + b_i
    Then c_{i+1} = g_i + p_i c_i
54 Carry LookAhead
    c1 = g0 + p0 c0
    c2 = g1 + p1 g0 + p1 p0 c0
    c3 = g2 + p2 g1 + p2 p1 g0 + p2 p1 p0 c0
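The expanded carry equations can be checked against the ripple recurrence. In the sketch below, lookahead_carries computes every carry directly from the g and p signals and c0, without using any intermediate carry, and ripple_carries uses c_{i+1} = g_i + p_i c_i one bit at a time. Both function names and the 4-bit default width are choices made for this example.

```python
def lookahead_carries(a, b, c0, width=4):
    """Carries c0..c_width, each computed only from g, p and c0."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]   # generate:  a_i b_i
    p = [((a >> i) | (b >> i)) & 1 for i in range(width)]   # propagate: a_i + b_i
    carries = [c0]
    for i in range(width):
        # c_{i+1} = g_i + p_i g_{i-1} + ... + p_i ... p_1 g_0 + p_i ... p_0 c0
        term = g[i]
        prefix = p[i]
        for j in range(i - 1, -1, -1):
            term |= prefix & g[j]
            prefix &= p[j]
        term |= prefix & c0
        carries.append(term)
    return carries


def ripple_carries(a, b, c0, width=4):
    """Carries c0..c_width computed one after another, as in a ripple adder."""
    carries = [c0]
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        carries.append((ai & bi) | (carries[-1] & (ai | bi)))
    return carries


print(lookahead_carries(0b0111, 0b0001, 0))   # 7 + 1: the carry ripples through bits 0..2
```

In hardware the point is that all the product terms of one carry can be evaluated in parallel, so the delay does not grow with the bit position the way it does in the ripple version.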
55 Control Hazards. Whenever we have a branch/jump/jal/whatever, we find out which way we branch at the MEM stage. Meanwhile we have loaded the next three instructions, so we have to flush the pipeline. We waste three cycles.
56 What is the problem? Jumps/branches are very common, sometimes 25% of the instructions. If our processor is 4-way superscalar, wasting three cycles means we do not execute 12 instructions! Longer pipelines suffer even more.
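The numbers above can be turned into a rough cycles-per-instruction estimate. The sketch assumes the worst case described here: every branch pays the full three-cycle flush and there are no other stalls. The function names are hypothetical.

```python
def effective_cpi(base_cpi, branch_fraction, penalty_cycles):
    """Average cycles per instruction when every branch pays the full penalty."""
    return base_cpi + branch_fraction * penalty_cycles


def lost_issue_slots(issue_width, penalty_cycles):
    """Instructions that could have been issued during one flush."""
    return issue_width * penalty_cycles


# Single issue: ideal CPI of 1, 25% branches, 3 wasted cycles per branch.
print(effective_cpi(1.0, 0.25, 3))    # 1.75

# 4-way superscalar: ideal CPI of 0.25, same branch mix and penalty.
print(effective_cpi(0.25, 0.25, 3))   # 1.0
print(lost_issue_slots(4, 3))         # 12
```

Under these assumptions the single-issue machine loses 43% of its ideal speed, while the 4-way machine runs four times slower than its ideal, which is why wider and deeper machines care so much about branches.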
57 Solutions: Delayed branch. The next instruction is always executed. Candidates for that slot:
    ideally, an instruction from before the branch that is independent of the branch
    an instruction from the fall through path that has no effect if the branch is taken
    an instruction from the target that has no effect if the branch falls through
    a nop if all else is unavailable
    We save at most one cycle.
58 Solutions: Decide the branch at the ID stage. Requires extra hardware. Saves two cycles. With a branch delay slot it can be stall free.
59 Solutions: Always predict not taken. The easiest... just do what we did so far. Fails miserably for loops.
60 Solutions: Predict taken. We can do this at the ID stage. We waste only one cycle if the prediction is correct. Combined with a delayed branch the cost goes to zero (if correct). Works fine for many loops.
61 Solutions: Statically predict taken/not taken. Can be done with heuristics, or by giving the compiler an execution trace. Just have two variants of every branch instruction. Easy to implement. Works great for numerical programs, not so great for non-numerical ones.
62 Solutions: Dynamic prediction. The most advanced and most popular. Requires a lot of silicon area. Can be done by hashing the address to a small memory (Branch Prediction Buffer). The memory remembers 1 bit (taken/not taken). With 1 bit, loops have two mispredictions; this can be solved with two-bit prediction. There are many far more sophisticated techniques.
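The "two mispredictions per loop" claim is easy to reproduce. The sketch below runs a loop branch (taken 9 times, then not taken once) 100 times through a 1-bit predictor and through a 2-bit saturating counter. It models a single branch with its own predictor entry, so hashing and aliasing in the buffer are ignored; the function names are made up.

```python
def one_bit_mispredictions(outcomes, state=False):
    """1-bit predictor: predict whatever the branch did last time."""
    misses = 0
    for taken in outcomes:
        if state != taken:
            misses += 1
        state = taken
    return misses


def two_bit_mispredictions(outcomes, counter=0):
    """2-bit saturating counter (0..3): predict taken when the counter is 2 or 3."""
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses


# A loop branch: taken 9 times, then falls through; the whole loop runs 100 times.
outcomes = ([True] * 9 + [False]) * 100

print(one_bit_mispredictions(outcomes))   # 200: loop exit plus re-entry, every time
print(two_bit_mispredictions(outcomes))   # 102: 3 while warming up, then 1 per loop
```

The 2-bit counter needs two wrong guesses in a row to change its prediction, so the single not-taken outcome at loop exit no longer spoils the first iteration of the next run.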
63 Solutions: Speculation. The technique nowadays. Good when the control hazard is compounded by a data hazard. Should allow out of order execution. Should provide a way to undo a change after a failed speculation.
64 Exceptions/Interrupts. There is a difference. Exceptions are caused by an internal condition (error, system call). Interrupts are caused by external conditions (I/O complete, mouse clicks). In many cases all are called interrupts. They are handled in more or less the same way.
65 Why bother? A computer that does not communicate with its environment is called a brick. The extra hardware to detect and handle interrupts is large and contributes to the slowing down of the clock. That's part of the reason why some coprocessors run so much faster.
66 How are they handled? The CPU provides relevant info in two registers: EPC (Exception Program Counter), 32 bits, and the Cause Register, 32 bits but many unused. Alternatively, use vectored interrupts: for each possible cause there is an entry in the vector.
67 In more detail. Another form of control hazard. Instead of branching to a user space address, we branch to a kernel space address. Branches happen only at a particular stage in the pipeline, but exceptions can happen almost anywhere. More than one exception can happen at the same time in different instructions. We may need to restart the instruction after the exception is handled. Some instructions are handled on the spot, others where they happened.
68 Instruction Level Parallelism. What drove the speed of CPUs. Pipelining is the oldest technique. Race to reduce hazards. The programmer is unaware of the parallelism. The key is multiple issue. We encounter hazards on hormones.
69 Two kinds. Static multiple issue: VLIW. Fixed form issue packet. Was the technique used on Itanium. There are usually restrictions on what instructions can be packaged together. In some designs the compiler has to guarantee no data/structural hazards within the issue packet.
70 Extra cost. If we allow the issue of an ALU and a memory instruction at the same time we need:
    twice as many ports on the register file
    an extra adder to calculate the effective address
    the ability to detect/forward many more hazards between different issue packets
    Stalls create twice as much delay.
71 Advantage. With two issue we have possibly twice as fast a processor (if the world was made by angels). We do not need much more hardware. With a good compiler the C programmer will never know.
72 Disadvantage. We have to recompile for new architectures. We save a bit on hardware but it is hard to make use of advances immediately. Software vendors hated it. Itanium is dead.
73 Example: VLIW for MIPS. A simplified static multiple issue MIPS-like processor. Can issue one ALU/branch and one load/store instruction per cycle. Ignores dependencies within the issue packet. Stalls/forwards for dependencies between issue packets.
74 Example: VLIW for MIPS
    ALU/branch   IF  ID  EX  M   WB
    Load/Store   IF  ID  EX  M   WB
    ALU/branch       IF  ID  EX  M   WB
    Load/Store       IF  ID  EX  M   WB
    ALU/branch           IF  ID  EX  M   WB
    Load/Store           IF  ID  EX  M   WB
    ALU/branch               IF  ID  EX  M   WB
    Load/Store               IF  ID  EX  M   WB
75 Example: scheduling code
    Loop: lw   $t0, 0($s1)
          addu $t0, $t0, $s2
          sw   $t0, 0($s1)
          addi $s1, $s1, -4
          bne  $s1, $0, Loop
76 Scheduled
          ALU/branch slot        Load/store slot
    Loop: nop                    lw  $t0, 0($s1)
          addi $s1, $s1, -4      nop
          addu $t0, $t0, $s2     nop
          bne  $s1, $0, Loop     sw  $t0, 4($s1)
77 The Verdict. We can do it in 4 issue packets instead of five instructions. Before, we had one or two stalls, so it would take 6-7 cycles to execute, plus the stalls due to the branch. If we optimize the single issue version we can get it down to 5 cycles plus branch stalls. Now we can execute it in 4 cycles plus branch stalls.
78 Observations. We now have many more stalls/nops than single issue. The new stalls/nops eat up most of the improvement. It is not worth the extra hardware/power consumption. Is it the end of the road?
79 Loop unrolling. A compiler optimization. Can be done easily when loop iterations are independent, and sometimes even when they are not. Reduces the loop overhead: fewer instructions executed. Allows more freedom in scheduling: fewer stalls/nops.
80 The code
          ALU/branch slot        Load/store slot
    Loop: addi $s1, $s1, -16     lw  $t0, 0($s1)
          nop                    lw  $t1, 12($s1)
          addu $t0, $t0, $s2     lw  $t2, 8($s1)
          addu $t1, $t1, $s2     lw  $t3, 4($s1)
          addu $t2, $t2, $s2     sw  $t0, 16($s1)
          addu $t3, $t3, $s2     sw  $t1, 12($s1)
          nop                    sw  $t2, 8($s1)
          bne  $s1, $0, Loop     sw  $t3, 4($s1)
81 The tricks we used. Unroll the loop, eliminate the branches, simplify the loop variable updating. Use more temp registers: this is called register renaming, and we need to do it if we have an anti-dependence or name dependence. We may run out of registers or need more saving/restoring. The code is longer. It may not be optimal on all architectures.
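The same transformation can be shown at source level. The sketch below adds a scalar to every array element, first one element per iteration and then unrolled four times with separate temporaries (the software counterpart of $t0..$t3). The overhead counter is an illustrative tally of one index update plus one branch test per trip around the loop, not a cycle count, and the function names are made up.

```python
def add_scalar_rolled(data, s):
    """One element per iteration; returns (result, loop overhead operations)."""
    out = list(data)
    overhead = 0
    i = len(out)
    while i != 0:
        i -= 1
        out[i] += s
        overhead += 2          # index update + branch test
    return out, overhead


def add_scalar_unrolled4(data, s):
    """Four elements per iteration, each with its own temporary."""
    if len(data) % 4 != 0:
        raise ValueError("this sketch assumes a length that is a multiple of 4")
    out = list(data)
    overhead = 0
    i = len(out)
    while i != 0:
        i -= 4
        t0, t1, t2, t3 = out[i], out[i + 1], out[i + 2], out[i + 3]
        out[i], out[i + 1], out[i + 2], out[i + 3] = t0 + s, t1 + s, t2 + s, t3 + s
        overhead += 2          # still one index update + one branch test
    return out, overhead


data = list(range(8))
print(add_scalar_rolled(data, 5))
print(add_scalar_unrolled4(data, 5))
```

Both versions produce the same array, but the unrolled one pays the loop overhead a quarter as often, and its four independent additions give a scheduler more freedom.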
82 Dynamic Multiple Issue, a.k.a. superscalar. The processor decides whether it is going to issue 0, 1, 2... instructions. Instructions are allowed to execute out of order, but not necessarily complete out of order. The processor decides how many instructions to issue, so the compiler does not need to know.
83 Dynamic Pipeline scheduling
    lw   $t0, 20($s2)
    addu $t1, $t0, $t2
    sub  $s4, $s4, $t3
    slti $t5, $s4, 20
    The sub instruction can execute before addu.
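Why sub may run ahead of addu comes down to which instruction reads a register that an earlier one writes. A sketch of that read-after-write check on the four instructions above; the tuple encoding and the function name are assumptions made for the example, and memory dependences are ignored.

```python
# Each instruction: (mnemonic, destination register, source registers).
PROGRAM = [
    ("lw",   "$t0", ["$s2"]),
    ("addu", "$t1", ["$t0", "$t2"]),
    ("sub",  "$s4", ["$s4", "$t3"]),
    ("slti", "$t5", ["$s4"]),
]


def raw_dependences(program):
    """Return (producer, consumer) index pairs for read-after-write dependences."""
    deps = []
    for j, (_, _, sources) in enumerate(program):
        for src in sources:
            # Look backwards for the nearest earlier instruction that writes src.
            for i in range(j - 1, -1, -1):
                if program[i][1] == src:
                    deps.append((i, j))
                    break
    return deps


deps = raw_dependences(PROGRAM)
print(deps)   # addu waits for lw; slti waits for sub

# sub (index 2) consumes nothing produced inside this window, so it does not
# have to wait while addu is stalled behind the load.
print([consumer for _, consumer in deps].count(2) == 0)
```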
84 Dynamic Pipeline. [Figure: IF/ID unit, several reservation stations each with its exec unit, and a commit unit.]
85 The bad news. Dynamic multiple issue CPUs have been available for decades. Some can issue more than 4 instructions per cycle. They rarely complete more than 2 per cycle on average. They have to be conservative to maintain correctness (pointer aliasing).
86 Power Efficiency. Power has emerged as the limiting factor. The cost of energy goes up. Huge server farms are common. The ability to eliminate heat is limited. Battery life is very important. Environmental concerns.
87 Fallacies and Pitfalls. Fallacy: pipelining is easy. Real pipelining is quite complex. Fallacy: pipelining is independent of technology. The huge number of transistors offers options that annul previous technologies (huge pipelines vs delayed branches). Pitfall: some optimizations in the ISA spoil the speed of the pipeline.
CS 31: Intro to Systems Digital Logic Kevin Webb Swarthmore College February 2, 2016 Reading Quiz Today Hardware basics Machine memory models Digital signals Logic gates Circuits: Borrow some paper if
More informationHardware-based Speculation
Hardware-based Speculation Hardware-based Speculation To exploit instruction-level parallelism, maintaining control dependences becomes an increasing burden. For a processor executing multiple instructions
More information5 th Edition. The Processor We will examine two MIPS implementations A simplified version A more realistic pipelined version
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface Chapter 4 5 th Edition Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined
More informationPipeline Hazards. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
Pipeline Hazards Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Hazards What are hazards? Situations that prevent starting the next instruction
More informationInstruction Level Parallelism. Appendix C and Chapter 3, HP5e
Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation
More informationSynthesis of Language Constructs. 5/10/04 & 5/13/04 Hardware Description Languages and Synthesis
Synthesis of Language Constructs 1 Nets Nets declared to be input or output ports are retained Internal nets may be eliminated due to logic optimization User may force a net to exist trireg, tri0, tri1
More informationMultiple Issue ILP Processors. Summary of discussions
Summary of discussions Multiple Issue ILP Processors ILP processors - VLIW/EPIC, Superscalar Superscalar has hardware logic for extracting parallelism - Solutions for stalls etc. must be provided in hardware
More information(ii) Simplify and implement the following SOP function using NOR gates:
DHANALAKSHMI COLLEGE OF ENGINEERING DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING EE6301 DIGITAL LOGIC CIRCUITS UNIT I NUMBER SYSTEMS AND DIGITAL LOGIC FAMILIES PART A 1. How can an OR gate be
More informationCS146 Computer Architecture. Fall Midterm Exam
CS146 Computer Architecture Fall 2002 Midterm Exam This exam is worth a total of 100 points. Note the point breakdown below and budget your time wisely. To maximize partial credit, show your work and state
More informationThe Pipelined RiSC-16
The Pipelined RiSC-16 ENEE 446: Digital Computer Design, Fall 2000 Prof. Bruce Jacob This paper describes a pipelined implementation of the 16-bit Ridiculously Simple Computer (RiSC-16), a teaching ISA
More informationAdapted from instructor s. Organization and Design, 4th Edition, Patterson & Hennessy, 2008, MK]
Review and Advanced d Concepts Adapted from instructor s supplementary material from Computer Organization and Design, 4th Edition, Patterson & Hennessy, 2008, MK] Pipelining Review PC IF/ID ID/EX EX/M
More informationOrange Coast College. Business Division. Computer Science Department. CS 116- Computer Architecture. Pipelining
Orange Coast College Business Division Computer Science Department CS 116- Computer Architecture Pipelining Recall Pipelining is parallelizing execution Key to speedups in processors Split instruction
More informationReference Sheet for C112 Hardware
Reference Sheet for C112 Hardware 1 Boolean Algebra, Gates and Circuits Autumn 2016 Basic Operators Precedence : (strongest),, + (weakest). AND A B R 0 0 0 0 1 0 1 0 0 1 1 1 OR + A B R 0 0 0 0 1 1 1 0
More informationDynamic Control Hazard Avoidance
Dynamic Control Hazard Avoidance Consider Effects of Increasing the ILP Control dependencies rapidly become the limiting factor they tend to not get optimized by the compiler more instructions/sec ==>
More informationCS 31: Intro to Systems Digital Logic. Kevin Webb Swarthmore College February 3, 2015
CS 31: Intro to Systems Digital Logic Kevin Webb Swarthmore College February 3, 2015 Reading Quiz Today Hardware basics Machine memory models Digital signals Logic gates Circuits: Borrow some paper if
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware 4.1 Introduction We will examine two MIPS implementations
More information6.1 Combinational Circuits. George Boole ( ) Claude Shannon ( )
6. Combinational Circuits George Boole (85 864) Claude Shannon (96 2) Signals and Wires Digital signals Binary (or logical ) values: or, on or off, high or low voltage Wires. Propagate digital signals
More informationChapter 4. The Processor. Jiang Jiang
Chapter 4 The Processor Jiang Jiang jiangjiang@ic.sjtu.edu.cn [Adapted from Computer Organization and Design, 4 th Edition, Patterson & Hennessy, 2008, MK] Chapter 4 The Processor 2 Introduction CPU performance
More informationLECTURE 10. Pipelining: Advanced ILP
LECTURE 10 Pipelining: Advanced ILP EXCEPTIONS An exception, or interrupt, is an event other than regular transfers of control (branches, jumps, calls, returns) that changes the normal flow of instruction
More informationMemory Supplement for Section 3.6 of the textbook
The most basic -bit memory is the SR-latch with consists of two cross-coupled NOR gates. R Recall the NOR gate truth table: A S B (A + B) The S stands for Set to remember, and the R for Reset to remember.
More informationProf. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University. P & H Chapter 4.10, 1.7, 1.8, 5.10, 6
Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University P & H Chapter 4.10, 1.7, 1.8, 5.10, 6 Why do I need four computing cores on my phone?! Why do I need eight computing
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 32: Pipeline Parallelism 3
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 32: Pipeline Parallelism 3 Instructor: Dan Garcia inst.eecs.berkeley.edu/~cs61c! Compu@ng in the News At a laboratory in São Paulo,
More informationLecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1
Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Introduction Chapter 4.1 Chapter 4.2 Review: MIPS (RISC) Design Principles Simplicity favors regularity fixed size instructions small number
More informationInjntu.com Injntu.com Injntu.com R16
1. a) What are the three methods of obtaining the 2 s complement of a given binary (3M) number? b) What do you mean by K-map? Name it advantages and disadvantages. (3M) c) Distinguish between a half-adder
More informationChapter 5 Registers & Counters
University of Wisconsin - Madison ECE/Comp Sci 352 Digital Systems Fundamentals Kewal K. Saluja and Yu Hen Hu Spring 2002 Chapter 5 Registers & Counters Originals by: Charles R. Kime Modified for course
More informationCS470: Computer Architecture. AMD Quad Core
CS470: Computer Architecture Yashwant K. Malaiya, Professor malaiya@cs.colostate.edu AMD Quad Core 1 Architecture Layers Building blocks Gates, flip-flops Functional bocks: Combinational, Sequential Instruction
More informationComputer Science 324 Computer Architecture Mount Holyoke College Fall Topic Notes: MIPS Instruction Set Architecture
Computer Science 324 Computer Architecture Mount Holyoke College Fall 2009 Topic Notes: MIPS Instruction Set Architecture vonneumann Architecture Modern computers use the vonneumann architecture. Idea:
More informationECE 154B Spring Project 4. Dual-Issue Superscalar MIPS Processor. Project Checkoff: Friday, June 1 nd, Report Due: Monday, June 4 th, 2018
Project 4 Dual-Issue Superscalar MIPS Processor Project Checkoff: Friday, June 1 nd, 2018 Report Due: Monday, June 4 th, 2018 Overview: Some machines go beyond pipelining and execute more than one instruction
More informationChapter 3. Pipelining. EE511 In-Cheol Park, KAIST
Chapter 3. Pipelining EE511 In-Cheol Park, KAIST Terminology Pipeline stage Throughput Pipeline register Ideal speedup Assume The stages are perfectly balanced No overhead on pipeline registers Speedup
More informationVerilog. What is Verilog? VHDL vs. Verilog. Hardware description language: Two major languages. Many EDA tools support HDL-based design
Verilog What is Verilog? Hardware description language: Are used to describe digital system in text form Used for modeling, simulation, design Two major languages Verilog (IEEE 1364), latest version is
More informationENCM 369 Winter 2018 Lab 9 for the Week of March 19
page 1 of 9 ENCM 369 Winter 2018 Lab 9 for the Week of March 19 Steve Norman Department of Electrical & Computer Engineering University of Calgary March 2018 Lab instructions and other documents for ENCM
More informationMLR Institute of Technology
MLR Institute of Technology Laxma Reddy Avenue, Dundigal, Quthbullapur (M), Hyderabad 500 043 Course Name Course Code Class Branch ELECTRONICS AND COMMUNICATIONS ENGINEERING QUESTION BANK : DIGITAL DESIGN
More informationCS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS
CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS UNIT-I OVERVIEW & INSTRUCTIONS 1. What are the eight great ideas in computer architecture? The eight
More informationRecitation Session 6
Recitation Session 6 CSE341 Computer Organization University at Buffalo radhakri@buffalo.edu March 11, 2016 CSE341 Computer Organization Recitation Session 6 1/26 Recitation Session Outline 1 Overview
More informationSynthesis of Combinational and Sequential Circuits with Verilog
Synthesis of Combinational and Sequential Circuits with Verilog What is Verilog? Hardware description language: Are used to describe digital system in text form Used for modeling, simulation, design Two
More informationOutline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception
Outline A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception 1 4 Which stage is the branch decision made? Case 1: 0 M u x 1 Add
More informationCS311 Lecture: Pipelining, Superscalar, and VLIW Architectures revised 10/18/07
CS311 Lecture: Pipelining, Superscalar, and VLIW Architectures revised 10/18/07 Objectives ---------- 1. To introduce the basic concept of CPU speedup 2. To explain how data and branch hazards arise as
More information