Workload Characterization and Performance for a Network Processor

1 Workload Characterization and Performance for a Network Processor Mitsuhiro Miyazaki Princeton Architecture Laboratory for Multimedia and Security (PALMS) May

2 Objectives To evaluate an NP from the computer architect's point of view, rather than the network infrastructure point of view To understand the effect of hardware multithreading on NPs To guide the architectural design of future NPs

3 Outline: Router Processing Characterization; Workload Characterization; Intel's IXP1200 Architecture; Simulation Setup; IXP1200 Evaluation (Instruction Mix, Latency, Executing/Aborted/Stalled/Idle ratio, CPI, Throughput); Other NPs; Conclusion and Future Work

4 Router Processing Characterization: block diagram of the per-packet processing path from input port to output port: Input Scheduler (IS), receive FIFO (RFIFO), Classifier & Filter (CF/FL), Forwarder (FW) with Forwarding Information Base (FIB) lookup, Queuing Assignment (QA), Output Scheduler & Load Balancing (OS/LB), transmit FIFO (TFIFO), and output port, with packet buffers (RPB, TQB) and a packet-discard path along the way (figure).

5 Frequently occurring packets in the real Internet (based on data collected by the National Laboratory for Applied Network Research (NLANR) project at the San Diego Supercomputer Center):
Packet Size | Packet Type Description | Packet Distribution | Internet Traffic
1) 40 Bytes | TCP packets with an IP header but no payload (only the 20-byte IP header plus 20-byte TCP header), typically sent at the start of a new TCP session | 35% | 3.5%
2) 576 Bytes | Default IP Maximum Datagram Size (MDS) packets without fragmentation, including the default TCP Maximum Segment Size (MSS) 536-byte packets | 11.5% | 16.5%
3) 1500 Bytes | Packets at the Maximum Transmission Unit (MTU) size of an Ethernet connection | 10% | 37%

6 Workloads of fixed-size packets:
1) 64 Bytes: the minimum-size Ethernet packet, consisting of a 14-byte Ethernet header, 20-byte IP header, 26-byte payload, and 4-byte Ethernet trailer (FCS); expected to be typical of TCP handshake traffic.
2) 594 Bytes: an Ethernet packet with a 14-byte Ethernet header, 20-byte IP header, 556-byte payload (a 20-byte TCP header plus the 536-byte default MSS), and 4-byte Ethernet trailer (FCS).
3) 1518 Bytes: the maximum-size Ethernet packet, consisting of a 14-byte Ethernet header, 20-byte IP header, 1480-byte payload, and 4-byte Ethernet trailer (FCS).
Note: The workloads use Ethernet packets because the simulation assumes a router with 16 x 100 Mbps Ethernet ports.

7 Workload of mixture packets:
Packet Size (Bytes) | Proportion of Total Packets | Traffic Load
64 | 50% (6 parts) | 7.881%
594 | 41.67% (5 parts) | 60.96%
1518 | 8.33% (1 part) | 31.16%
Note: The average packet size is 406 bytes.
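
The figures above follow directly from the 6-to-5-to-1 mixture. A minimal Python sketch of that bookkeeping (packet sizes and part counts are taken from slides 6-7; the script itself is only illustrative):

```python
# Mixture workload from slides 6-7: 6 parts 64B, 5 parts 594B, 1 part 1518B.
parts = {64: 6, 594: 5, 1518: 1}

total_packets = sum(parts.values())                      # 12 parts
total_bytes = sum(size * n for size, n in parts.items())

for size, n in parts.items():
    pkt_share = 100.0 * n / total_packets                # share of packets
    load_share = 100.0 * size * n / total_bytes          # share of traffic load
    print(f"{size:5d}B: {pkt_share:5.2f}% of packets, {load_share:5.2f}% of load")
print(f"average packet size = {total_bytes / total_packets:.1f} bytes")
```

Running it reproduces the 406-byte average and the 7.881% / 60.96% load figures quoted on the slide.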

8 IXP1200 Architecture: block diagram of the Intel IXP1200: an Intel StrongARM SA-1 core (16 KB I-cache, 8 KB D-cache, 512-byte mini-Dcache, write buffer, read buffer), JTAG, a PCI unit (32-bit bus), UART, four timers, GPIO, and RTC; an SDRAM unit (64-bit bus) and an SRAM unit (32-bit bus); an FBI unit containing 4 KB of scratchpad memory, the hash unit, and the 64-bit IX bus interface; and six Microengines, interconnected by the 32-bit data bus and 32-bit ARM system bus (figure).

9 Microengine Pipelining. Note: context switching is made possible by per-thread state: 4 PCs, 128 GPRs, 64 SDRAM transfer registers, 64 SRAM transfer registers, and other CSRs.

10 Hardware Multi-Threading. Multithreading keeps the Microengine execution pipeline active without numerous stalled cycles: when Thread 0 stalls, Thread 1 runs; when Thread 1 stalls, Thread 2 runs; and so on through Thread 3. *Note: thread stalls are caused by memory accesses.
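
A rough way to see why this works: if a thread computes for C cycles between memory references and each reference takes L cycles, one thread keeps the pipeline busy only C/(C+L) of the time, while N threads can overlap their stalls. The sketch below is a generic analytical model, not the IXP1200 simulator; the cycle counts are assumed, purely illustrative parameters.

```python
def utilization(compute_cycles: int, mem_latency: int, threads: int) -> float:
    """Fraction of cycles the execution pipeline is busy, assuming each thread
    alternates compute_cycles of work with one memory reference of mem_latency
    cycles, and that a thread switch on every reference costs nothing."""
    period = compute_cycles + mem_latency          # one thread's work + wait
    return min(1.0, threads * compute_cycles / period)

# Illustrative numbers only (not measured IXP1200 latencies).
for n in (1, 2, 4):
    print(f"{n} thread(s): {utilization(compute_cycles=20, mem_latency=60, threads=n):.0%}")
# 1 thread: 25%, 2 threads: 50%, 4 threads: 100% (latency fully hidden)
```

With zero-cost switching the model saturates at full utilization; the aborted cycles discussed later in the evaluation are exactly the cost this simple model leaves out.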

11 Memory Access Flow

12 Branch and Context Switch Instructions:
Class 3: br_bclr and br_bset, br=byte and br!=byte, jump, rtn, br_!signal, br_inp_state
Class 2: br=0, br!=0, br>0, br>=0, br<0, br<=0, br=cout, br!=cout
Class 1: br, br=ctx and br!=ctx, ctx_arb, csr, r_fifo_rd, t_fifo_wr, scratch, sdram, sram, hash1_48, hash2_48, hash3_48, hash1_64, hash2_64, hash3_64
Note: the context switch instructions (shown in blue in the original table) are ctx_arb and the reference instructions (sdram, sram, csr, scratch, r_fifo_rd, t_fifo_wr, and the hash instructions).

13 Branch pipeline example with Class 3 Instruction

14 Branch pipeline example with Class 2 Instruction Case 1 Case 2

15 Branch/Context switch pipeline example with Class 1 Instruction

16 Solutions for branch penalties Deferred branch instruction Guess branch instruction Condition Code set earlier

17 Deferred branch Instruction

18 Guess Branch Instruction

19 Combination of Guess and Deferred Branch
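
Slides 17-19 can be summarized with a small expected-penalty model: deferred slots hide part of the branch bubble unconditionally, and a guess branch pays the remaining bubble only when the guess is wrong (as aborted instructions). The sketch below is a generic back-of-the-envelope model; the penalty and slot counts are assumed parameters, not figures from the IXP1200 documentation.

```python
def expected_branch_penalty(base_penalty: int, deferred_slots: int,
                            guess_correct_prob: float) -> float:
    """Average wasted cycles per branch: deferred slots are filled with useful
    work, a correct guess hides the rest, and a wrong guess aborts it."""
    remaining = max(0, base_penalty - deferred_slots)
    return (1.0 - guess_correct_prob) * remaining

# Assumed numbers, for illustration only.
print(expected_branch_penalty(3, 0, 0.0))   # 3.0 cycles: plain branch
print(expected_branch_penalty(3, 1, 0.0))   # 2.0 cycles: one deferred slot filled
print(expected_branch_penalty(3, 1, 0.8))   # 0.4 cycles: deferred + 80%-accurate guess
```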

20 Simulation Setup: Workbench (GUI interface to all Microengine tools), microcode assembler, microcode linker, Transactor (debug and simulation engine with the IXP1200 architectural model and memory), the Verilog model of an IX bus device (i.e., a MAC device), and the reference program (l2l3fwd16).

21 Simulation Image: an IXP1200 (six Microengines plus the FBI, SRAM, and SDRAM units) with attached SRAM and SDRAM, connected over the IX bus (shown as two 32-bit connections) to two IXF440 MAC devices of 8 ports each, providing 16 x 100 Mbps (full duplex) Ethernet ports (figure).

22 Thread Assignment & Simulation Conditions: receive threads are assigned to Microengines 0-3 and transmit threads to Microengines 4-5; one thread per Microengine in Microengines 4-5 works as an output scheduler. Operating frequencies: the Microengines run at 232 MHz, the IX bus transfers packets at 104 MHz, and the SRAM and SDRAM buses transfer data at 116 MHz. Each simulation forwards 3000 packets.
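
These port and clock settings also fix the offered load the receive threads must absorb; a quick sketch of that arithmetic (a frame on a 100 Mbps Ethernet wire carries an extra 8 bytes of preamble/SFD and 12 bytes of inter-frame gap, consistent with the overhead note on slide 33):

```python
PREAMBLE_SFD = 8   # bytes
IFG = 12           # bytes of inter-frame gap

def ethernet_pps(link_bps: float, frame_bytes: int) -> float:
    """Maximum frames per second on one Ethernet link at line rate."""
    wire_bytes = frame_bytes + PREAMBLE_SFD + IFG
    return link_bps / (wire_bytes * 8)

PORTS = 16
for frame in (64, 594, 1518):
    per_port = ethernet_pps(100e6, frame)
    print(f"{frame:5d}B frames: {per_port:9.0f} pps/port, "
          f"{PORTS * per_port / 1e6:5.2f} Mpps across {PORTS} ports")
# Worst case (64B): ~148,810 pps/port and ~2.38 Mpps aggregate, i.e. roughly
# 150k packets/s per receive thread if each of the 16 threads handles one port.
```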

23 Instruction Mix for Receive Processing (instruction ratio in percent, by packet type):
Packet Type | Arithmetic, Rotate, and Shift | Branch and Jump | Reference | Local Register | Miscellaneous
Mixture | 31.9% | 39.8% | 7.3% | 15.2% | 5.8%
1518B | 30.3% | 40.8% | 7.2% | 16.4% | 5.3%
594B | 32.5% | 37.8% | 10.0% | 14.2% | 5.6%
64B | 40.8% | 28.0% | 7.6% | 16.6% | 6.9%

24 Instruction Mix for Transmit Processing (instruction ratio in percent, by packet type):
Packet Type | Arithmetic, Rotate, and Shift | Branch and Jump | Reference | Local Register | Miscellaneous
Mixture | 50.7% | 31.0% | 8.5% | 8.7% | 1.1%
1518B | 50.9% | 31.1% | 8.5% | 8.6% | 0.9%
594B | 51.3% | 30.7% | 8.2% | 8.6% | 1.2%
64B | 48.2% | 30.7% | 10.6% | 8.2% | 2.4%

25 Instruction Mix for Overall Processing (instruction ratio in percent, by packet type):
Packet Type | Arithmetic, Rotate, and Shift | Branch and Jump | Reference | Local Register | Miscellaneous
Mixture | 39.2% | 36.4% | 6.9% | 12.7% | 4.9%
1518B | 38.4% | 37.0% | 6.5% | 13.4% | 4.7%
594B | 39.8% | 35.1% | 6.6% | 12.0% | 6.6%
64B | 43.4% | 29.0% | 8.6% | 11.7% | 7.4%

26 SDRAM Latency: cumulative percentage of SDRAM accesses versus latency in cycles, plotted per Microengine (Microengines 0-3) (figure).

27 SRAM Latency (unlocked): cumulative percentage of unlocked SRAM accesses versus latency in cycles, plotted per Microengine (Microengines 0-5) (figure).

28 Execution, Aborted, Stalled and Idle Ratio on 64-byte packets: breakdown of each Microengine's cycles (Microengines 0-5) into Executing, Aborted, Stalled, and Idle (figure).

29 Execution, Aborted, Stalled and Idle Ratio on 594-byte packets: breakdown of each Microengine's cycles (Microengines 0-5) into Executing, Aborted, Stalled, and Idle (figure).

30 Execution, Aborted, Stalled and Idle Ratio on 1518-byte packets: breakdown of each Microengine's cycles (Microengines 0-5) into Executing, Aborted, Stalled, and Idle (figure).

31 Execution, Aborted, Stalled and Idle Ratio on Mixture packets: breakdown of each Microengine's cycles (Microengines 0-5) into Executing, Aborted, Stalled, and Idle (figure).

32 Cycles per Instruction (CPI): CPI for each Microengine (0-5) under each workload (64B, 594B, 1518B, and Mixture packets) (figure).
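
Because each Microengine issues at most one instruction per executing cycle, the CPI values in this figure follow from the cycle breakdowns on slides 28-31: CPI is simply total cycles divided by executing cycles. A hedged sketch of that relationship (the breakdown below is a placeholder, not measured IXP1200 data):

```python
def cpi_from_breakdown(executing: float, aborted: float,
                       stalled: float, idle: float) -> float:
    """CPI = total cycles / instructions, assuming exactly one instruction
    completes per executing cycle (single-issue Microengine)."""
    total = executing + aborted + stalled + idle
    return total / executing

# Placeholder cycle fractions, for illustration only:
print(round(cpi_from_breakdown(executing=0.70, aborted=0.15,
                               stalled=0.10, idle=0.05), 2))   # -> 1.43
```

Every aborted, stalled, or idle cycle pushes CPI above the ideal 1.0, which is why the branch and context-switch aborts matter in the conclusion.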

33 Throughput (bounded): simulated forwarding rate (Sim Rate, Mpps) for 64-byte, 594-byte, 1518-byte, and Mixture packets, compared with the Ideal Sim Rate and the OC-24 (CRC-16) line rate (figure). Note: OC-24 is higher than the Sim Rate because of the difference in protocol overhead. Ethernet protocol overhead is 38 bytes per packet (82.6% overhead for a 46-byte IP packet): protocol header and trailer (18 bytes) + IFG (12 bytes) + preamble/SFD (8 bytes) = 38 bytes. OC-24 POS overhead is 7 bytes per packet (15.2% overhead for a 46-byte IP packet).
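
The note's overhead comparison can be made concrete: for the same IP packet, Ethernet adds 38 bytes on the wire while POS adds about 7, so an OC-24 POS link carries more small packets per second than 16 x 100 Mbps of Ethernet despite the similar aggregate bit rate. A hedged sketch; the OC-24 payload rate below (8 x 149.76 Mb/s) is an assumption about the SONET payload bandwidth behind the reference line, not a figure stated on the slide.

```python
def pps(payload_bps: float, ip_bytes: int, per_packet_overhead: int) -> float:
    """Packets per second at line rate, given per-packet framing overhead."""
    return payload_bps / ((ip_bytes + per_packet_overhead) * 8)

ETH_OVERHEAD = 38              # 18B header/trailer + 12B IFG + 8B preamble/SFD
POS_OVERHEAD = 7               # PPP/HDLC framing with CRC-16 (approximate)
OC24_PAYLOAD = 8 * 149.76e6    # assumed SONET payload rate for OC-24

for ip_bytes in (46, 576, 1500):
    eth = 16 * pps(100e6, ip_bytes, ETH_OVERHEAD)   # 16 x 100 Mbps Ethernet
    oc24 = pps(OC24_PAYLOAD, ip_bytes, POS_OVERHEAD)
    print(f"{ip_bytes:5d}B IP: 16x100M Ethernet = {eth/1e6:.2f} Mpps, "
          f"OC-24 POS = {oc24/1e6:.2f} Mpps")
# For 46B IP packets the overhead is 38/46 = 82.6% (Ethernet) vs 7/46 = 15.2% (POS),
# which is why the OC-24 reference sits above the Ethernet-bounded Sim Rate.
```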

34 Throughput (unbounded): simulated forwarding rate (Mpps) for 64-byte, 594-byte, 1518-byte, and Mixture packets, compared with 1.244 Gb/s Ethernet (OC-24 class) and 2.488 Gb/s Ethernet (OC-48 class) rates (figure). Note: these throughputs do not include the 12-byte IFG overhead.

35 Features of Other NPs.
Lexra's NetVortex: 32-bit MIPS-I instruction set plus 18 extended instructions for context control and bit-field operations; supports up to 8 contexts per processor; each context includes 32 GPRs, its own PC, and a status register; uses the delay slot of a memory reference for context switching (e.g., LW.CSW reg, addr); behaves in a similar way to the IXP1200.
Motorola's C-5: a subset of the MIPS-I instruction set (excluding multiply, divide, floating point, and Coprocessor Zero (CP0)); provides its own special-purpose CP0 instructions for context switching (e.g., MTC0 $1, $3); 16 Channel Processor RISC Cores (CPRCs), each supporting up to 4 contexts and 32 GPRs.
IBM's PowerNP: 16 picoprocessors executing operation codes, each supporting 2 contexts; 4 threads perform context switching within a cluster; 4 opcode categories: 1) ALU opcodes, 2) control opcodes, 3) data movement opcodes, 4) coprocessor execution opcodes (which support context switching); a context switch occurs when the picoprocessor is waiting for a shared resource (e.g., waiting for one of the coprocessors to complete an operation, a memory access, etc.).

36 Conclusion and Future Work: Hardware multithreading can hide large latencies effectively, but another issue emerges: the cycles aborted by branches and context switches are not negligible. Some form of dynamic hardware prediction or speculation may be necessary to reduce these penalties in future NPs, though the cost must be considered. The IXP1200 achieves OC-24-class router processing, but is not sufficient for OC-48-class router processing.

37 Backup Slide

38 Instruction Categories:
Arithmetic, Rotate, and Shift Instructions:
alu: perform an ALU operation
alu_shf: perform an ALU and shift operation
dbl_shf: concatenate two longwords, shift the result, and save a longword
Branch and Jump Instructions:
br, br=0, br!=0, br>0, br>=0, br<0, br<=0, br=cout, br!=cout: branch on condition code
br_bset, br_bclr: branch on bit set or bit clear
br=byte, br!=byte: branch on byte equal
br=ctx, br!=ctx: branch on current context
br_inp_state: branch on event state (e.g., sram done)
br_!signal: branch if signal deasserted
jump: jump to label
rtn: return from a branch or a jump
Reference Instructions:
csr: CSR reference
fast_wr: write immediate data to thd_done CSRs
local_csr_rd, local_csr_wr: read and write CSRs
r_fifo_rd: read the receive FIFO
pci_dma: issue a request to the PCI unit
scratch: scratchpad reference
sdram: SDRAM reference
sram: SRAM reference
t_fifo_wr: write to the transmit FIFO
Local Register Instructions:
find_bset, find_bset_with_mask: determine the position number of the first bit set in an arbitrary 16-bit field of a register
immed: load immediate word and sign extend or zero fill with shift
immed_b0, immed_b1, immed_b2, immed_b3: load immediate byte to a field
immed_w0, immed_w1: load immediate word to a field
ld_field, ld_field_w_clr: load byte(s) into specified field(s)
load_addr: load instruction address
load_bset_result1, load_bset_result2: load the result of a find_bset or find_bset_with_mask instruction
Miscellaneous Instructions:
ctx_arb: perform context swap and wake on event
nop: perform no operation
hash1_48, hash2_48, hash3_48: perform 48-bit hash
hash1_64, hash2_64, hash3_64: perform 64-bit hash

39 SRAM Latency (locked): cumulative percentage of locked SRAM accesses versus latency in cycles, plotted per Microengine (Microengines 0-5) (figure).

40 FBI Architecture: block diagram of the FBI unit: the AMBA (core) and Microengine command buses feed 8-entry pull, hash, and push command queues; a pull engine and a push engine (with arbiters) move data between the Microengine write/read transfer registers, SRAM, SDRAM, and the unit's resources: the TFIFO and RFIFO (16 elements of 10 quadwords each), CSRs (including fast_wr), a 1K x 32 scratchpad, and the hash unit; the IX bus interface contains the Ready Bus sequencer, the transmit and receive state machines, and the IX bus arbiter, connecting to the 64-bit IX bus and the Ready Bus (figure).

41 Ready Bus and Ready Flags

42 Theoretical IP Throughput: table of line-rate packets per second for each medium at 64-byte (46-byte IP), 594-byte (576-byte IP), 1518-byte (1500-byte IP), and Mixture (avg 406-byte / 388-byte IP) packet sizes, covering 100 Mbps / Gigabit / 10 Gigabit Ethernet, OC-3 through OC-192 POS with CRC-16 and CRC-32, and ATM OC-3 through OC-192. For example, 100 Mbps Ethernet supports 148,810 / 20,358 / 8,127 / 29,343 pps for the four packet sizes respectively, and 10 Gigabit Ethernet supports 14,880,952 pps at 64 bytes.

43 NetVortex Extended Instruction Set:
Context-Control Instructions:
MYCX: read my context
POSTCX: post event to a context
CSW: context switch
LW.CSW: load word with context switch
LT.CSW: load twinword* with context switch
WD: write descriptor to device
WD.CSW: write descriptor to device with context switch
WDLW.CSW: write descriptor to device, load word with context switch
WDLT.CSW: write descriptor to device, load twinword with context switch
Bit-Field Instructions:
SETI: set subfield to ones
CLRI: clear subfield to zeroes
EXTIV: extract subfield and prepare for insertion
INSV: insert extracted subfield
ACS2: dual 16-bit ones-complement add for checksum
Cross-Context Access Instructions:
MFCXG: move from a context general-purpose register
MTCXG: move to a context general-purpose register
MFCXC: move from a context-control register
MTCXC: move to a context-control register

44 NetVortex Context Switch Mechanism.
Thread 1 program: I1(T1); I2(T1): LW.CSW (reg, addr); I3(T1): delay slot instruction; I4(T1): next instruction; then a context switch to Thread 2.
Thread 2 program: the same sequence (an LW.CSW followed by its delay slot and next instruction), ending with a context switch to the next available thread.
General-purpose register file: Thread Context 1 (r0 - r31) and Thread Context 2 (r0 - r31).
Context registers after the switch: Thread1 CXPC = I4(T1), Thread1 CXSTATUS = Wait; Thread2 CXPC = PC, Thread2 CXSTATUS = Active; PC = I1(T2).

45 PowerNP Context Switch Example (pseudocode):
IF Reduction_OR(mask16(i) = coprocessor.Busy(i)) THEN
  PC <= stall
ELSE
  PC <= PC + 1
END IF;
IF p = 1 THEN
  PriorityOwner(other thread) <= TRUE
ELSE
  PriorityOwner(other thread) <= PriorityOwner(other thread)
END IF;
