Computer Organization

Similar documents
Computer Organization

Message Transport With The User Datagram Protocol

MODULE VII. Emerging Technologies

Chapter 9 Memory Management

You Can Do That. Unit 16. Motivation. Computer Organization. Computer Organization Design of a Simple Processor. Now that you have some understanding

MODULE V. Internetworking: Concepts, Addressing, Architecture, Protocols, Datagram Processing, Transport-Layer Protocols, And End-To-End Services

Topics. Computer Networks and Internets -- Module 5 2 Spring, Copyright All rights reserved.

Intensive Hypercube Communication: Prearranged Communication in Link-Bound Machines 1 2

ECE 550D Fundamentals of Computer Systems and Engineering. Fall 2017

Coupling the User Interfaces of a Multiuser Program

Loop Scheduling and Partitions for Hiding Memory Latencies

Protocol Configuration

Overview. Operating Systems I. Simple Memory Management. Simple Memory Management. Multiprocessing w/fixed Partitions.

Questions? Post on piazza, or Radhika (radhika at eecs.berkeley) or Sameer (sa at berkeley)!

Performance Evaluation of a High Precision Software-based Timestamping Solution for Network Monitoring

PART 5. Process Coordination And Synchronization

Adaptive Load Balancing based on IP Fast Reroute to Avoid Congestion Hot-spots

PART 2. Organization Of An Operating System

Lecture 1 September 4, 2013

Recitation Caches and Blocking. 4 March 2019

Baring it all to Software: The Raw Machine

Cloud Search Service Product Introduction. Issue 01 Date HUAWEI TECHNOLOGIES CO., LTD.

MODULE II. Network Programming And Applications

MORA: a Movement-Based Routing Algorithm for Vehicle Ad Hoc Networks

Almost Disjunct Codes in Large Scale Multihop Wireless Network Media Access Control

SE-330 SERIES (NEW REVISION) IEC INTERFACE

Dynamic Capacity Allocation in OTN Networks

CS 3510 Comp&Net Arch

Impact of FTP Application file size and TCP Variants on MANET Protocols Performance

Study of Network Optimization Method Based on ACL

CordEx. >> Operating instructions. English

Offloading Cellular Traffic through Opportunistic Communications: Analysis and Optimization

Demystifying Automata Processing: GPUs, FPGAs or Micron s AP?

Supporting Fully Adaptive Routing in InfiniBand Networks

Questions? Post on piazza, or Radhika (radhika at eecs.berkeley) or Sameer (sa at berkeley)!

Professor Lee, Yong Surk. References 고성능마이크로프로세서구조의개요. Topics Microprocessor & microcontroller

More Raster Line Issues. Bresenham Circles. Once More: 8-Pt Symmetry. Only 1 Octant Needed. Spring 2013 CS5600

Politehnica University of Timisoara Mobile Computing, Sensors Network and Embedded Systems Laboratory. Testing Techniques

Network Address Translation (NAT)

The Unix File System. Ken Wong Washington University. Bsd Unix Kernel I/O Structure. Monolithic OS Device Drivers (1) File Systems (CSE 422S)

Yet Another Parallel Hypothesis Search for Inverse Entailment Hiroyuki Nishiyama and Hayato Ohwada Faculty of Sci. and Tech. Tokyo University of Scien

ECEC 355: Pipelining

Problem 1. Multiple choice questions. Circle the answer that is the most correct.

NAND flash memory is widely used as a storage

Coordinating Distributed Algorithms for Feature Extraction Offloading in Multi-Camera Visual Sensor Networks

3 INSTRUCTION FLOW. Overview. Figure 3-0. Table 3-0. Listing 3-0.

Threshold Based Data Aggregation Algorithm To Detect Rainfall Induced Landslides

Laboratory I.7 Linking Up with the Chain Rule

A Duality Based Approach for Realtime TV-L 1 Optical Flow

Chapter 11 in Stallings 10 th Edition

This formula shows that partitioning the network decreases the total traffic if 1 N R (1 + p) < N R p < 1, i.e., if not all the packets have to go

An Infrastructureless End-to-End High Performance Mobility Protocol

Experion PKS R500 Migration Planning Guide

An In Depth Look at VOLK

How to Link your Accounts at Morgan Stanley

Just-In-Time Software Pipelining

Parallel Directionally Split Solver Based on Reformulation of Pipelined Thomas Algorithm

6.823 Computer System Architecture. Problem Set #3 Spring 2002

Application Protocol Examples

Robust PIM-SM Multicasting using Anycast RP in Wireless Ad Hoc Networks

DATA PARALLEL FPGA WORKLOADS: SOFTWARE VERSUS HARDWARE. Peter Yiannacouras, J. Gregory Steffan, and Jonathan Rose

EFFICIENT ON-LINE TESTING METHOD FOR A FLOATING-POINT ADDER

Optimal Distributed P2P Streaming under Node Degree Bounds

Algorithm for Intermodal Optimal Multidestination Tour with Dynamic Travel Times

Design of Controller for Crawling to Sitting Behavior of Infants

CS 421: COMPUTER NETWORKS SPRING FINAL May 16, minutes

CS250 VLSI Systems Design Lecture 9: Patterns for Processing Units and Communication Links

Compiler Optimisation

Waleed K. Al-Assadi. Anura P. Jayasumana. Yashwant K. Malaiya y. February Colorado State University

ACE: And/Or-parallel Copying-based Execution of Logic Programs

CS 2506 Computer Organization II Test 2

Cluster Center Initialization Method for K-means Algorithm Over Data Sets with Two Clusters

6 Using Bookmarks Using Saved Pages Advanced Features

A Hierarchical P2PSIP Architecture to support Skype-like services

Overview : Computer Networking. IEEE MAC Protocol: CSMA/CA Internet mobility TCP over noisy links

ET4254 Communications and Networking 1

Finite Automata Implementations Considering CPU Cache J. Holub

Random Clustering for Multiple Sampling Units to Speed Up Run-time Sample Generation

Real-time concepts for Software/Hardware Engineering

Non-Uniform Sensor Deployment in Mobile Wireless Sensor Networks

Voice Mail Information

CSE 392/CS 378: High-performance Computing - Principles and Practice

CS 152 Computer Architecture and Engineering

Topics. TCP sliding window protocol TCP PUSH flag TCP slow start Bulk data throughput

Hardware-based Speculation

Compact Routing on Internet-Like Graphs

Ti Parallel Computing PIPELINING. Michał Roziecki, Tomáš Cipr

CMSC 430 Introduction to Compilers. Spring Register Allocation

A closer look at network structure:

Optimizing the quality of scalable video streams on P2P Networks

LAN Overview (part 2) Interconnecting LANs - Hubs

AnyTraffic Labeled Routing

FANUC CNC Parts CANNED CYCLES (G73, G74, G76, G80-G89, G98, G99) (M series)

The Tofu Interconnect 2

Optimal Routing and Scheduling for Deterministic Delay Tolerant Networks

CHAPTER 5 PROPAGATION DELAY

Lecture 6 Datalink Framing, Switching. From Signals to Packets

EDOVE: Energy and Depth Variance-Based Opportunistic Void Avoidance Scheme for Underwater Acoustic Sensor Networks

Comparison of Wireless Network Simulators with Multihop Wireless Network Testbed in Corridor Environment

Estimation of large-amplitude motion and disparity fields: Application to intermediate view reconstruction

Transcription:

Computer Organization Douglas Comer Computer Science Department Purue University 250 N. University Street West Lafayette, IN 47907-2066 http://www.cs.purue.eu/people/comer Copyright 2006. All rights reserve. This ocument may not be reprouce by any means without written consent of the author.

XVIII Pipelining CS250 -- Chapt. 18 1 2006

Concept Of Pipelining One of the two major harware optimization techniques Information flows through a series of stations (processing components) Each station can Inspect Interpret Moify CS250 -- Chapt. 18 2 2006

Illustration Of Pipelining information arrives stage 1 stage 2 stage 3 stage 4 information leaves CS250 -- Chapt. 18 3 2006

Characteristics Of Pipelines Harware or software implementation Large or small scale Synchronous or asynchronous flow Buffere or unbuffere flow Finite chunks or continuous bit streams Automatic ata fee or manual ata fee Serial or parallel path Homogeneous or heterogeneous stages CS250 -- Chapt. 18 4 2006

Implementation Pipeline can be implemente in harware or software Software pipeline Programmer convenience More efficient than intermeiate files Harware pipeline Much higher performance CS250 -- Chapt. 18 5 2006

Scale Range of scales Example of small scale: pipeline within an ALU Example of large scale: pipeline compose of programs running on separate computers connecte by the Internet CS250 -- Chapt. 18 6 2006

Synchrony Synchronous pipeline Operates like an assembly line Items move at exactly the same time Asynchronous pipeline Each station forwars whenever it is reay Slow stage may block previous stages CS250 -- Chapt. 18 7 2006

Buffering Buffere flow Buffer place between each pair of stages Useful when processing time per item varies Unbuffere flow Stage blocks until next stage can accept item Works best if processing time per stage is constant CS250 -- Chapt. 18 8 2006

Size Of Items Finite chunks Discrete items pass through pipeline Example: sequence of Ethernet packets Continuous bit stream Stream of bits flows through pipeline Example: vieo fee CS250 -- Chapt. 18 9 2006

Data Fee Mechanism Automatic Built into pipeline Manual Separate harware to move items CS250 -- Chapt. 18 10 2006

With Of Data Path Serial One bit at a time Parallel N bits at a time CS250 -- Chapt. 18 11 2006

Homogeneity Of Stages Homogeneous All stages are the same Example: five ientical processors Heterogeneous Stages can iffer Example: each stage optimize for one function CS250 -- Chapt. 18 12 2006

Software Pipelining Popularize by Unix comman interpreter (shell) User can specify pipeline as a comman Example cat x se s/frien/partner/g more CS250 -- Chapt. 18 13 2006

Software Pipeline Performance An Overhea Consier the pipeline cat x se s/frien/partner/g se /W/ more Substitutes partner for frien Deletes lines that contain W Can be optimize (swap se commans) CS250 -- Chapt. 18 14 2006

Implementation Of Software Pipeline Uniprocessor Each stage is a process or task Multiprocessor Each stage executes on separate processor Harware assist can spee inter-stage ata transfer CS250 -- Chapt. 18 15 2006

Harware Pipelining Two broa categories Instruction pipeline Data pipeline CS250 -- Chapt. 18 16 2006

Covere in Chapter 5 Instruction Pipeline Recall Instruction processe in stages Exact etails an number of stages epen on instruction set an operan types CS250 -- Chapt. 18 17 2006

Data Pipeline Data passes through pipeline Each stage hanles ata item an passes item to next stage Usually requires programmer to ivie coe into stages Among the most interesting uses of pipelining CS250 -- Chapt. 18 18 2006

Harware Pipelining An Performance A ata pipeline can ramatically increase performance (throughput) To see why, consier an example Internet router hanles packets Assume * Router processes one packet at at time * Performs six functions on a packet CS250 -- Chapt. 18 19 2006

Example Of Internet Router Algorithm 1. Receive a packet (i.e., transfer the packet into memory). 2. Verify packet integrity (i.e., verify that no changes occurre between transmission an reception). 3. Check for routing loops (i.e., ecrement a value in the heaer, an reform the heaer with the new value). 4. Route the packet (i.e., use the estination aress fiel to select one of the possible output networks an a estination on that network). 5. Prepare for transmission (i.e. compute information that will be use to verify packet integrity). 6. Transmit the packet (i.e., transfer the packet to the output evice). CS250 -- Chapt. 18 20 2006

Illustration Of A Processor In A Router An The Algorithm Use input from one network processor outputs o forever { Wait to receive packet Verify integrity Check for loops. Route packet Prepare for transmission Enqueue packet for output (a) } (b) CS250 -- Chapt. 18 21 2006

A Pipeline Implementation Of A Processor In A Router packets arrive verify integrity check for loops route packet packets leave prepare for transmission CS250 -- Chapt. 18 22 2006

The Ba News A ata pipeline passes ata through a series of stages that each examine or moify the ata. If it uses the same spee processors as a nonpipeline architecture, a ata pipeline will not improve the overall time neee to process a given ata item. CS250 -- Chapt. 18 23 2006

The Goo News Even if a ata pipeline uses the same spee processors as a nonpipeline architecture, a ata pipeline has higher overall throughput (i.e., number of ata items processe per secon). CS250 -- Chapt. 18 24 2006

Pipelining Can Only Be Use If It is possible to partition processing into inepenent stages Overhea require to move ata from one stage to another is insignificant The slowest stage of the pipeline is faster than a single processor CS250 -- Chapt. 18 25 2006

Unerstaning Pipeline Spee Assume The task is packet processing Processing a packet requires exactly 500 instructions A processor executes 10 instructions per µsec Total time require for one packet is: time = 500 instructions = 50 µsec 10 instr. per µsec Throughput for a non-pipline system is: T np = 1 packet = 50 µsec 1 packet 10 6 = 20,000 packets per secon 50 sec CS250 -- Chapt. 18 26 2006

Unerstaning Pipeline Spee (continue) Suppose the problem can be ivie into four stages an that the stages require: 50 instructions 100 instructions 200 instructions 150 instructions The slowest stage takes 200 instructions So, the time require for the slowest stage is: total time = 200 inst = 20 µsec 10 inst / µsec CS250 -- Chapt. 18 27 2006

Unerstaning Pipeline Spee (continue) Throughput of the pipeline is limite by the slowest stage Overall throughput can be calculate: T p = 1 packet = 20 µsec 1 packet 10 6 = 50,000 packets per secon 20 sec Note: throughput of pipeline version is 150% greater than throughput of the non-pipeline version! CS250 -- Chapt. 18 28 2006

Conceptual Division Of Processing f( ) g( ) h( ) f( ) g( ) h( ) (a) (b) (a) shows a single processor hanling all functions (b) shows processing ivie into a pipeline CS250 -- Chapt. 18 29 2006

Pipeline Architectures Refer to architectures that are primarily forme aroun pipelining Most often use for special-purpose systems Less relevant to general-purpose computers CS250 -- Chapt. 18 30 2006

Pipeline Setup, Stall, An Flush Setup time Refers to time require to start the pipeline initially Stall time Refers to time require to restart the pipeline after a stage blocks to wait for a previous stage Flush time Refers to time that elapses between the cessation of input an the final ata item emerging from the pipeline (i.e., the time require to shut own the pipeline) CS250 -- Chapt. 18 31 2006

Superpipelining Most often use with instruction pipelining Subivies a stage into smaller stages Example: subivie operan processing into Operan ecoe Fetch immeiate value or value from register Fetch value from memory Fetch inirect operan CS250 -- Chapt. 18 32 2006

Summary Pipelining Broa, funamental concept Can be use with harware or software Applies to instructions or ata Can be synchronous or asynchronous Can be buffere or unbuffere CS250 -- Chapt. 18 33 2006

Summary (continue) Pipeline performance Unless faster processors are use, ata pipelining oes not ecrease the overall time require to process a single ata item Using a pipeline oes increase the overall throughput (items processe per secon) The stage of a pipeline that requires the most time to process an item limits the throughput of the pipeline CS250 -- Chapt. 18 34 2006

Questions?