Computer Organization Douglas Comer Computer Science Department Purue University 250 N. University Street West Lafayette, IN 47907-2066 http://www.cs.purue.eu/people/comer Copyright 2006. All rights reserve. This ocument may not be reprouce by any means without written consent of the author.
XVIII Pipelining CS250 -- Chapt. 18 1 2006
Concept Of Pipelining One of the two major harware optimization techniques Information flows through a series of stations (processing components) Each station can Inspect Interpret Moify CS250 -- Chapt. 18 2 2006
Illustration Of Pipelining information arrives stage 1 stage 2 stage 3 stage 4 information leaves CS250 -- Chapt. 18 3 2006
Characteristics Of Pipelines Harware or software implementation Large or small scale Synchronous or asynchronous flow Buffere or unbuffere flow Finite chunks or continuous bit streams Automatic ata fee or manual ata fee Serial or parallel path Homogeneous or heterogeneous stages CS250 -- Chapt. 18 4 2006
Implementation Pipeline can be implemente in harware or software Software pipeline Programmer convenience More efficient than intermeiate files Harware pipeline Much higher performance CS250 -- Chapt. 18 5 2006
Scale Range of scales Example of small scale: pipeline within an ALU Example of large scale: pipeline compose of programs running on separate computers connecte by the Internet CS250 -- Chapt. 18 6 2006
Synchrony Synchronous pipeline Operates like an assembly line Items move at exactly the same time Asynchronous pipeline Each station forwars whenever it is reay Slow stage may block previous stages CS250 -- Chapt. 18 7 2006
Buffering Buffere flow Buffer place between each pair of stages Useful when processing time per item varies Unbuffere flow Stage blocks until next stage can accept item Works best if processing time per stage is constant CS250 -- Chapt. 18 8 2006
Size Of Items Finite chunks Discrete items pass through pipeline Example: sequence of Ethernet packets Continuous bit stream Stream of bits flows through pipeline Example: vieo fee CS250 -- Chapt. 18 9 2006
Data Fee Mechanism Automatic Built into pipeline Manual Separate harware to move items CS250 -- Chapt. 18 10 2006
With Of Data Path Serial One bit at a time Parallel N bits at a time CS250 -- Chapt. 18 11 2006
Homogeneity Of Stages Homogeneous All stages are the same Example: five ientical processors Heterogeneous Stages can iffer Example: each stage optimize for one function CS250 -- Chapt. 18 12 2006
Software Pipelining Popularize by Unix comman interpreter (shell) User can specify pipeline as a comman Example cat x se s/frien/partner/g more CS250 -- Chapt. 18 13 2006
Software Pipeline Performance An Overhea Consier the pipeline cat x se s/frien/partner/g se /W/ more Substitutes partner for frien Deletes lines that contain W Can be optimize (swap se commans) CS250 -- Chapt. 18 14 2006
Implementation Of Software Pipeline Uniprocessor Each stage is a process or task Multiprocessor Each stage executes on separate processor Harware assist can spee inter-stage ata transfer CS250 -- Chapt. 18 15 2006
Harware Pipelining Two broa categories Instruction pipeline Data pipeline CS250 -- Chapt. 18 16 2006
Covere in Chapter 5 Instruction Pipeline Recall Instruction processe in stages Exact etails an number of stages epen on instruction set an operan types CS250 -- Chapt. 18 17 2006
Data Pipeline Data passes through pipeline Each stage hanles ata item an passes item to next stage Usually requires programmer to ivie coe into stages Among the most interesting uses of pipelining CS250 -- Chapt. 18 18 2006
Harware Pipelining An Performance A ata pipeline can ramatically increase performance (throughput) To see why, consier an example Internet router hanles packets Assume * Router processes one packet at at time * Performs six functions on a packet CS250 -- Chapt. 18 19 2006
Example Of Internet Router Algorithm 1. Receive a packet (i.e., transfer the packet into memory). 2. Verify packet integrity (i.e., verify that no changes occurre between transmission an reception). 3. Check for routing loops (i.e., ecrement a value in the heaer, an reform the heaer with the new value). 4. Route the packet (i.e., use the estination aress fiel to select one of the possible output networks an a estination on that network). 5. Prepare for transmission (i.e. compute information that will be use to verify packet integrity). 6. Transmit the packet (i.e., transfer the packet to the output evice). CS250 -- Chapt. 18 20 2006
Illustration Of A Processor In A Router An The Algorithm Use input from one network processor outputs o forever { Wait to receive packet Verify integrity Check for loops. Route packet Prepare for transmission Enqueue packet for output (a) } (b) CS250 -- Chapt. 18 21 2006
A Pipeline Implementation Of A Processor In A Router packets arrive verify integrity check for loops route packet packets leave prepare for transmission CS250 -- Chapt. 18 22 2006
The Ba News A ata pipeline passes ata through a series of stages that each examine or moify the ata. If it uses the same spee processors as a nonpipeline architecture, a ata pipeline will not improve the overall time neee to process a given ata item. CS250 -- Chapt. 18 23 2006
The Goo News Even if a ata pipeline uses the same spee processors as a nonpipeline architecture, a ata pipeline has higher overall throughput (i.e., number of ata items processe per secon). CS250 -- Chapt. 18 24 2006
Pipelining Can Only Be Use If It is possible to partition processing into inepenent stages Overhea require to move ata from one stage to another is insignificant The slowest stage of the pipeline is faster than a single processor CS250 -- Chapt. 18 25 2006
Unerstaning Pipeline Spee Assume The task is packet processing Processing a packet requires exactly 500 instructions A processor executes 10 instructions per µsec Total time require for one packet is: time = 500 instructions = 50 µsec 10 instr. per µsec Throughput for a non-pipline system is: T np = 1 packet = 50 µsec 1 packet 10 6 = 20,000 packets per secon 50 sec CS250 -- Chapt. 18 26 2006
Unerstaning Pipeline Spee (continue) Suppose the problem can be ivie into four stages an that the stages require: 50 instructions 100 instructions 200 instructions 150 instructions The slowest stage takes 200 instructions So, the time require for the slowest stage is: total time = 200 inst = 20 µsec 10 inst / µsec CS250 -- Chapt. 18 27 2006
Unerstaning Pipeline Spee (continue) Throughput of the pipeline is limite by the slowest stage Overall throughput can be calculate: T p = 1 packet = 20 µsec 1 packet 10 6 = 50,000 packets per secon 20 sec Note: throughput of pipeline version is 150% greater than throughput of the non-pipeline version! CS250 -- Chapt. 18 28 2006
Conceptual Division Of Processing f( ) g( ) h( ) f( ) g( ) h( ) (a) (b) (a) shows a single processor hanling all functions (b) shows processing ivie into a pipeline CS250 -- Chapt. 18 29 2006
Pipeline Architectures Refer to architectures that are primarily forme aroun pipelining Most often use for special-purpose systems Less relevant to general-purpose computers CS250 -- Chapt. 18 30 2006
Pipeline Setup, Stall, An Flush Setup time Refers to time require to start the pipeline initially Stall time Refers to time require to restart the pipeline after a stage blocks to wait for a previous stage Flush time Refers to time that elapses between the cessation of input an the final ata item emerging from the pipeline (i.e., the time require to shut own the pipeline) CS250 -- Chapt. 18 31 2006
Superpipelining Most often use with instruction pipelining Subivies a stage into smaller stages Example: subivie operan processing into Operan ecoe Fetch immeiate value or value from register Fetch value from memory Fetch inirect operan CS250 -- Chapt. 18 32 2006
Summary Pipelining Broa, funamental concept Can be use with harware or software Applies to instructions or ata Can be synchronous or asynchronous Can be buffere or unbuffere CS250 -- Chapt. 18 33 2006
Summary (continue) Pipeline performance Unless faster processors are use, ata pipelining oes not ecrease the overall time require to process a single ata item Using a pipeline oes increase the overall throughput (items processe per secon) The stage of a pipeline that requires the most time to process an item limits the throughput of the pipeline CS250 -- Chapt. 18 34 2006
Questions?