Module: Branch Prediction
Krishna V. Palem, Weng Fai Wong, and Sudhakar Yalamanchili, Georgia Institute of Technology
(Slides contributed by Prof. Weng Fai Wong were prepared while visiting, and employed by, Georgia Tech)

Reading for this Module
- Branch Prediction: Appendix A.2 (pp. A-21 to A-26), Section 4.2, Section 3.4
- Branch Target Buffers: Section 3.5
- Performance of Branch Prediction Schemes: Section 3.8
- The Trace Cache: Section 4.4
- Additional reference: E. Rotenberg, S. Bennett, and J. Smith, "Trace Cache: A Low Latency Approach to High Bandwidth Instruction Fetching," 29th Annual International Symposium on Microarchitecture (MICRO-29), Dec. 1996

ECE 4100/6100 (2)
Control Dependencies
- Control dependencies determine the execution order of instructions
- Instructions may be control dependent on a branch:
    DADD R5, R6, R7
    BNE  R4, R2, CONTINUE
    DMUL R4, R2, R5
    DSUB R4, R9, R5
- Goal: maximize I-fetch bandwidth via branch prediction
- How do we improve prediction accuracy and reduce branch penalties?

The Problem with Branches
- Branches introduce pipeline bubbles and therefore performance degradation
- Two types of branches: unconditional and conditional
- Some branch instructions compute the branch target by adding (or subtracting) a constant to the PC; others must read the register file
- Most branches encountered in program execution belong to the former kind
Conditional Branches
    DADD R5, R6, R7
    BNE  R4, R1, L1
    ...
  L1: ...
- For general pipelines, penalties occur because of the timing of:
  - Branch target address generation: PC-relative address generation can occur after instruction fetch
  - Branch condition resolution: in which cycle is the condition known?

Handling Branches
- A branch (e.g., BNE R1, R2, Loop) is recognized in the decode stage (IF/ID), but its outcome is known only after execution (EX/MEM)
- Simple solution: stall the pipeline
- What determines the branch penalty?
Branch Bubbles
- [Pipeline diagram: the branch "S1: br K1" proceeds through decode, ALU, memory, and writeback while the sequential instructions S2-S4 enter the pipeline and must be squashed; the target K1 is not fetched until the branch resolves]

Unconditional Branches
- Target generation should be moved to the earliest possible part of the pipeline: the DECODE stage
- This reduces the number of bubbles inserted
- However, some unconditional branches require register reads, e.g., procedure returns
Unconditional Branch
- [Pipeline diagram: with the target generated in decode, only 1 bubble is incurred between the branch "S1: br K1" and the target K1]

Branch Delay Slots
- [Diagram: variants of an ADD.D/BNEZ sequence showing how an instruction from before the branch, from the branch target, or from the fall-through path can be moved into the delay slot]
- Instructions are moved by compiler scheduling to fill branch delay slots
- The compiler must account for side effects when the branch is mispredicted
Branch Prediction
- Purpose: to steer the PC as accurately and as early as possible with respect to conditional branches
- Four possible prediction/outcome pairs:
  - T/T: predicted taken, and the branch was indeed taken
  - NT/NT: predicted not taken, and the branch was not taken
  - T/NT: predicted taken, but the branch was not taken
  - NT/T: predicted not taken, but the branch was taken
- The latter two are the branch misprediction pairs

Branch Prediction Strategies
- Three major classes of branch prediction strategies:
  - Static
  - Semi-static
  - Dynamic
Static Prediction
- Static predictors are simple, hardwired strategies
- Two main examples: always predict taken; always predict not taken
- Very ineffective

Static Branch Prediction
- The total number of stalls can be reduced
- Performance is very dependent on an a priori understanding of branch behavior:
  - Based on extensive profiling
  - Based on the instruction opcode (Motorola 88110)
  - Based on the relative offset (IBM RS/6000), e.g., branches with negative offsets (backward branches) predicted taken
Semi-static Strategies
- Sometimes the programmer or the compiler can do a fairly good job of predicting branches
- A bit in the branch instruction indicates whether the branch is likely to be taken
- This is sometimes called the Take/Don't-Take Bit (TDTB)
- The DECODE stage can steer instruction fetch accordingly

TDTB in Action (1)
- [Pipeline diagram: with TDTB = 1 and a correct prediction, the target K1 is fetched right after the branch decodes; only 1 bubble is incurred, and the prediction is confirmed when the branch resolves]
TDTB in Action (2)
- [Pipeline diagram: on a misprediction, the speculatively fetched target instructions K1 and K2 are cancelled, the PC is resteered to the fall-through instruction S2, and a penalty of 3 cycles is incurred]

Dynamic Branch Prediction Strategies
- Use past behavior to predict the future
- Branches show surprisingly good correlation with one another; they are not totally random events
- A general model captures history in an n-bit shift register recording the last branch outcomes (taken or not taken) and uses it to make predictions
- From: Modern Processor Design: Fundamentals of Superscalar Processors, J. Shen and M. Lipasti
Schemes Based on Local History
- A simple scheme: for each branch instruction, maintain a single bit T
- If T = 0, predict the current instance of the branch as not taken; if T = 1, predict it as taken
- When the branch is resolved, set T to 0 or 1 depending on whether it was confirmed not taken or taken, respectively
- Works well when the last behavior is a good indicator of future behavior

Single-bit Branch Predictor
- A table of 2^K 1-bit predictors, indexed by the K least significant bits of the PC
- The taken/not-taken confirmation updates the indexed entry; the entry supplies the prediction
- Aliasing: distinct branches may map to the same entry
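As a concrete illustration, the single-bit scheme can be sketched as follows; the table size K and the modulo indexing stand in for "the K least significant bits of the PC," and all names are illustrative, not from the slides:

```python
# Minimal sketch of a single-bit (1-bit) branch predictor.
K = 4
table = [0] * (2 ** K)          # one T bit per entry; 0 = not taken, 1 = taken

def predict(pc):
    """Predict taken (True) or not taken (False) from the T bit."""
    return table[pc % (2 ** K)] == 1

def confirm(pc, taken):
    """On branch resolution, set T to the actual outcome."""
    table[pc % (2 ** K)] = 1 if taken else 0
```

Note the weakness this exposes: a loop branch that is taken many times and then falls through mispredicts twice per visit to the loop (once on exit, once on re-entry), which motivates two-bit counters.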
Two-Bit Predictors
- The 1-bit predictor is quite ineffective
- Use the result of the last two branches to change the prediction
- [State diagram: two "Predict Branch Taken" states and two "Predict Branch Not Taken" states; a taken branch moves the state toward strongly taken and a not-taken branch toward strongly not taken, so the prediction flips only after two consecutive mispredictions]
- This scheme can be generalized to n-bit counters

Bimodal Branch Predictor
- The PC is used to index a set of counters rather than single branch-prediction bits
- Each counter is an n-bit unsigned integer
- If the branch is confirmed taken, the counter is incremented by one; if it is confirmed not taken, the counter is decremented by one
- Counters are saturating: if the value is zero, decrement operations are ignored; if the value is at the maximum, increment operations are ignored
Bimodal Branch Predictor (cont.)
- A table of 2^K n-bit counters, indexed by the K least significant bits of the PC; the taken/not-taken confirmation updates the indexed counter
- Prediction: if the most significant bit of the counter is one, predict taken, else predict not taken
- Can tolerate occasional changes in branch direction
- The aliasing problem (two PCs mapping to the same counter) depends on the size of the table
- Useful when the branch address is known before the branch condition is known, so as to support prefetching
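A minimal sketch of the bimodal scheme with 2-bit saturating counters; K, N, and the weakly-not-taken initialization are illustrative assumptions:

```python
# Sketch of a bimodal predictor with n-bit saturating counters.
K, N = 4, 2
MAX = 2 ** N - 1                      # saturation limit (3 for 2-bit counters)
counters = [1] * (2 ** K)             # start weakly not-taken

def predict(pc):
    # The most significant bit of the counter supplies the prediction.
    return counters[pc % (2 ** K)] >= 2 ** (N - 1)

def confirm(pc, taken):
    i = pc % (2 ** K)
    if taken:
        counters[i] = min(counters[i] + 1, MAX)   # saturating increment
    else:
        counters[i] = max(counters[i] - 1, 0)     # saturating decrement
```

After saturating at taken, one not-taken outcome moves the counter from 3 to 2, whose MSB is still 1, so an occasional change in direction is tolerated, exactly as the slide claims.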
Performance Comparison
- Size and resolution of predictors are established empirically
- Performance improvement beyond 4K-entry buffer sizes is minimal
- Performance is a function of:
  - Accuracy
  - Branch penalties
  - Branch frequency

Branch Prediction
- Improvements in accuracy beyond counter resolution are required
- Note: integer programs have a higher branch frequency
- [Figure: misprediction rates for 4K entries with 2-bit predictors]
Global Branch Predictor
- The bimodal predictor does not take other branches into consideration
- A global predictor uses a shift register (GR) to store the history of the last k branches
- A table of counters is maintained in the same way as in the bimodal predictor
- When a branch is resolved, a 1 or 0 (depending on whether the branch was taken or not) is shifted into GR
- The new value of GR is used as the address of the counter in the next prediction

Basic Idea
- The shift register captures the path through the program (e.g., GR = 111 for one path through blocks B1-B7 and GR = 101 for another)
- For each unique path a counter is maintained
- Prediction is based on the behavior history of each path
- The shift register length determines the program region size
Global Branch Predictor (cont.)
- [Diagram: GR indexes the table of counters; the taken/not-taken confirmation updates the indexed counter, which supplies the prediction]

Tackling Double Loops
- The global predictor can handle double loops with short inner loops:
    for (I = 0; I < 100; I++)
      for (J = 0; J < 3; J++)
        ...
- The GR value distinguishes one inner-loop iteration from the others even though the PC of the back-edge branch is the same: the histories preceding J = 0 and J = 1 (taken) differ from the history preceding J = 2 (not taken), so each gets its own counter
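The double-loop example can be reproduced with a small GR-indexed sketch. For simplicity, only the inner-loop back-edge branch is recorded in the history (the outer-loop branch is omitted), and the history length and counter defaults are assumptions:

```python
# Sketch of a purely global predictor: a k-bit history register GR
# indexes a table of 2-bit saturating counters.
k = 4
GR = 0
counters = {}                         # GR value -> 2-bit counter

def predict():
    return counters.get(GR, 1) >= 2   # MSB of the counter gives the prediction

def confirm(taken):
    global GR
    c = counters.get(GR, 1)
    counters[GR] = min(c + 1, 3) if taken else max(c - 1, 0)
    GR = ((GR << 1) | (1 if taken else 0)) % (2 ** k)

# Train on the inner loop of the double loop: the J-loop back edge is
# taken twice (J = 0, 1) and then not taken (J = 2), repeatedly.
for outer in range(50):
    for taken in (True, True, False):
        confirm(taken)
```

After training, the history pattern preceding the loop-exit iteration differs from the patterns preceding the taken iterations, so the predictor gets all three outcomes right even though a bimodal predictor (one counter for this PC) could not.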
The gselect Predictor
- Concatenate m bits of the PC with n bits of GR to form an (m+n)-bit index into the table of counters
- The PC is used to select from a bag of identical GR patterns, overcoming the aliasing problem of the pure GR global predictor

Another View
- Example: a 4-bit branch address selects a predictor, and 3 bits of global history across three branches select the corresponding prediction, capturing history for a specific branch instruction
- Instead of having a single predictor per branch, have a predictor for each recent history of branch decisions
- For each branch-history sequence, use an n-bit predictor
The gshare Predictor
- XOR n bits of the PC with the n bits of GR to form the n-bit index into the table of counters
- A further combination (hashing) of the PC and GR: counters are not distinguishable by PC or GR alone; they are shared

Combining Predictors
- Idea: combine a local predictor with a global predictor, and use the one that is more accurate for a particular branch
- A separate table tracks which predictor is more accurate
- Can be extended to incorporate a number of different predictors
- In experiments, shown to be about 98% accurate; a variant is used in the Compaq Alpha processor
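The gshare indexing described above can be sketched as follows; the index width n and the counter initialization are illustrative assumptions:

```python
# Sketch of a gshare predictor: (PC XOR global history) indexes a table
# of 2-bit saturating counters.
n = 8
GR = 0
counters = [1] * (2 ** n)             # start weakly not-taken

def index(pc):
    return (pc ^ GR) % (2 ** n)       # hash the PC with the global history

def predict(pc):
    return counters[index(pc)] >= 2

def confirm(pc, taken):
    global GR
    i = index(pc)
    counters[i] = min(counters[i] + 1, 3) if taken else max(counters[i] - 1, 0)
    GR = ((GR << 1) | (1 if taken else 0)) % (2 ** n)
```

Because PC and history are folded into one index, a single table entry corresponds to a (branch, path) combination rather than to a branch or a path alone, which is exactly the sharing the slide describes.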
Multi-level Predictors
- Use multiple predictors and choose between them
- [State diagram: "use predictor 1" / "use predictor 2" states; transitions between the states are determined by the accuracy of the individual predictors]
- Employ predictors based on both local and global information: the state of the art

The Alpha Multi-Level Predictor
- Global predictor: a 12-bit global history indexes 4K 2-bit predictors
- Two-level local predictor: the local branch address indexes a table of 1024 10-bit histories, which in turn index 3-bit saturating counters
- A 2-bit selector chooses between the global and local predictions
A Combined Predictor
- [Diagram: a gshare predictor (n bits of GR XORed with n bits of the PC) and a bimodal predictor produce predictions P1 and P2; a predictor-selection table of counters, indexed by the PC, chooses "use P1" or "use P2"]

The Combined Predictor
- The table of counters that determines whether P1 or P2 is used is updated as follows (P1c/P2c = 1 if the corresponding predictor was correct, else 0):

    P1 correct?  P2 correct?  P1c - P2c
    no           no            0
    no           yes          -1
    yes          no           +1
    yes          yes           0

- The P1c - P2c value is added to the saturating counter addressed by the PC
- Flexible and dynamic use of whichever is the more accurate predictor
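The selector update above can be sketched as follows. The component predictors themselves are abstracted away as boolean inputs, and the table shape and "prefer P1" initialization are assumptions:

```python
# Sketch of the combining predictor's selection table: one 2-bit
# saturating counter per PC; a value >= 2 means "use P1".
selector = {}                   # pc -> 2-bit counter (default 2)

def choose(pc, p1_pred, p2_pred):
    """Return P1's prediction or P2's, according to the selector counter."""
    return p1_pred if selector.get(pc, 2) >= 2 else p2_pred

def update_selector(pc, p1_correct, p2_correct):
    # P1c - P2c is +1, -1, or 0; add it to the counter with saturation.
    delta = int(p1_correct) - int(p2_correct)
    c = selector.get(pc, 2) + delta
    selector[pc] = min(max(c, 0), 3)
```

When both predictors agree (both right or both wrong), the counter is untouched; only the cases that discriminate between them move the selection.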
Combined Predictor Performance
- [Figure: performance of selection between a 2-bit local predictor and gshare]

Misprediction Recovery
- What actions must be taken on a misprediction?
  - Remove the predicted instructions
  - Start fetching from the correct branch target
- What information is necessary to recover from a misprediction?
  - Address information for the non-predicted branch target
  - Identification of the predicted instructions, so they can be invalidated and prevented from completing
  - Association between predicted instructions and the specific branch: when that branch is mispredicted, only those instructions must be squashed
Branch Target Buffer
- A cache that contains three pieces of information:
  - The addresses of branch instructions: the BTB is managed like a cache, and the addresses of branch instructions are kept for lookup purposes
  - The branch target address: avoids re-computation of the branch target address where possible
  - Prediction statistics: different strategies are possible for maintaining this portion of the BTB

Branch Target Buffers
- Each entry holds the PC of a prior branch, the PC of the corresponding target, and prediction info
- The fetch PC searches the BTB in parallel with the instruction-cache access
- Hit: use the corresponding target address as the next fetch address
- Miss: no action
Branch Target Buffer Operation
- [Flowchart: on a BTB hit, fetch from the predicted target and confirm at decode/execute, recovering and updating the BTB on a misprediction; on a BTB miss, if decode finds a branch, update the BTB with the new branch, otherwise continue normal execution]

Branch Target Buffers: Operation
- Couple speculative generation of the branch target address with branch prediction
- Continue to fetch while the branch condition resolves; take appropriate action if wrong
- Any of the preceding history-based techniques can be used for branch condition speculation: store prediction information, e.g., n-bit predictors, along with the BTB entry
- Branch folding optimization: store the target instruction rather than the target address
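In the spirit of the operation above, a direct-mapped BTB with a 2-bit counter per entry might look like this; the size, entry layout, and 4-byte sequential fetch are assumptions for illustration:

```python
# Sketch of a direct-mapped branch target buffer.
SIZE = 16
btb = [None] * SIZE             # each entry: (branch_pc, target_pc, ctr)

def lookup(fetch_pc):
    """Probe the BTB in parallel with the I-cache fetch."""
    e = btb[fetch_pc % SIZE]
    if e is not None and e[0] == fetch_pc and e[2] >= 2:
        return e[1]             # hit and predicted taken: fetch from the target
    return fetch_pc + 4         # miss or predicted not taken: fall through

def update(branch_pc, target_pc, taken):
    """At resolution: allocate/refresh the entry and train its 2-bit counter."""
    i = branch_pc % SIZE
    e = btb[i]
    ctr = e[2] if e is not None and e[0] == branch_pc else 1
    ctr = min(ctr + 1, 3) if taken else max(ctr - 1, 0)
    btb[i] = (branch_pc, target_pc, ctr)
```

Storing the target alongside the prediction is what lets fetch redirect in the same cycle, without waiting for decode to recompute the target.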
Return Address Predictors
- Prediction mechanisms are needed for indirect jumps, that is, for addresses generated at run time, such as return addresses
- Return addresses are pushed onto a stack in the fetch unit
- If the fetch unit sees a return in its instruction stream, it immediately pops the return stack and fetches from the popped address
- BTB accuracy can be degraded by calls from multiple locations

An Integrated Solution
- Branch prediction, a return address stack (RAS), and a BTB cooperate to produce the target address; the I-cache is accessed in an interleaved fashion
- Concurrently check whether each entry in an I-cache line is a branch
- Multi-branch prediction using a variant of global history
- Based on the design reported in E. Rotenberg, S. Bennett, and J. Smith, "Trace Cache: A Low Latency Approach to High Bandwidth Instruction Fetching," MICRO-29, Dec. 1996
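The return address stack can be sketched in a few lines; the fixed depth, the overflow policy, and the 4-byte instruction size are assumptions:

```python
# Sketch of a return address stack (RAS) in the fetch unit.
DEPTH = 16
ras = []

def on_call(call_pc):
    """Push the sequential return address when a call is fetched."""
    if len(ras) == DEPTH:
        ras.pop(0)              # overflow: discard the oldest entry
    ras.append(call_pc + 4)

def on_return():
    """Pop and fetch from the predicted return address."""
    return ras.pop() if ras else None   # empty stack: no prediction

# A function called from two different sites: a BTB entry for the return
# would remember only the most recent target, but the stack nests correctly.
on_call(0x100)
on_call(0x200)
```

Popping yields 0x204 and then 0x104, matching the nesting of the calls, which is exactly the case where calls from multiple locations degrade a BTB.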
Further Analysis
- Even with accurate branch prediction, we must fetch instructions from multiple targets
- What is the effect on instruction bandwidth and pipeline performance?
- How can we increase instruction fetch bandwidth to compensate?
- Even assuming perfect branch prediction, the instructions of consecutive blocks (B1-B6) are located in different cache lines
- Exploit instruction locality plus branch prediction!

Challenges to Increasing Bandwidth
- Pipeline latency vs. bandwidth
- Instruction alignment in the I-cache
- Predicting multiple branches per cycle as ILP increases
- What happens as ILP increases? Impact on memory bandwidth (especially fetch bandwidth), on branch predictor throughput, and on the I-cache?
Some Branch Statistics
- [Figure: branch statistics from E. Rotenberg, S. Bennett, and J. Smith, "Trace Cache: A Low Latency Approach to High Bandwidth Instruction Fetching," MICRO-29, Dec. 1996]

A Program Trace
- An instruction trace is the actual sequence of executed instructions
- We can divide the trace into blocks (#instructions/block) and treat the trace blocks as units of prediction, based on predictions of the individual branches in a block
- Creating program regions for analysis is a common technique
- Note that the I-cache stores a static program description (blocks B1-B7 in static order, not trace order)
The Trace Cache: Principle
- Store recurring sequences of basic blocks from the instruction trace (e.g., B1 B3 B4 B2 B6, with the branch instructions embedded)
- These form a contiguous sequence of instructions: a big basic block
- Issue multiple instructions from this big basic block: a high issue rate
- Trace length is determined by limits on #instructions and #branches
- Predict and fetch traces rather than lines in the instruction cache

The Trace Cache: The Problem
- Need to identify multiple blocks in the cache: some form of branch target table?
- A multi-ported instruction cache, to fetch from multiple targets
- Instruction alignment, to feed the decoder
- Most likely will add a pipeline stage after instruction fetch
The Trace Cache
- Basic idea: fetch the trace according to a multiple-path predictor
- It complements a core (standard) fetch unit; the trace cache reconstructs the trace in parallel
- Ref: E. Rotenberg, S. Bennett, and J. Smith, "Trace Cache: A Low Latency Approach to High Bandwidth Instruction Fetching," MICRO-29, Dec. 1996; the following figures are from this reference

The Trace Cache (operation)
- First time through the path B1 B2 B5 B6: record the trace (instructions) into the trace cache
- Second time through the same path: access the trace from the trace cache and send it to the decoder
The Trace Cache (structure)
- The fetch address indexes the trace cache in parallel with the BTB and the I-cache
- Each trace line holds a tag, branch flags, a branch mask, the fall-through address, the target address, and up to n instructions
- Trace logic, the RAS, and the branch predictor cooperate to select the n instructions supplied to the decoder; the trace cache captures/fills trace history
- Trace length is determined by dispatch bandwidth and branch-prediction bandwidth
- Parallel lookup of trace history and the instruction cache: the first address plus branch-prediction bits index the cache

Data Structure
- Valid bit: indicates whether the trace is valid
- Tag: identifies the starting address
- Branch flags: indicate the branching behavior of the trace
Data Structure (cont.)
- Branch mask: indicates the number of branches in the trace and whether the trace ends in a branch
- Trace fall-through address: the next fetch address if the last branch in the trace is predicted not taken
- Trace target address: the fetch address if the last branch is taken

On a Hit
- There is a hit in the trace cache if:
  - the fetch address matches the tag, and
  - the branch predictions match the branch flags
- When there is a hit, instructions come from the trace cache; otherwise they come from the core fetch unit
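Using the fields listed above, the hit condition and the next-fetch-address selection can be sketched as follows; the dictionary layout is an illustrative assumption, not the paper's actual bit-level format:

```python
# Sketch of the trace-cache hit test and next-address selection.
def trace_hit(entry, fetch_addr, predictions):
    """Hit iff the fetch address matches the tag AND the predicted
    directions match the stored branch flags."""
    if entry["tag"] != fetch_addr:
        return False
    n = entry["num_branches"]           # from the branch mask
    return predictions[:n] == entry["branch_flags"][:n]

def next_fetch(entry, predictions):
    """Fall-through vs. target address, chosen by the last branch's prediction."""
    last = entry["num_branches"] - 1
    return entry["target_addr"] if predictions[last] else entry["ft_addr"]
```

Only as many prediction bits as the trace contains branches take part in the match, which is what the branch mask is for.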
On a Miss
- The core unit takes over the responsibility of supplying instructions
- The trace cache uses what the core fetch unit supplies to build up its cache line

Performance
- [Figure]
Performance
- [Figure]

The Real Deal: P4 Microarchitecture
- Front-end branch target buffer: 4K entries, used on a miss in the trace BTB
- If no BTB entry is found, use static prediction: backward branches are predicted taken
- Static branch prediction uses a (branch-distance) threshold
- Indirect branch predictor
- From "The Microarchitecture of the Intel Pentium 4 Processor," Intel Technology Journal, February 2001 and February 2004
P4 Microarchitecture Front-End
- Front-end BTB (4K entries), I-TLB, and prefetcher feed the instruction decoder, which fills the trace cache
- Trace cache: 12K µops, with its own BTB (512 entries); delivers up to 3 µops/cycle into the µop queue
- 6 µops per trace line (many lines per trace)
- The trace cache has its own branch predictor, and its 512-entry BTB includes a 16-entry return address stack

The Real Deal: Power5
- A two-level prediction scheme shared by two threads: one bimodal predictor, one path-correlated predictor, and a third predictor that predicts which of the first two is correct
- Separate structures handle function/procedure returns and other predicted targets
- Branch instruction queue (BIQ): stores recovery information for mispredictions; entries are retired in program order
- R. Kalla, B. Sinharoy, J. Tendler, "IBM POWER5 Chip: A Dual-Core Multithreaded Processor," IEEE Micro, March/April 2004
Power5 Execution Pipeline
- The two threads share instruction fetch, the instruction cache, and branch prediction (IF-IC-BP)
- High branch-prediction throughput: all branches in a fetch group can be predicted

Some Research Questions
- Quality of branch prediction
- Improving branch predictor throughput
- Power efficiency of branch prediction logic
- Speculative execution
- Is the focus moving up a level to multi-core, many-core, any core?
- Has the era of ILP stabilized?
Concluding Remarks
- Handling control flow is a challenge to keeping the execution core fed
- Prediction and recovery mechanisms are key to keeping the pipeline active and avoiding performance degradation
- Superscalar datapaths add pressure, pushing for better, more innovative techniques to keep pace with the technology-enabled appetite for instruction-level parallelism
- What next?

Study Guide
- What are the basic approaches to branch prediction?
- What properties of the program does each predictor rely upon? Given a predictor, describe program structures/behaviors for which it will work well or poorly.
- Behavior of branch predictors: given a program trace (including the taken/not-taken outcomes of branches), be able to trace through the states of a predictor
- Behavior of the BTB: given a program trace, be able to show the BTB contents at each point in the fetch sequence; trace pipeline operation on a BTB miss
Study Guide (cont.)
- What is a trace cache? Basic operation. What are the properties of programs for which a trace cache is a good solution?
- Compare and contrast each type of branch predictor
- Given a set of program behaviors/statistics, design a branch prediction strategy and implementation
More informationLecture 12 Branch Prediction and Advanced Out-of-Order Superscalars
CS 152 Computer Architecture and Engineering CS252 Graduate Computer Architecture Lecture 12 Branch Prediction and Advanced Out-of-Order Superscalars Krste Asanovic Electrical Engineering and Computer
More informationAs the amount of ILP to exploit grows, control dependences rapidly become the limiting factor.
Hiroaki Kobayashi // As the amount of ILP to exploit grows, control dependences rapidly become the limiting factor. Branches will arrive up to n times faster in an n-issue processor, and providing an instruction
More informationCOSC 6385 Computer Architecture Dynamic Branch Prediction
COSC 6385 Computer Architecture Dynamic Branch Prediction Edgar Gabriel Spring 208 Pipelining Pipelining allows for overlapping the execution of instructions Limitations on the (pipelined) execution of
More informationNOW Handout Page 1. Review from Last Time #1. CSE 820 Graduate Computer Architecture. Lec 8 Instruction Level Parallelism. Outline
CSE 820 Graduate Computer Architecture Lec 8 Instruction Level Parallelism Based on slides by David Patterson Review Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level Parallelism
More informationDetermined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version
MIPS Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationLecture 13: Branch Prediction
S 09 L13-1 18-447 Lecture 13: Branch Prediction James C. Hoe Dept of ECE, CMU March 4, 2009 Announcements: Spring break!! Spring break next week!! Project 2 due the week after spring break HW3 due Monday
More informationAdvanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University
Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:
More informationCOSC 6385 Computer Architecture. Instruction Level Parallelism
COSC 6385 Computer Architecture Instruction Level Parallelism Spring 2013 Instruction Level Parallelism Pipelining allows for overlapping the execution of instructions Limitations on the (pipelined) execution
More informationCopyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 3 Instruction-Level Parallelism and Its Exploitation 1 Branch Prediction Basic 2-bit predictor: For each branch: Predict taken or not
More informationPipeline Hazards. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
Pipeline Hazards Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Hazards What are hazards? Situations that prevent starting the next instruction
More informationSuperscalar Processors Ch 14
Superscalar Processors Ch 14 Limitations, Hazards Instruction Issue Policy Register Renaming Branch Prediction PowerPC, Pentium 4 1 Superscalar Processing (5) Basic idea: more than one instruction completion
More informationSuperscalar Processing (5) Superscalar Processors Ch 14. New dependency for superscalar case? (8) Output Dependency?
Superscalar Processors Ch 14 Limitations, Hazards Instruction Issue Policy Register Renaming Branch Prediction PowerPC, Pentium 4 1 Superscalar Processing (5) Basic idea: more than one instruction completion
More informationCSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1
CSE 820 Graduate Computer Architecture week 6 Instruction Level Parallelism Based on slides by David Patterson Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level
More informationComputer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM
Computer Architecture Computer Science & Engineering Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware
More informationLecture 8: Instruction Fetch, ILP Limits. Today: advanced branch prediction, limits of ILP (Sections , )
Lecture 8: Instruction Fetch, ILP Limits Today: advanced branch prediction, limits of ILP (Sections 3.4-3.5, 3.8-3.14) 1 1-Bit Prediction For each branch, keep track of what happened last time and use
More informationHandout 2 ILP: Part B
Handout 2 ILP: Part B Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level Parallelism Loop unrolling by compiler to increase ILP Branch prediction to increase ILP
More informationDesign of Digital Circuits Lecture 18: Branch Prediction. Prof. Onur Mutlu ETH Zurich Spring May 2018
Design of Digital Circuits Lecture 18: Branch Prediction Prof. Onur Mutlu ETH Zurich Spring 2018 3 May 2018 Agenda for Today & Next Few Lectures Single-cycle Microarchitectures Multi-cycle and Microprogrammed
More informationInstruction-Level Parallelism Dynamic Branch Prediction. Reducing Branch Penalties
Instruction-Level Parallelism Dynamic Branch Prediction CS448 1 Reducing Branch Penalties Last chapter static schemes Move branch calculation earlier in pipeline Static branch prediction Always taken,
More informationLecture 8 Dynamic Branch Prediction, Superscalar and VLIW. Computer Architectures S
Lecture 8 Dynamic Branch Prediction, Superscalar and VLIW Computer Architectures 521480S Dynamic Branch Prediction Performance = ƒ(accuracy, cost of misprediction) Branch History Table (BHT) is simplest
More informationMultithreaded Processors. Department of Electrical Engineering Stanford University
Lecture 12: Multithreaded Processors Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee382a Lecture 12-1 The Big Picture Previous lectures: Core design for single-thread
More informationECE232: Hardware Organization and Design
ECE232: Hardware Organization and Design Lecture 17: Pipelining Wrapup Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Outline The textbook includes lots of information Focus on
More informationFall 2011 Prof. Hyesoon Kim
Fall 2011 Prof. Hyesoon Kim 1 1.. 1 0 2bc 2bc BHR XOR index 2bc 0x809000 PC 2bc McFarling 93 Predictor size: 2^(history length)*2bit predict_func(pc, actual_dir) { index = pc xor BHR taken = 2bit_counters[index]
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationThe Processor: Improving the performance - Control Hazards
The Processor: Improving the performance - Control Hazards Wednesday 14 October 15 Many slides adapted from: and Design, Patterson & Hennessy 5th Edition, 2014, MK and from Prof. Mary Jane Irwin, PSU Summary
More informationLecture 7: Static ILP, Branch prediction. Topics: static ILP wrap-up, bimodal, global, local branch prediction (Sections )
Lecture 7: Static ILP, Branch prediction Topics: static ILP wrap-up, bimodal, global, local branch prediction (Sections 2.2-2.6) 1 Predication A branch within a loop can be problematic to schedule Control
More informationCOMPUTER ORGANIZATION AND DESI
COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler
More informationChapter 4. The Processor
Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations Determined by ISA
More informationThe Processor: Instruction-Level Parallelism
The Processor: Instruction-Level Parallelism Computer Organization Architectures for Embedded Computing Tuesday 21 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy
More informationCISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1
CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer
More informationSISTEMI EMBEDDED. Computer Organization Pipelining. Federico Baronti Last version:
SISTEMI EMBEDDED Computer Organization Pipelining Federico Baronti Last version: 20160518 Basic Concept of Pipelining Circuit technology and hardware arrangement influence the speed of execution for programs
More informationThe Processor (3) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University
The Processor (3) Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu EEE3050: Theory on Computer Architectures, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)
More informationCS / ECE 6810 Midterm Exam - Oct 21st 2008
Name and ID: CS / ECE 6810 Midterm Exam - Oct 21st 2008 Notes: This is an open notes and open book exam. If necessary, make reasonable assumptions and clearly state them. The only clarifications you may
More informationChapter 4. The Processor
Chapter 4 The Processor 1 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A
More informationInstruction Level Parallelism
Instruction Level Parallelism The potential overlap among instruction execution is called Instruction Level Parallelism (ILP) since instructions can be executed in parallel. There are mainly two approaches
More informationWhat is Pipelining? Time per instruction on unpipelined machine Number of pipe stages
What is Pipelining? Is a key implementation techniques used to make fast CPUs Is an implementation techniques whereby multiple instructions are overlapped in execution It takes advantage of parallelism
More informationBranch Prediction & Speculative Execution. Branch Penalties in Modern Pipelines
6.823, L15--1 Branch Prediction & Speculative Execution Asanovic Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 6.823, L15--2 Branch Penalties in Modern Pipelines UltraSPARC-III
More informationLecture 7: Static ILP and branch prediction. Topics: static speculation and branch prediction (Appendix G, Section 2.3)
Lecture 7: Static ILP and branch prediction Topics: static speculation and branch prediction (Appendix G, Section 2.3) 1 Support for Speculation In general, when we re-order instructions, register renaming
More information1 Hazards COMP2611 Fall 2015 Pipelined Processor
1 Hazards Dependences in Programs 2 Data dependence Example: lw $1, 200($2) add $3, $4, $1 add can t do ID (i.e., read register $1) until lw updates $1 Control dependence Example: bne $1, $2, target add
More informationHardware-based speculation (2.6) Multiple-issue plus static scheduling = VLIW (2.7) Multiple-issue, dynamic scheduling, and speculation (2.
Instruction-Level Parallelism and its Exploitation: PART 2 Hardware-based speculation (2.6) Multiple-issue plus static scheduling = VLIW (2.7) Multiple-issue, dynamic scheduling, and speculation (2.8)
More informationInstruction Pipelining Review
Instruction Pipelining Review Instruction pipelining is CPU implementation technique where multiple operations on a number of instructions are overlapped. An instruction execution pipeline involves a number
More informationEE482: Advanced Computer Organization Lecture #3 Processor Architecture Stanford University Monday, 8 May Branch Prediction
EE482: Advanced Computer Organization Lecture #3 Processor Architecture Stanford University Monday, 8 May 2000 Lecture #3: Wednesday, 5 April 2000 Lecturer: Mattan Erez Scribe: Mahesh Madhav Branch Prediction
More informationCPI < 1? How? What if dynamic branch prediction is wrong? Multiple issue processors: Speculative Tomasulo Processor
1 CPI < 1? How? From Single-Issue to: AKS Scalar Processors Multiple issue processors: VLIW (Very Long Instruction Word) Superscalar processors No ISA Support Needed ISA Support Needed 2 What if dynamic
More informationChapter 4 The Processor 1. Chapter 4B. The Processor
Chapter 4 The Processor 1 Chapter 4B The Processor Chapter 4 The Processor 2 Control Hazards Branch determines flow of control Fetching next instruction depends on branch outcome Pipeline can t always
More informationAdapted from David Patterson s slides on graduate computer architecture
Mei Yang Adapted from David Patterson s slides on graduate computer architecture Introduction Basic Compiler Techniques for Exposing ILP Advanced Branch Prediction Dynamic Scheduling Hardware-Based Speculation
More informationTDT 4260 lecture 7 spring semester 2015
1 TDT 4260 lecture 7 spring semester 2015 Lasse Natvig, The CARD group Dept. of computer & information science NTNU 2 Lecture overview Repetition Superscalar processor (out-of-order) Dependencies/forwarding
More informationThomas Polzer Institut für Technische Informatik
Thomas Polzer tpolzer@ecs.tuwien.ac.at Institut für Technische Informatik Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =
More informationCISC 662 Graduate Computer Architecture Lecture 11 - Hardware Speculation Branch Predictions
CISC 662 Graduate Computer Architecture Lecture 11 - Hardware Speculation Branch Predictions Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis6627 Powerpoint Lecture Notes from John Hennessy
More informationEIE/ENE 334 Microprocessors
EIE/ENE 334 Microprocessors Lecture 6: The Processor Week #06/07 : Dejwoot KHAWPARISUTH Adapted from Computer Organization and Design, 4 th Edition, Patterson & Hennessy, 2009, Elsevier (MK) http://webstaff.kmutt.ac.th/~dejwoot.kha/
More information5008: Computer Architecture
5008: Computer Architecture Chapter 2 Instruction-Level Parallelism and Its Exploitation CA Lecture05 - ILP (cwliu@twins.ee.nctu.edu.tw) 05-1 Review from Last Lecture Instruction Level Parallelism Leverage
More informationMultiple Instruction Issue. Superscalars
Multiple Instruction Issue Multiple instructions issued each cycle better performance increase instruction throughput decrease in CPI (below 1) greater hardware complexity, potentially longer wire lengths
More informationPerformance of Computer Systems. CSE 586 Computer Architecture. Review. ISA s (RISC, CISC, EPIC) Basic Pipeline Model.
Performance of Computer Systems CSE 586 Computer Architecture Review Jean-Loup Baer http://www.cs.washington.edu/education/courses/586/00sp Performance metrics Use (weighted) arithmetic means for execution
More informationCS433 Homework 2 (Chapter 3)
CS433 Homework 2 (Chapter 3) Assigned on 9/19/2017 Due in class on 10/5/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies
More informationComputer Architecture Spring 2016
Computer Architecture Spring 2016 Lecture 02: Introduction II Shuai Wang Department of Computer Science and Technology Nanjing University Pipeline Hazards Major hurdle to pipelining: hazards prevent the
More informationModule 5: "MIPS R10000: A Case Study" Lecture 9: "MIPS R10000: A Case Study" MIPS R A case study in modern microarchitecture.
Module 5: "MIPS R10000: A Case Study" Lecture 9: "MIPS R10000: A Case Study" MIPS R10000 A case study in modern microarchitecture Overview Stage 1: Fetch Stage 2: Decode/Rename Branch prediction Branch
More informationPerformance of tournament predictors In the last lecture, we saw the design of the tournament predictor used by the Alpha
Performance of tournament predictors In the last lecture, we saw the design of the tournament predictor used by the Alpha 21264. The Alpha s predictor is very successful. On the SPECfp 95 benchmarks, there
More informationCPI IPC. 1 - One At Best 1 - One At best. Multiple issue processors: VLIW (Very Long Instruction Word) Speculative Tomasulo Processor
Single-Issue Processor (AKA Scalar Processor) CPI IPC 1 - One At Best 1 - One At best 1 From Single-Issue to: AKS Scalar Processors CPI < 1? How? Multiple issue processors: VLIW (Very Long Instruction
More informationProcessor (IV) - advanced ILP. Hwansoo Han
Processor (IV) - advanced ILP Hwansoo Han Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel To increase ILP Deeper pipeline Less work per stage shorter clock cycle
More informationCOSC 6385 Computer Architecture - Pipelining
COSC 6385 Computer Architecture - Pipelining Fall 2006 Some of the slides are based on a lecture by David Culler, Instruction Set Architecture Relevant features for distinguishing ISA s Internal storage
More information