Computer Architecture Minas E. Spetsakis Dept. Of Computer Science and Engineering (Class notes based on Hennessy & Patterson)
What is Architecture? Instruction Set Design: the old definition, from back when computers had bit slices and microcode. Determine the high-level attributes that optimize the performance, cost, power consumption, etc., of a computer. Interact with compiler developers, other software and hardware developers, and manufacturers.
Instruction Set Architecture Class of ISA Register-Memory Load-Store Memory Addressing Byte addressing Aligned, non-aligned Addressing modes Register Immediate Displacement
Instruction Set Architecture Types and sizes of operands Byte Short (also Unicode characters) Integers and long integers Floating point (single, double, extended double) Signed, unsigned Operators Data transfer Arithmetic Multimedia
Instruction Set Architecture Control Flow Jumps, branches, procedure calls PC-relative addressing Encoding the ISA Fixed length Variable length VLIW
What Else is Architecture? Organization Choice and interconnection of subsystems Multi-core and hyperthreaded designs Interaction with peripherals Hardware Digital Logic Design Underlying technology
Trends Transistor count goes up by 40-55% per year About 35% due to density increase 10-20% due to the increase of the die size DRAM doubles every two years Disk 60-100% per year until 2004 30% since then Flash memory doubles every couple of years Network speed has tenfold jumps every few years
Bandwidth vs Latency Bandwidth: determines how long it takes to download a movie Latency: determines how long it takes to read a character from the disk/network Both matter for CPU-memory communication Floating point unit Network Audio, recording-playback Disk Graphics
Bandwidth vs Latency In the past 20 years Bandwidth improved 1000-2000X Latency improved 20-40X Bandwidth improves with Parallelism Clock speed Latency improves with Clock speed
Transistors and Wires Transistor density improves quadratically with feature size Transistor speed improves linearly with feature size Wire delays do not change much with feature size Length really depends on die size R*C changes little and speed of light is constant VLSI chips have a lot of wire!
Trends in Power Power is one of the hottest issues Cost of running server farms Cost of running a home computer Battery life Size and weight of heat sink Negative publicity
Trends in Power Dynamic power depends on capacitance, voltage and frequency: P_dynamic = (1/2) * C * V^2 * f
Trends in Power Voltage does not go down fast enough Went from 5V to 1V in twenty years Power consumption went from less than 1W to over 100W in the same period. Current leakage is another factor in power consumption.
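The dynamic-power relation can be checked with a short sketch. The capacitance, voltage, and frequency figures below are illustrative assumptions, not taken from any particular chip:

```python
def dynamic_power(capacitance, voltage, frequency):
    """Dynamic power dissipation: P = 1/2 * C * V^2 * f."""
    return 0.5 * capacitance * voltage ** 2 * frequency

# Halving the supply voltage cuts dynamic power to a quarter
# at the same capacitance and frequency (numbers are illustrative).
p_2v = dynamic_power(1e-9, 2.0, 1e9)   # ~2 W
p_1v = dynamic_power(1e-9, 1.0, 1e9)   # ~0.5 W
print(p_2v / p_1v)   # 4.0
```

The quadratic dependence on V is why lowering the supply voltage has been the main lever against power, and why its slow decline is a problem.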
Trends in Cost Learning Curve As time goes by yield increases Volume Economies of scale in purchasing and manufacturing Commodification Increased competition Expired patents
Cost of a Chip Cost of die, testing and packaging Divided by the final test yield Cost of a die is the cost of the wafer divided by the dies per wafer and the die yield. A chip can be rejected in multiple stages, increasing the cost Yield can affect the cost of a chip significantly.
Cost of a Chip The cost is proportional to its size (before testing and possibly rejection) The faults on the wafer are scattered randomly The probability to have a fault on the die increases with its size, so the yield drops The cost of a chip after testing increases much faster than its size (area)
Probabilistic Yield Yield (the probability of a die being good) is always positive and less than one. It depends on the expected number of defects per die (D * A, defect density times die area) and a parameter alpha which is a function of the complexity of the process: Y_die = (1 + D * A / alpha)^(-alpha)
Probabilistic Yield Every time the number of defects increases by one the yield goes down by 50-60% The moral of the story is that one should keep the die size small.
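The yield argument on these slides can be sketched in a few lines. The defect density, die areas, and alpha value below are illustrative assumptions, not data from a real process:

```python
def die_yield(defect_density, die_area, alpha):
    """Negative-binomial yield model: Y = (1 + D*A/alpha) ** -alpha,
    where D*A is the expected number of defects per die and alpha
    is a process-complexity parameter."""
    return (1.0 + defect_density * die_area / alpha) ** (-alpha)

# Illustrative process: 0.4 defects/cm^2, alpha = 4.
small = die_yield(0.4, 1.0, 4.0)   # ~0.68 for a 1 cm^2 die
large = die_yield(0.4, 2.0, 4.0)   # ~0.48 for a 2 cm^2 die

# Cost per good die scales as area / yield, so doubling the
# area much more than doubles the cost.
print((2.0 / large) / (1.0 / small))
```

This is the quantitative version of "keep the die size small": the cost per good die grows faster than the area.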
Fixed Costs Intellectual property Masks Can cost over a million dollars Fixed costs make small runs unprofitable FPGAs are popular mainly for this reason
Cost vs Price Apple vs Linux The price of a Linux based system is cost plus profit margin The price of an Apple system is cost plus profit plus brand premium Commodity items have no brand premium
Dependability Module reliability: Mean Time To Failure (MTTF) Failures In Time (FIT) Inverse of MTTF Usually quoted as failures per billion hours E.g. If MTTF is a million hours, FIT is 1000 failures per billion hours How is MTTF or FIT estimated? Wait a million hours to get MTTF? Wait a billion hours to get FIT? Ask a statistician
Dependability Estimate MTTF Put 10,000 disks in a room Run them for 1,000 hours (about a month and a half) Count the failures If you have 10 failures in 10 million disk-hours 1 failure every million hours of operation A million hours MTTF FIT of 1000 Hidden assumption? There are lies, damn lies and statistics Statistics is the art of deceiving less intelligent people.
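The disk-room arithmetic above amounts to a few lines, using the slide's numbers:

```python
# Back-of-envelope MTTF/FIT from an accelerated test:
# many units for a short time instead of one unit for a million hours.
disks = 10_000
hours = 1_000
failures = 10

device_hours = disks * hours          # 10 million disk-hours
mttf = device_hours / failures        # 1,000,000 hours
fit = failures * 1e9 / device_hours   # failures per billion device-hours
print(mttf, fit)                      # 1000000.0 1000.0
```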
Dependability The hidden (unstated) assumption is The probability distribution is exponential To be even more impressive: Disk failures are a Poisson process It means that the failure rate is independent of the age of the disks This is a strong assumption There are many failures in the first weeks Few failures in the next five years Failure rate increases after that
Dependability Mean Time To Repair (MTTR) Mean Time Between Failures (MTBF) MTBF = MTTF + MTTR Availability = MTTF/MTBF
Dependability Example 10 disks, million hours MTTF 1 power supply, 200,000 hours MTTF What is the overall MTTF? 1 disk has one millionth of a failure per hour 10 disks have 10 millionths of a failure per hour 1 PS has five millionths of a failure per hour System has 15 millionths of a failure per hour Overall MTTF is 66,666 hours (about 7.6 years)
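The rate-adding argument above (valid under the exponential/Poisson assumption from the earlier slides) can be written as a small helper:

```python
# System MTTF from component MTTFs, assuming independent components
# with exponentially distributed lifetimes, so failure rates add.
def system_mttf(mttfs):
    """mttfs: per-component MTTFs in hours."""
    total_rate = sum(1.0 / m for m in mttfs)
    return 1.0 / total_rate

# The slide's system: 10 disks at 1M hours, 1 power supply at 200k hours.
mttf = system_mttf([1_000_000] * 10 + [200_000])
print(round(mttf))   # 66667 hours, roughly 7.6 years
```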
Performance Performance is the inverse of execution time We care about both response time and throughput We care about actual computation time or tasks per unit time Various indicators mean nothing to professionals Forget MIPS, GHz, etc. Do not pay more for multimedia extensions (useless unless your movie editor knows about them)
Measuring Performance Performance is central to architecture Yet, hard to measure! We need a good definition The definition depends on your point of view. Consider this: A web server administrator sleeps peacefully knowing that 90% of the time his servers are lightly loaded Unfortunately, light load means few users, and few users means few people witness his great system at its best
Measuring Performance How to deal with I/O It affects response time But if it overlaps with CPU, it does not affect throughput, and maybe not even response time
Measuring Performance We need to test our systems with representative software There is no ideal benchmark Even if one appeared, it would soon be defeated: vendors optimize hardware/compiler for the benchmark Existing benchmarks are Chimerical Fake Toy Make believe Made up
Benchmarks Real applications Chimerical: portability problems, a multitude of versions, too many of them Modified applications An honest attempt to fake reality Kernels OK to test particular features Toy benchmarks Good choice of name Synthetic benchmarks Their unrealistic nature makes them easy to defeat
Benchmark Suites SPEC is the de facto standard Includes many suites targeting desktops, servers, etc. Most programs in the suite are dropped or modified with every edition of the standard There is always discussion on how to weight the results
Quantitative Performance Despite the problems, numbers are the king Basic principle: Make the common case fast Amdahl's Law Takes into account two factors The fraction of the affected computation The enhancement of the affected computation Does not take into account many details
Pitfalls of Amdahl's Law Does not account for overlaps in computation If FP happened mostly during delays in memory If the improvement leads to changes in style of optimization (re-do the computation vs. save and retrieve) Remains a good guide if used properly
Amdahl's Law Gives the new execution time given the old, the fraction of the computation that is enhanced and the speedup of the enhanced computation: T_new = T_old * ((1 - F_enh) + F_enh / S_enh)
Amdahl's law Common sense Very frequently ignored. Renders many grandiose architectures futile.
Example We speed up the FP operations tenfold. In our benchmark FP accounts for 40% of execution time. S_overall = 1 / ((1 - 0.4) + 0.4/10) = 1 / 0.64 = 1.56
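Amdahl's Law and the FP example can be checked with a short function, using the slide's numbers:

```python
def amdahl_speedup(f_enh, s_enh):
    """Overall speedup when a fraction f_enh of execution time
    is sped up by a factor s_enh (Amdahl's Law)."""
    return 1.0 / ((1.0 - f_enh) + f_enh / s_enh)

# FP is 40% of execution time and gets 10x faster:
print(round(amdahl_speedup(0.4, 10.0), 2))   # 1.56
```

Note that even an infinite FP speedup would only give 1/0.6 = 1.67: the unenhanced fraction bounds the overall gain.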
Example A system has 5 disks with 1 million hours MTTF and a power supply with 100,000 hours MTTF. What is the improvement if we make the disks 10 times more reliable? The failure rate before the improvement was 15 millionths of a failure per hour. We reduced it to 10.5 millionths Improvement 15/10.5 = 1.43, about 43%
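The failure-rate arithmetic in this example, again assuming independent components with exponentially distributed failures so that rates add:

```python
# Failure rates (per hour) add under the exponential assumption.
before = 5 * (1 / 1_000_000) + 1 / 100_000     # 15 per million hours
after = 5 * (1 / 10_000_000) + 1 / 100_000     # 10.5 per million hours
improvement = before / after
print(round(improvement, 3))   # 1.429, i.e. about 43% more reliable
```

This is Amdahl's Law in disguise: the power supply is the unimproved fraction, so making the disks even infinitely reliable would only improve the system by 15/10 = 1.5x.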
Performance indices CPI (Clock cycles Per Instruction) IPC (Instructions Per Cycle) One is the inverse of the other IPC is more convenient for superscalar processors.
Performance Equation Instructions per program TIMES Clock cycles per instruction TIMES Seconds per Clock cycle EQUALS Seconds per program.
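The performance equation can be expressed directly; the instruction count, CPI, and clock rate below are made-up illustrative values:

```python
# Seconds per program = instructions * CPI * seconds per clock cycle.
def exec_time(instructions, cpi, clock_hz):
    return instructions * cpi / clock_hz

# Illustrative: 1 billion instructions, CPI of 1.5, 2 GHz clock.
print(exec_time(1e9, 1.5, 2e9))   # 0.75 seconds
```

The equation separates the three levers: the compiler/ISA set the instruction count, the microarchitecture sets CPI, and the technology and pipeline depth set the clock period.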
Fallacies and Pitfalls Amdahl's Law Do not include a golden link in a bronze chain The cost of the processor dominates the cost of the system For a high-end system it is about a quarter For economy systems it is less than 10% Benchmarks remain valid No, vendors find ways around them Disks have 1 million hrs MTTF, about a century No, the failure distribution (pdf) is not exponential.
Fallacies and Pitfalls Peak performance is a good indicator Yes, sure Too much fault detection gives you trouble If you never use the FP unit, who cares if it is broken Balance the cost of checking against benefit of early detection Beware of screening tests