The Power Wall. Why Aren't Modern CPUs Faster? What Happened in the Late 1990s?


Edward L. Bosworth, Ph.D., Associate Professor, TSYS School of Computer Science, Columbus State University, Columbus, Georgia 31907. January 2010

What is the Problem?

The problem is illustrated in this figure, taken from Patterson & Hennessy. The SPECint performance of the hottest chip grew by 52% per year from 1986 to 2002, and then grew by only 20% in total over the next three years (about 6% per year).
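The growth figures quoted above can be checked with a few lines of arithmetic. The sketch below assumes simple compound growth; the function names are illustrative and do not come from Patterson & Hennessy.

```python
def annual_rate(total_factor, years):
    """Compound annual growth rate implied by a total growth factor."""
    return total_factor ** (1.0 / years) - 1.0


def cumulative_factor(rate, years):
    """Total growth factor after compounding `rate` for `years` years."""
    return (1.0 + rate) ** years


# 20% total growth spread over three years (figure quoted in the text).
slow_rate = annual_rate(1.20, 3)

# 52% per year sustained from 1986 to 2002 (16 years).
fast_total = cumulative_factor(0.52, 2002 - 1986)

# What three more years at 52% per year would have delivered.
three_years_at_old_pace = cumulative_factor(0.52, 3)

print(f"Implied annual rate after 2002: {slow_rate:.1%}")
print(f"Cumulative gain 1986-2002:      {fast_total:.0f}x")
print(f"Three years at the old pace:    {three_years_at_old_pace:.2f}x (actual: 1.20x)")
```

Under these assumptions, 20% over three years works out to a little over 6% per year, which agrees with the figure on the slide.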

Here is a Clue to the Problem

The problem is now called the "Power Wall". It is illustrated in this figure, taken from Patterson & Hennessy. The design goal for the late 1990s and early 2000s was to drive the clock rate up. This was done by packing more transistors onto a smaller chip. Unfortunately, this increased the power dissipation of the CPU chip beyond the capacity of inexpensive cooling techniques.

Roadmap for CPU Clock Speed: Circa 2005

Here is the result of the best thought in 2005: by 2015, the clock speed of the top "hot chip" would be in the 12 to 15 GHz range.

The CPU Clock Speed Roadmap (A Few Revisions Later)

This reflects the practical experience gained with dense chips that were literally "hot"; they radiated considerable thermal power and were difficult to cool. Law of Physics: All electrical power consumed is eventually dissipated as heat.

Summary of What Follows

There are two solutions to the problem of the Power Wall. We explore each.

1. The technique taken by IBM for its large z/10 and z/11 servers was to include sophisticated and costly cooling technologies. This allowed CPU clock rates in excess of 5.0 GHz. The commercial products today use 4.67 GHz CPU chips.

2. The technique taken by commodity processor providers, such as Intel and AMD, was to move to a multicore design, in which each CPU chip contains a moderate number of processing units. Each of these units might be called a "CPU" or "processor", except that the design calls for them to be on a single chip, still called the "CPU". For that reason these units are called "cores", and CPU chips with multiple processors per chip are called "multicore". The term "multiprocessor microprocessor" just does not work. Due to cost constraints, these chips are cooled by simple fans and all-metal heat sinks.

The IBM Power 6 CPU

This is the CPU used in the large IBM mainframes. It has 790 million transistors on a chip of area 341 square millimeters. In the z/10, the chip runs at 4.67 GHz. Lab prototypes have run at 6.0 GHz. The Power 595 configuration of the z/10 uses between 16 and 64 of the Power 6 chips, each running at 5.0 GHz. Here is a picture of the Power 6 chip and a typical module for its mounting.

IBM Cooling Technology

While most chip manufacturers target commodity computers that cannot be fitted with expensive cooling, IBM is targeting the mainframe community. The IBM Power 6 CPU is generally placed in water-cooled units. Copper tubing feeds cold water to cooling units in direct contact with the CPU chips. Each CPU chip is laid out to avoid hot spots. Some engineers are using the warm water from the computer to heat buildings.

Cooling a Faster Single Core CPU

We have seen how IBM does it: they provide a costly water cooling system. Could Intel and AMD duplicate this design in a market that will not allow the expense and complexity of a water cooling system? Here are two options for air cooling of a commercial CPU chip: the Akasa Copper Heatsink and the Mugen 2 Cooler. A Google search for "Computer Cooling Radiators" shows a brisk market in water cooling units for commodity CPU chips.

Making a Faster Single Core CPU

The standard way is to make a more sophisticated pipeline. This requires more transistors per chip, but that is not a problem. In the pipeline illustrated here, the code is executed top to bottom. Each instruction is being handled by a distinct stage of the pipeline, so the CPU is executing five instructions at once. BUT: More transistors mean more power, thus more heat. We can also raise the CPU clock rate, but this causes more heat as well.
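The idea of five instructions in flight at once can be sketched as a small simulation of an ideal pipeline. The stage names below (IF, ID, EX, MEM, WB) are an assumption, borrowed from the classic five-stage design in Patterson & Hennessy; the slide itself does not name the stages, and stalls and hazards are ignored.

```python
# Assumed stage names for a classic five-stage pipeline (not given on the slide).
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_schedule(num_instructions, stages=STAGES):
    """For each clock cycle, map instruction index -> stage it occupies.

    Ideal pipeline: one instruction enters per cycle, no stalls.
    """
    total_cycles = num_instructions + len(stages) - 1
    schedule = []
    for cycle in range(total_cycles):
        in_flight = {}
        for instr in range(num_instructions):
            stage_index = cycle - instr  # instruction i enters at cycle i
            if 0 <= stage_index < len(stages):
                in_flight[instr] = stages[stage_index]
        schedule.append(in_flight)
    return schedule


schedule = pipeline_schedule(8)
for cycle, in_flight in enumerate(schedule, start=1):
    row = ", ".join(f"I{i}:{stage}" for i, stage in in_flight.items())
    print(f"cycle {cycle:2d}: {row}")

max_in_flight = max(len(in_flight) for in_flight in schedule)
print("most instructions in flight at once:", max_in_flight)
```

Once the pipeline has filled, every cycle shows one instruction in each stage, which is the sense in which the CPU executes five instructions at once.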

The Power Equation

Each CPU is nothing more than a collection of transistors, acting as switches. A modern CPU might have about a billion transistors on a single chip. The IBM Power 6 chip has 790 million transistors in an area of 341 mm². The power equation for a single transistor can be written as

$$\text{Power} = K \cdot (\text{Capacitive Load}) \cdot (\text{Voltage})^2 \cdot (\text{Frequency Switched})$$

To keep the power the same, one should halve the voltage for every fourfold increase in speed. Older chips (Intel 8086) ran at 5.0 volts; the newer ones run at about 1.5 volts. Such a new chip, at the same clock frequency and capacitive load, would emit $(1.5/5.0)^2 = (0.3)^2 = 0.09 \approx 10\%$ of the power of the older chip. This would seem to allow a chip that is ten times faster (in clock speed) at the same power.

However, faster clock rates sometimes demand higher voltages. Also, higher voltages mean less trouble due to random noise. A random signal of 0.2 volts might disrupt a chip running at 1.0 volts, but not one running at 5.0 volts.
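The arithmetic on this slide can be checked with a short script. This is a minimal sketch: the constant K and the capacitive load are set to 1.0 as placeholders, since only ratios matter here, and the function name is illustrative.

```python
def dynamic_power(cap_load, voltage, frequency, k=1.0):
    """Power = K * (capacitive load) * (voltage)^2 * (frequency switched)."""
    return k * cap_load * voltage ** 2 * frequency


# Same capacitive load and clock frequency, different supply voltages.
old_chip = dynamic_power(cap_load=1.0, voltage=5.0, frequency=1.0)
new_chip = dynamic_power(cap_load=1.0, voltage=1.5, frequency=1.0)
power_ratio = new_chip / old_chip  # (1.5 / 5.0)^2, about 0.09

# Clock-speed headroom at equal power: how much faster the 1.5 V chip
# could be clocked before it matches the power of the 5.0 V chip.
frequency_headroom = old_chip / new_chip  # about 11x

# Halving the voltage offsets a fourfold increase in frequency.
four_times_faster = dynamic_power(cap_load=1.0, voltage=2.5, frequency=4.0)

print(f"power ratio at equal frequency: {power_ratio:.2f}")
print(f"frequency headroom at equal power: {frequency_headroom:.1f}x")
print(f"5.0 V at 1x vs 2.5 V at 4x: {old_chip:.2f} vs {four_times_faster:.2f}")
```

The headroom figure ignores the caveats in the text: real chips may need a higher voltage to run reliably at a higher clock rate.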

The Intel Prescott: The End of the Line

The CPU chip code named "Prescott" by Intel appears to be the high point in the actual clock rate. The fastest mass-produced chip ran at 3.8 GHz, though some enthusiasts (called "overclockers") actually ran the chip at 8.0 GHz. Upon release, this chip was thought to generate about 40% more heat per clock cycle than earlier variants. This gave rise to the name "PresHot".

The Prescott was a later model in the architecture that Intel called "NetBurst", which was intended to be scaled up eventually to ten gigahertz. The heat problems could never be handled, and Intel abandoned the architecture.

The following are adapted from a review of the Prescott by Sander Sassen. The Prescott idled at 50 degrees Celsius (122 degrees Fahrenheit). The only way to keep it below 60 Celsius (140 F) was to operate it with the cover off and plenty of ventilation. Even equipped with the massive Akasa King Copper heat sink (see a previous slide), the system reached 77 Celsius (171 F) when operating at 3.8 GHz under full load and shut itself down.

Multicore Chips: The Start of a New Line

Rather than continuing to improve single program performance, many commercial chip manufacturers have adopted a "server mentality": increase the throughput of a number of programs running concurrently.

We shall study parallel processing later. At that time, we shall note that the difficulty lies in keeping all processors doing productive work. The division of a single problem among a large number of processors, or the use of a large number of processors for cooperating tasks, is difficult.

Recall that a multicore chip is just a CPU chip with multiple processors. In a server, especially a large one such as the IBM z/10, there are a large number of independent processes that do not need to intercommunicate. Allocation of processors (cores) to such a job mix is almost trivial.

Question: Compare a single processor operating at 4 GHz to a dual core processor with each core operating at 2 GHz. The dual core processor is likely to consume less power, but can it do the same amount of work per unit time as the faster single core processor?
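One way to explore the question above is a rough numerical sketch. Everything in it is an assumption rather than something stated on the slides: the power model is Power = K × C × V² × f with K and C set to 1.0, the 2 GHz cores are assumed to run at 80% of the supply voltage of the 4 GHz core, work per core is assumed proportional to clock rate, and Amdahl's law is used to model how much of the workload can be split across the two cores.

```python
def dynamic_power(cap_load, voltage, frequency, k=1.0):
    """Power = K * (capacitive load) * (voltage)^2 * (frequency switched)."""
    return k * cap_load * voltage ** 2 * frequency


def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup over one core when only part of the work is parallel."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


# Hypothetical voltage scaling: slower cores tolerate a lower supply voltage.
ASSUMED_LOW_VOLTAGE = 0.8  # relative to the 4 GHz core's 1.0

single_core_power = dynamic_power(1.0, voltage=1.0, frequency=4.0)
dual_core_power = 2 * dynamic_power(1.0, voltage=ASSUMED_LOW_VOLTAGE, frequency=2.0)


def dual_core_relative_throughput(parallel_fraction):
    """Throughput of two 2 GHz cores relative to one 4 GHz core."""
    per_core_speed = 2.0 / 4.0  # each core is half as fast as the 4 GHz core
    return per_core_speed * amdahl_speedup(parallel_fraction, cores=2)


print(f"dual-core power / single-core power: {dual_core_power / single_core_power:.2f}")
for p in (0.0, 0.5, 0.9, 1.0):
    print(f"parallel fraction {p:.1f}: relative throughput "
          f"{dual_core_relative_throughput(p):.2f}")
```

Under these assumptions the dual core chip uses less power, but it matches the single 4 GHz core only when the workload divides perfectly between the two cores, such as a server running many independent processes.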

Intel's Multicore Chip Offerings for 2010

For 2010, Intel Corporation has released a new series of multicore processors. Here is an Intel Corp. overview of this series. All of these seem to be quad core.

Intel's Rationale

According to Intel, multicore technology will permanently alter the course of computing as we know it, provide new levels of energy-efficient performance, deliver full parallel execution of multiple software threads, and reduce the amount of electrical power needed to do the computations.

The current technology provides for one, two, four, or eight cores in a single processor. Intel expects to have available soon single processors with several tens of cores, if not one hundred. This new technology seems to be targeted at the commercial desktop machine, which can run several demanding modern applications at once.

At present, there is little hard data on multicore machines. What we have mostly is marketing hype. That might change soon.

References

Slide 1 is the title slide.

Slides 2 and 3 are adaptations of Figures 1.15 and 1.16 found in the textbook Computer Organization and Design, Fourth Edition, by David A. Patterson and John L. Hennessy, Morgan Kaufmann, 2009, ISBN 978-0-12-374493-7.

Slides 4 and 5 were taken from slides 7 and 8 of a presentation by Katherine Yelick, NERSC Division Director (U.C. Berkeley and Lawrence Berkeley National Laboratory). The second slide also appeared in a late 2009 issue of the Communications of the ACM.

Slide 6 is a summary, with few quotes.

Slides 7 and 8 are based on Google searches, especially for IBM Power 6, which is the CPU chip for the IBM z/10 and (soon to be released) z/11. While visiting the zSeries manufacturing plant in Poughkeepsie, NY, I was told that the CPU for the z/11 would be run at a speed above 5 GHz. Slide 7 quotes details on the IBM Power 6 chip from the paper "IBM Power6 Microarchitecture", IBM Journal Res. & Dev., Vol. 51, No. 6, November 2007.

Slide 9 shows two commercial cooling units for commodity CPUs. The unit on the left is an Akasa Copper Heatsink (www.akasa.com.tw/). The unit on the right is a Mugen 2 Cooler (see http://www.scythe-eu.com/en/products/cpu-cooler/mugen-cpu-cooler.html).

Much of the information on the Intel Prescott chip comes from the Wikipedia article on Pentium 4. Some material is taken from a review by Sander Sassen, found at http://www.hardwareanalysis.com/content/article/1693/.

Slide 15 is built on several sources, including the Intel White Paper "Extending the World's Most Popular Processor Architecture" and the Intel web site http://www.intel.com/multi-core/index.htm