TDT4255 Computer Design. Lecture 1. Magnus Jahre

Similar documents
Chapter 1. Computer Abstractions and Technology. Adapted by Paulo Lopes, IST

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 1. Computer Abstractions and Technology

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 1. Computer Abstractions and Technology

The Computer Revolution. Classes of Computers. Chapter 1

Chapter 1. and Technology

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 1. Computer Abstractions and Technology

Chapter 1. The Computer Revolution

EECS2021E EECS2021E. The Computer Revolution. Morgan Kaufmann Publishers September 12, Chapter 1 Computer Abstractions and Technology 1

Rechnerstrukturen

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture

Chapter 1. Computer Abstractions and Technology

Response Time and Throughput

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

CSE2021 Computer Organization. Computer Abstractions and Technology

Chapter 1. Computer Abstractions and Technology. Lesson 2: Understanding Performance

EECS2021. EECS2021 Computer Organization. EECS2021 Computer Organization. Morgan Kaufmann Publishers September 14, 2016

EIE/ENE 334 Microprocessors

The Computer Revolution. Chapter 1. The Processor Market. Classes of Computers. Morgan Kaufmann Publishers August 28, 2013

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Performance, Power, Die Yield. CS301 Prof Szajda

COMPUTER ARCHITECTURE AND OPERATING SYSTEMS (CS31702)

COMPUTER ORGANIZATION AND DESIGN

Outline Marquette University

Chapter 1. Computer Abstractions and Technology. Jiang Jiang

Performance. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Defining Performance. Performance. Which airplane has the best performance? Boeing 777. Boeing 777. Boeing 747. Boeing 747

CS3350B Computer Architecture CPU Performance and Profiling

Computer Organization & Assembly Language Programming (CSE 2312)

Defining Performance. Performance 1. Which airplane has the best performance? Computer Organization II Ribbens & McQuain.

Syllabus. Chapter 1. Course Goals. Course Information. Course Goals. Course Information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition. Chapter 1. Computer Abstractions and Technology

An Introduction to Parallel Architectures

Chapter 1. Computer Abstractions and Technology

Thomas Polzer Institut für Technische Informatik

CSE2021 Computer Organization

CSE2021 Computer Organization

CSCI 402: Computer Architectures. Computer Abstractions and Technology (4) Fengguang Song Department of Computer & Information Science IUPUI.

Computer Organization and Design: The Hardware/Software Interface by Patterson and Hennessy

Computer Architecture. Fall Dongkun Shin, SKKU

ECE 154A. Architecture. Dmitri Strukov

1.6 Computer Performance

Microarchitecture Overview. Performance

Mestrado em Informática

CS3350B Computer Architecture. Introduction

Performance evaluation. Performance evaluation. CS/COE0447: Computer Organization. It s an everyday process

Computer Performance. Reread Chapter Quiz on Friday. Study Session Wed Night FB 009, 5pm-6:30pm

Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding effects of underlying architecture

TDT 4260 lecture 2 spring semester 2015

IC220 Slide Set #5B: Performance (Chapter 1: 1.6, )

Overview of Today s Lecture: Cost & Price, Performance { 1+ Administrative Matters Finish Lecture1 Cost and Price Add/Drop - See me after class

UCB CS61C : Machine Structures

Transistors and Wires

COMPUTER ORGANIZATION AND. Editio. The Hardware/Software Interface. Chapter 1. Computer Abstractions and Technology

CS 110 Computer Architecture

ECE369: Fundamentals of Computer Architecture

ECE331: Hardware Organization and Design

5DV118 Computer Organization and Architecture Umeå University Department of Computing Science Stephen J. Hegner. Topic 1: Introduction

EECS4201 Computer Architecture

PERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

What is Good Performance. Benchmark at Home and Office. Benchmark at Home and Office. Program with 2 threads Home program.

Performance COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals

Microarchitecture Overview. Performance

Fundamentals of Quantitative Design and Analysis

Lecture 2: Computer Performance. Assist.Prof.Dr. Gürhan Küçük Advanced Computer Architectures CSE 533

Computer Architecture

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 1. Copyright 2012, Elsevier Inc. All rights reserved. Computer Technology

CS 61C: Great Ideas in Computer Architecture Performance and Floating Point Arithmetic

ECE 486/586. Computer Architecture. Lecture # 3

Advanced Computer Architecture (CS620)

INSTRUCTION LEVEL PARALLELISM

Three major components of Embedded Systems

UCB CS61C : Machine Structures

Designing for Performance. Patrick Happ Raul Feitosa

Chapter 1. Instructor: Josep Torrellas CS433. Copyright Josep Torrellas 1999, 2001, 2002,

LECTURE 1. Introduction

EE282 Computer Architecture. Lecture 1: What is Computer Architecture?

ECE C61 Computer Architecture Lecture 2 performance. Prof. Alok N. Choudhary.

Instructor Information

Copyright 2012, Elsevier Inc. All rights reserved.

Course web site: teaching/courses/car. Piazza discussion forum:

Lecture 3: Evaluating Computer Architectures. How to design something:

CPE300: Digital System Architecture and Design

Lec 25: Parallel Processors. Announcements

CS 152 Computer Architecture and Engineering

Computer Architecture. Minas E. Spetsakis Dept. Of Computer Science and Engineering (Class notes based on Hennessy & Patterson)

CS 61C: Great Ideas in Computer Architecture Performance and Floating-Point Arithmetic

CpE 442 Introduction to Computer Architecture. The Role of Performance

CS Computer Architecture Spring Lecture 01: Introduction

Outline. What is Performance? Restating Performance Equation Time = Seconds. CPU Performance Factors

CS 61C: Great Ideas in Computer Architecture Performance and Floating-Point Arithmetic

EITF20: Computer Architecture Part1.1.1: Introduction

CS61C Machine Structures. Lecture 1 Introduction. 8/27/2006 John Wawrzynek (Warzneck)

The Von Neumann Computer Model

Engineering 9859 CoE Fundamentals Computer Architecture

Lecture 2: Performance

How What When Why CSC3501 FALL07 CSC3501 FALL07. Louisiana State University 1- Introduction - 1. Louisiana State University 1- Introduction - 2

ELE 455/555 Computer System Engineering. Section 1 Review and Foundations Class 5 Computer System Performance

Computer Architecture!

Computer Architecture!

Fundamentals of Computers Design

Transcription:

1 TDT4255 Computer Design Lecture 1 Magnus Jahre

2 Outline Practical course information Chapter 1: Computer Abstractions and Technology

3 Practical Course Information

4 TDT4255 Computer Design TDT4255 Computer Design is a thorough review of scalar processor core design: Detailed discussion of single cycle, multi-cycle and pipelined CPUs Introduction to advanced techniques like out-of-order execution, branch prediction and speculative execution Hands-on approach Exam counts 50% and exercises counts 50% towards the final grade Complementary to TDT4260 Computer Architecture TDT4255 goes through the scalar processor core in detail TDT4260 treats t the scalar processor core as a black box and covers the rest of the computer system

5 Course Staff Lecturer/Coordinator Magnus Jahre Associate Professor, Computer Architecture and Design Group (CARD), IDI, NTNU PhD 2010: Managing Shared Resources in Multicore Memory Systems Lecturer Ian Bratt 20% University Lecturer, CARD, IDI, NTNU Has designed processors for Tilera, will start at ARM this fall Teaching Assistant (TA) Vinay Kumar Gautam PhD student t at the Intelligent t Systems group at IDI

6 Preliminary Reading List Patterson and Hennessy, Computer Organization and Design The Hardware/Software Interface, 4th Edition: Chapter 1 Chapter 2 Chapter 3 Chapter 4 Chapter 5.1 5.5 Appendix C Appendix D Hennessy and Patterson, Computer Architecture A Quantitative Approach, 4th Edition: Chapter 2 (will be provided) All exercises and provided exercise material

7 Exercises Three exercises Exercise 1: Design you own ALU (counts 10% of final grade) Exercise 2: Design you own multi-cycle CPU (counts 20% of final grade) Exercise 3: Design you own pipelined CPU (counts 20% of final grade) TA will introduce the exercises in detail in the first exercise lecture

8 Information Distribution Course web page http://www.idi.ntnu.no/emner/tdt4255/ no/emner/tdt4255/ Most of the information will be available here It s Learning Broadcast messages to all students Remember: set up e-mail notification of news! Information that should not be publicly available (e.g. supplementary files for exercises) Delivery of exercises

9 Preliminary Lecture Plan See PDF on the course web page (shown in class) Lectures will be on Wednesdays and Fridays Note: Irregular schedule

10 Chapter 1 Computer Abstractions and Technology Acknowledgement: Slides are adapted from Morgan Kaufmann companion material

11 The Computer Revolution Progress in computer technology Underpinned by Moore s Law Makes novel applications feasible Computers in automobiles Cell phones Human genome project World Wide Web Search Engines Computers are pervasive

12 Classes of Computers Desktop computers General purpose, variety of software Subject to cost/performance tradeoff Server computers Network based High capacity, performance, reliability Range from small servers to building sized Embedded computers Hidden as components of systems Stringent power/performance/cost constraints

13 The Processor Market ed n Units Shippe Million

14 Understanding Performance Algorithm Determines number of operations executed Programming language, compiler, architecture Determine number of machine instructions executed per operation Processor and memory system Determine how fast instructions are executed I/O system (including OS) Determines how fast I/O operations are executed

15 Chapter 1.2 Below Your Program

16 Below Your Program Application software Written in high-level language System software Compiler: translates HLL code to machine code Operating System: service code Handling input/output Managing memory and storage Scheduling tasks & sharing resources Hardware Processor, memory, I/O controllers

17 Levels of Program Code High-level language Level of abstraction closer to problem domain Provides for productivity and portability Assembly language Textual representation of instructions Hardware representation Binary digits (bits) Encoded instructions and data

18 Chapter 1.3 Under the Covers

19 Components of a Computer Same components for all kinds of computer Desktop, server, embedded Input/output t t includes User-interface devices Display, keyboard, mouse Storage devices Hard disk, CD/DVD, flash Network adapters For communicating with other computers

20 Inside the Processor (CPU) Datapath: performs operations on data Control: sequences datapath, memory,... Cache memory Small fast SRAM memory for immediate access to data

21 Inside the Processor AMD Barcelona: 4 processor cores

22 Abstractions Abstraction helps us deal with complexity Hide lower-level detail Instruction set architecture (ISA) The hardware/software interface Application binary interface The ISA plus system software interface Implementation The underlying details and interface

23 Technology Trends Electronics technology continues to evolve Increased capacity and performance Reduced cost DRAM capacity Year Technology Relative performance/cost 1951 Vacuum tube 1 1965 Transistor 35 1975 Integrated circuit (IC) 900 1995 Very large scale IC (VLSI) 2,400,000 2005 Ultra large scale IC 6,200,000,000 000 000

24 Chapter 1.4 Performance

25 Defining Performance Which airplane has the best performance? Boeing 777 Boeing 777 Boeing 747 BAC/Sud Concorde Douglas DC-8-50 Boeing 747 BAC/Sud Concorde Douglas DC- 8-50 0 100 200 300 400 500 Passenger Capacity 0 2000 4000 6000 8000 10000 Cruising Range (miles) Boeing 777 Boeing 777 Boeing 747 BAC/Sud Concorde Douglas DC-8-50 Boeing 747 BAC/Sud Concorde Douglas DC- 8-50 0 500 1000 1500 Cruising Speed (mph) 0 100000 200000 300000 400000 Passengers x mph

26 Response Time Book definition: Time from issuing a command to its completion This is often referred to as the turn-around dtime More common response time definition: Time from issue to first response Execution time is the time the processor is busy execution the program Turn-around time includes the time the process waits to be executed, execution time does not Also: user execution time vs. system execution time

27 Response Time and Throughput Throughput Total work done per unit time How are response time and throughput affected by Replacing the processor with a faster version? Adding more processors?

28 Relative Performance Define Performance = 1/Execution Time X is n time faster than Y Performance X Execution time Performance Y Y Execution time X n Example: time taken to run a program 10s on A, 15s on B Execution Time B / Execution Time A = 15s / 10s = 1.5 So A is 1.5 times faster than B

29 Measuring Execution Time Elapsed time/wall clock time Total turn-around time, including all aspects Processing, I/O, OS overhead, idle time Determines system performance CPU time Time spent processing a given job Discounts I/O time, other jobs shares Comprised of user CPU time and system CPU time Different programs are affected differently by CPU and system performance

30 CPU Clocking Operation of digital hardware governed by a constant-rate t t clock Clock (cycles) Data transfer and computation Update state Clock period Clock period: duration of a clock cycle e.g., 250ps = 0.25ns = 250 10 12 s Clock frequency (rate): cycles per second e.g., 4.0GHz = 4000MHz = 4.0 10 9 Hz

31 CPU Time CPU Time CPU Clock Cycles Clock Cycle Time CPU Clock Cycles Clock Rate Performance improved by Reducing number of clock cycles Increasing clock rate Hardware designer must often trade off clock rate against cycle count

32 CPU Time Example Computer A: 2GHz clock, 10s CPU time on benchmark X Designing i Computer B Design Goal: 6s CPU time on benchmark X Faster clock causes 20% increase in the number of clock cycles How fast must Computer B clock be? Clock Rate B Clock Cycles CPU Time B B 12 1.2 Clock Cycles 6s A Clock Cycles A CPU Time A Clock Rate A 10s 2GHz 20 10 9 Clock Rate B 1.22010 6s 9 2410 6s 9 4GHz

33 Instruction Count and CPI Clock Cycles Instruction Count Cycles per Instruction CPU Time Instruction Count CPI Clock Cycle Time Instruction Count CPI Clock Rate Instruction Count for a program Determined by program, ISA and compiler Average cycles per instruction Determined by CPU hardware If different instructions have different CPI Average CPI affected by instruction mix Under which conditions is it valid to compare CPIs directly?

34 CPI Example Computer A: Cycle Time = 250ps, CPI = 2.0 Computer B: Cycle Time = 500ps, CPI = 1.2 Same ISA, program and compiler Which is faster, and by how much? CPU Time A CPU Time B CPU Time B CPU Time A Instruction Count CPI Cycle Time A A I 2.0 250ps I500ps Instruction Count CPI B I1.2500ps I 600ps I 600ps 1.2 I500ps Cycle Time B A is faster by this much

35 CPI in More Detail If different instruction classes take different numbers of cycles Clock Cycles n i1 (CPIi Instruction Counti) Weighted average CPI CPI Clock Cycles Instruction Count n i 1 CPI i Instruction Counti Instruction Count Relative frequency

36 CPI Example Alternative compiled code sequences using instructions in classes A, B, C Class Cass A B C CPI for class 1 2 3 IC in sequence 1 2 1 2 IC in sequence 2 4 1 1 Sequence 1: IC = 5 Sequence 2: IC = 6 Clock Cycles Clock Cycles =2 1 +1 2 +2 3 =4 1 +1 2 +1 3 = 10 = 9 Avg. CPI = 10/5 = 2.0 Avg. CPI = 9/6 = 1.5

37 Performance Summary CPU Time Instructions Program Clock cycles Instruction Seconds Clock cycle Performance depends on Algorithm: affects IC, possibly CPI Programming language: affects IC, CPI Compiler: affects IC, CPI Instruction set architecture: affects IC, CPI, Clock cycle time

38 Chapter 1.5 The Power Wall

39 Power Trends In CMOS IC technology Power Capacitive load Voltage 2 Frequency 30 5V 1V 1000

40 Reducing Power Suppose a new CPU has 85% of capacitive load of old CPU 15% voltage and 15% frequency reduction 2 Pnew Cold 0.85(Vold 0.85) Fold 0.85 4 0.85 2 P C V F old old od old od old od 0.52 The power wall: We can t reduce voltage further We can t remove more heat How else can we improve performance?

41 Chapter 1.6 Uniprocessors to Multiprocessors

42 Uniprocessor Performance Constrained by power, instruction-level parallelism, memory latency

43 Multiprocessors Multicore microprocessors More than one processor per chip Covered in detail in TDT4260 Computer Architecture Requires explicitly parallel programming Compare with Instruction Level Parallelism (ILP) Hardware executes multiple instructions at once Hidden from the programmer Hard to do Programming for performance Load balancing Optimizing communication and synchronization

44 Chapter 1.7 Manufacturing and Benchmarking the AMD Opteron X4

45 Manufacturing ICs Yield: proportion of working dies per wafer

46 AMD Opteron X2 Wafer X2: 300mm wafer, 117 chips, 90nm technology X4: 45nm technology

47 Integrated Circuit Cost Cost per die Cost per wafer Dies per wafer Yield Dies per wafer Wafer area Die area Yield 1 (1 (Defects per area Die area/2)) 2 Nonlinear relation to area and defect rate Wafer cost and area are fixed Defect rate determined by manufacturing process Die area determined by architecture and circuit design

48 SPEC CPU Benchmark Programs used to measure performance Supposedly typical of actual workload Standard Performance Evaluation Corp (SPEC) Develops benchmarks for CPU, I/O, Web, SPEC CPU2006 Elapsed time to execute a selection of programs Negligible I/O, so focuses on CPU performance Normalize relative to reference machine Summarize as geometric mean of performance ratios CINT2006 (integer) and CFP2006 (floating-point) Again: Covered in detail in TDT4260

49 CINT2006 for Opteron X4 2356 Name Description IC 10 9 CPI Tc (ns) Exec time Ref time SPECratio perl Interpreted string processing 2,118 0.75 0.40 637 9,777 15.3 bzip2 Block-sorting compression 2,389 0.85 0.40 817 9,650 11.8 gcc GNU C Compiler 1,050 1.72 0.47 24 8,050 11.1 mcf Combinatorial optimization 336 10.00 0.40 1,345 9,120 6.8 go Go game (AI) 1,658 1.09 0.40 721 10,490 14.6 hmmer Search gene sequence 2,783 0.80 0.40 890 9,330 10.5 sjeng Chess game (AI) 2,176 0.96 0.48 37 12,100 14.5 libquantum Quantum computer simulation 1,623 1.61 0.40 1,047 20,720 19.8 h264avc Video compression 3,102 0.80 0.40 993 22,130 22.3 omnetpp Discrete event ent simulation 587 2.94 0.40 690 6,250 9.1 astar Games/path finding 1,082 1.79 0.40 773 7,020 9.1 xalancbmk XML parsing 1,058 2.70 0.40 1,143 6,900 6.0 Geometric mean 11.7 High cache miss rates

50 Chapter 1.8 Fallacies and Pitfalls

51 Pitfall: Amdahl s Law Improving one aspect of a computer and expecting a proportional improvement in overall performance T improved Taffected improvement factor T unaffected Example: multiply accounts for 80s/100s How much improvement in multiply performance to get 5 overall? 80 20 20 Can t be done! n Corollary: make the common case fast