CS5222 Advanced Computer Architecture. Lecture 1 Introduction


Overview
- Teaching Staff
- Introduction to Computer Architecture
  - History
  - Future / Trends
  - Significance
- The course
  - Content
  - Workload
- Administrative Matters

Who am I?
Dr. Soo Yuen Jien
Contact Information:
- Room: COM2 #02-61
- Consultation hours: Friday 3pm-5pm; Wednesday after lecture; email me for other timings
- Email: sooyj@comp.nus.edu.sg
Comments / suggestions welcome

WHAT IS COMPUTER ARCHITECTURE?

Computer Architecture: Definition
- Architecture (in computing): the organization of the components and functionalities of a system
- Computer Architecture: the study of computer (processor) architecture, to maximize performance within constraints
- Typically classified into 3 categories:
  - Instruction Set
  - MicroArchitecture
  - System Design

The 3 Categories
- Instruction Set
  - The hardware/software interface
  - Exposes the functionalities to the programmer
- MicroArchitecture
  - Organization of components
  - Techniques / mechanisms for performance
- System Design
  - Interconnection, data path
  - Memory hierarchy

Computer Architecture vs. Hardware Engineering
- Computer Architecture:
  - Describes the behavior of the processor
  - Describes the high-level mechanisms / techniques for better performance
- Hardware Engineering:
  - Concerned with the actual implementation of the architecture
  - Logic / circuit implementation, packaging, cooling, transistor process technology, etc.

Computer System: The brief history
Let's review the progress of computer systems in the past:
1. Follow the thread of the "Personal" Computer
2. Another thread on high-end supercomputers
Observe the progress in terms of:
- Speed (operations / second)
- Size
- Availability and cost

The Brief History: 1946 - ENIAC
ENIAC: World's first programmable electronic digital computer
- 1900 additions per second
- 18,000 vacuum tubes
- 30 tons, 80 by 8.5 feet

The Brief History: 1951 - UNIVAC
UNIVAC: first commercial computer in the US
- Uses the von Neumann design
- 2000 additions per second for $1 million
- Sold 48 copies

The Brief History: 1964 - IBM 360
IBM System/360:
- Six implementations with varying price and performance
- An example: 2MHz, 128KB-256KB memory, 500K operations/sec for $1M
- All binary compatible, redefines the industry!

The Brief History: 1965 - PDP-8
DEC PDP-8: first minicomputer
- 4K of 12-bit words
- 4 registers
- 330K operations per second for $16,000
- Sold 50,000 copies!

The Brief History: 1971 - Intel 4004
Intel 4004: first microprocessor (single-chip CPU)
- 4-bit processor for a calculator
- 1KB data + 4KB program memory
- Only 2300 transistors
- 16-pin package
- 740KHz
- 100K operations per second

The Brief History: 1977 - Apple II
Apple II: first personal computer
- 1 MHz clock, 4KB of RAM, $1300
- ~200K operations per second

The Brief History: 1981 - IBM PC
IBM PC: the system that shaped the IT industry as we know it
- Intel 8088 processor
- 4.77 MHz, 16-256KB RAM
- 240K operations per second for $3000!

The Brief History: 2003 - Pentium 4
Intel Pentium 4 processor
- Clock speed 3.0GHz for around $300
- 169 million transistors
- 6000M operations/sec

The Brief History: 2011 - Intel i7
Intel Core i7 processor
- Clock speed 3.2GHz for around $500
- ~120 GFLOPS

The Brief History: Supercomputer
Linpack performance (teraflops):

Date        System                Teraflops
Nov 2008    Road Runner (US)        1,105.0
Nov 2009    Jaguar (US)             1,759.0
Nov 2010    TianHe (China)          2,566.0
Nov 2011    K Computer (Japan)     10,510.0
Nov 2012    Titan (US)             17,590.0
Nov 2013    TianHe 2 (China)       33,826.7

Summary: From a few to many
The transistor has been the building block of the CPU since the 1960s

Period       Transistors
1970-1980    2K to 100K
1980-1990    100K to 1M
1990-2000    1M to 100M
2000-2011    100M to 2.2B

Current world population = 7 billion, about the number of transistors in 3 CPU chips!

Summary: From BIG to small
Process size = minimum length of a transistor

Processor    Year    Process size
80286        1982    1.5 µm
Pentium      1993    0.80 µm - 0.25 µm
Pentium 4    2000    0.180 µm - 0.065 µm
Core i7      2010    0.045 µm - 0.032 µm

Wavelength of visible light = 350nm (violet) to 780nm (red)
Process size is now smaller than the wavelength of violet light!

Summary: From S-L-O-W to fast
FLOPS = FLoating-point Operations Per Second

Processor    Year    Performance
80286        1982    1.8 MIPS *
Pentium      1993    200 MFLOPS #
Pentium 4    2000    4 GFLOPS #
Core i7      2011    120 GFLOPS #

Summary: The Brief History
- Unprecedented progress since the late 1940s
- Performance doubling every ~2 years (1971-2005): a total of 36,000X improvement!
- If the transportation industry had matched this improvement, we could travel from Singapore to Shanghai, China in about a second for roughly a few cents!
- An incredible amount of innovation has revolutionized the computing industry again and again
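As a quick sanity check on the two figures quoted above (36,000X over 1971-2005 and a doubling period of about 2 years), the sketch below computes the doubling period implied by a total improvement over a span of years. The function name is illustrative and not part of the course material.

```python
import math


def implied_doubling_years(total_gain: float, years: float) -> float:
    """Return the doubling period implied by a total gain over a span of years."""
    doublings = math.log2(total_gain)  # how many times performance doubled
    return years / doublings


# Figures quoted on the slide: 36,000X improvement between 1971 and 2005.
period = implied_doubling_years(36_000, 2005 - 1971)
print(f"Implied doubling period: {period:.2f} years")
```

The result lands a little above two years, which is in line with the slide's "~2 years".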

GREAT!! (BUT IS THERE ANYTHING LEFT TO DO?)

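Moore's Law
Intel co-founder Gordon Moore "predicted" in 1965 that transistor density would double every year; he revised this in 1975 to every two years, and the often-quoted 18-month figure is a later popularization.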

Growth in Processor Performance

Growth in Processor Performance
- Prior to mid-80s
  - Largely technology driven
  - Average 25% performance gain per year
- Mid-80s to 2002
  - Technology, instruction set (RISC), and organization
  - Average 52% performance gain per year
  - Factor of seven gain from organization
- 2002 onwards
  - Average 20% performance gain per year

The Three Walls
Three major reasons why the growth in uniprocessor performance is unsustainable:
1. The Memory Wall: increasing gap between CPU and main memory speed
2. The ILP Wall: decreasing amount of "work" (instruction level parallelism) for the processor
3. The Power Wall: increasing power consumption of the processor

The Memory Wall
- Memory access speed increases at about 10% / yr
- Processor speed increases at about 50% / yr
- Memory is now an order of magnitude slower than the processor
  - E.g. Intel Core i7 has a 0.3ns cycle, DDR3 SDRAM latency is ~10ns
- Increasing amount of chip area dedicated to on-chip cache
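The growth rates and latencies above can be turned into a small calculation. The sketch below compounds the two yearly rates to show how the processor/memory gap widens, and converts the quoted DDR3 latency into processor cycles. The function names and default arguments are illustrative; the numbers are the ones from the slide.

```python
def gap_after(years: int, cpu_rate: float = 0.50, mem_rate: float = 0.10) -> float:
    """Relative CPU/memory speed gap after `years`, starting from a gap of 1."""
    return ((1 + cpu_rate) / (1 + mem_rate)) ** years


def stall_cycles(mem_latency_ns: float, cycle_ns: float) -> float:
    """Number of processor cycles that elapse during one memory access."""
    return mem_latency_ns / cycle_ns


for years in (1, 5, 10):
    print(f"After {years:2d} years the gap has grown {gap_after(years):.1f}x")

# Slide example: 0.3 ns cycle time versus ~10 ns DRAM latency.
print(f"One DRAM access costs about {stall_cycles(10.0, 0.3):.0f} cycles")
```

Even with these rough rates the gap grows by more than 20x in a decade, which is why an increasing share of the chip goes to cache.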

The ILP Wall
- Instruction Level Parallelism (ILP) is the number of instructions that can be executed in parallel
- The main source of performance for superscalar processors
- Implicit ILP, discovered on the fly by the processor, is very limited
  - Average ~3 instructions (depends!!)
- Move to explicit ILP
  - Parallel programming and execution
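To make "number of instructions that can be executed in parallel" concrete, here is a minimal sketch that estimates the average ILP of a short instruction sequence from its data dependences. It assumes an idealized machine (unit latency, unlimited functional units, perfect branch prediction), and the six-instruction example is hypothetical, not taken from the lecture.

```python
def average_ilp(deps: dict[int, list[int]]) -> float:
    """Average ILP = instruction count / length of the longest dependence chain.

    `deps` maps each instruction index (in program order) to the indices of
    the earlier instructions whose results it needs.
    """
    depth: dict[int, int] = {}
    for i in sorted(deps):
        # An instruction can issue one step after its latest producer.
        depth[i] = 1 + max((depth[d] for d in deps[i]), default=0)
    return len(deps) / max(depth.values())


# Hypothetical sequence: e = (x + y) * z, then store e.
example = {
    0: [],      # a = load x
    1: [],      # b = load y
    2: [0, 1],  # c = a + b
    3: [],      # d = load z
    4: [2, 3],  # e = c * d
    5: [4],     # store e
}
print(f"Average ILP: {average_ilp(example)}")
```

The dependence chain load, add, multiply, store is four steps long, so six instructions yield an average ILP of only 1.5 even on this idealized machine.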

The Power Wall
We can now cram more transistors into a chip than we have the power to turn on!

Power Consumption: A comparison
- (image): ~500 watts
- (image): ~10 megawatts
- Fridge: ~600 watts
- 1 HDB block: ~50 kilowatts

The Power Wall: Challenges
- Mobile/Portable (cell phone, laptop, PDA)
  - Battery life is critical
- Desktop
  - 400 million computers in the world
  - 0.16PW (PetaWatt = 10^15 Watt) of power dissipation
  - Equivalent to 26 nuclear power plants
- Data centers
  - A single server rack is between 5 and 20 kW
  - 100s of those racks in a single room

SO, HOW DO WE FIGHT THE WAR (WALL)?

Meeting the challenge
- Hyper-Threading Technology (HTT) in Xeon and Pentium 4
  - Allows one physical processor to appear and behave as two virtual processors to the operating system
  - Two independent threads give more ILP!
- Intel dual-core (Pentium D)
  - Multiple microprocessor cores on a single chip
(Copyright 2005 Intel)

Parallelism saves Power
- Dynamic Power = C × V^2 × f
  - C = capacitance, V = voltage, f = clock frequency
- Performance is proportional to clock frequency
- Exploit explicit parallelism to reduce power using additional cores
- Increase density (= more transistors = more capacitance)
  - Can increase cores (2x) and performance (2x)
  - Or increase cores (2x) but decrease frequency (f/2)
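The trade-off between the two options above can be worked through with the dynamic power formula. The sketch below is a rough model under stated assumptions: capacitance scales with the number of cores, performance is cores × frequency (perfectly parallel work), and all quantities are normalized to a single baseline core. The 0.8 voltage factor in the last case is a hypothetical value chosen only to illustrate that a lower clock usually permits a lower supply voltage; it does not come from the slides.

```python
def option(cores: int, voltage: float, freq: float, cap_per_core: float = 1.0):
    """Return (dynamic power, performance), both relative to one baseline core."""
    power = cores * cap_per_core * voltage**2 * freq  # P = C * V^2 * f
    perf = cores * freq                               # assumes perfectly parallel work
    return power, perf


cases = {
    "1 core, V, f (baseline)": option(1, 1.0, 1.0),
    "2 cores, V, f": option(2, 1.0, 1.0),
    "2 cores, V, f/2": option(2, 1.0, 0.5),
    "2 cores, 0.8V, f/2 (assumed voltage)": option(2, 0.8, 0.5),
}
for name, (power, perf) in cases.items():
    print(f"{name:38s} power = {power:.2f}  performance = {perf:.2f}")
```

Under this model, halving the frequency on two cores matches baseline performance at baseline power; the actual saving comes from the V^2 term once the voltage is also lowered.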

Multicore Revolution
- Chip density is continuing to increase ~2x every 2 years
- Clock speed is not
- Number of processor cores may double instead

Multicore Revolution: Industry
"We are dedicating all of our future product development to multicore designs. This is a sea change in computing." (Paul Otellini, President, Intel, 2005)
- All microprocessor companies switch to MP (2X CPUs / 2 yrs)
- Procrastination results in 2X sequential perf. / 5 yrs
Current state:
- Intel i7 has 6 cores
- The STI Cell processor (PS3) has 8 cores
- NVIDIA Tesla GPU has up to 512 cores
- Intel MIC has > 50 cores

Multicore/Manycore Roadmap
- Multicore: 2X / 2 yrs, 64 cores in 8 years
- Manycore: 8X to 16X multicore
[Chart: number of cores (log scale, 1 to 1000) versus year, 2003 to 2015; plotted values range from 1 to 512 cores]
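The roadmap's rule of thumb (2X every 2 years for multicore, 8X to 16X that for manycore) is easy to tabulate. The sketch below assumes a starting point of 1 core in 2003, which is read off the chart's axis range rather than stated on the slide; the function names are illustrative.

```python
def multicore_cores(year: int, base_year: int = 2003, base_cores: int = 1,
                    doubling_years: int = 2) -> int:
    """Projected multicore core count, doubling every `doubling_years` years."""
    return base_cores * 2 ** ((year - base_year) // doubling_years)


def manycore_range(year: int) -> tuple[int, int]:
    """Projected manycore core count range: 8X to 16X the multicore count."""
    cores = multicore_cores(year)
    return 8 * cores, 16 * cores


for year in range(2003, 2016, 2):
    low, high = manycore_range(year)
    print(f"{year}: multicore {multicore_cores(year):3d}, manycore {low}-{high}")
```

With this starting point, 4 cores in 2007 grow to 64 cores by 2015, matching the "64 cores in 8 years" figure.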

Architecture Outlook
- Expect modestly pipelined processors
  - Small cores are not much slower than large cores
- Parallelism is the energy-efficient path to performance
  - Lower threshold and supply voltages lower the energy per operation
- Small, regular processing elements are easier to verify
- Heterogeneous processors
  - Special function units to accelerate popular functions

Multicore: Impacts
- All major processor vendors are producing multicore chips
  - Every machine will soon be a parallel machine
  - All programmers will be parallel programmers???
- Complexity may eventually be hidden in libraries, compilers, and high-level languages
  - But a lot of work is needed to get there
- Big open questions:
  - What will be the killer apps for multicore machines?
  - How should the chips be designed, and how will they be programmed?
  - Many others...

Parallel Revolution May Fail
"... when we start talking about parallelism and ease of use of truly parallel computers, we're talking about a problem that's as hard as any that computer science has faced. I would be panicked if I were in industry." (John Hennessy, President, Stanford University, 1/07)
- 100% failure rate of parallel computer companies
  - Convex, Encore, MasPar, NCUBE, Kendall Square Research, Sequent, (Silicon Graphics), Transputer, Thinking Machines, ...
- What if IT goes from a growth industry to a replacement industry?
  - If SW can't effectively use multiple cores per chip, SW is no faster on a new computer
  - Only buy if the computer wears out

Parallel Computing: A view from Berkeley
Applications
1. What are the applications?
2. What are common kernels of the applications?
Architecture and Hardware
3. What are the HW building blocks?
4. How to connect them?
Programming Model and Systems Software
5. How to describe applications and kernels?
6. How to program the hardware?
Evaluation
7. How to measure success?

Compiler Challenges
- Heterogeneous processors
  - Increase in the design space for code optimization
- Auto-tuners: optimizing code at runtime
- Software-controlled memory management
  - Example: Cell processor

Parallel Programming Challenges
- Finding enough parallelism (Amdahl's Law)
- Granularity
- Locality
- Load balance
- Coordination and synchronization
- Debugging
- Performance modeling
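The first challenge in the list refers to Amdahl's Law, which bounds the speedup of a program by its serial fraction: speedup = 1 / ((1 - p) + p / n), where p is the fraction of work that can be parallelized and n is the number of cores. A minimal sketch follows; the 90% parallel fraction is an illustrative value, not a figure from the lecture.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's Law: overall speedup when only part of the work is parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Illustrative program that is 90% parallelizable.
for cores in (2, 8, 64, 512):
    print(f"{cores:3d} cores -> speedup {amdahl_speedup(0.9, cores):.2f}x")
```

No matter how many cores are added, the speedup for this example stays below 10x, because the 10% serial portion always runs on one core.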

BACK TO THE COURSE

What will we learn in CS5222?
- Instruction-Level Parallelism (ILP)
  - Pipelining
  - Dynamic scheduling (superscalar out-of-order)
  - Static scheduling (VLIW processors)
  - Branch prediction
  - Multi-threaded processors
- Multiprocessors
  - Symmetric shared-memory architectures
  - Synchronization
  - Memory consistency
- Memory Hierarchy Design

Where can CS5222 take you?
- Advanced Compiler
- System Software
- Operating System
- High Performance Computing
- Parallel Computing

We expect you to know
- Computer Organization (CS2100)
- Multi-Core Architecture (CS4223)
  - Significant overlap in topics, but more in-depth
- Instruction set concepts:
  - RISC instruction set design philosophy
  - Registers, instructions, etc.
- Simple pipelining
- Basic caches, main memory
- Low-level programming experience
  - C is very likely to be needed

Reference
Computer Architecture: A Quantitative Approach, 4th Edition
Hennessy & Patterson
Published by Morgan Kaufmann

Resources
The primary and only information source is IVLE
- Workbin:
  - Lecture notes
  - Assignment submissions
- Forum:
  - Ask course-related technical questions in the forum.
  - Email is only for your personal concerns.

Assessment
- Final Exam: 50%
- Assignments: 30%
  - 2-3 assignments
- Midterm: 20%
  - Tentatively in week 7 (after term break)
  - During normal lecture hours