CS 6534: Tech Trends / Intro

Similar documents
Moore s Law. CS 6534: Tech Trends / Intro. Good Ol Days: Frequency Scaling. The Power Wall. Charles Reiss. 24 August 2016

Lecture 1: Introduction

Multicore and Parallel Processing

Lecture 1: CS/ECE 3810 Introduction

CS/EE 6810: Computer Architecture

Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University. P & H Chapter 4.10, 1.7, 1.8, 5.10, 6

Advanced Computer Architecture (CS620)

Computer Architecture!

CS3350B Computer Architecture. Introduction

ECE 587 Advanced Computer Architecture I

Why Parallel Architecture

Lecture 21: Parallelism ILP to Multicores. Parallel Processing 101

Multicore computer: Combines two or more processors (cores) on a single die. Also called a chip-multiprocessor.

EECS4201 Computer Architecture

Lecture 1: Gentle Introduction to GPUs

Computer Architecture

When and Where? Course Information. Expected Background ECE 486/586. Computer Architecture. Lecture # 1. Spring Portland State University

Computer Architecture!

Microprocessor Trends and Implications for the Future

Lec 25: Parallel Processors. Announcements

Computer Architecture!

EITF20: Computer Architecture Part1.1.1: Introduction

Lecture 14: Multithreading

Computer Architecture. Introduction. Lynn Choi Korea University

45-year CPU evolution: one law and two equations

Introduction to Computer Architecture II

CS61C Machine Structures. Lecture 1 Introduction. 8/25/2003 Brian Harvey. John Wawrzynek (Warznek) www-inst.eecs.berkeley.

Lecture 1: Why Parallelism? Parallel Computer Architecture and Programming CMU , Spring 2013

ADVANCED COMPUTER ARCHITECTURES

Fundamentals of Quantitative Design and Analysis

HW Trends and Architectures

CSE 502 Graduate Computer Architecture

What is This Course About? CS 356 Unit 0. Today's Digital Environment. Why is System Knowledge Important?

CO403 Advanced Microprocessors IS860 - High Performance Computing for Security. Basavaraj Talawar,

PERFORMANCE MEASUREMENT

Computer Architecture

IT 252 Computer Organization and Architecture. Introduction. Chia-Chi Teng

ECE 475/CS 416 Computer Architecture - Introduction. Today s Agenda. Edward Suh Computer Systems Laboratory

ECE 588/688 Advanced Computer Architecture II

High Performance Computing

CS 152 Computer Architecture and Engineering. Lecture 14: Multithreading

Unit #8: Shared-Memory Parallelism and Concurrency

EE5780 Advanced VLSI CAD

Fundamentals of Computers Design

Administrative matters. EEL-4713C Computer Architecture Lecture 1. Overview. What is this class about?

Advanced Computer Architecture Week 1: Introduction. ECE 154B Dmitri Strukov

CS61C Machine Structures. Lecture 1 Introduction. 8/27/2006 John Wawrzynek (Warzneck)

Module 18: "TLP on Chip: HT/SMT and CMP" Lecture 39: "Simultaneous Multithreading and Chip-multiprocessing" TLP on Chip: HT/SMT and CMP SMT

ECE 571 Advanced Microprocessor-Based Design Lecture 4

CS 152 Computer Architecture and Engineering. Lecture 18: Multithreading

Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( )

ECE 2162 Intro & Trends. Jun Yang Fall 2009

Today. SMP architecture. SMP architecture. Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( )

CS427 Multicore Architecture and Parallel Computing

CS430 - Computer Architecture William J. Taffe Fall 2002 using slides from. CS61C - Machine Structures Dave Patterson Fall 2000

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors

Online Course Evaluation. What we will do in the last week?

URL: Offered by: Should already know: Will learn: 01 1 EE 4720 Computer Architecture

Copyright 2012, Elsevier Inc. All rights reserved.

CS5222 Advanced Computer Architecture. Lecture 1 Introduction

Lecture 2: Performance

ECE 15B COMPUTER ORGANIZATION

Parallelism, Multicore, and Synchronization

TDT 4260 lecture 2 spring semester 2015

CSE 141: Computer Architecture. Professor: Michael Taylor. UCSD Department of Computer Science & Engineering

URL: Offered by: Should already know: Will learn: 01 1 EE 4720 Computer Architecture

Advanced Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

The Art of Parallel Processing

ECE 154A. Architecture. Dmitri Strukov

CS758: Multicore Programming

Advanced Computer Architecture

ECE 588/688 Advanced Computer Architecture II

EE/CSCI 451: Parallel and Distributed Computation

Chapter 1: Fundamentals of Quantitative Design and Analysis

Fundamentals of Computer Design

Tunneling. Encapsulation and Tunneling. Performance implications of data transmission on high speed network devices

EITF20: Computer Architecture Part1.1.1: Introduction

ECSE 425 Lecture 1: Course Introduc5on Bre9 H. Meyer

CS 194 Parallel Programming. Why Program for Parallelism?

Multicore Hardware and Parallelism

Advanced Computer Architecture Week 1: Introduction. ECE 154B Dmitri Strukov

Instructor Information

Administration. Coursework. Prerequisites. CS 378: Programming for Performance. 4 or 5 programming projects

EECE 321: Computer Organization

Advanced Processor Architecture

What is this class all about?

Review: Performance Latency vs. Throughput. Time (seconds/program) is performance measure Instructions Clock cycles Seconds.

CS671 Parallel Programming in the Many-Core Era

COSC 6385 Computer Architecture - Thread Level Parallelism (I)

Lecture 1: Course Introduction and Overview Prof. Randy H. Katz Computer Science 252 Spring 1996

ECE 5775 (Fall 17) High-Level Digital Design Automation. Hardware-Software Co-Design

The Memory Hierarchy. Cache, Main Memory, and Virtual Memory (Part 2)

Outline Marquette University

1.13 Historical Perspectives and References

EE282H: Computer Architecture and Organization. EE282H: Computer Architecture and Organization -- Course Overview

Introduction to Computer Systems

Computer Architecture Computer Architecture. Computer Architecture. What is Computer Architecture? Grading

How many cores are too many cores? Dr. Avi Mendelson, Intel - Mobile Processors Architecture group

Advanced d Processor Architecture. Computer Systems Laboratory Sungkyunkwan University

Multithreading: Exploiting Thread-Level Parallelism within a Processor

Transcription:

1 CS 6534: Tech Trends / Intro Charles Reiss 24 August 2016

Moore s Law Microprocessor Transistor Counts 1971-2011 & Moore's Law 16-Core SPARC T3 2,600,000,000 1,000,000,000 Six-Core Core i7 Six-Core Xeon 7400 Dual-Core Itanium 2 AMD K10 Itanium 2 with 9MB cache POWER6 AMD K10 10-Core Xeon Westmere-EX 8-core POWER7 Quad-core z196 Quad-Core Itanium Tukwila 8-Core Xeon Nehalem-EX Six-Core Opteron 2400 Core i7 (Quad) Itanium 2 Core 2 Duo Cell 100,000,000 AMD K8 Pentium 4 Barton Atom AMD K7 AMD K6-III Transistor count 10,000,000 1,000,000 80386 80486 AMD K5 Pentium AMD K6 Pentium III Pentium II 100,000 80286 68000 80186 8086 8088 10,000 8080 8085 6800 Z80 6809 2,300 8008 MOS 6502 4004 RCA 1802 1971 1980 1990 2000 2011 Date of introduction Wikimedia Commons / Wgsimon 2

Good Ol Days: Frequency Scaling Figure 1.11 Growth in clock rate of microprocessors in Figure 1.1. Between 1978 and 1986, the clock rate improved less than 15% per year while performance improved by 25% per year. During the renaissance period of 52% performance improvement per year between 1986 and 2003, clock rates shot up almost 40% per year. Since then, the clock rate has been nearly flat, growing at less than 1% per year, while single processor performance improved at less than 22% per year. H&P 3

The Power Wall 4 Power Switching Power + Leakage Power Switching Power Capacitance Voltage 2 Frequency

Increasing Parallelism: Cores Intel x86 # of Cores Per Package 25 20 15 10 5 0 200520062008200920102012201320142016 Date 5

Increasing Parallelism: Vector width 6 vector register size (bits) 500 400 300 200 100 x86 ARM 0 1995 1997 2000 2003 2005 2008 2011 2014 2016 Date

Increasing Parallelism: ILP 7 x86 Intel 32-bit adds per cycle 4 3 2 1 0 1978 1983 1988 1994 1999 2005 2010 2016

Limits: Parallelism 20 Amdahl s Law Speedup (1=serial) 15 10 5 0% serial 5% serial 10% serial 25% serial 50% serial 10 20 30 40 50 60 Degree of Parallelism (1=serial) 8

Limits: Communication 9 [Balfour et al, Operand Registers and Explicit Operand Forwarding, 2009.] [Malladi et al, Towards Energy-Proportional Datacenter Memory with Mobile DRAM, 2012.] DDR3 DRAM (32-bit read/write) full utilization 2 300 000 fj 4300 low utilization 7 700 000 fj 15000

Increasing Efficiency: Specialization 10 Task/workload-specific coprocessors or instructions Maybe reconfigurable? Heterogeneous systems different parts for different types of computation

Interlude: Logistics 11 Paper reviews approx 2/class Homeworks programming assignments Exam end of semester

Textbook? 12 Primarily paper readings Classic + some newish papers Reference: Hennessy and Patterson, Computer Architecture: A Quantitative Approach

Paper Reviews 13 What was your most significant insight from the paper? What evidence does the paper have to support this insight? What is the weakest part of the paper or how could it be approved? What topic from the paper would you like to see discussed in class (if any)?

Paper Reviews 13 What was your most significant insight from the paper? Might not be what the authors put in their abstract/conclusion What evidence does the paper have to support this insight? What is the weakest part of the paper or how could it be approved? What topic from the paper would you like to see discussed in class (if any)?

Paper Discussions 14 and not paper lectures. Requires your cooperation.

Homeworks 15 Individual programming + writing assignments First on memory hierarchy available now. Second to be announced likely on superscalar Third to be announced likely GPU programming

Homework 1 16 Description on course website (linked off Collab) Memory system parameters by benchmarking

Homework 1 16 Description on course website (linked off Collab) Memory system parameters by benchmarking Example: 32K cache means accessing 32K repeatedly is faster than 128K repeatedly.

Homework 1: Disclaimer 17 This is probably hard Modern memory hierarchies are complicated Documentation is incomplete Mainly looking for: measurement technique that should work If it doesn t, try to come up with good reasons why

Exam 18 There will be in final, probably in-class. Cover material from papers, homeworks, discussions in class

Exceptions / etc. 19 Need accommodations please ask Disability accommodations Student Disability Access Center

Asking Questions 20 Piazza (linked of Collab) Office Hours: Instructor Lecturer Charles Reiss TA Luowan Wang Loation Soda 205 TBA Times Monday 1PM 3PM Tuesday 1PM 2PM Friday 10AM noon Email: creiss@virginia.edu

Survey 21 linked off Collab anonymous please do it

Preview of coming topics 22

Memory hierarchy 23 caching review(?) and advanced techniques homework 1

Pipelining different parts of multiple instructions at the same time more advanced topics: handling exceptions Image: Wikimedia commons / Poil 24

Increasing Parallelism: ILP 25 x86 Intel 32-bit adds per cycle 4 3 2 1 0 1978 1983 1988 1994 1999 2005 2010 2016

Beyond pipelining: Multiple issue 26 starting multiple instructions at the same time allows cycles per instruction < 1

Beyond pipelining: Out-of-order 27 run next instruction despite stall of prior one slow cache read-after-write hazard... speculation guess outcome of branch/load/etc. fix later if wrong

Increasing Parallelism: Cores Intel x86 # of Cores Per Package 25 20 15 10 5 0 200520062008200920102012201320142016 Date 28

Multiprocessor/multicore 29 connecting processors together shared memory multiple threads synchronization

Increasing Parallelism: Vector width 30 vector register size (bits) 500 400 300 200 100 x86 ARM 0 1995 1997 2000 2003 2005 2008 2011 2014 2016 Date

Vector/SIMD/GPUs 31 single instruction/multiple data started with early supercomputers basis of GPU programming model

Specialization 32 using custom chips (or circuits within chips) reconfigurable processors (e.g. FPGAs)

Miscellaneous topics 33 hardware security warehouse-scale computers... depends on time Suggestions?

Papers for Next Class 34 Alan Smith s review of caching in 1982 D. J. Bernstein s timing attack and suggestions to computer architects in 2005 Note: We re not reading this to learn about AES