Fundamentals of Computer Design

Similar documents
Fundamentals of Computers Design

Fundamentals of Quantitative Design and Analysis

Copyright 2012, Elsevier Inc. All rights reserved.

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 1. Copyright 2012, Elsevier Inc. All rights reserved. Computer Technology

EECS4201 Computer Architecture

Lecture 1: Introduction

Exploitation of instruction level parallelism

Shared Symmetric Memory Systems

Course II Parallel Computer Architecture. Week 2-3 by Dr. Putu Harry Gunawan

When and Where? Course Information. Expected Background ECE 486/586. Computer Architecture. Lecture # 1. Spring Portland State University

Advanced cache memory optimizations

Parallel Processors. The dream of computer architects since 1950s: replicate processors to add performance vs. design a faster processor

Outline Marquette University

CSE 502 Graduate Computer Architecture

Chap. 4 Multiprocessors and Thread-Level Parallelism

Computer Architecture. Fall Dongkun Shin, SKKU

COSC 6385 Computer Architecture - Thread Level Parallelism (I)

How What When Why CSC3501 FALL07 CSC3501 FALL07. Louisiana State University 1- Introduction - 1. Louisiana State University 1- Introduction - 2

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 1. Computer Abstractions and Technology

Computer Architecture

Computer Architecture!

CSC 447: Parallel Programming for Multi- Core and Cluster Systems

Computer Organization and Components

Computing architectures Part 2 TMA4280 Introduction to Supercomputing

From CISC to RISC. CISC Creates the Anti CISC Revolution. RISC "Philosophy" CISC Limitations

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture

Top500 Supercomputer list

45-year CPU Evolution: 1 Law -2 Equations

Lecture 8: RISC & Parallel Computers. Parallel computers

Introduction to Computer Architecture II

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing

Computer Architecture

CS3350B Computer Architecture. Introduction

The Art of Parallel Processing

Overview. CS 472 Concurrent & Parallel Programming University of Evansville

Computer Architecture Today (I)

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors

Computer Architecture Computer Architecture. Computer Architecture. What is Computer Architecture? Grading

WHY PARALLEL PROCESSING? (CE-401)

CS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it

Multi-core Programming - Introduction

Computer Architecture. R. Poss

Processor Performance and Parallelism Y. K. Malaiya

Computer Architecture!

ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design

Chapter 1: Fundamentals of Quantitative Design and Analysis

Lecture 26: Parallel Processing. Spring 2018 Jason Tang

An Introduction to Parallel Architectures

Computer Architecture

RAID 0 (non-redundant) RAID Types 4/25/2011

anced computer architecture CONTENTS AND THE TASK OF THE COMPUTER DESIGNER The Task of the Computer Designer

Introduction to Parallel Programming

Multicore Hardware and Parallelism

Figure 1-1. A multilevel machine.

Lecture 7: Parallel Processing

EITF20: Computer Architecture Part1.1.1: Introduction

Computer Architecture

ECE 2162 Intro & Trends. Jun Yang Fall 2009

Advanced Computer Architecture

Intel Enterprise Processors Technology

Motivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.

Microelectronics. Moore s Law. Initially, only a few gates or memory cells could be reliably manufactured and packaged together.

EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design

Parallelism. CS6787 Lecture 8 Fall 2017

Evolution of Computers & Microprocessors. Dr. Cahit Karakuş

Computer Architecture!

EC 513 Computer Architecture

COMPUTER ARCHITECTURE AND OPERATING SYSTEMS (CS31702)

Unit 9 : Fundamentals of Parallel Processing

Introduction II. Overview

Advanced Architecture, Third Class, Computer Science Department, CSW,

The Computer Revolution. Classes of Computers. Chapter 1

CS 426 Parallel Computing. Parallel Computing Platforms

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS

Multiple Issue and Static Scheduling. Multiple Issue. MSc Informatics Eng. Beyond Instruction-Level Parallelism

Lecture 7: Parallel Processing

27. Parallel Programming I

Vector Processors and Graphics Processing Units (GPUs)

COMPUTER ARCHITECTURE

Memory Systems IRAM. Principle of IRAM

CSE : Introduction to Computer Architecture

CSC 447: Parallel Programming for Multi- Core and Cluster Systems. Lectures TTh, 11:00-12:15 from January 16, 2018 until 25, 2018 Prerequisites

Parallelism and Concurrency. COS 326 David Walker Princeton University

CS4230 Parallel Programming. Lecture 3: Introduction to Parallel Architectures 8/28/12. Homework 1: Parallel Programming Basics

CPU Architecture Overview. Varun Sampath CIS 565 Spring 2012

Computer Systems Architecture Spring 2016

Advanced Computer Architecture Week 1: Introduction. ECE 154B Dmitri Strukov

Parallel Systems I The GPU architecture. Jan Lemeire

Computer Architecture

CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading)

Advanced Computer Architecture Week 1: Introduction. ECE 154B Dmitri Strukov

Processor Architecture and Interconnect

Fundamentals of Computer Design

Chapter 1: Perspectives


27. Parallel Programming I

DEPARTMENT OF ECE IV YEAR ECE EC6009 ADVANCED COMPUTER ARCHITECTURE LECTURE NOTES

EECS2021E EECS2021E. The Computer Revolution. Morgan Kaufmann Publishers September 12, Chapter 1 Computer Abstractions and Technology 1

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 6. Parallel Processors from Client to Cloud

Transcription:

Fundamentals of Computer Design Computer Architecture J. Daniel García Sánchez (coordinator) David Expósito Singh Francisco Javier García Blas ARCOS Group Computer Science and Engineering Department University Carlos III of Madrid cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 1/45

Introduction 1 Introduction 2 Historic perspective 3 Classification 4 Parallelism 5 Computer Architecture 6 Conclusion cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 2/45

Introduction Computer Architecture The term architecture is used here to describe the attributes of a system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flow and controls, the logical design, and the physical implementation. Source: http://commons. wikimedia.org/wiki/file: Amdahl_march_13_2008.jpg Gene Amdahl et al. Architecture of the IBM System. IBM Journal of Research and Development Vol 8 (2) pp. 87-101. 1964. cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 3/45

Introduction What is Computer Architecture? Computer Architecture is the science and art of selecting and interconnecting hardware components to create computers that meet functional, performance and cost goals. WWW Computer Architecture Page. Computer architecture is not about using computers to design buildings. cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 4/45

Introduction Why study Computer Architecture? No computers No Computer Engineers. To understand trends for next decade. How will computers be in the future? What will they be able to do? To understand computers limitations. What can be done? What can t be done? Where are the performance limitations? cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 5/45

Introduction The Moore s Law Number of transistors per chip duplicated every N months. With 12 <N <24. Gordon Moore, 1965. Observations: 1 Obtained from experimental data Empirical Law. 2 Still valid today. 3 It does not need to directly translate into performance increases. cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 6/45

Introduction Transistors per chip Activity: Please, read the full article. Source: The free lunch is over. Herb Sutter. http://www.gotw.ca/ publications/ concurrency-ddj.htm cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 7/45

Introduction Effects of RISC emergence Available capacity improvement. A high-end microprocessor is more powerful than a supercomputer ten years before. cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 8/45

Introduction Cost versus performance The cost/performance improving ratio leads to new classes of computers. 80 s: PC and workstations. 00 s: Smart-phones and tablets. More large scale data centers with thousand of nodes giving a single system image. cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 9/45

Introduction The RISC revolution Continuous semiconductor improvement led to microprocessor based computers to become dominant. Minicomputers disappear. Mainframes and supercomputers built as a collection of microprocessors. Sustained increase in performance from 1986 to 2003: 52% per year. No longer true! cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 10/45

Historic perspective 1 Introduction 2 Historic perspective 3 Classification 4 Parallelism 5 Computer Architecture 6 Conclusion cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 11/45

Historic perspective First revolution: Microprocessor The microprocessor revolution. Generated from a single change. Enough transistors (25,000) in a single chip for a 16-bit processor. Advantages: Faster: Less accesses out of the chip. Cheaper: All in one chip. New market segments generated by the innovation. Desktop computers, CD/DVD, laptops, video-game consoles, set-top-boxes, digital cameras, MP3, GPS,... Impact on existing markets. Supercomputers, mainframes,... cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 12/45

Historic perspective First microprocessor Intel 4004 (1971). Application domain: Calculators. Technology: 10,000 nm. Data: 2300 transistors. 13 mm2 108 KHz 12 Volts Features: 4-bits data. Data-path in one cycle. cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 13/45

Historic perspective Second revolution Extract implicit instruction-level parallelism (ILP). Hardware has resources that can be used in parallel. Elements: Pipelining: Allowed increasing clock frequency. Caches: Needed to increase clock frequencies. Floating point: Integrated into the chip. Increasing pipeline depth and speculative branching. Multiple issue: superscalar architectures. Dynamic scheduling: Out-of-Order execution. cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 14/45

Historic perspective Top single core processors Intel Pentium 4 (2003). Application domain: Desktop / Servers. Technology: 90 nm (1/100x). Data: 55M transistors (20,000x). 101 mm 2 (10x). 3.4 GHz (10,000x). 1.2 Volts (1/10x). Features: 32/64-bit data (16x). Data path with 22 pipeline stages (later 31). 3-4 instructions per cycle (superscalar). Two level cache on chip. Data parallel instructions (SIMD). Hyper-threading. cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 15/45

Historic perspective Third revolution Explicitly support thread-level and data-level parallelism. Hardware offers parallel resources and software specifies its use. Parallelism no longer hidden by hardware. Rationale: Diminishing returns from ILP. Elements: Vector instructions: Intel SSE. General support for multi-threaded applications. cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 16/45

Historic perspective Multi-core processors Intel Core i7 (2009). Application: Desktop / Server. Technology: 45 nm (1/2x). Data: 774M transistors (12x). 296 mm 2 (3x). 3.2 GHz 3.6 GHz ( 1x). 0.7 1.4 Volts ( 1x). Features: 128-bit data (2x). Datapath with 14-stage pipeline (0.5x). 4 instructions per cycle ( 1x). Three level cache on chip. Data parallel instructions (SIMD). Hyper-threading. 4 cores (4x). cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 17/45

Historic perspective Architecture trends Instruction-Level Parallelism. Parallel execution of instructions. Impossible to improve Instruction-Level parallelism (since 2003-2005). Hardware and compiler conspiration to hide details to programmer. Programmer with very simplified view of hardware. New models to improve performance. Data-Level Parallelism (DLP). Thread-Level Parallelism (TLP). Request-Level Parallelism (RLP). IMPORTANT: All of them require to re-structure applications to take advantage of promised performance increase. cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 18/45

Classification 1 Introduction 2 Historic perspective 3 Classification 4 Parallelism 5 Computer Architecture 6 Conclusion cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 19/45

Classification Personal Mobile Devices Wireless devices with multimedia UI. Mobile devices, tablets,... Price: $100 $1000. Processor price: $10 $100. Critical factors: Cost. Energy. Media performance. Responsiveness. cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 20/45

Classification Desktop Designed to offer good performance to end-user. From ultra-books to workstations. Since 2008 more than 50% are laptops. Price: $300 $2500. Processor price: $50 $500. Critical factors: Price-Performance. Energy. Graphics performance. cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 21/45

Classification Servers Used to run large scale applications and give service to multiple users simultaneously. Growing since 80 s replacing mainframes. Price: $5,000 $10,000,000. Processor price: $200 $2,000. Critical factors: Throughput. Availability. Scalability. Energy. cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 22/45

Classification Clusters / Warehouse Scale Computers (WSC) A collection of computers connected through a LAN and acting as a larger computer. Made much more popular by SaaS (Software as a Service) growth. Each node runs its own operating system. WSC 10,000+ nodes. Price: $100,000 $200,000,000. Processor price: $50 $250. Critical factors: Price-Performance. Throughput. Energy proportionality. cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 23/45

Classification Embedded Computer within another system to run pre-established applications. Dish-washer, video-game console, MP3,... Processor price: $0.01 $100. Critical factors: Price. Energy. Application-specific performance. cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 24/45

Classification Sales (2010) Class Units Sold Portable Mobile Devices 1,800,000,000 Desktop 350,000,000 Servers 20,000,000 Embedded 19,000,000,000 cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 25/45

Parallelism 1 Introduction 2 Historic perspective 3 Classification 4 Parallelism 5 Computer Architecture 6 Conclusion cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 26/45

Parallelism The speed of a serial computer A 1 TFLOP (10 12 FLOPS) sequential machine: Data must travel a certain distance (r) from memory to CPU. 1 data element per cycle 10 12 times per second 10 12 s. per cycle. Data traveling at light speed: c = 3 10 8 m/s. r = 3 10 8 10 12 = 0.3mm 1 TB of data in 0.3 mm 2 surface: Each data to be stored in 3 Angmstroms (approx.) The size of a small atom! CONSEQUENCE: Sequential approach not feasible. cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 27/45

Parallelism Parallelism Kinds All computers have cost and energy restrictions. Parallelism emerges as main mechanism to design computers. Types of application parallelism: Data parallelism: Same operation applied to many data. Task parallelism: Tasks operate independently and in parallel. cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 28/45

Parallelism Hardware parallelism ILP: Instruction-Level Parallelism. Exploits data-parallelism with compiler help (pipeline, speculative execution,... ). Vector architectures and GPUs. Exploit data parallelism applying the same instruction to several data in parallel. TLP: Thread-Level Parallelism. Exploits data or task parallelism in highly coupled hardware. Allows inter-thread interactions. RLP: Request-Level Parallelism. Exploits parallelism among highly decoupled tasks. cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 29/45

Parallelism Flynn Taxonomy (1966) A classification of possible parallel architectures. SISD: Single Instruction / Single Data Stream. Mono-processor. May use ILP techniques. SIMD: Single Instruction / Multiple Data Stream. Same instructions executed by different processors on distinct data items. Alternatives: Vector processors, multimedia extensions, and GPUs. MISD: Multiple Instructions / Single Data Stream. No known commercial implementation. MIMD: Multiple Instructions / Multiple Data Stream. Each processor operates on its own data items Task Parallelism. cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 30/45

Parallelism More about MIMD Variety in MIMD architectures: Highly coupled architectures. TLP (Thread-Level Parallelism): Multi/Many-core architectures. Weakly coupled architectures: RLP (Request-Level Parallelism): Clusters and WSCs. MIMD is: More flexible and general than SIMD. More expensive than SIMD. Requires enough task granularity. cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 31/45

Computer Architecture 1 Introduction 2 Historic perspective 3 Classification 4 Parallelism 5 Computer Architecture 6 Conclusion cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 32/45

Computer Architecture Architecture Views Computer design is complex. Determine important attributes. Maximize performance and energy efficiency with control over cost, power, and availability. Different views on architecture design: Instruction set design. Functional organization. Logic design. Implementation: Circuit design, packaging, cooling,... Integration with compilers, operating systems,... cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 33/45

Computer Architecture ISA: Instruction Set Architecture ISA: Part of the architecture which is visible to the programmer. Available instructions. Registers number and type. Instructions format. Addressing modes. Exception conditions. cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 34/45

Computer Architecture ISA Kinds Almost every current ISA use general purpose registers. Popular versions: Register-Memory ISA: Many instructions may access main memory. e.g.: Intel 80x86. Load-Store ISA: Only load/store instructions can access main memory. Other instructions operate on registers. e.g.: ARM, MIPS. Most recent ISAs are load-store. cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 35/45

Computer Architecture Memory addressing Types: Byte addressing: Address identifies a byte within memory. Word addressing: Address identifies a word within memory. Most popular addressing: Byte addressing. Variants: All accesses must be aligned (e.g. MIPS). Aligned accesses are faster (e.g. 80x86). cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 36/45

Computer Architecture Addressing modes MIPS: Register. Immediate. Relative to register. 80x86: Register, immediate, relative to register. Absolute. Base register with displacement (2 registers). Base register with scaled index and displacement.... cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 37/45

Computer Architecture Operand types and sizes Integers: 8 bits (character). 16 bits (Unicode character). 32 bits (integer / word). 64 bits (long integer / double word). Floating point: 32 bits (single precision). 64 bits (double precision). 80 bits (80x86 extended double precision). cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 38/45

Computer Architecture Operations Categories: Data transfer, arithmetic, logic, control, and floating point. MIPS: Simple instructions, easy to implement in a pipeline (RISC). 80x86: Many more instructions. Variable complexity. cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 39/45

Computer Architecture Flow control instructions Categories: Conditional branching, jumps, procedure calls and returns. MIPS: PC-relative addressing. Conditions on register values. Procedure calls place return value in register. 80x86: PC-relative addressing. Conditions on bits with condition codes. Procedure calls through stack. cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 40/45

Computer Architecture Instruction coding Fixed length MIPS. All instructions are 32 bit. Simplified instruction decode. Variable length 80x86. Length from 1 to 18 bytes. Reduces program size. cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 41/45

Conclusion 1 Introduction 2 Historic perspective 3 Classification 4 Parallelism 5 Computer Architecture 6 Conclusion cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 42/45

Conclusion Summary Moore s law still alive. But The free lunch is over. New models improve performance (DLP, TLP, RLP) but require application restructuring. Diversity in computer classes with varied properties and requirements. PMD, Desktop, Servers, WSC, Embedded. Emerging architectures combining SIMD and MIMD. ISA is different from architecture. cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 43/45

Conclusion References Computer Architecture. A Quantitative Approach 5th Ed. Hennessy and Patterson. Sections 1.1, 1.2, and 1.3. The Free Lunch is over. Herb Sutter. http://www.gotw.ca/publications/concurrency-ddj.htm Welcome to the Jungle. Herb Sutter. http://herbsutter.com/welcome-to-the-jungle/ cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 44/45

Conclusion Fundamentals of Computer Design Computer Architecture J. Daniel García Sánchez (coordinator) David Expósito Singh Francisco Javier García Blas ARCOS Group Computer Science and Engineering Department University Carlos III of Madrid cbed Computer Architecture ARCOS Group http://www.arcos.inf.uc3m.es 45/45