Chapter 18 - Multicore Computers


Chapter 18 - Multicore Computers
Luis Tarrataca
luis.tarrataca@gmail.com
CEFET-RJ

Table of Contents
Where to focus your study
References

Up until now we have seen several methods for increasing performance. Can you name a few?

Frequency: the easy way out =)
Inherent advantages and disadvantages. Can you name them?
Cache: high-speed memory;
Idea: reduce the number of reads from slower, higher-latency memory (RAM, HD, etc.);
Very expensive!
Pipeline: idea: an assembly line for instructions;
Leads to performance increases;
But also introduces a host of new considerations (hazards).
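Why the cache entry on this list matters in practice: below is a minimal sketch (my example, not from the slides) where the same matrix sum runs noticeably faster when the access pattern follows cache-line order; exact timings vary by machine.

```c
#include <stdio.h>
#include <time.h>

#define N 2048
static double m[N][N];

int main(void) {
    double sum = 0.0;
    clock_t t0 = clock();
    for (int i = 0; i < N; i++)         /* row-major: consecutive addresses, */
        for (int j = 0; j < N; j++)     /* each cache line is fully reused   */
            sum += m[i][j];
    clock_t t1 = clock();
    for (int j = 0; j < N; j++)         /* column-major: strided accesses,   */
        for (int i = 0; i < N; i++)     /* frequent cache misses             */
            sum += m[i][j];
    clock_t t2 = clock();
    printf("row-major: %.3fs, column-major: %.3fs (sum=%g)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC, sum);
    return 0;
}
```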

Parallel computation:
SMP;
Clusters;
Vector computation: GPGPU; Intel Xeon Phi.

Today we will look at another form of parallel computation: multicore systems.
A multicore system (also called a chip multiprocessor) combines two or more processors (cores) on a single piece of silicon. Each core is an independent processor, with its own:
Registers;
ALU;
Pipeline.
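As a quick illustration (my example, not from the slides), a program can ask the operating system how many cores it exposes; a minimal C sketch, assuming a Linux or macOS system where sysconf supports _SC_NPROCESSORS_ONLN:

```c
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Number of logical processors currently online. */
    long cores = sysconf(_SC_NPROCESSORS_ONLN);
    printf("This machine exposes %ld cores\n", cores);
    return 0;
}
```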

The obvious question to pose is: why multicore systems?

The obvious question to pose is: why multicore systems?
Pipelines have hazards and resource dependencies;
Using multiple threads over a set of pipelines also introduces complexities;
Greater coordination complexity and signal transfer logic:
Increases the difficulty of designing, fabricating, and debugging the chips;
Also increases power consumption...

Amdahl's law states that:

$$\text{Speedup} = \frac{\text{time to execute program on a single processor}}{\text{time to execute program on } N \text{ parallel processors}} = \frac{1}{(1-f) + \frac{f}{N}}$$

where:
$f$: fraction of the code that is infinitely parallelizable with no scheduling overhead;
$(1-f)$: fraction of the code that is inherently sequential.
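As a worked example (numbers mine, not from the slides): with $f = 0.9$ and $N = 8$ cores,

$$\text{Speedup} = \frac{1}{(1-0.9) + \frac{0.9}{8}} = \frac{1}{0.1 + 0.1125} \approx 4.7$$

so even code that is 90% parallelizable gains less than 5x from 8 cores.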

Let's look at what happens with different values of (1 - f):

Figure: Speedup with 0%, 2%, 5%, and 10% sequential portions (Source: [Stallings, 2015])

Conclusion: even a small amount of serial code has a noticeable impact!

But in real computers we also have overheads:

Figure: Speedup with overheads (Source: [Stallings, 2015])

Performance peaks and then falls off as more processors are added, due to these overheads.

Why do you think this performance peak happens? Any ideas?

Why do you think this performance peak happens? Any ideas?
Multiple processors introduce overheads, required to:
Communicate;
Distribute work;
Maintain cache coherence.
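A minimal sketch (my example, not from the slides) of where such overhead comes from: two threads forced to serialize on one shared counter. The cache line holding the lock and counter bounces between cores on every increment; all names are mine. Compile with -pthread.

```c
#include <pthread.h>
#include <stdio.h>

#define ITERS 1000000

static long shared_count = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Every increment pays for lock and coherence traffic between cores. */
static void *shared_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        pthread_mutex_lock(&lock);
        shared_count++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, shared_worker, NULL);
    pthread_create(&t2, NULL, shared_worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("count = %ld\n", shared_count);  /* 2 * ITERS */
    return 0;
}
```

Giving each thread a private counter and summing at the end removes the lock and the coherence traffic; timing both versions makes the overhead visible.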

So what type of applications work well with multicore systems? Any ideas?

So what type of applications work well with multicore systems? Any ideas?
Applications:
where f ≈ 1;
where the overheads are small;
DBMSs are particularly well suited. Still not perfect...

Figure: Scaling of Database Workloads on Multiple-Processor Hardware (Source: [Stallings, 2015])

Some of the main variables in a multicore organization are as follows:
Number of core processors on the chip;
Number of levels of cache memory;
Amount of cache memory that is shared.

Let's look at some architectures (1/2):

Figure: Dedicated L1 cache - Ex: ARM11 MPCore (Source: [Stallings, 2015])
Figure: Dedicated L2 cache - Ex: AMD Opteron (Source: [Stallings, 2015])

L1-D: data cache; L1-I: instruction cache.

Let's look at some architectures (2/2):

Figure: Shared L2 cache - Ex: Intel Core Duo (Source: [Stallings, 2015])
Figure: Shared L3 cache - Ex: Intel Core i7 (Source: [Stallings, 2015])

L1-D: data cache; L1-I: instruction cache.

Why is the use of a shared cache good? Any ideas?

Why is the use of a shared cache good? Any ideas?
1. Constructive interference: one core may cache a line that will later be used by another core;
2. Cache coherence: no need to guarantee cache coherence, since there is a single cache;
3. Dynamic cache allocation: the amount of shared cache allocated to each core is dynamic;
4. Intercore communication: easy to implement with a shared memory.
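To make point 4 concrete, here is a minimal sketch (my example, not from the slides) of two threads, typically scheduled on different cores, communicating through shared memory: one publishes a value, the other spins on a flag. All names are mine; C11 atomics provide the ordering guarantees. Compile with -pthread.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static int payload;           /* data passed between cores */
static atomic_int ready;      /* synchronization flag, zero-initialized */

static void *producer(void *arg) {
    (void)arg;
    payload = 42;                                            /* write the data...    */
    atomic_store_explicit(&ready, 1, memory_order_release);  /* ...then publish it   */
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;                                                    /* spin until published */
    printf("received %d\n", payload);
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```

With a shared cache, the flag and payload can be exchanged without leaving the chip; with private caches, the same exchange triggers coherence traffic between them.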

Why is the use of a hierarchical cache good?

Why is the use of a hierarchical cache good?
Latency vs. hit-rate tradeoff: larger caches have better hit rates but longer latency;
Cost vs. space tradeoff: faster caches are more expensive and have less space; slower caches are cheaper and have more space.

To address these tradeoffs, computers use multiple cache levels:
Small, fast caches backed up by larger, slower caches;
Check if the data is in the first levels:
Cache hit: the core proceeds at high speed;
Cache miss: try the next cache levels before accessing external memory.
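The lookup order above can be written out as plain control flow. A minimal toy simulation (my example, not from the slides): two tiny direct-mapped levels in front of "memory", with all structure sizes and names invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SETS 16

/* Toy direct-mapped cache level: each set remembers one address. */
typedef struct { uint64_t tag[SETS]; bool valid[SETS]; } level_t;

static level_t l1, l2;                    /* two levels, L1 is the fastest */
static long l1_hits, l2_hits, mem_reads;

static bool probe(level_t *c, uint64_t addr) {
    int set = addr % SETS;
    return c->valid[set] && c->tag[set] == addr;
}

static void install(level_t *c, uint64_t addr) {
    int set = addr % SETS;
    c->valid[set] = true;
    c->tag[set] = addr;
}

/* Walk the hierarchy: fast level first, external memory last. */
static void cache_access(uint64_t addr) {
    if (probe(&l1, addr)) { l1_hits++; return; }   /* hit: proceed at speed */
    if (probe(&l2, addr)) { l2_hits++; }           /* next level            */
    else { mem_reads++; install(&l2, addr); }      /* external memory       */
    install(&l1, addr);                            /* refill L1 on the way back */
}

int main(void) {
    for (int pass = 0; pass < 2; pass++)           /* second pass hits in L1 */
        for (uint64_t a = 0; a < 8; a++)
            cache_access(a);
    printf("L1 hits: %ld, L2 hits: %ld, memory reads: %ld\n",
           l1_hits, l2_hits, mem_reads);
    return 0;
}
```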

As a curiosity, let's take a look at the AMD Bulldozer memory hierarchy:

Figure: Memory hierarchy of an AMD Bulldozer server. (Source: Wikipedia)

Where to focus your study

After this class you should be able to:
Understand the hardware performance issues that have driven the move to multicore computers;
Understand the software performance issues posed by the use of multithreaded multicore computers.

It is less important to know how these solutions were implemented, i.e., the details of specific hardware solutions. Your focus should always be on the building blocks for developing a solution =)

References

Stallings, W. (2015). Computer Organization and Architecture. Pearson Education.