EE/CSCI 451: Parallel and Distributed Computation

Size: px
Start display at page:

Download "EE/CSCI 451: Parallel and Distributed Computation"

Transcription

1 EE/CSCI 451: Parallel and Distributed Computation Lecture #2 1/17/2017 Xuehai Qian University of Southern California 1

2 Outline Opportunities and challenges in exploiting parallelism Course outline High level view of Processor Core Organization Chapter 2.1, 2.2 in the text (Implicit parallelism) Superscalar Very Long Instruction Word (VLIW) processor Multi-core processor organization 2

3 Announcements HW #1 out next Wednesday, due on Jan 31 (check course website) Programming HW #1 out this Friday, due on Jan 26(check course website) Piazza account has been set up ( Course website: 3

4 Midterms and Final Midterm I: 24 th Feb in lab session Midterm II: 31 st March in lab session Final, as scheduled by the University No makeup exams 4

5 Course Dependencies EE 457 CS 402 EE 477 EE 451 EE 560 EE 557 CS 503 CS 596 CS 570 EE 653 EE 657 EE 659 CS 653 EE 677 REQUIRED SUGGESTED 5

6 Challenges (1) Processor - Memory performance gap Memory (DRAM) Access time 5 nsec Cache Processor 2 GHz 0.5 nsec 6

7 Challenges (2) Limit to achievable speedup Amdahl s Law Serial time = 1, Best Parallel time =, Program 1 unit α 1-α Serial Parallel α 1 Speedup = Serial time Parallel time = 1. Example: 50 % of the code is parallizable, Speedup 2 (no matter how many processors are used) 7

8 Challenges (3) Co-ordination Do in parallel P I, P J,, P K end overhead =? Synchronization P I P J P K Barrier 8

9 Challenges (4) Communication Compute Communicate Wait? Interconnection Network 9

10 Challenges (5) Load balance Example: Matrix multiplication P Z[ compute output matrix ij, 0 i, j < p = number of processors p (0,0) n n n p n p n n n n ( P 1, P 1) 10

11 Challenges (6) Key performance metrics jklzmn ozpk Speedup = qmlmnnkn ozpk Given p > 1 processors, can we get p fold improvement performance? Scalability 11

12 Course Outline (1) Overview / Course Introduction Architectural Principles for Application Developers Pipelined processor organization: data and control hazards, ILP, out of order execution, multithreading. Memory systems: DRAM organization, cache organization. Impact on software performance, locality, multithreading, prefetching. 12

13 Course Outline (2) Analytical Models for Parallel Systems Architecture performance metrics: CPI, MIPS, SpecMark. Software performance benchmarks: Peak performance, sustained performance, LinPack, Bandwidth benchmarks. Limits on achievable performance, Amdahl s Law, scalability definitions, work optimality, Iso-efficiency function, Order notation. 13

14 Course Outline (3) Analytical Models for Parallel Systems(cont.) Communication costs in parallel machines: start-up cost, throughput, latency. Routing mechanisms: packet routing, cut through, virtual channels. Modeling message passing and shared address space machines. Data layouts and graph embeddings. Multi-core, many-core and GPU architectures, data parallel programming abstraction of GPUs. 14

15 Course Outline (4) PRAM and Data Parallel Algorithms PRAM model of computation, Brent s theorem, other models of computation, illustrative examples Max, Scan operations Recursive doubling, graph algorithms Performance analysis, scalability, relation ship to practical platforms Sorting, FFT 15

16 Course Outline (5) Basic Communication Primitives Broadcast and all-to-all, communication costs on various topologies Personalized communication, Reduce, prefix sum, and scatter and gather Graph embeddings 16

17 Course Outline (6) Message Passing Programming Model Message passing abstraction, send receive primitives, blocking and non-blocking commands, collective operations Illustrative examples: Canon s algorithm, overlapping computation and communication, Odd- even merge sort, Review for Midterm 2 Shared Address Space Programming Models Pthreads, OpenMP, Illustrative examples 17

18 Course Outline (7) Parallel Dense Algebra Matrix vector, matrix matrix computations Parallel Dense Algebra Parallel Search and Sorting Parallel search, illustrative example applications, throughput optimization, Multi-dimensional search, decision tree and decomposition Sorting techniques, Bitonic sort, row-column sort, Mapping onto parallel architectures 18

19 Course Outline (8) Cloud, Big Data and Map Reduce Cloud as a computing platform, Large data sets and organization Discovery of knowledge, annotation of data Computational and programming models for Big Data Map Reduce as a parallel programming model, Hadoop, Spark Illustrative examples from data science domain 19

20 Acknowledgement Lecture notes compiled over the years My research team Course TA and mentors during Spring 14, 15, and 16 EE451 Students in Spring 14, 15, and 16 Vipin Kumar, Ananth Grama 20

21 Research Projects Research High Performance Networking Big Data Analysis Social Network Analysis Energy Efficient Computing Webpages ceng.usc.edu/~prasanna/

22 Related Courses EE/CSCI 451 Introduction to Parallel and Distributed Computation (Prasanna) EE 454 Introduction to Systems-on-Chip (Bogdan) EE 457 Computer Systems Organization (Puvvada) CSCI 402 Operating Systems (Cheng) EE 557 Computer Systems Architecture (Annavaram/Dubois) EE 560L Digital System Design Tools and Techniques (Puvvada) EE 532 Wireless Internet and Pervasive Computing (Hwang) EE 533 Network Processor Programming and Design (Cho) CSCI 558L Internetworking and Distributed System Laboratory (Cho) CSCI 570 Analysis of Algorithms (Shamsian/Adamchik) CSCI 596 Scientific Computing and Visualization (Nakano) EE 653 Advanced Topics in Microarchitecture (Dubois) CSCI 653 High Performance Computing and Simulations (Nakano) EE 659 Interconnection Networks (Pinkston) EE 657 Parallel and Distributed Computing (Dubios) EE 677 VLSI Architectures and Algorithms (Prasanna) 22

23 Sample Course work with focus on Computer Architecture EE/CSCI 451 Introduction to Parallel and Distributed Computation (Prasanna) EE 454 Introduction to Systems-on-Chip (Bogdan) EE 457 Computer Systems Organization (Puvvada) CSCI 402 Operating Systems (Cheng) EE 557 Computer Systems Architecture (Annavaram/Dubois) EE 533 Network Processor Programming and Design (Cho) EE 599 Cyber-Physical Systems (Bogdan) CSCI 570 Analysis of Algorithms (Shamsian/Adamchik) EE 653 Advanced Topics in Microarchitecture (Dubois) EE 659 Interconnection Networks (Pinkston) EE 677 VLSI Architectures and Algorithms (Prasanna) 23

24 Sample course work with focus on Hardware, embedded systems EE/CSCI 451 Introduction to Parallel and Distributed Computation (Prasanna) EE 454 Introduction to Systems-on-Chip (Bogdan) EE 457 Computer Systems Organization (Puvvada) EE 477 MOS VLSI Circuit Design (Nazarian) EE 577a VLSI System Design (Pedram/Nazarian) EE 577b VLSI System Design (Pedram) EE 560L Digital System Design Tools and Techniques (Puvvada) EE 532 Wireless Internet and Pervasive Computing (Hwang) EE 533 Network Processor Programming and Design (Cho) EE 599 Cyber-Physical Systems (Bogdan) CSCI 570 Analysis of Algorithms (Shamsian/Adamchik) EE 677 VLSI Architectures and Algorithms (Prasanna) 24

25 Sample course work with focus on parallel and scientific computing EE/CSCI 451 Introduction to Parallel and Distributed Computation (Prasanna) EE 454 Introduction to Systems-on-Chip (Bogdan) EE 457 Computer Systems Organization (Puvvada) CSCI 402 Operating Systems (Cheng) EE 557 Computer Systems Architecture (Annavaram/Dubois) EE 542 Internet and Cloud Computing (Hwang) CSCI 570 Analysis of Algorithms (Shamsian/Adamchik) CSCI 596 Scientific Computing and Visualization (Nakano) CSCI 653 High Performance Computing and Simulations (Nakano) 25

26 Sample course work with focus on distributed systems and big data EE/CSCI 451 Introduction to Parallel and Distributed Computation (Prasanna) EE 450 Introduction to Computer Networks (Zahid/Touch) EE 542 Internet and Cloud Computing (Hwang) CSCI 402 Operating Systems (Cheng) CSCI 555 Advanced Operating Systems (Govindan) CSCI 567 Machine Learning (Sha/Liu) CSCI 570 Analysis of Algorithms (Shamsian/Adamchik) CSCI 585 Database Systems (Raghavachary/Ghyam) CSCI 657 Advanced Distributed Systems (Lloyd) 26

27 CSCI 596: Scientific Computing and Visualization (4 Units) Overview: Particle and continuum simulations are used as a vehicle to learn basic elements scientific computing & visualization. Hands-on experience in: 1) formulating a mathematical model to describe a physical phenomenon; 2) discretizing the model (differential or integral equations) into algebraic forms in order to allow numerical solution; 3) designing/analyzing numerical algorithms to solve the algebraic equations on parallel computers; 4) translating the algorithms into a program; 5) performing a computer experiment by executing the program; 6) visualizing simulation data in an immersive and interactive virtual environment; and 7) managing/mining large datasets. Instructor: Aiichiro Nakano, anakano@usc.edu Prerequisite: Basic knowledge of programming, data structures, linear algebra, and calculus Intended Audience: MS students Course Website: cacs.usc.edu/education/cs596.html Offered: Fall 14, Fall 15, Fall

28 CSCI 653: High Performance Computing & Simulations (4 Units) Overview: Provide students with advanced techniques that are common to high performance computer simulations in science and engineering. Scalable algorithms for both deterministic and stochastic simulations of particles and continuum will be implemented on massively parallel and distributed computing platforms, and the simulation datasets will be visualized and analyzed in immersive and interactive virtual environment. Instructor: Aiichiro Nakano, Prerequisite: CS 596 Intended Audience: PhD students Course Website: cacs.usc.edu/education/cs653.html Offered: Fall

29 CSCI 570: Analysis of Algorithms (4 Units) Overview: The course is intended as a first graduate course in the design and analysis of algorithms. The main focus is on developing an understanding for the major algorithm design techniques. At times, the practical side of algorithm design is also explored with interesting examples of their usage in solving industry problems. Instructor: S. Shamsian, sshamsia@usc.edu Prerequisite: Students in the class are expected to have a reasonable degree of mathematical sophistication, and to be familiar with the basic notions of algorithms and data structures, discrete mathematics, and probability. Undergraduate classes in these subjects should be sufficient. Textbook: Algorithm Design, Jon Kleinberg/Eva Tardos Offered: Spring, Summer, Fall 29

30 EE 677: VLSI Architectures and Algorithms (3 Units) Overview: This course focuses on reconfigurable computing based on Field Programmable Gate Arrays (FPGAs) for implementing application-specific architectures and algorithms including embedded computing applications. The course will address devices, systems, and application synthesis and analysis. Instructor: Viktor Prasanna, Prerequisite: EE 457 and CS 570 or consent of the instructor Intended Audience: MS and PhD students Course Website: ceng.usc.edu/~prasanna/ee677 Text: Papers from the literature Offered: in Spring 30 30

31 Outline Opportunities and challenges in exploiting parallelism Course outline High level view of Processor Core Organization Chapter 2.1, 2.2 in the text (Implicit parallelism) Superscalar Very Long Instruction Word (VLIW) processor Multi-core processor organization 31

32 Uniprocessor (Serial Processor) RAM (Random Access Machine) A simple view Memory Fetch Execute Processor Memory access = 1 cycle Execution = 1 cycle 32

33 Pipelined implementation of processor (1) All processors are pipelined Five stage pipeline (See Hennessey Patterson, EE 457) IF ID EX MEM WB Instruction Fetch Instruction Decode/ Register file read Execute/ Address calculation Memory access Write Back 33

34 Pipelined implementation of processor(2) Ins IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM 1. Load 2. Load 3. Add R1, Add R2, Add R1, R2 6. Store WB Total # of cycles to complete execution = 10 34

35 Simple Performance Model Instruction exec. rate = clock rate = peak performance? Time to execute n ins. n / clock rate Example: 2 GHz processor core Peak performance = 2 G Ins. per second 35

36 Superscalar execution Example: 2 pipelines (Sharing hardware resources) 2 improvement? 5 stage pipeline Instruction memory Registers Functional units Data memory 5 stage pipeline 36

37 2-way super scalar execution (almost 2 Ins. Issued per cycle) Load 2. Load 3. Add R1, Add R2, Add R1, R2 IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB 6. Store Total # of cycles = 8 Note: Ins. 5 and 6 cannot be issued in the same cycle 37 IF ID EX MEM WB

38 Data dependencies Output of Ins i is an input to Ins k (k > i) Potential problem: Ins k is close to Ins i during execution (during execution Ins L and Ins I are in the hardware pipeline concurrently) Example: Single Pipeline 1. Load 2. Load 3. Add R1, Add R2, Add R1, R2 Simple solution = Stall the pipeline Hardware solution = Internal Forwarding Ex.: Dependency between Ins. 4 and 5 can be handled without stalling pipeline. 38

39 Out of Order Issue Example: 2-way superscalar 1. Load 2. Add R1, Load 4. Add R2, 104 Cannot issue Ins. 1, 2 simultaneously. Cannot issue Ins. 3, 4 simultaneously However, can do: Initiate Ins. 1, Ins. 3 in clock cycle 0 Initiate Ins. 2, Ins. 4 in clock cycle 1 39

40 Dynamic Instruction Issue Code Window - Series of instructions Identify those that can be executed in parallel Note: Hardware does scheduling at run time Way of exploiting instruction level parallelism 40

41 Resource dependency Example: 2-way super scalar execution one hardware Floating Point Unit (FPU) Execution pipelines FPU Load Load FP Mult R1, 100 FP Mult R2, 104 Execution pipelines Execution hardware detects such dependencies and schedules the execution accordingly 41

42 Superscalar (1) Instruction-level parallelism Multiple function units (Ex. arithmetic logic unit, bit shifter) One or more instructions issued per clock In-order issue Out-of-order execution In-order commit 42

43 Superscalar (2) 43

44 Bulldozer Architecture (1) Microprocessor microarchitecture by AMD Two-core module Two integer execution units 2 arithmetic logic units 2 address generation units One floating point execution unit Shared L2 cache Shared instruction fetch and decode stages 44

45 Bulldozer Architecture (2) Handle 4 independent arithmetical and memory operations per clock for each integer core Issue bandwidth = 4 45

46 Sandy Bridge Microarchitecture by Intel For each Integer execution unit (core) 3 ALUs + 2 AGUs 3 ALU+ 1 AGU / 2 ALU + 2 AGU ( Note: four instruction decoders) 46

47 Very Long Instruction Word (VLIW) Processor Superscalar processor Dynamic scheduling of Ins. Scheduling hardware is expensive Dependence window size # of pipelines Do scheduling at compile time (in software) Specify parallelism as parallel activities (using long word of the instruction format) 47

48 Processor Execution Model SISD Single Instruction Single Data (Random Access Machine) Instruction Set Architecture (ISA) e.g. X86 M fetch 64 P execute 48

49 Designing and Executing a Parallel Program Performance metric Execution time Execution time depends on Compile time optimizations (Compiler level) Execution time optimizations (Processor level Implicit) Many optimizations are possible at architecture and hardware level (EE 457, EE 557) 49 Course focus Problem User Program Compile Code for target ISA Target HW e.g. matrix mult.

50 Multi-core Processor (1) AMD 6278 processor 16 cores Core Core Core 2.4 GHz L1 cache: 16 KB L1 Cache L1 Cache L1 Cache L2 cache: 2 MB L2 Cache L2 Cache L2 Cache L3 cache: 6 MB Main memory: 64 GB L3 Cache Main Memory 50

51 Multi-core Processor (2) Parallelism Pipelined execution of Instructions (each core) Instruction-level parallelism (each core) Thread-level parallelism (across cores) Data-level parallelism Loop-level parallelism 51

52 Multi-core Processor (3) Challenges (to achieving high performance) Identify and exploit explicit parallelism Cache performance Co-ordination 52

53 Summary Challenges in exploiting parallelism Course outline Core -> Multi-core -> System Implicit Parallelism 53

EE/CSCI 451 Parallel and Distributed Computation

EE/CSCI 451 Parallel and Distributed Computation EE/CSCI 451 Parallel and Distributed Computation Lecture #1 1/11/2018 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Course information Outline

More information

EE/CSCI 451: Parallel and Distributed Computation

EE/CSCI 451: Parallel and Distributed Computation EE/CSCI 451: Parallel and Distributed Computation Lecture #11 2/21/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Outline Midterm 1:

More information

EE/CSCI 451: Parallel and Distributed Computation

EE/CSCI 451: Parallel and Distributed Computation EE/CSCI 451: Parallel and Distributed Computation Lecture #12 2/21/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Last class Outline

More information

Contents. Preface xvii Acknowledgments. CHAPTER 1 Introduction to Parallel Computing 1. CHAPTER 2 Parallel Programming Platforms 11

Contents. Preface xvii Acknowledgments. CHAPTER 1 Introduction to Parallel Computing 1. CHAPTER 2 Parallel Programming Platforms 11 Preface xvii Acknowledgments xix CHAPTER 1 Introduction to Parallel Computing 1 1.1 Motivating Parallelism 2 1.1.1 The Computational Power Argument from Transistors to FLOPS 2 1.1.2 The Memory/Disk Speed

More information

Advanced Computer Architecture

Advanced Computer Architecture Advanced Computer Architecture Chapter 1 Introduction into the Sequential and Pipeline Instruction Execution Martin Milata What is a Processors Architecture Instruction Set Architecture (ISA) Describes

More information

Course web site: teaching/courses/car. Piazza discussion forum:

Course web site:   teaching/courses/car. Piazza discussion forum: Announcements Course web site: http://www.inf.ed.ac.uk/ teaching/courses/car Lecture slides Tutorial problems Courseworks Piazza discussion forum: http://piazza.com/ed.ac.uk/spring2018/car Tutorials start

More information

EECS4201 Computer Architecture

EECS4201 Computer Architecture Computer Architecture A Quantitative Approach, Fifth Edition Chapter 1 Fundamentals of Quantitative Design and Analysis These slides are based on the slides provided by the publisher. The slides will be

More information

Multiple Issue ILP Processors. Summary of discussions

Multiple Issue ILP Processors. Summary of discussions Summary of discussions Multiple Issue ILP Processors ILP processors - VLIW/EPIC, Superscalar Superscalar has hardware logic for extracting parallelism - Solutions for stalls etc. must be provided in hardware

More information

Lec 25: Parallel Processors. Announcements

Lec 25: Parallel Processors. Announcements Lec 25: Parallel Processors Kavita Bala CS 340, Fall 2008 Computer Science Cornell University PA 3 out Hack n Seek Announcements The goal is to have fun with it Recitations today will talk about it Pizza

More information

Spring 2014 Midterm Exam Review

Spring 2014 Midterm Exam Review mr 1 When / Where Spring 2014 Midterm Exam Review mr 1 Monday, 31 March 2014, 9:30-10:40 CDT 1112 P. Taylor Hall (Here) Conditions Closed Book, Closed Notes Bring one sheet of notes (both sides), 216 mm

More information

EE/CSCI 451: Parallel and Distributed Computation

EE/CSCI 451: Parallel and Distributed Computation EE/CSCI 451: Parallel and Distributed Computation Lecture #4 1/24/2018 Xuehai Qian xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Announcements PA #1

More information

ECE 587 Advanced Computer Architecture I

ECE 587 Advanced Computer Architecture I ECE 587 Advanced Computer Architecture I Instructor: Alaa Alameldeen alaa@ece.pdx.edu Spring 2015 Portland State University Copyright by Alaa Alameldeen and Haitham Akkary 2015 1 When and Where? When:

More information

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors Computer and Information Sciences College / Computer Science Department CS 207 D Computer Architecture Lecture 9: Multiprocessors Challenges of Parallel Processing First challenge is % of program inherently

More information

EE282 Computer Architecture. Lecture 1: What is Computer Architecture?

EE282 Computer Architecture. Lecture 1: What is Computer Architecture? EE282 Computer Architecture Lecture : What is Computer Architecture? September 27, 200 Marc Tremblay Computer Systems Laboratory Stanford University marctrem@csl.stanford.edu Goals Understand how computer

More information

Performance, Power, Die Yield. CS301 Prof Szajda

Performance, Power, Die Yield. CS301 Prof Szajda Performance, Power, Die Yield CS301 Prof Szajda Administrative HW #1 assigned w Due Wednesday, 9/3 at 5:00 pm Performance Metrics (How do we compare two machines?) What to Measure? Which airplane has the

More information

EE/CSCI 451: Parallel and Distributed Computation

EE/CSCI 451: Parallel and Distributed Computation EE/CSCI 451: Parallel and Distributed Computation Lecture #7 2/5/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Outline From last class

More information

45-year CPU Evolution: 1 Law -2 Equations

45-year CPU Evolution: 1 Law -2 Equations 4004 8086 PowerPC 601 Pentium 4 Prescott 1971 1978 1992 45-year CPU Evolution: 1 Law -2 Equations Daniel Etiemble LRI Université Paris Sud 2004 Xeon X7560 Power9 Nvidia Pascal 2010 2017 2016 Are there

More information

EN164: Design of Computing Systems Topic 08: Parallel Processor Design (introduction)

EN164: Design of Computing Systems Topic 08: Parallel Processor Design (introduction) EN164: Design of Computing Systems Topic 08: Parallel Processor Design (introduction) Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering

More information

Moore s Law. CS 6534: Tech Trends / Intro. Good Ol Days: Frequency Scaling. The Power Wall. Charles Reiss. 24 August 2016

Moore s Law. CS 6534: Tech Trends / Intro. Good Ol Days: Frequency Scaling. The Power Wall. Charles Reiss. 24 August 2016 Moore s Law CS 6534: Tech Trends / Intro Microprocessor Transistor Counts 1971-211 & Moore's Law 2,6,, 1,,, Six-Core Core i7 Six-Core Xeon 74 Dual-Core Itanium 2 AMD K1 Itanium 2 with 9MB cache POWER6

More information

CS 6534: Tech Trends / Intro

CS 6534: Tech Trends / Intro 1 CS 6534: Tech Trends / Intro Charles Reiss 24 August 2016 Moore s Law Microprocessor Transistor Counts 1971-2011 & Moore's Law 16-Core SPARC T3 2,600,000,000 1,000,000,000 Six-Core Core i7 Six-Core Xeon

More information

CSCI-GA Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore

CSCI-GA Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore CSCI-GA.3033-012 Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Status Quo Previously, CPU vendors

More information

High Performance Computing

High Performance Computing High Performance Computing CS701 and IS860 Basavaraj Talawar basavaraj@nitk.edu.in Course Syllabus Definition, RISC ISA, RISC Pipeline, Performance Quantification Instruction Level Parallelism Pipeline

More information

Computer Architecture. Fall Dongkun Shin, SKKU

Computer Architecture. Fall Dongkun Shin, SKKU Computer Architecture Fall 2018 1 Syllabus Instructors: Dongkun Shin Office : Room 85470 E-mail : dongkun@skku.edu Office Hours: Wed. 15:00-17:30 or by appointment Lecture notes nyx.skku.ac.kr Courses

More information

EE/CSCI 451: Parallel and Distributed Computation

EE/CSCI 451: Parallel and Distributed Computation EE/CSCI 451: Parallel and Distributed Computation Lecture #8 2/7/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Outline From last class

More information

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 14 EE141

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 14 EE141 EECS 151/251A Fall 2017 Digital Design and Integrated Circuits Instructor: John Wawrzynek and Nicholas Weaver Lecture 14 EE141 Outline Parallelism EE141 2 Parallelism Parallelism is the act of doing more

More information

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.

More information

Lecture 13: Memory Consistency. + a Course-So-Far Review. Parallel Computer Architecture and Programming CMU , Spring 2013

Lecture 13: Memory Consistency. + a Course-So-Far Review. Parallel Computer Architecture and Programming CMU , Spring 2013 Lecture 13: Memory Consistency + a Course-So-Far Review Parallel Computer Architecture and Programming Today: what you should know Understand the motivation for relaxed consistency models Understand the

More information

ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design

ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 1 Fundamentals of Quantitative Design and Analysis 1 Computer Technology Performance improvements: Improvements in semiconductor technology

More information

Lecture 26: Parallel Processing. Spring 2018 Jason Tang

Lecture 26: Parallel Processing. Spring 2018 Jason Tang Lecture 26: Parallel Processing Spring 2018 Jason Tang 1 Topics Static multiple issue pipelines Dynamic multiple issue pipelines Hardware multithreading 2 Taxonomy of Parallel Architectures Flynn categories:

More information

Computer Systems Architecture Spring 2016

Computer Systems Architecture Spring 2016 Computer Systems Architecture Spring 2016 Lecture 01: Introduction Shuai Wang Department of Computer Science and Technology Nanjing University [Adapted from Computer Architecture: A Quantitative Approach,

More information

Fundamentals of Quantitative Design and Analysis

Fundamentals of Quantitative Design and Analysis Fundamentals of Quantitative Design and Analysis Dr. Jiang Li Adapted from the slides provided by the authors Computer Technology Performance improvements: Improvements in semiconductor technology Feature

More information

The University of Texas at Austin

The University of Texas at Austin EE382 (20): Computer Architecture - Parallelism and Locality Lecture 4 Parallelism in Hardware Mattan Erez The University of Texas at Austin EE38(20) (c) Mattan Erez 1 Outline 2 Principles of parallel

More information

Introduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620

Introduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620 Introduction to Parallel and Distributed Computing Linh B. Ngo CPSC 3620 Overview: What is Parallel Computing To be run using multiple processors A problem is broken into discrete parts that can be solved

More information

Processors. Young W. Lim. May 12, 2016

Processors. Young W. Lim. May 12, 2016 Processors Young W. Lim May 12, 2016 Copyright (c) 2016 Young W. Lim. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version

More information

EE/CSCI 451 Midterm 1

EE/CSCI 451 Midterm 1 EE/CSCI 451 Midterm 1 Spring 2018 Instructor: Xuehai Qian Friday: 02/26/2018 Problem # Topic Points Score 1 Definitions 20 2 Memory System Performance 10 3 Cache Performance 10 4 Shared Memory Programming

More information

CS 2630 Computer Organization. What did we accomplish in 15 weeks? Brandon Myers University of Iowa

CS 2630 Computer Organization. What did we accomplish in 15 weeks? Brandon Myers University of Iowa CS 2630 Computer Organization What did we accomplish in 15 weeks? Brandon Myers University of Iowa require slide from day 1 Why take 2630? The esoteric answer: Computer Science graduates should have an

More information

Lecture 1: Introduction

Lecture 1: Introduction Contemporary Computer Architecture Instruction set architecture Lecture 1: Introduction CprE 581 Computer Systems Architecture, Fall 2016 Reading: Textbook, Ch. 1.1-1.7 Microarchitecture; examples: Pipeline

More information

CS 426 Parallel Computing. Parallel Computing Platforms

CS 426 Parallel Computing. Parallel Computing Platforms CS 426 Parallel Computing Parallel Computing Platforms Ozcan Ozturk http://www.cs.bilkent.edu.tr/~ozturk/cs426/ Slides are adapted from ``Introduction to Parallel Computing'' Topic Overview Implicit Parallelism:

More information

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:

More information

EECS150 - Digital Design Lecture 09 - Parallelism

EECS150 - Digital Design Lecture 09 - Parallelism EECS150 - Digital Design Lecture 09 - Parallelism Feb 19, 2013 John Wawrzynek Spring 2013 EECS150 - Lec09-parallel Page 1 Parallelism Parallelism is the act of doing more than one thing at a time. Optimization

More information

Advanced Instruction-Level Parallelism

Advanced Instruction-Level Parallelism Advanced Instruction-Level Parallelism Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu EEE3050: Theory on Computer Architectures, Spring 2017, Jinkyu

More information

CMSC411 Fall 2013 Midterm 1

CMSC411 Fall 2013 Midterm 1 CMSC411 Fall 2013 Midterm 1 Name: Instructions You have 75 minutes to take this exam. There are 100 points in this exam, so spend about 45 seconds per point. You do not need to provide a number if you

More information

Administration. Course material. Prerequisites. CS 395T: Topics in Multicore Programming. Instructors: TA: Course in computer architecture

Administration. Course material. Prerequisites. CS 395T: Topics in Multicore Programming. Instructors: TA: Course in computer architecture CS 395T: Topics in Multicore Programming Administration Instructors: Keshav Pingali (CS,ICES) 4.26A ACES Email: pingali@cs.utexas.edu TA: Xin Sui Email: xin@cs.utexas.edu University of Texas, Austin Fall

More information

Advanced Parallel Programming I

Advanced Parallel Programming I Advanced Parallel Programming I Alexander Leutgeb, RISC Software GmbH RISC Software GmbH Johannes Kepler University Linz 2016 22.09.2016 1 Levels of Parallelism RISC Software GmbH Johannes Kepler University

More information

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 1. Copyright 2012, Elsevier Inc. All rights reserved. Computer Technology

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 1. Copyright 2012, Elsevier Inc. All rights reserved. Computer Technology Computer Architecture A Quantitative Approach, Fifth Edition Chapter 1 Fundamentals of Quantitative Design and Analysis 1 Computer Technology Performance improvements: Improvements in semiconductor technology

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 6. Parallel Processors from Client to Cloud

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 6. Parallel Processors from Client to Cloud COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 6 Parallel Processors from Client to Cloud Introduction Goal: connecting multiple computers to get higher performance

More information

Computer Architecture EE 4720 Midterm Examination

Computer Architecture EE 4720 Midterm Examination Name Computer Architecture EE 4720 Midterm Examination Friday, 22 March 2002, 13:40 14:30 CST Alias Problem 1 Problem 2 Problem 3 Exam Total (35 pts) (25 pts) (40 pts) (100 pts) Good Luck! Problem 1: In

More information

Trends and Challenges in Multicore Programming

Trends and Challenges in Multicore Programming Trends and Challenges in Multicore Programming Eva Burrows Bergen Language Design Laboratory (BLDL) Department of Informatics, University of Bergen Bergen, March 17, 2010 Outline The Roadmap of Multicores

More information

TDT 4260 lecture 7 spring semester 2015

TDT 4260 lecture 7 spring semester 2015 1 TDT 4260 lecture 7 spring semester 2015 Lasse Natvig, The CARD group Dept. of computer & information science NTNU 2 Lecture overview Repetition Superscalar processor (out-of-order) Dependencies/forwarding

More information

LECTURE 10. Pipelining: Advanced ILP

LECTURE 10. Pipelining: Advanced ILP LECTURE 10 Pipelining: Advanced ILP EXCEPTIONS An exception, or interrupt, is an event other than regular transfers of control (branches, jumps, calls, returns) that changes the normal flow of instruction

More information

CS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it

CS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it Lab 1 Starts Today Already posted on Canvas (under Assignment) Let s look at it CS 590: High Performance Computing Parallel Computer Architectures Fengguang Song Department of Computer Science IUPUI 1

More information

Parallel Computing: Parallel Architectures Jin, Hai

Parallel Computing: Parallel Architectures Jin, Hai Parallel Computing: Parallel Architectures Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology Peripherals Computer Central Processing Unit Main Memory Computer

More information

CSE502 Graduate Computer Architecture. Lec 22 Goodbye to Computer Architecture and Review

CSE502 Graduate Computer Architecture. Lec 22 Goodbye to Computer Architecture and Review CSE502 Graduate Computer Architecture Lec 22 Goodbye to Computer Architecture and Review Larry Wittie Computer Science, StonyBrook University http://www.cs.sunysb.edu/~cse502 and ~lw Slides adapted from

More information

NOW Handout Page 1. Review from Last Time #1. CSE 820 Graduate Computer Architecture. Lec 8 Instruction Level Parallelism. Outline

NOW Handout Page 1. Review from Last Time #1. CSE 820 Graduate Computer Architecture. Lec 8 Instruction Level Parallelism. Outline CSE 820 Graduate Computer Architecture Lec 8 Instruction Level Parallelism Based on slides by David Patterson Review Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level Parallelism

More information

These slides do not give detailed coverage of the material. See class notes and solved problems (last page) for more information.

These slides do not give detailed coverage of the material. See class notes and solved problems (last page) for more information. 11-1 This Set 11-1 These slides do not give detailed coverage of the material. See class notes and solved problems (last page) for more information. Text covers multiple-issue machines in Chapter 4, but

More information

CSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1

CSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1 CSE 820 Graduate Computer Architecture week 6 Instruction Level Parallelism Based on slides by David Patterson Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level

More information

Real Processors. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University

Real Processors. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Real Processors Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel

More information

CMSC Computer Architecture Lecture 12: Multi-Core. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 12: Multi-Core. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 12: Multi-Core Prof. Yanjing Li University of Chicago Administrative Stuff! Lab 4 " Due: 11:49pm, Saturday " Two late days with penalty! Exam I " Grades out on

More information

Instructor Information

Instructor Information CS 203A Advanced Computer Architecture Lecture 1 1 Instructor Information Rajiv Gupta Office: Engg.II Room 408 E-mail: gupta@cs.ucr.edu Tel: (951) 827-2558 Office Times: T, Th 1-2 pm 2 1 Course Syllabus

More information

CO403 Advanced Microprocessors IS860 - High Performance Computing for Security. Basavaraj Talawar,

CO403 Advanced Microprocessors IS860 - High Performance Computing for Security. Basavaraj Talawar, CO403 Advanced Microprocessors IS860 - High Performance Computing for Security Basavaraj Talawar, basavaraj@nitk.edu.in Course Syllabus Technology Trends: Transistor Theory. Moore's Law. Delay, Power,

More information

CS 2630 Computer Organization. What did we accomplish in 8 weeks? Brandon Myers University of Iowa

CS 2630 Computer Organization. What did we accomplish in 8 weeks? Brandon Myers University of Iowa CS 2630 Computer Organization What did we accomplish in 8 weeks? Brandon Myers University of Iowa Course evaluations ICON student (course?) tools Evaluations require Why take 2630? Brandon s esoteric answer:

More information

These slides do not give detailed coverage of the material. See class notes and solved problems (last page) for more information.

These slides do not give detailed coverage of the material. See class notes and solved problems (last page) for more information. 11 1 This Set 11 1 These slides do not give detailed coverage of the material. See class notes and solved problems (last page) for more information. Text covers multiple-issue machines in Chapter 4, but

More information

CSE 141 Summer 2016 Homework 2

CSE 141 Summer 2016 Homework 2 CSE 141 Summer 2016 Homework 2 PID: Name: 1. A matrix multiplication program can spend 10% of its execution time in reading inputs from a disk, 10% of its execution time in parsing and creating arrays

More information

Introduction. CSCI 4850/5850 High-Performance Computing Spring 2018

Introduction. CSCI 4850/5850 High-Performance Computing Spring 2018 Introduction CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University What is Parallel

More information

EE382N (20): Computer Architecture - Parallelism and Locality Spring 2015 Lecture 09 GPUs (II) Mattan Erez. The University of Texas at Austin

EE382N (20): Computer Architecture - Parallelism and Locality Spring 2015 Lecture 09 GPUs (II) Mattan Erez. The University of Texas at Austin EE382 (20): Computer Architecture - ism and Locality Spring 2015 Lecture 09 GPUs (II) Mattan Erez The University of Texas at Austin 1 Recap 2 Streaming model 1. Use many slimmed down cores to run in parallel

More information

COSC 6385 Computer Architecture - Thread Level Parallelism (I)

COSC 6385 Computer Architecture - Thread Level Parallelism (I) COSC 6385 Computer Architecture - Thread Level Parallelism (I) Edgar Gabriel Spring 2014 Long-term trend on the number of transistor per integrated circuit Number of transistors double every ~18 month

More information

EE/CSCI 451: Parallel and Distributed Computation

EE/CSCI 451: Parallel and Distributed Computation EE/CSCI 451: Parallel and Distributed Computation Lecture #5 1/29/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 From last class Outline

More information

Chapter 6. Parallel Processors from Client to Cloud. Copyright 2014 Elsevier Inc. All rights reserved.

Chapter 6. Parallel Processors from Client to Cloud. Copyright 2014 Elsevier Inc. All rights reserved. Chapter 6 Parallel Processors from Client to Cloud FIGURE 6.1 Hardware/software categorization and examples of application perspective on concurrency versus hardware perspective on parallelism. 2 FIGURE

More information

15-740/ Computer Architecture Lecture 7: Pipelining. Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 9/26/2011

15-740/ Computer Architecture Lecture 7: Pipelining. Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 9/26/2011 15-740/18-740 Computer Architecture Lecture 7: Pipelining Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 9/26/2011 Review of Last Lecture More ISA Tradeoffs Programmer vs. microarchitect Transactional

More information

4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16

4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16 4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3 Emil Sekerinski, McMaster University, Fall Term 2015/16 Instruction Execution Consider simplified MIPS: lw/sw rt, offset(rs) add/sub/and/or/slt

More information

The Processor: Instruction-Level Parallelism

The Processor: Instruction-Level Parallelism The Processor: Instruction-Level Parallelism Computer Organization Architectures for Embedded Computing Tuesday 21 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy

More information

Getting CPI under 1: Outline

Getting CPI under 1: Outline CMSC 411 Computer Systems Architecture Lecture 12 Instruction Level Parallelism 5 (Improving CPI) Getting CPI under 1: Outline More ILP VLIW branch target buffer return address predictor superscalar more

More information

Engineering 9859 CoE Fundamentals Computer Architecture

Engineering 9859 CoE Fundamentals Computer Architecture Engineering 9859 CoE Fundamentals Computer Architecture Introduction Dennis Peters 1 Fall 2007 1 Based on notes from Dr. R. Venkatesan Course Details Classes Monday, Wednesday, Friday 9 10 EN-4033 Course

More information

WHY PARALLEL PROCESSING? (CE-401)

WHY PARALLEL PROCESSING? (CE-401) PARALLEL PROCESSING (CE-401) COURSE INFORMATION 2 + 1 credits (60 marks theory, 40 marks lab) Labs introduced for second time in PP history of SSUET Theory marks breakup: Midterm Exam: 15 marks Assignment:

More information

Lecture 1: Gentle Introduction to GPUs

Lecture 1: Gentle Introduction to GPUs CSCI-GA.3033-004 Graphics Processing Units (GPUs): Architecture and Programming Lecture 1: Gentle Introduction to GPUs Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Who Am I? Mohamed

More information

Basic Communication Operations (Chapter 4)

Basic Communication Operations (Chapter 4) Basic Communication Operations (Chapter 4) Vivek Sarkar Department of Computer Science Rice University vsarkar@cs.rice.edu COMP 422 Lecture 17 13 March 2008 Review of Midterm Exam Outline MPI Example Program:

More information

DEPARTMENT OF ELECTRONICS & COMMUNICATION ENGINEERING QUESTION BANK

DEPARTMENT OF ELECTRONICS & COMMUNICATION ENGINEERING QUESTION BANK DEPARTMENT OF ELECTRONICS & COMMUNICATION ENGINEERING QUESTION BANK SUBJECT : CS6303 / COMPUTER ARCHITECTURE SEM / YEAR : VI / III year B.E. Unit I OVERVIEW AND INSTRUCTIONS Part A Q.No Questions BT Level

More information

CS 654 Computer Architecture Summary. Peter Kemper

CS 654 Computer Architecture Summary. Peter Kemper CS 654 Computer Architecture Summary Peter Kemper Chapters in Hennessy & Patterson Ch 1: Fundamentals Ch 2: Instruction Level Parallelism Ch 3: Limits on ILP Ch 4: Multiprocessors & TLP Ap A: Pipelining

More information

Outline Marquette University

Outline Marquette University COEN-4710 Computer Hardware Lecture 1 Computer Abstractions and Technology (Ch.1) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations

More information

ECE 588/688 Advanced Computer Architecture II

ECE 588/688 Advanced Computer Architecture II ECE 588/688 Advanced Computer Architecture II Instructor: Alaa Alameldeen alaa@ece.pdx.edu Winter 2018 Portland State University Copyright by Alaa Alameldeen and Haitham Akkary 2018 1 When and Where? When:

More information

ECE 2300 Digital Logic & Computer Organization. Caches

ECE 2300 Digital Logic & Computer Organization. Caches ECE 23 Digital Logic & Computer Organization Spring 217 s Lecture 2: 1 Announcements HW7 will be posted tonight Lab sessions resume next week Lecture 2: 2 Course Content Binary numbers and logic gates

More information

Multi-cycle Instructions in the Pipeline (Floating Point)

Multi-cycle Instructions in the Pipeline (Floating Point) Lecture 6 Multi-cycle Instructions in the Pipeline (Floating Point) Introduction to instruction level parallelism Recap: Support of multi-cycle instructions in a pipeline (App A.5) Recap: Superpipelining

More information

INSTRUCTION LEVEL PARALLELISM

INSTRUCTION LEVEL PARALLELISM INSTRUCTION LEVEL PARALLELISM Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 2 and Appendix H, John L. Hennessy and David A. Patterson,

More information

Parallel Computing Ideas

Parallel Computing Ideas Parallel Computing Ideas K. 1 1 Department of Mathematics 2018 Why When to go for speed Historically: Production code Code takes a long time to run Code runs many times Code is not end in itself 2010:

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 15, 2007 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

Administration. Prerequisites. CS 395T: Topics in Multicore Programming. Why study parallel programming? Instructors: TA:

Administration. Prerequisites. CS 395T: Topics in Multicore Programming. Why study parallel programming? Instructors: TA: CS 395T: Topics in Multicore Programming Administration Instructors: Keshav Pingali (CS,ICES) 4.126A ACES Email: pingali@cs.utexas.edu TA: Aditya Rawal Email: 83.aditya.rawal@gmail.com University of Texas,

More information

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS UNIT-I OVERVIEW & INSTRUCTIONS 1. What are the eight great ideas in computer architecture? The eight

More information

CS3350B Computer Architecture. Introduction

CS3350B Computer Architecture. Introduction CS3350B Computer Architecture Winter 2015 Introduction Marc Moreno Maza www.csd.uwo.ca/courses/cs3350b What is a computer? 2 What is a computer? 3 What is a computer? 4 What is a computer? 5 The Computer

More information

These slides do not give detailed coverage of the material. See class notes and solved problems (last page) for more information.

These slides do not give detailed coverage of the material. See class notes and solved problems (last page) for more information. 11-1 This Set 11-1 These slides do not give detailed coverage of the material. See class notes and solved problems (last page) for more information. Text covers multiple-issue machines in Chapter 4, but

More information

Lecture 13: Memory Consistency. + Course-So-Far Review. Parallel Computer Architecture and Programming CMU /15-618, Spring 2014

Lecture 13: Memory Consistency. + Course-So-Far Review. Parallel Computer Architecture and Programming CMU /15-618, Spring 2014 Lecture 13: Memory Consistency + Course-So-Far Review Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2014 Tunes Beggin Madcon (So Dark the Con of Man) 15-418 students tend to

More information

CS 352H Computer Systems Architecture Exam #1 - Prof. Keckler October 11, 2007

CS 352H Computer Systems Architecture Exam #1 - Prof. Keckler October 11, 2007 CS 352H Computer Systems Architecture Exam #1 - Prof. Keckler October 11, 2007 Name: Solutions (please print) 1-3. 11 points 4. 7 points 5. 7 points 6. 20 points 7. 30 points 8. 25 points Total (105 pts):

More information

Modern CPU Architectures

Modern CPU Architectures Modern CPU Architectures Alexander Leutgeb, RISC Software GmbH RISC Software GmbH Johannes Kepler University Linz 2014 16.04.2014 1 Motivation for Parallelism I CPU History RISC Software GmbH Johannes

More information

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI. CSCI 402: Computer Architectures Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI 6.6 - End Today s Contents GPU Cluster and its network topology The Roofline performance

More information

Lecture-13 (ROB and Multi-threading) CS422-Spring

Lecture-13 (ROB and Multi-threading) CS422-Spring Lecture-13 (ROB and Multi-threading) CS422-Spring 2018 Biswa@CSE-IITK Cycle 62 (Scoreboard) vs 57 in Tomasulo Instruction status: Read Exec Write Exec Write Instruction j k Issue Oper Comp Result Issue

More information

Course II Parallel Computer Architecture. Week 2-3 by Dr. Putu Harry Gunawan

Course II Parallel Computer Architecture. Week 2-3 by Dr. Putu Harry Gunawan Course II Parallel Computer Architecture Week 2-3 by Dr. Putu Harry Gunawan www.phg-simulation-laboratory.com Review Review Review Review Review Review Review Review Review Review Review Review Processor

More information

Several Common Compiler Strategies. Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining

Several Common Compiler Strategies. Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining Several Common Compiler Strategies Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining Basic Instruction Scheduling Reschedule the order of the instructions to reduce the

More information

Exploitation of instruction level parallelism

Exploitation of instruction level parallelism Exploitation of instruction level parallelism Computer Architecture J. Daniel García Sánchez (coordinator) David Expósito Singh Francisco Javier García Blas ARCOS Group Computer Science and Engineering

More information

Designing for Performance. Patrick Happ Raul Feitosa

Designing for Performance. Patrick Happ Raul Feitosa Designing for Performance Patrick Happ Raul Feitosa Objective In this section we examine the most common approach to assessing processor and computer system performance W. Stallings Designing for Performance

More information

CS146 Computer Architecture. Fall Midterm Exam

CS146 Computer Architecture. Fall Midterm Exam CS146 Computer Architecture Fall 2002 Midterm Exam This exam is worth a total of 100 points. Note the point breakdown below and budget your time wisely. To maximize partial credit, show your work and state

More information