Parallel Computing Ideas

Size: px
Start display at page:

Download "Parallel Computing Ideas"

Transcription

1 Parallel Computing Ideas K. 1 1 Department of Mathematics 2018

2 Why When to go for speed Historically: Production code Code takes a long time to run Code runs many times Code is not end in itself 2010: If it is worth writing compiled code, it is worth doing in parallel... All CPUs now have multiple processors All applications will one day take advantage

3 Why Uses of Parallelism 1980s - Fluid Mechanics Single solution, large matrices, gather at every step 1990s - Large nonlinear problems Large matrices, repeated operations, gather at every step 2000s - Larger nonlinear problems and Big Data Multiple solutions, more independent processes, gathered results 2010s - Simulations and Data Analytics Multiple solutions, independent processes, gathered results Move from solving one large problem to many time-consuming problems.

4 Why Nomenclature CPU - Central Processing Unit Register - place in CPU where operands reside during use Cache - memory that resides on the CPU IF - Instruction Fetch ID - Instruction Decode EX - Execute WB - Write Back

5 Why Nomenclature Core A single processor on one of the CPUs of a node Processor Usually means a core, or hyperthread core Process A program that runs on a processor. Possibly many processes per processor.

6 Why Central Processing Unit From hardwaresecrets.com/ how-a-cpu-works/4/ accessed March 2018

7 Why How to speed? Avoid cache misses Try to access contiguous memory Do all computations on a block before moving on Example...

8 Why No Pipeline... All pipeline figures from

9 Pipelines Vector Pipelines Simple form of parallel processing Single CPU can perform two different operations at once Single instruction, sequential data

10 Pipelines Instruction Pipeline... Fetch, Decode, Execution, and so on, can all work at same time.

11 SIMD Single Instruction, Multiple Data Single CPU with many Arithmetic units ID step fills registers for each ALU EX step does computation simultaneously on all ALUs

12 SIMD SIMD Pipeline... After instructions are decoded, the same operations can be executed on a vector array of numbers.

13 SIMD Disadvantages Slower to fill ALU registers Many ALUs idle during EX

14 SIMD Single Instruction, Multiple Thread Many CPUs Main program spawns many threads for one instruction GPU computing

15 MIMD Multiple Instruction, Multiple Data Many CPUs Asynchronous Redundant work Much more versatile than most SIMD

16 MIMD Shared Memory E.g. Quad Core CPU Bus-based Limited bandwidth Scales poorly Switch-based Expensive Still does not scale well

17 SPMD SPMD Single Program, Multiple Data You write one (1) program......that program runs in every process Processes perform tasks based on conditions and messages Processes have different inputs, outputs

18 SPMD Python If we really want speed, should compile... For simulations and Monte Carlo, maybe... Forking processes is expensive match it to number of processors.

19 When CPUs were expensive: Pipelines As chips became denser SIMD As CPUs become commodities: MIMD As GPUs could be dense: GPU As processors automated everything: Scripts

20 Goal Hope to show that we can modify programs easily to take advantage of modern processors Getting speedups is more problematic

21 Resources Solitary - Two cpus, four cores each, 8GB RAM runs prime1 on 6 cores in.038 seconds on OpenMPI Cluster - Five nodes, one cpu per node, six cores per cpu, 8GB RAM per node runs prime1 on 6 cores in.041 seconds on MPICH2 Labs - 20 to 32 nodes, two to eight cores per node

Approaches to Parallel Computing

Approaches to Parallel Computing Approaches to Parallel Computing K. Cooper 1 1 Department of Mathematics Washington State University 2019 Paradigms Concept Many hands make light work... Set several processors to work on separate aspects

More information

RISC Processors and Parallel Processing. Section and 3.3.6

RISC Processors and Parallel Processing. Section and 3.3.6 RISC Processors and Parallel Processing Section 3.3.5 and 3.3.6 The Control Unit When a program is being executed it is actually the CPU receiving and executing a sequence of machine code instructions.

More information

Introduction to parallel computing

Introduction to parallel computing Introduction to parallel computing 2. Parallel Hardware Zhiao Shi (modifications by Will French) Advanced Computing Center for Education & Research Vanderbilt University Motherboard Processor https://sites.google.com/

More information

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.

More information

CUDA. GPU Computing. K. Cooper 1. 1 Department of Mathematics. Washington State University

CUDA. GPU Computing. K. Cooper 1. 1 Department of Mathematics. Washington State University GPU Computing K. Cooper 1 1 Department of Mathematics Washington State University 2014 Review of Parallel Paradigms MIMD Computing Multiple Instruction Multiple Data Several separate program streams, each

More information

Caches. Hiding Memory Access Times

Caches. Hiding Memory Access Times Caches Hiding Memory Access Times PC Instruction Memory 4 M U X Registers Sign Ext M U X Sh L 2 Data Memory M U X C O N T R O L ALU CTL INSTRUCTION FETCH INSTR DECODE REG FETCH EXECUTE/ ADDRESS CALC MEMORY

More information

VON NEUMANN ARCHITECTURE. VON NEUMANN ARCHITECTURE IB DP Computer science Standard Level ICS3U

VON NEUMANN ARCHITECTURE. VON NEUMANN ARCHITECTURE IB DP Computer science Standard Level ICS3U C A N A D I A N I N T E R N A T I O N A L S C H O O L O F H O N G K O N G 5.3 Putting all the Units Together the Von Neumann Architecture 5.4 s Von Neumann Architecture the four components that make up

More information

CS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it

CS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it Lab 1 Starts Today Already posted on Canvas (under Assignment) Let s look at it CS 590: High Performance Computing Parallel Computer Architectures Fengguang Song Department of Computer Science IUPUI 1

More information

Module 5 Introduction to Parallel Processing Systems

Module 5 Introduction to Parallel Processing Systems Module 5 Introduction to Parallel Processing Systems 1. What is the difference between pipelining and parallelism? In general, parallelism is simply multiple operations being done at the same time.this

More information

What is Pipelining. work is done at each stage. The work is not finished until it has passed through all stages.

What is Pipelining. work is done at each stage. The work is not finished until it has passed through all stages. PIPELINING What is Pipelining A technique used in advanced microprocessors where the microprocessor begins executing a second instruction before the first has been completed. - A Pipeline is a series of

More information

Optimized Scientific Computing:

Optimized Scientific Computing: Optimized Scientific Computing: Coding Efficiently for Real Computing Architectures Noah Kurinsky SASS Talk, November 11 2015 Introduction Components of a CPU Architecture Design Choices Why Is This Relevant

More information

COMPUTER STRUCTURE AND ORGANIZATION

COMPUTER STRUCTURE AND ORGANIZATION COMPUTER STRUCTURE AND ORGANIZATION Course titular: DUMITRAŞCU Eugen Chapter 4 COMPUTER ORGANIZATION FUNDAMENTAL CONCEPTS CONTENT The scheme of 5 units von Neumann principles Functioning of a von Neumann

More information

3.3 Hardware Parallel processing

3.3 Hardware Parallel processing Parallel processing is the simultaneous use of more than one CPU to execute a program. Ideally, parallel processing makes a program run faster because there are more CPUs running it. In practice, it is

More information

Introduction to Parallel Programming

Introduction to Parallel Programming Introduction to Parallel Programming David Lifka lifka@cac.cornell.edu May 23, 2011 5/23/2011 www.cac.cornell.edu 1 y What is Parallel Programming? Using more than one processor or computer to complete

More information

ECE Lab 8. Logic Design for a Direct-Mapped Cache. To understand the function and design of a direct-mapped memory cache.

ECE Lab 8. Logic Design for a Direct-Mapped Cache. To understand the function and design of a direct-mapped memory cache. ECE 201 - Lab 8 Logic Design for a Direct-Mapped Cache PURPOSE To understand the function and design of a direct-mapped memory cache. EQUIPMENT Simulation Software REQUIREMENTS Electronic copy of your

More information

Parallelism. Execution Cycle. Dual Bus Simple CPU. Pipelining COMP375 1

Parallelism. Execution Cycle. Dual Bus Simple CPU. Pipelining COMP375 1 Pipelining COMP375 Computer Architecture and dorganization Parallelism The most common method of making computers faster is to increase parallelism. There are many levels of parallelism Macro Multiple

More information

Top500 Supercomputer list

Top500 Supercomputer list Top500 Supercomputer list Tends to represent parallel computers, so distributed systems such as SETI@Home are neglected. Does not consider storage or I/O issues Both custom designed machines and commodity

More information

Introduction to Parallel Programming

Introduction to Parallel Programming Introduction to Parallel Programming January 14, 2015 www.cac.cornell.edu What is Parallel Programming? Theoretically a very simple concept Use more than one processor to complete a task Operationally

More information

Chapter 5: Computer Systems Organization

Chapter 5: Computer Systems Organization Objectives Chapter 5: Computer Systems Organization Invitation to Computer Science, C++ Version, Third Edition In this chapter, you will learn about: The components of a computer system Putting all the

More information

Chapter 5: Computer Systems Organization. Invitation to Computer Science, C++ Version, Third Edition

Chapter 5: Computer Systems Organization. Invitation to Computer Science, C++ Version, Third Edition Chapter 5: Computer Systems Organization Invitation to Computer Science, C++ Version, Third Edition Objectives In this chapter, you will learn about: The components of a computer system Putting all the

More information

Course II Parallel Computer Architecture. Week 2-3 by Dr. Putu Harry Gunawan

Course II Parallel Computer Architecture. Week 2-3 by Dr. Putu Harry Gunawan Course II Parallel Computer Architecture Week 2-3 by Dr. Putu Harry Gunawan www.phg-simulation-laboratory.com Review Review Review Review Review Review Review Review Review Review Review Review Processor

More information

ECE 574 Cluster Computing Lecture 16

ECE 574 Cluster Computing Lecture 16 ECE 574 Cluster Computing Lecture 16 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 26 March 2019 Announcements HW#7 posted HW#6 and HW#5 returned Don t forget project topics

More information

SIMD Divergence Optimization through Intra-Warp Compaction. Aniruddha Vaidya Anahita Shayesteh Dong Hyuk Woo Roy Saharoy Mani Azimi ISCA 13

SIMD Divergence Optimization through Intra-Warp Compaction. Aniruddha Vaidya Anahita Shayesteh Dong Hyuk Woo Roy Saharoy Mani Azimi ISCA 13 SIMD Divergence Optimization through Intra-Warp Compaction Aniruddha Vaidya Anahita Shayesteh Dong Hyuk Woo Roy Saharoy Mani Azimi ISCA 13 Problem GPU: wide SIMD lanes 16 lanes per warp in this work SIMD

More information

Parallel Computing Architectures

Parallel Computing Architectures Parallel Computing Architectures Moreno Marzolla Dip. di Informatica Scienza e Ingegneria (DISI) Università di Bologna http://www.moreno.marzolla.name/ 2 An Abstract Parallel Architecture Processor Processor

More information

Computer Architecture

Computer Architecture Computer Architecture Chapter 7 Parallel Processing 1 Parallelism Instruction-level parallelism (Ch.6) pipeline superscalar latency issues hazards Processor-level parallelism (Ch.7) array/vector of processors

More information

Parallel Computing: Parallel Architectures Jin, Hai

Parallel Computing: Parallel Architectures Jin, Hai Parallel Computing: Parallel Architectures Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology Peripherals Computer Central Processing Unit Main Memory Computer

More information

Computer Organization

Computer Organization Objectives 5.1 Chapter 5 Computer Organization Source: Foundations of Computer Science Cengage Learning 5.2 After studying this chapter, students should be able to: List the three subsystems of a computer.

More information

Install and Configure ICT Equipment and Operating Systems Unit 229 Central Processing Unit (CPU)

Install and Configure ICT Equipment and Operating Systems Unit 229 Central Processing Unit (CPU) Install and Configure ICT Equipment and Operating Systems Unit 229 Central Processing Unit (CPU) Definition of CPU Alternately referred to as a processor, central processor, or microprocessor, the CPU

More information

Lecture 29 Review" CPU time: the best metric" Be sure you understand CC, clock period" Common (and good) performance metrics"

Lecture 29 Review CPU time: the best metric Be sure you understand CC, clock period Common (and good) performance metrics Be sure you understand CC, clock period Lecture 29 Review Suggested reading: Everything Q1: D[8] = D[8] + RF[1] + RF[4] I[15]: Add R2, R1, R4 RF[1] = 4 I[16]: MOV R3, 8 RF[4] = 5 I[17]: Add R2, R2, R3

More information

Computer Architecture and Assembly Language. Spring

Computer Architecture and Assembly Language. Spring Computer Architecture and Assembly Language Spring 2014-2015 What is a computer? A computer is a sophisticated electronic calculating machine that: Accepts input information, Processes the information

More information

5 Computer Organization

5 Computer Organization 5 Computer Organization 5.1 Foundations of Computer Science Cengage Learning Objectives After studying this chapter, the student should be able to: List the three subsystems of a computer. Describe the

More information

ELE 455/555 Computer System Engineering. Section 4 Parallel Processing Class 1 Challenges

ELE 455/555 Computer System Engineering. Section 4 Parallel Processing Class 1 Challenges ELE 455/555 Computer System Engineering Section 4 Class 1 Challenges Introduction Motivation Desire to provide more performance (processing) Scaling a single processor is limited Clock speeds Power concerns

More information

Parallel Computer Architectures. Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam

Parallel Computer Architectures. Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam Parallel Computer Architectures Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam Outline Flynn s Taxonomy Classification of Parallel Computers Based on Architectures Flynn s Taxonomy Based on notions of

More information

Introduction. CSCI 4850/5850 High-Performance Computing Spring 2018

Introduction. CSCI 4850/5850 High-Performance Computing Spring 2018 Introduction CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University What is Parallel

More information

CS 101, Mock Computer Architecture

CS 101, Mock Computer Architecture CS 101, Mock Computer Architecture Computer organization and architecture refers to the actual hardware used to construct the computer, and the way that the hardware operates both physically and logically

More information

Parallel Computing Introduction

Parallel Computing Introduction Parallel Computing Introduction Bedřich Beneš, Ph.D. Associate Professor Department of Computer Graphics Purdue University von Neumann computer architecture CPU Hard disk Network Bus Memory GPU I/O devices

More information

Multi-Processor / Parallel Processing

Multi-Processor / Parallel Processing Parallel Processing: Multi-Processor / Parallel Processing Originally, the computer has been viewed as a sequential machine. Most computer programming languages require the programmer to specify algorithms

More information

administrivia final hour exam next Wednesday covers assembly language like hw and worksheets

administrivia final hour exam next Wednesday covers assembly language like hw and worksheets administrivia final hour exam next Wednesday covers assembly language like hw and worksheets today last worksheet start looking at more details on hardware not covered on ANY exam probably won t finish

More information

Some features of modern CPUs. and how they help us

Some features of modern CPUs. and how they help us Some features of modern CPUs and how they help us RAM MUL core Wide operands RAM MUL core CP1: hardware can multiply 64-bit floating-point numbers Pipelining: can start the next independent operation before

More information

Parallel Architectures

Parallel Architectures Parallel Architectures Part 1: The rise of parallel machines Intel Core i7 4 CPU cores 2 hardware thread per core (8 cores ) Lab Cluster Intel Xeon 4/10/16/18 CPU cores 2 hardware thread per core (8/20/32/36

More information

Technology in Action

Technology in Action Technology in Action Chapter 9 Behind the Scenes: A Closer Look at System Hardware 1 Binary Language Computers work in binary language. Consists of two numbers: 0 and 1 Everything a computer does is broken

More information

x86 Architectures; Assembly Language Basics of Assembly language for the x86 and x86_64 architectures

x86 Architectures; Assembly Language Basics of Assembly language for the x86 and x86_64 architectures x86 Architectures; Assembly Language Basics of Assembly language for the x86 and x86_64 architectures topics Preliminary material a look at what Assembly Language works with - How processors work»a moment

More information

Introduction II. Overview

Introduction II. Overview Introduction II Overview Today we will introduce multicore hardware (we will introduce many-core hardware prior to learning OpenCL) We will also consider the relationship between computer hardware and

More information

UNIT I (Two Marks Questions & Answers)

UNIT I (Two Marks Questions & Answers) UNIT I (Two Marks Questions & Answers) Discuss the different ways how instruction set architecture can be classified? Stack Architecture,Accumulator Architecture, Register-Memory Architecture,Register-

More information

Introduction to GPU programming with CUDA

Introduction to GPU programming with CUDA Introduction to GPU programming with CUDA Dr. Juan C Zuniga University of Saskatchewan, WestGrid UBC Summer School, Vancouver. June 12th, 2018 Outline 1 Overview of GPU computing a. what is a GPU? b. GPU

More information

Architectures of Flynn s taxonomy -- A Comparison of Methods

Architectures of Flynn s taxonomy -- A Comparison of Methods Architectures of Flynn s taxonomy -- A Comparison of Methods Neha K. Shinde Student, Department of Electronic Engineering, J D College of Engineering and Management, RTM Nagpur University, Maharashtra,

More information

Introduction to MPI. EAS 520 High Performance Scientific Computing. University of Massachusetts Dartmouth. Spring 2014

Introduction to MPI. EAS 520 High Performance Scientific Computing. University of Massachusetts Dartmouth. Spring 2014 Introduction to MPI EAS 520 High Performance Scientific Computing University of Massachusetts Dartmouth Spring 2014 References This presentation is almost an exact copy of Dartmouth College's Introduction

More information

The Host Environment. Module 2.1. Copyright 2006 EMC Corporation. Do not Copy - All Rights Reserved. The Host Environment - 1

The Host Environment. Module 2.1. Copyright 2006 EMC Corporation. Do not Copy - All Rights Reserved. The Host Environment - 1 The Host Environment Module 2.1 2006 EMC Corporation. All rights reserved. The Host Environment - 1 The Host Environment Upon completion of this module, you will be able to: List the hardware and software

More information

G P G P U : H I G H - P E R F O R M A N C E C O M P U T I N G

G P G P U : H I G H - P E R F O R M A N C E C O M P U T I N G Joined Advanced Student School (JASS) 2009 March 29 - April 7, 2009 St. Petersburg, Russia G P G P U : H I G H - P E R F O R M A N C E C O M P U T I N G Dmitry Puzyrev St. Petersburg State University Faculty

More information

Latches. IT 3123 Hardware and Software Concepts. Registers. The Little Man has Registers. Data Registers. Program Counter

Latches. IT 3123 Hardware and Software Concepts. Registers. The Little Man has Registers. Data Registers. Program Counter IT 3123 Hardware and Software Concepts Notice: This session is being recorded. CPU and Memory June 11 Copyright 2005 by Bob Brown Latches Can store one bit of data Can be ganged together to store more

More information

Chapter 9: A Closer Look at System Hardware

Chapter 9: A Closer Look at System Hardware Chapter 9: A Closer Look at System Hardware CS10001 Computer Literacy Chapter 9: A Closer Look at System Hardware 1 Topics Discussed Digital Data and Switches Manual Electrical Digital Data Representation

More information

Chapter 9: A Closer Look at System Hardware 4

Chapter 9: A Closer Look at System Hardware 4 Chapter 9: A Closer Look at System Hardware CS10001 Computer Literacy Topics Discussed Digital Data and Switches Manual Electrical Digital Data Representation Decimal to Binary (Numbers) Characters and

More information

Bifurcation Between CPU and GPU CPUs General purpose, serial GPUs Special purpose, parallel CPUs are becoming more parallel Dual and quad cores, roadm

Bifurcation Between CPU and GPU CPUs General purpose, serial GPUs Special purpose, parallel CPUs are becoming more parallel Dual and quad cores, roadm XMT-GPU A PRAM Architecture for Graphics Computation Tom DuBois, Bryant Lee, Yi Wang, Marc Olano and Uzi Vishkin Bifurcation Between CPU and GPU CPUs General purpose, serial GPUs Special purpose, parallel

More information

What are Clusters? Why Clusters? - a Short History

What are Clusters? Why Clusters? - a Short History What are Clusters? Our definition : A parallel machine built of commodity components and running commodity software Cluster consists of nodes with one or more processors (CPUs), memory that is shared by

More information

378: Machine Organization and Assembly Language

378: Machine Organization and Assembly Language 378: Machine Organization and Assembly Language Spring 2010 Luis Ceze Slides adapted from: UIUC, Luis Ceze, Larry Snyder, Hal Perkins 1 What is computer architecture about? Computer architecture is the

More information

BlueGene/L (No. 4 in the Latest Top500 List)

BlueGene/L (No. 4 in the Latest Top500 List) BlueGene/L (No. 4 in the Latest Top500 List) first supercomputer in the Blue Gene project architecture. Individual PowerPC 440 processors at 700Mhz Two processors reside in a single chip. Two chips reside

More information

Math 4370/6370 Lecture 1: Machines and Computation

Math 4370/6370 Lecture 1: Machines and Computation Math 4370/6370 Lecture 1: Machines and Computation Daniel R. Reynolds Spring 2017 reynolds@smu.edu Historically, we have depended on hardware advances to enable faster and larger simulations. In 1965,

More information

Modern Processor Architectures. L25: Modern Compiler Design

Modern Processor Architectures. L25: Modern Compiler Design Modern Processor Architectures L25: Modern Compiler Design The 1960s - 1970s Instructions took multiple cycles Only one instruction in flight at once Optimisation meant minimising the number of instructions

More information

Computer organization by G. Naveen kumar, Asst Prof, C.S.E Department 1

Computer organization by G. Naveen kumar, Asst Prof, C.S.E Department 1 Pipelining and Vector Processing Parallel Processing: The term parallel processing indicates that the system is able to perform several operations in a single time. Now we will elaborate the scenario,

More information

More advanced CPUs. August 4, Howard Huang 1

More advanced CPUs. August 4, Howard Huang 1 More advanced CPUs In the last two weeks we presented the design of a basic processor. The datapath performs operations on register and memory data. A control unit translates program instructions into

More information

Processor Architecture

Processor Architecture ECPE 170 Jeff Shafer University of the Pacific Processor Architecture 2 Lab Schedule Ac=vi=es Assignments Due Today Wednesday Apr 24 th Processor Architecture Lab 12 due by 11:59pm Wednesday Network Programming

More information

What is Pipelining? Time per instruction on unpipelined machine Number of pipe stages

What is Pipelining? Time per instruction on unpipelined machine Number of pipe stages What is Pipelining? Is a key implementation techniques used to make fast CPUs Is an implementation techniques whereby multiple instructions are overlapped in execution It takes advantage of parallelism

More information

Paral el Hardware All material not from online sources/textbook copyright Travis Desell, 2012

Paral el Hardware All material not from online sources/textbook copyright Travis Desell, 2012 Parallel Hardware All material not from online sources/textbook copyright Travis Desell, 2012 Overview 1. Von Neumann Architecture 2. Processes, Multitasking, Threads 3. Modifications to the Von Neumann

More information

Administrivia. HW0 scores, HW1 peer-review assignments out. If you re having Cython trouble with HW2, let us know.

Administrivia. HW0 scores, HW1 peer-review assignments out. If you re having Cython trouble with HW2, let us know. Administrivia HW0 scores, HW1 peer-review assignments out. HW2 out, due Nov. 2. If you re having Cython trouble with HW2, let us know. Review on Wednesday: Post questions on Piazza Introduction to GPUs

More information

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller Entertainment Graphics: Virtual Realism for the Masses CSE 591: GPU Programming Introduction Computer games need to have: realistic appearance of characters and objects believable and creative shading,

More information

Chapter 2 Parallel Hardware

Chapter 2 Parallel Hardware Chapter 2 Parallel Hardware Part I. Preliminaries Chapter 1. What Is Parallel Computing? Chapter 2. Parallel Hardware Chapter 3. Parallel Software Chapter 4. Parallel Applications Chapter 5. Supercomputers

More information

HPC VT Machine-dependent Optimization

HPC VT Machine-dependent Optimization HPC VT 2013 Machine-dependent Optimization Last time Choose good data structures Reduce number of operations Use cheap operations strength reduction Avoid too many small function calls inlining Use compiler

More information

Accelerating CFD with Graphics Hardware

Accelerating CFD with Graphics Hardware Accelerating CFD with Graphics Hardware Graham Pullan (Whittle Laboratory, Cambridge University) 16 March 2009 Today Motivation CPUs and GPUs Programming NVIDIA GPUs with CUDA Application to turbomachinery

More information

Pipelining, Instruction Level Parallelism and Memory in Processors. Advanced Topics ICOM 4215 Computer Architecture and Organization Fall 2010

Pipelining, Instruction Level Parallelism and Memory in Processors. Advanced Topics ICOM 4215 Computer Architecture and Organization Fall 2010 Pipelining, Instruction Level Parallelism and Memory in Processors Advanced Topics ICOM 4215 Computer Architecture and Organization Fall 2010 NOTE: The material for this lecture was taken from several

More information

Binghamton University. CS-220 Spring Cached Memory. Computer Systems Chapter

Binghamton University. CS-220 Spring Cached Memory. Computer Systems Chapter Cached Memory Computer Systems Chapter 6.2-6.5 Cost Speed The Memory Hierarchy Capacity The Cache Concept CPU Registers Addresses Data Memory ALU Instructions The Cache Concept Memory CPU Registers Addresses

More information

7/28/ Prentice-Hall, Inc Prentice-Hall, Inc Prentice-Hall, Inc Prentice-Hall, Inc Prentice-Hall, Inc.

7/28/ Prentice-Hall, Inc Prentice-Hall, Inc Prentice-Hall, Inc Prentice-Hall, Inc Prentice-Hall, Inc. Technology in Action Technology in Action Chapter 9 Behind the Scenes: A Closer Look a System Hardware Chapter Topics Computer switches Binary number system Inside the CPU Cache memory Types of RAM Computer

More information

Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions.

Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions. Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions Stage Instruction Fetch Instruction Decode Execution / Effective addr Memory access Write-back Abbreviation

More information

Computing architectures Part 2 TMA4280 Introduction to Supercomputing

Computing architectures Part 2 TMA4280 Introduction to Supercomputing Computing architectures Part 2 TMA4280 Introduction to Supercomputing NTNU, IMF January 16. 2017 1 Supercomputing What is the motivation for Supercomputing? Solve complex problems fast and accurately:

More information

DC57 COMPUTER ORGANIZATION JUNE 2013

DC57 COMPUTER ORGANIZATION JUNE 2013 Q2 (a) How do various factors like Hardware design, Instruction set, Compiler related to the performance of a computer? The most important measure of a computer is how quickly it can execute programs.

More information

CS 3510 Comp&Net Arch

CS 3510 Comp&Net Arch CS 3510 Comp&Net Arch Pipeline Dr. Ken Hoganson 2010 Enhancing Performance We observed that we can obtain better performance in executing instructions, if a single cycle accomplishes multiple operations:

More information

Final Lecture. A few minutes to wrap up and add some perspective

Final Lecture. A few minutes to wrap up and add some perspective Final Lecture A few minutes to wrap up and add some perspective 1 2 Instant replay The quarter was split into roughly three parts and a coda. The 1st part covered instruction set architectures the connection

More information

RAID 0 (non-redundant) RAID Types 4/25/2011

RAID 0 (non-redundant) RAID Types 4/25/2011 Exam 3 Review COMP375 Topics I/O controllers chapter 7 Disk performance section 6.3-6.4 RAID section 6.2 Pipelining section 12.4 Superscalar chapter 14 RISC chapter 13 Parallel Processors chapter 18 Security

More information

Parallel Computing Platforms. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Parallel Computing Platforms. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University Parallel Computing Platforms Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Elements of a Parallel Computer Hardware Multiple processors Multiple

More information

The Stored Program Computer

The Stored Program Computer The Stored Program Computer 1 1945: John von Neumann Wrote a report on the stored program concept, known as the First Draft of a Report on EDVAC also Alan Turing Konrad Zuse Eckert & Mauchly The basic

More information

Chapter 17 - Parallel Processing

Chapter 17 - Parallel Processing Chapter 17 - Parallel Processing Luis Tarrataca luis.tarrataca@gmail.com CEFET-RJ Luis Tarrataca Chapter 17 - Parallel Processing 1 / 71 Table of Contents I 1 Motivation 2 Parallel Processing Categories

More information

CPU Pipelining Issues

CPU Pipelining Issues CPU Pipelining Issues What have you been beating your head against? This pipe stuff makes my head hurt! L17 Pipeline Issues & Memory 1 Pipelining Improve performance by increasing instruction throughput

More information

GPU for HPC. October 2010

GPU for HPC. October 2010 GPU for HPC Simone Melchionna Jonas Latt Francis Lapique October 2010 EPFL/ EDMX EPFL/EDMX EPFL/DIT simone.melchionna@epfl.ch jonas.latt@epfl.ch francis.lapique@epfl.ch 1 Moore s law: in the old days,

More information

PART A (22 Marks) 2. a) Briefly write about r's complement and (r-1)'s complement. [8] b) Explain any two ways of adding decimal numbers.

PART A (22 Marks) 2. a) Briefly write about r's complement and (r-1)'s complement. [8] b) Explain any two ways of adding decimal numbers. Set No. 1 IV B.Tech I Semester Supplementary Examinations, March - 2017 COMPUTER ARCHITECTURE & ORGANIZATION (Common to Electronics & Communication Engineering and Electronics & Time: 3 hours Max. Marks:

More information

Advanced processor designs

Advanced processor designs Advanced processor designs We ve only scratched the surface of CPU design. Today we ll briefly introduce some of the big ideas and big words behind modern processors by looking at two example CPUs. The

More information

Parallel and High Performance Computing CSE 745

Parallel and High Performance Computing CSE 745 Parallel and High Performance Computing CSE 745 1 Outline Introduction to HPC computing Overview Parallel Computer Memory Architectures Parallel Programming Models Designing Parallel Programs Parallel

More information

Computer Architecture and Data Manipulation. Von Neumann Architecture

Computer Architecture and Data Manipulation. Von Neumann Architecture Computer Architecture and Data Manipulation Chapter 3 Von Neumann Architecture Today s stored-program computers have the following characteristics: Three hardware systems: A central processing unit (CPU)

More information

CS 351 Final Review Quiz

CS 351 Final Review Quiz CS 351 Final Review Quiz Notes: You must explain your answers to receive partial credit. You will lose points for incorrect extraneous information, even if the answer is otherwise correct. Question 1:

More information

Parallel Computing Platforms

Parallel Computing Platforms Parallel Computing Platforms Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)

More information

CPS 303 High Performance Computing. Wensheng Shen Department of Computational Science SUNY Brockport

CPS 303 High Performance Computing. Wensheng Shen Department of Computational Science SUNY Brockport CPS 303 High Performance Computing Wensheng Shen Department of Computational Science SUNY Brockport Chapter 2: Architecture of Parallel Computers Hardware Software 2.1.1 Flynn s taxonomy Single-instruction

More information

MARTHANDAM COLLEGE OF ENGINEERING AND TECHNOLOGY DEPARTMENT OF INFORMATION TECHNOLOGY TWO MARK QUESTIONS AND ANSWERS

MARTHANDAM COLLEGE OF ENGINEERING AND TECHNOLOGY DEPARTMENT OF INFORMATION TECHNOLOGY TWO MARK QUESTIONS AND ANSWERS MARTHANDAM COLLEGE OF ENGINEERING AND TECHNOLOGY DEPARTMENT OF INFORMATION TECHNOLOGY TWO MARK QUESTIONS AND ANSWERS SUB NAME: COMPUTER ORGANIZATION AND ARCHITECTTURE SUB CODE: CS 2253 YEAR/SEM:II/IV Marthandam

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware 4.1 Introduction We will examine two MIPS implementations

More information

Fundamental CUDA Optimization. NVIDIA Corporation

Fundamental CUDA Optimization. NVIDIA Corporation Fundamental CUDA Optimization NVIDIA Corporation Outline Fermi/Kepler Architecture Kernel optimizations Launch configuration Global memory throughput Shared memory access Instruction throughput / control

More information

Knowledge Organiser. Computing. Year 10 Term 1 Hardware

Knowledge Organiser. Computing. Year 10 Term 1 Hardware Organiser Computing Year 10 Term 1 Hardware Enquiry Question How does a computer do everything it does? Big questions that will help you answer this enquiry question: 1. What is the purpose of the CPU?

More information

CS 265. Computer Architecture. Wei Lu, Ph.D., P.Eng.

CS 265. Computer Architecture. Wei Lu, Ph.D., P.Eng. CS 265 Computer Architecture Wei Lu, Ph.D., P.Eng. Part 5: Processors Our goal: understand basics of processors and CPU understand the architecture of MARIE, a model computer a close look at the instruction

More information

Chapter 9. Pipelining Design Techniques

Chapter 9. Pipelining Design Techniques Chapter 9 Pipelining Design Techniques 9.1 General Concepts Pipelining refers to the technique in which a given task is divided into a number of subtasks that need to be performed in sequence. Each subtask

More information

This course provides an overview of the SH-2 32-bit RISC CPU core used in the popular SH-2 series microcontrollers

This course provides an overview of the SH-2 32-bit RISC CPU core used in the popular SH-2 series microcontrollers Course Introduction Purpose: This course provides an overview of the SH-2 32-bit RISC CPU core used in the popular SH-2 series microcontrollers Objectives: Learn about error detection and address errors

More information

Fundamental CUDA Optimization. NVIDIA Corporation

Fundamental CUDA Optimization. NVIDIA Corporation Fundamental CUDA Optimization NVIDIA Corporation Outline! Fermi Architecture! Kernel optimizations! Launch configuration! Global memory throughput! Shared memory access! Instruction throughput / control

More information

5 Computer Organization

5 Computer Organization 5 Computer Organization 5.1 Foundations of Computer Science ã Cengage Learning Objectives After studying this chapter, the student should be able to: q List the three subsystems of a computer. q Describe

More information

HYPERDRIVE IMPLEMENTATION AND ANALYSIS OF A PARALLEL, CONJUGATE GRADIENT LINEAR SOLVER PROF. BRYANT PROF. KAYVON 15618: PARALLEL COMPUTER ARCHITECTURE

HYPERDRIVE IMPLEMENTATION AND ANALYSIS OF A PARALLEL, CONJUGATE GRADIENT LINEAR SOLVER PROF. BRYANT PROF. KAYVON 15618: PARALLEL COMPUTER ARCHITECTURE HYPERDRIVE IMPLEMENTATION AND ANALYSIS OF A PARALLEL, CONJUGATE GRADIENT LINEAR SOLVER AVISHA DHISLE PRERIT RODNEY ADHISLE PRODNEY 15618: PARALLEL COMPUTER ARCHITECTURE PROF. BRYANT PROF. KAYVON LET S

More information

Lecture 26: Parallel Processing. Spring 2018 Jason Tang

Lecture 26: Parallel Processing. Spring 2018 Jason Tang Lecture 26: Parallel Processing Spring 2018 Jason Tang 1 Topics Static multiple issue pipelines Dynamic multiple issue pipelines Hardware multithreading 2 Taxonomy of Parallel Architectures Flynn categories:

More information