Fundamentals of Parallel Computing. Sanjay Razdan. Alpha Science International Ltd., Oxford, U.K.

CONTENTS

Preface
Acknowledgements

1. Introduction to Parallel Computing
    1.1 Parallel Computing
    1.2 Components of Parallel Computing System
        1.2.1 Parallel Hardware
        1.2.2 Parallel Operating System
        1.2.3 Parallel Programs
    1.3 Multiprocessor vs. Multi-core Architecture
    1.4 Why Parallelism
    1.5 Moore's Law
    1.6 Sequential vs. Parallel Computing
    1.7 Program
    1.8 Process
    1.9 Thread
    1.10 Instruction
    1.11 Concurrent Computing
        1.11.1 Communication between Concurrent Systems
        1.11.2 Coordinating Access to Resources
    1.12 Distributed Computing
        1.12.1 Scalability
        1.12.2 Redundancy
    1.13 Levels of Parallelism
        1.13.1 Data Level Parallelism
        1.13.2 Instruction Level Parallelism
        1.13.3 Thread or Task Level Parallelism
        1.13.4 Bit Level Parallelism
    1.14 Considerations while Writing Parallel Programs
        1.14.1 Communication
        1.14.2 Load Balancing
        1.14.3 Synchronization
    1.15 Need for Parallel Programs
    1.16 Models of Parallel Algorithm
        1.16.1 Data Parallel Model
        1.16.2 Pipeline Model
        1.16.3 Work Pool Model
        1.16.4 Master-Slave Model
        1.16.5 Hybrid Model
    1.17 Types of Parallel Computing
        1.17.1 Highly Parallel Computing
        1.17.2 Massively Parallel Computing
        1.17.3 Cluster Computing
        1.17.4 Grid Computing
    1.18 Advantages of Parallel Computing
        1.18.1 Time and Cost Efficiency
        1.18.2 Solving Larger Problems
        1.18.3 Using Non-local Resources
    1.19 Application of Parallel Computing
        1.19.1 Image Processing
        1.19.2 Seismology
        1.19.3 Protein Folding
        1.19.4 Databases
        1.19.5 Search Engines
        1.19.6 Drug Discovery and Drug Design
    Exercise

2. Architecture of Parallel Computers
    2.1 Von Neumann Architecture
        2.1.1 Von Neumann Instructions
        2.1.2 Von Neumann Instruction Cycle
    2.2 Instruction and Data Stream
        2.2.1 Limitations of Von Neumann Architecture
        2.2.2 Improvements of Von Neumann Architecture
    2.3 Classification of Parallel Computers
        2.3.1 Flynn's Classification
        2.3.2 Parallelism at Hardware Level (Handler's Classification)
        2.3.3 Classification on the Basis of Structure
        2.3.4 Levels of Parallelism on the Basis of Grain Size
    2.4 Dependency and its Types
        2.4.1 Data Dependency
        2.4.2 Flow Dependency
        2.4.3 Output Dependency
        2.4.4 Anti-dependency
        2.4.5 I/O Dependency
        2.4.6 Control Dependency
        2.4.7 Resource Dependency
    2.5 Bernstein Conditions for Detecting Parallelism
    Exercise

3. Interconnection Topologies
    3.1 Purpose of Interconnection
    3.2 Internetworking Terminology
        3.2.1 Topology
        3.2.2 Switching
        3.2.3 Routing
        3.2.4 Flow Control
        3.2.5 Node Degree
        3.2.6 Network Diameter
        3.2.7 Bisection Width
        3.2.8 Network Redundancy
        3.2.9 Network Throughput
        3.2.10 Network Latency
        3.2.11 Hot Spot
        3.2.12 Dimension of Network
        3.2.13 Broadcast and Multicast
        3.2.14 Blocking vs. Non-blocking Networks
        3.2.15 Static vs. Dynamic Network
        3.2.16 Direct vs. Indirect Interconnection Network
    3.3 Network Topologies
        3.3.1 Bus Topology
        3.3.2 Star Topology
        3.3.3 Linear Array
        3.3.4 Mesh Topology
        3.3.5 Ring Topology
        3.3.6 Torus Topology
        3.3.7 Fully Connected Topology
        3.3.8 Crossbar Network Topology
        3.3.9 Tree Interconnection Topology
        3.3.10 Fat Tree Topology
        3.3.11 Cube Internetwork Topology
        3.3.12 Hypercube Internetworking
        3.3.13 Shuffle Network
        3.3.14 Omega Network
        3.3.15 Butterfly Internetwork
        3.3.16 Benz Network
        3.3.17 Pyramid Network
    Exercise

4. Parallel Algorithms
    4.1 Algorithms
    4.2 Analyzing a Sequential Algorithm
        4.2.1 Big O Notation
    4.3 Analyzing Parallel Algorithms
        4.3.1 Time Complexity
        4.3.2 Cost
        4.3.3 Number of Processors
        4.3.4 Space Complexity
        4.3.5 Speedup
        4.3.6 Efficiency
        4.3.7 Scalability
    4.4 Amdahl's Law
    4.5 Cost Optimality of Parallel Algorithms
        4.5.1 Some Examples of Cost Optimal Algorithms
    Exercise

5. Graph Algorithms
    5.1 Graph Terminology
        5.1.1 Cyclic Graph
        5.1.2 Complete Graph
        5.1.3 Weighted Graph
        5.1.4 Shortest Path Between Vertices
    5.2 Data Structure to Store Graph
    5.3 Solving Problems with Graph
        5.3.1 Graph Traversal
        5.3.2 Prim's Algorithm Minimum Spanning Tree
        5.3.3 Single-Source Shortest Path
        5.3.4 Connected Components of a Graph
    Exercise

6. Parallel Sorting and Searching
    6.1 Sorting Networks
        6.1.1 Bitonic Sorting Network
        6.1.2 Merging Sorted Sequences
    6.2 Parallel Searching Algorithms
        6.2.1 Binary Search Algorithm
    6.3 Parallel Sorting Algorithms
        6.3.1 Odd-Even Swap Sort
        6.3.2 Insertion Sort
        6.3.3 Selection Sort
        6.3.4 Bubble Sort
        6.3.5 Merge Algorithm
    6.4 Solving Linear Equations
        6.4.1 Gaussian Elimination Method
    Exercise

7. PRAM Model of Computation
    7.1 Model of Computation
    7.2 RAM Model of Computation
    7.3 PRAM Model of Computation
        7.3.1 Conflict Resolution Techniques
    7.4 PRAM Models
        7.4.1 Concurrent Read Concurrent Write (CRCW)
        7.4.2 Concurrent Read Exclusive Write (CREW)
        7.4.3 Exclusive Read Exclusive Write (EREW)
        7.4.4 Exclusive Read Concurrent Write (ERCW)
    7.5 PRAM Algorithms
        7.5.1 CRCW Maximum Number Algorithm
        7.5.2 CRCW Matrix Multiplication
        7.5.3 EREW Search Algorithm
        7.5.4 EREW Maximum Algorithm
        7.5.5 CREW Matrix Multiplication
    Exercise

8. Parallel Operating System
    8.1 Parallel Operating System
        8.1.1 Process Management
        8.1.2 Scheduling
        8.1.3 Process Synchronization
        8.1.4 Protection
    Exercise

9. Basic Data Structure
    9.1 Data Structure
        9.1.1 Arrays
        9.1.2 Linked List
        9.1.3 Binary Tree
    Exercise

10. Trends in Parallel Computing
    10.1 Parallel Operating System
        10.1.1 How PVM Works?
    10.2 Cluster Computing
    10.3 Grid Computing
        10.3.1 Grid Management Components (GMC)
        10.3.2 Donor Software
        10.3.3 Schedulers
    10.4 Hyper-Threading
    Exercise

Index