Fundamentals of Parallel Computing
Sanjay Razdan
Alpha Science International Ltd.
Oxford, U.K.
CONTENTS

Preface vii
Acknowledgements ix

1. Introduction to Parallel Computing 1.1-1.37
    1.1 Parallel Computing 1.1
    1.2 Components of Parallel Computing System 1.3
        1.2.1 Parallel Hardware 1.3
        1.2.2 Parallel Operating System 1.7
        1.2.3 Parallel Programs 1.7
    1.3 Multiprocessor vs. Multi-core Architecture 1.7
    1.4 Why Parallelism 1.8
    1.5 Moore's Law 1.9
    1.6 Sequential vs. Parallel Computing 1.10
    1.7 Program 1.13
    1.8 Process 1.13
    1.9 Thread 1.14
    1.10 Instruction 1.15
    1.11 Concurrent Computing 1.16
        1.11.1 Communication between Concurrent Systems 1.16
        1.11.2 Coordinating Access to Resources 1.17
    1.12 Distributed Computing 1.18
        1.12.1 Scalability 1.19
        1.12.2 Redundancy 1.19
    1.13 Levels of Parallelism 1.20
        1.13.1 Data Level Parallelism 1.20
        1.13.2 Instruction Level Parallelism 1.22
        1.13.3 Thread or Task Level Parallelism 1.22
        1.13.4 Bit Level Parallelism 1.24
    1.14 Considerations while Writing Parallel Programs 1.25
        1.14.1 Communication 1.25
        1.14.2 Load Balancing 1.27
        1.14.3 Synchronization 1.27
    1.15 Need for Parallel Programs 1.28
    1.16 Models of Parallel Algorithm 1.29
        1.16.1 Data Parallel Model 1.29
        1.16.2 Pipeline Model 1.30
        1.16.3 Work Pool Model 1.30
        1.16.4 Master-Slave Model 1.31
        1.16.5 Hybrid Model 1.32
    1.17 Types of Parallel Computing 1.33
        1.17.1 Highly Parallel Computing 1.33
        1.17.2 Massively Parallel Computing 1.33
        1.17.3 Cluster Computing 1.34
        1.17.4 Grid Computing 1.34
    1.18 Advantages of Parallel Computing 1.35
        1.18.1 Time and Cost Efficiency 1.35
        1.18.2 Solving Larger Problems 1.35
        1.18.3 Using Non-local Resources 1.35
    1.19 Applications of Parallel Computing 1.35
        1.19.1 Image Processing 1.35
        1.19.2 Seismology 1.36
        1.19.3 Protein Folding 1.36
        1.19.4 Databases 1.36
        1.19.5 Search Engines 1.36
        1.19.6 Drug Discovery and Drug Design 1.36
    Exercise 1.37

2. Architecture of Parallel Computers 2.1-2.23
    2.1 Von Neumann Architecture 2.1
        2.1.1 Von Neumann Instructions 2.3
        2.1.2 Von Neumann Instruction Cycle 2.3
    2.2 Instruction and Data Stream 2.4
        2.2.1 Limitations of Von Neumann Architecture 2.5
        2.2.2 Improvements of Von Neumann Architecture 2.5
    2.3 Classification of Parallel Computers 2.8
        2.3.1 Flynn's Classification 2.8
        2.3.2 Parallelism at Hardware Level (Handler's Classification) 2.12
        2.3.3 Classification on the Basis of Structure 2.12
        2.3.4 Levels of Parallelism on the Basis of Grain Size 2.18
    2.4 Dependency and its Types 2.19
        2.4.1 Data Dependency 2.19
        2.4.2 Flow Dependency 2.20
        2.4.3 Output Dependency 2.20
        2.4.4 Anti-dependency 2.20
        2.4.5 I/O Dependency 2.21
        2.4.6 Control Dependency 2.21
        2.4.7 Resource Dependency 2.21
    2.5 Bernstein's Conditions for Detecting Parallelism 2.21
    Exercise 2.23
3. Interconnection Topologies 3.1-3.26
    3.1 Purpose of Interconnection 3.1
    3.2 Internetworking Terminology 3.2
        3.2.1 Topology 3.2
        3.2.2 Switching 3.2
        3.2.3 Routing 3.3
        3.2.4 Flow Control 3.4
        3.2.5 Node Degree 3.4
        3.2.6 Network Diameter 3.4
        3.2.7 Bisection Width 3.5
        3.2.8 Network Redundancy 3.5
        3.2.9 Network Throughput 3.5
        3.2.10 Network Latency 3.5
        3.2.11 Hot Spot 3.5
        3.2.12 Dimension of Network 3.6
        3.2.13 Broadcast and Multicast 3.6
        3.2.14 Blocking vs. Non-blocking Networks 3.6
        3.2.15 Static vs. Dynamic Network 3.7
        3.2.16 Direct vs. Indirect Interconnection Network 3.8
    3.3 Network Topologies 3.8
        3.3.1 Bus Topology 3.8
        3.3.2 Star Topology 3.8
        3.3.3 Linear Array 3.9
        3.3.4 Mesh Topology 3.10
        3.3.5 Ring Topology 3.12
        3.3.6 Torus Topology 3.13
        3.3.7 Fully Connected Topology 3.14
        3.3.8 Crossbar Network Topology 3.14
        3.3.9 Tree Interconnection Topology 3.16
        3.3.10 Fat Tree Topology 3.17
        3.3.11 Cube Internetwork Topology 3.18
        3.3.12 Hypercube Internetworking 3.19
        3.3.13 Shuffle Network 3.20
        3.3.14 Omega Network 3.21
        3.3.15 Butterfly Internetwork 3.23
        3.3.16 Benes Network 3.24
        3.3.17 Pyramid Network 3.25
    Exercise 3.26

4. Parallel Algorithms 4.1-4.23
    4.1 Algorithms 4.1
    4.2 Analyzing a Sequential Algorithm 4.2
        4.2.1 Big O Notation 4.3
    4.3 Analyzing Parallel Algorithms 4.6
        4.3.1 Time Complexity 4.6
        4.3.2 Cost 4.9
        4.3.3 Number of Processors 4.9
        4.3.4 Space Complexity 4.13
        4.3.5 Speedup 4.13
        4.3.6 Efficiency 4.14
        4.3.7 Scalability 4.15
    4.4 Amdahl's Law 4.15
    4.5 Cost Optimality of Parallel Algorithms 4.16
        4.5.1 Some Examples of Cost Optimal Algorithms 4.19
    Exercise 4.22

5. Graph Algorithms 5.1-5.36
    5.1 Graph Terminology 5.1
        5.1.1 Cyclic Graph 5.4
        5.1.2 Complete Graph 5.5
        5.1.3 Weighted Graph 5.5
        5.1.4 Shortest Path Between Vertices 5.5
    5.2 Data Structure to Store Graph 5.6
    5.3 Solving Problems with Graphs 5.8
        5.3.1 Graph Traversal 5.8
        5.3.2 Prim's Algorithm for Minimum Spanning Tree 5.18
        5.3.3 Single-Source Shortest Path 5.28
        5.3.4 Connected Components of a Graph 5.31
    Exercise 5.35

6. Parallel Sorting and Searching 6.1-6.26
    6.1 Sorting Networks 6.1
        6.1.1 Bitonic Sorting Network 6.5
        6.1.2 Merging Sorted Sequences 6.6
    6.2 Parallel Searching Algorithms 6.7
        6.2.1 Binary Search Algorithm 6.8
    6.3 Parallel Sorting Algorithms 6.10
        6.3.1 Odd-Even Swap Sort 6.10
        6.3.2 Insertion Sort 6.12
        6.3.3 Selection Sort 6.14
        6.3.4 Bubble Sort 6.16
        6.3.5 Merge Algorithm 6.18
    6.4 Solving Linear Equations 6.21
        6.4.1 Gaussian Elimination Method 6.21
    Exercise 6.26
7. PRAM Model of Computation 7.1-7.15
    7.1 Model of Computation 7.1
    7.2 RAM Model of Computation 7.2
    7.3 PRAM Model of Computation 7.2
        7.3.1 Conflict Resolution Techniques 7.4
    7.4 PRAM Models 7.4
        7.4.1 Concurrent Read Concurrent Write (CRCW) 7.4
        7.4.2 Concurrent Read Exclusive Write (CREW) 7.6
        7.4.3 Exclusive Read Exclusive Write (EREW) 7.7
        7.4.4 Exclusive Read Concurrent Write (ERCW) 7.9
    7.5 PRAM Algorithms 7.10
        7.5.1 CRCW Maximum Number Algorithm 7.10
        7.5.2 CRCW Matrix Multiplication 7.11
        7.5.3 EREW Search Algorithm 7.12
        7.5.4 EREW Maximum Algorithm 7.13
        7.5.5 CREW Matrix Multiplication 7.14
    Exercise 7.15

8. Parallel Operating System 8.1-8.3
    8.1 Parallel Operating System 8.1
        8.1.1 Process Management 8.2
        8.1.2 Scheduling 8.2
        8.1.3 Process Synchronization 8.3
        8.1.4 Protection 8.3
    Exercise 8.3

9. Basic Data Structure 9.1-9.8
    9.1 Data Structure 9.1
        9.1.1 Arrays 9.1
        9.1.2 Linked List 9.3
        9.1.3 Binary Tree 9.7
    Exercise 9.8

10. Trends in Parallel Computing 10.1-10.6
    10.1 Parallel Operating System 10.1
        10.1.1 How PVM Works 10.2
    10.2 Cluster Computing 10.2
    10.3 Grid Computing 10.4
        10.3.1 Grid Management Components (GMC) 10.5
        10.3.2 Donor Software 10.5
        10.3.3 Schedulers 10.5
    10.4 Hyper-Threading 10.6
    Exercise 10.6

Index I-1