Goals of this Course
CISC 849: High Performance Parallel Algorithms for Computational Science
Instructor: Dr. Michela Taufer
Spring 2009

Goals of this Course
This course is intended to provide students with an understanding of parallelization with MPI and OpenMP. Case studies for parallelization include Molecular Dynamics and Monte Carlo simulations, their principles, and their sequential and parallel algorithms. Emphasis is placed on the algorithmic and code components of these simulations, their performance analysis, and their scalability. (From the syllabus)
Course Topics
- Parallel programming: parallel architectures; parallel programming with the Message Passing Interface (MPI); parallel programming with OpenMP
- Case study I: Molecular Dynamics (MD) simulations; parallelization of the MD algorithm with MPI and OpenMP
- Case study II: Monte Carlo simulations; parallelization of the Monte Carlo algorithm with MPI
- Hybrid parallelism: combining MPI and OpenMP

Course Information and Deadlines
- Webpage:
- Mailing list: cisc849010_sp09@gcl.cis.udel.edu
- Access to course material: user cisc849student, password Work4Fun!
- Schedule: download it from the course webpage. It is a tentative schedule!
- Syllabus: download it from the course webpage. Read it carefully!
Books
- Parallel Programming with MPI, by Peter Pacheco
- Parallel Programming in C with MPI and OpenMP, by Michael J. Quinn
- Parallel Programming in OpenMP, by Rohit Chandra, Leo Dagum, Dave Kohr, Dror Maydan, Jeff McDonald, and Ramesh Menon
- The Art of Molecular Dynamics Simulation, by D.C. Rapaport, Cambridge Ed.
- Molecular Modeling: Principles and Applications, by A.R. Leach, Pearson Ed.

Modern Scientific Method (I)
Classical science: Nature -> Observation -> Physical experiments and models -> Theory
Modern Scientific Method (II)
Contemporary science: Nature -> Observation -> Physical experiments and models, complemented by numerical simulations -> Theory. Numerical simulations stand in for physical experiments that are expensive, time-consuming, unethical, or simply impossible.

Grand Challenges
Grand challenges are complex scientific problems:
- Quantum chemistry, statistical mechanics, and relativistic physics
- Cosmology and astrophysics
- Computational fluid dynamics and turbulence
- Biology, pharmacology, genome sequencing, protein folding, and cell modeling
- Global weather and environmental modeling
They require extraordinarily powerful computers when solved via numerical simulations: they need more computational power and benefit from parallel computing.
What is Parallel Computing?
Parallel computing: the use of multiple processors or computers working together to solve a single computational problem. Each processor works on its section of the problem, and processors can exchange information. (Figure: a 2D grid of the problem to be solved, with CPUs 1 through 4 each working on one area and exchanging data along the shared boundaries.)

Why Do Parallel Computing?
A single CPU is limited in both computing performance and available memory. Parallel computing allows one to:
- solve problems that don't fit on a single CPU
- solve problems that can't be solved in a reasonable time
We can solve larger problems, faster, and more cases.
Example: Weather Modeling and Forecasting
For modeling a hurricane region:
- Assume the region of interest is 1000 x 1000 miles with a height of 10 miles. Partitioning it into segments of 0.1 x 0.1 x 0.1 miles gives 10^10 grid points.
- Simulate 2 days with 30-minute time steps: about 100 time steps in total.
- Assume the computations at each grid point require 100 instructions. A single time step then requires 10^12 instructions, and the two days need 10^14 instructions.
- A serial computer executing 10^8 instructions/sec takes 10^6 seconds (over 10 days!) to predict the next 2 days.
Predicting the weather in time therefore REQUIRES PARALLELISM FOR PERFORMANCE. It also requires lots of memory, which again implies parallelism. Currently all major weather forecast centers (US, Europe, Asia) have supercomputers with 1000s of processors.

Other Examples
Vehicle design and dynamics; analysis of protein structures; human genome work; quantum chromodynamics; cosmology; ocean modeling; imaging and rendering; petroleum exploration; nuclear weapon design; database queries; ozone layer monitoring; natural language understanding; and many other grand challenge projects.
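The arithmetic in the weather example above can be checked with a short sketch (the helper names and the flat instructions-per-point cost model are assumptions for illustration, not from the slides):

```c
#include <assert.h>
#include <math.h>

/* Number of grid points for an x_mi-by-y_mi-by-z_mi mile region
   partitioned into cubes of side h miles. */
double grid_points(double x_mi, double y_mi, double z_mi, double h) {
    return (x_mi / h) * (y_mi / h) * (z_mi / h);
}

/* Serial run time in seconds: points * instructions-per-point * steps,
   executed at instr_per_sec instructions per second. */
double serial_seconds(double points, double instr_per_point,
                      double steps, double instr_per_sec) {
    return points * instr_per_point * steps / instr_per_sec;
}
```

With the slide's numbers (10^10 points, 100 instructions per point, 100 steps, 10^8 instructions/sec) this reproduces the 10^6 seconds, i.e. well over 10 days of serial computing to forecast 2 days of weather.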
What are Parallel Computers?
A parallel computer is a computer (or a collection of computers) with multiple processors that can work together on solving a complex problem, supporting parallel computing.
- Distributed multiprocessor: a parallel computer constructed out of multiple computers and an interconnection network.
- Centralized multiprocessor (or symmetric multiprocessor, SMP): all CPUs share access to a single global memory.
How do the processors work together?

Distributed Multiprocessor
(Figure.)
Centralized Multiprocessors
(Figure.)

What is Parallel Programming?
Parallel programming: programming in a language that allows you to explicitly indicate how parts of the computation may be executed in parallel (concurrently). Two routes:
- Entrust the task to compiler technology: the compiler detects and exploits the parallelism in existing code written in sequential languages.
- Write your own parallel program: e.g., parallel programs written in C/C++/Fortran with MPI or OpenMP.
MPI and OpenMP
- MPI (Message Passing Interface): "MPI is a library specification for message-passing, proposed as a standard by a broadly based committee of vendors, implementors, and users."
- OpenMP: "The OpenMP Application Program Interface (API) supports multi-platform shared-memory parallel programming in C/C++ and Fortran on all architectures, including Unix platforms and Windows NT platforms."

Single Program, Multiple Data (SPMD)
SPMD is the dominant programming model for shared- and distributed-memory machines:
- One source code is written.
- The code can have conditional execution based on which processor is executing the copy.
- All copies of the code are started simultaneously and communicate and synchronize with each other periodically.
MPMD (Multiple Program, Multiple Data) is more general, and possible in hardware, but no system/programming software enables it.
SPMD Programming Model
(Figure: the same source.c runs on Processor 0, Processor 1, Processor 2, and Processor 3.)

Types of Parallelism: Two Extremes
- Data parallelism: each processor performs the same task on different data.
- Task parallelism (or functional parallelism): each processor performs a different task.
Most applications fall somewhere on the continuum between these two extremes.
Data Parallel Programming Example
One code will run on 2 CPUs. The program has an array of data to be operated on by 2 CPUs, so the array is split into two parts.

  program:
    if CPU=a then
      low_limit=1
      upper_limit=50
    elseif CPU=b then
      low_limit=51
      upper_limit=100
    end if
    do I = low_limit, upper_limit
      work on A(I)
    end do
    ...
  end program

CPU A effectively executes the loop with low_limit=1, upper_limit=50; CPU B executes it with low_limit=51, upper_limit=100.

Task Parallel Programming Example
One code will run on 2 CPUs. The program has 2 tasks (a and b) to be done by 2 CPUs.

  program.f:
    initialize
    ...
    if CPU=a then
      do task a
    elseif CPU=b then
      do task b
    end if
    ...
  end program

CPU A executes task a; CPU B executes task b.
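Generalizing the data-parallel example above from 2 CPUs to any number: a common idiom computes each CPU's loop bounds from its rank. A sketch in C (`block_range` is a hypothetical helper, not from the slides; in an MPI program the rank would come from MPI_Comm_rank):

```c
#include <assert.h>

/* Block-partition the index range 1..n among `size` CPUs: CPU `rank`
   (0-based) gets the inclusive range [*lo, *hi]. Any remainder is
   spread one extra element at a time over the lowest-ranked CPUs. */
void block_range(int n, int size, int rank, int *lo, int *hi) {
    int base = n / size;
    int rem  = n % size;
    *lo = rank * base + (rank < rem ? rank : rem) + 1;
    *hi = *lo + base + (rank < rem ? 1 : 0) - 1;
}
```

With n = 100 and 2 CPUs this reproduces the slide's split: CPU 0 works on A(1..50) and CPU 1 on A(51..100).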
Task Parallelism: Protein Folding
Start from the same initial protein structure, run different MD simulations (independent tasks), and obtain a final set of folded protein structures. The independent tasks differ in that each sets the atoms' velocities from a different random seed while preserving the temperature.

Data Parallelism: Protein Folding
One single folding process is performed in parallel: the simulation space is partitioned into four regions, assigned to PC 0, PC 1, PC 2, and PC 3.
Data Dependency Graphs
A formal method to identify parallelism. A directed graph:
- Vertices (circles) represent tasks to be completed.
- Edges denote dependencies among tasks.
- If there is no path between two vertices, then the tasks are independent.
- Labels inside circles represent the kind of task being performed; multiple circles with the same label represent tasks performing the same operation on different operands.

Parallelism in Data Dependency Graphs
(Figure: three example graphs — data parallelism, where a task A fans out to several identical tasks B; task parallelism, where a task fans out to different tasks; and sequential dependency, a chain of tasks that must run one after another.)
Pipeline
Divide a process into stages so that several items are produced simultaneously, e.g., an automobile assembly line.

Pipelining
Given a sequential dependence graph (a sequence of tasks or stages), assume that:
- all tasks take the same amount of time, and
- multiple problem instances need to be processed.
Then the output of each functional unit is the input to the next. (Figure: problem instances i-2, i-1, i, i+1 flowing through stages A, B, C.)
Examples: the von Neumann model, where the various circuits in the CPU are split up into functional units; an automobile assembly line.
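Under the assumptions above (s equal stages of duration t, many problem instances), the completion time has a simple closed form: the first result appears once the pipeline fills, and one more result emerges every t thereafter. A minimal sketch (function names are hypothetical):

```c
#include <assert.h>

/* Time to push n problem instances through a pipeline of s equal
   stages, each taking time t: fill time s*t, then (n-1) more ticks. */
double pipeline_time(int s, int n, double t) {
    return (s + n - 1) * t;
}

/* Speedup over processing the n instances one after another (s*n*t). */
double pipeline_speedup(int s, int n) {
    return (double)(s * n) / (s + n - 1);
}
```

For many instances (n much larger than s) the speedup approaches s, the number of stages — the payoff of an assembly line.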
Limits of Parallel Computing
- Theoretical upper limits: Amdahl's Law
- Practical limits: load balancing, non-computational sections, time to re-write code
- Hardware/system limits: topology, network bandwidth and latency, number of processors

Amdahl's Law
Amdahl's Law places a strict limit on the speedup that can be realized by using multiple processors.
Effect of multiple processors on run time:
  t_N = (f_p / N + f_s) t_1
Effect of multiple processors on speedup:
  S = 1 / (f_s + f_p / N)
where
  f_s = serial fraction of code
  f_p = parallel fraction of code (f_s + f_p = 1)
  N   = number of processors
Illustration of Amdahl's Law
It takes only a small fraction of serial content in a code to degrade the parallel performance. (Figure: speedup S versus number of processors for several values of f_p; the smaller the parallel fraction f_p, the sooner the speedup curve flattens out.)

Practical Limits: Amdahl's Law vs. Reality
Amdahl's Law provides a theoretical upper limit on parallel speedup, assuming that there are no costs for communications. In reality, communications will result in a further degradation of performance. (Figure: for f_p = 0.99, the measured speedup falls increasingly below the Amdahl's Law curve as the number of processors grows.)
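Amdahl's Law from the slide above fits in one line of C (a sketch, using f_s = 1 - f_p):

```c
#include <assert.h>

/* Amdahl's Law: S = 1 / (f_s + f_p / N), with f_s = 1 - f_p. */
double amdahl_speedup(double f_p, int n) {
    return 1.0 / ((1.0 - f_p) + f_p / n);
}
```

Even with 99% of the code parallelized, the speedup can never exceed 1/f_s = 100 no matter how many processors are used — which is why the curves in the figure flatten.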
Shared and Distributed Memory
- Shared memory: a single address space; all processors have access to a pool of shared memory (examples: Cray SV1, IBM Power4 node). Methods of memory access: bus, crossbar. (Figure: processors P connected through a Bus to a single Memory.)
- Distributed memory: each processor has its own local memory; message passing must be used to exchange data between processors (examples: clusters, Cray T3E). Methods of memory access: various topological interconnects. (Figure: processor/memory pairs P-M connected by a Network.)

Bus-Based Shared-Memory Architecture (I)
Processors are connected to global memory by means of a common data path called a bus. (Figure: several CPUs attached to a BUS leading to Global Memory.)
Critical Issues
- Simplicity of construction; provides uniform access to shared memory.
- The bus can carry only a limited amount of data between the memory and the processors. As the number of processors increases, each processor spends more time waiting for memory access while the bus is used by another processor: saturation of the bus. The SGI Challenge XL has only 36 processors.

Bus-Based Shared-Memory Architecture (II)
Adding caches to the bus increases performance. (Figure: each CPU now has a Cache between it and the BUS to Global Memory.)
Bus with and without Cache
(Figure: performance versus number of processors, with and without caches; the cached system scales to more processors before the bus saturates.) What is the matter with this picture?

Switch-Based Shared-Memory Architecture
A PxM crossbar switch connects P processors to M memory banks through a grid of switch elements. (Figure: CPU 1 ... CPU p on one side, memory banks M1 ... Mm on the other, with a switch element at each crossing.) Example: the 5x5 crossbar switch is the basic unit of the Convex SPP.
Pros and Cons
Crossbars do not suffer from saturation problems, BUT they are very expensive architectures: an mxn crossbar needs m*n hardware switches.

Cost Estimation
A crossbar switch is a non-blocking network: the connection of a processor to a memory bank does not block the connection of any other processor to any other memory bank. The total number of switching elements required is f(p*m), approximately f(p*p) assuming p = m. As p grows, so does the complexity of the switching network, as f(p*p). Crossbar switches are therefore not scalable in terms of cost.
Types of Distributed-Memory Architectures
- Bus-based networks: e.g., a cluster of workstations on an Ethernet
- Dynamic interconnect (indirect topology)
- Static interconnect (direct topology)

Dynamic vs. Static Interconnect
- Dynamic interconnect (indirect topology): communication links are connected to one another dynamically by switching elements to establish paths among processors. Example: a crossbar network, where specialized switching nodes transfer the messages.
- Static interconnect (direct topology): point-to-point communication links among the processors. Example: a mesh, where the processors themselves act as the routing nodes.
Dynamic Interconnect: Crossbar Switch Network
(Figure.)

Dynamic Interconnect: Omega Network (I)
(Figure.)
Dynamic Interconnect: Omega Network (II)
(Figure: routing through the stages of an omega network.)

Examples of Systems with Dynamic Interconnection Networks
- Crossbar switch network: Fujitsu VPP 500 (224x224 crossbar with 224 nodes)
- Compromise strategy between crossbar and omega networks: the SP series from IBM. Each switch of the omega structure is an 8x8 crossbar; the largest installed machine has 512 nodes.
Static Interconnection Networks
Completely connected, star connected, linear array, ring, mesh, hypercube.

Static Interconnect
- Completely connected: each processor has a direct communication link to every other processor.
- Star connected network: the middle processor is the central processor; every other processor is connected to it. It is the counterpart of the crossbar switch in dynamic interconnects.
Static Interconnect
- Linear array
- Ring
- Mesh network (e.g., 2D)
- Torus (or wraparound mesh)
(Figures.)
Static Interconnect
- Hypercube network: a multidimensional mesh of processors with exactly two processors in each dimension. A d-dimensional hypercube consists of p = 2^d processors. (Figure: 0-, 1-, 2-, and 3-dimensional hypercubes.)

Routing
- How is data transmitted between two nodes that are not directly connected? Hardware and hardware+software solutions.
- How is a route between nodes decided if there are multiple routes? Deterministic shortest-path routing algorithms.
- How do intermediate nodes forward communications? Store-and-forward routing; wormhole routing.
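A standard way to see the hypercube structure (implied by the pictures, though not spelled out on the slide): label the p = 2^d nodes with d-bit numbers; two nodes are neighbors exactly when their labels differ in one bit. A sketch:

```c
#include <assert.h>

/* Neighbor of `node` along dimension `dim` of a hypercube: flip bit dim. */
unsigned hypercube_neighbor(unsigned node, int dim) {
    return node ^ (1u << dim);
}

/* Two hypercube nodes are adjacent iff their labels differ in exactly
   one bit, i.e. the XOR of the labels is a power of two. */
int is_hypercube_neighbor(unsigned a, unsigned b) {
    unsigned x = a ^ b;
    return x != 0 && (x & (x - 1)) == 0;
}
```

In a 3-dimensional hypercube (p = 8), node 0 (binary 000) has exactly the three neighbors 1 (001), 2 (010), and 4 (100), and routing between any two nodes takes at most d hops.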
Store-and-Forward vs. Wormhole Routing
- Store-and-forward routing: data is shipped through the network a packet at a time. We send the packet to the first intermediate node, then on to the second, and so forth.
- Wormhole routing: like a worm crawling through a wormhole. A packet contains a header with routing information, followed by a payload containing the actual data, probably followed by a checksum or something to guarantee integrity. Once the header has arrived at a node, it is possible to make routing decisions and pass the packet along immediately, rather than waiting for it to arrive in full first. Wormhole routing dramatically reduces latency, but creates new possibilities for deadlock.

Evaluating a Network Topology
Diameter, connectivity, bisection width, channel width, channel rate, channel bandwidth, bisection bandwidth.
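The latency gap shows up in the standard textbook cost models (an approximation assumed here, not given on the slide): store-and-forward pays the full message transfer at every hop, while wormhole pays the per-hop cost only for the header and lets the payload pipeline behind it.

```c
#include <assert.h>

/* Store-and-forward: each of the l hops retransmits the whole m-word
   message (t_s = startup time, t_w = per-word transfer time). */
double sf_latency(int l, int m, double t_s, double t_w) {
    return t_s + (double)l * m * t_w;
}

/* Wormhole: the header crosses l hops (t_h per hop), and the m-word
   payload pipelines behind it. */
double wh_latency(int l, int m, double t_s, double t_h, double t_w) {
    return t_s + l * t_h + (double)m * t_w;
}
```

For a 1000-word message over 10 hops with unit costs, store-and-forward takes about 10000 time units versus about 1010 for wormhole — nearly independent of distance, which is the "dramatically reduces latency" on the slide.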
Metrics
- Diameter: the maximum distance between any two processors in the network, where the distance between two processors is defined as the shortest path between them, in terms of links. This relates to communication time. The diameter is 1 for a completely connected network, 2 for a star network, and p/2 for a ring (p even).
- Connectivity: a measure of the multiplicity of paths between any two processors (the number of arcs that must be removed to break the network into two). High connectivity is desired since it lowers contention for communication resources. In the previous examples, the connectivity is 1 for a linear array, 1 for a star, 2 for a ring, 2 for a mesh, and 4 for a torus.
Metrics
- Bisection width: the minimum number of communication links that have to be removed to partition the network into two equal halves. The bisection width is 2 for a ring, sqrt(p) for a mesh with p (even) processors, p/2 for a hypercube, and (p*p)/4 for a completely connected network (p even).
- Channel width: the number of physical wires in each communication link.
- Channel rate: the peak rate at which a single physical wire can deliver bits.
- Channel bandwidth: the peak rate at which data can be communicated between the ends of a communication link: (channel width) * (channel rate).
- Bisection bandwidth: the minimum volume of communication allowed between any two halves of the network with an equal number of processors: (bisection width) * (channel bandwidth).
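The two derived bandwidth metrics above combine multiplicatively; a minimal sketch (the function names are hypothetical):

```c
#include <assert.h>

/* Channel bandwidth = channel width (wires) * channel rate (bits/s per wire). */
double channel_bandwidth(int width_wires, double rate_bits_per_sec) {
    return width_wires * rate_bits_per_sec;
}

/* Bisection bandwidth = bisection width (links cut) * channel bandwidth. */
double bisection_bandwidth(int bisection_width, double channel_bw) {
    return bisection_width * channel_bw;
}
```

For example, a ring (bisection width 2) whose links have 16 wires at 1 Gbit/s per wire has a channel bandwidth of 16 Gbit/s and a bisection bandwidth of 32 Gbit/s.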
Example 1: 2D Mesh (without wraparound connections)
- Processor nodes: n = d^2
- Switch nodes: n
- Diameter: 2(sqrt(n) - 1)
- Bisection width: sqrt(n)
- Edges/node: 4
- Edge length: constant

Example 2: Binary Tree Network
- Processor nodes: n = 2^d
- Switch nodes: 2n - 1
- Diameter: 2 log n
- Bisection width: 1
- Edges/node: 3
- Edge length: variable
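The table entries above follow directly from the shapes; a sketch that recomputes them (helper names are hypothetical):

```c
#include <assert.h>

/* 2D mesh without wraparound, n = d*d processor nodes (d = sqrt(n)). */
int mesh_diameter(int d)  { return 2 * (d - 1); }  /* corner to corner */
int mesh_bisection(int d) { return d; }            /* cut one row of links */

/* Binary tree network with n = 2^d processor nodes at the leaves. */
int tree_switches(int n) { return 2 * n - 1; }     /* n leaf + n-1 internal */
int tree_diameter(int n) {                          /* 2*log2(n): leaf-root-leaf */
    int d = 0;
    while ((1 << d) < n) d++;
    return 2 * d;
}
```

For instance, a 16-node mesh (d = 4) has diameter 2(4 - 1) = 6 and bisection width 4, while an 8-leaf binary tree has 15 switches and diameter 6.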
Next Lecture
Topics: programming with MPI.
Deadlines:
- Course: read the syllabus and the tentative schedule. Print the slides for the next lecture and take them with you!
- Seminar: choose your seminar day and your paper.
- Project: read the project descriptions. The next deadline is 2/19!
- Homework: no homework assignment this time.

Get Some Practice
Find the memory model and topology for the following machines: Cray T3E, Cray SV1, IBM RS/6000 SP, Hitachi SR8000, Compaq HPC320, IBM eServer p690, SGI Origin 2000, clusters of SMPs, Cray T3D, and the Fujitsu VPP5000 series. Next lecture: you give your answers!
More informationParallel Systems Prof. James L. Frankel Harvard University. Version of 6:50 PM 4-Dec-2018 Copyright 2018, 2017 James L. Frankel. All rights reserved.
Parallel Systems Prof. James L. Frankel Harvard University Version of 6:50 PM 4-Dec-2018 Copyright 2018, 2017 James L. Frankel. All rights reserved. Architectures SISD (Single Instruction, Single Data)
More informationProgramming Shared Memory Systems with OpenMP Part I. Book
Programming Shared Memory Systems with OpenMP Part I Instructor Dr. Taufer Book Parallel Programming in OpenMP by Rohit Chandra, Leo Dagum, Dave Kohr, Dror Maydan, Jeff McDonald, Ramesh Menon 2 1 Machine
More informationParallel Computer Architecture II
Parallel Computer Architecture II Stefan Lang Interdisciplinary Center for Scientific Computing (IWR) University of Heidelberg INF 368, Room 532 D-692 Heidelberg phone: 622/54-8264 email: Stefan.Lang@iwr.uni-heidelberg.de
More informationMultiple Processor Systems. Lecture 15 Multiple Processor Systems. Multiprocessor Hardware (1) Multiprocessors. Multiprocessor Hardware (2)
Lecture 15 Multiple Processor Systems Multiple Processor Systems Multiprocessors Multicomputers Continuous need for faster computers shared memory model message passing multiprocessor wide area distributed
More informationInterconnection Networks: Topology. Prof. Natalie Enright Jerger
Interconnection Networks: Topology Prof. Natalie Enright Jerger Topology Overview Definition: determines arrangement of channels and nodes in network Analogous to road map Often first step in network design
More informationCS252 Graduate Computer Architecture Lecture 14. Multiprocessor Networks March 9 th, 2011
CS252 Graduate Computer Architecture Lecture 14 Multiprocessor Networks March 9 th, 2011 John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~kubitron/cs252
More informationParallel Real-Time Systems
Parallel Real-Time Systems Parallel Computing Overview References (Will be expanded as needed) Website for Parallel & Distributed Computing: www.cs.kent.edu/~jbaker/pdc-f08/ Selected slides from Introduction
More informationIntroduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano
Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano Outline Key issues to design multiprocessors Interconnection network Centralized shared-memory architectures Distributed
More informationWhat is Parallel Computing?
What is Parallel Computing? Parallel Computing is several processing elements working simultaneously to solve a problem faster. 1/33 What is Parallel Computing? Parallel Computing is several processing
More informationWhat are Clusters? Why Clusters? - a Short History
What are Clusters? Our definition : A parallel machine built of commodity components and running commodity software Cluster consists of nodes with one or more processors (CPUs), memory that is shared by
More informationMultiprocessors and Thread Level Parallelism Chapter 4, Appendix H CS448. The Greed for Speed
Multiprocessors and Thread Level Parallelism Chapter 4, Appendix H CS448 1 The Greed for Speed Two general approaches to making computers faster Faster uniprocessor All the techniques we ve been looking
More informationWHY PARALLEL PROCESSING? (CE-401)
PARALLEL PROCESSING (CE-401) COURSE INFORMATION 2 + 1 credits (60 marks theory, 40 marks lab) Labs introduced for second time in PP history of SSUET Theory marks breakup: Midterm Exam: 15 marks Assignment:
More informationMultiprocessors - Flynn s Taxonomy (1966)
Multiprocessors - Flynn s Taxonomy (1966) Single Instruction stream, Single Data stream (SISD) Conventional uniprocessor Although ILP is exploited Single Program Counter -> Single Instruction stream The
More informationParallel Architectures
Parallel Architectures Part 1: The rise of parallel machines Intel Core i7 4 CPU cores 2 hardware thread per core (8 cores ) Lab Cluster Intel Xeon 4/10/16/18 CPU cores 2 hardware thread per core (8/20/32/36
More informationParallel Computing. Hwansoo Han (SKKU)
Parallel Computing Hwansoo Han (SKKU) Unicore Limitations Performance scaling stopped due to Power consumption Wire delay DRAM latency Limitation in ILP 10000 SPEC CINT2000 2 cores/chip Xeon 3.0GHz Core2duo
More informationParallel Ant System on Max Clique problem (using Shared Memory architecture)
Parallel Ant System on Max Clique problem (using Shared Memory architecture) In the previous Distributed Ants section, we approach the original Ant System algorithm using distributed computing by having
More informationCS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley Wide links, smaller routing delay Tremendous variation 3/19/99 CS258 S99 2
Real Machines Interconnection Network Topology Design Trade-offs CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley Wide links, smaller routing delay Tremendous variation 3/19/99
More informationIntroduction to Parallel and Distributed Systems - INZ0277Wcl 5 ECTS. Teacher: Jan Kwiatkowski, Office 201/15, D-2
Introduction to Parallel and Distributed Systems - INZ0277Wcl 5 ECTS Teacher: Jan Kwiatkowski, Office 201/15, D-2 COMMUNICATION For questions, email to jan.kwiatkowski@pwr.edu.pl with 'Subject=your name.
More informationParallel Architectures
Parallel Architectures Instructor: Tsung-Che Chiang tcchiang@ieee.org Department of Science and Information Engineering National Taiwan Normal University Introduction In the roughly three decades between
More informationCS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #28: Parallel Computing 2005-08-09 CS61C L28 Parallel Computing (1) Andy Carle Scientific Computing Traditional Science 1) Produce
More informationCS61C : Machine Structures
CS61C L28 Parallel Computing (1) inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #28: Parallel Computing 2005-08-09 Andy Carle Scientific Computing Traditional Science 1) Produce
More informationTypes of Parallel Computers
slides1-22 Two principal types: Types of Parallel Computers Shared memory multiprocessor Distributed memory multicomputer slides1-23 Shared Memory Multiprocessor Conventional Computer slides1-24 Consists
More informationInterconnection Networks
Lecture 18: Interconnection Networks Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Credit: many of these slides were created by Michael Papamichael This lecture is partially
More informationMotivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism
Motivation for Parallelism Motivation for Parallelism The speed of an application is determined by more than just processor speed. speed Disk speed Network speed... Multiprocessors typically improve the
More informationIntroduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620
Introduction to Parallel and Distributed Computing Linh B. Ngo CPSC 3620 Overview: What is Parallel Computing To be run using multiple processors A problem is broken into discrete parts that can be solved
More informationInterconnection Networks. Issues for Networks
Interconnection Networks Communications Among Processors Chris Nevison, Colgate University Issues for Networks Total Bandwidth amount of data which can be moved from somewhere to somewhere per unit time
More informationLecture 9: MIMD Architectures
Lecture 9: MIMD Architectures Introduction and classification Symmetric multiprocessors NUMA architecture Clusters Zebo Peng, IDA, LiTH 1 Introduction A set of general purpose processors is connected together.
More informationSchool of Parallel Programming & Parallel Architecture for HPC ICTP October, Intro to HPC Architecture. Instructor: Ekpe Okorafor
School of Parallel Programming & Parallel Architecture for HPC ICTP October, 2014 Intro to HPC Architecture Instructor: Ekpe Okorafor A little about me! PhD Computer Engineering Texas A&M University Computer
More informationEE/CSCI 451: Parallel and Distributed Computation
EE/CSCI 451: Parallel and Distributed Computation Lecture #4 1/24/2018 Xuehai Qian xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Announcements PA #1
More informationIntroduction. CSCI 4850/5850 High-Performance Computing Spring 2018
Introduction CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University What is Parallel
More informationIntroduction. HPC Fall 2007 Prof. Robert van Engelen
Introduction HPC Fall 2007 Prof. Robert van Engelen Syllabus Title: High Performance Computing (ISC5935-1 and CIS5930-13) Classes: Tuesday and Thursday 2:00PM to 3:15PM in 152 DSL Evaluation: projects
More informationCMSC 611: Advanced. Parallel Systems
CMSC 611: Advanced Computer Architecture Parallel Systems Parallel Computers Definition: A parallel computer is a collection of processing elements that cooperate and communicate to solve large problems
More informationFirst, the need for parallel processing and the limitations of uniprocessors are introduced.
ECE568: Introduction to Parallel Processing Spring Semester 2015 Professor Ahmed Louri A-Introduction: The need to solve ever more complex problems continues to outpace the ability of today's most powerful
More informationSMP and ccnuma Multiprocessor Systems. Sharing of Resources in Parallel and Distributed Computing Systems
Reference Papers on SMP/NUMA Systems: EE 657, Lecture 5 September 14, 2007 SMP and ccnuma Multiprocessor Systems Professor Kai Hwang USC Internet and Grid Computing Laboratory Email: kaihwang@usc.edu [1]
More informationIntroduction to High-Performance Computing
Introduction to High-Performance Computing Dr. Axel Kohlmeyer Associate Dean for Scientific Computing, CST Associate Director, Institute for Computational Science Assistant Vice President for High-Performance
More informationCommunication Performance in Network-on-Chips
Communication Performance in Network-on-Chips Axel Jantsch Royal Institute of Technology, Stockholm November 24, 2004 Network on Chip Seminar, Linköping, November 25, 2004 Communication Performance In
More informationParallel Numerics, WT 2017/ Introduction. page 1 of 127
Parallel Numerics, WT 2017/2018 1 Introduction page 1 of 127 Scope Revise standard numerical methods considering parallel computations! Change method or implementation! page 2 of 127 Scope Revise standard
More informationHigh-Performance Scientific Computing
High-Performance Scientific Computing Instructor: Randy LeVeque TA: Grady Lemoine Applied Mathematics 483/583, Spring 2011 http://www.amath.washington.edu/~rjl/am583 World s fastest computers http://top500.org
More informationModule 17: "Interconnection Networks" Lecture 37: "Introduction to Routers" Interconnection Networks. Fundamentals. Latency and bandwidth
Interconnection Networks Fundamentals Latency and bandwidth Router architecture Coherence protocol and routing [From Chapter 10 of Culler, Singh, Gupta] file:///e /parallel_com_arch/lecture37/37_1.htm[6/13/2012
More informationDesign of Parallel Algorithms. The Architecture of a Parallel Computer
+ Design of Parallel Algorithms The Architecture of a Parallel Computer + Trends in Microprocessor Architectures n Microprocessor clock speeds are no longer increasing and have reached a limit of 3-4 Ghz
More information