Introduction to High Performance Computing


1 Introduction to High Performance Computing Jon Johansson Academic ICT University of Alberta Agenda What is High Performance Computing? What is a supercomputer? is it a mainframe? Supercomputer architectures Who has the fastest computers? Speedup Programming for parallel computing The GRID?? 1

2 High Performance Computing HPC is the field that concentrates on developing supercomputers and software to run on supercomputers a main area of this discipline is developing parallel processing algorithms and software: programs that can be divided into little pieces so that each piece can be executed simultaneously by separate processors High Performance Computing HPC is about big problems, i.e. problems that need lots of memory, many CPU cycles and big hard drives no matter what field you work in, your research might benefit from making problems larger: going from 2D to 3D, using a finer mesh, or increasing the number of elements in the simulation 2

3 Grand Challenges weather forecasting economic modeling computer-aided design drug design exploring the origins of the universe searching for extra-terrestrial life computer vision nuclear power and weapons simulations Grand Challenges Protein To simulate the folding of a 300 amino acid protein in water: # of atoms: ~32,000 folding time: 1 millisecond # of FLOPs: Machine Speed: 1 PetaFLOP/s Simulation Time: 1 year (Source: IBM Blue Gene Project) Ken Dill and Kit Lau's protein folding model. IBM's answer: the Blue Gene Project, US$100 M of funding to build a 1 PetaFLOP/s computer Charles L. Brooks III, Scripps Research Institute 3

4 Grand Challenges - Nuclear the National Nuclear Security Administration uses supercomputers to run three-dimensional codes to simulate instead of test address critical problems of materials aging simulate the environment of the weapon and try to gauge whether the device continues to be usable stockpile science, molecular dynamics and turbulence calculations Grand Challenges - Nuclear March 7, 2002: first full-system three-dimensional ASCI White simulations of a nuclear weapon explosion the simulation used more than 480 million cells (grid: 780x780x780 if the grid is a cube) 1,920 processors on IBM ASCI White at the Lawrence Livermore National Laboratory 2,931 wall-clock hours (about 122 days) 6.6 million CPU hours Test shot Badger, Nevada Test Site, Apr Yield: 23 kilotons 4

5 Grand Challenges - Nuclear Advanced Simulation and Computing Program (ASC) Agenda What is High Performance Computing? What is a supercomputer? is it a mainframe? Supercomputer architectures Who has the fastest computers? Speedup Programming for parallel computing The GRID?? 5

6 What is a Mainframe? large and reasonably fast machines the speed isn't the most important characteristic high-quality internal engineering and resulting proven reliability expensive but high-quality technical support top-notch security strict backward compatibility for older software What is a Mainframe? these machines can, and do, run successfully for years without interruption (long uptimes) repairs can take place while the mainframe continues to run the machines are robust and dependable IBM coined a term to advertise the robustness of their mainframe computers: Reliability, Availability and Serviceability (RAS) 6

7 What is a Mainframe? Introducing IBM System z9 109 Designed for the On Demand Business IBM is delivering a holistic approach to systems design Designed and optimized with a total systems approach Helps keep your applications running with enhanced protection against planned and unplanned outages Extended security capabilities for even greater protection capabilities Increased capacity with more available engines per server What is a Supercomputer?? at any point in time the term Supercomputer refers to the fastest machines currently available a supercomputer this year might be a mainframe in a couple of years a supercomputer is typically used for scientific and engineering applications that must do a great amount of computation 7

8 What is a Supercomputer?? the most significant difference between a supercomputer and a mainframe: a supercomputer channels all its power into executing a few programs as fast as possible if the system crashes, restart the job(s) no great harm done a mainframe uses its power to execute many programs simultaneously e.g. a banking system must run reliably for extended periods What is a Supercomputer?? to see the world's fastest computers look at top500.org measure performance with the Linpack benchmark solve a dense system of linear equations the performance numbers give a good indication of peak performance 8

9 Terminology combining a number of processors to run a program is called variously: multiprocessing parallel processing coprocessing Terminology parallel computing harnessing a bunch of processors on the same machine to run your computer program note that this is one machine generally a homogeneous architecture same processors, memory, operating system all the machines in the Top 500 are in this category 9

10 Terminology cluster: a set of generally homogeneous machines originally built using low-cost commodity hardware to increase density, clusters are now commonly built with 1U rack servers or blades can use a standard network interconnect or a high performance interconnect such as InfiniBand or Myrinet cluster hardware is becoming quite specialized a cluster is thought of as a single machine with a name, e.g. glacier.westgrid.ca Terminology distributed computing - harnessing a bunch of processors on different machines to run your computer program heterogeneous architecture different operating systems, cpus, memory the terms parallel and distributed computing are often used interchangeably the work is divided into sections so each processor does a unique piece 10

11 Terminology some distributed computing projects are built on BOINC (Berkeley Open Infrastructure for Network Computing): SETI@home Search for Extraterrestrial Intelligence Proteins@home deduces DNA sequence, given a protein Hydrogen@home enhance clean energy technology by improving hydrogen production and storage (this is beta now) Grid computing Terminology a Grid is a cluster of supercomputers in the ideal case: we submit our job with resource requirements the job is run on a machine with available resources we get results back NOTE: we don't care where the resources are, just that the job is run. 11

12 Terminology Utility computing computation and storage facilities are provided as a commercial service charges are for resources actually used Pay and Use computing Cloud computing aka on-demand computing any IT-related capability can be provided as a service repackages grid computing and utility computing users can access computing resources in the Cloud i.e. out in the Internet How to Measure Speed? count the number of floating point operations required to solve the problem + - x / results of the benchmark are so many Floating point Operations Per Second (FLOPS) a supercomputer is a machine that can provide a very large number of FLOPS 12

13 Floating Point Operations multiply two 1000x1000 matrices for each resulting array element: 1000 multiplies 999 adds do this for all 1,000,000 elements: ~2 x 10^9 operations needed increasing the array size has the number of operations increasing as O(N^3) Agenda What is High Performance Computing? What is a supercomputer? is it a mainframe? Supercomputer architectures Who has the fastest computers? Speedup Programming for parallel computing The GRID?? 13
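The operation count above can be sketched in a few lines of Python (the function name is ours, for illustration only):

```python
def matmul_flops(n):
    """Floating point operations for a naive n x n matrix multiply:
    each of the n*n result elements needs n multiplies and n - 1 adds."""
    return n * n * (n + (n - 1))

# 1000 x 1000 matrices: ~2 x 10^9 operations, and the count grows as O(N^3)
print(matmul_flops(1000))                        # 1999000000
print(matmul_flops(2000) / matmul_flops(1000))   # ~8, i.e. 2^3
```

Doubling N multiplies the work by roughly eight, which is why supercomputer time disappears so quickly as problems grow.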

14 High Performance Computing supercomputers use many CPUs to do the work note that all supercomputing architectures have processors and some combination of cache some form of memory and IO the processors are separated from the other processors by some distance there are major differences in the way that the parts are connected some problems fit into different architectures better than others High Performance Computing increasing computing power available to researchers allows increasing problem dimensions adding more particles to a system increasing the accuracy of the result improving experiment turnaround time 14

15 Flynn's Taxonomy Michael J. Flynn (1972) classified computer architectures based on the number of concurrent instructions and data streams available single instruction, single data (SISD) basic old PC multiple instruction, single data (MISD) redundant systems single instruction, multiple data (SIMD) vector (or array) processor multiple instruction, multiple data (MIMD) shared or distributed memory systems: symmetric multiprocessors and clusters common extension: single program (or process), multiple data (SPMD) Architectures we can also classify supercomputers according to how the processors and memory are connected couple processors to a single large memory address space couple computers, each with its own memory address space 15

16 Symmetric Multiprocessing (SMP) Uniform Memory Access (UMA) multiple CPUs, residing in one cabinet, share the same memory processors and memory are tightly coupled the processors share memory and the I/O bus or data path Architectures Architectures SMP a single copy of the operating system is in charge of all the processors SMP systems range from two to as many as 32 or more processors 16

17 Architectures SMP "capability computing" one CPU can use all the memory, or all the CPUs can work on a little memory whatever you need Architectures UMA-SMP negatives as the number of CPUs gets large the buses become saturated long wires cause latency problems 17

18 Architectures Non-Uniform Memory Access (NUMA) NUMA is similar to SMP - multiple CPUs share a single memory space hardware support for shared memory memory is separated into close and distant banks basically a cluster of SMPs memory on the same processor board as the CPU (local memory) is accessed faster than memory on other processor boards (shared memory) hence "non-uniform" NUMA architecture scales much better to higher numbers of CPUs than SMP Architectures 18

19 Architectures University of Alberta SGI Origin (SGI NUMA cables) Architectures Cache Coherent NUMA (ccNUMA) each CPU has an associated cache ccNUMA machines use special-purpose hardware to maintain cache coherence typically done by using inter-processor communication between cache controllers to keep a consistent memory image when the same memory location is stored in more than one cache ccNUMA performs poorly when multiple processors attempt to access the same memory area in rapid succession 19

20 Architectures Distributed Memory Multiprocessor (DMMP) each computer has its own memory address space looks like NUMA but there is no hardware support for remote memory access the special purpose switched network is replaced by a general purpose network such as Ethernet or more specialized interconnects: InfiniBand Myrinet Lattice: Calgary's HP ES40 and ES45 cluster each node has 4 processors Architectures Massively Parallel Processing (MPP) Cluster of commodity PCs processors and memory are loosely coupled "capacity computing" each CPU contains its own memory and copy of the operating system and application. each subsystem communicates with the others via a high-speed interconnect. in order to use MPP effectively, a problem must be breakable into pieces that can all be solved simultaneously 20

21 Architectures Architectures lots of "how to build a cluster" tutorials on the web just Google for one 21

22 Architectures Vector Processor or Array Processor a CPU design that is able to run mathematical operations on multiple data elements simultaneously a scalar processor operates on data elements one at a time vector processors formed the basis of most supercomputers through the 1980s and into the 1990s pipeline the data Architectures Vector Processor or Array Processor operate on many pieces of data simultaneously consider the following add instruction: C = A + B on both scalar and vector machines this means: add the contents of A to the contents of B and put the sum in C on a scalar machine the operands are numbers on a vector machine the operands are vectors and the instruction directs the machine to compute the pair-wise sum of each pair of vector elements 22
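The scalar/vector distinction can be mimicked in software; a rough Python sketch, with plain lists standing in for hardware registers:

```python
def scalar_add(a, b):
    """Scalar style: each add instruction handles one pair of numbers."""
    c = []
    for i in range(len(a)):      # one element per trip through the loop
        c.append(a[i] + b[i])
    return c

def vector_add(a, b):
    """Vector style: one instruction directs a pair-wise sum of whole
    vectors (real vector hardware pipelines this; here it is emulated)."""
    return [x + y for x, y in zip(a, b)]

A, B = [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]
print(scalar_add(A, B))   # [11.0, 22.0, 33.0]
print(vector_add(A, B))   # [11.0, 22.0, 33.0]
```

Both produce the same result; the difference on real hardware is that the vector machine issues one instruction for the whole array instead of one per element.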

23 Architectures University of Victoria has 4 NEC SX-6/8A vector processors in the School of Earth and Ocean Sciences each has 32 GB of RAM 8 vector processors in the box peak performance is 72 GFLOPS Agenda What is High Performance Computing? What is a supercomputer? is it a mainframe? Supercomputer architectures Who has the fastest computers? Speedup Programming for parallel computing The GRID?? 23

24 BlueGene/L The fastest on the Nov top 500 list: top500.org installed at the Lawrence Livermore National Laboratory (LLNL) (US Department of Energy) Livermore California 24

25 BlueGene/L processors: memory: 72 TB 104 racks each has 2048 processors the first 64 had 512 GB of RAM (256 MB/processor) the 40 new racks have 1 TB of RAM (512 MB/processor) a Linpack performance of TFlop/s in Nov 2005 it was the only system ever to exceed the 100 TFlop/s mark there are now 10 machines over 100 TFlop/s
The Fastest Five (Site: Computer):
1. DOE/NNSA/LANL, United States: Roadrunner, BladeCenter QS22/LS21 Cluster, Cell/Opteron, IBM (Rmax 1,026,000 Gflops)
2. DOE/NNSA/LLNL, United States: BlueGene/L, eServer Blue Gene Solution, IBM
3. Argonne National Laboratory, United States: BlueGene/P Solution, IBM
4. Texas Advanced Computing Center/Univ. of Texas, United States: Ranger, SunBlade x6420, Opteron Quad 2 GHz
5. DOE/Oak Ridge National Laboratory, United States: Jaguar, Cray XT4 QuadCore Opteron 2.1 GHz
25

26 # of Processors with Time The number of processors in the fastest machines has increased by about a factor of 200 in the last 15 years # of Gflops Increase with Time One Petaflop! Machine speed has increased by more than a factor of since 1993 Roadrunner tests at > 1 petaflop for June

27 Future BlueGene Roadrunner cores: ,562 Opteron dual-core, 12,240 Cell memory: 98 TB 278 racks a Linpack performance of TFlop/s in June 2008 it was the only system ever to exceed the 1 PetaFlop/s mark cost: $100 million weight: 500,000 lbs power: 2.35 (or 3.9) megawatts 27

28 Roadrunner Agenda What is High Performance Computing? What is a supercomputer? is it a mainframe? Supercomputer architectures Who has the fastest computers? Speedup Programming for parallel computing The GRID?? 28

29 Speedup how can we measure how much faster our program runs when using more than one processor? define Speedup S as the ratio of 2 program execution times, with constant problem size: S = T1 / TP T1 is the execution time for the problem on a single processor (use the best serial time) TP is the execution time for the problem on P processors Speedup Linear speedup: the time to execute the problem decreases by the number of processors if a job requires 1 week with 1 processor it will take less than 10 minutes with 1024 processors 29
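The definition translates directly into code; a small sketch (the times are hypothetical):

```python
def speedup(t1, tp):
    """S = T1 / TP: best serial time divided by the time on P processors."""
    return t1 / tp

# Linear speedup: 1024 processors cut a 1-week job to under 10 minutes.
week_minutes = 7 * 24 * 60                  # 10080 minutes
t_parallel = week_minutes / 1024            # ~9.84 minutes
print(speedup(week_minutes, t_parallel))    # 1024.0
```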

30 Speedup Sublinear speedup the usual case there are generally some limitations to the amount of speedup that you get, e.g. communication overhead Speedup Superlinear speedup very rare memory access patterns may allow this for some algorithms 30

31 Speedup why do a speedup test? it's hard to tell how a program will behave e.g. "Strange" is actually fairly common behaviour for untuned code in this case: linear speedup to ~10 cpus after 24 cpus speedup is starting to decrease Speedup to use more processors efficiently, change this behaviour change loop structure adjust algorithms?? run jobs with processors so the machines are used efficiently 31

32 Speedup one class of jobs that have linear speedup are called embarrassingly parallel a better name might be perfectly parallel doesn't take much effort to turn the problem into a bunch of parts that can be run in parallel: parameter searches rendering the frames in a computer animation brute force searches in cryptography Speedup we have been discussing Strong Scaling the problem size is fixed and we increase the number of processors to decrease computational time (Amdahl Scaling) the amount of work available to each processor decreases as the number of processors increases eventually, the processors are doing more communication than number crunching and the speedup curve flattens difficult to have high efficiency for large numbers of processors 32
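One common way to quantify that flattening is parallel efficiency, E = S / P (a standard metric, not defined on the slide); a sketch:

```python
def efficiency(t1, tp, p):
    """Parallel efficiency E = (T1 / TP) / P; 1.0 means linear speedup."""
    return (t1 / tp) / p

# Perfectly parallel: doubling the processors halves the time.
print(efficiency(100.0, 50.0, 2))    # 1.0

# Strong scaling with communication overhead: 64 processors but only a
# 20x speedup, so each processor is doing useful work under a third of the time.
print(efficiency(100.0, 5.0, 64))    # 0.3125
```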

33 Speedup we are often interested in Weak Scaling double the problem size when we double the number of processors constant computational time (Gustafson scaling) the amount of work for each processor stays roughly constant parallel overhead is (hopefully) small compared to the real work the processor does e.g. weather prediction Amdahl's Law Gene Amdahl: 1967 parallelize some of the program: some must remain serial f is the fraction of the calculation that is serial 1-f is the fraction of the calculation that is parallel the maximum speedup that can be obtained by using P processors is: Smax = 1 / (f + (1-f)/P) 33
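Amdahl's bound is easy to evaluate; a sketch (the function name is ours):

```python
def amdahl_speedup(f, p):
    """Maximum speedup with serial fraction f on P processors:
    S_max = 1 / (f + (1 - f) / P)."""
    return 1.0 / (f + (1.0 - f) / p)

# With 25% serial code the speedup saturates near 1/f = 4,
# no matter how many processors you add.
print(amdahl_speedup(0.25, 4))          # ~2.29
print(amdahl_speedup(0.25, 1_000_000))  # ~4.0
```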

34 Amdahl's Law if 25% of the calculation must remain serial the best speedup you can obtain is 4 need to parallelize as much of the program as possible to get the best advantage from multiple processors Agenda What is High Performance Computing? What is a supercomputer? is it a mainframe? Supercomputer architectures Who has the fastest computers? Speedup Programming for parallel computing The GRID?? 34

35 Parallel Programming need to do something to your program to use multiple processors need to incorporate commands into your program which allow multiple threads to run one thread per processor each thread gets a piece of the work several ways (APIs) to do this Parallel Programming OpenMP introduce statements into your code in C: #pragma in FORTRAN: C$OMP or !$OMP can compile serial and parallel executables from the same source code restricted to shared memory machines not clusters! 35

36 Parallel Programming OpenMP demo: MatCrunch mathematical operations on the elements of an array introduce 2 OMP directives before a loop # pragma omp parallel // define a parallel section # pragma omp for // loop is to be parallel serial section: 4.03 sec parallel section, 1 cpu: secs parallel section, 2 cpus: secs speedup = 1.99 // not bad for adding 2 lines Parallel Programming for a larger number of processors the speedup for MatCrunch is not linear need to do the speedup test to see how your program will behave 36

37 Parallel Programming MPI (Message Passing Interface) a standard set of communication subroutine libraries works for SMPs and clusters programs written with MPI are highly portable information and downloads: MPICH, LAM/MPI, Open MPI Parallel Programming MPI (Message Passing Interface) supports the SPMD, single program multiple data model all processors use the same program each processor has its own data think of a cluster: each node is getting a copy of the program but running a specific portion of it with its own data 37

38 Parallel Programming starting mpi jobs is not standard for mpich2 use mpiexec start a job with 6 processes 6 copies of the program run in the default Communicator Group MPI_COMM_WORLD each process has an ID its rank Parallel Programming example: start N processes to calculate (N-1) factorial 0! = 1 1! = 1 2! = 2 x 1 = 2 3! = 3 x 2 x 1 = 6 n! = n x (n-1) x ... x 2 x 1 38
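The SPMD decomposition of the factorial example can be sketched in serial Python (a real implementation would use an MPI library such as mpi4py; the helper names here are ours): each rank multiplies its own share of the factors, and the master combines the partial results, playing the role of an MPI reduce.

```python
import math

def partial_product(rank, nprocs, n):
    """Each rank multiplies its own slice of the factors 1..n,
    distributed round-robin as in the SPMD model."""
    result = 1
    for k in range(1 + rank, n + 1, nprocs):
        result *= k
    return result

def parallel_factorial(n, nprocs):
    """The master combines the partial results from every rank."""
    product = 1
    for rank in range(nprocs):
        product *= partial_product(rank, nprocs, n)
    return product

print(parallel_factorial(10, 3))                         # 3628800
print(parallel_factorial(10, 3) == math.factorial(10))   # True
```

With 3 ranks, rank 0 multiplies 1, 4, 7, 10; rank 1 multiplies 2, 5, 8; rank 2 multiplies 3, 6, 9; the master's final product is 10!.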

39 Parallel Programming generally the master process will: send work to other processes receive results from processes that complete send more work to those processes do final calculations output results designing an efficient algorithm for all this is up to you Parallel Programming it's possible to combine OpenMP and MPI for running on clusters of SMP machines the trick in parallel programming is to keep all the processors working ("load balancing") working on data that no other processor needs to touch (so there aren't any cache conflicts) parallel programming is generally harder than serial programming 39

40 Agenda What is High Performance Computing? What is a supercomputer? is it a mainframe? Supercomputer architectures Who has the fastest computers? Speedup Programming for parallel computing The GRID?? Grid Computing A computational grid: is a large-scale distributed computing infrastructure composed of geographically distributed, autonomous resource providers lots of computers joined together requires excellent networking that supports resource sharing and distribution offers access to all the resources that are part of the grid compute cycles storage capacity visualization/collaboration is intended for integrated and collaborative use by multiple organizations 40

41 Grids Ian Foster (the "Father of the Grid") says that to be a Grid three points must be met: computing resources are not administered centrally many sites connected open standards are used not a proprietary system non-trivial quality of service is achieved it is available most of the time CERN says a Grid is a service for sharing computer power and data storage capacity over the Internet Canadian Academic Computing Sites in 41

42 Canadian Grids Some sites in Canada have tied their resources together to form 7 Canadian Grid Consortia: ACENET Atlantic Computational Excellence Network CLUMEQ Consortium Laval UQAM McGill and Eastern Quebec for High Performance Computing SCINET University of Toronto HPCVL High Performance Computing Virtual Laboratory RQCHP Réseau Québécois de calcul de haute performance SHARCNET Shared Hierarchical Academic Research Computing Network WESTGRID (Alberta and British Columbia: SFU Campus, Edmonton, Calgary, UBC Campus) 42

43 Grids the ultimate goal of the Grid idea is to have a system that you can submit a job to, so that: your job uses resources that fit requirements that you specify 128 nodes on an SMP with 200 GB of RAM, or 256 nodes on a PC cluster with 1 GB/processor when done the results come back to you you don't care where the job runs Vancouver or St. John's or in between Sharing Resources HPC resources are not available quite as readily as your desktop computer the resources must be shared fairly the idea is that each person gets as much of the resource as necessary to run their job for a reasonable time if the job can't finish in the allotted time the job needs to checkpoint: save enough information to begin running again from where it left off 43
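A minimal checkpoint/restart sketch, assuming a job whose state fits in a small dictionary (the file name and the work loop are hypothetical):

```python
import json
import os

STATE_FILE = "checkpoint.json"    # hypothetical checkpoint file

def run(total_steps, checkpoint_every=100):
    """Resume from a checkpoint if one exists, then periodically save
    enough state to begin running again from where we left off."""
    step, total = 0, 0
    if os.path.exists(STATE_FILE):            # restarting after a kill
        with open(STATE_FILE) as f:
            state = json.load(f)
        step, total = state["step"], state["total"]
    while step < total_steps:
        total += step                         # stand-in for the real work
        step += 1
        if step % checkpoint_every == 0:      # save progress so far
            with open(STATE_FILE, "w") as f:
                json.dump({"step": step, "total": total}, f)
    return total
```

If the scheduler kills the job at its time limit, resubmitting it picks up at the last saved step instead of starting over.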

44 Sharing Resources Portable Batch System (Torque) submit a job to PBS the job is placed in a queue with other users' jobs jobs in the queue are prioritized by a scheduler your job executes at some time in the future An HPC Site Sharing Resources When connecting to a Grid we need a layer of middleware tools to securely access the resources Globus is one example A Grid of HPC Sites 44
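A typical Torque/PBS submission script looks like the sketch below; the resource numbers and program name are made up, so check your site's documentation for the right values:

```bash
#!/bin/bash
#PBS -N myjob                 # job name shown by qstat
#PBS -l nodes=4:ppn=2         # ask the scheduler for 4 nodes, 2 processors each
#PBS -l walltime=12:00:00     # the job is killed when this limit is reached
#PBS -j oe                    # merge stdout and stderr into one output file

cd $PBS_O_WORKDIR             # PBS starts the job in your home directory
mpiexec -n 8 ./myprogram      # hypothetical MPI executable
```

Submit with qsub, then qstat shows where the job sits in the queue until the scheduler runs it.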

45 Questions? Many details in other sessions of this seminar series! 45


More information

Dheeraj Bhardwaj May 12, 2003

Dheeraj Bhardwaj May 12, 2003 HPC Systems and Models Dheeraj Bhardwaj Department of Computer Science & Engineering Indian Institute of Technology, Delhi 110 016 India http://www.cse.iitd.ac.in/~dheerajb 1 Sequential Computers Traditional

More information

Introduction to Cluster Computing

Introduction to Cluster Computing Introduction to Cluster Computing Prabhaker Mateti Wright State University Dayton, Ohio, USA Overview High performance computing High throughput computing NOW, HPC, and HTC Parallel algorithms Software

More information

Parallel Computing Platforms. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Parallel Computing Platforms. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University Parallel Computing Platforms Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Elements of a Parallel Computer Hardware Multiple processors Multiple

More information

Making a Case for a Green500 List

Making a Case for a Green500 List Making a Case for a Green500 List S. Sharma, C. Hsu, and W. Feng Los Alamos National Laboratory Virginia Tech Outline Introduction What Is Performance? Motivation: The Need for a Green500 List Challenges

More information

UVA HPC & BIG DATA COURSE INTRODUCTORY LECTURES. Adam Belloum

UVA HPC & BIG DATA COURSE INTRODUCTORY LECTURES. Adam Belloum UVA HPC & BIG DATA COURSE INTRODUCTORY LECTURES Adam Belloum Introduction to Parallel programming distributed systems Parallel programming MPI/openMP/RMI Service Oriented Architecture and Web Service Grid

More information

Introduction to Parallel Programming

Introduction to Parallel Programming Introduction to Parallel Programming Linda Woodard CAC 19 May 2010 Introduction to Parallel Computing on Ranger 5/18/2010 www.cac.cornell.edu 1 y What is Parallel Programming? Using more than one processor

More information

Real Parallel Computers

Real Parallel Computers Real Parallel Computers Modular data centers Background Information Recent trends in the marketplace of high performance computing Strohmaier, Dongarra, Meuer, Simon Parallel Computing 2005 Short history

More information

Outline. Execution Environments for Parallel Applications. Supercomputers. Supercomputers

Outline. Execution Environments for Parallel Applications. Supercomputers. Supercomputers Outline Execution Environments for Parallel Applications Master CANS 2007/2008 Departament d Arquitectura de Computadors Universitat Politècnica de Catalunya Supercomputers OS abstractions Extended OS

More information

Multi-core Programming - Introduction

Multi-core Programming - Introduction Multi-core Programming - Introduction Based on slides from Intel Software College and Multi-Core Programming increasing performance through software multi-threading by Shameem Akhter and Jason Roberts,

More information

Lecture 9: MIMD Architectures

Lecture 9: MIMD Architectures Lecture 9: MIMD Architectures Introduction and classification Symmetric multiprocessors NUMA architecture Clusters Zebo Peng, IDA, LiTH 1 Introduction A set of general purpose processors is connected together.

More information

Fabio AFFINITO.

Fabio AFFINITO. Introduction to High Performance Computing Fabio AFFINITO What is the meaning of High Performance Computing? What does HIGH PERFORMANCE mean??? 1976... Cray-1 supercomputer First commercial successful

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #28: Parallel Computing 2005-08-09 CS61C L28 Parallel Computing (1) Andy Carle Scientific Computing Traditional Science 1) Produce

More information

CS61C : Machine Structures

CS61C : Machine Structures CS61C L28 Parallel Computing (1) inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #28: Parallel Computing 2005-08-09 Andy Carle Scientific Computing Traditional Science 1) Produce

More information

Parallel Computers. c R. Leduc

Parallel Computers. c R. Leduc Parallel Computers Material based on B. Wilkinson et al., PARALLEL PROGRAMMING. Techniques and Applications Using Networked Workstations and Parallel Computers c 2002-2004 R. Leduc Why Parallel Computing?

More information

Parallel Computing Platforms

Parallel Computing Platforms Parallel Computing Platforms Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)

More information

Chapter 18 Parallel Processing

Chapter 18 Parallel Processing Chapter 18 Parallel Processing Multiple Processor Organization Single instruction, single data stream - SISD Single instruction, multiple data stream - SIMD Multiple instruction, single data stream - MISD

More information

COSC 6374 Parallel Computation. Parallel Computer Architectures

COSC 6374 Parallel Computation. Parallel Computer Architectures OS 6374 Parallel omputation Parallel omputer Architectures Some slides on network topologies based on a similar presentation by Michael Resch, University of Stuttgart Spring 2010 Flynn s Taxonomy SISD:

More information

Parallel Architectures

Parallel Architectures Parallel Architectures CPS343 Parallel and High Performance Computing Spring 2018 CPS343 (Parallel and HPC) Parallel Architectures Spring 2018 1 / 36 Outline 1 Parallel Computer Classification Flynn s

More information

What does Heterogeneity bring?

What does Heterogeneity bring? What does Heterogeneity bring? Ken Koch Scientific Advisor, CCS-DO, LANL LACSI 2006 Conference October 18, 2006 Some Terminology Homogeneous Of the same or similar nature or kind Uniform in structure or

More information

What is Parallel Computing?

What is Parallel Computing? What is Parallel Computing? Parallel Computing is several processing elements working simultaneously to solve a problem faster. 1/33 What is Parallel Computing? Parallel Computing is several processing

More information

Module 5 Introduction to Parallel Processing Systems

Module 5 Introduction to Parallel Processing Systems Module 5 Introduction to Parallel Processing Systems 1. What is the difference between pipelining and parallelism? In general, parallelism is simply multiple operations being done at the same time.this

More information

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors Computer and Information Sciences College / Computer Science Department CS 207 D Computer Architecture Lecture 9: Multiprocessors Challenges of Parallel Processing First challenge is % of program inherently

More information

Parallel Computing Introduction

Parallel Computing Introduction Parallel Computing Introduction Bedřich Beneš, Ph.D. Associate Professor Department of Computer Graphics Purdue University von Neumann computer architecture CPU Hard disk Network Bus Memory GPU I/O devices

More information

COSC 6374 Parallel Computation. Parallel Computer Architectures

COSC 6374 Parallel Computation. Parallel Computer Architectures OS 6374 Parallel omputation Parallel omputer Architectures Some slides on network topologies based on a similar presentation by Michael Resch, University of Stuttgart Edgar Gabriel Fall 2015 Flynn s Taxonomy

More information

Trends in HPC (hardware complexity and software challenges)

Trends in HPC (hardware complexity and software challenges) Trends in HPC (hardware complexity and software challenges) Mike Giles Oxford e-research Centre Mathematical Institute MIT seminar March 13th, 2013 Mike Giles (Oxford) HPC Trends March 13th, 2013 1 / 18

More information

Lecture 2 Parallel Programming Platforms

Lecture 2 Parallel Programming Platforms Lecture 2 Parallel Programming Platforms Flynn s Taxonomy In 1966, Michael Flynn classified systems according to numbers of instruction streams and the number of data stream. Data stream Single Multiple

More information

10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems

10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems 1 License: http://creativecommons.org/licenses/by-nc-nd/3.0/ 10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems To enhance system performance and, in some cases, to increase

More information

Real Parallel Computers

Real Parallel Computers Real Parallel Computers Modular data centers Overview Short history of parallel machines Cluster computing Blue Gene supercomputer Performance development, top-500 DAS: Distributed supercomputing Short

More information

ECE 574 Cluster Computing Lecture 1

ECE 574 Cluster Computing Lecture 1 ECE 574 Cluster Computing Lecture 1 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 22 January 2019 ECE574 Distribute and go over syllabus http://web.eece.maine.edu/~vweaver/classes/ece574/ece574_2019s.pdf

More information

Computer Architecture: Parallel Processing Basics. Prof. Onur Mutlu Carnegie Mellon University

Computer Architecture: Parallel Processing Basics. Prof. Onur Mutlu Carnegie Mellon University Computer Architecture: Parallel Processing Basics Prof. Onur Mutlu Carnegie Mellon University Readings Required Hill, Jouppi, Sohi, Multiprocessors and Multicomputers, pp. 551-560 in Readings in Computer

More information

CS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it

CS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it Lab 1 Starts Today Already posted on Canvas (under Assignment) Let s look at it CS 590: High Performance Computing Parallel Computer Architectures Fengguang Song Department of Computer Science IUPUI 1

More information

A Study of High Performance Computing and the Cray SV1 Supercomputer. Michael Sullivan TJHSST Class of 2004

A Study of High Performance Computing and the Cray SV1 Supercomputer. Michael Sullivan TJHSST Class of 2004 A Study of High Performance Computing and the Cray SV1 Supercomputer Michael Sullivan TJHSST Class of 2004 June 2004 0.1 Introduction A supercomputer is a device for turning compute-bound problems into

More information

Motivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism

Motivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism Motivation for Parallelism Motivation for Parallelism The speed of an application is determined by more than just processor speed. speed Disk speed Network speed... Multiprocessors typically improve the

More information

Cluster Network Products

Cluster Network Products Cluster Network Products Cluster interconnects include, among others: Gigabit Ethernet Myrinet Quadrics InfiniBand 1 Interconnects in Top500 list 11/2009 2 Interconnects in Top500 list 11/2008 3 Cluster

More information

WHY PARALLEL PROCESSING? (CE-401)

WHY PARALLEL PROCESSING? (CE-401) PARALLEL PROCESSING (CE-401) COURSE INFORMATION 2 + 1 credits (60 marks theory, 40 marks lab) Labs introduced for second time in PP history of SSUET Theory marks breakup: Midterm Exam: 15 marks Assignment:

More information

Parallel Programming Concepts. Tom Logan Parallel Software Specialist Arctic Region Supercomputing Center 2/18/04. Parallel Background. Why Bother?

Parallel Programming Concepts. Tom Logan Parallel Software Specialist Arctic Region Supercomputing Center 2/18/04. Parallel Background. Why Bother? Parallel Programming Concepts Tom Logan Parallel Software Specialist Arctic Region Supercomputing Center 2/18/04 Parallel Background Why Bother? 1 What is Parallel Programming? Simultaneous use of multiple

More information

Introduction to Parallel Programming

Introduction to Parallel Programming Introduction to Parallel Programming ATHENS Course on Parallel Numerical Simulation Munich, March 19 23, 2007 Dr. Ralf-Peter Mundani Scientific Computing in Computer Science Technische Universität München

More information

Intro to Multiprocessors

Intro to Multiprocessors The Big Picture: Where are We Now? Intro to Multiprocessors Output Output Datapath Input Input Datapath [dapted from Computer Organization and Design, Patterson & Hennessy, 2005] Multiprocessor multiple

More information

Practical Scientific Computing

Practical Scientific Computing Practical Scientific Computing Performance-optimised Programming Preliminary discussion, 17.7.2007 Dr. Ralf-Peter Mundani, mundani@tum.de Dipl.-Ing. Ioan Lucian Muntean, muntean@in.tum.de Dipl.-Geophys.

More information

Comp. Org II, Spring

Comp. Org II, Spring Lecture 11 Parallel Processor Architectures Flynn s taxonomy from 1972 Parallel Processing & computers 8th edition: Ch 17 & 18 Earlier editions contain only Parallel Processing (Sta09 Fig 17.1) 2 Parallel

More information

Parallel Processors. The dream of computer architects since 1950s: replicate processors to add performance vs. design a faster processor

Parallel Processors. The dream of computer architects since 1950s: replicate processors to add performance vs. design a faster processor Multiprocessing Parallel Computers Definition: A parallel computer is a collection of processing elements that cooperate and communicate to solve large problems fast. Almasi and Gottlieb, Highly Parallel

More information

Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems.

Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems. Cluster Networks Introduction Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems. As usual, the driver is performance

More information

Customer Success Story Los Alamos National Laboratory

Customer Success Story Los Alamos National Laboratory Customer Success Story Los Alamos National Laboratory Panasas High Performance Storage Powers the First Petaflop Supercomputer at Los Alamos National Laboratory Case Study June 2010 Highlights First Petaflop

More information

Outline. Lecture 11: EIT090 Computer Architecture. Small-scale MIMD designs. Taxonomy. Anders Ardö. November 25, 2009

Outline. Lecture 11: EIT090 Computer Architecture. Small-scale MIMD designs. Taxonomy. Anders Ardö. November 25, 2009 Outline Anders Ardö EIT Electrical and Information Technology, Lund University 1 / 49 2 / 49 Taxonomy SISD (Single Instruction stream, Single Data stream) traditional uniprocessor SIMD (Single Instruction

More information

Multiprocessors. Loosely coupled [Multi-computer] each CPU has its own memory, I/O facilities and OS. CPUs DO NOT share physical memory

Multiprocessors. Loosely coupled [Multi-computer] each CPU has its own memory, I/O facilities and OS. CPUs DO NOT share physical memory Loosely coupled [Multi-computer] each CPU has its own memory, I/O facilities and OS CPUs DO NOT share physical memory IITAC Cluster [in Lloyd building] 346 x IBM e326 compute node each with 2 x 2.4GHz

More information

30 Nov Dec Advanced School in High Performance and GRID Computing Concepts and Applications, ICTP, Trieste, Italy

30 Nov Dec Advanced School in High Performance and GRID Computing Concepts and Applications, ICTP, Trieste, Italy Advanced School in High Performance and GRID Computing Concepts and Applications, ICTP, Trieste, Italy Why serial is not enough Computing architectures Parallel paradigms Message Passing Interface How

More information

Parallel Numerics, WT 2013/ Introduction

Parallel Numerics, WT 2013/ Introduction Parallel Numerics, WT 2013/2014 1 Introduction page 1 of 122 Scope Revise standard numerical methods considering parallel computations! Required knowledge Numerics Parallel Programming Graphs Literature

More information

18-447: Computer Architecture Lecture 30B: Multiprocessors. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/22/2013

18-447: Computer Architecture Lecture 30B: Multiprocessors. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/22/2013 18-447: Computer Architecture Lecture 30B: Multiprocessors Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/22/2013 Readings: Multiprocessing Required Amdahl, Validity of the single processor

More information

Comp. Org II, Spring

Comp. Org II, Spring Lecture 11 Parallel Processing & computers 8th edition: Ch 17 & 18 Earlier editions contain only Parallel Processing Parallel Processor Architectures Flynn s taxonomy from 1972 (Sta09 Fig 17.1) Computer

More information

Advanced Scientific Computing

Advanced Scientific Computing Advanced Scientific Computing Fall 2010 H. Zhang CS595 Overview, Page 1 Topics Fundamentals: Parallel computers (application oriented view) Parallel and distributed numerical computation MPI: message-passing

More information

Distributed and Cloud Computing

Distributed and Cloud Computing Distributed and Cloud Computing K. Hwang, G. Fox and J. Dongarra Chapter 2: Computer Clusters for Scalable parallel Computing Adapted from Kai Hwang University of Southern California March 30, 2012 Copyright

More information

Practical Scientific Computing

Practical Scientific Computing Practical Scientific Computing Performance-optimized Programming Preliminary discussion: July 11, 2008 Dr. Ralf-Peter Mundani, mundani@tum.de Dipl.-Ing. Ioan Lucian Muntean, muntean@in.tum.de MSc. Csaba

More information

Unit 9 : Fundamentals of Parallel Processing

Unit 9 : Fundamentals of Parallel Processing Unit 9 : Fundamentals of Parallel Processing Lesson 1 : Types of Parallel Processing 1.1. Learning Objectives On completion of this lesson you will be able to : classify different types of parallel processing

More information

An Introduction to Parallel Programming

An Introduction to Parallel Programming Dipartimento di Informatica e Sistemistica University of Pavia Processor Architectures, Fall 2011 Denition Motivation Taxonomy What is parallel programming? Parallel computing is the simultaneous use of

More information

High Performance Computing in C and C++

High Performance Computing in C and C++ High Performance Computing in C and C++ Rita Borgo Computer Science Department, Swansea University WELCOME BACK Course Administration Contact Details Dr. Rita Borgo Home page: http://cs.swan.ac.uk/~csrb/

More information

Convergence of Parallel Architecture

Convergence of Parallel Architecture Parallel Computing Convergence of Parallel Architecture Hwansoo Han History Parallel architectures tied closely to programming models Divergent architectures, with no predictable pattern of growth Uncertainty

More information

High Performance Computing. Leopold Grinberg T. J. Watson IBM Research Center, USA

High Performance Computing. Leopold Grinberg T. J. Watson IBM Research Center, USA High Performance Computing Leopold Grinberg T. J. Watson IBM Research Center, USA High Performance Computing Why do we need HPC? High Performance Computing Amazon can ship products within hours would it

More information

Chap. 2 part 1. CIS*3090 Fall Fall 2016 CIS*3090 Parallel Programming 1

Chap. 2 part 1. CIS*3090 Fall Fall 2016 CIS*3090 Parallel Programming 1 Chap. 2 part 1 CIS*3090 Fall 2016 Fall 2016 CIS*3090 Parallel Programming 1 Provocative question (p30) How much do we need to know about the HW to write good par. prog.? Chap. gives HW background knowledge

More information

The Optimal CPU and Interconnect for an HPC Cluster

The Optimal CPU and Interconnect for an HPC Cluster 5. LS-DYNA Anwenderforum, Ulm 2006 Cluster / High Performance Computing I The Optimal CPU and Interconnect for an HPC Cluster Andreas Koch Transtec AG, Tübingen, Deutschland F - I - 15 Cluster / High Performance

More information

Parallel & Cluster Computing. cs 6260 professor: elise de doncker by: lina hussein

Parallel & Cluster Computing. cs 6260 professor: elise de doncker by: lina hussein Parallel & Cluster Computing cs 6260 professor: elise de doncker by: lina hussein 1 Topics Covered : Introduction What is cluster computing? Classification of Cluster Computing Technologies: Beowulf cluster

More information

Computer parallelism Flynn s categories

Computer parallelism Flynn s categories 04 Multi-processors 04.01-04.02 Taxonomy and communication Parallelism Taxonomy Communication alessandro bogliolo isti information science and technology institute 1/9 Computer parallelism Flynn s categories

More information

It s a Multicore World. John Urbanic Pittsburgh Supercomputing Center

It s a Multicore World. John Urbanic Pittsburgh Supercomputing Center It s a Multicore World John Urbanic Pittsburgh Supercomputing Center Waiting for Moore s Law to save your serial code start getting bleak in 2004 Source: published SPECInt data Moore s Law is not at all

More information

Multiprocessors and Thread Level Parallelism Chapter 4, Appendix H CS448. The Greed for Speed

Multiprocessors and Thread Level Parallelism Chapter 4, Appendix H CS448. The Greed for Speed Multiprocessors and Thread Level Parallelism Chapter 4, Appendix H CS448 1 The Greed for Speed Two general approaches to making computers faster Faster uniprocessor All the techniques we ve been looking

More information

The Red Storm System: Architecture, System Update and Performance Analysis

The Red Storm System: Architecture, System Update and Performance Analysis The Red Storm System: Architecture, System Update and Performance Analysis Douglas Doerfler, Jim Tomkins Sandia National Laboratories Center for Computation, Computers, Information and Mathematics LACSI

More information

Parallel Computer Architectures. Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam

Parallel Computer Architectures. Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam Parallel Computer Architectures Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam Outline Flynn s Taxonomy Classification of Parallel Computers Based on Architectures Flynn s Taxonomy Based on notions of

More information

COSC 6385 Computer Architecture - Thread Level Parallelism (I)

COSC 6385 Computer Architecture - Thread Level Parallelism (I) COSC 6385 Computer Architecture - Thread Level Parallelism (I) Edgar Gabriel Spring 2014 Long-term trend on the number of transistor per integrated circuit Number of transistors double every ~18 month

More information

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI. CSCI 402: Computer Architectures Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI 6.6 - End Today s Contents GPU Cluster and its network topology The Roofline performance

More information