Overview of High Performance Computing
1 Overview of High Performance Computing Timothy H. Kaiser, PH.D. Show_me_some_local_HPC_tutorials/ 1
2 Introduction What is High Performance Computing? Why go parallel? When do you go parallel? What are some limits of parallel computing? Types of parallel computers Some terminology What is available How this all works 2
3 What the Exa? Exa = 1,152,921,504,606,846,976 = 2**60=1024**6 = 10**18.06 Peta = 1,125,899,906,842,624 = 2**50=1024**5 = 10**15.05 Tera = 1,099,511,627,776 = 2**40=1024**4 = 10**12.04 Giga = 1,073,741,824 = 2**30=1024**3 = 10**9.03 Mega = 1,048,576 = 2**20=1024**2 = 10**6.02 Kilo = 1,024 = 2**10=1024**1 = 10**3.01 3
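The prefix table above is easy to verify in a few lines of Python, since each step is just another factor of 2**10 = 1024:

```python
# Each binary prefix is a power of 1024 = 2**10.
prefixes = {"Kilo": 1, "Mega": 2, "Giga": 3, "Tera": 4, "Peta": 5, "Exa": 6}
for name, power in prefixes.items():
    value = 1024 ** power
    print("%-4s = %d = 2**%d" % (name, value, 10 * power))
```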
4 Top 500 4
5 What is Parallelism? Consider your favorite computational application One processor can give me results in N hours Why not use N processors -- and get the results in just one hour? The concept is simple: Parallelism = applying multiple processors to a single problem 5
6 Parallel computing is computing by committee Parallel computing: the use of multiple computers or processors working together on a common task. Each processor works on its section of the problem Grid of a Problem to be Solved Process 0 does work for this region Process 1 does work for this region Processors are allowed to exchange information with other processors Process 2 does work for this region Process 3 does work for this region 6
7 Why do parallel computing? Limits of single CPU computing Available memory Performance Parallel computing allows: Solve problems that don't fit on a single CPU Solve problems that can't be solved in a reasonable time 7
8 Why do parallel computing? We can run Larger problems Faster More cases Run simulations at finer resolutions Model physical phenomena more realistically 8
9 Weather Forecasting Atmosphere is modeled by dividing it into three-dimensional regions or cells 1 mile x 1 mile x 1 mile (10 cells high) about 500 x 10^6 cells. The calculations of each cell are repeated many times to model the passage of time. About 200 floating point operations per cell per time step, or about 10^11 floating point operations necessary per time step 10 day forecast with 10 minute resolution => 1.5 x 10^14 flop 100 Mflops would take about 17 days 1.7 Tflops would take 2 minutes 17 Tflops would take 8 seconds 105 Tflops would take 1.3 seconds What might you want to do if running for 1.3 seconds? 9
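The arithmetic above can be checked in a few lines (all figures from the slide; a 10-day forecast at 10-minute steps is 1440 steps):

```python
# Back-of-the-envelope check of the forecast numbers.
cells = 500e6              # about 500 x 10^6 cells
flop_per_cell = 200        # floating point operations per cell per step
steps = 10 * 24 * 6        # 10-day forecast at 10-minute time steps
total_flop = cells * flop_per_cell * steps   # about 1.5 x 10^14

for rate, label in ((100e6, "100 Mflops"), (1.7e12, "1.7 Tflops"),
                    (17e12, "17 Tflops"), (105e12, "105 Tflops")):
    print("%-10s -> %12.1f seconds" % (label, total_flop / rate))
```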
10 Modeling Motion of Astronomical bodies (brute force) Each body is attracted to each other body by gravitational forces. Movement of each body can be predicted by calculating the total force experienced by the body. For N bodies, N - 1 forces / body yields N^2 calculations each time step A galaxy has on the order of 10^11 stars => 10^9 years for one iteration Using an N log N efficient approximate algorithm => about a year NOTE: This is closely related to another hot topic: Protein Folding 10
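The gain from the approximate algorithm can be sanity-checked without knowing the machine speed: the ratio N**2 / (N log N) is about 10^9 for a galaxy-sized N, which is the same factor as the drop from ~10^9 years to ~1 year:

```python
import math

n = 1e11                    # stars in a galaxy, order of magnitude
brute = n ** 2              # pairwise force calculations per time step
tree = n * math.log2(n)     # N log N approximate algorithm (e.g. Barnes-Hut)
print("N**2 / (N log N) = %.1e" % (brute / tree))
```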
11 Types of parallelism two extremes Data parallel Each processor performs the same task on different data Example - grid problems Bag of Tasks or Embarrassingly Parallel is a special case Task parallel Each processor performs a different task Example - signal processing such as encoding multitrack data Pipeline is a special case 11
12 Simple data parallel program Example: integrate 2-D propagation problem Starting partial differential equation: Finite Difference Approximation: PE #0 PE #1 PE #2 PE #3 y PE #4 PE #5 PE #6 PE #7 x 12
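The data-parallel decomposition sketched above (PE #0 through PE #7 each owning a horizontal strip of the grid) can be illustrated in a few lines. Here the "PEs" are just loop ranges in a serial program, and a simple 5-point averaging stencil stands in for whatever finite-difference update the PDE actually requires; in a real code each strip would be updated by a different MPI rank, which exchanges its boundary rows with neighboring ranks:

```python
def jacobi_step(grid):
    # One update of a 5-point averaging stencil on a square grid;
    # boundary values are held fixed.
    n = len(grid)
    new = [row[:] for row in grid]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            new[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                                grid[i][j-1] + grid[i][j+1])
    return new

def strip(rank, npes, n):
    # Interior rows owned by processor `rank` out of `npes`.
    rows = list(range(1, n - 1))
    chunk = len(rows) // npes
    return rows[rank * chunk : (rank + 1) * chunk]

n = 10
print([strip(r, 8, n) for r in range(8)])  # one strip of rows per PE
```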
13 Typical Task Parallel Application DATA Normalize Task FFT Task Multiply Task Inverse FFT Task Signal processing Use one processor for each task Can use more processors if one is overloaded This is a pipeline 13
14 Parallel Program Structure Communicate & Repeat work 1a work 2a work (N)a Begin start parallel work 1b work 1c work 2b work 2c work (N)b work (N)c End Parallel End work 1d work 2d work (N)d 14
15 Parallel Problems Communicate & Repeat work 1a work 2a work (N)a Begin start parallel work 1b work 1c work 2b work 2c work (N)b work (N)c End Parallel Start Serial Section work 1d work 2d Subtasks don't finish together work (N)d Serial Section (No Parallel Work) work 1x work 2x work (N)x End Serial Section start parallel work 1y work 2y work (N)y work 1z work 2z work (N)z Not using all processors End Parallel End 15
16 A Real example
#!/usr/bin/env python
# Two copies of this script, run with arguments 0 and 1, communicate
# through a file: copy 1 writes cos(10) to the file, copy 0 waits for
# the file, reads it, and combines it with sin(10).
from sys import argv
from os.path import isfile
from time import sleep
from math import sin, cos
#
fname = "message"
my_id = int(argv[1])
print("\n%d starting program\n" % (my_id))
#
if (my_id == 1):
    sleep(2)
    myval = cos(10.0)
    mf = open(fname, "w")
    mf.write(str(myval))
    mf.close()
if (my_id == 0):
    myval = sin(10.0)
    notready = True
    while notready:
        if isfile(fname):
            notready = False
        sleep(3)
    mf = open(fname, "r")
    message = float(mf.readline())
    mf.close()
    total = myval**2 + message**2
    print("sin(10)**2+cos(10)**2= %15.12f" % (total))
else:
    sleep(5)
print("%d done with program\n" % (my_id))
16
17 Theoretical upper limits All parallel programs contain: Parallel sections Serial sections Serial sections are when work is being duplicated or no useful work is being done (waiting for others) Serial sections limit the parallel effectiveness If you have a lot of serial computation then you will not get good speedup No serial work allows perfect speedup Amdahl's Law states this formally 17
18 Amdahl's Law Amdahl's Law places a strict limit on the speedup that can be realized by using multiple processors. Effect of multiple processors on run time: t_p = (f_p/N + f_s) t_s Effect of multiple processors on speedup: S = t_s/t_p = 1/(f_s + f_p/N) Where f_s = serial fraction of code, f_p = parallel fraction of code (f_s + f_p = 1), N = number of processors Perfect speedup: t = t_1/N, or S(N) = N 18
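The speedup formula is easy to explore numerically; a minimal sketch:

```python
# Amdahl's law: S = 1 / (f_s + f_p / N), with f_s + f_p = 1.
def amdahl_speedup(fp, n):
    fs = 1.0 - fp
    return 1.0 / (fs + fp / n)

print(amdahl_speedup(1.0, 16))     # no serial fraction: perfect speedup of 16
print(amdahl_speedup(0.9, 16))     # 10% serial code: only 6.4 on 16 processors
print(amdahl_speedup(0.9, 10**6))  # ...and never more than 1/f_s = 10
```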
19 Illustration of Amdahl's Law It takes only a small fraction of serial content in a code to degrade the parallel performance. 19
20 Amdahl's Law Vs. Reality Amdahl's Law provides a theoretical upper limit on parallel speedup assuming that there are no costs for communications. In reality, communications will result in a further degradation of performance 20
21 Sometimes you don t get what you expect! 21
22 Some other considerations Writing an effective parallel application is difficult Communication can limit parallel efficiency Serial time can dominate Load balance is important Is it worth your time to rewrite your application? Do the CPU requirements justify parallelization? Will the code be used just once? 22
23 Parallelism Carries a Price Tag Parallel programming Involves a steep learning curve Is effort-intensive Parallel computing environments are unstable and unpredictable Don't respond to many serial debugging and tuning techniques May not yield the results you want, even if you invest a lot of time Will the investment of your time be worth it? 23
24 Terms related to algorithms Amdahl's Law (talked about this already) Superlinear Speedup Efficiency Cost Scalability Problem Size Gustafson's Law 24
25 Superlinear Speedup S(n) > n may be seen on occasion, but usually this is due to using a suboptimal sequential algorithm or some unique feature of the architecture that favors the parallel formation. One common reason for superlinear speedup is the extra cache in the multiprocessor system, which can hold more of the problem data at any instant; this leads to less traffic to the relatively slow main memory. 25
26 Efficiency Efficiency = Execution time using one processor divided by the execution time using a number of processors It's just the speedup divided by the number of processors: E = S(n)/n 26
27 Cost The processor-time product or cost (or work) of a computation is defined as Cost = (execution time) x (total number of processors used) The cost of a sequential computation is simply its execution time, t_s. The cost of a parallel computation is t_p x n. The parallel execution time, t_p, is given by t_s/S(n). Hence, the cost of a parallel computation is given by Cost = (t_s x n)/S(n) Cost-Optimal Parallel Algorithm One in which the cost to solve a problem on a multiprocessor is proportional to the cost on a single processor 27
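Speedup, efficiency, and cost tie together neatly: E = S(n)/n and Cost = t_p x n = t_s x n / S(n) = t_s / E, so a cost-optimal algorithm is one that keeps efficiency bounded away from zero. A small sketch:

```python
# Speedup S = ts/tp, efficiency E = S/n, cost = tp * n = ts / E.
def metrics(ts, tp, n):
    speedup = ts / tp
    efficiency = speedup / n
    cost = tp * n
    return speedup, efficiency, cost

s, e, c = metrics(ts=100.0, tp=10.0, n=16)
print("speedup %.1f  efficiency %.3f  cost %.0f" % (s, e, c))
```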
28 Scalability Used to indicate a hardware design that allows the system to be increased in size and in doing so to obtain increased performance - could be described as architecture or hardware scalability. Scalability is also used to indicate that a parallel algorithm can accommodate increased data items with a low and bounded increase in computational steps - could be described as algorithmic scalability. 28
29 Problem size Problem size: the number of basic steps in the best sequential algorithm for a given problem and data set size Intuitively, we would think of the number of data elements being processed in the algorithm as a measure of size. However, doubling the data set size would not necessarily double the number of computational steps. It will depend upon the problem. For example, adding two matrices has this effect, but multiplying matrices quadruples operations. Note: Bad sequential algorithms tend to scale well 29
30 Other names for Scaling Strong Scaling (Engineering) For a fixed problem size how does the time to solution vary with the number of processors Weak Scaling How the time to solution varies with processor count with a fixed problem size per processor 30
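Gustafson's Law, listed earlier, is the weak-scaling counterpart to Amdahl: it assumes the problem grows with the processor count so that each processor keeps a fixed amount of parallel work. A minimal sketch of the scaled speedup:

```python
# Gustafson's law: scaled speedup S(N) = N - fs * (N - 1),
# where fs is the serial fraction of the (scaled) workload.
def gustafson_speedup(fs, n):
    return n - fs * (n - 1)

print(gustafson_speedup(0.1, 16))   # 10% serial on 16 processors
```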
31 Some Classes of machines Network Processor Processor Processor Processor Memory Memory Memory Memory Distributed Memory Processors only have access to their local memory and talk to other processors over a network 31
32 Some Classes of machines Uniform Shared Memory (UMA) Processor Processor All processors have equal access to Memory Processor Processor Memory Processor Processor Can talk via memory Processor Processor 32
33 Some Classes of machines Hybrid Shared memory nodes connected by a network... 33
34 Some Classes of machines More common today Each node has a collection of multicore chips... Ra has 268 nodes 256 quad core dual socket 12 dual core quad socket 34
35 Some Classes of machines Hybrid Machines Add special purpose processors to normal processors Not a new concept, but regaining traction Example: our Power8/K80 nodes Issue: transfer speed between units "Normal" CPU Special Purpose Processor FPGA, GPU, Vector, Cell... 35
36 Network Topology For ultimate performance you may be concerned with how your nodes are connected. Avoid communications between distant nodes For some machines it might be difficult to control or know the placement of applications 36
37 Network Terminology Latency How long it takes to get between nodes in the network. Bandwidth How much data can be moved per unit time. Bandwidth is limited by the number of wires, the rate at which each wire can accept data, and choke points 37
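A first-order model of point-to-point message time combines the two: time = latency + bytes / bandwidth. Small messages are latency-dominated, large ones bandwidth-dominated. The defaults below use the DDR InfiniBand figures quoted later in the deck (1.26 microsecond software latency, 16 Gbit/sec = 2e9 bytes/sec):

```python
# time = latency (seconds) + bytes / bandwidth (bytes/second)
def transfer_time(nbytes, latency=1.26e-6, bandwidth=2e9):
    return latency + nbytes / bandwidth

print("8 bytes: %.2e s (latency dominated)" % transfer_time(8))
print("8 MiB:   %.2e s (bandwidth dominated)" % transfer_time(8 * 2**20))
```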
38 Ring 38
39 Grid Wrapping produces torus 39
40 Tree Fat tree the lines get wider as you go up 40
41 Hypercube 41
42 4D Hypercube Some communications algorithms are hypercube based How big would a 9d hypercube be? 42
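The slide's question has a quick answer: a d-dimensional hypercube has 2**d nodes, each with d neighbors, and any two nodes are at most d hops apart.

```python
# Size and degree of d-dimensional hypercubes.
for d in (4, 9):
    print("%dd hypercube: %4d nodes, %d links per node" % (d, 2**d, d))
```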
43 5d Torus (diagram: a 3d grid with wraparound links forming a 3d torus, extended to 5d) 43
44 5d - Blue Gene Q MidPlane: 512 nodes, 4x4x4x4x2 44
45 5D Torus Network BGQ Layout The network topology of BlueGene/Q is a five-dimensional (5D) torus, with direct links between the nearest neighbors in the ±A, ±B, ±C, ±D, and ±E directions. As such there are only a few optimum block sizes that will use the network efficiently.
Node Boards - Compute Nodes - Cores - Torus Dimensions:
1 - 32 nodes - 512 cores - 2x2x2x2x2
2 (adjacent pairs) - 64 nodes - 1024 cores - 2x2x4x2x2
4 (quadrants) - 128 nodes - 2048 cores - 2x2x4x4x2
8 (halves) - 256 nodes - 4096 cores - 4x2x4x4x2
16 (midplane) - 512 nodes - 8192 cores - 4x4x4x4x2
32 (1 rack) - 1024 nodes - 16384 cores - 4x4x4x8x2
64 (2 racks) - 2048 nodes - 32768 cores - 4x4x8x8x2
45
46 Star? Quality depends on what is in the center 46
47 Example: An Infiniband Switch Infiniband, DDR, Cisco 7024 IB Server Switch - 48 Ports Adaptors: Each compute node has one 1-Port DDR HCA 4X DDR => 16 Gbit/sec 140 nanosecond hardware latency 1.26 microsecond latency at software level 47
48 Measured Bandwidth 48
49 Infiniband Rates 49
50 New Kid on the Block - Intel Omnipath Designed with the technical and cost requirements of future exascale supercomputers in mind Packet Integrity Protection: a link-level error checking capability that is applied to all data traversing the wire. It allows for transparent detection and recovery of transmission errors as they occur. Dynamic Lane Scaling: maintains link continuity in the event of a lane failure. With the help of PIP, Omni-Path uses the remaining lanes in the link to continue operation. Traffic Flow Optimization: improves quality of service by allowing higher priority data packets to preempt lower priority packets, regardless of packet ordering. 50
51 More Omnipath Info 100 gigabits/sec of bandwidth per port, with port-to-port latencies on par with that of EDR InfiniBand. Intel has stated that their host architecture supports message rates of up to 160 million messages per second Higher density switches Host Integration Roadmap: Intel is planning to offer an in-package host adapter configuration, where the fabric ASIC is integrated into the processor socket. Further down the road, the Omni-Path host interface will be integrated directly into the processor. 51
52 Back to coprocessors In the simple case all nodes contain just a collection of normal CPUs and memory Similar to desktop or laptop machines Connected together via some network There are nonstandard nodes CPU with GPU FPGA High core count (Knights XXX or Phi) 52
53 Graphic Processing Unit - GPU Graphics cards are available in many systems Some years ago people realized graphics cards are good at some operations - vector and matrix Why not use them for general computing? Difficulties: Initially not designed for it Difficult to program Bandwidth to/from CPU memory 53
54 GPU - Now NVIDIA is the biggest supplier of GPU cards for HPC Cards developed specifically for processing Programming has become easier Bandwidth is much improved Special instructions for AI Many libraries available Lots of applications 54
55 Vintage Nvidia GPU Systems 55
57 Nvidia - IBM Two computers Summit (ORNL) Sierra (LLNL) Pflops IBM Power 9 - Nvidia Volta GPU NVLink High Speed Interconnect EDR Infiniband 57
58 DoE IBM/Nvidia Machines Combines IBM Power 9 CPU Nvidia Volta GPU NVLink interconnect 58
59 Key features Volta will peak out at over 7 Tflops Stacked Memory (very dense and lots of it) NVLink is a key technology in Summit's and Sierra's server node architecture, enabling IBM POWER CPUs and NVIDIA GPUs to access each other's memory (Unified memory >512GB HBM+DDR4) NVLink will be up to 5 to 12 times faster than PCIe Gen3 Less than half the watts per flop of current generation chips > 40 Tflops/node * 3,400 nodes: about 150 PFlops Back of the envelope calculation - Power 9 = 14 Tflops (Don't quote me on this.) 59
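The back-of-the-envelope node arithmetic above checks out as an order-of-magnitude estimate:

```python
# Lower bound from the slide's figures: > 40 Tflops/node on about
# 3,400 nodes gives > 136 Pflops, i.e. roughly the 150 Pflops quoted.
nodes = 3400
tflops_per_node = 40.0
print("> %.0f Pflops" % (nodes * tflops_per_node / 1000))
```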
60 What is Intel Knights-xxx or Xeon Phi? Xeon Phi (MIC) A processing chip that contains a large number of cores: >60, with >240 threads Cores are lower performance than a normal Xeon Slower clock speed 1st gen: in-order only, and missing some Xeon instructions Runs as a coprocessor, can't boot an OS 60
61 Current and Coming Knights Landing 3 versions: 1 Card and 2 bootable Support for full instruction set On package memory On board external memory with bootable versions Much better memory bandwidth 3 Tflops each, implies 576 Tflops/rack (48*4*3) 61
62 Shipping Knights Landing 62
63 What is Intel Knights-xxx or Xeon - Phi Many programming paradigms Runs regular C and Fortran Intel compilers Supports OpenMP Can run MPI Supports offloaded calculations Knights Mill (announced) will have special support for AI 63
64 Our Resources Documentation Hardware Mio BlueM Golden (AuN) Energy (Mc2) Next Machine? 64
65 Getting Help About Resources Getting accounts Mio node information 65
66 Platforms - Overview Mines has three high performance computing platforms available for campus use: AuN, Mc2, and Mio. AuN and Mc2 share a 480 Tbyte file system and are collectively known as BlueM. AuN is a 144 node, 50 Tflop, X86 system. Mc2 is a 512 node Blue Gene Q rated at 104 Tflop. Mio is a shared resource built up using what is commonly known as the condo model. Individual research groups own nodes and they have priority access. There are also nodes owned by students. Mio currently has ~200 x86 nodes, three x86/GPU nodes, two 4-way Phi nodes, and two Power8/K80 GPU nodes, and is serviced by a 240 Tbyte file system. 66
67 2010-current Mio Nodes: ~200 x86 11 GPUs 8 Phi 2 Power8 ~104 Tflops It's All Mine 240 TByte file system 67
68 Mio Concept CCIT Funds infrastructure Groups purchase nodes Groups can use their nodes when they desire Research Groups have priority access to their nodes Students have priority access to TechFee nodes When nodes are not being used by owners they are available for others Owner starting a job will kick others off 68
69 Mio specialty nodes (diagram):
gpu001, gpu002 (TechFee GPU nodes): 2x Intel Xeon 5770 CPU, 24 GB total, 8 cores total, 2.93 GHz, each with 2 Nvidia T10 processors (4 GB, 240 cores each)
gpu003: 2x Intel Xeon 5650 CPU, 48 GB total, 12 cores total, 2.66 GHz, with 3 Nvidia M2070 processors (5.6 GB, 448 cores each)
ppc001, ppc002: IBM Power 8 processor, 256 GB total, 20 cores total, 3.49 GHz, each with 2 Nvidia K80s (24 GB, 4992 cores each)
phi001, phi002: 2x Intel Xeon E5 CPU, 32 GB total, 12 cores total, 2.3 GHz, each with 4 Intel Xeon Phi coprocessors mic0-mic3 (8 GB, 60 cores each)
Plus the Mio head node, management node, and network switch; Intel Xeon based compute nodes (8-28 cores each); and a 240 Tbyte parallel (GPFS) file system 69
70 BlueM Mines' Supercomputer 154 Tflops 17.4 Tbytes 10,496 Cores 85 KW Dual architecture Best of both worlds Two Distinct Compute Units idataplex Blue Gene Q Shared 480 Tbyte File System Compact Low Power Consumption 70
71 BlueM's Compute Units - AuN AuN (Golden) idataplex Intel 8x2 core SandyBridge 144 Nodes 2,304 Cores 9,216 Gbytes Feature Latest Generation Intel Processors Large Memory / Node Common architecture Similar user environment to RA and Mio Quickly get researchers up and running 50 Tflops
72 BlueM's Compute Units - MC2 MC2 (Energy) Blue Gene Q PowerPC A2 17 Core 512 Nodes 8,192 Cores 8,192 Gbytes 104 Tflops Feature New Architecture Designed for large core count jobs Highly scalable Multilevel parallelism - Direction of HPC Room to Grow Future looking machine
73 Mc2 - AuN Comparison
Feature: Mc2 / AuN
Gflop/Node: ~203 / ~347
Memory/Node: 16 Gbytes / 64 Gbytes
Gflop/Gbyte: ~13 / ~5.4
Recommended Loading: 16*4=64 / 16
Bandwidth: Faster / Fast
73
74 Advertised Layout Cooling Distribution Unit Hose for water cooling
76 Allocations for BlueM By proposal Must be faculty to propose Students can work on faculty's grant 76
77 Other Resources Xsede RMACC NCAR/UoWy computational-systems/cheyenne 77
78 Extreme Science and Engineering Discovery Environment The Extreme Science and Engineering Discovery Environment (XSEDE) is the most advanced, powerful, and robust collection of integrated advanced digital resources and services in the world. It is a single virtual system that scientists can use to interactively share computing resources, data, and expertise. Scientists and engineers around the world use these resources and services (supercomputers, collections of data, and new tools) to make our lives healthier, safer, and better. XSEDE, and the experts who lead the program, make these resources easier to use and help more people use them. The five-year, $110-million project is supported by the National Science Foundation. In the summer of 2016, the NSF announced that XSEDE was awarded an additional 5 years of funding after the first 5-year award completed. Originally, XSEDE replaced and expanded on the NSF TeraGrid project. More than 10,000 scientists used the TeraGrid to complete thousands of research projects, at no cost to the scientists. 78
80 Rocky Mountain Advanced Computing Consortium The Rocky Mountain Advanced Computing Consortium is a collaboration among academic and research institutions located throughout the intermountain states. Our mission is to facilitate widespread effective use of high performance computing throughout the Rocky Mountain region by: Educating graduate and undergraduate students, faculty, researchers, and industry partners on the use of computational science and high performance computing. Coordinating multi-institutional efforts to advance research, practice, and education in computational science in order to address important regional problems. Bringing together a broad range of researchers, faculty, and industry partners with a depth of experience and expertise not available at any single institution, and facilitating their collaboration in multi-disciplinary and multi-institutional teams. Mines is a founding member. RMACC High Performance Computing Symposium 80
81 RMACC available resources Accessing Summit Summit is a new HPC resource for researchers at CU, CSU, and RMACC partners Key features include 400 TFlops peak performance General compute nodes High-memory nodes GPGPU nodes KNL Xeon Phi nodes Omni-Path interconnect GPFS scratch filesystem 81
82 NCAR/UoWy Cheyenne 82
83 NCAR/UoWy Cheyenne Climate Simulation Laboratory Researchers must have funding from NSF awards to address the climate-related questions for which they are requesting CSL allocations. University Community In general, any U.S.-based researcher with an NSF award in the atmospheric sciences or computational science in support of the atmospheric sciences NCAR Community NCAR investigators have access Wyoming-NCAR Alliance The NWSC represents a collaboration between NCAR and the University of Wyoming. As part of the Wyoming-NCAR Alliance (WNA), a portion of the Cheyenne system about 160 million core-hours per year is reserved for Wyoming-led projects and allocated by a University of Wyoming-managed process. 83
84 How do you run? Getting on Programming Running 84
85 Getting on ssh from your machine to Mio or BlueM 85
86 Hello World in Parallel Compile your program with Parallel compilers 86
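An MPI "hello world" simply prints a greeting from each rank. As a minimal stand-in that uses only the Python standard library, the sketch below runs the same function on several processes at once; on the clusters you would instead compile an MPI program with the parallel compilers and launch it through the batch system:

```python
# "Hello world in parallel" using multiprocessing as a stand-in for MPI.
from multiprocessing import Pool

def hello(rank):
    # Each worker identifies itself, like an MPI rank.
    return "Hello from process %d" % rank

if __name__ == "__main__":
    with Pool(4) as pool:
        for line in pool.map(hello, range(4)):
            print(line)
```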
87 Running We write & run a script that tells the system what we want to do 87
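Such a script might look like the following sketch. The directives shown are Slurm-style and every specific (job size, time limit, executable name) is a placeholder; check the local documentation for the exact scheduler syntax on Mio or BlueM:

```shell
#!/bin/bash
#SBATCH --job-name=hello          # a name for the job
#SBATCH --nodes=2                 # how many nodes we want
#SBATCH --ntasks-per-node=8       # MPI tasks per node
#SBATCH --time=00:10:00           # wall-clock limit

# Run from the directory the job was submitted from.
cd $SLURM_SUBMIT_DIR
srun ./hello_parallel
```

The script is handed to the scheduler (e.g. with sbatch), which queues the job and runs it when nodes are free.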
88 Output After some time our program will run 88
More informationNVIDIA Update and Directions on GPU Acceleration for Earth System Models
NVIDIA Update and Directions on GPU Acceleration for Earth System Models Stan Posey, HPC Program Manager, ESM and CFD, NVIDIA, Santa Clara, CA, USA Carl Ponder, PhD, Applications Software Engineer, NVIDIA,
More informationHigh Performance Computing. Leopold Grinberg T. J. Watson IBM Research Center, USA
High Performance Computing Leopold Grinberg T. J. Watson IBM Research Center, USA High Performance Computing Why do we need HPC? High Performance Computing Amazon can ship products within hours would it
More informationGPU Architecture. Alan Gray EPCC The University of Edinburgh
GPU Architecture Alan Gray EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? Architectural reasons for accelerator performance advantages Latest GPU Products From
More informationIntel Knights Landing Hardware
Intel Knights Landing Hardware TACC KNL Tutorial IXPUG Annual Meeting 2016 PRESENTED BY: John Cazes Lars Koesterke 1 Intel s Xeon Phi Architecture Leverages x86 architecture Simpler x86 cores, higher compute
More informationInfiniBand SDR, DDR, and QDR Technology Guide
White Paper InfiniBand SDR, DDR, and QDR Technology Guide The InfiniBand standard supports single, double, and quadruple data rate that enables an InfiniBand link to transmit more data. This paper discusses
More informationThe Stampede is Coming: A New Petascale Resource for the Open Science Community
The Stampede is Coming: A New Petascale Resource for the Open Science Community Jay Boisseau Texas Advanced Computing Center boisseau@tacc.utexas.edu Stampede: Solicitation US National Science Foundation
More informationIntroduction to parallel Computing
Introduction to parallel Computing VI-SEEM Training Paschalis Paschalis Korosoglou Korosoglou (pkoro@.gr) (pkoro@.gr) Outline Serial vs Parallel programming Hardware trends Why HPC matters HPC Concepts
More informationParallel Programming Concepts. Tom Logan Parallel Software Specialist Arctic Region Supercomputing Center 2/18/04. Parallel Background. Why Bother?
Parallel Programming Concepts Tom Logan Parallel Software Specialist Arctic Region Supercomputing Center 2/18/04 Parallel Background Why Bother? 1 What is Parallel Programming? Simultaneous use of multiple
More informationCray XC Scalability and the Aries Network Tony Ford
Cray XC Scalability and the Aries Network Tony Ford June 29, 2017 Exascale Scalability Which scalability metrics are important for Exascale? Performance (obviously!) What are the contributing factors?
More informationPaving the Road to Exascale
Paving the Road to Exascale Gilad Shainer August 2015, MVAPICH User Group (MUG) Meeting The Ever Growing Demand for Performance Performance Terascale Petascale Exascale 1 st Roadrunner 2000 2005 2010 2015
More informationBlueGene/L. Computer Science, University of Warwick. Source: IBM
BlueGene/L Source: IBM 1 BlueGene/L networking BlueGene system employs various network types. Central is the torus interconnection network: 3D torus with wrap-around. Each node connects to six neighbours
More informationThe Future of High Performance Interconnects
The Future of High Performance Interconnects Ashrut Ambastha HPC Advisory Council Perth, Australia :: August 2017 When Algorithms Go Rogue 2017 Mellanox Technologies 2 When Algorithms Go Rogue 2017 Mellanox
More informationThread and Data parallelism in CPUs - will GPUs become obsolete?
Thread and Data parallelism in CPUs - will GPUs become obsolete? USP, Sao Paulo 25/03/11 Carsten Trinitis Carsten.Trinitis@tum.de Lehrstuhl für Rechnertechnik und Rechnerorganisation (LRR) Institut für
More informationSerial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing
CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.
More informationWhat does Heterogeneity bring?
What does Heterogeneity bring? Ken Koch Scientific Advisor, CCS-DO, LANL LACSI 2006 Conference October 18, 2006 Some Terminology Homogeneous Of the same or similar nature or kind Uniform in structure or
More informationHow To Design a Cluster
How To Design a Cluster PRESENTED BY ROBERT C. JACKSON, MSEE FACULTY AND RESEARCH SUPPORT MANAGER INFORMATION TECHNOLOGY STUDENT ACADEMIC SERVICES GROUP THE UNIVERSITY OF TEXAS-RIO GRANDE VALLEY Abstract
More informationAMath 483/583 Lecture 11. Notes: Notes: Comments on Homework. Notes: AMath 483/583 Lecture 11
AMath 483/583 Lecture 11 Outline: Computer architecture Cache considerations Fortran optimization Reading: S. Goedecker and A. Hoisie, Performance Optimization of Numerically Intensive Codes, SIAM, 2001.
More informationBlueGene/L (No. 4 in the Latest Top500 List)
BlueGene/L (No. 4 in the Latest Top500 List) first supercomputer in the Blue Gene project architecture. Individual PowerPC 440 processors at 700Mhz Two processors reside in a single chip. Two chips reside
More informationPower Systems AC922 Overview. Chris Mann IBM Distinguished Engineer Chief System Architect, Power HPC Systems December 11, 2017
Power Systems AC922 Overview Chris Mann IBM Distinguished Engineer Chief System Architect, Power HPC Systems December 11, 2017 IBM POWER HPC Platform Strategy High-performance computer and high-performance
More informationINTRODUCTION TO THE ARCHER KNIGHTS LANDING CLUSTER. Adrian
INTRODUCTION TO THE ARCHER KNIGHTS LANDING CLUSTER Adrian Jackson a.jackson@epcc.ed.ac.uk @adrianjhpc Processors The power used by a CPU core is proportional to Clock Frequency x Voltage 2 In the past,
More informationHigh Performance Computing Course Notes HPC Fundamentals
High Performance Computing Course Notes 2008-2009 2009 HPC Fundamentals Introduction What is High Performance Computing (HPC)? Difficult to define - it s a moving target. Later 1980s, a supercomputer performs
More informationGPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC
GPGPUs in HPC VILLE TIMONEN Åbo Akademi University 2.11.2010 @ CSC Content Background How do GPUs pull off higher throughput Typical architecture Current situation & the future GPGPU languages A tale of
More informationAMath 483/583 Lecture 11
AMath 483/583 Lecture 11 Outline: Computer architecture Cache considerations Fortran optimization Reading: S. Goedecker and A. Hoisie, Performance Optimization of Numerically Intensive Codes, SIAM, 2001.
More informationCommunication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems.
Cluster Networks Introduction Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems. As usual, the driver is performance
More informationBest Practices for Setting BIOS Parameters for Performance
White Paper Best Practices for Setting BIOS Parameters for Performance Cisco UCS E5-based M3 Servers May 2013 2014 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public. Page
More informationIBM Blue Gene/Q solution
IBM Blue Gene/Q solution Pascal Vezolle vezolle@fr.ibm.com Broad IBM Technical Computing portfolio Hardware Blue Gene/Q Power Systems 86 Systems idataplex and Intelligent Cluster GPGPU / Intel MIC PureFlexSystems
More informationDouble Rewards of Porting Scientific Applications to the Intel MIC Architecture
Double Rewards of Porting Scientific Applications to the Intel MIC Architecture Troy A. Porter Hansen Experimental Physics Laboratory and Kavli Institute for Particle Astrophysics and Cosmology Stanford
More informationCS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it
Lab 1 Starts Today Already posted on Canvas (under Assignment) Let s look at it CS 590: High Performance Computing Parallel Computer Architectures Fengguang Song Department of Computer Science IUPUI 1
More informationHigh Performance Computing
The Need for Parallelism High Performance Computing David McCaughan, HPC Analyst SHARCNET, University of Guelph dbm@sharcnet.ca Scientific investigation traditionally takes two forms theoretical empirical
More informationParallel Algorithm Engineering
Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework and numa control Examples
More informationHPC future trends from a science perspective
HPC future trends from a science perspective Simon McIntosh-Smith University of Bristol HPC Research Group simonm@cs.bris.ac.uk 1 Business as usual? We've all got used to new machines being relatively
More informationPORTING CP2K TO THE INTEL XEON PHI. ARCHER Technical Forum, Wed 30 th July Iain Bethune
PORTING CP2K TO THE INTEL XEON PHI ARCHER Technical Forum, Wed 30 th July Iain Bethune (ibethune@epcc.ed.ac.uk) Outline Xeon Phi Overview Porting CP2K to Xeon Phi Performance Results Lessons Learned Further
More informationWVU RESEARCH COMPUTING INTRODUCTION. Introduction to WVU s Research Computing Services
WVU RESEARCH COMPUTING INTRODUCTION Introduction to WVU s Research Computing Services WHO ARE WE? Division of Information Technology Services Funded through WVU Research Corporation Provide centralized
More informationScalasca support for Intel Xeon Phi. Brian Wylie & Wolfgang Frings Jülich Supercomputing Centre Forschungszentrum Jülich, Germany
Scalasca support for Intel Xeon Phi Brian Wylie & Wolfgang Frings Jülich Supercomputing Centre Forschungszentrum Jülich, Germany Overview Scalasca performance analysis toolset support for MPI & OpenMP
More informationThe Mont-Blanc approach towards Exascale
http://www.montblanc-project.eu The Mont-Blanc approach towards Exascale Alex Ramirez Barcelona Supercomputing Center Disclaimer: Not only I speak for myself... All references to unavailable products are
More informationIt s a Multicore World. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist
It s a Multicore World John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist Waiting for Moore s Law to save your serial code started getting bleak in 2004 Source: published SPECInt
More informationEARLY EVALUATION OF THE CRAY XC40 SYSTEM THETA
EARLY EVALUATION OF THE CRAY XC40 SYSTEM THETA SUDHEER CHUNDURI, SCOTT PARKER, KEVIN HARMS, VITALI MOROZOV, CHRIS KNIGHT, KALYAN KUMARAN Performance Engineering Group Argonne Leadership Computing Facility
More informationTop500 Supercomputer list
Top500 Supercomputer list Tends to represent parallel computers, so distributed systems such as SETI@Home are neglected. Does not consider storage or I/O issues Both custom designed machines and commodity
More informationTECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 11th CALL (T ier-0)
TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 11th CALL (T ier-0) Contributing sites and the corresponding computer systems for this call are: BSC, Spain IBM System X idataplex CINECA, Italy The site selection
More informationAim High. Intel Technical Update Teratec 07 Symposium. June 20, Stephen R. Wheat, Ph.D. Director, HPC Digital Enterprise Group
Aim High Intel Technical Update Teratec 07 Symposium June 20, 2007 Stephen R. Wheat, Ph.D. Director, HPC Digital Enterprise Group Risk Factors Today s s presentations contain forward-looking statements.
More informationIntroduction to tuning on many core platforms. Gilles Gouaillardet RIST
Introduction to tuning on many core platforms Gilles Gouaillardet RIST gilles@rist.or.jp Agenda Why do we need many core platforms? Single-thread optimization Parallelization Conclusions Why do we need
More informationGPGPU, 1st Meeting Mordechai Butrashvily, CEO GASS
GPGPU, 1st Meeting Mordechai Butrashvily, CEO GASS Agenda Forming a GPGPU WG 1 st meeting Future meetings Activities Forming a GPGPU WG To raise needs and enhance information sharing A platform for knowledge
More informationIntroduction to GPU hardware and to CUDA
Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 35 Course outline Introduction to GPU hardware
More informationIntroduction to parallel computers and parallel programming. Introduction to parallel computersand parallel programming p. 1
Introduction to parallel computers and parallel programming Introduction to parallel computersand parallel programming p. 1 Content A quick overview of morden parallel hardware Parallelism within a chip
More informationWhat is Parallel Computing?
What is Parallel Computing? Parallel Computing is several processing elements working simultaneously to solve a problem faster. 1/33 What is Parallel Computing? Parallel Computing is several processing
More informationOverview of Tianhe-2
Overview of Tianhe-2 (MilkyWay-2) Supercomputer Yutong Lu School of Computer Science, National University of Defense Technology; State Key Laboratory of High Performance Computing, China ytlu@nudt.edu.cn
More informationEarly Experiences Writing Performance Portable OpenMP 4 Codes
Early Experiences Writing Performance Portable OpenMP 4 Codes Verónica G. Vergara Larrea Wayne Joubert M. Graham Lopez Oscar Hernandez Oak Ridge National Laboratory Problem statement APU FPGA neuromorphic
More informationThe knight makes his play for the crown Phi & Omni-Path Glenn Rosenberg Computer Insights UK 2016
The knight makes his play for the crown Phi & Omni-Path Glenn Rosenberg Computer Insights UK 2016 2016 Supermicro 15 Minutes Two Swim Lanes Intel Phi Roadmap & SKUs Phi in the TOP500 Use Cases Supermicro
More informationPedraforca: a First ARM + GPU Cluster for HPC
www.bsc.es Pedraforca: a First ARM + GPU Cluster for HPC Nikola Puzovic, Alex Ramirez We ve hit the power wall ALL computers are limited by power consumption Energy-efficient approaches Multi-core Fujitsu
More informationEfficient Parallel Programming on Xeon Phi for Exascale
Efficient Parallel Programming on Xeon Phi for Exascale Eric Petit, Intel IPAG, Seminar at MDLS, Saclay, 29th November 2016 Legal Disclaimers Intel technologies features and benefits depend on system configuration
More informationOnline Course Evaluation. What we will do in the last week?
Online Course Evaluation Please fill in the online form The link will expire on April 30 (next Monday) So far 10 students have filled in the online form Thank you if you completed it. 1 What we will do
More informationTOOLS FOR IMPROVING CROSS-PLATFORM SOFTWARE DEVELOPMENT
TOOLS FOR IMPROVING CROSS-PLATFORM SOFTWARE DEVELOPMENT Eric Kelmelis 28 March 2018 OVERVIEW BACKGROUND Evolution of processing hardware CROSS-PLATFORM KERNEL DEVELOPMENT Write once, target multiple hardware
More informationLecture 1: Gentle Introduction to GPUs
CSCI-GA.3033-004 Graphics Processing Units (GPUs): Architecture and Programming Lecture 1: Gentle Introduction to GPUs Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Who Am I? Mohamed
More informationMotivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism
Motivation for Parallelism Motivation for Parallelism The speed of an application is determined by more than just processor speed. speed Disk speed Network speed... Multiprocessors typically improve the
More informationCSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.
CSCI 402: Computer Architectures Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI 6.6 - End Today s Contents GPU Cluster and its network topology The Roofline performance
More informationMathematical computations with GPUs
Master Educational Program Information technology in applications Mathematical computations with GPUs Introduction Alexey A. Romanenko arom@ccfit.nsu.ru Novosibirsk State University How to.. Process terabytes
More informationIt s a Multicore World. John Urbanic Pittsburgh Supercomputing Center
It s a Multicore World John Urbanic Pittsburgh Supercomputing Center Waiting for Moore s Law to save your serial code start getting bleak in 2004 Source: published SPECInt data Moore s Law is not at all
More informationVincent C. Betro, R. Glenn Brook, & Ryan C. Hulguin XSEDE Xtreme Scaling Workshop Chicago, IL July 15-16, 2012
Vincent C. Betro, R. Glenn Brook, & Ryan C. Hulguin XSEDE Xtreme Scaling Workshop Chicago, IL July 15-16, 2012 Outline NICS and AACE Architecture Overview Resources Native Mode Boltzmann BGK Solver Native/Offload
More informationPART I - Fundamentals of Parallel Computing
PART I - Fundamentals of Parallel Computing Objectives What is scientific computing? The need for more computing power The need for parallel computing and parallel programs 1 What is scientific computing?
More information