PART I - Fundamentals of Parallel Computing


Objectives

- What is scientific computing?
- The need for more computing power
- The need for parallel computing and parallel programs

What is scientific computing?

The concepts of numerical analysis, scientific computing, and simulation are related but distinct. Numerical analysis is the study of the design and analysis of algorithms that use numerical approximations (as opposed to symbolic manipulations) to solve problems in mathematical analysis (as opposed to discrete mathematics). Examples of problems popular in numerical analysis are special function evaluation, interpolation, root-finding, numerical linear algebra (linear systems of equations, eigenvalue problems, etc.), differential equations, optimization, and quadrature. Numerical analysis does not assume or require the existence of a computer to perform calculations. Approximation errors due to rounding, truncation, or discretization would still exist, and the study of their propagation through the stability of the problem and the algorithm would remain important.

Scientific computing generally refers to the solution of a mathematical model by means of a computer. Naturally, numerical analysis is fundamentally important to scientific computing. Scientific computing is often used synonymously with computational science. Computational science is about using computation, as opposed to theory and experiment, to do science. I view computational science as a superset of scientific computing and would use it more synonymously with a holistic (as opposed to literal) interpretation of simulation, i.e., constructing a mathematical model of a system of interest, simulating it (i.e., solving the model using a computer), and interpreting or otherwise quantitatively analyzing its output. Regardless of the nuances, all three concepts are distinct from computer science, which is more about the study of computers, how computations are performed, and information processing.

The need for more computing

Historically, science has been based on the paradigms of theory and experiment. However, with the extraordinary and relatively recent increase in computing power, a third scientific paradigm has emerged: simulation. Simulation allows scientists to test theories without the need for experiments. This is especially important when experiments are expensive, dangerous, or simply impossible. Simulation can also reduce the number of experiments needed to validate a theory. Similar to experiment, simulation can generate data that can inform theory. Industry uses simulation to make informed decisions in the absence of complete theory or observational data, and also to reduce the time required to go from design to prototype to product.

The need for more computing

The natural progression of scientific discovery and industrial research is to continually seek out harder and harder problems to study. We have already reached a stage where computing is all around us: robots, cars, weather prediction, GPS, Google, ..., all requiring significant amounts of high-level computation to function. As of November 2013, the fastest supercomputer [1] (the Tianhe-2 (MilkyWay-2) computer, based on the Intel Xeon E5-2692 12C 2.200 GHz and Intel Xeon Phi 31S1P with 3,120,000 cores, developed by China's National University of Defense Technology) was clocked at about 34 PFlops (sustained); its theoretical peak is about 55 PFlops. This may sound like a lot of flops per second, but it may not be as much as you think.

[1] According to the maximal LINPACK performance achieved.

The need for more computing

Example: a virtual human heart. The human heart has approximately 10^10 muscle cells. A model of the human heart will require approximately 100 variables per cell. Suppose it takes 1000 flops to advance the state of each variable by 1 microsecond of simulated time. Then a microsecond of simulated time needs 10^15 flops. The bad news: the fastest supercomputer can only do about 10^10 flops per microsecond of real time. The good news: this is a bit of a worst-case analysis (the total flop count for a microsecond of a useful virtual heart might be closer to 10^11), and computers are still getting faster (exaflop computing). We might see a useful virtual heart this decade.
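A back-of-envelope sketch of this estimate, using the slide's rough assumptions (cell count, variables per cell, and flops per update are illustrative figures, not measured values):

```python
# Rough flop budget for the virtual-heart example (orders of magnitude only).
cells = 1e10              # muscle cells in a human heart (approximate)
vars_per_cell = 1e2       # state variables per cell (model assumption)
flops_per_var = 1e3       # flops to advance one variable by 1 microsecond

flops_per_sim_us = cells * vars_per_cell * flops_per_var   # = 1e15 flops

machine_rate = 34e15      # Tianhe-2 sustained rate, flops per second
flops_per_real_us = machine_rate * 1e-6                    # ~3.4e10 flops

slowdown = flops_per_sim_us / flops_per_real_us
print(f"Simulated time runs ~{slowdown:.0f}x slower than real time")  # ~29000x
```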

[Figure: Projected Performance Development. Source: www.top500.org]

The need for more computing

Other examples:

- Climate modelling
- Natural disaster prediction / mitigation
- Protein folding; drug discovery
- Energy research
- Data analysis (genomics, particle physics, astronomy, ocean research, search engines)

The economics of CPU technology

Technology is governed by economics. The rate at which a computer processor can execute instructions (e.g., floating-point operations) is proportional to the frequency of its clock. However,

power ∝ (voltage)^2 × frequency, and voltage ∝ frequency, so power ∝ (frequency)^3.

For example, a 50% increase in performance by increasing frequency requires 1.5^3 ≈ 3.4 times more power. By going to 2 cores, this performance increase can be achieved with 16% less power [2].

[2] This assumes you can make 100% use of each core.
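A quick sanity check of these numbers, assuming the cubic power model above (a simplification; real processors deviate from this relation):

```python
# Power model from the slide: power ~ frequency^3, since power ~ voltage^2 * frequency
# and voltage ~ frequency. Illustrative sketch only.

def relative_power(freq_scale, cores=1):
    """Power relative to a single core at its nominal frequency."""
    return cores * freq_scale ** 3

# Option 1: one core, 50% higher clock -> 1.5x performance
print(relative_power(1.5))            # ~3.375x the power

# Option 2: two cores at 75% clock -> 2 * 0.75 = 1.5x performance
# (assumes perfect use of both cores, as the footnote notes)
print(relative_power(0.75, cores=2))  # ~0.84x the power, i.e. ~16% less
```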

Physical limits of CPU technology

Suppose we wanted to build a single CPU capable of 1 TFlop (= 10^12 flops) per second; e.g., add vectors x, y with 10^12 elements each and store the result in vector z. This requires 3 × 10^12 copies between memory and registers per second. If data can travel at the speed of light (3 × 10^8 m/s), then the average distance of a word of memory from the CPU must be at most 10^-4 m. Suppose we have a square grid of memory with the CPU at the centre. The side of the square can be at most 2 × 10^-4 m. A row of memory contains √(3 × 10^12) ≈ √3 × 10^6 words. So each word must fit into 2 × 10^-4 / (√3 × 10^6) ≈ 10^-10 m. This is the size of a relatively small atom!
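The same argument, worked through numerically (a sketch of the slide's order-of-magnitude reasoning, not a precise physical model):

```python
import math

flops_per_s = 1e12                 # target: 1 TFlop per second
words_per_s = 3 * flops_per_s      # read x[i], read y[i], write z[i] per operation
c = 3e8                            # speed of light, m/s

max_avg_distance = c / words_per_s        # ~1e-4 m travel budget per word
side = 2 * max_avg_distance               # square grid of memory, CPU at the centre
words_per_row = math.sqrt(words_per_s)    # ~1.7e6 words along one side
word_size = side / words_per_row          # ~1.2e-10 m per word

print(f"Each word must fit in about {word_size:.1e} m")  # atomic scale
```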

Physical limits of CPU technology

In other words, unless we can figure out how to represent a 32- (or 64-) bit word with a single atom, it will be impossible to build such a computer. Transistors are already getting so small that (undesirable) quantum effects (such as electron tunnelling) are becoming non-negligible. No matter which way you look at it, increasing frequency is no longer an option! When increasing frequency was possible, software got faster because hardware got faster. Hardware is in some sense still getting better, but increases in performance are only possible nowadays by exploiting parallelism. The only software that will survive will be that which takes advantage of parallel processing.

When to parallelize

Not every problem needs to be parallelized in order to find its solution. There are only two scenarios in which parallelization makes sense as a way to help solve a problem:

1. The problem is too big to fit into the memory of one computer.
2. The problem takes too long to run.

The goal in both of these scenarios can be described as reducing the amount of (real) time it takes to get the solution [3]. Note that this is not the same as reducing the overall amount of computation.

[3] In the first scenario, the original time required can be viewed as infinite (no such single computer exists) or indefinite (until you acquire a new computer).

The challenge of parallel computing

Even if we had an unlimited number of processors and unlimited memory, there are still many critical issues to deal with:

- Designing and implementing suitable interconnection networks for processors and memory modules.
- Designing and implementing system software.
- Devising algorithms and data structures.
- Usefully dividing algorithms and data structures into sub-problems.
- Identifying how the sub-problems will communicate with each other.
- Assigning sub-problems to processors and memory modules.

In this course, we will be mostly concerned with the last four items.

The need for parallel computing

Parallel programming and parallel computers allow us to take better advantage of computing resources to solve larger and more memory-intensive problems in less time than is possible on a single serial computer. Trends in computer architecture towards multi-core and accelerated systems make parallel programming and HPC a necessary practice going forward. It is already difficult to do leading-edge computational science without parallel computing, and this situation can only get worse in the future. Computational scientists who ignore parallel computing do so at their own risk!

Summary

- Parallel computing is here to stay.
- Parallel programming will become the prevalent mode of programming in the very near future.
- Leading-edge computational science and technology must take advantage of parallel computers and parallel programs.