Current Trends in High Performance Computing
Chokchai (Box) Leangsuksun, PhD
SWEPCO Endowed Professor*, Computer Science
Director, High Performance Computing Initiative
Louisiana Tech University
*The SWEPCO endowed professorship is made possible by the Louisiana Board of Regents.
12 December 2011

Outline
- What is HPC?
- Current trends
- More on PS3 and GPU computing
- Conclusion
Mainstream CPUs
- CPU clock speed has plateaued at 3-4 GHz.
- More cores go into a single chip: dual and quad core are here now, and manycore (GPGPU) is arriving.
- Traditional applications won't get a free ride; they must be converted to parallel computing (HPC, multithreading).
[Figure: the 3-4 GHz clock-speed cap; diagram from the "free lunch" article in DDJ (Dr. Dobb's Journal)]

New Trends in Computing
- Old and current: SMP, clusters
- Multicore computers: Intel Core 2 Duo, AMD 2x 64
- Many-core accelerators: GPGPU, FPGA, Cell
- More brains in one computer, rather than a higher CPU frequency
- Harness many computers: cluster computing
What is HPC?
- High Performance Computing: parallel computing, supercomputing
- Achieve the fastest possible computing outcome by subdividing a very large job into many pieces
- Enabled by multiple high-speed CPUs, networking, software, and programming paradigms
- Technologies that help solve non-trivial tasks in science, engineering, medicine, business, entertainment, etc.
- Shorter time to insight, time to discovery, and time to market

Parallel Programming Concepts
- Conventional serial execution: the problem is represented as a series of instructions that are executed by the CPU.
- Parallel execution: the problem is partitioned into multiple executable parts that are mutually exclusive and collectively exhaustive, represented as a partially ordered set exhibiting concurrency.
- Parallel computing takes advantage of concurrency to:
  - solve larger problems in less time
  - save wall-clock time
  - overcome memory constraints
  - utilize non-local resources
[Diagram: a problem split into tasks, with each task's instructions running on its own CPU]
Source: Thomas Sterling's introduction to HPC
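The partitioning idea above can be sketched in Python. This is a minimal illustration, not code from the slides: the per-task work (a sum of squares) and the names `partition`, `work`, and `parallel_sum_of_squares` are hypothetical, and worker processes from `multiprocessing.Pool` stand in for the CPUs in the diagram.

```python
from multiprocessing import Pool


def partition(n_items, n_tasks):
    """Split range(n_items) into n_tasks contiguous chunks that do not overlap
    (mutually exclusive) and together cover every index (collectively exhaustive)."""
    base, extra = divmod(n_items, n_tasks)
    chunks, start = [], 0
    for t in range(n_tasks):
        size = base + (1 if t < extra else 0)  # spread any remainder over the first tasks
        chunks.append(range(start, start + size))
        start += size
    return chunks


def work(chunk):
    """Hypothetical per-task work: the sum of squares of the chunk's indices."""
    return sum(i * i for i in chunk)


def parallel_sum_of_squares(n_items, n_tasks=4):
    # Each task runs in its own worker process; partial results are combined at the end.
    with Pool(processes=n_tasks) as pool:
        partials = pool.map(work, partition(n_items, n_tasks))
    return sum(partials)


if __name__ == "__main__":
    n = 1_000_000
    serial = sum(i * i for i in range(n))
    print(parallel_sum_of_squares(n) == serial)  # True
```

Because the chunks are mutually exclusive and collectively exhaustive, combining the partial sums gives exactly the serial answer, whatever the number of tasks.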
HPC Applications and Major Industries
- Finite element modeling: auto/aero
- Fluid dynamics: auto/aero, consumer packaged goods manufacturing, process manufacturing, disaster preparedness (tsunami)
- Imaging: seismic and medical
- Finance and business: banks and brokerage houses (regression analysis, risk, options pricing, what-if analysis); Wal-mart uses HPC in its operations
- Molecular modeling: biotech and pharmaceuticals
Common traits: complex problems, large datasets, long runs.
This slide is from the Intel presentation "Technologies for Delivering Peak Performance on HPC and Grid Applications".

HPC Drives Knowledge Economy
Life Science Problem: An Example of Protein Folding
- A molecular dynamics simulation of a protein folding problem takes a year of computing in serial mode.
- Petaflop = a thousand trillion (10^15) floating-point operations per second.
Excerpted from IBM's David Klepacki, "The Future of HPC".

Disaster Preparedness: Examples
- Project LEAD: severe weather (tornado) prediction, led by OU, using HPC with dynamic adaptation to the weather forecast
- Professor Seidel's group at LSU CCT: hurricane route prediction
- Emergency preparedness depends on accuracy of prediction: 1 square mile = $1M
HPC Accelerates a Product
- Finite element (FE) analysis of 1,000,000 elements
- Numerical processing per element = 0.1 s
- One CPU takes 100,000 s ≈ 27.8 hours.
- With 100 CPUs, assuming ideal speedup: 1,000 s ≈ 0.28 hours ≈ 17 minutes

Avian Flu Pandemic Modeled on a Supercomputer
- MIDAS (Models of Infectious Disease Agent Study) program
- The large-scale, stochastic simulation model examines the nationwide spread of a pandemic influenza virus strain.
- A simulation starts with 2 passengers infected with avian flu arriving at LAX.
- The simulation rolls out a city-to-city and census-tract-level picture of the spread of infection through a synthetic population of 281 million people over the course of 180 days.
- It is a very large-scale, complex, multivariate simulation.
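The finite-element estimate on the "HPC accelerates a product" slide can be checked with a few lines of Python. This sketch keeps the slide's idealized assumption of perfect scaling; `fe_runtime_seconds` is a hypothetical helper, and real runs would add communication and load-imbalance overhead.

```python
def fe_runtime_seconds(n_elements, secs_per_element, n_cpus=1):
    """Ideal runtime: elements split evenly across CPUs, with no communication,
    load imbalance, or other overhead (an optimistic assumption)."""
    return n_elements * secs_per_element / n_cpus


serial = fe_runtime_seconds(1_000_000, 0.1)          # 100,000 s on 1 CPU
parallel = fe_runtime_seconds(1_000_000, 0.1, 100)   # 1,000 s on 100 CPUs

print(f"1 CPU:    {serial:,.0f} s = {serial / 3600:.1f} h")
print(f"100 CPUs: {parallel:,.0f} s = {parallel / 60:.1f} min")
print(f"speedup:  {serial / parallel:.0f}x")
```

Under these assumptions the speedup equals the CPU count (100x); any overhead pushes the real figure below that.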
Avian Flu Pandemic (90 days)
[Figure: simulated spread of the pandemic after 90 days]
Timothy C. Germann, Kai Kadau, Catherine A. Macken (Los Alamos National Laboratory); Ira M. Longini Jr. (Emory University)

Avian Flu Pandemic (II)
- The results show that advance preparation of a modestly effective vaccine in large quantities appears preferable to waiting for the development of a well-matched vaccine that may arrive too late.
- The simulation models a synthetic population that matches U.S. census demographics and worker mobility data by randomly assigning simulated individuals to households, workplaces, schools, and the like.
- The models serve as virtual laboratories to study how infectious diseases spread and which intervention strategies are more effective.
- Run on the Los Alamos supercomputer known as Pink, a 1,024-node (2,048-processor) LinuxBIOS/BProc cluster with 2 GB per node.
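To make "stochastic simulation of a synthetic population" concrete, here is a toy sketch: individuals are randomly assigned to households, the outbreak is seeded with 2 infected people, and transmission happens by chance within households and through random community contacts. It is a hypothetical illustration only, not the MIDAS model; every parameter value and name (`simulate_epidemic`, `p_household`, `p_community`, and so on) is an assumption chosen for readability.

```python
import random


def simulate_epidemic(n_people=10_000, household_size=4, initial_infected=2,
                      p_household=0.1, p_community=0.3, infectious_days=5,
                      n_days=180, seed=1):
    """Toy stochastic epidemic on a synthetic population.

    Returns a list of daily (susceptible, infected, recovered) counts.
    All parameter values are hypothetical, chosen only for illustration."""
    rng = random.Random(seed)  # fixed seed -> reproducible stochastic run

    # Randomly assign individuals to households of (at most) household_size.
    people = list(range(n_people))
    rng.shuffle(people)
    households = [people[i:i + household_size] for i in range(0, n_people, household_size)]
    household_of = [0] * n_people
    for h, members in enumerate(households):
        for p in members:
            household_of[p] = h

    state = ["S"] * n_people      # S = susceptible, I = infected, R = recovered
    days_left = [0] * n_people
    for p in rng.sample(range(n_people), initial_infected):
        state[p], days_left[p] = "I", infectious_days

    history = []
    for _ in range(n_days):
        infectious = [p for p in range(n_people) if state[p] == "I"]
        exposed = set()
        for p in infectious:
            # Household contacts: each may be infected independently.
            for q in households[household_of[p]]:
                if q != p and rng.random() < p_household:
                    exposed.add(q)
            # One random community contact per day.
            if rng.random() < p_community:
                exposed.add(rng.randrange(n_people))
        for p in infectious:  # today's infectious people progress toward recovery
            days_left[p] -= 1
            if days_left[p] == 0:
                state[p] = "R"
        for q in exposed:     # only susceptible people can become infected
            if state[q] == "S":
                state[q], days_left[q] = "I", infectious_days
        history.append((state.count("S"), state.count("I"), state.count("R")))
    return history


if __name__ == "__main__":
    for day, (s, i, r) in enumerate(simulate_epidemic(), start=1):
        if day % 30 == 0:
            print(f"day {day:3d}: S={s} I={i} R={r}")
```

Scaling this style of model to 281 million individuals, with workplaces, schools, and travel between cities, is what pushes the real study onto a supercomputer.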
Significant Indicators: Why HPC Now?
- Mainstream computers have multiple cores (Intel or AMD).
  - In the past 1-2 years, CPU speed has flattened at 3+ GHz.
  - More CPUs in one chip: dual-core and multi-core chips
  - Traditional software won't take advantage of these new processors.
  - Personal/desktop supercomputing
- Many real problems are highly computationally intensive.
  - The NSA uses supercomputing for data mining.
  - DOE: fusion, plasma, and energy-related work (including weaponry)
  - HPC helps in many other important areas (nanotech, life science, etc.), as well as product design and ERM/inventory management.
- Giants have recently spoken up for HPC.
  - Bush's State of the Union speech named three main S&T focus areas, one of which is supercomputing.
  - Bill Gates' keynote speech at SC05: Microsoft goes after HPC.
  - Google search engine: 100,000 nodes
  - The PlayStation 3 is a personal supercomputing platform.
  - Hollywood (entertainment) is HPC-bound: Pixar uses more than 3,000 CPUs to render animation.

HPC Preparedness
- Build a workforce that understands the HPC paradigm and its applications.
- HPC/Grid curriculum in IT/CS/CE/ICT
- Offer HPC-enabling tracks to other disciplines (engineering, life science, physics, computational chemistry, business, etc.)
- Train the business community.
- Bring awareness to the public.
- National and strategic policies
- Improve infrastructure.
Pause Here
- Switch to a tour of the machine rooms (clusters and our lab) to show what students will be using.
- Get students' info on the sign-up sheet for accounts on our clusters (azul, quadcore, GPU, and PS3).
- Intro to Linux
- Then continue with HPC101.
How to Run Applications Faster?
There are 3 ways to improve performance, each with a computer analogy:
- Work harder: use faster hardware.
- Work smarter: use optimized algorithms and techniques to solve computational tasks.
- Get more help: use multiple computers to solve a particular task.

Parallel Programming Concepts
[Diagram: a problem split into tasks, with each task's instructions running on its own CPU]
Source: Thomas Sterling's introduction to HPC
HPC Objective
- High Performance Computing: parallel computing, supercomputing
- Achieve the fastest possible computing outcome by subdividing a very large job into many pieces
- Enabled by multiple high-speed CPUs, networking, software, and programming paradigms
- Technologies that help solve non-trivial tasks in science, engineering, medicine, business, entertainment, etc.

Flynn's Taxonomy of Computer Architectures
- SISD: Single Instruction/Single Data
- SIMD: Single Instruction/Multiple Data
- MISD: Multiple Instruction/Single Data
- MIMD: Multiple Instruction/Multiple Data
Single Instruction/Single Data
[Figure: a single instruction stream and a single data stream feeding one PU (processing unit)]
Example: your desktop, before the spread of dual-core CPUs.
Slide source: Wikipedia, Flynn's Taxonomy

Flavors of SISD
[Figure: Instructions]
More on Pipelining

Single Instruction/Multiple Data
- Processors that execute the same instruction on multiple pieces of data, e.g., NVIDIA GPUs
Slide source: Wikipedia, Flynn's Taxonomy
Single Instruction/Multiple Data
- Each core runs the same set of instructions on different data.
- Example: a GPGPU processes the pixels of an image in parallel.
Slide source: Klimovitski & Macri, Intel

SISD versus SIMD
- Writing a compiler for SIMD architectures is VERY difficult (inter-thread communication complicates the picture).
Slide source: Ars Technica, PeakStream article
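The "same instructions, different data" pattern for image pixels can be sketched as a kernel function that every thread runs with only its index changing, in the style GPU programming uses. This is a conceptual Python emulation under stated assumptions: `brighten_kernel` and `launch` are hypothetical names, and the loop in `launch` runs the "threads" one after another, whereas SIMD hardware would process many pixels at once.

```python
def brighten_kernel(idx, pixels, out, delta):
    """The 'kernel': every thread runs this same code; only idx (its pixel) differs."""
    out[idx] = min(255, pixels[idx] + delta)  # clamp to the 8-bit maximum


def launch(kernel, n_threads, *args):
    """Stand-in for a GPU kernel launch. Here the 'threads' run one after another;
    SIMD hardware would apply the same instruction to many pixels at once."""
    for idx in range(n_threads):
        kernel(idx, *args)


pixels = [10, 100, 200, 250]          # a tiny grayscale "image"
out = [0] * len(pixels)
launch(brighten_kernel, len(pixels), pixels, out, 20)
print(out)  # [30, 120, 220, 255]
```

Because no pixel depends on any other, the per-pixel work can be spread over as many lanes as the hardware provides without changing the result.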
Multiple Instruction/Single Data
- Pipelined example: the CMU Warp machine
Slide source: Wikipedia, Flynn's Taxonomy

Multiple Instruction/Multiple Data
- E.g., multicore systems are based on a MIMD architecture plus a programming paradigm such as OpenMP or multithreading.
Slide source: Wikipedia, Flynn's Taxonomy
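In the MIMD model each processing unit can run a different instruction stream on its own data. A minimal multithreading sketch, with hypothetical task functions `count_words` and `sum_numbers`: two threads execute different code on different inputs. Note that in CPython the global interpreter lock means this shows the programming model rather than a real parallel speedup.

```python
import threading

results = {}


def count_words(text):
    # Instruction stream 1: text processing on one data set
    results["words"] = len(text.split())


def sum_numbers(nums):
    # Instruction stream 2: arithmetic on a different data set
    results["sum"] = sum(nums)


threads = [
    threading.Thread(target=count_words, args=("multiple instruction multiple data",)),
    threading.Thread(target=sum_numbers, args=([1, 2, 3, 4],)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for both instruction streams to finish
print(results)
```

Each thread writes a different key, so the two streams never touch the same data.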
Multiple Instruction/Multiple Data
- The sky is the limit: each PU is free to do as it pleases.
- Can be in either the shared memory or the distributed memory category.
[Figure: Instructions]

Current HPC Hardware
- Traditionally, HPC has adopted expensive parallel hardware:
  - Massively Parallel Processors (MPP)
  - Symmetric Multi-Processors (SMP)
  - Cluster computers
- Recent trends in HPC:
  - Multicore systems
  - Heterogeneous computing with accelerator boards (GPGPU, FPGA)
HPC Cluster
[Diagram: Login → Compile → Submit job → Run tasks; at least 2 connections]

Parallel Programming Environments and Tools
- Threads (PCs, SMPs, NOW): POSIX Threads, Java Threads
- MPI: Linux, NT, and many supercomputers
- OpenMP (predominantly on SMP)
- PVM (old)
- UPC, Co-array Fortran
- CUDA, Brook+, OpenCL
- Software DSMs (Shmem)
- Compilers
- RAD (rapid application development) tools
- Debuggers
- Performance analysis tools
- Visualization tools
Recent Trends in HPC Hardware
- Multicore and manycore are here now.
- Multiple CPUs on a single die: better power consumption, tightly coupled, and better for multithreading
- GPGPUs serve as building blocks for much larger systems.
- New Top 500 HPC systems are clusters of multicore CPUs and GPGPUs.

What Are HPC Systems?
Current Top 5 Systems

Shared vs. Distributed Memory
Shared Memory
- Global memory space, accessible by all processors
- Processors may have local memory to hold copies of some global memory.
- Consistency of these copies is usually maintained by hardware (cache coherency).

Two Typical Classes of Shared Memory
- Uniform Memory Access (UMA): equal access times and identical processors; typically represented by Symmetric Multiprocessor (SMP) machines or multicores
- Non-Uniform Memory Access (NUMA): memory access times are not uniform, and memory access across a link is slower; often made by physically linking two or more SMPs, or in heterogeneous computing
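In a global address space every thread can update the same variable, so updates must be synchronized. A minimal sketch with Python threads: a `threading.Lock` makes each read-modify-write of the shared counter atomic with respect to the other threads. The counter and the `add` function are hypothetical illustrations.

```python
import threading

counter = 0                  # one variable in the shared address space
lock = threading.Lock()


def add(n):
    global counter
    for _ in range(n):
        with lock:           # serialize the read-modify-write on shared data
            counter += 1


workers = [threading.Thread(target=add, args=(10_000,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(counter)  # 40000
```

Without the lock, two threads could read the same old value and one increment would be lost; hardware cache coherency keeps copies consistent, but it does not make a multi-step update atomic.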
Advantages and Disadvantages
- Advantages:
  - A global address space is user-friendly.
  - Data sharing between tasks is fast.
- Disadvantages:
  - The system may suffer from a lack of scalability: adding CPUs increases traffic on the shared memory-to-CPU path, especially in cache-coherent systems.
  - The programmer is responsible for correct synchronization.
  - Systems larger than an SMP need some special-purpose components.

Distributed Memory
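With distributed memory, each processor has its own private memory, so data moves only through explicit messages (the model MPI provides). The sketch below uses Python processes connected by `multiprocessing.Pipe` as a stand-in for MPI send/receive; it is not MPI code, and `split`, `worker`, and `distributed_sum` are hypothetical names.

```python
from multiprocessing import Pipe, Process


def split(data, n_parts):
    """Deal the data out round-robin, one part per worker."""
    return [data[r::n_parts] for r in range(n_parts)]


def worker(conn):
    # Each process has its own private memory: data arrives only as a message.
    chunk = conn.recv()          # analogous to an MPI receive from the coordinator
    conn.send(sum(chunk))        # send the partial result back
    conn.close()


def distributed_sum(data, n_workers=4):
    pipes, procs = [], []
    for part in split(data, n_workers):
        parent_end, child_end = Pipe()
        proc = Process(target=worker, args=(child_end,))
        proc.start()
        parent_end.send(part)    # analogous to an MPI send to one worker
        pipes.append(parent_end)
        procs.append(proc)
    total = sum(conn.recv() for conn in pipes)  # gather the partial sums
    for proc in procs:
        proc.join()
    return total


if __name__ == "__main__":
    print(distributed_sum(list(range(1, 101))))  # 5050
```

No locks are needed here because nothing is shared; the cost moves to explicit communication instead.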
Multicores
Three multicore classifications:
- Homogeneous
- Heterogeneous
- Hybrid

Multicores (I): Homogeneous Cores (a Main CPU)
- All cores are identical.
- A traditional multicore with few cores
- Good for a few large ("jumbo") tasks; not as many tasks/threads as accelerators or GPUs
- E.g., Intel Core 2 Duo, i3, i5, i7; AMD
- Programming: multithreads/OpenMP
Multicores (II): Homogeneous Cores as an Accelerator or Compute Device
- Needs a main CPU system; the cores act as attached processing units.
- All cores are identical, and there are many of them.
- Good for many SIMD tasks/threads
- E.g., NVIDIA GPGPU, ClearSpeed, FPGA
- Programming: library calls from a main program, or a new language extension, e.g., CUDA

Multicores (III): Heterogeneous Cores
- The cores are NOT all identical.
- All cores are in one die.
- Programming is more difficult.
- See more in the PS3 presentation.
Multicores (IV): Hybrid System
- A mix of host cores and accelerator cores
- A typical host ranges from a desktop to a server, e.g., Intel or AMD.
- Accelerators: NVIDIA, ATI Stream, or FPGA
- The programming model is more complex.
- Issue: memory bandwidth between the host and the devices

Introduction to Cell BE (PS3) Programming
HPCI: High Performance Computing Initiative
PS3: An Awesome HPC System
- IBM Cell processor
- Affordable
- But currently there are not many tools.

Cell BE Architecture
- PowerPC Processor Element (PPE):
  - Main processor, 64-bit
  - Also supports vector/SIMD
  - Runs the OS and manages the SPEs
- Synergistic Processor Element (SPE):
  - 128-bit RISC SIMD processor
  - 256 KB local storage memory
  - Uses DMA to transfer data between local storage and main memory
Picture ref:
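Because an SPE computes only on its 256 KB local store, data in main memory has to be streamed through it in chunks: DMA a chunk in, compute, DMA the result out. The Python sketch below emulates that pattern; it does not use the IBM Cell SDK, the 16 KB chunk size is a hypothetical choice, and `dma_chunks` and `spe_invert` are illustrative names.

```python
LOCAL_STORE_BYTES = 256 * 1024   # SPE local store size, from the slide
CHUNK_BYTES = 16 * 1024          # hypothetical per-transfer buffer size


def dma_chunks(total_bytes, chunk_bytes=CHUNK_BYTES):
    """Yield (offset, size) pairs that cover [0, total_bytes) in pieces
    small enough to fit in an SPE's local store."""
    if chunk_bytes > LOCAL_STORE_BYTES:
        raise ValueError("chunk does not fit in the local store")
    offset = 0
    while offset < total_bytes:
        size = min(chunk_bytes, total_bytes - offset)
        yield offset, size
        offset += size


def spe_invert(main_memory):
    """Emulate an SPE pass: 'DMA get' a chunk into local storage, compute on it,
    then 'DMA put' the result back to main memory."""
    result = bytearray(len(main_memory))
    for off, size in dma_chunks(len(main_memory)):
        local = bytes(main_memory[off:off + size])   # "DMA get" into the local store
        local = bytes(255 - b for b in local)        # compute only on the local copy
        result[off:off + size] = local               # "DMA put" back to main memory
    return result


print(list(dma_chunks(40_000)))  # [(0, 16384), (16384, 16384), (32768, 7232)]
```

Real SPE code typically overlaps the next transfer with the current computation (double buffering); this sketch keeps the steps sequential for clarity.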
Cell Programming
- IBM Cell SDK
- The main process runs on the PPE; threads run on the SPEs.
- PPE-centric programming paradigm
[Diagram: one PPE process spawning several SPE threads]

GPGPU
General Purpose Graphics Processing Unit
Two Major Players

Parallel Computing on a GPU
- NVIDIA GPU computing architecture
  - Via a hardware device interface
  - In laptops, desktops, workstations, and servers
- 8-series GPUs deliver 50 to 500 GFLOPS on compiled parallel C applications.
- Tesla: from 1 to 4 TFLOPS
- GPU parallelism is growing faster than Moore's law, more than doubling every year.
- A GPGPU is a GPU that lets the user process both graphics and non-graphics applications.
[Images: Tesla D870, GeForce 8800]
David Kirk/NVIDIA and Wen-mei W. Hwu, 2007, ECE 498AL, University of Illinois, Urbana-Champaign
NVIDIA GeForce 8800 (G80)
- The eighth generation of NVIDIA's GeForce graphics cards
- High-performance CUDA-enabled GPGPU
- 128 cores
- Memory: MB, or 1.5 GB in Tesla
- High-speed memory bandwidth
- Supports Scalable Link Interface (SLI)

NVIDIA Tesla Features
- GPU computing for HPC
- No display ports
- Dedicated to computation
- For massively multithreaded computing
- Supercomputing performance
NVIDIA Tesla Cards
- C-Series (card) = 1 GPU with 1.5 GB
- D-Series (deskside unit) = 2 GPUs
- S-Series (1U server) = 4 GPUs
Note: 1 G80 GPU = 128 cores ≈ 500 GFLOPS; 1 T10 = 240 cores ≈ 1 TFLOPS.
[Image: NVIDIA G80]
David Kirk/NVIDIA and Wen-mei W. Hwu. This slide is from the NVIDIA CUDA tutorial.
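The hybrid-system issue of memory bandwidth between host and device can be made concrete with a crude cost model: offloading pays off only when the accelerator's compute advantage outweighs the time spent copying data across the link. This is a sketch under loudly stated assumptions: the host CPU rate (10 GFLOPS) and link bandwidth (4 GB/s) are hypothetical, the ~500 GFLOPS GPU figure is the G80 number above, and `offload_estimate` is an illustrative helper that ignores launch latency and transfer/compute overlap.

```python
def offload_estimate(flops, bytes_in, bytes_out, cpu_gflops, gpu_gflops, link_gb_per_s):
    """Crude host-vs-accelerator model. Ignores launch latency and any overlap
    of transfers with computation. Returns (cpu_seconds, gpu_seconds)."""
    cpu_s = flops / (cpu_gflops * 1e9)
    transfer_s = (bytes_in + bytes_out) / (link_gb_per_s * 1e9)  # host <-> device copies
    gpu_s = transfer_s + flops / (gpu_gflops * 1e9)
    return cpu_s, gpu_s


# Hypothetical machine: 10 GFLOPS host CPU, ~500 GFLOPS GPU (G80-class), 4 GB/s link.
MACHINE = dict(cpu_gflops=10, gpu_gflops=500, link_gb_per_s=4)

# Vector add of 10^8 single-precision floats: 1 flop per 12 bytes moved.
n = 10**8
print("vector add:", offload_estimate(n, 8 * n, 4 * n, **MACHINE))

# 4096 x 4096 single-precision matrix multiply: 2*m^3 flops on 3*m^2 floats.
m = 4096
print("matmul:    ", offload_estimate(2 * m**3, 8 * m**2, 4 * m**2, **MACHINE))
```

Under these assumptions the vector add is dominated by the copies and runs faster on the host, while the matrix multiply does enough arithmetic per byte moved to benefit from the GPU.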
GPGPU Programming with CUDA
- CUDA (Compute Unified Device Architecture) is an SDK and API that allows a programmer to write C and Fortran programs that execute on a GPGPU.
- Works with the NVIDIA G80 or later, and with Tesla
- The GPGPU is viewed as a compute device.

ATI Stream (1)
ATI

ATI 4870 X2
Architecture of the ATI Radeon 4000 Series
This slide is from an ATI presentation.
This slide is from an ATI presentation.

Introduction to OpenCL: Toward a New Approach in Computing
Moayad Almohaishi
Introduction to OpenCL
- OpenCL stands for Open Computing Language.
- It comes from a consortium effort including Apple, NVIDIA, AMD, etc., under the Khronos Group, which is also responsible for OpenGL.
- It took 6 months to come up with the specification.
OpenCL:
1. Is royalty-free
2. Supports both task-parallel and data-parallel programming models
3. Works with vendor-agnostic GPGPUs
4. Includes multicore CPUs
5. Works on Cell processors
6. Supports handhelds and mobile devices
7. Is based on the C language (C99)
OpenCL
- Can query the available devices and build a context from them.
- Programmers can program more freely for any kind of device.
- Applications are more reusable, even if the hardware changes in the future.
OpenCL Platform Model
- CPU + GPU platforms
Performance of GPGPU
Note: for comparison, a cluster of 30 dual-Xeon 2.8 GHz nodes has a peak performance of ~336 GFLOPS.
David Kirk/NVIDIA and Wen-mei W. Hwu
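The ~336 GFLOPS figure for the Xeon cluster is consistent with the usual peak-performance formula if each 2.8 GHz Xeon is assumed to complete 2 floating-point operations per clock cycle. That per-cycle figure is an assumption chosen to reproduce the slide's number; the slide does not state it.

```latex
\text{Peak} = N_{\text{nodes}} \times N_{\text{CPUs/node}} \times f_{\text{clock}} \times \text{FLOPs/cycle}
            = 30 \times 2 \times 2.8\ \text{GHz} \times 2 = 336\ \text{GFLOPS}
```

Peak is a theoretical upper bound; sustained application performance is usually lower for both the cluster and the GPU.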
Last Words!
- An HPC or supercomputing system is not necessarily a gigantic machine in a big machine room; it is accessible to Thais and may now be sitting next to your desk.
- Computing is a necessity, and fast computing provides a competitive edge, especially in a knowledge economy.
- New trends in HPC include GPGPUs and various multicore architectures.
- Prepare ourselves, and strengthen our S&T, industry, and business community, for this phenomenon (HPC goes mainstream) before it is too late.

Backup Slides
Cancer Gene-Mining
- Unsuccessful on a uniprocessor
- Our approach:
  - Novel parallel gene-mining algorithms
  - Input from microarrays
  - Retain accuracy
  - Significant (superlinear) speedup
  - IBM P5 supercomputer (128-node PPC)
[Chart: time taken (in secs) to run the algorithm, keeping the number of nodes fixed, vs. number of processors; datasets: bladder, mesothelioma, breast, renal, leukemia, prostate, lung, pancreas, colorectal, ovary, lymphoma, melanoma; series: OvaMarker-based selection, GeneSetMine-based selection]

Drug Delivery
By Wu & Palmer, Louisiana Tech University, assisted by HPCI
- A study of microcapsules for drug delivery
- Computational fluid dynamics methodology to model the generation of droplets or cores (using alginate and oil)
- Goal: better understand the process parameters needed to generate cores of homogeneous size for manufacturing microcapsules
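The gene-mining slide reports superlinear speedup, meaning the speedup on p processors exceeds p (parallel efficiency above 1). The definitions can be sketched as below; the timings are hypothetical, since the measured values in the chart are not recoverable here. One commonly cited cause of superlinear speedup is that each processor's share of the data fits in faster memory such as cache, though the slide does not say what caused it in this case.

```python
def speedup(t_serial, t_parallel):
    """S(p) = T(1) / T(p)."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """E(p) = S(p) / p; values above 1 indicate superlinear speedup."""
    return speedup(t_serial, t_parallel) / n_procs


def is_superlinear(t_serial, t_parallel, n_procs):
    return speedup(t_serial, t_parallel) > n_procs


# Hypothetical timings in seconds (not the measured gene-mining results).
t1, t8 = 1000.0, 100.0
print(speedup(t1, t8), efficiency(t1, t8, 8), is_superlinear(t1, t8, 8))  # 10.0 1.25 True
```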
Droplet Generation: Experimental Procedure

Droplet Generation: Example Results
- Case 1:
  - Olive oil: density 930 kg/m³, viscosity 0.03 kg/m·s
  - Alginate: density 1012 kg/m³, viscosity kg/m·s
- Case 2:
  - Phase 1: density 918 kg/m³, viscosity kg/m·s
  - Phase 2: density kg/m³, viscosity kg/m·s
Source: Wu's thesis
More informationINSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing
UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 11
More informationLecture 9: MIMD Architectures
Lecture 9: MIMD Architectures Introduction and classification Symmetric multiprocessors NUMA architecture Clusters Zebo Peng, IDA, LiTH 1 Introduction A set of general purpose processors is connected together.
More informationOverview. CS 472 Concurrent & Parallel Programming University of Evansville
Overview CS 472 Concurrent & Parallel Programming University of Evansville Selection of slides from CIS 410/510 Introduction to Parallel Computing Department of Computer and Information Science, University
More informationCME 213 S PRING Eric Darve
CME 213 S PRING 2017 Eric Darve Summary of previous lectures Pthreads: low-level multi-threaded programming OpenMP: simplified interface based on #pragma, adapted to scientific computing OpenMP for and
More informationIntroduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono
Introduction to CUDA Algoritmi e Calcolo Parallelo References q This set of slides is mainly based on: " CUDA Technical Training, Dr. Antonino Tumeo, Pacific Northwest National Laboratory " Slide of Applied
More informationHigh Performance Computing (HPC) Introduction
High Performance Computing (HPC) Introduction Ontario Summer School on High Performance Computing Scott Northrup SciNet HPC Consortium Compute Canada June 25th, 2012 Outline 1 HPC Overview 2 Parallel Computing
More informationLecture 1: Gentle Introduction to GPUs
CSCI-GA.3033-004 Graphics Processing Units (GPUs): Architecture and Programming Lecture 1: Gentle Introduction to GPUs Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Who Am I? Mohamed
More information3/24/2014 BIT 325 PARALLEL PROCESSING ASSESSMENT. Lecture Notes:
BIT 325 PARALLEL PROCESSING ASSESSMENT CA 40% TESTS 30% PRESENTATIONS 10% EXAM 60% CLASS TIME TABLE SYLLUBUS & RECOMMENDED BOOKS Parallel processing Overview Clarification of parallel machines Some General
More informationGPGPU. Peter Laurens 1st-year PhD Student, NSC
GPGPU Peter Laurens 1st-year PhD Student, NSC Presentation Overview 1. What is it? 2. What can it do for me? 3. How can I get it to do that? 4. What s the catch? 5. What s the future? What is it? Introducing
More informationNVIDIA GTX200: TeraFLOPS Visual Computing. August 26, 2008 John Tynefield
NVIDIA GTX200: TeraFLOPS Visual Computing August 26, 2008 John Tynefield 2 Outline Execution Model Architecture Demo 3 Execution Model 4 Software Architecture Applications DX10 OpenGL OpenCL CUDA C Host
More informationTechnology for a better society. hetcomp.com
Technology for a better society hetcomp.com 1 J. Seland, C. Dyken, T. R. Hagen, A. R. Brodtkorb, J. Hjelmervik,E Bjønnes GPU Computing USIT Course Week 16th November 2011 hetcomp.com 2 9:30 10:15 Introduction
More informationPARALLEL PROGRAMMING MANY-CORE COMPUTING: INTRO (1/5) Rob van Nieuwpoort
PARALLEL PROGRAMMING MANY-CORE COMPUTING: INTRO (1/5) Rob van Nieuwpoort rob@cs.vu.nl Schedule 2 1. Introduction, performance metrics & analysis 2. Many-core hardware 3. Cuda class 1: basics 4. Cuda class
More informationIntroduction. CSCI 4850/5850 High-Performance Computing Spring 2018
Introduction CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University What is Parallel
More informationIntroduction to Parallel Programming
Introduction to Parallel Programming January 14, 2015 www.cac.cornell.edu What is Parallel Programming? Theoretically a very simple concept Use more than one processor to complete a task Operationally
More informationTrends in HPC (hardware complexity and software challenges)
Trends in HPC (hardware complexity and software challenges) Mike Giles Oxford e-research Centre Mathematical Institute MIT seminar March 13th, 2013 Mike Giles (Oxford) HPC Trends March 13th, 2013 1 / 18
More informationParallel Architectures
Parallel Architectures CPS343 Parallel and High Performance Computing Spring 2018 CPS343 (Parallel and HPC) Parallel Architectures Spring 2018 1 / 36 Outline 1 Parallel Computer Classification Flynn s
More informationFinite Element Integration and Assembly on Modern Multi and Many-core Processors
Finite Element Integration and Assembly on Modern Multi and Many-core Processors Krzysztof Banaś, Jan Bielański, Kazimierz Chłoń AGH University of Science and Technology, Mickiewicza 30, 30-059 Kraków,
More informationTHREAD LEVEL PARALLELISM
THREAD LEVEL PARALLELISM Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview Announcement Homework 4 is due on Dec. 11 th This lecture
More informationTrends and Challenges in Multicore Programming
Trends and Challenges in Multicore Programming Eva Burrows Bergen Language Design Laboratory (BLDL) Department of Informatics, University of Bergen Bergen, March 17, 2010 Outline The Roadmap of Multicores
More informationThe Stampede is Coming: A New Petascale Resource for the Open Science Community
The Stampede is Coming: A New Petascale Resource for the Open Science Community Jay Boisseau Texas Advanced Computing Center boisseau@tacc.utexas.edu Stampede: Solicitation US National Science Foundation
More informationHigh Performance Computing Course Notes HPC Fundamentals
High Performance Computing Course Notes 2008-2009 2009 HPC Fundamentals Introduction What is High Performance Computing (HPC)? Difficult to define - it s a moving target. Later 1980s, a supercomputer performs
More informationMoore s Law. Computer architect goal Software developer assumption
Moore s Law The number of transistors that can be placed inexpensively on an integrated circuit will double approximately every 18 months. Self-fulfilling prophecy Computer architect goal Software developer
More informationCS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology
CS8803SC Software and Hardware Cooperative Computing GPGPU Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology Why GPU? A quiet revolution and potential build-up Calculation: 367
More informationComputer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors
Computer and Information Sciences College / Computer Science Department CS 207 D Computer Architecture Lecture 9: Multiprocessors Challenges of Parallel Processing First challenge is % of program inherently
More informationIntroduction to GPU computing
Introduction to GPU computing Nagasaki Advanced Computing Center Nagasaki, Japan The GPU evolution The Graphic Processing Unit (GPU) is a processor that was specialized for processing graphics. The GPU
More informationHigh Performance Computing Course Notes Course Administration
High Performance Computing Course Notes 2009-2010 2010 Course Administration Contacts details Dr. Ligang He Home page: http://www.dcs.warwick.ac.uk/~liganghe Email: liganghe@dcs.warwick.ac.uk Office hours:
More informationDuksu Kim. Professional Experience Senior researcher, KISTI High performance visualization
Duksu Kim Assistant professor, KORATEHC Education Ph.D. Computer Science, KAIST Parallel Proximity Computation on Heterogeneous Computing Systems for Graphics Applications Professional Experience Senior
More informationParallel Architecture. Hwansoo Han
Parallel Architecture Hwansoo Han Performance Curve 2 Unicore Limitations Performance scaling stopped due to: Power Wire delay DRAM latency Limitation in ILP 3 Power Consumption (watts) 4 Wire Delay Range
More informationCurrent Trends in Computer Graphics Hardware
Current Trends in Computer Graphics Hardware Dirk Reiners University of Louisiana Lafayette, LA Quick Introduction Assistant Professor in Computer Science at University of Louisiana, Lafayette (since 2006)
More informationMulti-Processors and GPU
Multi-Processors and GPU Philipp Koehn 7 December 2016 Predicted CPU Clock Speed 1 Clock speed 1971: 740 khz, 2016: 28.7 GHz Source: Horowitz "The Singularity is Near" (2005) Actual CPU Clock Speed 2 Clock
More informationParallelism and Concurrency. COS 326 David Walker Princeton University
Parallelism and Concurrency COS 326 David Walker Princeton University Parallelism What is it? Today's technology trends. How can we take advantage of it? Why is it so much harder to program? Some preliminary
More informationWhat are Clusters? Why Clusters? - a Short History
What are Clusters? Our definition : A parallel machine built of commodity components and running commodity software Cluster consists of nodes with one or more processors (CPUs), memory that is shared by
More informationCDA3101 Recitation Section 13
CDA3101 Recitation Section 13 Storage + Bus + Multicore and some exam tips Hard Disks Traditional disk performance is limited by the moving parts. Some disk terms Disk Performance Platters - the surfaces
More informationThe Use of Cloud Computing Resources in an HPC Environment
The Use of Cloud Computing Resources in an HPC Environment Bill, Labate, UCLA Office of Information Technology Prakashan Korambath, UCLA Institute for Digital Research & Education Cloud computing becomes
More informationChapter 1: Introduction to Parallel Computing
Parallel and Distributed Computing Chapter 1: Introduction to Parallel Computing Jun Zhang Laboratory for High Performance Computing & Computer Simulation Department of Computer Science University of Kentucky
More informationIntroduction to Parallel Processing
Babylon University College of Information Technology Software Department Introduction to Parallel Processing By Single processor supercomputers have achieved great speeds and have been pushing hardware
More information10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems
1 License: http://creativecommons.org/licenses/by-nc-nd/3.0/ 10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems To enhance system performance and, in some cases, to increase
More informationMANY-CORE COMPUTING. 7-Oct Ana Lucia Varbanescu, UvA. Original slides: Rob van Nieuwpoort, escience Center
MANY-CORE COMPUTING 7-Oct-2013 Ana Lucia Varbanescu, UvA Original slides: Rob van Nieuwpoort, escience Center Schedule 2 1. Introduction, performance metrics & analysis 2. Programming: basics (10-10-2013)
More information