Tools for Change David E. Bernholdt Oak Ridge National Laboratory
1 Tools for Change David E. Bernholdt Group Leader, Computer Science Research Computer Science and Mathematics Division and National Center for Computational Sciences Oak Ridge National Laboratory
2 Acknowledgements
COMPOSE-HPC
- People:
  - Galois: Matt Sottile
  - LLNL: Tom Epperly, Tammy Dahlgren, Adrian Prantl
  - ORNL: David Bernholdt, Wael Elwasif, Samantha Foley
  - PNNL: Daniel Chavarria, Sriram Krishnamoorthy, Ajay Panyala
  - SNL: Rob Armstrong, Ben Allan, Geoff Hulette
- Funding: DOE/ASCR X-Stack Software Research (2010)
Hercules/Klonos
- People:
  - ORNL: Oscar Hernandez, Christos Kartsaklis, Chung-Hsing Hsu, Wayne Joubert, Rich Graham
  - U Houston: Barbara Chapman, Wei Ding (now ORNL)
- Funding: ORNL Director's R&D Fund, Oak Ridge Leadership Computing Facility, Oak Ridge Associated Universities
This work was performed in part at the Oak Ridge National Laboratory, which is managed by UT-Battelle, LLC for the U.S. Department of Energy under Contract No. DE-AC05-00OR
3 Stability is Good
- Since the mid-1990s, supercomputing has been dominated by one basic architecture
  - Commodity CPU (especially x86)
  - Deep memory hierarchy (multiple caches, DRAM)
  - Interconnect (distributed memory, explicit messaging)
  - External storage system (after some flirtation with node-local disk)
- Good for software productivity
  - Increasingly well-understood abstract machine model
  - Understand how to design SW for such machines
4 Stable Systems Resist Change
- The end of Dennard scaling (constant power density as transistors get smaller) triggered several new trends
  - CPU clocks stopped getting faster
  - Instead, increase parallelism to get more aggregate performance
  - Shift to multi-core CPUs to continue leveraging Moore's Law
- Most applications adapted without radical shifts in the abstract machine model they were targeting
  - Treat cores like CPUs: one MPI rank per core
  - Expose more parallelism: finer decompositions, larger problems
- Some changes start to appear around the margins
  - Additional level of parallelism (threading)
  - More complexity in memory (NUMA domains)
5 We're Approaching a Breaking Point
- Recent trends
  - Multi-core to many-core
    - Treating cores like CPUs leads to resource contention elsewhere: cache, memory bandwidth, memory capacity, etc.
  - Adding accelerators into the mix
    - Multiple architectures (GPUs, Phi, others)
    - Separate (PCI) bus or integrated?
    - Separate memory or integrated?
  - Interconnects are just devices to access remote memory (RDMA)
- Coming attractions
  - Many-core to many-many-core
  - Greater heterogeneity
  - Processing in memory
  - Integrated NIC
  - Multi-level networks (on-chip, in-rack, between racks)
  - Multi-level parallelism
  - Event/task parallelism to deal with massive parallelism
  - Power, power, power
  - Reliability, reliability, reliability
6 Stability Has Left Us Unprepared
- We are entering a period of high flux and diversity in HPC architectures
- Applications will have to be re-thought, or at least adapted to coming architectures
- We are not prepared to deal with the changes that will be necessary
  - Mentally
  - Tool-wise
- Detrimental to software productivity
  - Need agility
  - Need tools that support agility
7 Scientific Applications are Constantly Evolving
- Porting to new architectures
- Extending scientific capabilities
- Improving quality or performance
- Example code changes:
  - Updating library calls
  - Changing data structures
  - Platform-specific optimizations
- How do you transform the code?
  - Manual editing?
  - Sed, grep?
  - Perl or Python scripts?
- Simple text processing tools don't respect programming language semantics!
- COMPOSE-HPC focused on creating a tool chain to allow application developers to specify and apply language-aware transformations
- Hercules focuses on a compiler-based infrastructure to find patterns and apply transformations and optimizations, and to build knowledge bases about code
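To make the point about text tools concrete, here is a minimal sketch, assuming a hypothetical C fragment in which a library call `old_sum` must be renamed to `new_sum`. The helper `rename_identifier` and its small lexer are illustrative only. They are far simpler than the AST-based approach of COMPOSE-HPC, they ignore character literals and preprocessor lines, and they add only enough lexical awareness to skip comments and string literals.

```python
import re

# Hypothetical C fragment: the name appears as a call, in a comment,
# in a string literal, and as a prefix of a longer identifier.
SRC = 'x = old_sum(3); /* old_sum */ s = "old_sum"; old_sum_total = 1;'

# Plain text substitution, as sed would do it: rewrites all four occurrences.
naive = re.sub(r'old_sum', 'new_sum', SRC)

# A tiny lexer: comments and strings are matched first so that their
# contents are never seen as identifiers.
TOKEN = re.compile(r'''
    (?P<comment>/\*.*?\*/|//[^\n]*)
  | (?P<string>"(?:\\.|[^"\\])*")
  | (?P<ident>[A-Za-z_]\w*)
''', re.VERBOSE | re.DOTALL)


def rename_identifier(src, old, new):
    """Rename whole identifiers only, leaving comments and strings alone."""
    def replace(match):
        if match.lastgroup == 'ident' and match.group() == old:
            return new
        return match.group()
    return TOKEN.sub(replace, src)


aware = rename_identifier(SRC, 'old_sum', 'new_sum')
```

The naive substitution also rewrites the comment, the string literal, and the unrelated variable `old_sum_total`; the token-aware version changes only the call.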
8 KNOT: A Language-Aware Transformation Tool Chain (COMPOSE-HPC)
[Figure: a KNOT-based custom transformation tool composed of PAUL (annotation language), ROTE (transformations), and BRAID (generation/optimizations). The application or tool developer defines each of the three and annotates the original source code. The annotated source code goes into the tool, and the transformed source code goes to the compiler.]
9 PAUL: Annotation Parser (COMPOSE-HPC)
- Customizable, reusable, application-specific directives for parameterizing transformation tools
- Can be used with ROTE to guide rewrite tools with application-specific information, or can be used alone to add a structured annotation facility to ROSE-based tools
Annotated source code:
    /* %GUARD cond = NonZero arg = x onerrorcall = failwith errorcallarg = "x = 0" */
    double foo(double x, double y) { return y / x; }
Callouts on the annotation:
- The comment itself: "I am an annotation inside a comment!"
- %GUARD: "I change the code in this way"
- cond, arg, onerrorcall, errorcallarg: "Based on these parameters"
- NonZero, x, failwith, "x = 0": "With these values"
After the transformation (ROTE):
    double foo(double x, double y) { if (x == 0) { failwith("x = 0"); } return y / x; }
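A rough sketch of what reading such an annotation might involve, assuming the directive is a `%TAG` followed by `key = value` pairs inside a C block comment. The function names `parse_annotations` and `guard_statement` are hypothetical, the real PAUL grammar is not given in the slides, and the sketch assumes a `NonZero` guard should call the error handler when the argument equals zero.

```python
import re

# A block comment whose first token is %TAG; the rest is the parameter list.
ANNOTATION = re.compile(r'/\*\s*%(?P<tag>\w+)\s+(?P<body>.*?)\*/', re.DOTALL)
# key = value, where value is a quoted string or a bare word.
PAIR = re.compile(r'(\w+)\s*=\s*("(?:[^"\\]|\\.)*"|\S+)')


def parse_annotations(src):
    """Return a list of (tag, parameters) for every annotation comment."""
    found = []
    for match in ANNOTATION.finditer(src):
        params = {key: value.strip('"')
                  for key, value in PAIR.findall(match.group('body'))}
        found.append((match.group('tag'), params))
    return found


def guard_statement(params):
    """Build the C guard for a GUARD annotation (assumed semantics)."""
    # Condition under which the guard is violated and the handler is called.
    violated = {'NonZero': '{arg} == 0'}[params['cond']].format(arg=params['arg'])
    return 'if ({}) {{ {}("{}"); }}'.format(
        violated, params['onerrorcall'], params['errorcallarg'])


SRC = '''/* %GUARD cond = NonZero arg = x onerrorcall = failwith errorcallarg = "x = 0" */
double foo(double x, double y) { return y / x; }'''

tag, params = parse_annotations(SRC)[0]
guard = guard_statement(params)
```

Keeping the parameters in a plain dictionary is one way to let the same annotation drive different transformations, which matches the slide's description of the directives as reusable.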
10 ROTE: Retargetable Open Transformation Engine, Internal Structure (COMPOSE-HPC)
1. Source code (C, C++, Fortran)
2. ROSE produces the abstract syntax tree (AST)
3. Minitermite (src2term --stratego) produces the term representation
4. Stratego/XT applies the rewrite rules, giving the transformed term representation
5. Minitermite (term2src --stratego) produces the transformed AST
6. ROSE produces the transformed source code (C, C++, Fortran)
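The same parse, rewrite, unparse shape can be illustrated with Python's standard `ast` module (Python 3.9 or later for `ast.unparse`). This is only an analogy for the ROTE pipeline, operating on Python source rather than C, C++, or Fortran. It uses neither ROSE, Minitermite, nor Stratego/XT, and the function names `to_tree`, `to_source`, and `transform` are made up for the example.

```python
import ast
import copy


def to_tree(source):
    """Stage 1: parse source text into a syntax tree."""
    return ast.parse(source)


class Distribute(ast.NodeTransformer):
    """Stage 2: rewrite (a + b) * c into a * c + b * c."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite children first (bottom-up)
        left = node.left
        if (isinstance(node.op, ast.Mult) and isinstance(left, ast.BinOp)
                and isinstance(left.op, ast.Add)):
            return ast.BinOp(
                left=ast.BinOp(left=left.left, op=ast.Mult(), right=node.right),
                op=ast.Add(),
                # The factor c appears twice, so the second use gets its own copy.
                right=ast.BinOp(left=left.right, op=ast.Mult(),
                                right=copy.deepcopy(node.right)),
            )
        return node


def to_source(tree):
    """Stage 3: unparse the transformed tree back to source text."""
    return ast.unparse(ast.fix_missing_locations(tree))


def transform(source):
    return to_source(Distribute().visit(to_tree(source)))


result = transform("x = (a + b) * c")
```

Because the rewrite happens on the tree, operator precedence and parenthesization are handled by the unparser rather than by string manipulation.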
11 A Simple Example of a Rewrite Rule (COMPOSE-HPC)
Rule R1 rewrites x = (a+b)*c; into x = a*c + b*c;
    R1 :
      multiply_op(
        add_op(STRATEGO_a, STRATEGO_b,
               binary_op_annotation(type_int, preprocessing_info(list)), gen_info()),
        STRATEGO_c,
        binary_op_annotation(type_int, preprocessing_info(list)), gen_info())
      ->
      add_op(
        multiply_op(STRATEGO_a, STRATEGO_c,
                    binary_op_annotation(type_int, preprocessing_info(list)), gen_info()),
        multiply_op(STRATEGO_b, STRATEGO_c,
                    binary_op_annotation(type_int, preprocessing_info(list)), gen_info()),
        binary_op_annotation(type_int, preprocessing_info(list)), gen_info())
Color legend (colors not preserved in the transcription): syntactic structure, variables, types, source-to-source location info, ROSE AST annotations
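A rule of this kind is a pair of term patterns with shared variables. The sketch below shows one simple way to apply such a rule, assuming terms are nested Python tuples whose first element is the constructor name and omitting the type and location annotations of the real ROSE terms. The class `Var` and the functions `match`, `build`, and `rewrite` are illustrative and are not Stratego's implementation.

```python
class Var:
    """A pattern variable, playing the role of STRATEGO_a etc. in rule R1."""
    def __init__(self, name):
        self.name = name


def match(pattern, term, env):
    """Return env extended with bindings that make pattern equal term, or None."""
    if isinstance(pattern, Var):
        if pattern.name in env:
            return env if env[pattern.name] == term else None
        return {**env, pattern.name: term}
    if (isinstance(pattern, tuple) and isinstance(term, tuple)
            and len(pattern) == len(term)):
        for sub_pattern, sub_term in zip(pattern, term):
            env = match(sub_pattern, sub_term, env)
            if env is None:
                return None
        return env
    return env if pattern == term else None


def build(template, env):
    """Instantiate the right-hand side with the bindings found by match."""
    if isinstance(template, Var):
        return env[template.name]
    if isinstance(template, tuple):
        return tuple(build(part, env) for part in template)
    return template


def rewrite(term, lhs, rhs):
    """Apply the rule lhs -> rhs once at every position, bottom-up."""
    if isinstance(term, tuple):
        term = tuple(rewrite(part, lhs, rhs) for part in term)
    env = match(lhs, term, {})
    return build(rhs, env) if env is not None else term


a, b, c = Var('a'), Var('b'), Var('c')
R1_LHS = ('multiply_op', ('add_op', a, b), c)
R1_RHS = ('add_op', ('multiply_op', a, c), ('multiply_op', b, c))

before = ('assign', 'x', ('multiply_op', ('add_op', 'a', 'b'), 'c'))
after = rewrite(before, R1_LHS, R1_RHS)
```

Writing the rule as data, separate from the traversal, is what lets one engine apply many rules.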
12 Structural Differencing to Yield Patches (COMPOSE-HPC)
Goal: Infer rewrite rules from examples of transformation
Before:
    int foo(int a, int b, int c) { int x; x = (a+b)*c; return x; }
After:
    int foo(int a, int b, int c) { int x; x = a*c + b*c; return x; }
Stratego rewrite rule inferred from example code:
    R1 : multiply_op(add_op(STRATEGO_a,STRATEGO_b,binary_op_annotation(type_int,preprocessing_info(list)),gen_info()),STRATEGO_c,binary_op_annotation(type_int,preprocessing_info(LIST)),gen_info())
      -> add_op(multiply_op(STRATEGO_a,STRATEGO_c,binary_op_annotation(type_int,preprocessing_info(list)),gen_info()),multiply_op(STRATEGO_b,STRATEGO_c,binary_op_annotation(type_int,preprocessing_info(list)),gen_info()),binary_op_annotation(type_int,preprocessing_info(list)),gen_info())
    strategies
13 Woven Representation for Differencing (COMPOSE-HPC)
[Figure not preserved in the transcription.]
14 Turning Structural Differences into Rewrite Rules (COMPOSE-HPC)
    R1 : multiply_op(add_op(STRATEGO_a,STRATEGO_b,binary_op_annotation(type_int,preprocessing_info(list)),gen_info()),STRATEGO_c,binary_op_annotation(type_int,preprocessing_info(LIST)),gen_info())
      -> add_op(multiply_op(STRATEGO_a,STRATEGO_c,binary_op_annotation(type_int,preprocessing_info(list)),gen_info()),multiply_op(STRATEGO_b,STRATEGO_c,binary_op_annotation(type_int,preprocessing_info(list)),gen_info()),binary_op_annotation(type_int,preprocessing_info(list)),gen_info())
    strategies
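One simplified way to get from a before/after pair to a rule is sketched below. It assumes both versions are already available as nested tuples (constructor name first, annotations omitted) and that exactly one subtree changed. The functions `diff`, `leaves`, `generalize`, and `infer_rule`, and the `VAR_` prefix, are hypothetical; the woven-representation differencing used by COMPOSE-HPC is not described in the slides and is not reproduced here.

```python
def diff(before, after):
    """Return the smallest pair of differing subterms, or None if equal."""
    if before == after:
        return None
    if (isinstance(before, tuple) and isinstance(after, tuple)
            and len(before) == len(after) and before[0] == after[0]):
        changed = [(b, a) for b, a in zip(before[1:], after[1:]) if b != a]
        if len(changed) == 1:          # the change is confined to one child
            return diff(*changed[0])
    return before, after


def leaves(term):
    """Collect the leaf symbols (identifiers) of a term."""
    if isinstance(term, tuple):
        found = set()
        for child in term[1:]:
            found |= leaves(child)
        return found
    return {term}


def generalize(term, variables):
    """Replace the chosen leaves by pattern variables named VAR_<leaf>."""
    if isinstance(term, tuple):
        return (term[0],) + tuple(generalize(child, variables) for child in term[1:])
    return 'VAR_' + term if term in variables else term


def infer_rule(before, after):
    lhs, rhs = diff(before, after)
    shared = leaves(lhs) & leaves(rhs)  # leaves on both sides become variables
    return generalize(lhs, shared), generalize(rhs, shared)


before = ('func', 'foo',
          ('assign', 'x', ('multiply_op', ('add_op', 'a', 'b'), 'c')),
          ('return', 'x'))
after = ('func', 'foo',
         ('assign', 'x', ('add_op', ('multiply_op', 'a', 'c'), ('multiply_op', 'b', 'c'))),
         ('return', 'x'))

rule_lhs, rule_rhs = infer_rule(before, after)
```

Generalizing only the leaves that occur on both sides keeps the rule from inventing variables that the right-hand side could not bind.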
15 BRAID: Customized Code Generation (COMPOSE-HPC)
- Inputs: SIDL parser; PAUL annotations (under development); ROTE bridge
- Declarative IR (intermediate representation)
  - Based on SIDL
  - Term-based
  - Language-independent
- Template-driven / rule-based transformations
  - IR optimization
  - Typemaps
- Code generators: C, C++, FORTRAN 77, Fortran 90/95, Fortran 2003/08, Python, Java, Chapel
16 Building on KNOT: Transformation Tools (for Exascale) (COMPOSE-HPC)
- Language interoperability (Chapel with C, C++, Fortran, Java)
- Transformation of NWChem dynamic load balancing, from a Global Array Toolkit shared counter to TASCEL task pools with work stealing
- Porting to accelerator-based versions of libraries
- Selective instrumentation to facilitate simulation of exascale systems
- Composing code with different preferences for MPI processes vs. threads
- Interface contracts
  - Verify code after transformations
  - Sanity checks for resilience (detect silent errors)
17 Porting to New Programming Models (COMPOSE-HPC)
- NWChem SCF uses a global shared counter for dynamic load balancing
- Replace with TASCEL task pools with work stealing for load balancing
- Use a series of four KNOT-based transformations
  1. Separate obtaining the next task from what the ID means
  2. Localize task enumeration
  3. Locally filter for sparsity
  4. Introduce TASCEL calls to queue tasks
[Figure: two plots of parallel efficiency vs. # cores comparing Base and Tascel, one per Be-atom problem size. One panel is labeled 352 Be atoms; the other panel's label is incomplete in the transcription.]
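The difference between the two load-balancing schemes can be sketched with a small, deterministic, single-process simulation. Everything here is an assumption made for illustration: the tick-based timing, the task costs, the block distribution of tasks to pools, and the steal-from-the-fullest-pool policy. It does not use NWChem, Global Arrays, or the TASCEL API.

```python
from collections import deque


def shared_counter_schedule(task_costs, num_workers):
    """Every idle worker fetches the next task ID from one global counter."""
    next_task, fetches, time = 0, 0, 0
    busy_until = [0] * num_workers
    while next_task < len(task_costs):
        for w in range(num_workers):
            if busy_until[w] <= time and next_task < len(task_costs):
                busy_until[w] = time + task_costs[next_task]
                next_task += 1
                fetches += 1          # one access to the shared counter per task
        time += 1
    return fetches, max(busy_until)


def work_stealing_schedule(task_costs, num_workers):
    """Workers own local task pools and steal only when their pool is empty."""
    pools = [deque() for _ in range(num_workers)]
    chunk = -(-len(task_costs) // num_workers)   # ceiling division
    for task in range(len(task_costs)):
        pools[task // chunk].append(task)        # block distribution
    busy_until = [0] * num_workers
    executed = {w: [] for w in range(num_workers)}
    steals, time, remaining = 0, 0, len(task_costs)
    while remaining:
        for w in range(num_workers):
            if busy_until[w] > time:
                continue                         # still running a task
            if pools[w]:
                task = pools[w].pop()            # owner works from one end
            else:
                victim = max(range(num_workers), key=lambda v: len(pools[v]))
                if not pools[victim]:
                    continue                     # nothing left to steal
                task = pools[victim].popleft()   # thief takes the other end
                steals += 1
            busy_until[w] = time + task_costs[task]
            executed[w].append(task)
            remaining -= 1
        time += 1
    return executed, steals, max(busy_until)


COSTS = [5, 5, 5, 5, 1, 1, 1, 1]                 # deliberately unbalanced
fetches, counter_makespan = shared_counter_schedule(COSTS, 2)
executed, steals, stealing_makespan = work_stealing_schedule(COSTS, 2)
```

In this toy run the counter version touches the shared counter once per task, while the task-pool version touches another worker's pool only when a worker runs dry. The toy says nothing about real performance, where the cost of the shared counter depends on contention and network latency.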
18 Hercules
- A code translation tool that systematically helps to find patterns and transform codes in applications
- Distinctive features:
  - Infrastructure to manage program analysis information to facilitate the understanding of the application
  - Automates the process of applying transformations multiple times throughout the code base
  - Separation of concerns: application science vs. optimizations
  - Documents the transformation process done by scientists
  - Reusability of transformation workflow
  - Implementation leverages compiler infrastructure and can combine transformations and optimizations
19 Architecture and Implementation (Hercules)
[Figure not preserved in the transcription.]
20 Workflow and Example (Hercules)
[Figure not preserved in the transcription.]
21 HScan: Pattern Detection (Hercules)
- HScan can scan a code base for patterns
  - Identify interesting points
  - General scan of code base for exploration/understanding
- Patterns of interest selected by users at invocation time
- Pattern library (currently) defined by Hercules developers
Examples
- Detection of stencils
    bash-3.2$ hscan --stm-stencil --sln-triangular application.f90
    hscan: (V) --stm-stencil: in application.f90::test02_: array B (regular 1d stencil of 3 points)
    hscan: (V) --stm-stencil: in application.f90::test11_: array A (regular 2d stencil of 5 points)
    hscan: (V) --stm-stencil: in application.f90::test11_: array A (regular 2d stencil of 4 points)
    hscan: (V) --sln-triangular: in application.b::test07_: loops at lines 72 & 74.
- Patterns for GPU porting in CAM/SE climate code
    hscan: (V) --sl-aos: in aquaplanet.b::aquaplanet_init_state.in.aquaplanet: in loops at line 542; arrays: [ELEM]
    hscan: (V) --sl-disjoint-reads-writes: in aquaplanet.b::aquaplanet_init_state.in.aquaplanet: loop at line 492; rset=[qve,tme], wset=[elem]
    hscan: (V) --sl-parallelizable: in aquaplanet.b::aquaplanet_forcing.in.aquaplanet: loop at line 147.
    hscan: (V) --sl-parallelizable: in aquaplanet.b::aquaplanet_init_state.in.aquaplanet: loop at line
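To give a feel for what a stencil pattern detector looks for, here is a toy analogue that scans Python loops with the standard `ast` module (Python 3.9 or later). It is not HScan and shares no code with it: the function `find_1d_stencils`, the three-point threshold, and the restriction to reads of the form `A[i]`, `A[i + k]`, `A[i - k]` are all assumptions made for the sketch.

```python
import ast


def offset_of(index, loop_var):
    """Return k if index is loop_var, loop_var + k, or loop_var - k; else None."""
    if isinstance(index, ast.Name) and index.id == loop_var:
        return 0
    if (isinstance(index, ast.BinOp) and isinstance(index.left, ast.Name)
            and index.left.id == loop_var
            and isinstance(index.right, ast.Constant)
            and isinstance(index.right.value, int)):
        if isinstance(index.op, ast.Add):
            return index.right.value
        if isinstance(index.op, ast.Sub):
            return -index.right.value
    return None


def find_1d_stencils(source, min_points=3):
    """Report (line, array, offsets) for arrays read at several fixed offsets."""
    reports = []
    for loop in ast.walk(ast.parse(source)):
        if not (isinstance(loop, ast.For) and isinstance(loop.target, ast.Name)):
            continue
        offsets = {}
        for node in ast.walk(loop):
            # Only reads count; the array written by the loop is ignored.
            if (isinstance(node, ast.Subscript) and isinstance(node.ctx, ast.Load)
                    and isinstance(node.value, ast.Name)):
                k = offset_of(node.slice, loop.target.id)
                if k is not None:
                    offsets.setdefault(node.value.id, set()).add(k)
        for array, ks in sorted(offsets.items()):
            if len(ks) >= min_points:
                reports.append((loop.lineno, array, sorted(ks)))
    return reports


SRC = """
for i in range(1, n - 1):
    B[i] = A[i - 1] + A[i] + A[i + 1]
for i in range(n):
    C[i] = 2 * A[i]
"""
reports = find_1d_stencils(SRC)
```

Only the first loop is reported, as a three-point stencil on `A`. A compiler-based tool would work on a real intermediate representation and handle multi-dimensional arrays, which this sketch does not attempt.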
22 Spin-Off: Klonos Similarity Analysis
- KLONOS system
  - Facilitates the porting of applications to new platforms
  - Uses supercomputers to analyze applications and create a porting plan
  - Saves months of manual inspection of codes
- Analysis requires 1000s of computational hours and 2 TB of memory
[Figure: workflow from the CAM/SE climate code (simulation code) through Hercules and NCCS supercomputers to tree families of similar codes and a porting plan to OLCF systems. Plots: source code similarity among CAM/SE procedures (procedures vs. procedures) and classification of procedures by degree of similarity.]
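The slides do not say which similarity measure Klonos uses. As a stand-in, the sketch below scores pairs of procedures by Jaccard similarity of their token sets and groups procedures whose score passes a threshold into families. The function names, the threshold of 0.6, and the three Fortran-like snippets are all hypothetical.

```python
import re
from itertools import combinations


def tokens(code):
    """Split code into a set of word and punctuation tokens."""
    return set(re.findall(r'\w+|[^\w\s]', code))


def jaccard(a, b):
    """Size of the intersection over size of the union of two token sets."""
    return len(a & b) / len(a | b) if a | b else 1.0


def families(procedures, threshold):
    """Group procedures linked by a chain of pairs with similarity >= threshold."""
    parent = {name: name for name in procedures}

    def find(name):
        while parent[name] != name:
            name = parent[name]
        return name

    token_sets = {name: tokens(code) for name, code in procedures.items()}
    for a, b in combinations(procedures, 2):
        if jaccard(token_sets[a], token_sets[b]) >= threshold:
            parent[find(a)] = find(b)            # merge the two families

    groups = {}
    for name in procedures:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


PROCEDURES = {
    'init_a': 'do i = 1, n\n  a(i) = 0.0\nend do',
    'init_b': 'do i = 1, n\n  b(i) = 0.0\nend do',
    'norm':   's = sqrt(sum(x * x))',
}
result = families(PROCEDURES, threshold=0.6)
```

The pairwise comparison is quadratic in the number of procedures, which gives some intuition for why an analysis of a full climate code is run on a supercomputer.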
23 Summary and Conclusions
- The HPC community lacks robust, accessible tools to systematically transform code for refactoring and other purposes
- COMPOSE-HPC and Hercules begin to address this
  - Challenging (interesting) underlying problems
  - So far incomplete, but promising
- Trying to get the compiler expert out of the loop
- Potential to capture and share transformations in a systematic and reusable form
- Using novel measures of similarity to compare and comprehend code bases
More informationAMD ACCELERATING TECHNOLOGIES FOR EXASCALE COMPUTING FELLOW 3 OCTOBER 2016
AMD ACCELERATING TECHNOLOGIES FOR EXASCALE COMPUTING BILL.BRANTLEY@AMD.COM, FELLOW 3 OCTOBER 2016 AMD S VISION FOR EXASCALE COMPUTING EMBRACING HETEROGENEITY CHAMPIONING OPEN SOLUTIONS ENABLING LEADERSHIP
More informationThe DEEP (and DEEP-ER) projects
The DEEP (and DEEP-ER) projects Estela Suarez - Jülich Supercomputing Centre BDEC for Europe Workshop Barcelona, 28.01.2015 The research leading to these results has received funding from the European
More informationChristopher Sewell Katrin Heitmann Li-ta Lo Salman Habib James Ahrens
LA-UR- 14-25437 Approved for public release; distribution is unlimited. Title: Portable Parallel Halo and Center Finders for HACC Author(s): Christopher Sewell Katrin Heitmann Li-ta Lo Salman Habib James
More informationSteve Scott, Tesla CTO SC 11 November 15, 2011
Steve Scott, Tesla CTO SC 11 November 15, 2011 What goal do these products have in common? Performance / W Exaflop Expectations First Exaflop Computer K Computer ~10 MW CM5 ~200 KW Not constant size, cost
More informationIBM Spectrum Scale IO performance
IBM Spectrum Scale 5.0.0 IO performance Silverton Consulting, Inc. StorInt Briefing 2 Introduction High-performance computing (HPC) and scientific computing are in a constant state of transition. Artificial
More informationComputing architectures Part 2 TMA4280 Introduction to Supercomputing
Computing architectures Part 2 TMA4280 Introduction to Supercomputing NTNU, IMF January 16. 2017 1 Supercomputing What is the motivation for Supercomputing? Solve complex problems fast and accurately:
More informationThe APGAS Programming Model for Heterogeneous Architectures. David E. Hudak, Ph.D. Program Director for HPC Engineering
The APGAS Programming Model for Heterogeneous Architectures David E. Hudak, Ph.D. Program Director for HPC Engineering dhudak@osc.edu Overview Heterogeneous architectures and their software challenges
More informationProgrammable NICs. Lecture 14, Computer Networks (198:552)
Programmable NICs Lecture 14, Computer Networks (198:552) Network Interface Cards (NICs) The physical interface between a machine and the wire Life of a transmitted packet Userspace application NIC Transport
More informationProject Name. The Eclipse Integrated Computational Environment. Jay Jay Billings, ORNL Parent Project. None selected yet.
Project Name The Eclipse Integrated Computational Environment Jay Jay Billings, ORNL 20140219 Parent Project None selected yet. Background The science and engineering community relies heavily on modeling
More informationSampling Using GPU Accelerated Sparse Hierarchical Models
Sampling Using GPU Accelerated Sparse Hierarchical Models Miroslav Stoyanov Oak Ridge National Laboratory supported by Exascale Computing Project (ECP) exascaleproject.org April 9, 28 Miroslav Stoyanov
More information7 Trends driving the Industry to Software-Defined Servers
7 Trends driving the Industry to Software-Defined Servers The Death of Moore s Law. The Birth of Software-Defined Servers It has been over 50 years since Gordon Moore saw that transistor density doubles
More informationMulti-GPU Scaling of Direct Sparse Linear System Solver for Finite-Difference Frequency-Domain Photonic Simulation
Multi-GPU Scaling of Direct Sparse Linear System Solver for Finite-Difference Frequency-Domain Photonic Simulation 1 Cheng-Han Du* I-Hsin Chung** Weichung Wang* * I n s t i t u t e o f A p p l i e d M
More informationProgramming Models for Supercomputing in the Era of Multicore
Programming Models for Supercomputing in the Era of Multicore Marc Snir MULTI-CORE CHALLENGES 1 Moore s Law Reinterpreted Number of cores per chip doubles every two years, while clock speed decreases Need
More informationHow to write code that will survive the many-core revolution Write once, deploy many(-cores) F. Bodin, CTO
How to write code that will survive the many-core revolution Write once, deploy many(-cores) F. Bodin, CTO Foreword How to write code that will survive the many-core revolution? is being setup as a collective
More informationCOP4020 Programming Languages. Compilers and Interpreters Robert van Engelen & Chris Lacher
COP4020 ming Languages Compilers and Interpreters Robert van Engelen & Chris Lacher Overview Common compiler and interpreter configurations Virtual machines Integrated development environments Compiler
More informationThe Stampede is Coming Welcome to Stampede Introductory Training. Dan Stanzione Texas Advanced Computing Center
The Stampede is Coming Welcome to Stampede Introductory Training Dan Stanzione Texas Advanced Computing Center dan@tacc.utexas.edu Thanks for Coming! Stampede is an exciting new system of incredible power.
More informationThe Future of GPU Computing
The Future of GPU Computing Bill Dally Chief Scientist & Sr. VP of Research, NVIDIA Bell Professor of Engineering, Stanford University November 18, 2009 The Future of Computing Bill Dally Chief Scientist
More informationFrom High Level to Machine Code. Compilation Overview. Computer Programs
From High Level to Algorithm/Model Java, C++, VB Compilation Execution Cycle Hardware 27 October 2007 Ariel Shamir 1 Compilation Overview Algorithm vs. Programs From Algorithm to Compilers vs. Interpreters
More informationExpressing Heterogeneous Parallelism in C++ with Intel Threading Building Blocks A full-day tutorial proposal for SC17
Expressing Heterogeneous Parallelism in C++ with Intel Threading Building Blocks A full-day tutorial proposal for SC17 Tutorial Instructors [James Reinders, Michael J. Voss, Pablo Reble, Rafael Asenjo]
More informationSupercomputing and Mass Market Desktops
Supercomputing and Mass Market Desktops John Manferdelli Microsoft Corporation This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.
More informationAdvanced Computer Networks. End Host Optimization
Oriana Riva, Department of Computer Science ETH Zürich 263 3501 00 End Host Optimization Patrick Stuedi Spring Semester 2017 1 Today End-host optimizations: NUMA-aware networking Kernel-bypass Remote Direct
More informationA Code Transformation Approach to Achieving High Performance Portability
1 A Code Transformation Approach to Achieving High Performance Portability SPPEXA Annual Plenary Meeting Jan 25, 2016@Leibniz Supercomputer Center Hiroyuki TAKIZAWA (Tohoku University/JST) Daisuke TAKAHASHI,
More information