GPPD Grupo de Processamento Paralelo e Distribuído Parallel and Distributed Processing Group
Transcription
1 GPPD: Grupo de Processamento Paralelo e Distribuído (Parallel and Distributed Processing Group)
Philippe O. A. Navaux
HPC and New Architectures
CPTEC, Cachoeira Paulista, SP, March 8, 2017
2 Team
Professor: Philippe O. A. Navaux
Post-Doc: Francieli Z. Boito (ended 11/2015), Marco A. Zanata Alves (ended 3/2016), Matthias Diener, Eduardo Cruz
Master Students: Jean Bez, Jimmy Sanchez, Matheus Serpa
PhD Students: Daniel Oliveira, Eduardo Roloff, Edson Padoin, Emmanuell Carreño, Francis Birck Moreira, Rafael Tesser, Rodrigo Kassick, Victor Abaunza
3 Research Areas
#1: Memory Hierarchy Optimization
- Data and Thread Mapping
- Scheduling
- Placement
- Memory Request Scheduling
People: A. Carissimi, E. Cruz, F. Moreira, M. Diener, P. Navaux
#2: Multi-core Architecture and Power Consumption Optimization
- Automatic frequency control in heterogeneous systems
- Energy efficiency of cache memories
People: E. Padoin, M. Alves, P. Navaux
#3: I/O Optimization
- Application-Guided I/O Scheduling
- Dynamic I/O Reconfiguration
People: Francieli Zanon Boito, Jean Bez, Rodrigo Kassick, P. Navaux
4 Research Areas (2)
#4: Software errors on HPC architectures
- Fault Tolerance Techniques Efficiency
- Radiation Sensitivity
People: Daniel Oliveira, Luigi Carro, Paolo Rech, Philippe Navaux
#5: Applications in HPC Systems
- Dynamic Load Balancing
- Distributed Systems Applications
- GPU implementation
- Models and Metrics
People: Rafael Tesser, Victor Abaunza, Philippe Navaux
#6: Cloud Computing (including IoT and Big Data)
- HPC in Cloud
- Data Intensive Analysis
People: A. Carissimi, E. Roloff, Emmanuell Carreño, J. Sanchez, P. Navaux
5 #1 Affinity-based thread and data mapping
Objective: Improve performance and energy efficiency of memory accesses by:
1. executing threads that access the same data close to each other in the memory hierarchy (thread mapping)
2. placing each memory page on the memory controller of the node whose threads access it most (data mapping)
Previous Results: speedup of up to 4x using online mechanisms
Papers:
- Matthias Diener et al. kMAF: Automatic Kernel-Level Management of Thread and Data Affinity. PACT.
- Eduardo Cruz et al. Dynamic thread mapping of shared memory applications by exploiting cache coherence protocols. JPDC 2014.
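The thread-mapping objective on slide 5 can be illustrated with a small sketch. It is not the kMAF mechanism or the method of either cited paper; it is a plain greedy heuristic, and the communication matrix `comm` and the function name `greedy_thread_pairs` are hypothetical. It assumes each pair of cores shares a cache and pairs up the threads that share the most data.

```python
def greedy_thread_pairs(comm):
    """Pair threads so that heavily communicating threads sit together.

    comm[i][j] is a hypothetical symmetric count of accesses by threads
    i and j to the same data. Returns a list of (i, j) thread pairs, each
    meant to be placed on two cores that share a cache.
    """
    n = len(comm)
    # All candidate pairs, heaviest communication first; ties broken by index.
    candidates = sorted(
        ((comm[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda c: (-c[0], c[1], c[2]),
    )
    placed = set()
    pairs = []
    for _weight, i, j in candidates:
        # Take a pair only if both threads are still unplaced.
        if i not in placed and j not in placed:
            pairs.append((i, j))
            placed.update((i, j))
    return pairs


# Hypothetical sharing pattern: threads 0/1 and 2/3 communicate heavily.
comm = [
    [0, 9, 1, 0],
    [9, 0, 0, 2],
    [1, 0, 0, 7],
    [0, 2, 7, 0],
]
pairs = greedy_thread_pairs(comm)
print(pairs)
```

A greedy pairing is only one possible policy; an online mechanism would also have to detect the sharing pattern at run time, which this sketch takes as given.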
6 #2: Power for Exascale
DVFS and load balancing: EnergyLB
[Figure panels: Ondes, Lulesh]
Applying DVFS and load balancing during execution reduces total energy consumption by up to:
- Ondes3D: 11% over ScotchLB and 8.7% over GreedyLB
- Lulesh: 10% over ScotchLB and 6.3% over GreedyLB
7 #2: Power for Exascale
ARM Processor (+GPU) in HPC
Objective: Evaluate the energy efficiency of a seismic model on a low-power heterogeneous architecture.
o Jetson TK1 (heterogeneous cores)
o Seismic model: Ondes 3D (BRGM - France)
Previous Results: ARM Cortex-A15 + NVIDIA GK20a GPU (Jetson TK1) compared to:
o 2 Intel Xeon E Tesla M2075 (2.99x more efficient)
o 1 Intel i Tesla K20c (5.82x more efficient)
Víctor Martínez, et al. Task-based programming on low-power Nvidia Jetson TK1 manycore architecture: Application to earthquake modeling. Latin America High Performance Computing Conference (CARLA 2015), 2015.
8 #3: Storage for Exascale
Parallel I/O for HPC
Objective: To provide scalable high performance I/O for HPC architectures
Research topics:
- I/O scheduling for parallel file systems
- I/O scheduling in the I/O forwarding layer
- Pattern matching for access pattern detection (collaboration with Barcelona Supercomputing Center)
- Low-power storage servers
Boito, F. et al. Automatic I/O Scheduling Algorithm Selection for Parallel File Systems. Concurrency and Computation: Practice and Experience. Wiley.
Boito, F. et al. AGIOS: Application-Guided I/O Scheduling for Parallel File Systems. International Conference on Parallel and Distributed Systems (ICPADS), 2013.
9 #3: Coordinate Access to Parallel File System Servers
Objective: Coordinate the access of I/O nodes to the data servers to reduce contention
TWINS Scheduler: We designed a scheduler that uses time windows to coordinate the I/O nodes' accesses to different data servers
Results: Improves read performance for shared files by up to 28% over alternatives and by up to 50% over not forwarding I/O requests
Bez et al., TWINS: Server Access Coordination in the I/O Forwarding Layer, in Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP), 2017.
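The time-window idea on slide 9 can be sketched as follows. This is not the published TWINS algorithm: the round-robin rotation rule, the function names, and the request format are all assumptions made for illustration. In each window, every I/O node is allowed to contact one data server, and the assignment rotates so that nodes do not all hit the same server at once.

```python
def allowed_server(io_node, now_ms, window_ms, num_servers):
    """Data server that io_node may contact during the current time window.

    Assumed rule (hypothetical): rotate round-robin, offset by the node id,
    so different I/O nodes target different servers in the same window.
    """
    window_index = now_ms // window_ms
    return (io_node + window_index) % num_servers


def split_queue(pending, io_node, now_ms, window_ms, num_servers):
    """Return (dispatch_now, keep_waiting) for one I/O node's pending requests."""
    target = allowed_server(io_node, now_ms, window_ms, num_servers)
    dispatch_now = [r for r in pending if r["server"] == target]
    keep_waiting = [r for r in pending if r["server"] != target]
    return dispatch_now, keep_waiting


# Hypothetical queue on I/O node 0, with two data servers and 250 ms windows.
pending = [
    {"id": "a", "server": 0},
    {"id": "b", "server": 1},
    {"id": "c", "server": 0},
]
now, later = split_queue(pending, io_node=0, now_ms=100, window_ms=250, num_servers=2)
print([r["id"] for r in now], [r["id"] for r in later])
```

Integer millisecond timestamps are used so the window index is exact; the window length would be a tuning parameter in any real scheduler.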
10 #3: Towards Energy-Efficient Storage Servers in HPC
Objective: Evaluate the viability of using low-power architectures as file system servers
Data servers:
- Processing power is less important
- ARM processors as an alternative
Experiments with:
- Representative access patterns
- Hou10ni application
Results: Replacing one regular data server with two ARM boards would double the bandwidth and decrease energy consumption by 85% without compromising performance, especially for read-intensive workloads
Machado et al., Towards Energy-Efficient Storage Servers, in the 32nd ACM Symposium on Applied Computing (SAC), 2017.
11 #4: Transient errors on HPC architectures
Objective: Evaluation and mitigation of radiation-induced errors in HPC.
- We perform radiation experiments in Los Alamos and Didcot to measure the error rate of Xeon-Phi, K40, APU (CPU+GPU), TK1, etc.
- We design experimentally-tuned mitigation strategies
[Figure labels: neutrons; UFRGS setup at LANSCE, Los Alamos. Nov]
Previous Results: Predicted and validated the Titan radiation-induced error rate (HPCA2015); designed Algorithm-Based Fault Tolerance for MxM and FFT, and Duplication With Comparison for other codes (Trans. Comp. 2015). We have, for the first time, compared the error rate of Xeon-Phi and K40 (submitted to SELSE2016).
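Algorithm-Based Fault Tolerance for matrix multiplication (MxM), mentioned on slide 11, is commonly built on row and column checksums. The sketch below shows that general checksum technique, not the group's tuned implementation; the function names are hypothetical, and integers are used so that checksums compare exactly (floating-point codes would need a tolerance).

```python
def matmul(a, b):
    """Plain triple-loop matrix product on lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def abft_matmul(a, b):
    """Multiply a column-checksummed A by a row-checksummed B.

    The result holds C = A*B plus an extra row of column sums and an
    extra column of row sums, computed by the multiplication itself.
    """
    a_c = a + [[sum(col) for col in zip(*a)]]   # append column-sum row to A
    b_r = [row + [sum(row)] for row in b]       # append row-sum column to B
    return matmul(a_c, b_r)


def find_error(c_full):
    """Return (row, col) of a single corrupted data element, or None."""
    n, m = len(c_full) - 1, len(c_full[0]) - 1
    bad_rows = [i for i in range(n) if sum(c_full[i][:m]) != c_full[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c_full[i][j] for i in range(n)) != c_full[n][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return (bad_rows[0], bad_cols[0])
    raise ValueError("mismatch is not a single data-element error")


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
clean = abft_matmul(a, b)
faulty = [row[:] for row in clean]
faulty[1][0] += 4            # inject a simulated bit-flip into C[1][0]
print(find_error(clean), find_error(faulty))
```

Once the faulty element is located, it can be corrected from the difference between the recomputed row sum and the stored checksum, which is what makes this cheaper than full duplication for MxM.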
12 #5: Software for Exascale: Improving scheduling on heterogeneous architectures
Objective: Improve scheduling on heterogeneous architectures:
1. Splitting a seismic model called Ondes 3D (BRGM - France) into tasks.
2. Executing parallel tasks on heterogeneous architectures (CPU+accelerators) using all available cores for computing.
Previous Results: Maximum speedup
o Only accelerator cores, if the simulation fits in memory (in-core): up to 7x.
o All cores (CPU+accelerator), if the simulation doesn't fit in memory (out-of-core): up to 25x.
Víctor Martínez, et al. Towards seismic wave modeling on heterogeneous many-core architectures using task-based runtime system. SBAC-PAD 2015.
13 #6 Cloud Computing
Objective: Study of Cloud as an environment for HPC
1. Study the cost-efficiency of public clouds for HPC.
2. Port HPC applications to the cloud.
3. Improve the performance of cloud resources using task mapping
Previous Results:
- Comprehensive model for cost-efficiency (CLOUD 2012, CloudCom 2012 and book chapter)
  o Tested using Azure, EC2 and Rackspace
- BRAMS (weather prediction) ported to Azure (ICCS 2015)
  o Several improvements made using cloud features
- Task mapping improves performance by up to 40% (CCGRID 2016)
14 H2020 Project Participation WP2 Disruptive Exascale Computer Architecture
15 General Organization
16 MAIN COLLABORATIONS
Columns: Tasks | Partners | Tasks | Transversal WPs | Deliverables
Memory Pages Mapping UFRGS, LNCC, INRIA 2.2
Full Waveform Inversion BSC, COPPE, 2.1, 2.2, 2.3, WP6 UFRGS, LNCC, INRIA , 2.2, 2.3, 2.4, 2.6
Acoustic Propagation on GPUs ITA, UFRGS, Petrobras, BSC WP6 2.1, 2.3
Elastic Propagation on Intel's Architectures BSC, REPSOL, LNCC, 2.1, 2.2, 2.4 COPPE WP6 2.1, 2.2, 2.4
BOAST Kernels for ALYA INRIA, BSC, UFRGS 2.1, 2.2, 2.4 WP4, WP5 2.1, 2.2, 2.3, 2.4
GPU Kernels for ALYA BSC, COPPE, ITA 2.1, 2.4 WP4, WP5 2.1, 2.3, 2.4
Radiation-induced Error Criticality UFRGS, BSC 2.1, 2.4 WP4, WP5, WP6 2.1
Porting libmesh to MontBlanc COPPE, BSC 2.1, 2.3 WP3, WP 2.1, 2.3 WP4, WP5, WP6 2.2
17 Thanks!
GPPD: Grupo de Processamento Paralelo e Distribuído (Parallel and Distributed Processing Group)
HPC4E Project: Research has received funding from the EU H2020 Programme and from MCTI/RNP-Brazil under grant agreement n
More informationPorting CPU-based Multiprocessing Algorithms to GPU for Distributed Acoustic Sensing
GTC2014 S4470 Porting CPU-based Multiprocessing Algorithms to GPU for Distributed Acoustic Sensing Steve Jankly Halliburton Energy Services, Inc. Introduction Halliburton Halliburton is one of the world
More informationX10 specific Optimization of CPU GPU Data transfer with Pinned Memory Management
X10 specific Optimization of CPU GPU Data transfer with Pinned Memory Management Hideyuki Shamoto, Tatsuhiro Chiba, Mikio Takeuchi Tokyo Institute of Technology IBM Research Tokyo Programming for large
More informationTowards Exascale Programming Models HPC Summit, Prague Erwin Laure, KTH
Towards Exascale Programming Models HPC Summit, Prague Erwin Laure, KTH 1 Exascale Programming Models With the evolution of HPC architecture towards exascale, new approaches for programming these machines
More informationDescription of Power8 Nodes Available on Mio (ppc[ ])
Description of Power8 Nodes Available on Mio (ppc[001-002]) Introduction: HPC@Mines has released two brand-new IBM Power8 nodes (identified as ppc001 and ppc002) to production, as part of our Mio cluster.
More informationBarcelona Supercomputing Center
www.bsc.es Barcelona Supercomputing Center Centro Nacional de Supercomputación EMIT 2016. Barcelona June 2 nd, 2016 Barcelona Supercomputing Center Centro Nacional de Supercomputación BSC-CNS objectives:
More informationExploring Emerging Technologies in the Extreme Scale HPC Co- Design Space with Aspen
Exploring Emerging Technologies in the Extreme Scale HPC Co- Design Space with Aspen Jeffrey S. Vetter SPPEXA Symposium Munich 26 Jan 2016 ORNL is managed by UT-Battelle for the US Department of Energy
More informationMOHA: Many-Task Computing Framework on Hadoop
Apache: Big Data North America 2017 @ Miami MOHA: Many-Task Computing Framework on Hadoop Soonwook Hwang Korea Institute of Science and Technology Information May 18, 2017 Table of Contents Introduction
More informationSCALABILITY AND HETEROGENEITY MICHAEL ROITZSCH
Faculty of Computer Science Institute of Systems Architecture, Operating Systems Group SCALABILITY AND HETEROGENEITY MICHAEL ROITZSCH LAYER CAKE Application Runtime OS Kernel ISA Physical RAM 2 COMMODITY
More informationMemory Footprint of Locality Information On Many-Core Platforms Brice Goglin Inria Bordeaux Sud-Ouest France 2018/05/25
ROME Workshop @ IPDPS Vancouver Memory Footprint of Locality Information On Many- Platforms Brice Goglin Inria Bordeaux Sud-Ouest France 2018/05/25 Locality Matters to HPC Applications Locality Matters
More informationHigh performance Computing and O&G Challenges
High performance Computing and O&G Challenges 2 Seismic exploration challenges High Performance Computing and O&G challenges Worldwide Context Seismic,sub-surface imaging Computing Power needs Accelerating
More informationApplications of Berkeley s Dwarfs on Nvidia GPUs
Applications of Berkeley s Dwarfs on Nvidia GPUs Seminar: Topics in High-Performance and Scientific Computing Team N2: Yang Zhang, Haiqing Wang 05.02.2015 Overview CUDA The Dwarfs Dynamic Programming Sparse
More informationAnalyses, Hardware/Software Compilation, Code Optimization for Complex Dataflow HPC Applications
Analyses, Hardware/Software Compilation, Code Optimization for Complex Dataflow HPC Applications CASH team proposal (Compilation and Analyses for Software and Hardware) Matthieu Moy and Christophe Alias
More information