: A new version of Supercomputing or life after the end of the Moore s Law

Similar documents
Cognitive-based Computation, Semantic Understanding, and Web Wisdom

IME (Infinite Memory Engine) Extreme Application Acceleration & Highly Efficient I/O Provisioning

Computing architectures Part 2 TMA4280 Introduction to Supercomputing

The Future of High Performance Computing

MPI RUNTIMES AT JSC, NOW AND IN THE FUTURE

ENERGY-EFFICIENT VISUALIZATION PIPELINES A CASE STUDY IN CLIMATE SIMULATION

High Performance Data Analytics for Numerical Simulations. Bruno Raffin DataMove

Green Supercomputing

Mellanox InfiniBand Solutions Accelerate Oracle s Data Center and Cloud Solutions

China Big Data and HPC Initiatives Overview. Xuanhua Shi

Practical Scientific Computing

IBM Power Systems HPC Cluster

Scaling to Petaflop. Ola Torudbakken Distinguished Engineer. Sun Microsystems, Inc

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical

Advances of parallel computing. Kirill Bogachev May 2016

Leveraging Software-Defined Storage to Meet Today and Tomorrow s Infrastructure Demands

GPU-centric communication for improved efficiency

NERSC Site Update. National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory. Richard Gerber

The Future of High- Performance Computing

System Software for Big Data and Post Petascale Computing

Data Movement & Tiering with DMF 7

The Future of High- Performance Computing

Cray XC Scalability and the Aries Network Tony Ford

Intel Xeon Phi архитектура, модели программирования, оптимизация.

Motivation Goal Idea Proposition for users Study

Emerging Technologies for HPC Storage

AIST Super Green Cloud

Practical Scientific Computing

Performance analysis tools: Intel VTuneTM Amplifier and Advisor. Dr. Luigi Iapichino

Inauguration Cartesius June 14, 2013

Intel profiling tools and roofline model. Dr. Luigi Iapichino

Next-Generation Cloud Platform

Paving the Road to Exascale

Data Intensive Scalable Computing

A New NSF TeraGrid Resource for Data-Intensive Science

Overview of Tianhe-2

Fujitsu s Approach to Application Centric Petascale Computing

High Performance Computing. What is it used for and why?

GPUs and Emerging Architectures

KNL tools. Dr. Fabio Baruffa

Shared Services Canada Environment and Climate Change Canada HPC Renewal Project

Results from TSUBAME3.0 A 47 AI- PFLOPS System for HPC & AI Convergence

Pervasive DataRush TM

Exascale: challenges and opportunities in a power constrained world

The way toward peta-flops

Brand-New Vector Supercomputer

HiTune. Dataflow-Based Performance Analysis for Big Data Cloud

SGI Overview. HPC User Forum Dearborn, Michigan September 17 th, 2012

Modernize Your IT with Dell EMC Storage and Data Protection Solutions

Data Intensive Computing SUBTITLE WITH TWO LINES OF TEXT IF NECESSARY PASIG June, 2009

Revealing Applications Access Pattern in Collective I/O for Cache Management

Toward Automated Application Profiling on Cray Systems

Performance and Energy Usage of Workloads on KNL and Haswell Architectures

Aim High. Intel Technical Update Teratec 07 Symposium. June 20, Stephen R. Wheat, Ph.D. Director, HPC Digital Enterprise Group

Overview. Idea: Reduce CPU clock frequency This idea is well suited specifically for visualization

Cycle Sharing Systems

Leveraging Flash in HPC Systems

Secure, scalable storage made simple. OEM Storage Portfolio

High Performance Computing. What is it used for and why?

What are Clusters? Why Clusters? - a Short History

Analyzing the Performance of IWAVE on a Cluster using HPCToolkit

Introduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620

Best Practices for Setting BIOS Parameters for Performance

HPC Storage Use Cases & Future Trends

Toward portable I/O performance by leveraging system abstractions of deep memory and interconnect hierarchies

Technologies for High Performance Data Analytics

The Fusion Distributed File System

EARLY EVALUATION OF THE CRAY XC40 SYSTEM THETA

DDN s Vision for the Future of Lustre LUG2015 Robert Triendl

Piz Daint: Application driven co-design of a supercomputer based on Cray s adaptive system design

Introduction CPS343. Spring Parallel and High Performance Computing. CPS343 (Parallel and HPC) Introduction Spring / 29

VxRack FLEX Technical Deep Dive: Building Hyper-converged Solutions at Rackscale. Kiewiet Kritzinger DELL EMC CPSD Snr varchitect

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.

MDHIM: A Parallel Key/Value Store Framework for HPC

Fast Forward I/O & Storage

HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA

Performance Matters Scaling Integration Processes to Meet the Needs of Your Business. James Ahlborn, Chief Software Architect, Dell Boomi

HPC Technology Trends

NetApp: Solving I/O Challenges. Jeff Baxter February 2013

CSCI-GA Multicore Processors: Architecture & Programming Lecture 3: The Memory System You Can t Ignore it!

Rechenzentrum HIGH PERFORMANCE SCIENTIFIC COMPUTING

HPC Saudi Jeffrey A. Nichols Associate Laboratory Director Computing and Computational Sciences. Presented to: March 14, 2017

The Software Driven Datacenter

Analyzing I/O Performance on a NEXTGenIO Class System

Accelerating Hadoop Applications with the MapR Distribution Using Flash Storage and High-Speed Ethernet

Arm Processor Technology Update and Roadmap

MOHA: Many-Task Computing Framework on Hadoop

Complexity and Advanced Algorithms. Introduction to Parallel Algorithms

Building supercomputers from embedded technologies

Performance Analysis of Parallel Scientific Applications In Eclipse

High performance computing and numerical modeling

Medical practice: diagnostics, treatment and surgery in supercomputing centers

Cloud Programming. Programming Environment Oct 29, 2015 Osamu Tatebe

TECHNOLOGIES CO., LTD.

The FlashStack Data Center

Increasing Performance of Existing Oracle RAC up to 10X

HPC Architectures. Types of resource currently in use

Introduction: Modern computer architecture. The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes

Performance Analysis and Prediction for distributed homogeneous Clusters

Software and Tools for HPE s The Machine Project

Transcription:

: A new version of Supercomputing or life after the end of the Moore s Law Dr.-Ing. Alexey Cheptsov SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov

OUTLINE About us Convergence of Supercomputing into Big Data Semantic Web as a new HPC domain? Mission Data-Centric Parallel Programming Models DreamCloud project approach SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 2

1 About HLRS Baden-Württemberg Turnover: ~1-2 bil. Staff: ~8000 Turnover: ~50 bil. Staff: ~300.000 Turnover: ~100 bil. Staff: 260.100 Turnover: ~5 bil. Staff: ~50.000 Turnover: ~2,5 bil. Staff: ~11.00 Turnover: ~14 bil. Staff: ~19.000 SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 3

1 About HLRS High Performance Computing Center Stuttgart First Cray system in Europe (Cray-2, 1986, 4 CPUs, 2GB RAM, approx. 2 GFLOPS) National HPC infrastructure provider since 1995 EU infrastructure provider since 2005 110M core hours delivered to industry in 2014 HORNET (Cray XC40, Intel Haswell CPU, Aries network) - 4.000 nodes (24 cores) - 4 PFLOPs performance - 128 GB RAM per node - 7,8 PB Disc - 1512 KW power consumption / 1.5M Euro SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 4

1 About HLRS TOP500 Total: - USA 233 - Japan - 39 - Germany 37 - China - 37 Newcomers in 2015: - USA 34 - Germany 12 - Japan - 11 SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 5

1 About HLRS XXL Application Portfolio Climate simulation on a high (a few kms) resolution University of Hohenheim 84.000 cores, 84 hours, 450 TB data Turbulent flow simulation for air- and gas dynamic analysis RWTH Aachen University 92.000 cores, 110 hours, 80 TB data See more at: http://www.gauss-centre.eu/gauss-centre/en/projects/xxl_projects_hornet/xxl_projects_hornet.html?nn=1236240 SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 6

1 About HLRS Maya, the bee ( みつばちマーヤの冒険 ) 115.000 pictures 2 hours each 200 pictures in parallel From 9600 days to 48 days 400.000 viewers in 2 weeks SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 7

1 About HLRS Challenges for New-Generation Systems Computation power Exascale is on his way, perhaps by 2020 More performance for less money Hazelhehn - approx. 8 PTFLOPs / 3M Euro power costs, approx. 1 PB RAM Sustainable performance - 1 PTFLOP Storage Spinning devices will go away by 2020 Flash is the core technology (aka NVM) Tapes will remain for data persistency Memory Source: http://insidehpc.com/2014/05/nvm-will-shake-supercomputing/ both high memory capacity and high memory bandwidth are required cannot guarantee same growth as for FLOPS extremely deep hierarchy (at the cache level) programming ease 3D stacked memory, NVM SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 8

OUTLINE About us Convergence of Supercomputing into Big Data Semantic Web as a new HPC domain? Mission Data-Centric Parallel Programming Models DreamCloud project approach SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 9

2 Convergence of Big Data into Supercomputing The modern HPC have to address a new class of computing-intensive applications from data-intensive domains in the internet, media, business, science, etc. Evolution of Computational Applications Traditional Computational Sciences Data-Intensive Sciences Structure static dynamic Locality spatially and temporally local no or little Volume fit into memory do not fit into memory Arithmetical Complexity High Precision Arithmetic variable precision or integer based SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 10

2 Convergence of Big Data into Supercomputing Hardware Infrastruktures e-services Applications Semantic Web Linked Data Data Facts Internet Intranet Information Knowledge Cloud Data Center HPC SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 11

OUTLINE About us Convergence of Supercomputing into Big Data Semantic Web as a new HPC domain? Mission Data-Centric Parallel Programming Models DreamCloud project approach SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 12

3 Semantic Web as a New HPC Domain? What are the challenges Infrastructure on demand - distributed memory parallel clusters with low-latency intercon. - multicore machines with shared memory - GPGPU devices - alltogether? SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 13

3 Semantic Web as a New HPC Domain? What are the challenges A Semantic Web Integration Platform for Large-Scale Reasoning Query (SPARQL) Selection Identification Reasoner Answer (RDF) SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 14

3 Semantic Web as a New HPC Domain? Development Platforms The idea of LarKC LarKC = an infrastructure for large scale, high performance incomplete reasoning Decider Selecter 1 Query Transformer Identifier Reasoner Selecter 2 Flexibility, Modularity Scalability, Performance 15 SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 15

3 Semantic Web as a New HPC Domain? Development Platforms LarKC architecture: High-Level Overview Workflow branching Plug-in parallelisation Query Identifier Workflow branching Decider Selecter Selecter Process 1 Reasoner Process 2 Process 3 Process 4 Multi-Threading MPI Map-Reduce SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 16

3 Semantic Web as a New HPC Domain? Development Platforms LarKC architecture: High-Level Overview Plug-in developers Workflow designers Semantic Web Service Farm / / Plug-in Marketplace Deciders Identifiers Transformers Plug-in Plug-in Selectors Reasoners Plug-in Plug-in Registry Plug-in Managers Workflow Support System End-points LarKC platform Execution Framework Data Layer (OWLIM) Remote Invocation Framework (GAT) Monitoring Service Application end-users Infrastructure RDF data base Storage Highperformance computer Computation Monitoring SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 17

OUTLINE About us Convergence of Supercomputing into Big Data Semantic Web as a new HPC domain? Mission Data-Centric Parallel Programming Models DreamCloud project approach SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 18

4 Data-Centric Parallel Programming Models State-Of-The-Art: Distributed Memory Processing over FS Programming models: MapReduce data-centric fault-tolerant poor sustainable performance restrictive key/value model SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 19

4 Data-Centric Parallel Programming Models The Message-Passing Interface Data-driven scenarios with MPI easy to integrate high performance SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 20

4 Data-Centric Parallel Programming Models The Message-Passing Interface Issues lack of implementations for the programming languages typically used in the data-centric communities, such as Java Java implementations (MPJExpress) native implementations (mpijava) full MPI-2 standard implementation issues with supporting new high-speed interconnects (e.g. Cray), related to the JVM scalability to a peta/exaflop? support of native tools for parallel computing, i.e. debuggers, error detectors, etc. JNI is used for calling communication libraries that are available in native codes (i.e. highly optimized MPI comm.) integration with a native MPI library is not easy but if you got it running, very enjoyable performance SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 21

4 Data-Centric Parallel Programming Models The Message-Passing Interface mpijava Architecture import mpi.*; Applications Java wrappers JNI C Interface Java MPI bindings Open MPI MPICH Native (C) MPI implementation SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 22

4 Data-Centric Parallel Programming Models The Message-Passing Interface Java bindings for Open MPI (ompijava) in the Open MPI s trunk! SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 23

4 Data-Centric Parallel Programming Models Developments @ HLRS ompijava performance P2P communication SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 24

4 Data-Centric Parallel Programming Models Application Scenarios Random Indexing for Large Texts A Statistical Distribution technique for word/text similarity analysis Docs Subsetting Terms Occurance vector Occurance vector Similarity index Query Expansion SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 25

OUTLINE About us Convergence of Supercomputing into Big Data Semantic Web as a new HPC domain? Mission Data-Centric Parallel Programming Models DreamCloud project approach SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 26

5 DreamCloud Project Approach Problem statement Application I/O bound Communication-bound HPC Resource Manager Infrastructure RAM ccnuma CPU NUMA node NUMA node I/O SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 27

5 DreamCloud Project Approach DreamCloud solution Application Scalable Low overhead Flexible Unified Performance Monitoring Framework Dynamic Scheduler Infrastructure SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 28

5 DreamCloud Project Approach Available application profiling tools Valgrind general analysis Likwid energy/power consumption Vampir/Paraver communication Wireshark networking infrastructure Infrastructure Monitoring Frameworks Zabbix Nagious Excess Problem: A very high user s involvement Key solution: Consolidation and Integration! SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 29

5 DreamCloud Project Approach Consolidated View Legenda: communication streams Total execution time Total energy consumption High-level view to be targeted by D3.2 Communicationintensive tasks 1 2 I/O-intensive tasks 4 Memoryintensive tasks 5 Pattern-based view 3 Disk RAM Detailed view SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 30

5 DreamCloud Project Approach Architecture Querying Interface (RESTful) Monitoring Server (Elastic Search) EXCESS monitoring agent PAPI RAPL... Node 1 3rd-party tools... Node N SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 31

5 DreamCloud Project Approach Performance Profiles Static (prior to the execution) Dynamic (collected at runtime) Performance Counters (PAPI etc.) Energy Counters (RAPL etc.) User-defined ones (e.g. progress tracker) SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 32

5 DreamCloud Project Approach Basic Metrics SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 33

5 DreamCloud Project Approach Power consumption SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 34

5 DreamCloud Project Approach Performance Profiles Representation in DreamCloud Performance Profiles(Example in CSV) timestamp;node;task;papi_tot_ins:cpu0;papi_tot_ins:cpu1 1418120193.4880380;node_1[perf];task_1;516680185;663789 1418120194.4996216;node_2[perf];task_1;663789;663714 1418120196.5121722;node_3[balanced];task_1;3805840;663405 1418120193.4880380;node_3[energy];task_2_1;516680185;623687 1418120194.4996216;node_9[perf];task_2_1;663789;655418 1418120196.5121722;node_3[perf];task_2_2;3805840;663789 1418120194.4996216;node_9[perf];task_2_2;663789;699712 Visualization SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 35

5 DreamCloud Project Approach Molecular Dynamics (MD) Simulation Code MS2 Simulation of movement of atoms and molecules Massively parallel, on a per-particle basis Persistent Task Adaptation Adaptation Phase Phase Adapt State Generation Generation Phase Phase Generate Individuals Evaluation Evaluation Phase Phase Generate Input Data Sets Compute Intensive (MPI) Exit Check Termination MS 2 MS 2 MS 2 MS 2 Rank Individuals Calculate Fitness Values SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 36

5 DreamCloud Project Approach Submission Request jdl Tasks Description Workflow Description Optimization Criteria Application 1 Submission Interface Deployment Workflow Manager RTE Scheduling Interface Progress Tracker 8 9 10 Resource Manager Scheduling 2 Request Deployment 7 Plan 3 4 Scheduling Advisor 5 6 Heuristics Module Node Node Node Node Node Node Node Node Node Node Monitoring Framework Performance Profiles Infrastructure SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 37

6 Conclusion Main Results HPC is going to face new challenges related to data-centric application expansion. Parallel programming models (mainly MapReduce and MPI) are the key enablers of HPC to data-centric applications Reaching near-peak performance is going to be the major challenge Future Work Promote existing technologies, such as MPI, to solving new challenges, such as Big Data. Making existing framework more data-centric. SEMAPRO 2015 :: 21.07.2015 :: Dr. Alexey Cheptsov 38