MPI Performance Analysis Trace Analyzer and Collector

1 MPI Performance Analysis: Trace Analyzer and Collector
Berk ONAT, İTÜ Bilişim Enstitüsü, 19 June 2012

2 Outline
o MPI Performance Analyzing
o Definitions: Profiling
o Definitions: Tracing
o Intel Trace Analyzer
o Lab: How to use ITAC

3 Performance Problems
o Scalability
o Productivity
o Efficiency
o Performance technology
o Application-specific and automatic performance tools
o Cluster analysis

4 Role of Programmer
How should we write our programs, given that we have a good optimizing compiler?
Write simpler code:
o Easy to read,
o Easy to maintain, and
o Ensure correctness.
Do:
o Select the best algorithm
o Write code that's readable & maintainable
o Eliminate optimization blockers: allow the compiler to do its job
o Focus on inner loops: use a profiler and an analyzer to find the most time-consuming ones

5 Definitions
Profiling: recording of summary information during execution
o Inclusive and exclusive time, number of calls, hardware statistics (hardware counters)
Reflects performance behaviour of program entities
o Functions, loops, basic blocks
o User-defined semantic entities
Helps to expose performance bottlenecks and hotspots
Implemented through:
o Sampling: periodic OS interrupts or hardware counter traps
o Instrumentation: direct insertion of measurement code

6 Definitions
Profile Terminology
Routine: int main()
Inclusive time: 100 secs
Exclusive time: 100 - 20 - 50 - 20 = 10 secs
Number of calls: 1 call
Number of subroutines (child routines called): 3
Inclusive time/call: 100 secs
int main( )
{ /* takes 100 secs */
    f1(); /* takes 20 secs */
    f2(); /* takes 50 secs */
    f1(); /* takes 20 secs */
    /* other work */
}
/* Time can be replaced by counts */

7 Definitions
Tracing: recording of information about significant points (events) during program execution
o Entering/exiting code regions (function, loop, block, ...)
o Thread/process interactions (e.g., send/receive message)
Save information in an event record
o Timestamp
o CPU identifier, thread identifier
o Event type and event-specific information
An event trace is a time-sequenced stream of event records
Can be used to reconstruct dynamic program behavior
Typically requires code instrumentation

8 Definitions
Event Tracing: Instrumentation, Monitor, Trace

9 Definitions
Event Tracing: Timeline Visualization

10 Intel Trace Analyzer and Collector
Intel Trace Analyzer and Collector provides information critical to understanding and optimizing MPI cluster performance by quickly finding performance bottlenecks with MPI communication.
o Interface and Displays
o Metrics Tracking
o Scalability
o Instrumentation and Tracing
o Compatibility

11 Intel Trace Analyzer and Collector
Compatibility
o Intel compilers and GNU* compilers
o Intel MPI Library
o MPICH (and compatible derivatives)
o Red Hat Enterprise Linux 3.0 or 4.0
o SUSE LINUX Enterprise Server 9 or 10
o SGI Altix

12 Intel Trace Analyzer and Collector
Interface and Displays
Timeline Views and Parallelism Display
o Displays concurrent behavior of parallel applications
o Calculates statistics for specific time intervals, processes, or functions
o Displays application activities, event source code locations, and message passing along the time axis

13 Intel Trace Analyzer and Collector
Advanced GUI
Display Scalability
Detailed and Aggregate Views
o Examines aspects of application runtime behavior, grouped by functions or processes
o Easily identifies the amount of time spent in MPI communication
o Makes it easy to see the performance differences between two program runs

14 Intel Trace Analyzer and Collector
Execution Statistics
o Provides subroutine execution metrics or call-tree characteristics
Profiling Library
o Records distributed, event-based trace data
Statistics Readability
o Logs information for function calls, sent messages, and collective operations

15 Intel Trace Analyzer and Collector
Scalability
Low Overhead
o Provides structured trace file (STF) format for scalability
o Generates trace files faster
o Allows random access to portions of a trace, making it suitable for analysis of large amounts of trace data
Filtering and Memory Handling
o Caches trace data in memory to reduce runtime overhead and memory consumption

16 How to Use ITAC
Log in to your UYBHM node using ssh with -X:
bash: $ ssh -X du??@wsl-node??.uhem.itu.edu.tr
or use PuTTY on Windows with X11 forwarding enabled in the SSH section.
Copy the example tar file to your directory:
bash: $ cd workshop
bash: $ cp /RS/users/bonat/workshop/YAZOKULU/ tar.
bash: $ tar -xvf tar
bash: $ cd /mpi-analyze/traceanalyzer

17 How to Use ITAC
Setting Up Environment Variables:
Add the ITAC source line to your .bashrc and/or .bash_profile:
source /RS/progs/intel/itac/7.1/bin/itacvars.sh
Use the add-ITAC-to-my-PATH.sh script:
bash: $ ./add-ITAC-to-my-PATH.sh

18 How to Use ITAC
Collecting Trace Data:
First create the object file:
bash: $ mpiicc butterfly.c -c
Link the object file with the ITAC libs:
bash: $ mpiicc butterfly.o -lvt -ldwarf -lelf -lvtunwind -lnsl -lm -ldl -lpthread -L/RS/progs/intel/itac/7.1/lib/ -o butterfly.x
You can also use the given ITACcompile.sh script:
bash: $ ./ITACcompile.sh butterfly.c

19 How to Use ITAC
Collecting Trace Data:
Run the program to generate the trace:
bash: $ mpirun -np 8 ./butterfly.x
# Iter: 1
# Stage = 3
0 (id): I'm at the barrier
5 (id): I'm at the barrier
7 (id): I'm at the barrier
2 (id): I'm at the barrier
## Calculation time for 1 iterations :
(id): I passed the barrier
4 (id): I passed the barrier
2 (id): I passed the barrier
0 (id): I passed the barrier

20 How to Use ITAC
Analyzing Trace Data:
Check for the trace files (.stf):
bash: $ ls
butterfly.c butterfly.x butterfly.x.stf
Open the trace file with traceanalyzer:
bash: $ traceanalyzer butterfly.x.stf

21 How to Use ITAC
Analyzing Trace Data: Event Timeline, Function Profile, Message Profile

22 How to Use ITAC
Analyzing Trace Data

23 How to Use ITAC
Analyzing Trace Data


More information

Introduction to Linux

Introduction to Linux Introduction to Linux Prof. Jin-Soo Kim( jinsookim@skku.edu) TA Sanghoon Han(sanghoon.han@csl.skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Announcement (1) Please come

More information

ECS 165B: Database System Implementa6on Lecture 14

ECS 165B: Database System Implementa6on Lecture 14 ECS 165B: Database System Implementa6on Lecture 14 UC Davis April 28, 2010 Acknowledgements: por6ons based on slides by Raghu Ramakrishnan and Johannes Gehrke, as well as slides by Zack Ives. Class Agenda

More information

OPT User Guide. Version 1.4

OPT User Guide. Version 1.4 Version 1.4 Table of Contents 1 Introduction...1 1.1 Reading this User Guide...2 2 Preparing Your Application...4 2.1 MPI Profiling...4 2.1.1 Supported MPI Libraries...4 2.1.2 Compiling For Shared MPI

More information

Lecture 2: Processes. CSE 120: Principles of Opera9ng Systems. UC San Diego: Summer Session I, 2009 Frank Uyeda

Lecture 2: Processes. CSE 120: Principles of Opera9ng Systems. UC San Diego: Summer Session I, 2009 Frank Uyeda Lecture 2: Processes CSE 120: Principles of Opera9ng Systems UC San Diego: Summer Session I, 2009 Frank Uyeda Announcements PeerWise accounts are now live. First PeerWise ques9ons/reviews due tomorrow

More information

How to get Access to Shaheen2? Bilel Hadri Computational Scientist KAUST Supercomputing Core Lab

How to get Access to Shaheen2? Bilel Hadri Computational Scientist KAUST Supercomputing Core Lab How to get Access to Shaheen2? Bilel Hadri Computational Scientist KAUST Supercomputing Core Lab Live Survey Please login with your laptop/mobile h#p://'ny.cc/kslhpc And type the code VF9SKGQ6 http://hpc.kaust.edu.sa

More information

Crea?ng Cloud Apps with Oracle Applica?on Builder Cloud Service

Crea?ng Cloud Apps with Oracle Applica?on Builder Cloud Service Crea?ng Cloud Apps with Oracle Applica?on Builder Cloud Service Shay Shmeltzer Director of Product Management Oracle Development Tools and Frameworks @JDevShay hpp://blogs.oracle.com/shay This App you

More information
