Interactive Analysis of Large Distributed Systems with Scalable Topology-based Visualization
|
|
- Georgiana Barrett
- 6 years ago
- Views:
Transcription
1 Interactive Analysis of Large Distributed Systems with Scalable Topology-based Visualization Lucas M. Schnorr, Arnaud Legrand, and Jean-Marc Vincent Firstname.Lastname@imag.fr Laboratoire d Informatique de Grenoble, University Grenoble-Alpes, France Austin, 2013, April / 25
2 Outline 1 Large Applications Analysis 2 The Scalable Topology-based Visualization 3 Analysis on Case Studies 4 Synthesis 2 / 25
3 Scalable Topology-based Visualization 1 Large Applications Analysis 2 The Scalable Topology-based Visualization 3 Analysis on Case Studies 4 Synthesis 3 / 25
4 Large-Scale Distributed Systems Large-scale parallel and distributed systems are in production today HPC (clusters, petascale systems, soon exascale...) Grid platforms Peer-to-peer file sharing Distributed volunteer computing Cloud Computing 562,960 cores Complex platforms with many open issues resource discovery and monitoring resource & data management energy consumption reduction resource economics application scheduling fault-tolerance and availability scalability and performance decentralized algorithms 4 / 25
5 Large-Scale Distributed Systems Large-scale parallel and distributed systems are in production today HPC (clusters, petascale systems, soon exascale...) Grid platforms Peer-to-peer file sharing Distributed volunteer computing Cloud Computing Complex platforms with many open issues resource discovery and monitoring resource & data management energy consumption reduction resource economics application scheduling fault-tolerance and availability scalability and performance decentralized algorithms 4 / 25
6 Large-Scale Distributed Systems Large-scale parallel and distributed systems are in production today HPC (clusters, petascale systems, soon exascale...) Grid platforms Peer-to-peer file sharing Distributed volunteer computing Cloud Computing Complex platforms with many open issues resource discovery and monitoring resource & data management energy consumption reduction resource economics application scheduling fault-tolerance and availability scalability and performance decentralized algorithms 4 / 25
7 Large-Scale Distributed Systems Large-scale parallel and distributed systems are in production today HPC (clusters, petascale systems, soon exascale...) Grid platforms Peer-to-peer file sharing Distributed volunteer computing Cloud Computing 540,000+ hosts 7.3+ PetaFLOPS Complex platforms with many open issues resource discovery and monitoring resource & data management energy consumption reduction resource economics application scheduling fault-tolerance and availability scalability and performance decentralized algorithms 4 / 25
8 Large Applications Analysis The Scalable Topology-based Visualization Analysis on Case Studies Synthesis Large-Scale Distributed Systems Large-scale parallel and distributed systems are in production today Google Data Center HPC (clusters, petascale systems, soon exascale...) Grid platforms Peer-to-peer file sharing Distributed volunteer computing Cloud Computing Complex platforms with many open issues resource discovery and monitoring application scheduling resource & data management fault-tolerance and availability energy consumption reduction scalability and performance resource economics decentralized algorithms 4 / 25
9 Large Applications Analysis The Scalable Topology-based Visualization Analysis on Case Studies Synthesis Large-Scale Distributed Systems Large-scale parallel and distributed systems are in production today Google Data Center HPC (clusters, petascale systems, soon exascale...) Grid platforms Peer-to-peer file sharing Distributed volunteer computing Cloud Computing Complex platforms with many open issues resource discovery and monitoring application scheduling resource & data management fault-tolerance and availability energy consumption reduction scalability and performance resource economics decentralized algorithms 4 / 25
10 Large Applications Analysis The Scalable Topology-based Visualization Analysis on Case Studies Synthesis Large-Scale Distributed Systems Large-scale parallel and distributed systems are in production today Google Data Center HPC (clusters, petascale systems, soon exascale...) Grid platforms Peer-to-peer file sharing Distributed volunteer computing Cloud Computing Complex platforms with many open issues resource discovery and monitoring application scheduling resource & data management fault-tolerance and availability energy consumption reduction scalability and performance resource economics decentralized algorithms 4 / 25
11 Large-Scale Distributed Systems Such applications and systems deserve very advanced analysis Their debugging and tuning are technically difficult Their use induce high methodological challenges Some important factors Very large number of components (Hundreds to millions of processing units) Organization in large hierarchical networks Application Performance depends on Resource characteristics (bandwidth, latency, power) Network Topology Keep in mind data locality, avoiding backbone congestion Or heavily use backbones if local networks are slow Observe intra and inter-application congestion Necessity for well-suited analysis tools Explicitly showing where the congestion is The moment (time) and the network links (space) 5 / 25
12 Large-Scale Distributed Systems Such applications and systems deserve very advanced analysis Their debugging and tuning are technically difficult Their use induce high methodological challenges Some important factors Very large number of components (Hundreds to millions of processing units) Organization in large hierarchical networks Application Performance depends on Resource characteristics (bandwidth, latency, power) Network Topology Keep in mind data locality, avoiding backbone congestion Or heavily use backbones if local networks are slow Observe intra and inter-application congestion Necessity for well-suited analysis tools Explicitly showing where the congestion is The moment (time) and the network links (space) 5 / 25
13 Analysis of Large-Scale Distributed Systems Behavior Textual or statistical analysis of traces is usually not sufficient (Topology is not depicted at all) Performance Visualization Analysis: Event Traces are Visually Depicted Gantt-charts :Detailed visualization of causality Space (vertical dimension) shows resources Time (horizontal dimension) show behavior Several tools have Gantt-Charts (to cite a few) Projections (Urbana), Jumpshot (among others) Vampir (Dresden), Paje (Grenoble), Paraver (Barcelona) Difficulty: lacks topological data (or flat representation of the network) Fixed Topology for HPC supercomputers Shows profiling information on the topology Tools: Scalasca (Juelich), ParaProf (Portland),... Difficulty: Visualization scalability? No temporal data. Topological Analysis for Distributed Systems Shows topology with application behavior Tools: Overview Difficulty: scales only to a few dozens of nodes 6 / 25
14 Our contribution Scalable technique for Interactive Visualization Analysis Methodology Multi-scale data aggregation Correlate application behavior with network topology and scale up to thousands of nodes Let analyst decide the aggregation policy Both in time and space Dynamic and interactive graph layout Force-directed algorithm (Barnes-Hut) User interacts with the graph Quick demonstration of the visualization 7 / 25
15 Scalable Topology-based Visualization 1 Large Applications Analysis 2 The Scalable Topology-based Visualization 3 Analysis on Case Studies 4 Synthesis 8 / 25
16 Mapping Trace Metrics to the Graph Nodes Nodes of the graph : Model monitored entities network links, computing nodes, routers,... Edges of the graph : Model a relationship between two monitored entities communications, synchronizations,... Attributes of nodes : Geometric shapes: Hosts are squares, Links are diamonds, Routers are dots Sizes: Depending on trace metrics associated to the node Filling: decomposition of the metric, qualitative information (color), / 25
17 Mapping Trace Metrics to the Graph Example Three different mappings (at timestamps A, B and C) Left shows the trace metrics associated Right shows the corresponding graph visualizations Filling is used as resource utilization Analysing an application timestamp by timestamp? not really useful, not scalable 10 / 25
18 Mapping Trace Metrics to the Graph Example Three different mappings (at timestamps A, B and C) Left shows the trace metrics associated Right shows the corresponding graph visualizations Filling is used as resource utilization Analysing an application timestamp by timestamp? not really useful, not scalable 10 / 25
19 Multi-Scale Data Aggregation Constraints Large amount of trace data Time (microseconds events for long observation periods) Space (structured large-scale hierarchy) Particular behavior and anomalies Appear on different scales Network congestion: Only a part of the topology presents contention Global temporal network congestion A network link is always the contention We need to easily navigate in all scales of trace data in time and space dimensions 11 / 25
20 Multi-Scale Data Aggregation How it works? Assume we measure a given quantity ρ on each resource: { R T R ρ : (r, t) ρ(r, t) Example : ρ(r, t) is the computing power of resource r at time t Assume we can define a neighborhood N Γ, (r, t) (eg hyperrectangle) Then define an approximation F of ρ on N Γ, by R T R F Γ, : (r, t) ρ(r, t )dr dt (1) Remarks : N Γ, (r,t) A summation of the quantity ρ in the neighborhood Γ and Correctly and dynamically deciding the neighborhood is a key that is left to the analyst that dynamically adjust the neiborhood parameter according to the visualization 12 / 25
21 Multi-Scale Data Aggregation How it works? Assume we measure a given quantity ρ on each resource: { R T R ρ : (r, t) ρ(r, t) Example : ρ(r, t) is the computing power of resource r at time t Assume we can define a neighborhood N Γ, (r, t) (eg hyperrectangle) Then define an approximation F of ρ on N Γ, by R T R F Γ, : (r, t) ρ(r, t )dr dt (1) Remarks : N Γ, (r,t) A summation of the quantity ρ in the neighborhood Γ and Correctly and dynamically deciding the neighborhood is a key that is left to the analyst that dynamically adjust the neiborhood parameter according to the visualization 12 / 25
22 Temporal Data Aggregation Temporal neighborhood is a time slice For example, considering a time slice from A 1 to A 2 Trace metrics, temporally aggregated, are mapped to HostA Possible operations Slide the time-slice Animation in time Increase and decrease its size looking for other scales In our approach Analyst can interactively select the time slice Visual feedback is instantaneous 13 / 25
23 Temporal Data Aggregation Temporal neighborhood is a time slice For example, considering a time slice from A 1 to A 2 Trace metrics, temporally aggregated, are mapped to HostA Possible operations Slide the time-slice Animation in time Increase and decrease its size looking for other scales In our approach Analyst can interactively select the time slice Visual feedback is instantaneous 13 / 25
24 Temporal Data Aggregation Temporal neighborhood is a time slice For example, considering a time slice from A 1 to A 2 Trace metrics, temporally aggregated, are mapped to HostA Possible operations Slide the time-slice Animation in time Increase and decrease its size looking for other scales In our approach Analyst can interactively select the time slice Visual feedback is instantaneous 13 / 25
25 Spatial Data Aggregation Spatial neighborhood is the resource hierarchy Grid, Cluster, Machine, Processor, Core Cluster, Rack, U, Processor, Core For example, considering two groups of resources Two spatial aggregation steps In our approach Analyst chooses the level of detail by clicking on nodes Visual feedback is instantaneous (but challenging) How to represent a few thousands nodes interactively? Force-directed graph layout 14 / 25
26 Spatial Data Aggregation Spatial neighborhood is the resource hierarchy Grid, Cluster, Machine, Processor, Core Cluster, Rack, U, Processor, Core For example, considering two groups of resources Two spatial aggregation steps In our approach Analyst chooses the level of detail by clicking on nodes Visual feedback is instantaneous (but challenging) How to represent a few thousands nodes interactively? Force-directed graph layout 14 / 25
27 Spatial Data Aggregation Spatial neighborhood is the resource hierarchy Grid, Cluster, Machine, Processor, Core Cluster, Rack, U, Processor, Core For example, considering two groups of resources Two spatial aggregation steps In our approach Analyst chooses the level of detail by clicking on nodes Visual feedback is instantaneous (but challenging) How to represent a few thousands nodes interactively? Force-directed graph layout 14 / 25
28 Implementing Interactivity Force-directed layout Force-directed parameters Example Barnes-hut algorithm for graph layout Coulomb s repulsion law All nodes have a (positive) charge Hooke s attraction attraction Connected nodes have an attraction force (spring) Changing charge and spring parameters 15 / 25
29 Implementing Interactivity Multiple Scales Dealing with multiple scales Example Changing the mapping to the visualization scale Note that from situation B to C, time-slice is the same But nodes size are different 16 / 25
30 Scalable Topology-based Visualization 1 Large Applications Analysis 2 The Scalable Topology-based Visualization 3 Analysis on Case Studies 4 Synthesis 17 / 25
31 Case Studies and Results Applications NAS-DT Benchmark Analysis Traces obtained with the Simulated MPI (SMPI) tool Non-cooperative Master Worker Applications Traces from the SimGrid toolkit Objective of the evaluation Illustrate the potential of the approach Pin-point network contention Evaluate load balancing Trace metrics mapping Host are squares Network links are diamonds Filling is resource utilization 18 / 25
32 NAS-DT Benchmark Analysis Scenario configuration Analysis NAS-DT Class A White Hole algorithm Two clusters: Adonis (left) and Griffon (right) Processes are allocated sequentially starting at Adonis Central backbone is always the point of contention Changing to smaller time-slices allows us to observe that Possible explanation: bad process deployment of MPI 19 / 25
33 NAS-DT Benchmark Analysis Scenario configuration Analysis NAS-DT Class A White Hole algorithm Two clusters: Adonis (left) and Griffon (right) Processes are allocated sequentially starting at Adonis Central backbone is always the point of contention Changing to smaller time-slices allows us to observe that Possible explanation: bad process deployment of MPI 19 / 25
34 NAS-DT Benchmark Analysis Scenario configuration Analysis NAS-DT Class A White Hole algorithm Two clusters: Adonis (left) and Griffon (right) Processes are allocated sequentially starting at Adonis Central backbone is always the point of contention Changing to smaller time-slices allows us to observe that Possible explanation: bad process deployment of MPI 19 / 25
35 NAS-DT Benchmark Analysis (Best Deployment) New execution with a better deployment Analysis Exploit locality, avoid backbone link Central backbone is no longer a contention point Smaller time-slices shows that it is only at the beginning Reduction of execution time of approximately 20% 20 / 25
36 NAS-DT Benchmark Analysis (Best Deployment) New execution with a better deployment Analysis Exploit locality, avoid backbone link Central backbone is no longer a contention point Smaller time-slices shows that it is only at the beginning Reduction of execution time of approximately 20% 20 / 25
37 NAS-DT Benchmark Analysis (Best Deployment) New execution with a better deployment Analysis Exploit locality, avoid backbone link Central backbone is no longer a contention point Smaller time-slices shows that it is only at the beginning Reduction of execution time of approximately 20% 20 / 25
38 Non-Cooperative Master Worker Applications Scenario configuration Two master-worker applications competing for resources Based on a realistic model of the Grid5000 platform 2170 computing hosts + network links and routers Each application uses a bandwidth-centric optimal strategy Tasks layout for each application First application: tasks are CPU bound cheap to transfer, hard to calculate Second application : tasks are network bound tasks are a little harder to transfer Mapping trace metrics Each application has its own resource utilization variable First application (light gray) Second application (dark gray) Three Expected Behaviors First application should achieve better resource utilization Form of locality from the second application Applications may interfere on computing resources 21 / 25
39 Non-Cooperative Master Worker Applications Scenario configuration Two master-worker applications competing for resources Based on a realistic model of the Grid5000 platform 2170 computing hosts + network links and routers Each application uses a bandwidth-centric optimal strategy Tasks layout for each application First application: tasks are CPU bound cheap to transfer, hard to calculate Second application : tasks are network bound tasks are a little harder to transfer Mapping trace metrics Each application has its own resource utilization variable First application (light gray) Second application (dark gray) Three Expected Behaviors First application should achieve better resource utilization Form of locality from the second application Applications may interfere on computing resources 21 / 25
40 Non-Cooperative Applications Spatial Fixed temporal aggregation, changing spatial aggregation From left to right Hosts (no aggregation), Clusters, Sites, Grid (full aggregation) Hosts Clusters Sites Grid Analysis Host level is too cumbersome Cluster, resource usage favors the CPU-bound application Site, enables to quantify perfectly how much it is Grid, gives the overview for this particular time slice 22 / 25
41 Non-Cooperative Applications Spatial Fixed temporal aggregation, changing spatial aggregation From left to right Hosts (no aggregation), Clusters, Sites, Grid (full aggregation) Hosts Clusters Sites Grid Analysis Host level is too cumbersome Cluster, resource usage favors the CPU-bound application Site, enables to quantify perfectly how much it is Grid, gives the overview for this particular time slice 22 / 25
42 Non-Cooperative Applications Temporal Clusters Fixed spatial aggregation to Cluster level From left to right, time slice sliding forward These are actually screenshots from an animation t 0 t 1 t 2 t 3 time slice time slice time slice time slice A B C Sites Grid Analysis Resource usage is not uniform in the platform for the CPU-bound app t 3 shows cluster A full of tasks, but not cluster B Site B is full of tasks at t 2 while Site C has to wait until time slice t 3 Possible explanation: single master process, big platform 23 / 25
43 Non-Cooperative Applications Temporal Clusters Fixed spatial aggregation to Cluster level From left to right, time slice sliding forward These are actually screenshots from an animation t 0 t 1 t 2 t 3 time slice time slice time slice time slice A B C Sites Grid Analysis Resource usage is not uniform in the platform for the CPU-bound app t 3 shows cluster A full of tasks, but not cluster B Site B is full of tasks at t 2 while Site C has to wait until time slice t 3 Possible explanation: single master process, big platform 23 / 25
44 Non-Cooperative Applications Temporal Clusters Fixed spatial aggregation to Cluster level From left to right, time slice sliding forward These are actually screenshots from an animation t 0 t 1 t 2 t 3 time slice time slice time slice time slice A B C Sites Grid Analysis Resource usage is not uniform in the platform for the CPU-bound app t 3 shows cluster A full of tasks, but not cluster B Site B is full of tasks at t 2 while Site C has to wait until time slice t 3 Possible explanation: single master process, big platform 23 / 25
45 Scalable Topology-based Visualization 1 Large Applications Analysis 2 The Scalable Topology-based Visualization 3 Analysis on Case Studies 4 Synthesis 24 / 25
46 Synthesis and Open Questions Proposition: Multi-scale aggregation with dynamic graph layout Interactive, analyst can freely choose level of detail Scalable to a few thousands Open-source software Viva Currently being packaged to Debian Issues yet to be addressed Aggregating link utilization is questionable Aggregation leads to data loss which may be harmful, evaluation of an aggregation quality Viva has only a few graphical objects : grammar of graphical objects (ggplot2 philosophy) Thank you for your attention 25 / 25
47 Synthesis and Open Questions Proposition: Multi-scale aggregation with dynamic graph layout Interactive, analyst can freely choose level of detail Scalable to a few thousands Open-source software Viva Currently being packaged to Debian Issues yet to be addressed Aggregating link utilization is questionable Aggregation leads to data loss which may be harmful, evaluation of an aggregation quality Viva has only a few graphical objects : grammar of graphical objects (ggplot2 philosophy) Thank you for your attention 25 / 25
48 Synthesis and Open Questions Proposition: Multi-scale aggregation with dynamic graph layout Interactive, analyst can freely choose level of detail Scalable to a few thousands Open-source software Viva Currently being packaged to Debian Issues yet to be addressed Aggregating link utilization is questionable Aggregation leads to data loss which may be harmful, evaluation of an aggregation quality Viva has only a few graphical objects : grammar of graphical objects (ggplot2 philosophy) Thank you for your attention 25 / 25
Analysis of Program Behavior
Analysis of Program Behavior High Performance Computing, Visualization Lucas Mello Schnorr probably soon (LIG-CNRS INF-UFRGS) 2 nd LICIA Workshop Grenoble, France September 5th, 2012 1/ 25 Introduction
More informationLarge-Scale Trace Visualization Analysis with Triva and Pajé the G5K case study
1 / 12 Large-Scale Trace Visualization Analysis with Triva and Pajé the G5K case study Lucas M. Schnorr, Arnaud Legrand Grid 5000 Spring School Reims, France 20 April 2011 2 / 12 Introduction Basic Concepts
More informationInteractive Analysis of Large Distributed Systems with Topology-based Visualization
Interactive Analysis of Large Distributed Systems with Topology-based Visualization Lucas Mello Schnorr, Arnaud Legrand, Jean-Marc Vincent To cite this version: Lucas Mello Schnorr, Arnaud Legrand, Jean-Marc
More informationVisual Mapping of Program Components to Resources Representation: a 3D Analysis of Grid Parallel Applications
Visual Mapping of Program Components to Resources Representation: a 3D Analysis of Grid Parallel Applications Lucas Mello Schnorr, Guillaume Huard, Philippe Olivier Alexandre Navaux Federal University
More informationSome Visualization Models applied to the Analysis of Parallel Applications
1 / 1 Some Visualization Models applied to the Analysis of Parallel Applications Lucas Mello Schnorr Advisors: Philippe O. A. Navaux & Denis Trystram & Guillaume Huard Federal University of Rio Grande
More informationCray XC Scalability and the Aries Network Tony Ford
Cray XC Scalability and the Aries Network Tony Ford June 29, 2017 Exascale Scalability Which scalability metrics are important for Exascale? Performance (obviously!) What are the contributing factors?
More informationCentralized versus distributed schedulers for multiple bag-of-task applications
Centralized versus distributed schedulers for multiple bag-of-task applications O. Beaumont, L. Carter, J. Ferrante, A. Legrand, L. Marchal and Y. Robert Laboratoire LaBRI, CNRS Bordeaux, France Dept.
More informationICON for HD(CP) 2. High Definition Clouds and Precipitation for Advancing Climate Prediction
ICON for HD(CP) 2 High Definition Clouds and Precipitation for Advancing Climate Prediction High Definition Clouds and Precipitation for Advancing Climate Prediction ICON 2 years ago Parameterize shallow
More informationSystematic Cooperation in P2P Grids
29th October 2008 Cyril Briquet Doctoral Dissertation in Computing Science Department of EE & CS (Montefiore Institute) University of Liège, Belgium Application class: Bags of Tasks Bag of Task = set of
More informationInter-Domain Routing: BGP
Inter-Domain Routing: BGP Brad Karp UCL Computer Science (drawn mostly from lecture notes by Hari Balakrishnan and Nick Feamster, MIT) CS 3035/GZ01 4 th December 2014 Outline Context: Inter-Domain Routing
More informationOn the scalability of tracing mechanisms 1
On the scalability of tracing mechanisms 1 Felix Freitag, Jordi Caubet, Jesus Labarta Departament d Arquitectura de Computadors (DAC) European Center for Parallelism of Barcelona (CEPBA) Universitat Politècnica
More informationAnna Morajko.
Performance analysis and tuning of parallel/distributed applications Anna Morajko Anna.Morajko@uab.es 26 05 2008 Introduction Main research projects Develop techniques and tools for application performance
More informationTowards Visualization Scalability through Time Intervals and Hierarchical Organization of Monitoring Data
Towards Visualization Scalability through Time Intervals and Hierarchical Organization of Monitoring Data Lucas Mello Schnorr, Guillaume Huard, Philippe Olivier Alexandre Navaux Instituto de Informática
More informationMassive Scalability With InterSystems IRIS Data Platform
Massive Scalability With InterSystems IRIS Data Platform Introduction Faced with the enormous and ever-growing amounts of data being generated in the world today, software architects need to pay special
More informationGFS: The Google File System. Dr. Yingwu Zhu
GFS: The Google File System Dr. Yingwu Zhu Motivating Application: Google Crawl the whole web Store it all on one big disk Process users searches on one big CPU More storage, CPU required than one PC can
More informationAPPLICATION NOTE. XCellAir s Wi-Fi Radio Resource Optimization Solution. Features, Test Results & Methodology
APPLICATION NOTE XCellAir s Wi-Fi Radio Resource Optimization Solution Features, Test Results & Methodology Introduction Multi Service Operators (MSOs) and Internet service providers have been aggressively
More informationECE 669 Parallel Computer Architecture
ECE 669 Parallel Computer Architecture Lecture 9 Workload Evaluation Outline Evaluation of applications is important Simulation of sample data sets provides important information Working sets indicate
More informationA Global Operating System for HPC Clusters
A Global Operating System Emiliano Betti 1 Marco Cesati 1 Roberto Gioiosa 2 Francesco Piermaria 1 1 System Programming Research Group, University of Rome Tor Vergata 2 BlueGene Software Division, IBM TJ
More informationCSE 124: THE DATACENTER AS A COMPUTER. George Porter November 20 and 22, 2017
CSE 124: THE DATACENTER AS A COMPUTER George Porter November 20 and 22, 2017 ATTRIBUTION These slides are released under an Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) Creative
More informationWrite a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical
Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or
More informationCloud Programming. Programming Environment Oct 29, 2015 Osamu Tatebe
Cloud Programming Programming Environment Oct 29, 2015 Osamu Tatebe Cloud Computing Only required amount of CPU and storage can be used anytime from anywhere via network Availability, throughput, reliability
More informationBuilding petabit/s data center network with submicroseconds latency by using fast optical switches Miao, W.; Yan, F.; Dorren, H.J.S.; Calabretta, N.
Building petabit/s data center network with submicroseconds latency by using fast optical switches Miao, W.; Yan, F.; Dorren, H.J.S.; Calabretta, N. Published in: Proceedings of 20th Annual Symposium of
More informationThe Google File System
The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung SOSP 2003 presented by Kun Suo Outline GFS Background, Concepts and Key words Example of GFS Operations Some optimizations in
More informationNetSpeed ORION: A New Approach to Design On-chip Interconnects. August 26 th, 2013
NetSpeed ORION: A New Approach to Design On-chip Interconnects August 26 th, 2013 INTERCONNECTS BECOMING INCREASINGLY IMPORTANT Growing number of IP cores Average SoCs today have 100+ IPs Mixing and matching
More informationLecture 7: Data Center Networks
Lecture 7: Data Center Networks CSE 222A: Computer Communication Networks Alex C. Snoeren Thanks: Nick Feamster Lecture 7 Overview Project discussion Data Centers overview Fat Tree paper discussion CSE
More informationCentralized versus distributed schedulers for multiple bag-of-task applications
Centralized versus distributed schedulers for multiple bag-of-task applications O. Beaumont, L. Carter, J. Ferrante, A. Legrand, L. Marchal and Y. Robert Laboratoire LaBRI, CNRS Bordeaux, France Dept.
More informationMost real programs operate somewhere between task and data parallelism. Our solution also lies in this set.
for Windows Azure and HPC Cluster 1. Introduction In parallel computing systems computations are executed simultaneously, wholly or in part. This approach is based on the partitioning of a big task into
More informationMotivation and goal Design concepts and service model Architecture and implementation Performance, and so on...
Motivation and goal Design concepts and service model Architecture and implementation Performance, and so on... Autonomous applications have a demand for grasping the state of hosts and networks for: sustaining
More informationThe way toward peta-flops
The way toward peta-flops ISC-2011 Dr. Pierre Lagier Chief Technology Officer Fujitsu Systems Europe Where things started from DESIGN CONCEPTS 2 New challenges and requirements! Optimal sustained flops
More informationChallenges in HPC I/O
Challenges in HPC I/O Universität Basel Julian M. Kunkel German Climate Computing Center / Universität Hamburg 10. October 2014 Outline 1 High-Performance Computing 2 Parallel File Systems and Challenges
More informationProgramming Models for Supercomputing in the Era of Multicore
Programming Models for Supercomputing in the Era of Multicore Marc Snir MULTI-CORE CHALLENGES 1 Moore s Law Reinterpreted Number of cores per chip doubles every two years, while clock speed decreases Need
More informationOutline. Definition of a Distributed System Goals of a Distributed System Types of Distributed Systems
Distributed Systems Outline Definition of a Distributed System Goals of a Distributed System Types of Distributed Systems What Is A Distributed System? A collection of independent computers that appears
More informationInter-Domain Routing: BGP
Inter-Domain Routing: BGP Stefano Vissicchio UCL Computer Science CS 3035/GZ01 Agenda We study how to route over the Internet 1. Context The Internet, a network of networks Relationships between ASes 2.
More informationAdaptive Cluster Computing using JavaSpaces
Adaptive Cluster Computing using JavaSpaces Jyoti Batheja and Manish Parashar The Applied Software Systems Lab. ECE Department, Rutgers University Outline Background Introduction Related Work Summary of
More informationRouting Domains in Data Centre Networks. Morteza Kheirkhah. Informatics Department University of Sussex. Multi-Service Networks July 2011
Routing Domains in Data Centre Networks Morteza Kheirkhah Informatics Department University of Sussex Multi-Service Networks July 2011 What is a Data Centre? Large-scale Data Centres (DC) consist of tens
More informationDr. Gengbin Zheng and Ehsan Totoni. Parallel Programming Laboratory University of Illinois at Urbana-Champaign
Dr. Gengbin Zheng and Ehsan Totoni Parallel Programming Laboratory University of Illinois at Urbana-Champaign April 18, 2011 A function level simulator for parallel applications on peta scale machine An
More informationIntroduction to Parallel Performance Engineering
Introduction to Parallel Performance Engineering Markus Geimer, Brian Wylie Jülich Supercomputing Centre (with content used with permission from tutorials by Bernd Mohr/JSC and Luiz DeRose/Cray) Performance:
More informationCharacterizing Imbalance in Large-Scale Parallel Programs. David Bo hme September 26, 2013
Characterizing Imbalance in Large-Scale Parallel Programs David o hme September 26, 2013 Need for Performance nalysis Tools mount of parallelism in Supercomputers keeps growing Efficient resource usage
More informationLeveraging Flash in HPC Systems
Leveraging Flash in HPC Systems IEEE MSST June 3, 2015 This work was performed under the auspices of the U.S. Department of Energy by under Contract DE-AC52-07NA27344. Lawrence Livermore National Security,
More informationScaling Without Sharding. Baron Schwartz Percona Inc Surge 2010
Scaling Without Sharding Baron Schwartz Percona Inc Surge 2010 Web Scale!!!! http://www.xtranormal.com/watch/6995033/ A Sharding Thought Experiment 64 shards per proxy [1] 1 TB of data storage per node
More informationAdvanced Software for the Supercomputer PRIMEHPC FX10. Copyright 2011 FUJITSU LIMITED
Advanced Software for the Supercomputer PRIMEHPC FX10 System Configuration of PRIMEHPC FX10 nodes Login Compilation Job submission 6D mesh/torus Interconnect Local file system (Temporary area occupied
More informationFujitsu s Approach to Application Centric Petascale Computing
Fujitsu s Approach to Application Centric Petascale Computing 2 nd Nov. 2010 Motoi Okuda Fujitsu Ltd. Agenda Japanese Next-Generation Supercomputer, K Computer Project Overview Design Targets System Overview
More informationLoad Balancing for Parallel Multi-core Machines with Non-Uniform Communication Costs
Load Balancing for Parallel Multi-core Machines with Non-Uniform Communication Costs Laércio Lima Pilla llpilla@inf.ufrgs.br LIG Laboratory INRIA Grenoble University Grenoble, France Institute of Informatics
More informationFunctional Requirements for Grid Oriented Optical Networks
Functional Requirements for Grid Oriented Optical s Luca Valcarenghi Internal Workshop 4 on Photonic s and Technologies Scuola Superiore Sant Anna Pisa June 3-4, 2003 1 Motivations Grid networking connection
More informationGFS: The Google File System
GFS: The Google File System Brad Karp UCL Computer Science CS GZ03 / M030 24 th October 2014 Motivating Application: Google Crawl the whole web Store it all on one big disk Process users searches on one
More informationScaling to Petaflop. Ola Torudbakken Distinguished Engineer. Sun Microsystems, Inc
Scaling to Petaflop Ola Torudbakken Distinguished Engineer Sun Microsystems, Inc HPC Market growth is strong CAGR increased from 9.2% (2006) to 15.5% (2007) Market in 2007 doubled from 2003 (Source: IDC
More informationVAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW
VAMPIR & VAMPIRTRACE INTRODUCTION AND OVERVIEW 8th VI-HPS Tuning Workshop at RWTH Aachen September, 2011 Tobias Hilbrich and Joachim Protze Slides by: Andreas Knüpfer, Jens Doleschal, ZIH, Technische Universität
More informationA Scalable, Commodity Data Center Network Architecture
A Scalable, Commodity Data Center Network Architecture B Y M O H A M M A D A L - F A R E S A L E X A N D E R L O U K I S S A S A M I N V A H D A T P R E S E N T E D B Y N A N X I C H E N M A Y. 5, 2 0
More informationPerformance Evaluation of Computer Systems
Jean-Marc Vincent and Bruno Gaujal 1 1 MESCAL Project Laboratory of Informatics of Grenoble (LIG) Universities of Grenoble {Jean-Marc.Vincent,Bruno.Gaujal}@imag.fr http://mescal.imag.fr/members.php Site
More informationNetwork-Aware Resource Allocation in Distributed Clouds
Dissertation Research Summary Thesis Advisor: Asst. Prof. Dr. Tolga Ovatman Istanbul Technical University Department of Computer Engineering E-mail: aralat@itu.edu.tr April 4, 2016 Short Bio Research and
More informationScaFaCoS and P 3 M. Olaf Lenz. Recent Developments. June 3, 2013
ScaFaCoS and P 3 M Recent Developments Olaf Lenz June 3, 2013 Outline ScaFaCoS ScaFaCoS Methods Performance Comparison Recent P 3 M developments Olaf Lenz ScaFaCoS and P 3 M 2/23 ScaFaCoS Scalable Fast
More informationScalable Distributed Training with Parameter Hub: a whirlwind tour
Scalable Distributed Training with Parameter Hub: a whirlwind tour TVM Stack Optimization High-Level Differentiable IR Tensor Expression IR AutoTVM LLVM, CUDA, Metal VTA AutoVTA Edge FPGA Cloud FPGA ASIC
More informationOptimizing Network Performance in Distributed Machine Learning. Luo Mai Chuntao Hong Paolo Costa
Optimizing Network Performance in Distributed Machine Learning Luo Mai Chuntao Hong Paolo Costa Machine Learning Successful in many fields Online advertisement Spam filtering Fraud detection Image recognition
More informationPerformance Analysis with Vampir
Performance Analysis with Vampir Johannes Ziegenbalg Technische Universität Dresden Outline Part I: Welcome to the Vampir Tool Suite Event Trace Visualization The Vampir Displays Vampir & VampirServer
More informationFACULTY OF ENGINEERING B.E. 4/4 (CSE) II Semester (Old) Examination, June Subject : Information Retrieval Systems (Elective III) Estelar
B.E. 4/4 (CSE) II Semester (Old) Examination, June 2014 Subject : Information Retrieval Systems Code No. 6306 / O 1 Define Information retrieval systems. 3 2 What is precision and recall? 3 3 List the
More informationHow to get realistic C-states latency and residency? Vincent Guittot
How to get realistic C-states latency and residency? Vincent Guittot Agenda Overview Exit latency Enter latency Residency Conclusion Overview Overview PMWG uses hikey960 for testing our dev on b/l system
More informationPRIMEHPC FX10: Advanced Software
PRIMEHPC FX10: Advanced Software Koh Hotta Fujitsu Limited System Software supports --- Stable/Robust & Low Overhead Execution of Large Scale Programs Operating System File System Program Development for
More informationDirections in Workload Management
Directions in Workload Management Alex Sanchez and Morris Jette SchedMD LLC HPC Knowledge Meeting 2016 Areas of Focus Scalability Large Node and Core Counts Power Management Failure Management Federated
More informationArcGIS Enterprise: Performance and Scalability Best Practices. Darren Baird, PE, Esri
ArcGIS Enterprise: Performance and Scalability Best Practices Darren Baird, PE, Esri dbaird@esri.com What is ArcGIS Enterprise What s Included with ArcGIS Enterprise ArcGIS Server the core web services
More informationApplications. Oversampled 3D scan data. ~150k triangles ~80k triangles
Mesh Simplification Applications Oversampled 3D scan data ~150k triangles ~80k triangles 2 Applications Overtessellation: E.g. iso-surface extraction 3 Applications Multi-resolution hierarchies for efficient
More informationLecture 9: MIMD Architectures
Lecture 9: MIMD Architectures Introduction and classification Symmetric multiprocessors NUMA architecture Clusters Zebo Peng, IDA, LiTH 1 Introduction MIMD: a set of general purpose processors is connected
More informationMaster-Worker pattern
COSC 6397 Big Data Analytics Master Worker Programming Pattern Edgar Gabriel Spring 2017 Master-Worker pattern General idea: distribute the work among a number of processes Two logically different entities:
More informationFault tolerance based on the Publishsubscribe Paradigm for the BonjourGrid Middleware
University of Paris XIII INSTITUT GALILEE Laboratoire d Informatique de Paris Nord (LIPN) Université of Tunis École Supérieure des Sciences et Tehniques de Tunis Unité de Recherche UTIC Fault tolerance
More informationThe Road to ExaScale. Advances in High-Performance Interconnect Infrastructure. September 2011
The Road to ExaScale Advances in High-Performance Interconnect Infrastructure September 2011 diego@mellanox.com ExaScale Computing Ambitious Challenges Foster Progress Demand Research Institutes, Universities
More informationComposite Metrics for System Throughput in HPC
Composite Metrics for System Throughput in HPC John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2003 Phoenix, AZ November 18, 2003 Overview The HPC Challenge Benchmark was announced last
More informationLecture 9: MIMD Architectures
Lecture 9: MIMD Architectures Introduction and classification Symmetric multiprocessors NUMA architecture Clusters Zebo Peng, IDA, LiTH 1 Introduction A set of general purpose processors is connected together.
More informationDecentralized Distributed Storage System for Big Data
Decentralized Distributed Storage System for Big Presenter: Wei Xie -Intensive Scalable Computing Laboratory(DISCL) Computer Science Department Texas Tech University Outline Trends in Big and Cloud Storage
More informationDiscovering Dependencies between Virtual Machines Using CPU Utilization. Renuka Apte, Liting Hu, Karsten Schwan, Arpan Ghosh
Look Who s Talking Discovering Dependencies between Virtual Machines Using CPU Utilization Renuka Apte, Liting Hu, Karsten Schwan, Arpan Ghosh Georgia Institute of Technology Talk by Renuka Apte * *Currently
More informationGeneric Topology Mapping Strategies for Large-scale Parallel Architectures
Generic Topology Mapping Strategies for Large-scale Parallel Architectures Torsten Hoefler and Marc Snir Scientific talk at ICS 11, Tucson, AZ, USA, June 1 st 2011, Hierarchical Sparse Networks are Ubiquitous
More informationDenis TRYSTRAM Grenoble INP - IUF (labo LIG) octobre, 2013
Coopération en Informatique Parallèle Denis TRYSTRAM Grenoble INP - IUF (labo LIG) octobre, 2013 1 / 84 Outline 1 Introduction: parallelism and resource management 2 Classical Scheduling 3 Strip Packing
More informationVortex Whitepaper. Intelligent Data Sharing for the Business-Critical Internet of Things. Version 1.1 June 2014 Angelo Corsaro Ph.D.
Vortex Whitepaper Intelligent Data Sharing for the Business-Critical Internet of Things Version 1.1 June 2014 Angelo Corsaro Ph.D., CTO, PrismTech Vortex Whitepaper Version 1.1 June 2014 Table of Contents
More informationCommunication and Optimization Aspects of Parallel Programming Models on Hybrid Architectures
Communication and Optimization Aspects of Parallel Programming Models on Hybrid Architectures Rolf Rabenseifner rabenseifner@hlrs.de Gerhard Wellein gerhard.wellein@rrze.uni-erlangen.de University of Stuttgart
More informationCRESCENDO GEORGE S. NOMIKOS. Advisor: Dr. George Xylomenos
CRESCENDO Implementation of Hierarchical Chord (Crescendo), according to Canon paradigm and evaluation, via simulation over realistic network topologies, of the Crescendo's advantages in comparison to
More informationOcelotl: Large Trace Overviews Based on Multidimensional Data Aggregation
Ocelotl: Large Trace Overviews Based on Multidimensional Data Aggregation Damien Dosimont, Youenn Corre, Lucas Mello Schnorr, Guillaume Huard, Jean-Marc Vincent To cite this version: Damien Dosimont, Youenn
More informationWireless Sensor Networks: Clustering, Routing, Localization, Time Synchronization
Wireless Sensor Networks: Clustering, Routing, Localization, Time Synchronization Maurizio Bocca, M.Sc. Control Engineering Research Group Automation and Systems Technology Department maurizio.bocca@tkk.fi
More informationMesh. Mesh Channels. Straight-forward Switching Requirements. Switch Delay. Total Switches. Total Switches. Total Switches?
ESE534: Computer Organization Previously Saw need to exploit locality/structure in interconnect Day 20: April 7, 2010 Interconnect 5: Meshes (and MoT) a mesh might be useful Question: how does w grow?
More informationPerformance and Power Co-Design of Exascale Systems and Applications
Performance and Power Co-Design of Exascale Systems and Applications Adolfy Hoisie Work with Kevin Barker, Darren Kerbyson, Abhinav Vishnu Performance and Architecture Lab (PAL) Pacific Northwest National
More informationCycle Sharing Systems
Cycle Sharing Systems Jagadeesh Dyaberi Dependable Computing Systems Lab Purdue University 10/31/2005 1 Introduction Design of Program Security Communication Architecture Implementation Conclusion Outline
More informationAn Oracle White Paper September Oracle Utilities Meter Data Management Demonstrates Extreme Performance on Oracle Exadata/Exalogic
An Oracle White Paper September 2011 Oracle Utilities Meter Data Management 2.0.1 Demonstrates Extreme Performance on Oracle Exadata/Exalogic Introduction New utilities technologies are bringing with them
More informationResearch Faculty Summit Systems Fueling future disruptions
Research Faculty Summit 2018 Systems Fueling future disruptions Elevating the Edge to be a Peer of the Cloud Kishore Ramachandran Embedded Pervasive Lab, Georgia Tech August 2, 2018 Acknowledgements Enrique
More informationMeet in the Middle: Leveraging Optical Interconnection Opportunities in Chip Multi Processors
Meet in the Middle: Leveraging Optical Interconnection Opportunities in Chip Multi Processors Sandro Bartolini* Department of Information Engineering, University of Siena, Italy bartolini@dii.unisi.it
More informationOPTIMIZING MOBILITY MANAGEMENT IN FUTURE IPv6 MOBILE NETWORKS
OPTIMIZING MOBILITY MANAGEMENT IN FUTURE IPv6 MOBILE NETWORKS Sandro Grech Nokia Networks (Networks Systems Research) Supervisor: Prof. Raimo Kantola 1 SANDRO GRECH - OPTIMIZING MOBILITY MANAGEMENT IN
More informationMINIMIZING TRANSACTION LATENCY IN GEO-REPLICATED DATA STORES
MINIMIZING TRANSACTION LATENCY IN GEO-REPLICATED DATA STORES Divy Agrawal Department of Computer Science University of California at Santa Barbara Joint work with: Amr El Abbadi, Hatem Mahmoud, Faisal
More informationPARALLEL AND DISTRIBUTED PLATFORM FOR PLUG-AND-PLAY AGENT-BASED SIMULATIONS. Wentong CAI
PARALLEL AND DISTRIBUTED PLATFORM FOR PLUG-AND-PLAY AGENT-BASED SIMULATIONS Wentong CAI Parallel & Distributed Computing Centre School of Computer Engineering Nanyang Technological University Singapore
More informationHardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University
Hardware Design Environments Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Outline Welcome to COE 405 Digital System Design Design Domains and Levels of Abstractions Synthesis
More informationONOS OVERVIEW. Architecture, Abstractions & Application
ONOS OVERVIEW Architecture, Abstractions & Application WHAT IS ONOS? Open Networking Operating System (ONOS) is an open source SDN network operating system (controller). Mission: to enable Service Providers
More informationDominating Set & Clustering Based Network Coverage for Huge Wireless Sensor Networks
Dominating Set & Clustering Based Network Coverage for Huge Wireless Sensor Networks Mohammad Mehrani, Ali Shaeidi, Mohammad Hasannejad, and Amir Afsheh Abstract Routing is one of the most important issues
More informationIntroduction CPS343. Spring Parallel and High Performance Computing. CPS343 (Parallel and HPC) Introduction Spring / 29
Introduction CPS343 Parallel and High Performance Computing Spring 2018 CPS343 (Parallel and HPC) Introduction Spring 2018 1 / 29 Outline 1 Preface Course Details Course Requirements 2 Background Definitions
More informationBig Compute, Big Net & Big Data: How to be big
> 2014 HPC Advisory Council Brazil Conference Big Compute, Big Net & Big Data: How to be big Luiz Monnerat PETROBRAS 26/05/2014 > Agenda Big Compute (HPC) Commodity HW, free software, parallel processing,
More informationITTC High-Performance Networking The University of Kansas EECS 881 Architecture and Topology
High-Performance Networking The University of Kansas EECS 881 Architecture and Topology James P.G. Sterbenz Department of Electrical Engineering & Computer Science Information Technology & Telecommunications
More informationRollback-Recovery Protocols for Send-Deterministic Applications. Amina Guermouche, Thomas Ropars, Elisabeth Brunet, Marc Snir and Franck Cappello
Rollback-Recovery Protocols for Send-Deterministic Applications Amina Guermouche, Thomas Ropars, Elisabeth Brunet, Marc Snir and Franck Cappello Fault Tolerance in HPC Systems is Mandatory Resiliency is
More informationREMEM: REmote MEMory as Checkpointing Storage
REMEM: REmote MEMory as Checkpointing Storage Hui Jin Illinois Institute of Technology Xian-He Sun Illinois Institute of Technology Yong Chen Oak Ridge National Laboratory Tao Ke Illinois Institute of
More informationParalleX. A Cure for Scaling Impaired Parallel Applications. Hartmut Kaiser
ParalleX A Cure for Scaling Impaired Parallel Applications Hartmut Kaiser (hkaiser@cct.lsu.edu) 2 Tianhe-1A 2.566 Petaflops Rmax Heterogeneous Architecture: 14,336 Intel Xeon CPUs 7,168 Nvidia Tesla M2050
More informationHybrid Programming with MPI and SMPSs
Hybrid Programming with MPI and SMPSs Apostolou Evangelos August 24, 2012 MSc in High Performance Computing The University of Edinburgh Year of Presentation: 2012 Abstract Multicore processors prevail
More informationVertical Profiling: Understanding the Behavior of Object-Oriented Applications
Vertical Profiling: Understanding the Behavior of Object-Oriented Applications Matthias Hauswirth, Amer Diwan University of Colorado at Boulder Peter F. Sweeney, Michael Hind IBM Thomas J. Watson Research
More informationChapter 3. Design of Grid Scheduler. 3.1 Introduction
Chapter 3 Design of Grid Scheduler The scheduler component of the grid is responsible to prepare the job ques for grid resources. The research in design of grid schedulers has given various topologies
More informationData Center Network Topologies II
Data Center Network Topologies II Hakim Weatherspoon Associate Professor, Dept of Computer cience C 5413: High Performance ystems and Networking April 10, 2017 March 31, 2017 Agenda for semester Project
More informationPerformance analysis basics
Performance analysis basics Christian Iwainsky Iwainsky@rz.rwth-aachen.de 25.3.2010 1 Overview 1. Motivation 2. Performance analysis basics 3. Measurement Techniques 2 Why bother with performance analysis
More informationStorage Hierarchy Management for Scientific Computing
Storage Hierarchy Management for Scientific Computing by Ethan Leo Miller Sc. B. (Brown University) 1987 M.S. (University of California at Berkeley) 1990 A dissertation submitted in partial satisfaction
More informationTowards Systematic Design of Enterprise Networks
Towards Systematic Design of Enterprise Networks Geoffrey Xie Naval Postgraduate School In collaboration with: Eric Sung, Xin Sun, and Sanjay Rao (Purdue Univ.) David Maltz (MSR) Copyright 2008 AT&T. All
More information