Architecture-Aware Graph Repartitioning for Data-Intensive Scientific Computing
1 Architecture-Aware Graph Repartitioning for Data-Intensive Scientific Computing. Angen Zheng, Alexandros Labrinidis, Panos K. Chrysanthis. Advanced Data Management Technologies Laboratory, Department of Computer Science, University of Pittsburgh. 2014 BigGraphs Workshop.
2 Graph Partitioning and Repartitioning in Scientific Simulations. Computation and communication graph: vertices are computational units.
3 Graph Partitioning and Repartitioning in Scientific Simulations. Computation and communication graph: vertices are computational units (vertex weight: computational cost; vertex size: migration cost); edges are communication (edge weight: communication cost).
4 Graph Partitioning and Repartitioning in Scientific Simulations. Graph partitioning: a balanced partitioning gives an even load distribution; minimal edge-cuts give minimal communication cost.
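The two partitioning objectives named on this slide can be measured directly. The following is a minimal sketch, not the authors' code: the function names, the toy graph, and the choice of "maximum load divided by average load" as the imbalance measure are all assumptions made for illustration.

```python
def edge_cut(edges, part):
    """Sum the weights of edges whose endpoints lie in different partitions."""
    return sum(w for u, v, w in edges if part[u] != part[v])


def imbalance(vertex_weight, part, k):
    """Maximum partition load divided by the average load (1.0 means perfectly balanced)."""
    load = [0.0] * k
    for v, w in vertex_weight.items():
        load[part[v]] += w
    return max(load) / (sum(load) / k)


# Hypothetical toy graph: (u, v, edge weight) with unit vertex weights.
edges = [("a", "b", 1), ("b", "c", 2), ("c", "d", 1), ("a", "d", 3)]
vertex_weight = {"a": 1, "b": 1, "c": 1, "d": 1}
part = {"a": 0, "b": 0, "c": 1, "d": 1}

cut = edge_cut(edges, part)            # edges b-c and a-d are cut
bal = imbalance(vertex_weight, part, 2)
```

A partitioner tries to keep `bal` close to 1.0 while making `cut` as small as possible.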
5 Graph Partitioning and Repartitioning in Scientific Simulations. Time-evolving graph: the optimal partitioning of the graph changes.
6 Graph Partitioning and Repartitioning in Scientific Simulations. Graph repartitioning goals: balanced load distribution, minimal communication cost, minimal migration cost. Existing graph (re)partitioners assume uniform communication costs among partitions.
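Repartitioning adds migration cost to the objectives. A minimal sketch of how it could be measured under the uniform-cost assumption of existing repartitioners: every vertex whose partition changes contributes its size. The function name and the toy data are hypothetical.

```python
def migration_cost(vertex_size, old_part, new_part):
    """Total size of the vertices that change partition between two partitionings."""
    return sum(size for v, size in vertex_size.items() if old_part[v] != new_part[v])


# Hypothetical example: only vertex "a" (size 2) moves.
vertex_size = {"a": 2, "b": 1, "c": 3}
old_part = {"a": 0, "b": 0, "c": 1}
new_part = {"a": 1, "b": 0, "c": 1}

moved = migration_cost(vertex_size, old_part, new_part)
```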
7 Nonuniform Inter-Node Comm Costs. Group the most heavily communicating vertices onto compute nodes that are as close to each other as possible.
8 Nonuniform Intra-Node Comm Costs. Group the most heavily communicating vertices onto cores that share more cache levels.
9 AragonLB Overview. AragonLB: Architecture-Aware Graph RepartitiONing for Load Balancing. A 2-level repartitioner. Inter-node repartitioning: regrouping, architecture-agnostic repartitioning, architecture-aware refinement (TopoFM). Intra-node repartitioning: HierCacheLB, FlatCacheLB.
10 Roadmap. AragonLB internals: inter-node repartitioning; intra-node repartitioning (HierCacheLB, FlatCacheLB). Evaluation: setup, results. Conclusions. Acknowledgements.
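11 Inter-Node Repartitioning. [Figure: partitioned graph with cut-edge communication costs cc=1, cc=1, cc=6, cc=6.] Communication cost: 14 units (4 edge-cuts).
12 Inter-Node Repartitioning: Regrouping. [Figure: cut-edge communication costs cc=1, cc=1, cc=6, cc=6.] Communication cost: 14 units (4 edge-cuts).
13 Inter-Node Repartitioning: Repartitioning. [Figure: cut-edge communication costs cc=1, cc=1, cc=6.] Communication cost: 8 units (3 edge-cuts). Migration cost: 6 units.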
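The unit counts on these slides weight each cut edge by the cost of the link between the two partitions it spans, rather than counting every cut edge equally. A minimal sketch of that accounting follows. The graph, the partition layout, and the cost table are hypothetical (the slide's figure is not recoverable from the transcript); the numbers are only chosen so that four unit-weight cut edges with costs 1, 1, 6, 6 sum to 14.

```python
def topo_comm_cost(edges, part, cost):
    """Sum edge weight times inter-partition cost over all cut edges."""
    total = 0
    for u, v, w in edges:
        pu, pv = part[u], part[v]
        if pu != pv:
            total += w * cost[frozenset((pu, pv))]
    return total


# Hypothetical layout: partitions 0 and 1 are close (cost 1), partition 2 is far (cost 6).
cost = {
    frozenset((0, 1)): 1,
    frozenset((0, 2)): 6,
    frozenset((1, 2)): 6,
}
part = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2, "f": 2}
edges = [
    ("a", "b", 1),  # internal to partition 0
    ("a", "c", 1),  # cut, cost 1
    ("b", "d", 1),  # cut, cost 1
    ("c", "e", 1),  # cut, cost 6
    ("a", "f", 1),  # cut, cost 6
    ("e", "f", 1),  # internal to partition 2
]

units = topo_comm_cost(edges, part, cost)
```

Under a uniform-cost model the same partitioning would simply score 4 (the number of cut edges), which hides the fact that two of the cuts are six times as expensive as the others.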
14 Inter-Node Repartitioning: TopoFM.
15 Inter-Node Repartitioning: TopoFM. Compute initial gain.
16 Inter-Node Repartitioning: TopoFM. Compute initial gain. a: P1->P2. gstd(a) = (1-2)*1.
17 Inter-Node Repartitioning: TopoFM. Compute initial gain. a: P1->P2. gstd(a) = (1-2)*1, gtopo(a) = 1*(6-1).
18 Inter-Node Repartitioning: TopoFM. Compute initial gain. a: P1->P2. gstd(a) = (1-2)*1, gtopo(a) = 1*(6-1), gmig(a) = 1*(6-1).
19 Inter-Node Repartitioning: TopoFM. [Figure label: 9.] Compute initial gain. a: P1->P2. gstd(a) = (1-2)*1, gtopo(a) = 1*(6-1), gmig(a) = 1*(6-1), g(a) =
20 Inter-Node Repartitioning: TopoFM. Compute initial gain. [Figure labels: -3-2.] gstd(a) = (1-2)*1, gtopo(a) = 1*(6-1), gmig(a) = 1*(6-1), g(a) =
21 Inter-Node Repartitioning: TopoFM. Compute initial gain. Select the maximal-gain vertex, a.
22 Inter-Node Repartitioning: TopoFM. Compute initial gain. Select the maximal-gain vertex, a. Move a to P
23 Inter-Node Repartitioning: TopoFM. Compute initial gain. Select a maximal-gain vertex, a. Move a to P2. Update the gains of a's neighbors.
24 Inter-Node Repartitioning: TopoFM. Compute initial gain. Repeat: select a maximal-gain vertex, v; move v to P2; update the gains of v's neighbors.
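The select, move, update loop on these slides can be sketched as a greedy refinement pass. This is not the authors' TopoFM: the exact definitions of gstd, gtopo, and gmig and how they combine are not recoverable from the transcript, so the sketch assumes a single gain equal to the reduction in topology-weighted communication cost plus the reduction in migration cost. It also only accepts strictly positive gains, whereas a full FM pass locks moved vertices and tolerates temporary negative moves. All names, the cost matrix, and the example graph are hypothetical.

```python
def comm_cost_of(v, p, adj, part, cost):
    """Topology-weighted cost of v's edges if v were placed in partition p."""
    return sum(w * cost[p][part[u]] for u, w in adj[v] if part[u] != p)


def move_gain(v, target, adj, part, cost, size, home):
    """Assumed gain: drop in communication cost plus drop in migration cost."""
    src = part[v]
    comm_gain = (comm_cost_of(v, src, adj, part, cost)
                 - comm_cost_of(v, target, adj, part, cost))
    # Migration is charged from the vertex's original partition (home).
    mig_gain = size[v] * (cost[home[v]][src] - cost[home[v]][target])
    return comm_gain + mig_gain


def refine(adj, part, cost, size, weight, max_load):
    """Repeatedly apply the best positive-gain move that keeps every load <= max_load."""
    home = dict(part)          # where each vertex's data currently lives
    k = len(cost)
    load = [0] * k
    for v, p in part.items():
        load[p] += weight[v]
    while True:
        best = None
        for v in adj:
            for t in range(k):
                if t == part[v] or load[t] + weight[v] > max_load:
                    continue
                g = move_gain(v, t, adj, part, cost, size, home)
                if g > 0 and (best is None or g > best[0]):
                    best = (g, v, t)
        if best is None:
            return part
        _, v, t = best
        load[part[v]] -= weight[v]
        load[t] += weight[v]
        part[v] = t


# Hypothetical 3-partition topology: 0 and 1 are neighbors, 2 is far from both.
cost = [[0, 1, 6],
        [1, 0, 6],
        [6, 6, 0]]
adj = {"x": [("a", 2)], "y": [], "a": [("x", 2)]}
part = {"x": 0, "y": 0, "a": 2}
size = {"x": 5, "y": 1, "a": 1}      # x is expensive to migrate
weight = {"x": 1, "y": 1, "a": 1}

result = refine(adj, part, cost, size, weight, max_load=2)
```

In this toy case partition 0 is full, so vertex a cannot join x there. The pass instead moves a to partition 1, which keeps the edge cut but makes it cheap, which is the kind of move a uniform-cost refinement would never consider.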
25 Inter-Node Repartitioning: TopoFM. Communication cost: 4 units (4 edge-cuts). Migration cost: 1 unit.
26 Inter-Node Repartitioning: TopoFM.
28 Intra-Node Repartitioning: HierCacheLB. 1. Tree topology. 2. Hierarchical repartitioning.
29 Intra-Node Repartitioning: FlatCacheLB. Main idea: partition the subgraph directly into k parts and explore all possible assignments. alpha: number of computation steps performed between two consecutive repartitioning steps. w(Pi, Pj): the amount of communication between Pi and Pj. c(Pi, Pj): inter-core communication cost between Pi and Pj.
30 Intra-Node Repartitioning: FlatCacheLB. Main idea: partition the subgraph directly into k parts and explore all possible assignments. vs(Pi, Pj): the amount of data migrated between Pi and Pj. c(Pi, Pj): inter-core communication cost between Pi and Pj.
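The "explore all possible assignments" step can be illustrated with a brute-force search over part-to-core assignments. The objective formula itself did not survive in the transcript, so this sketch assumes one built from the symbols defined on the two slides: alpha times the sum of w times c for communication, plus the sum of vs times c for migration. Here `vs[i][q]` is modeled as the amount of part i's data currently resident on core q, which is an interpretation, and every name and number below is hypothetical.

```python
from itertools import permutations


def assignment_cost(perm, w, vs, c, alpha):
    """Assumed objective: alpha * communication cost + migration cost.

    perm[i] is the core that part i is assigned to.
    """
    k = len(perm)
    comm = sum(w[i][j] * c[perm[i]][perm[j]]
               for i in range(k) for j in range(i + 1, k))
    mig = sum(vs[i][q] * c[q][perm[i]]
              for i in range(k) for q in range(k))
    return alpha * comm + mig


def best_assignment(w, vs, c, alpha):
    """Try every one-to-one assignment of k parts to k cores (k! candidates)."""
    k = len(w)
    return min(permutations(range(k)),
               key=lambda p: assignment_cost(p, w, vs, c, alpha))


# Hypothetical node: cores {0,1} and {2,3} each share a cache (cost 1), otherwise cost 3.
c = [[0, 1, 3, 3],
     [1, 0, 3, 3],
     [3, 3, 0, 1],
     [3, 3, 1, 0]]
# Parts 0 and 2 communicate heavily, as do parts 1 and 3.
w = [[0, 1, 10, 1],
     [1, 0, 1, 10],
     [10, 1, 0, 1],
     [1, 10, 1, 0]]
# Part i's data currently sits entirely on core i.
vs = [[1, 0, 0, 0],
      [0, 1, 0, 0],
      [0, 0, 1, 0],
      [0, 0, 0, 1]]

best = best_assignment(w, vs, c, alpha=1)
best_cost = assignment_cost(best, w, vs, c, alpha=1)
stay_cost = assignment_cost((0, 1, 2, 3), w, vs, c, alpha=1)
```

Exhaustive search grows factorially with k, which is tolerable only because k here is the number of cores in one node. A larger alpha makes communication savings dominate the one-time migration charge.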
32 Evaluation: Dataset. Combustion simulation dataset (table columns: V, E, vertex degree Min, Max, Avg.): V = 115,351; E = 2,865, Synthetic datasets (table columns: Graph, # of Partitions, Degree of Imbalance): G8, G64, G128, G256, G
33 Evaluation: Platform. Evaluation platform: 3-D torus, 5 * 5 * 5. Compute node: 2 quad-core sockets; L1 private, L2 private, L3 shared.
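On a torus, a natural proxy for the inter-node communication cost between two compute nodes is the number of hops on the shortest path. A minimal sketch, assuming nodes are addressed by (x, y, z) coordinates and that hop count is the cost measure; the transcript does not state how the evaluation derived its costs, and the function name is hypothetical.

```python
def torus_hops(a, b, dims=(5, 5, 5)):
    """Shortest hop count between nodes a and b on a torus with wrap-around links."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


near = torus_hops((0, 0, 0), (4, 0, 0))   # wrap-around link makes these adjacent
far = torus_hops((0, 0, 0), (2, 3, 4))    # 2 hops in x, 2 in y, 1 in z
```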
34 Evaluation: Algorithms. Baselines: ParmetisRepart [1], ZoltanRepart [2]. AragonLB variants (inter-node repartitioning / intra-node repartitioning): PTF = Parmetis + TopoFM / FlatCacheLB; PTH = Parmetis + TopoFM / HierCacheLB; ZTF = Zoltan + TopoFM / FlatCacheLB; ZTH = Zoltan + TopoFM / HierCacheLB. [1] Parmetis. [2] Zoltan.
35 Varying # of Partitions (alpha=500). ZTH/F, PTH/F. Up to 60% improvement vs. Zoltan; up to 46% improvement vs. Parmetis.
36 Varying # of Comp. Steps (G512). ZTH/F, PTH/F. Up to 30% improvement.
37 Varying Sized 3D-Torus (G512, alpha=500). Up to 32% improvement.
38 Breakdown of Comm and Mig Volume (G512, alpha=500). 2-level repartitioning; intra-node repartitioning. Lower inter-node volume (30%~35% reduction); bigger as # of hops increases.
39 Conclusions. Proposed a new architecture-aware graph repartitioner, AragonLB, which considers the heterogeneity in both inter-node and intra-node communication. Experimental study with a combustion simulation dataset: up to 60% improvement (vs. Parmetis and Zoltan), with more gains as heterogeneity increases.
40 Acknowledgments. Many thanks to our collaborators: Peyman Givi, Patrick Pisciuneri, Medhi Nik, Levent Yilmaz, and Esteban Meneses. Work funded in part by NSF CBET, NSF OIA.