Grid Computing in Numerical Relativity and Astrophysics

Gabrielle Allen (gallen@cct.lsu.edu)
Departments of Computer Science & Physics
Center for Computation & Technology (CCT), Louisiana State University

Challenge problems that drive HEC & Grids:
- Cosmology
- Black hole and neutron star models
- Supernovae
- Astronomical databases
- Gravitational wave data analysis
Gravitational Wave Physics

[Diagram: observations and models feed complex simulations, leading to analysis & insight]
Computational Science Needs

Requires an incredible mix of technologies and expertise!
- Many scientific/engineering components: physics, astrophysics, CFD, engineering, ...
- Many numerical algorithm components:
  - Finite difference? Finite volume? Finite elements?
  - Elliptic equations: multigrid, Krylov subspace, ...
  - Mesh refinement
- Many different computational components:
  - Parallelism (HPF, MPI, PVM, ...?)
  - Multipatch
  - Architecture (MPP, DSM, vector, PC clusters, FPGA, ...?)
  - I/O (simulations generate TBs each; checkpointing)
  - Visualization of all that comes out!
- New technologies: Grid computing, steering, data archives

Such work cuts across many disciplines and many areas of CS.

Cactus Code

- Freely available, modular, portable and manageable environment for collaboratively developing parallel, high-performance multidimensional simulations
- Developed for numerical relativity, but now a general framework for parallel computing (CFD, astrophysics, climate modeling, chemical engineering, quantum gravity, ...)
- Finite difference, adaptive mesh refinement (Carpet, SAMRAI, GrACE); adding FE/FV and multipatch
- Active user and developer communities; main development now at LSU and AEI
- Open source, documented, etc.
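As a minimal illustration of the finite-difference component mentioned above, here is one explicit update step for the 1D heat equation. This is a toy sketch, not Cactus code; a real evolution thorn applies stencils like this over 3D grids in parallel.

```python
# Minimal sketch (not Cactus code): one explicit finite-difference
# step for the 1D heat equation u_t = u_xx, using the standard
# 3-point stencil, of the kind an evolution thorn applies each iteration.

def heat_step(u, dt, dx):
    """Return u advanced one time step; boundaries held fixed (Dirichlet)."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + dt / dx**2 * (u[i-1] - 2*u[i] + u[i+1])
    return new

u = [0.0, 0.0, 1.0, 0.0, 0.0]   # initial spike
u = heat_step(u, dt=0.1, dx=1.0)
print(u)                         # the spike diffuses into its neighbours
```

The stability constraint dt <= dx**2/2 for this explicit scheme is one reason adaptive mesh refinement matters: finer grids force smaller time steps.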
Cactus Einstein

- Cactus modules (thorns) for numerical relativity; many additional thorns available from other groups (AEI, CCT, ...)
- Thorns agree on some basic principles (e.g. names of variables) and can then share evolution, analysis, etc.
- Can choose whether or not to use e.g. gauge choice, macros, masks, matter coupling, conformal factor
- Over 100 relativity papers and 30 student theses: a production research code

Thorn groups:
- Evolve: ADM, EvolSimple
- Initial data: IDAnalyticBH, IDAxiBrillBH, IDBrillData, IDLinearWaves, IDSimple
- Analysis: ADMAnalysis, ADMConstraints, AHFinder, Extract, PsiKadelia, TimeGeodesic
- Gauge conditions: CoordGauge, Maximal
- Base/infrastructure: SpaceMask, ADMMacros, ADMBase, ADMCoupling, StaticConformal

Grand Challenge Collaborations

- NASA Neutron Star Grand Challenge: 5 US sites, 3 years, colliding neutron star problem
- EU Astrophysics Network: 10 EU sites, 3 years, continuing these problems
- NSF Black Hole Grand Challenge: 8 US institutions, 5 years, attacking the colliding black hole problem

Examples of the future of science & engineering:
- Require large-scale simulations beyond the reach of any single machine
- Require large, geo-distributed, cross-disciplinary collaborations
- Require Grid technologies, but are not yet using them!
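The thorn idea above (independent modules that interoperate only by agreeing on shared grid-variable names and a schedule) can be sketched in a few lines. Everything here is invented for illustration; the real Cactus flesh/thorn API is quite different.

```python
# Toy sketch of the "thorn" idea (names invented, not the real Cactus API):
# modules touch only agreed-upon grid-variable names, so any initial-data
# thorn can feed any evolution or analysis thorn.

grid = {}        # shared grid variables, keyed by agreed names
schedule = []    # registered thorn routines, run in order

def register(fn):
    schedule.append(fn)
    return fn

@register
def initial_data():              # stands in for an InitialData thorn
    grid["metric"] = [1.0, 1.0, 1.0]

@register
def evolve():                    # stands in for an Evolve thorn
    grid["metric"] = [g * 1.01 for g in grid["metric"]]

@register
def analysis():                  # stands in for an Analysis thorn
    grid["norm"] = sum(grid["metric"])

for thorn in schedule:           # the "flesh" drives the schedule
    thorn()
print(grid["norm"])
```

The point is the contract: as long as two thorns agree that the variable is called "metric", either can be swapped out without touching the other.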
New Paradigm: Grid Computing

- Computational resources across the world:
  - Compute servers (performance doubles every 18 months)
  - File servers
  - Networks (capacity doubles every 9 months)
  - Playstations, cell phones, etc.
- Grid computing integrates communities and resources
- How to take advantage of this for scientific simulations?
  - Harness multiple sites and devices
  - Models with a new level of complexity and scale, interacting with data
  - New possibilities for collaboration and advanced scenarios

NLR and the Louisiana Optical Network Initiative (LONI)

- State initiative ($40M) to support research: 40 Gbps optical network
- Connects 7 sites; Grid resources (IBM P5) at the sites; LIGO/CAMD
- New possibilities:
  - Dynamic provisioning and scheduling of network bandwidth
  - Network-dependent scenarios
  - EnLIGHTened Computing (NSF)
Current Grid Application Types

- Community driven: distributed communities share resources
  - Video conferencing, virtual collaborative environments
- Data driven: remote access to huge data sets, data mining
  - E.g. gravitational wave analysis, particle physics, astronomy
- Process/simulation driven: demanding simulations in science and engineering
  - Task farming, resource brokering, distributed computations, workflow
  - Remote visualization, steering and interaction, etc.

Typical scenario:
- Find remote resources (task farm, distribute)
- Launch jobs (static)
- Visualize, collect results

These are prototypes and demos; we need to move to fault tolerance, robustness, scaling, ease of use, and complete solutions.

New Paradigms for Dynamic Grids

Addressing large, complex, multidisciplinary problems with collaborative teams of varied researchers...
- Code/user/infrastructure should be aware of the environment:
  - Discover and monitor resources available NOW
  - What is my allocation on these resources?
  - What are the bandwidth and latency?
- Code/user/infrastructure should make decisions:
  - Slow part of the simulation can run independently? Spawn it off!
  - New powerful resources just became available? Migrate there!
  - Machine went down? Reconfigure and recover!
  - Need more memory (or less)? Add (or drop) machines!
  - Dynamically provision and use new high-end resources and networks
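The decision-making loop described above (spawn, migrate, recover, resize) reduces to a small policy function over whatever the code can discover about its environment right now. The fields and thresholds below are invented for illustration.

```python
# Hedged sketch of the dynamic-grid decision logic (all field names and
# priorities invented): given the environment as discovered NOW, pick
# the most urgent adaptive action.

def decide(state):
    if state["machine_down"]:
        return "recover-from-checkpoint"        # machine went down
    if state["memory_needed"] > state["memory_available"]:
        return "add-machines"                   # need more memory
    if state["better_resource_found"]:
        return "migrate"                        # new powerful resource
    if state["independent_task_pending"]:
        return "spawn"                          # slow part runs elsewhere
    return "continue"

action = decide({"machine_down": False, "memory_needed": 64,
                 "memory_available": 32, "better_resource_found": True,
                 "independent_task_pending": False})
print(action)  # → add-machines (memory pressure outranks migration here)
```

In a real system each branch would call out to Grid services (resource discovery, checkpointing, job submission) rather than just return a label.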
Future Dynamic Grid Computing

"We see something, but too weak. Please simulate to enhance the signal!"

[Diagram: a simulation S spawns subtasks S1, S2 and processes P1, P2 across sites. The job starts at RZG; when its queue time is over it finds a new machine and adds more resources (SDSC, LRZ); it archives data, clones itself with a steered parameter, and calculates/outputs invariants (SDSC) for further calculations. On finding a black hole it loads a new component, finds the best resources, looks for the horizon, calculates/outputs gravitational waves (AEI), and archives results to the LIGO experiment (NCSA).]
New Grid Scenarios

- Intelligent parameter surveys, speculative computing, Monte Carlo
- Dynamic staging: move to a faster/cheaper/bigger machine
- Multiple universe: create a clone to investigate a steered parameter
- Automatic component loading: as the needs of the process change, discover/load/execute a new calculation component on an appropriate machine
- Automatic convergence testing
- Look ahead: spawn off a coarser-resolution run to predict the likely future
- Spawn independent/asynchronous tasks: send them to a cheaper machine while the main simulation carries on
- Routine profiling: best machine/queue; choose resolution parameters based on the queue
- Dynamic load balancing: inhomogeneous loads, multiple grids
- Inject dynamically acquired data

But We Need Grid Apps and Programming Tools

- Need application programming tools for Grid environments:
  - Frameworks for developing Grid applications
  - Toolkits providing Grid functionality
  - Grid debuggers and profilers
  - Robust, dependable, flexible Grid tools
- Challenging CS problems:
  - Missing or immature Grid services
  - Changing environment
  - Different and evolving interfaces to the Grid
  - Interfaces are not simple for scientific application developers
- Application developers need easy, robust and dependable tools
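The first scenario above, an intelligent parameter survey, can be sketched with a task farm: many cheap runs over a parameter range, keeping the best result. Here `run_model` is a placeholder for submitting a real simulation to a remote resource.

```python
# Sketch of a parameter-survey task farm (run_model is a stand-in for
# launching a real low-resolution simulation on a remote machine).
from concurrent.futures import ThreadPoolExecutor

def run_model(p):
    # Placeholder "simulation": an error measure as a function of the
    # steered parameter p; a real survey would submit a grid job here.
    return (p - 0.3) ** 2

params = [i / 10 for i in range(11)]          # survey p = 0.0 .. 1.0
with ThreadPoolExecutor(max_workers=4) as pool:
    errors = list(pool.map(run_model, params))  # farm the runs out

best = params[errors.index(min(errors))]
print("best parameter:", best)                 # → best parameter: 0.3
```

The SC2002 corotation-parameter demo described later in this talk follows exactly this pattern, with the error measure returned to a steering server.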
GridLab Project

- EU 5th Framework project ($7M); partners in Europe and the US:
  PSNC (Poland), AEI & ZIB (Germany), VU (Netherlands), MASARYK (Czech Republic), SZTAKI (Hungary), ISUFI (Italy), Cardiff (UK), NTUA (Greece), Chicago, ISI & Wisconsin (US), Sun, Compaq/HP, LSU
- Application and testbed oriented (Cactus + Triana): numerical relativity, dynamic use of grids
- Main goal: develop an application programming environment for the Grid (www.gridlab.org)

Grid Application Toolkit (GAT)

- Main result of the GridLab project: www.gridlab.org/gat
- Abstract programming interface between applications and Grid services
- Designed around application needs (move a file, run a remote task, migrate, write to a remote file)
- Led to the GGF Simple API for Grid Applications (SAGA)
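The GAT idea of a thin, application-oriented layer over changing Grid services can be sketched as follows. The interface and class names here are invented; the real GAT and SAGA APIs differ, but the pattern is the same: the application codes against a small stable interface, and interchangeable adaptors bind it to whatever middleware is actually available.

```python
# Invented sketch of the GAT/SAGA pattern: a stable application-facing
# API with pluggable adaptors, falling back through them in order of
# preference (here only a local adaptor is wired in).
import shutil
import subprocess

class LocalAdaptor:
    """Fallback adaptor: 'Grid' operations done on the local machine."""
    def copy_file(self, src, dst):
        shutil.copy(src, dst)
    def run_task(self, cmd):
        return subprocess.run(cmd, capture_output=True, text=True).stdout

class GridAPI:
    def __init__(self, adaptors):
        self.adaptors = adaptors          # tried in order of preference
    def run_task(self, cmd):
        for adaptor in self.adaptors:
            try:
                return adaptor.run_task(cmd)
            except OSError:
                continue                  # fall through to next middleware
        raise RuntimeError("no adaptor could run the task")

api = GridAPI([LocalAdaptor()])
print(api.run_task(["echo", "hello grid"]))
```

The payoff named on this slide: when a Grid service's interface changes, only the adaptor changes, never the application.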
Distributed Computation: Harnessing Multiple Computers

- Why do this?
  - Capacity: single computers can't keep up with needs
  - Throughput: combine resources
- Issues:
  - Bandwidth (increasing faster than CPU) and latency
  - Communication needs and topology
  - Communication/computation ratio
- Techniques to be developed:
  - Overlapping communication with computation
  - Extra ghost zones to reduce latency
  - Compression
  - Algorithms that do all this for the scientist

Dynamic Adaptive Distributed Computation (SC2001)

- Cactus + MPICH-G2, run across two sites:
  - SDSC IBM SP: 1024 processors, using a 5x12x17 = 1020 decomposition
  - NCSA Origin Array: 256+128+128 processors, using 5x12x(4+2+2) = 480
  - Sites linked by an OC-12 line (but only 2.5 MB/s achieved); GigE (100 MB/s) within NCSA
- Communications dynamically adapt to the application and environment
- Works for any Cactus application; scaling improved from 15% to 85%
- Gordon Bell Prize (with U. Chicago/Northern, Supercomputing 2001, Denver)
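The "extra ghost zones" technique above trades bandwidth for latency: with g ghost layers, a stencil code needs a neighbour exchange only every g time steps instead of every step, so over N steps the number of latency-bound messages drops from N to ceil(N/g). A quick sketch of the arithmetic:

```python
# Arithmetic behind the ghost-zone latency trick: g ghost layers let a
# width-1 stencil run g steps between neighbour exchanges, so message
# count over n_steps falls from n_steps to ceil(n_steps / g).
import math

def exchanges(n_steps, ghost_width):
    return math.ceil(n_steps / ghost_width)

for g in (1, 2, 4):
    print(f"ghost width {g}: {exchanges(1000, g)} exchanges per neighbour")
```

The cost is that each exchange carries g layers instead of 1, and the overlap region is computed redundantly on both sides; on a high-latency OC-12 link like the SC2001 run's, that is usually a good trade.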
Remote Viz & Steering

- HTTP access from any browser; streaming HDF5 to any viz client (LCA Vision, OpenDX)
- Automatic downsampling of streamed data
- Changing steerable parameters: physics, algorithms, performance

Cactus Worm (SC2000)

- A Cactus simulation starts, launched from a portal
- Migrates itself to another site using Grid technologies
- Registers its new location
- The user tracks and steers it using HTTP, streaming data, etc.
- It continues around Europe
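The steerable-parameter mechanism sketched above amounts to the running code exposing a table of parameters that a remote client may update, with each change validated before acceptance. This is an invented minimal sketch, not the actual Cactus HTTP thorn:

```python
# Minimal sketch of parameter steering (invented, not the Cactus HTTP
# thorn): a table of steerable parameters with bounds checked before a
# remote update is accepted.

steerable = {
    "downsample":   {"value": 1,  "min": 1, "max": 64},
    "output_every": {"value": 10, "min": 1, "max": 1000},
}

def steer(name, new_value):
    p = steerable[name]
    if not p["min"] <= new_value <= p["max"]:
        raise ValueError(f"{name}={new_value} out of range")
    p["value"] = new_value        # simulation picks this up next iteration

steer("downsample", 8)            # e.g. requested over HTTP by a viz client
print(steerable["downsample"]["value"])
```

Marking which parameters are steerable, and in what range, is what keeps remote users from steering a simulation into an unstable regime.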
Task Spawning (SC2001)

- The Cactus Spawner thorn automatically prepares analysis tasks for spawning
- Grid technologies find resources, manage tasks, and collect data
- Intelligence decides when to spawn
- SC2001 demo on the GGTC testbed: the main Cactus black hole simulation starts at one site, and appropriate analysis tasks are spawned automatically to free resources worldwide
- The user only has to invoke the Cactus Spawner thorn

Global Grid Testbed Collaboration (Supercomputing 2001)

- Cactus black hole simulations spawned apparent-horizon-finding tasks across the grid
- Prizes for most heterogeneous and most distributed testbed
- 5 continents and over 14 countries; around 70 machines, 7500+ processors
- Many hardware types, including PS2, IA32, IA64, MIPS, ...
- Many OSs, including Linux, Irix, AIX, OSF, Tru64, Solaris, Hitachi
- Many organizations: DOE, NSF, MPG, universities, vendors
- All ran the same Grid infrastructure, used for different applications
Black Hole Task Farming (SC2002)

- A black hole server controls the tasks and steers the main job
- The main Cactus black hole simulation starts in California
- Dozens of low-resolution jobs test the corotation parameter, returning an error measure to the server
- The huge main job generates remote data, visualized in Baltimore

Job Migration

- GridLab demonstration at SC2003
Notification and Information

[Diagram: a GridSphere portal connects users to the Grid, backed by a replica catalog and SMS, IM and mail servers that hold user details, notification preferences and simulation information.]

Grid-Enabled Gravitational Physics

- Adaptive, intelligent simulation codes able to adapt to their environment
- Simulation data stored across geographically distributed spaces: organization, access and mining issues
- Analysis of federated data sets by virtual organizations
- Data analysis of LIGO, GEO, LISA signals; interacting with simulation data; managing parameter space / signal analysis
- Now working on domain-specific information and knowledge-based services:
  - Gravitational physics description language: a schema for describing, searching and encoding simulation results
  - Automated logging of simulations: reproducibility
  - Notification and data-sharing services to enable collaboration
  - Relativity services: remote servers running e.g. waveform extraction, horizon finding, etc.
  - Connection to publications and information; automated analysis
Credits

This talk describes work carried out over a number of years by physicists, computer scientists, mathematicians and others in the joint AEI-LSU numerical relativity groups, together with their colleagues.