Extreme scripting and other adventures in data-intensive computing
1 Extreme scripting and other adventures in data-intensive computing
  Ian Foster
  Allan Espinosa, Ioan Raicu, Mike Wilde, Zhao Zhang
  Computation Institute, Argonne National Lab & University of Chicago
2 How data analysis happens at data-intensive computing workshops
3 How data analysis really happens in scientific laboratories
  % foo file1 > file2
  % bar file2 > file3
  % foo file1 | bar > file3
  % foreach f (f1 f2 f3 f4 f5 f6 f7 ... f100)
  foreach? foo $f.in | bar > $f.out
  foreach? end
  %
  % Now where on earth is f98.out, and how did I generate it again?
  Now: command not found.
  %
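The question that ends this slide (where is f98.out, and how was it generated?) is a provenance question. As a minimal Python sketch, assuming the slide's placeholder programs `foo` and `bar` and an invented `Step` record layout, the same 100-iteration loop can be planned as data so that every output can be traced back to the command that produces it. Nothing is executed here; the names `plan` and `how_was_it_made` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One planned pipeline invocation and the file it produces."""
    output: str
    inputs: tuple
    command: str


def plan(count=100):
    """Plan `foo fN.in | bar > fN.out` for N = 1..count without running anything.

    `foo` and `bar` are the placeholder programs from the slide, not real tools.
    """
    steps = {}
    for n in range(1, count + 1):
        name = f"f{n}"
        steps[f"{name}.out"] = Step(
            output=f"{name}.out",
            inputs=(f"{name}.in",),
            command=f"foo {name}.in | bar > {name}.out",
        )
    return steps


def how_was_it_made(steps, output):
    """Return the command that generates `output`, or None if it is unknown."""
    step = steps.get(output)
    return step.command if step else None


if __name__ == "__main__":
    print(how_was_it_made(plan(), "f98.out"))
```

Keeping the plan as a dictionary keyed by output file is one simple design choice; a real workflow system would also record when and where each step ran.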
4 Extreme scripting
  Many activities, numerous files, complex data, data dependencies, many programs
  Preserving file system semantics, ability to call arbitrary executables
  Complex scripts / Simple scripts
  Small computers / Big computers
  Swift
  Many processors, storage hierarchy, failure, heterogeneity
5 Functional magnetic resonance imaging (fMRI) data analysis
6 AIRSN program definition
  (Run snr) functional ( Run r, NormAnat a, Air shrink ) {
    Run yrorun = reorientrun( r, "y" );
    Run rorun = reorientrun( yrorun, "x" );
    Volume std = rorun[0];
    Run rndr = random_select( rorun, 0.1 );
    AirVector rndairvec = align_linearrun( rndr, std, 12, 1000, 1000, "81 3 3" );
    Run reslicedrndr = reslicerun( rndr, rndairvec, "o", "k" );
    Volume meanrand = softmean( reslicedrndr, "y", "null" );
    Air mnqaair = alignlinear( a.nhires, meanrand, 6, 1000, 4, "81 3 3" );
    Warp boldnormwarp = combinewarp( shrink, a.awarp, mnqaair );
    Run nr = reslice_warp_run( boldnormwarp, rorun );
    Volume meanall = strictmean( nr, "y", "null" );
    Volume boldmask = binarize( meanall, "y" );
    snr = gsmoothrun( nr, boldmask, "6 6 6" );
  }

  (Run or) reorientrun (Run ir, string direction) {
    foreach Volume iv, i in ir.v {
      or.v[i] = reorient(iv, direction);
    }
  }
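In `reorientrun`, each `or.v[i]` depends only on the matching input volume `iv`, so the loop iterations are independent of one another and can run concurrently. The Python sketch below illustrates that pattern only; it is not how Swift implements `foreach`. The `reorient` function is a stand-in that tags a string, and `reorient_run` and `max_workers` are names assumed for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def reorient(volume, direction):
    """Stand-in for the real reorient program: it only tags the volume name."""
    return f"{volume}.reoriented-{direction}"


def reorient_run(volumes, direction, max_workers=4):
    """Apply reorient to every volume independently, preserving index order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, mirroring or.v[i] = f(ir.v[i]).
        return list(pool.map(lambda v: reorient(v, direction), volumes))


if __name__ == "__main__":
    run = ["vol0", "vol1", "vol2"]
    y_run = reorient_run(run, "y")    # corresponds to yrorun on the slide
    x_run = reorient_run(y_run, "x")  # corresponds to rorun on the slide
    print(x_run)
```

The two chained calls at the bottom mirror the first two statements of `functional`, where the second reorientation consumes the output of the first.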
7 Many many tasks: Identifying potential drug targets
  Protein target(s) x 2M+ ligands
  Benoit Roux et al.
8 [Workflow diagram, from start to end]
  Inputs:
    PDB protein descriptions: 1 protein (1 MB)
    ZINC 3-D structures: 2M structures (6 GB)
  Receptor preparation:
    Manually prep DOCK6 rec file: DOCK6 receptor (1 per protein: defines pocket to bind to)
    Manually prep FRED rec file: FRED receptor (1 per protein: defines pocket to bind to)
    NAB script template and NAB script parameters (defines flexible residues, #MD steps): BuildNABScript produces the NAB script
  FRED and DOCK6: ~4M x 60s x 1 cpu, ~60K cpu-hrs; select best ~5K from each
  Amber: ~10K x 20m x 1 cpu, ~3K cpu-hrs; select best ~500
    Amber prep: 2. AmberizeReceptor, 4. perl: gen nabscript
    Amber score: 1. AmberizeLigand, 3. AmberizeComplex, 5. RunNABScript
  GCMC: ~500 x 10hr x 100 cpu, ~500K cpu-hrs
  Report: ligands, complexes
  For 1 target: 4 million tasks, 500,000 cpu-hrs (50 cpu-years)
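The per-stage costs on this slide are simple products of task count, task duration, and CPUs per task. The Python sketch below reproduces that arithmetic from the slide's rounded figures; the stage labels and function names are assumptions made for the example. The unrounded products come to roughly 570K cpu-hrs, or about 65 cpu-years at 8,760 hours per year, which the slide rounds down to 500,000 cpu-hrs and 50 cpu-years.

```python
SECONDS_PER_HOUR = 3600
HOURS_PER_YEAR = 24 * 365

# (stage, tasks, hours per task, cpus per task), taken from the slide's rounded figures.
STAGES = [
    ("FRED + DOCK6", 4_000_000, 60 / SECONDS_PER_HOUR, 1),
    ("Amber", 10_000, 20 / 60, 1),
    ("GCMC", 500, 10, 100),
]


def cpu_hours(tasks, hours_per_task, cpus):
    """CPU-hours for one stage: tasks x duration x CPUs per task."""
    return tasks * hours_per_task * cpus


def totals(stages=STAGES):
    """Return (cpu-hours per stage, total cpu-hours, total cpu-years)."""
    per_stage = {name: cpu_hours(t, h, c) for name, t, h, c in stages}
    total = sum(per_stage.values())
    return per_stage, total, total / HOURS_PER_YEAR


if __name__ == "__main__":
    per_stage, total, years = totals()
    for name, hours in per_stage.items():
        print(f"{name}: {hours:,.0f} cpu-hrs")
    print(f"total: {total:,.0f} cpu-hrs, about {years:.0f} cpu-years")
```

The GCMC stage dominates: 500 tasks at 10 hours on 100 CPUs each account for most of the total, even though the docking stage has thousands of times more tasks.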
10 IBM BG/P: 570 Teraflop/s, 164,000 cores, 80 TB
11 DOCK on BG/P: ~1M tasks on 119,000 CPUs (Ioan Raicu et al.)
  [Figure: cores and tasks versus time (sec)]
  Elapsed time: 7257 sec
  Compute time: CPU years
  Average task: 667 sec
  Relative efficiency: 99.7% (from 16 to 32 racks)
  Utilization: 99.6% sustained, 78.3% overall
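Overall utilization can be read as the CPU time spent inside tasks divided by the CPU time available during the run. The Python sketch below estimates it from the slide's figures. It assumes exactly 1,000,000 tasks (the slide says only "~1M") and a 365-day year, so the result is an approximation and need not match the slide's 78.3% exactly. The function names are illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

# Figures from the slide; TASKS is an assumed round number for "~1M tasks".
TASKS = 1_000_000
AVERAGE_TASK_SECONDS = 667
CORES = 119_000
ELAPSED_SECONDS = 7257


def busy_cpu_seconds(tasks=TASKS, avg_seconds=AVERAGE_TASK_SECONDS):
    """Total CPU time spent inside tasks."""
    return tasks * avg_seconds


def overall_utilization(cores=CORES, elapsed=ELAPSED_SECONDS):
    """Busy CPU time divided by CPU time available over the whole run."""
    return busy_cpu_seconds() / (cores * elapsed)


def busy_cpu_years():
    """Busy CPU time expressed in CPU-years (an estimate, not the slide's value)."""
    return busy_cpu_seconds() / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"estimated overall utilization: {overall_utilization():.1%}")
    print(f"estimated compute time: {busy_cpu_years():.1f} CPU-years")
```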
12 Managing 160,000 cores
  Falkon
  High-speed local disk
  Slower shared storage
13 Scaling POSIX to petascale
  [Diagram: three storage tiers]
  Global: global file system holding the large dataset; staging via Chirp (multicast)
  Intermediate: CN-striped intermediate file system (IFS) over the torus and tree interconnects; ZOID on the I/O node; IFS segments on IFS compute nodes; MosaStore (striping)
  Local: LFS on each compute node (local datasets)
14 Efficiency for 4 second tasks and varying data size (1KB to 1MB) for CIO and GPFS up to 32K processors
15 Provisioning for data-intensive workloads (Ioan Raicu)
  Example: on-demand stacking of arbitrary locations within ~10TB sky survey
  Challenges: random data access, much computing, time-varying load
  Solution: dynamic acquisition of compute & storage; data diffusion
  [Figure legend: S = Sloan Data]
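The slide names data diffusion without spelling out a mechanism. The Python sketch below assumes one common reading of the idea: each node keeps a small cache of recently used files, and the dispatcher prefers a node that already holds the file a task needs, so repeated accesses avoid shared storage. The LRU eviction, the least-loaded tie-break, and all class and function names are assumptions made for illustration; the GCC policy mentioned on the following slide is not modeled.

```python
from collections import OrderedDict


class Node:
    """A compute node with a fixed-size LRU cache of file names."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.cache = OrderedDict()
        self.tasks_run = 0

    def run(self, filename):
        """Run one task on `filename`; return True on a cache hit."""
        self.tasks_run += 1
        if filename in self.cache:
            self.cache.move_to_end(filename)  # mark as most recently used
            return True
        self.cache[filename] = True           # simulated fetch from shared storage
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict the least recently used file
        return False


def dispatch(nodes, filename):
    """Prefer a node that already caches the file; otherwise the least-loaded node."""
    holders = [n for n in nodes if filename in n.cache]
    candidates = holders or nodes
    return min(candidates, key=lambda n: n.tasks_run)


def simulate(requests, node_count=2, capacity=2):
    """Replay a list of file requests and return (cache hits, cache misses)."""
    nodes = [Node(f"node{i}", capacity) for i in range(node_count)]
    hits = 0
    for filename in requests:
        if dispatch(nodes, filename).run(filename):
            hits += 1
    return hits, len(requests) - hits


if __name__ == "__main__":
    print(simulate(["a", "b", "a", "b", "a", "b"]))
```

In the usage example, the first request for each file misses and every later request is routed to the node that cached it, which is the locality effect the sketch is meant to show.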
16 Sine workload, 2M tasks, 10MB:10ms ratio, 100 nodes, GCC policy, 50GB caches/node (Ioan Raicu)
17 Same scenario, but with dynamic resource provisioning
18 Data diffusion sine-wave workload: Summary
  GPFS: 5.70 hrs, ~8Gb/s, 1138 CPU hrs
  DD+SRP: 1.80 hrs, ~25Gb/s, 361 CPU hrs
  DD+DRP: 1.86 hrs, ~24Gb/s, 253 CPU hrs
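The three summary rows are easier to compare as ratios against the GPFS baseline. The Python sketch below computes the speedup in wall-clock time and the fraction of CPU hours consumed, using only the numbers on the slide. DD+DRP is read here as data diffusion with dynamic resource provisioning, following the previous slide; reading DD+SRP as the static-provisioning counterpart is an assumption. The function and key names are illustrative.

```python
# Figures copied from the summary slide.
RESULTS = {
    "GPFS": {"hours": 5.70, "gbps": 8, "cpu_hours": 1138},
    "DD+SRP": {"hours": 1.80, "gbps": 25, "cpu_hours": 361},
    "DD+DRP": {"hours": 1.86, "gbps": 24, "cpu_hours": 253},
}


def relative_to(baseline, results=RESULTS):
    """Speedup and CPU-hour fraction of every configuration versus `baseline`."""
    base = results[baseline]
    return {
        name: {
            "speedup": base["hours"] / r["hours"],
            "cpu_hour_fraction": r["cpu_hours"] / base["cpu_hours"],
        }
        for name, r in results.items()
    }


if __name__ == "__main__":
    for name, ratios in relative_to("GPFS").items():
        print(
            f"{name}: {ratios['speedup']:.2f}x faster, "
            f"{ratios['cpu_hour_fraction']:.0%} of the CPU hours"
        )
```

On these figures, both data diffusion runs finish about three times faster than GPFS, and the DD+DRP run uses roughly 30% fewer CPU hours than DD+SRP for a run that is about 3% longer.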
19 Data-intensive Computation Institute: Example applications
  Astrophysics, cognitive science, East Asian studies, economics, environmental science, epidemiology, genomic medicine, neuroscience, political science, sociology, solid state physics
21 Sequencing outpaces Moore's law (Folker Meyer, Computation Institute)
  [Figure: cost in US$ of bioinformatics (BLAST on EC2) compared with the cost of sequencing, plotted against gigabases for Solexa and next-gen Solexa; axis values not recoverable]
22 Data-intensive Computation Institute: Hardware
  PADS: Petascale Active Data Store (NSF MRI)
  Diverse data sources; data ingest
  1000 TB tape backup
  500 TB reliable storage (data & metadata)
  180 TB, 180 GB/s
  17 Top/s analysis
  Dynamic provisioning, parallel analysis, remote access
  Diverse users
  Offload to remote data centers
23 Data-intensive Computation Institute: Software
  HPC systems software (MPICH, PVFS, ZeptOS)
  Collaborative data tagging (GLOSS)
  Data integration (XDTM)
  HPC data analytics and visualization
  Loosely coupled parallelism (Swift, Hadoop)
  Dynamic provisioning (Falkon)
  Service authoring (Introduce, caGrid, gRAVI)
  Provenance recording and query (Swift)
  Service composition and workflow (Taverna)
  Virtualization management (Workspace Service)
  Distributed data management (GridFTP, etc.)
24 Data-intensive computing is an end-to-end problem
  [Figure: Stacey matrix. Axes: agreement about outcomes (high to low) and certainty about outcomes (high to low). Regions: plan and control, zone of complexity, chaos.]
  Ralph Stacey, Complexity and Creativity in Organizations, 1996
25 We need to function in the zone of complexity
  [Figure: Stacey matrix. Axes: agreement about outcomes (high to low) and certainty about outcomes (high to low). Regions: plan and control, chaos.]
  Ralph Stacey, Complexity and Creativity in Organizations, 1996
26 The Grid paradigm
  Principles and mechanisms for dynamic virtual organizations
  Leverage service oriented architecture
  Loose coupling of data and services
  Open software, architecture
  Computer science, physics, astronomy, engineering, biology, biomedicine, healthcare
27 As of Oct 19, 2008: 122 participants; 105 services (70 data, 35 analytical)
28 Multi-center clinical cancer trials image capture and review (Center for Health Informatics)
29 Summary
  Extreme scripting offers the potential for easy scaling of proven working practices
  Interesting technical problems relating to programming and I/O models
  Many wonderful applications
  Data-intensive computing is an end-to-end problem
  Data generation, integration, analysis, etc., is a continuous, loosely coupled process
30 Thank you! Computation Institute
Smart Trading with Cray Systems: Making Smarter Models + Better Decisions in Algorithmic Trading Smart Trading with Cray Systems Agenda: Cray Overview Market Trends & Challenges Mitigating Risk with Deeper
More informationUsing MPI One-sided Communication to Accelerate Bioinformatics Applications
Using MPI One-sided Communication to Accelerate Bioinformatics Applications Hao Wang (hwang121@vt.edu) Department of Computer Science, Virginia Tech Next-Generation Sequencing (NGS) Data Analysis NGS Data
More informationMoving e-infrastructure into a new era the FP7 challenge
GARR Conference 18 May 2006 Moving e-infrastructure into a new era the FP7 challenge Mário Campolargo European Commission - DG INFSO Head of Unit Research Infrastructures Example of e-science challenges
More informationIntroduction to High Performance Parallel I/O
Introduction to High Performance Parallel I/O Richard Gerber Deputy Group Lead NERSC User Services August 30, 2013-1- Some slides from Katie Antypas I/O Needs Getting Bigger All the Time I/O needs growing
More informationA Comparative Experimental Study of Parallel File Systems for Large-Scale Data Processing
A Comparative Experimental Study of Parallel File Systems for Large-Scale Data Processing Z. Sebepou, K. Magoutis, M. Marazakis, A. Bilas Institute of Computer Science (ICS) Foundation for Research and
More informationLife In The Flash Director - EMC Flash Strategy (Cross BU)
1 Life In The Flash Lane @SamMarraccini, Director - EMC Flash Strategy (Cross BU) CONSTANT 2 Performance = Moore s Law, Or Does It? MOORE S LAW: 100X PER DECADE FLASH Closes The CPU To Storage Gap FLASH
More informationAdvanced School in High Performance and GRID Computing November Introduction to Grid computing.
1967-14 Advanced School in High Performance and GRID Computing 3-14 November 2008 Introduction to Grid computing. TAFFONI Giuliano Osservatorio Astronomico di Trieste/INAF Via G.B. Tiepolo 11 34131 Trieste
More informatione-infrastructure: objectives and strategy in FP7
"The views expressed in this presentation are those of the author and do not necessarily reflect the views of the European Commission" e-infrastructure: objectives and strategy in FP7 National information
More information