Dac-Man: Data Change Management for Scientific Datasets on HPC systems
1 Dac-Man: Data Change Management for Scientific Datasets on HPC systems
Devarshi Ghoshal, Lavanya Ramakrishnan, Deborah Agarwal
Lawrence Berkeley National Laboratory
2 Motivation
[Figure: Data Releases, Storage Resources, Compute Resources, Scientific Discovery]
- Large scientific datasets are frequently updated, with limited or no provenance
- Longer time to scientific discovery: scientists delay updating the downstream data products due to complexity
- Inefficient use of compute and storage resources: users often rerun data processing pipelines without understanding the impact of change
3 Limitations of Existing Tools
- Sequentially compare datasets
- Do not save or reuse change information
- Generate uninterpretable change results
- Lack the information necessary to assess the impact of data change
- Cannot quantify change for scientific datasets
- Do not scale
4 Dac-Man: DAta Change MANagement
- A framework that identifies, captures and manages change in large scientific datasets, and enables plug-in of domain-specific change analysis with minimal user effort
- Is not a version control system
- Designed to scale on an HPC system
- Allows users to efficiently interpret and quantify changes
5 Architecture
User-interfacing components
- Change tracker: scans and compares data objects; manages data change
- Query manager: retrieves change information
Change and metadata management
- Indexing area: stores indexes and filesystem metadata
- Caching area: stages file and data change information
6 Dac-Man Indexes
- Index 1: key = file, value = Hash(file data)
- Index 2: key = Hash(file data), value = file
- Bi-directional indexing helps Dac-Man identify both data and metadata changes in files
- MPI workers build the indexes in parallel
- Saved on filesystem for portability and reuse
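The two index directions can be illustrated with a minimal, serial Python sketch. It is not Dac-Man's implementation: the function names `hash_file` and `build_indexes` are hypothetical, SHA-256 is an assumed choice of hash, and the MPI-parallel build mentioned on the slide is left out.

```python
import hashlib
from pathlib import Path


def hash_file(path, chunk_size=1 << 20):
    """Return a hex digest of the file's contents, read in chunks."""
    digest = hashlib.sha256()  # assumed hash; the slides do not name one
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_indexes(root):
    """Build both index directions for every regular file under root."""
    root = Path(root)
    path_to_hash = {}   # key: relative file path, value: hash of file data
    hash_to_paths = {}  # key: hash of file data, value: list of relative paths
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            file_hash = hash_file(path)
            path_to_hash[rel] = file_hash
            hash_to_paths.setdefault(file_hash, []).append(rel)
    return path_to_hash, hash_to_paths
```

The path-keyed direction answers "did the data at this path change?", while the hash-keyed direction answers "does this data already exist somewhere else?", which is what lets a moved or renamed file be told apart from new data.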
7 Comparators
File comparator
- recursively compares files and directories, including subdirectories and symbolic links
- uses indexes to compare the files
- classifies file changes into different types: modified, metadata-only, added/deleted
Data comparator
- compares data within files
- data adaptors transform different scientific data formats into Dac-Man records
- Dac-Man records are a collection of key-value pairs
- allows external scripts to be used as data comparators
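The change types listed for the file comparator can be sketched in Python as follows. This is an illustration under stated assumptions, not Dac-Man's code: `classify_changes` is a hypothetical name, each index is assumed to map a relative path to a `(data_hash, metadata)` pair, and treating a file whose data reappears under a new path as a metadata-only change is an assumption about how the hash-keyed index might be used.

```python
def classify_changes(old, new):
    """Classify per-file changes between two {path: (data_hash, metadata)} indexes."""
    # Hash-keyed view of the old index, used to spot moved or renamed files.
    old_by_hash = {}
    for path, (data_hash, _meta) in old.items():
        old_by_hash.setdefault(data_hash, set()).add(path)

    changes = {"modified": [], "metadata-only": [], "added": [],
               "deleted": [], "unchanged": []}
    renamed_from = set()  # old paths already matched to a new location

    for path in sorted(new):
        new_hash, new_meta = new[path]
        if path in old:
            old_hash, old_meta = old[path]
            if old_hash != new_hash:
                changes["modified"].append(path)
            elif old_meta != new_meta:
                changes["metadata-only"].append(path)
            else:
                changes["unchanged"].append(path)
        else:
            # Same data under an old path that no longer exists: assumed rename.
            candidates = sorted(
                p for p in old_by_hash.get(new_hash, ())
                if p not in new and p not in renamed_from
            )
            if candidates:
                renamed_from.add(candidates[0])
                changes["metadata-only"].append(path)
            else:
                changes["added"].append(path)

    for path in sorted(old):
        if path not in new and path not in renamed_from:
            changes["deleted"].append(path)
    return changes
```

Because the comparison works only on the saved indexes, no file contents are re-read at comparison time in this sketch.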
8 Dac-Man Cache
[Figure: Dataset-1, Dataset-2, Datapath-1, Datapath-2; Cache-id: cached query; Cache entry: Cache-id, change metadata]
- Similar to a staging area in Git
- Saves file change metadata, including subdirectory changes
- Used for improving subsequent change retrieval queries
- Invalid cache entries are updated by re-indexing and recomputing the changes
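A rough Python sketch of the caching idea follows. Everything in it is an assumption made for illustration: the `ChangeCache` class, deriving the cache id from the pair of data paths, and detecting an invalid entry by comparing a caller-supplied `fingerprint` of the current indexes. The slides do not say how Dac-Man computes cache ids or validates entries.

```python
import hashlib
import json


def cache_id(old_path, new_path):
    """Derive a stable id for the query 'changes between old_path and new_path'."""
    key = json.dumps([old_path, new_path])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


class ChangeCache:
    """In-memory stand-in for a cache of change metadata, keyed by cache id."""

    def __init__(self):
        self._entries = {}

    def get_changes(self, old_path, new_path, fingerprint, compute):
        """Return cached changes, recomputing when the entry is missing or stale.

        fingerprint: any value that changes whenever the underlying indexes change.
        compute: callable(old_path, new_path) that performs the real comparison.
        """
        cid = cache_id(old_path, new_path)
        entry = self._entries.get(cid)
        if entry is not None and entry["fingerprint"] == fingerprint:
            return entry["changes"]  # valid entry: skip the comparison
        changes = compute(old_path, new_path)  # missing or invalid: recompute
        self._entries[cid] = {"fingerprint": fingerprint, "changes": changes}
        return changes
```

A repeated query for the same pair of paths with an unchanged fingerprint returns the stored result without calling `compute` again, which mirrors the intent of speeding up subsequent change retrieval.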
9 Change Capture Workflow
Command-line: dacman diff [options] OLD NEW
- scan: crawls data directories, saves directory structures and associated file metadata
- index: indexes filesystem objects
- compare: uses indexes to compare files and save the change results
- diff: retrieves the changes
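The scan step can be pictured with a short Python sketch that crawls a directory and records basic filesystem metadata. The function name `scan` and the choice of recorded fields (size, modification time, mode) are assumptions for illustration and do not reflect the metadata Dac-Man actually stores.

```python
import os
import stat


def scan(root):
    """Crawl root and return {relative path: metadata} for files and directories."""
    entries = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(dirnames + filenames):
            full = os.path.join(dirpath, name)
            st = os.lstat(full)  # lstat: describe symlinks, do not follow them
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            entries[rel] = {
                "is_dir": stat.S_ISDIR(st.st_mode),
                "is_link": stat.S_ISLNK(st.st_mode),
                "size": st.st_size,
                "mtime": st.st_mtime,
                "mode": st.st_mode,
            }
    return entries
```

The output of such a scan is what a later index step would consume: the directory structure comes from the relative paths, and the metadata dictionary is the part that can change without the file data changing.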
10 Data Provenance in Dac-Man
- Metadata information captured by Dac-Man contributes to the provenance of a dataset
- Tracks data provenance in workflows by correlating data changes between inputs and outputs
- Enables using provenance information to analyze the impact of changes
[Figure: V1, V2, V3]
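One way to picture impact analysis from provenance is a traversal over "derived from" links: given which inputs changed, find every downstream product that may need to be regenerated. The sketch below is a generic breadth-first traversal, not Dac-Man's method; the function `affected_products` and the mapping format are hypothetical.

```python
from collections import deque


def affected_products(derived_from, changed):
    """Return every product reachable downstream of the changed items.

    derived_from maps each product to the list of inputs it was built from.
    """
    # Invert the mapping: input -> set of products that consume it.
    consumers = {}
    for product, inputs in derived_from.items():
        for item in inputs:
            consumers.setdefault(item, set()).add(product)

    affected = set()
    queue = deque(changed)
    while queue:
        item = queue.popleft()
        for product in consumers.get(item, ()):
            if product not in affected:
                affected.add(product)
                queue.append(product)
    return affected
```

With a mapping such as `{"calibrated": ["raw"], "catalog": ["calibrated"]}`, a change to `raw` marks both `calibrated` and `catalog` as affected, while products with no path back to a changed input are left alone and need not be recomputed.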
11 Evaluation
System
- NERSC's Cori supercomputer
- 32 cores per node, 128 GB DDR4 memory
Datasets
- Sloan Digital Sky Survey (SDSS): primarily consists of FITS files; multiple data releases with approx. 9.7 million files
- Fluxnet: consists of CSV files; approx. [count missing] files
- Synthetic: files contain arbitrary binary data; different number of files and amount of data for controlled experiments
Tools: Unix diff, Git diff, Python filecmp
12 Comparison to Existing Tools
- Performs 100x better than existing diff tools
13 Scalability of Indexing
- Indexing time reduces as more resources are allocated
14 Amount of Data Change
- Performance is constant irrespective of the amount of change
15 Change Retrieval
- Speeds up change retrieval by caching change information
16 Conclusions
- Provides a scalable framework for identifying and capturing changes in large scientific datasets
- Generates and shares data change summaries useful for data consumers
- Identifies, captures and tracks changes of different types and granularities
- Provides a portable solution to compare remote datasets
- Provides meaningful change results through the use of domain-specific change metrics
- Can be integrated with data processing pipelines and streaming datasets for real-time change analysis
17 Acknowledgments
- Deduce Project
- U.S. DOE Contract No. DE-AC02-05CH11231
- Program Manager: Rich Carlson
- Stephen Bailey, Boris Faybishenko, Juliane Mueller, Alex Romosan, Craig Tull