Enabling Scalable Data Analysis for Large Computa9onal Structural Biology Datasets on Distributed Memory Systems

Size: px
Start display at page:

Download "Enabling Scalable Data Analysis for Large Computa9onal Structural Biology Datasets on Distributed Memory Systems"

Transcription

1 Enabling Scalable Data Analysis for Large Computa9onal Structural Biology Datasets on Distributed Memory Systems Michela Taufer Global Compu9ng Laboratory Computer and Informa9on Sciences University of Delaware

2 In- situ Analysis The perfect in- situ analysis algorithm: Avoids the need for moving data Uses a limited amount of memory Executes sufficiently fast Dataset NleNle - PDB ID 2F4K from Vijay Pande group Can we perform in- situ analysis on trajectories generated in protein folding, predic9on, or refinement simula9ons? 1

3 Start from unfolded conforma9ons of a protein with correct chemical bonds but random torsion angles Search for a conforma9on close to the observed na9ve (folded) conforma9on Protein Folding Process Reynaud, E. (21) Protein Misfolding and Degenera<ve Diseases. Nature Educa<on 3(9):28 2

4 Start from unfolded conforma9ons of a protein with correct chemical bonds but random torsion angles Search for a conforma9on close to the observed na9ve (folded) conforma9on Protein Folding Process Reynaud, E. (21) Protein Misfolding and Degenera<ve Diseases. Nature Educa<on 3(9):28 3

5 Scien9fic Problem Cluster folding trajectories into recurrent paverns based on geometrical varia9ons in 9me of the folding protein frames Node 1 Node 2 Trajectory segment 1 Trajectory segment RMSD RMSD RMSD RMSD frame / conformations Frame (snapshot) 2 frame / conformations Frame (snapshot) 4

6 Scien9fic Problem Cluster folding trajectories into recurrent paverns based on geometrical varia9ons in 9me of the folding protein conforma9ons Intra- trajectory analysis - > iden9fy meta- stable and transi9on stages within trajectory Meta- stable 1 8 RMSD RMSD 6 4 RMSD RMSD Frame (snapshot) 2 Transi9on Frame (snapshot) 5

7 Limits of Current Prac9ce Centralized clustering analysis [Phillips et al. 213] Wait for a job to end before to move its data (trajectory segment) to a centralize server Does not scale when the dataset is large May end up was9ng compu9ng resources e.g., by trying to further fold proteins that are already in a stable condi9on 6

8 From Protein Conforma9ons to Metadata From a single protein conforma9on to a 3D point capturing the atom distances within the frame distance matrix (DM) atoms three leading eigenvalues atoms (x,y,z) Conforma9on (Data from F@H) mul9- dimensional scaling metadata B. Zhang, T. Estrada, P. Cicoc, and M. Taufer. Enabling In- situ Data Analysis for Large Protein- folding Trajectory Datasets. In IPDPS, 214 7

9 From Metadata to Scien9fic Knowledge Given a set of confirma9ons and their 3D points in a segment of the trajectory: 5 3 largest eigenvalues as (x,y,z)

10 From Metadata to Scien9fic Knowledge Given a set of confirma9ons and their 3D points in a segment of the trajectory Par99on the 3D points into 2 clusters using fuzzy c- means (with c = 2)

11 From Metadata to Scien9fic Knowledge Given a set of confirma9ons and their 3D points in a segment of the trajectory: Par99on the 3D points into 2 clusters using fuzzy c- means (with c = 2) Test the equality of the 2 clusters using Welch s t- test with p- value <

12 From Metadata to Scien9fic Knowledge Given a set of confirma9ons and their 3D points in a segment of the trajectory: Par99on the 3D points into 2 clusters using fuzzy c- means (with c = 2) Test the equality of the 2 clusters using Welch s t- test with p- value <.1 Itera9ve par99on on the cluster with more points

13 From Metadata to Scien9fic Knowledge Given a set of confirma9ons and their 3D points in a segment of the trajectory: Par99on the 3D points into 2 clusters using fuzzy c- means (with c = 2) Test the equality of the 2 clusters using Welch s t- test with p- value <.1 Itera9ve par99on on the cluster with more points Finish when the 2 clusters are equal Stage 1 Stage 2 Stage 3 Representative

14 From Metadata to Scien9fic Knowledge Intra- trajectory analysis of an ensemble of - conforma9ons containing one meta- stable stage followed by a transi9on stage and another meta- stable stage Our clustering iden9fies two stable stages and one transi9on stage 13

15 From Metadata to Scien9fic Knowledge Intra- trajectory analysis of an ensemble of conforma9ons containing one meta- stable stage followed by a transi9on stage and another meta- stable stage 5 3 RMSD Stage 1 Stage 2 Stage 3 Representative Our clustering iden9fies two stable stages and one transi9on stage Protein conformations RMSDs of protein conforma9ons iden9fy the same three stages 14

16 From Metadata to Scien9fic Knowledge Intra- trajectory analysis of an ensemble of - conforma9ons containing one meta- stable stage only Our clustering iden9fies one single stage 15

17 From Metadata to Scien9fic Knowledge Intra- trajectory analysis of an ensemble of - conforma9ons containing one meta- stable stage only Stage 1 Representative Our clustering iden9fies one single stage RMSD Protein conformations RMSDs of protein conforma9ons iden9fy one single stage 16

18 Time (sec) Performance Compare our approach with tradi9onal clustering method proposed by Phillips et al s B Folding trajectories of villin headpiece subdomain (HP- 35 NleNle) Parallel MATLAB on 256 Gordon compute cores Time(sec) Memory usage (MB) Data movement (KB) 1116s B 1 Memory per core (MB) MB 1MB Data moved per analysis (KB) KB 539MB - - Our method Phillips' method Our method Phillips' method Our method Phillips' method Our method performs orders of magnitude bever in terms of 9me, memory usage and data movement 17

19 Lessons Learned and Future Direc9ons The distributed analysis of structural biology data is feasible and scalable Our approach transforms data proper9es into metadata concurrently and extracts scien9fic insights from metadata with small data movement Our approach makes the in- situ analysis feasible by: By avoiding the need for moving data, using a limited amount of memory, and execu9ng sufficiently fast Can we extend our approach to larger scales, i.e., larger datasets and more diverse datasets? 18

20 Global Compu9ng Lab: Boyu Zhang Travis Johnston Acknowledgments Collaborators: Trilce Estrada (UNM) Pietro Cicoc (SDSC) Roger Armen (TJU) Special thanks to: D.E. Shaw group Vijay Pande group (Stanford) volunteers volunteer Sponsors: 19

Enabling in-situ data analysis for large protein-folding trajectory datasets

Enabling in-situ data analysis for large protein-folding trajectory datasets Enabling in-situ data analysis for large protein-folding trajectory datasets Boyu Zhang, Trilce Estrada, Pietro Cicotti, Michela Taufer University of Delaware {bzhang, taufer}@udel.edu University of New

More information

BANDWIDTH MODELING IN LARGE DISTRIBUTED SYSTEMS FOR BIG DATA APPLICATIONS

BANDWIDTH MODELING IN LARGE DISTRIBUTED SYSTEMS FOR BIG DATA APPLICATIONS BANDWIDTH MODELING IN LARGE DISTRIBUTED SYSTEMS FOR BIG DATA APPLICATIONS Bahman Javadi School of Computing, Engineering and Mathematics Western Sydney University, Australia 1 Boyu Zhang and Michela Taufer

More information

Accurate Scoring of Drug Conformations at the Extreme Scale

Accurate Scoring of Drug Conformations at the Extreme Scale Accurate Scoring of Drug Conformations at the Extreme Scale Boyu Zhang, Trilce Estrada, Pietro Cicotti, Pavan Balaji, Michela Taufer University of Delaware {bzhang, taufer}@udel.edu University of New Mexico

More information

Reengineering high-throughput molecular datasets for scalable clustering using MapReduce

Reengineering high-throughput molecular datasets for scalable clustering using MapReduce Reengineering high-throughput molecular datasets for scalable clustering using MapReduce Trilce Estrada, Boyu Zhang, Pietro Cicotti, Roger Armen 3, and Michela Taufer University of Delaware 3 Thomas Jefferson

More information

BigDataBench- S: An Open- source Scien6fic Big Data Benchmark Suite

BigDataBench- S: An Open- source Scien6fic Big Data Benchmark Suite BigDataBench- S: An Open- source Scien6fic Big Data Benchmark Suite Xinhui Tian, Shaopeng Dai, Zhihui Du, Wanling Gao, Rui Ren, Yaodong Cheng, Zhifei Zhang, Zhen Jia, Peijian Wang and Jianfeng Zhan INSTITUTE

More information

Outline. In Situ Data Triage and Visualiza8on

Outline. In Situ Data Triage and Visualiza8on In Situ Data Triage and Visualiza8on Kwan- Liu Ma University of California at Davis Outline In situ data triage and visualiza8on: Issues and strategies Case study: An earthquake simula8on Case study: A

More information

Decentralized K-Means Clustering with Emergent Computing

Decentralized K-Means Clustering with Emergent Computing Decentralized K-Means Clustering with Emergent Computing Ryan McCune & Greg Madey University of Notre Dame, Computer Science & Engineering Spring Simula?on Mul?- Conference 2014, Tampa, FL Student Colloquium

More information

On the effectiveness of application-aware self-management for scientific discovery in volunteer computing systems

On the effectiveness of application-aware self-management for scientific discovery in volunteer computing systems On the effectiveness of application-aware self-management for scientific discovery in volunteer computing systems SC Trilce Estrada and Michela Taufer University of Delaware November, Trilce Estrada and

More information

An Arbitrary Precision Library for GPUs

An Arbitrary Precision Library for GPUs An Arbitrary Precision Library for GPUs Philip Saponaro, Omar Padron, and Michela Taufer Global Computing Lab Computer and Information Sciences University of Delaware 1 The Problem Constant energy MD simulations

More information

biokepler: A Comprehensive Bioinforma2cs Scien2fic Workflow Module for Distributed Analysis of Large- Scale Biological Data

biokepler: A Comprehensive Bioinforma2cs Scien2fic Workflow Module for Distributed Analysis of Large- Scale Biological Data biokepler: A Comprehensive Bioinforma2cs Scien2fic Workflow Module for Distributed Analysis of Large- Scale Biological Data Ilkay Al/ntas 1, Jianwu Wang 2, Daniel Crawl 1, Shweta Purawat 1 1 San Diego

More information

A Distributed Data- Parallel Execu3on Framework in the Kepler Scien3fic Workflow System

A Distributed Data- Parallel Execu3on Framework in the Kepler Scien3fic Workflow System A Distributed Data- Parallel Execu3on Framework in the Kepler Scien3fic Workflow System Ilkay Al(ntas and Daniel Crawl San Diego Supercomputer Center UC San Diego Jianwu Wang UMBC WorDS.sdsc.edu Computa3onal

More information

A FILTERING TECHNIQUE FOR FRAGMENT ASSEMBLY- BASED PROTEINS LOOP MODELING WITH CONSTRAINTS

A FILTERING TECHNIQUE FOR FRAGMENT ASSEMBLY- BASED PROTEINS LOOP MODELING WITH CONSTRAINTS A FILTERING TECHNIQUE FOR FRAGMENT ASSEMBLY- BASED PROTEINS LOOP MODELING WITH CONSTRAINTS F. Campeotto 1,2 A. Dal Palù 3 A. Dovier 2 F. Fioretto 1 E. Pontelli 1 1. Dept. Computer Science, NMSU 2. Dept.

More information

Fault-Tolerant Parallel Analysis of Millisecond-scale Molecular Dynamics Trajectories. Tiankai Tu D. E. Shaw Research

Fault-Tolerant Parallel Analysis of Millisecond-scale Molecular Dynamics Trajectories. Tiankai Tu D. E. Shaw Research Fault-Tolerant Parallel Analysis of Millisecond-scale Molecular Dynamics Trajectories Tiankai Tu D. E. Shaw Research Anton: A Special-Purpose Parallel Machine for MD Simulations 2 Routine Data Analysis

More information

Combinatorial Mathema/cs and Algorithms at Exascale: Challenges and Promising Direc/ons

Combinatorial Mathema/cs and Algorithms at Exascale: Challenges and Promising Direc/ons Combinatorial Mathema/cs and Algorithms at Exascale: Challenges and Promising Direc/ons Assefaw Gebremedhin Purdue University (Star/ng August 2014, Washington State University School of Electrical Engineering

More information

Metadata Zoo Dataset Metadata Rebecca Koskela Execu4ve Director, DataONE

Metadata Zoo Dataset Metadata Rebecca Koskela Execu4ve Director, DataONE Metadata Zoo Dataset Metadata Rebecca Koskela Execu4ve Director, DataONE eurocris September 9, 2013 Outline Data Challenges Metadata Solu=on DataONE addressing the Data Challenge Enabling Scien=fic Discovery

More information

Stages of (Batch) Machine Learning

Stages of (Batch) Machine Learning Evalua&on Stages of (Batch) Machine Learning Given: labeled training data X, Y = {hx i,y i i} n i=1 Assumes each x i D(X ) with y i = f target (x i ) Train the model: model ß classifier.train(x, Y ) x

More information

A Parallel Implementation of a Higher-order Self Consistent Mean Field. Effectively solving the protein repacking problem is a key step to successful

A Parallel Implementation of a Higher-order Self Consistent Mean Field. Effectively solving the protein repacking problem is a key step to successful Karl Gutwin May 15, 2005 18.336 A Parallel Implementation of a Higher-order Self Consistent Mean Field Effectively solving the protein repacking problem is a key step to successful protein design. Put

More information

Nov 20, 2013: Intro to RNA & Topological Landscapes for Visualiza:on of Scalar- Valued Func:ons.

Nov 20, 2013: Intro to RNA & Topological Landscapes for Visualiza:on of Scalar- Valued Func:ons. MATH:7450 (22M:305) Topics in Topology: Scien:fic and Engineering Applica:ons of Algebraic Topology Nov 20, 2013: Intro to RNA & Topological Landscapes for Visualiza:on of Scalar- Valued Func:ons. Fall

More information

Machine Learning : supervised versus unsupervised

Machine Learning : supervised versus unsupervised Machine Learning : supervised versus unsupervised Neural Networks: supervised learning makes use of a known property of the data: the digit as classified by a human the ground truth needs a training set,

More information

Tutorial. Docking School SAnDReS Tutorial Cyclin-Dependent Kinases with K i Information (Structural Parameters)

Tutorial. Docking School SAnDReS Tutorial Cyclin-Dependent Kinases with K i Information (Structural Parameters) Tutorial Docking School SAnDReS Tutorial Cyclin-Dependent Kinases with K i Information (Structural Parameters) Prof. Dr. Walter Filgueira de Azevedo Jr. Laboratory of Computational Systems Biology azevedolab.net

More information

High Performance Data Analytics for Numerical Simulations. Bruno Raffin DataMove

High Performance Data Analytics for Numerical Simulations. Bruno Raffin DataMove High Performance Data Analytics for Numerical Simulations Bruno Raffin DataMove bruno.raffin@inria.fr April 2016 About this Talk HPC for analyzing the results of large scale parallel numerical simulations

More information

Parallelization of Tau-Leap Coarse-Grained Monte Carlo Simulations on GPUs

Parallelization of Tau-Leap Coarse-Grained Monte Carlo Simulations on GPUs Parallelization of Tau-Leap Coarse-Grained Monte Carlo Simulations on GPUs Lifan Xu, Michela Taufer, Stuart Collins, Dionisios G. Vlachos Global Computing Lab University of Delaware Multiscale Modeling:

More information

Simulation-time data analysis and I/O acceleration at extreme scale with GLEAN

Simulation-time data analysis and I/O acceleration at extreme scale with GLEAN Simulation-time data analysis and I/O acceleration at extreme scale with GLEAN Venkatram Vishwanath, Mark Hereld and Michael E. Papka Argonne Na

More information

Informa(on Retrieval

Informa(on Retrieval Introduc*on to Informa(on Retrieval CS276: Informa*on Retrieval and Web Search Pandu Nayak and Prabhakar Raghavan Lecture 12: Clustering Today s Topic: Clustering Document clustering Mo*va*ons Document

More information

Ensemble- Based Characteriza4on of Uncertain Features Dennis McLaughlin, Rafal Wojcik

Ensemble- Based Characteriza4on of Uncertain Features Dennis McLaughlin, Rafal Wojcik Ensemble- Based Characteriza4on of Uncertain Features Dennis McLaughlin, Rafal Wojcik Hydrology TRMM TMI/PR satellite rainfall Neuroscience - - MRI Medicine - - CAT Geophysics Seismic Material tes4ng Laser

More information

Privacy-Preserving Shortest Path Computa6on

Privacy-Preserving Shortest Path Computa6on Privacy-Preserving Shortest Path Computa6on David J. Wu, Joe Zimmerman, Jérémy Planul, and John C. Mitchell Stanford University Naviga6on desired des@na@on current posi@on Naviga6on: A Solved Problem?

More information

Hypergraph Sparsifica/on and Its Applica/on to Par//oning

Hypergraph Sparsifica/on and Its Applica/on to Par//oning Hypergraph Sparsifica/on and Its Applica/on to Par//oning Mehmet Deveci 1,3, Kamer Kaya 1, Ümit V. Çatalyürek 1,2 1 Dept. of Biomedical Informa/cs, The Ohio State University 2 Dept. of Electrical & Computer

More information

Performance Evaluation of a MongoDB and Hadoop Platform for Scientific Data Analysis

Performance Evaluation of a MongoDB and Hadoop Platform for Scientific Data Analysis Performance Evaluation of a MongoDB and Hadoop Platform for Scientific Data Analysis Elif Dede, Madhusudhan Govindaraju Lavanya Ramakrishnan, Dan Gunter, Shane Canon Department of Computer Science, Binghamton

More information

Workshop #7: Docking. Suggested Readings

Workshop #7: Docking. Suggested Readings Workshop #7: Docking Protein protein docking is the prediction of a complex structure starting from its monomer components. The search space can be extremely large, so large amounts of computational resources

More information

Salsa: Scalable Asynchronous Replica Exchange for Parallel Molecular Dynamics Applications

Salsa: Scalable Asynchronous Replica Exchange for Parallel Molecular Dynamics Applications Salsa: Scalable Asynchronous Replica Exchange for Parallel Molecular Dynamics Applications L. Zhang, M. Parashar, E. Gallicchio, R. Levy TASSL & BIOMAPS Rutgers University ICPP 06, Columbus, OH, Aug. 16,

More information

Execu&on Templates: Caching Control Plane Decisions for Strong Scaling of Data Analy&cs

Execu&on Templates: Caching Control Plane Decisions for Strong Scaling of Data Analy&cs Execu&on Templates: Caching Control Plane Decisions for Strong Scaling of Data Analy&cs Omid Mashayekhi Hang Qu Chinmayee Shah Philip Levis July 13, 2017 2 Cloud Frameworks SQL Streaming Machine Learning

More information

151 Project 3. Bison Management

151 Project 3. Bison Management Suppose you take over the management of a certain Bison popula8on. The popula8on dynamics are similar to those of the popula8on we considered in Project 2 but differ in a few important ways: The adult

More information

Dec 2, 2013: Hippocampal spa:al map forma:on

Dec 2, 2013: Hippocampal spa:al map forma:on MATH:7450 (22M:305) Topics in Topology: Scien:fic and Engineering Applica:ons of Algebraic Topology Dec 2, 2013: Hippocampal spa:al map forma:on Fall 2013 course offered through the University of Iowa

More information

Molecular Distance Geometry and Atomic Orderings

Molecular Distance Geometry and Atomic Orderings Molecular Distance Geometry and Atomic Orderings Antonio Mucherino IRISA, University of Rennes 1 Workshop Set Computation for Control ENSTA Bretagne, Brest, France December 5 th 2013 The Distance Geometry

More information

The Numerical Reproducibility Fair Trade: Facing the Concurrency Challenges at the Extreme Scale

The Numerical Reproducibility Fair Trade: Facing the Concurrency Challenges at the Extreme Scale The Numerical Reproducibility Fair Trade: Facing the Concurrency Challenges at the Extreme Scale Michela Taufer University of Delaware Michela Becchi University of Missouri From MulD- core to Many Cores

More information

On the Migration of the Scientific Code Dyana from SMPs to Clusters of PCs and on to the Grid

On the Migration of the Scientific Code Dyana from SMPs to Clusters of PCs and on to the Grid On the Migration of the Scientific Code Dyana from SMPs to Clusters of PCs and on to the Grid Michela Taufer Gérard Roos Thomas Stricker Peter Güntert Institute for Computer Systems Institute for Molecular

More information

An applica)on of Markov Chains: PageRank. Finding relevant informa)on on the Web

An applica)on of Markov Chains: PageRank. Finding relevant informa)on on the Web An applica)on of Markov Chains: PageRank Finding relevant informa)on on the Web Please Par)cipate h>p://www.st.ewi.tudelc.nl/~marco/lectures.html How much do you know about PageRank? 1) Nothing. 2) I

More information

Predicting Protein Folding Paths. S.Will, , Fall 2011

Predicting Protein Folding Paths. S.Will, , Fall 2011 Predicting Protein Folding Paths Protein Folding by Robotics Probabilistic Roadmap Planning (PRM): Thomas, Song, Amato. Protein folding by motion planning. Phys. Biol., 2005 Aims Find good quality folding

More information

NARCCAP: North American Regional Climate Change Assessment Program. Seth McGinnis, NCAR

NARCCAP: North American Regional Climate Change Assessment Program. Seth McGinnis, NCAR NARCCAP: North American Regional Climate Change Assessment Program Seth McGinnis, NCAR mcginnis@ucar.edu NARCCAP: North American Regional Climate Change Assessment Program Nest highresolution regional

More information

Assessing the homo- or heterogeneity of noisy experimental data. Kay Diederichs Konstanz, 01/06/2017

Assessing the homo- or heterogeneity of noisy experimental data. Kay Diederichs Konstanz, 01/06/2017 Assessing the homo- or heterogeneity of noisy experimental data Kay Diederichs Konstanz, 01/06/2017 What is the problem? Why do an experiment? because we want to find out a property (or several) of an

More information

Accelerating the Scientific Exploration Process with Kepler Scientific Workflow System

Accelerating the Scientific Exploration Process with Kepler Scientific Workflow System Accelerating the Scientific Exploration Process with Kepler Scientific Workflow System Jianwu Wang, Ilkay Altintas Scientific Workflow Automation Technologies Lab SDSC, UCSD project.org UCGrid Summit,

More information

Recommender Systems Collabora2ve Filtering and Matrix Factoriza2on

Recommender Systems Collabora2ve Filtering and Matrix Factoriza2on Recommender Systems Collaborave Filtering and Matrix Factorizaon Narges Razavian Thanks to lecture slides from Alex Smola@CMU Yahuda Koren@Yahoo labs and Bing Liu@UIC We Know What You Ought To Be Watching

More information

hashfs Applying Hashing to Op2mize File Systems for Small File Reads

hashfs Applying Hashing to Op2mize File Systems for Small File Reads hashfs Applying Hashing to Op2mize File Systems for Small File Reads Paul Lensing, Dirk Meister, André Brinkmann Paderborn Center for Parallel Compu2ng University of Paderborn Mo2va2on and Problem Design

More information

Tangible Visualiza.on. Andy Wu Synaesthe.c Media Lab GVU Center Georgia Ins.tute of Technology

Tangible Visualiza.on. Andy Wu Synaesthe.c Media Lab GVU Center Georgia Ins.tute of Technology Tangible Visualiza.on Andy Wu Synaesthe.c Media Lab GVU Center Georgia Ins.tute of Technology Introduc.on Informa.on Visualiza.on (Infovis) is the study of the visual representa.on of complex informa.on,

More information

SEDA An architecture for Well Condi6oned, scalable Internet Services

SEDA An architecture for Well Condi6oned, scalable Internet Services SEDA An architecture for Well Condi6oned, scalable Internet Services Ma= Welsh, David Culler, and Eric Brewer University of California, Berkeley Symposium on Operating Systems Principles (SOSP), October

More information

Parallel Graph Coloring For Many- core Architectures

Parallel Graph Coloring For Many- core Architectures Parallel Graph Coloring For Many- core Architectures Mehmet Deveci, Erik Boman, Siva Rajamanickam Sandia Na;onal Laboratories Sandia National Laboratories is a multi-program laboratory managed and operated

More information

Spatializing GIS Commands with Self-Organizing Maps. Jochen Wendel Barbara P. Buttenfield Roland J. Viger Jeremy M. Smith

Spatializing GIS Commands with Self-Organizing Maps. Jochen Wendel Barbara P. Buttenfield Roland J. Viger Jeremy M. Smith Spatializing GIS Commands with Self-Organizing Maps Jochen Wendel Barbara P. Buttenfield Roland J. Viger Jeremy M. Smith Outline Introduction Characterizing GIS Commands Implementation Interpretation of

More information

Introduction to Parallel Computing. Lesson 1

Introduction to Parallel Computing. Lesson 1 Introduction to Parallel Computing. Lesson 1 Adriano FESTA Dipartimento di Ingegneria e Scienze dell Informazione e Matematica, L Aquila DISIM, L Aquila, 25.02.2019 adriano.festa@univaq.it What is Parallel

More information

Manifold Learning: Applications in Neuroimaging

Manifold Learning: Applications in Neuroimaging Your own logo here Manifold Learning: Applications in Neuroimaging Robin Wolz 23/09/2011 Overview Manifold learning for Atlas Propagation Multi-atlas segmentation Challenges LEAP Manifold learning for

More information

Data Curation Profile Cornell University, Biophysics

Data Curation Profile Cornell University, Biophysics Data Curation Profile Cornell University, Biophysics Profile Author Dianne Dietrich Author s Institution Cornell University Contact dd388@cornell.edu Researcher(s) Interviewed Withheld Researcher s Institution

More information

A First Introduction to Scientific Visualization Geoffrey Gray

A First Introduction to Scientific Visualization Geoffrey Gray Visual Molecular Dynamics A First Introduction to Scientific Visualization Geoffrey Gray VMD on CIRCE: On the lower bottom left of your screen, click on the window start-up menu. In the search box type

More information

Computational and Statistical Tradeoffs in VoI-Driven Learning

Computational and Statistical Tradeoffs in VoI-Driven Learning ARO ARO MURI MURI on on Value-centered Theory for for Adaptive Learning, Inference, Tracking, and and Exploitation Computational and Statistical Tradefs in VoI-Driven Learning Co-PI Michael Jordan University

More information

On Op%mality of Clustering by Space Filling Curves

On Op%mality of Clustering by Space Filling Curves On Op%mality of Clustering by Space Filling Curves Pan Xu panxu@iastate.edu Iowa State University Srikanta Tirthapura snt@iastate.edu 1 Mul%- dimensional Data Indexing and managing single- dimensional

More information

Ar#ficial Intelligence

Ar#ficial Intelligence Ar#ficial Intelligence Advanced Searching Prof Alexiei Dingli Gene#c Algorithms Charles Darwin Genetic Algorithms are good at taking large, potentially huge search spaces and navigating them, looking for

More information

Decision Support Systems

Decision Support Systems Decision Support Systems 2011/2012 Week 3. Lecture 6 Previous Class Dimensions & Measures Dimensions: Item Time Loca0on Measures: Quan0ty Sales TransID ItemName ItemID Date Store Qty T0001 Computer I23

More information

Architectures, and Protocol Design Issues for Mobile Social Networks: A Survey

Architectures, and Protocol Design Issues for Mobile Social Networks: A Survey Applica@ons, Architectures, and Protocol Design Issues for Mobile Social Networks: A Survey N. Kayastha,D. Niyato, P. Wang and E. Hossain, Proceedings of the IEEEVol. 99, No. 12, Dec. 2011. Sabita Maharjan

More information

@ COUCHBASE CONNECT. Using Couchbase. By: Carleton Miyamoto, Michael Kehoe Version: 1.1w LinkedIn Corpora3on

@ COUCHBASE CONNECT. Using Couchbase. By: Carleton Miyamoto, Michael Kehoe Version: 1.1w LinkedIn Corpora3on @ COUCHBASE CONNECT Using Couchbase By: Carleton Miyamoto, Michael Kehoe Version: 1.1w Overview The LinkedIn Story Enter Couchbase Development and Opera3ons Clusters and Numbers Opera3onal Tooling Carleton

More information

Generating Better Conformations for Roadmaps in Protein Folding

Generating Better Conformations for Roadmaps in Protein Folding Generating Better Conformations for Roadmaps in Protein Folding Jeff May, Lydia Tapia, and Nancy M. Amato Parasol Lab, Dept. of Computer Science, Texas A&M University, College Station, TX 77843 {jmayx9ns}@umw.edu

More information

3.021J / 1.021J / J / J / 22.00J Introduction to Modeling and Simulation Spring 2008

3.021J / 1.021J / J / J / 22.00J Introduction to Modeling and Simulation Spring 2008 MIT OpenCourseWare http://ocw.mit.edu 3.021J / 1.021J / 10.333J / 18.361J / 22.00J Introduction to Modeling and Simulation Spring 2008 For information about citing these materials or our Terms of Use,

More information

Protein Flexibility Predictions

Protein Flexibility Predictions Protein Flexibility Predictions Using Graph Theory by D.J. Jacobs, A.J. Rader et. al. from the Michigan State University Talk by Jan Christoph, 19th January 2007 Outline 1. Introduction / Motivation 2.

More information

Advanced Research Compu2ng Informa2on Technology Virginia Tech

Advanced Research Compu2ng Informa2on Technology Virginia Tech Advanced Research Compu2ng Informa2on Technology Virginia Tech www.arc.vt.edu Personnel Associate VP for Research Compu6ng: Terry Herdman (herd88@vt.edu) Director, HPC: Vijay Agarwala (vijaykag@vt.edu)

More information

Performance Analysis of Distributed Systems using BOINC

Performance Analysis of Distributed Systems using BOINC Performance Analysis of Distributed Systems using BOINC Muhammed Suhail T. S. ASE Tata Consultancy Services Pooranam House Nellikkunnu Kasaragod ABSTRACT In this paper, we are discussing about the performance

More information

CCW Workshop Technical Session on Mobile Cloud Compu<ng

CCW Workshop Technical Session on Mobile Cloud Compu<ng CCW Workshop Technical Session on Mobile Cloud Compu

More information

A Tes&ng Engine for High- Performance and Cost- Effec&ve Workflow Execu&on in the Cloud

A Tes&ng Engine for High- Performance and Cost- Effec&ve Workflow Execu&on in the Cloud A Tes&ng Engine for High- Performance and Cost- Effec&ve Workflow Execu&on in the Cloud V. K. Pallipuram 1, T. Estrada 2, M. Taufer 3 University of the Pacific 1 University of New Mexico 2 University of

More information

Accelerating Parallel Analysis of Scientific Simulation Data via Zazen

Accelerating Parallel Analysis of Scientific Simulation Data via Zazen Accelerating Parallel Analysis of Scientific Simulation Data via Zazen Tiankai Tu, Charles A. Rendleman, Patrick J. Miller, Federico Sacerdoti, Ron O. Dror, and David E. Shaw D. E. Shaw Research Motivation

More information

Arrival Scheduling with Shortcut Path Op6ons and Mixed Aircra: Performance. Shannon Zelinski and Jaewoo Jung NASA Ames Research Center

Arrival Scheduling with Shortcut Path Op6ons and Mixed Aircra: Performance. Shannon Zelinski and Jaewoo Jung NASA Ames Research Center Arrival Scheduling with Shortcut Path Op6ons and Mixed Aircra: Performance Shannon Zelinski and Jaewoo Jung NASA Ames Research Center Time- Based Arrival Management Fixed Path Speed Control More accurate

More information

Jialin Liu, Evan Racah, Quincey Koziol, Richard Shane Canon, Alex Gittens, Lisa Gerhardt, Suren Byna, Mike F. Ringenburg, Prabhat

Jialin Liu, Evan Racah, Quincey Koziol, Richard Shane Canon, Alex Gittens, Lisa Gerhardt, Suren Byna, Mike F. Ringenburg, Prabhat H5Spark H5Spark: Bridging the I/O Gap between Spark and Scien9fic Data Formats on HPC Systems Jialin Liu, Evan Racah, Quincey Koziol, Richard Shane Canon, Alex Gittens, Lisa Gerhardt, Suren Byna, Mike

More information

Energy Minimization on Manifolds for Docking Flexible Molecules

Energy Minimization on Manifolds for Docking Flexible Molecules pubs.acs.org/jctc Energy Minimization on Manifolds for Docking Flexible Molecules Hanieh Mirzaei, Shahrooz Zarbafian, Elizabeth Villar, Scott Mottarella, Dmitri Beglov, Sandor Vajda, Ioannis Ch. Paschalidis,

More information

Supervised and Unsupervised Learning. Ciro Donalek Ay/Bi 199 April 2011

Supervised and Unsupervised Learning. Ciro Donalek Ay/Bi 199 April 2011 Supervised and Unsupervised Learning Ciro Donalek Ay/Bi 199 April 2011 KDD and Data Mining Tasks Finding the op?mal approach Supervised Models Neural Networks Mul? Layer Perceptron Decision Trees Unsupervised

More information

Kinetically-aware Conformational Distances in Molecular Dynamics

Kinetically-aware Conformational Distances in Molecular Dynamics CCCG 211, Toronto ON, August 12, 211 Kinetically-aware Conformational Distances in Molecular Dynamics Chen Gu Xiaoye Jiang Leonidas Guibas Abstract In this paper, we present a novel approach for discovering

More information

Energy Minimization -Non-Derivative Methods -First Derivative Methods. Background Image Courtesy: 3dciencia.com visual life sciences

Energy Minimization -Non-Derivative Methods -First Derivative Methods. Background Image Courtesy: 3dciencia.com visual life sciences Energy Minimization -Non-Derivative Methods -First Derivative Methods Background Image Courtesy: 3dciencia.com visual life sciences Introduction Contents Criteria to start minimization Energy Minimization

More information

Overview of XSEDE for HPC Users Victor Hazlewood XSEDE Deputy Director of Operations

Overview of XSEDE for HPC Users Victor Hazlewood XSEDE Deputy Director of Operations October 29, 2014 Overview of XSEDE for HPC Users Victor Hazlewood XSEDE Deputy Director of Operations XSEDE for HPC Users What is XSEDE? XSEDE mo/va/on and goals XSEDE Resources XSEDE for HPC Users: Before

More information

Introduction to Parallel Computing

Introduction to Parallel Computing Introduction to Parallel Computing Chieh-Sen (Jason) Huang Department of Applied Mathematics National Sun Yat-sen University Thank Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar for providing

More information

Vulnerability Analysis (III): Sta8c Analysis

Vulnerability Analysis (III): Sta8c Analysis Computer Security Course. Vulnerability Analysis (III): Sta8c Analysis Slide credit: Vijay D Silva 1 Efficiency of Symbolic Execu8on 2 A Sta8c Analysis Analogy 3 Syntac8c Analysis 4 Seman8cs- Based Analysis

More information

Bioinformatics III Structural Bioinformatics and Genome Analysis

Bioinformatics III Structural Bioinformatics and Genome Analysis Bioinformatics III Structural Bioinformatics and Genome Analysis Chapter 3 Structural Comparison and Alignment 3.1 Introduction 1. Basic algorithms review Dynamic programming Distance matrix 2. SARF2,

More information

Introduc)on to High Performance Compu)ng Advanced Research Computing

Introduc)on to High Performance Compu)ng Advanced Research Computing Introduc)on to High Performance Compu)ng Advanced Research Computing Outline What cons)tutes high performance compu)ng (HPC)? When to consider HPC resources What kind of problems are typically solved?

More information

Heterogeneous CPU+GPU Molecular Dynamics Engine in CHARMM

Heterogeneous CPU+GPU Molecular Dynamics Engine in CHARMM Heterogeneous CPU+GPU Molecular Dynamics Engine in CHARMM 25th March, GTC 2014, San Jose CA AnE- Pekka Hynninen ane.pekka.hynninen@nrel.gov NREL is a na*onal laboratory of the U.S. Department of Energy,

More information

Compiler: Control Flow Optimization

Compiler: Control Flow Optimization Compiler: Control Flow Optimization Virendra Singh Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology Bombay http://www.ee.iitb.ac.in/~viren/

More information

Reducing the Search Space in Protein Structure Prediction

Reducing the Search Space in Protein Structure Prediction FACULTY OF SCIENCE UNIVERSITY OF COPENHAGEN Ph.D. thesis Rasmus Fonseca Reducing the Search Space in Protein Structure Prediction Department of Computer Science Academic advisor: Pawel Winter Submitted:

More information

February 2010 Christopher Bruns

February 2010 Christopher Bruns Molecular Simulation with OpenMM Zephyr February 2010 Christopher Bruns What is OpenMM Zephyr? Graphical user interface for running GPU accelerated molecular dynamics simulations Automates running of gromacs

More information

TerraSwarm. A Machine Learning and Op0miza0on Toolkit for the Swarm. Ilge Akkaya, Shuhei Emoto, Edward A. Lee. University of California, Berkeley

TerraSwarm. A Machine Learning and Op0miza0on Toolkit for the Swarm. Ilge Akkaya, Shuhei Emoto, Edward A. Lee. University of California, Berkeley TerraSwarm A Machine Learning and Op0miza0on Toolkit for the Swarm Ilge Akkaya, Shuhei Emoto, Edward A. Lee University of California, Berkeley TerraSwarm Tools Telecon 17 November 2014 Sponsored by the

More information

Trajectory Analysis. Release Ahmet Bakan

Trajectory Analysis. Release Ahmet Bakan Trajectory Analysis Release 1.5.1 Ahmet Bakan December 24, 2013 CONTENTS 1 Introduction 2 1.1 Required Programs........................................... 2 1.2 Recommended Programs.......................................

More information

Practical 10: Protein dynamics and visualization Documentation

Practical 10: Protein dynamics and visualization Documentation Practical 10: Protein dynamics and visualization Documentation Release 1.0 Oliver Beckstein March 07, 2013 CONTENTS 1 Basics of protein structure 3 2 Analyzing protein structure and topology 5 2.1 Topology

More information

Lecture 13: Tracking mo3on features op3cal flow

Lecture 13: Tracking mo3on features op3cal flow Lecture 13: Tracking mo3on features op3cal flow Professor Fei- Fei Li Stanford Vision Lab Lecture 13-1! What we will learn today? Introduc3on Op3cal flow Feature tracking Applica3ons (Problem Set 3 (Q1))

More information

A Study on Optimally Co-scheduling Jobs of Different Lengths on CMP

A Study on Optimally Co-scheduling Jobs of Different Lengths on CMP A Study on Optimally Co-scheduling Jobs of Different Lengths on CMP Kai Tian Kai Tian, Yunlian Jiang and Xipeng Shen Computer Science Department, College of William and Mary, Virginia, USA 5/18/2009 Cache

More information

DataONE Cyberinfrastructure. Ma# Jones Dave Vieglais Bruce Wilson

DataONE Cyberinfrastructure. Ma# Jones Dave Vieglais Bruce Wilson DataONE Cyberinfrastructure Ma# Jones Dave Vieglais Bruce Wilson Foremost a Federa9on Member Nodes (MNs) Heart of the federa9on Harness the power of local cura9on Coordina9ng Nodes (CNs) Services to link

More information

Big Data Analytics! Special Topics for Computer Science CSE CSE Feb 9

Big Data Analytics! Special Topics for Computer Science CSE CSE Feb 9 Big Data Analytics! Special Topics for Computer Science CSE 4095-001 CSE 5095-005! Feb 9 Fei Wang Associate Professor Department of Computer Science and Engineering fei_wang@uconn.edu Clustering I What

More information

CSE Opera+ng System Principles

CSE Opera+ng System Principles CSE 30341 Opera+ng System Principles Lecture 2 Introduc5on Con5nued Recap Last Lecture What is an opera+ng system & kernel? What is an interrupt? CSE 30341 Opera+ng System Principles 2 1 OS - Kernel CSE

More information

Parallel & Cluster Computing. cs 6260 professor: elise de doncker by: lina hussein

Parallel & Cluster Computing. cs 6260 professor: elise de doncker by: lina hussein Parallel & Cluster Computing cs 6260 professor: elise de doncker by: lina hussein 1 Topics Covered : Introduction What is cluster computing? Classification of Cluster Computing Technologies: Beowulf cluster

More information

Informa(on Retrieval

Informa(on Retrieval Introduc*on to Informa(on Retrieval Clustering Chris Manning, Pandu Nayak, and Prabhakar Raghavan Today s Topic: Clustering Document clustering Mo*va*ons Document representa*ons Success criteria Clustering

More information

TiDA: High Level Programming Abstrac8ons for Data Locality Management

TiDA: High Level Programming Abstrac8ons for Data Locality Management h#p://parcorelab.ku.edu.tr TiDA: High Level Programming Abstrac8ons for Data Locality Management Didem Unat, Muhammed Nufail Farooqi, Burak Bastem Koç University, Turkey Tan Nguyen, Weiqun Zhang, George

More information

Introduction. IST557 Data Mining: Techniques and Applications. Jessie Li, Penn State University

Introduction. IST557 Data Mining: Techniques and Applications. Jessie Li, Penn State University Introduction IST557 Data Mining: Techniques and Applications Jessie Li, Penn State University 1 Introduction Why Data Mining? What Is Data Mining? A Mul3-Dimensional View of Data Mining What Kinds of Data

More information

Algorithm Engineering with PRAM Algorithms

Algorithm Engineering with PRAM Algorithms Algorithm Engineering with PRAM Algorithms Bernard M.E. Moret moret@cs.unm.edu Department of Computer Science University of New Mexico Albuquerque, NM 87131 Rome School on Alg. Eng. p.1/29 Measuring and

More information

ALP Interchange Review. Colorado Department of Educa2on Office of Gi7ed Educa2on Jessica Howard, Gi7ed Educa2on Specialist

ALP Interchange Review. Colorado Department of Educa2on Office of Gi7ed Educa2on Jessica Howard, Gi7ed Educa2on Specialist ALP Interchange Review Colorado Department of Educa2on Office of Gi7ed Educa2on Jessica Howard, Gi7ed Educa2on Specialist January 2017 Please sign in using the Chat Box (Click on the chat icon to access

More information

Using Sequen+al Run+me Distribu+ons for the Parallel Speedup Predic+on of SAT Local Search

Using Sequen+al Run+me Distribu+ons for the Parallel Speedup Predic+on of SAT Local Search Using Sequen+al Run+me Distribu+ons for the Parallel Speedup Predic+on of SAT Local Search Alejandro Arbelaez - CharloBe Truchet - Philippe Codognet JFLI University of Tokyo LINA, UMR 6241 University of

More information

Index construc-on. Friday, 8 April 16 1

Index construc-on. Friday, 8 April 16 1 Index construc-on Informa)onal Retrieval By Dr. Qaiser Abbas Department of Computer Science & IT, University of Sargodha, Sargodha, 40100, Pakistan qaiser.abbas@uos.edu.pk Friday, 8 April 16 1 4.3 Single-pass

More information

Easy manual for SIRD interface

Easy manual for SIRD interface Easy manual for SIRD interface What is SIRD interface (1) SIRDi is the web-based interface for SIRD (Structure-Interaction Relational Database) (2) You can search for protein structure models with sequence,

More information

Oh, Exascale! The effect of emerging architectures on scien1fic discovery. Kenneth Moreland, Sandia Na1onal Laboratories

Oh, Exascale! The effect of emerging architectures on scien1fic discovery. Kenneth Moreland, Sandia Na1onal Laboratories Photos placed in horizontal posi1on with even amount of white space between photos and header Oh, $#*@! Exascale! The effect of emerging architectures on scien1fic discovery Ultrascale Visualiza1on Workshop,

More information

High-throughput molecular dynamics simulation and Markov modeling. Frank Noé (FU Berlin)

High-throughput molecular dynamics simulation and Markov modeling. Frank Noé (FU Berlin) High-throughput molecular dynamics simulation and Markov modeling Frank Noé (FU Berlin) frank.noe@fu-berlin.de Molecular dynamics Molecular dynamics are stochastic Single trajectories are usually not representative

More information

VMD Documentation. Mehrdad Youse (CCIT Visualization Group)

VMD Documentation. Mehrdad Youse (CCIT Visualization Group) VMD Documentation Mehrdad Youse (CCIT Visualization Group) May 22, 2017 Abstract In this documentation, the basic and intermediate documentation of VMD molecular dynamics visualization software from University

More information