pcj-blast - highly parallel similarity search implementation


pcj-blast - highly parallel similarity search implementation
Piotr Bała, bala@icm.edu.pl, ICM University of Warsaw
Marek Nowicki, faramir@mat.umk.pl, ICM University of Warsaw, N. Copernicus University
Davit Bzhalava, davit.bzhalava@ki.se, Karolinska Institutet

BLAST
- Sequence alignment is essential for NGS
- There are a number of software packages for sequence alignment, based on various similarity search methods
- BLAST: Basic Local Alignment Search Tool (1991)
- The heuristic algorithm it uses is much faster than other approaches
- The search time can still be long (days or weeks) for large datasets
- NCBI BLAST is the most widely used implementation
- There is strong interest in using large computer systems to run BLAST:
  - BLAST running on cloud or grid
  - parallel versions of BLAST running on HPC systems

FASTA input (2 reads out of 1 million)
>C1093377_2.0
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAACACTTTGGGAGGCCAAGATGGGAAGATCTTTTGAGG
CCAGGCGTTCAAGACCAGTCTGGGCAATATGGAGAGACCT
GTCTCTACAAAAAAAATTAGCCAGACCTAGTGGCTGGCTGA
GGCAGGAGGATCATCTGAGCTCAGGAGATTGAGAT
>C1093379_2.0
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAATTTTAAGTTTTTCCTTATAAGGGAAATAAGCTTATTTCT
TTTAGAAGCAGAAACTGCAAATGGGGGTGTATGGAATTCTTT
TTTTTTTTTTTTTTTTTGCGACAAGGTCTCACTGTGTTGCCCA
GACTGGAGTGCAGTGGTGCAATCATAGCT

NCBI BLAST performance (XC40, single node)
[Figure: time (log scale, seconds) of processing 48 sequences at once using one instance of different NCBI BLAST versions (ncbi-blast-2.2.28+, 2.2.31+, 2.3.0+, 2.3.0+ built with gcc 5.3) with 1-48 BLAST threads.]

BLAST processing time varies between reads
[Figure: processing time (0-800 s) for successive read blocks sek1 through sek58; the time differs strongly between blocks.]

Parallel BLAST implementations
- mpiBLAST: uses an old version (2012) of BLAST
- parallel_blast.py: limited to a single node, regular split of the input
- Paracel BLAST (based on NCBI BLAST 2.2.26): costly!
- The Blue Gene implementation: limited to IBM BG/Q systems
- pioBLAST: parallel version which uses MPI-IO; memory problems when the reference database is large

pcj-blast
- Parallel version developed using PCJ (Parallel Computing in Java, a library developed at ICM)
- The Java code is responsible for reading the FASTA input and executing NCBI BLAST on sets of sequences (reads)
- NCBI BLAST (currently 2.2.28) is executed with the -num_threads option; for 28-core nodes: 4 PCJ threads with num_threads=7 (a sketch of this execution model follows below)
- The solution can be executed on any multinode system with Java 8 installed
- I/O performance is important: thousands of BLAST instances read the reference database (52 GB) concurrently; here the Lustre filesystem performs worse than NFS
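The execution model above (a few PCJ threads per node, each driving one multi-threaded NCBI BLAST process) can be sketched in Java roughly as follows. This is a minimal illustrative sketch, not the pcj-blast source: the blastn binary name, the nt database, and the file paths are assumptions; only the -num_threads split (4 x 7 on a 28-core node) comes from the slide.

import java.io.File;
import java.io.IOException;

// Minimal sketch: one PCJ thread launches one NCBI BLAST process for a block
// of reads. On a 28-core node with 4 PCJ threads, each process gets
// -num_threads 7 (4 x 7 = 28), matching the configuration described above.
public class BlastRunner {
    public static void runBlast(File queryBlock, File output)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(
                "blastn",                       // assumed: NCBI BLAST+ binary on PATH
                "-db", "nt",                    // assumed reference database name
                "-query", queryBlock.getPath(), // block of reads in FASTA format
                "-out", output.getPath(),
                "-outfmt", "5",                 // XML output, as gathered by pcj-blast
                "-num_threads", "7");           // BLAST threads per instance
        pb.redirectErrorStream(true);
        Process p = pb.start();
        int rc = p.waitFor();
        if (rc != 0) {
            throw new IOException("blastn exited with code " + rc);
        }
    }
}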

pcj-blast
- Reads the input sequences (FASTA) line by line; blocks of lines (reads) are used as input for a BLAST query (see the sketch below)
- pcj-blast can start hundreds of BLAST instances
- NCBI BLAST is used as-is, so pcj-blast is not bound to a particular version
- One or more NCBI BLAST instances run on each hardware node
- Output from each node (text or XML) is gathered, postprocessed, and stored in a single file; the postprocessing is parallelized
- pcj-blast can be run on HPC systems and clusters; requires Java 8 and NCBI BLAST installed
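A minimal sketch of the line-by-line block distribution described above, written against the PCJ SPMD model. PCJ.myId() and PCJ.threadCount() are actual PCJ calls; the input file name, the block size of 48 reads, and the round-robin assignment are illustrative assumptions, not the pcj-blast source.

import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import org.pcj.PCJ;
import org.pcj.StartPoint;

// Sketch: each PCJ thread reads the FASTA file line by line and keeps only
// the blocks assigned to it round-robin; every kept block becomes the input
// of one NCBI BLAST run (see the BlastRunner sketch above).
public class PcjBlastSketch implements StartPoint {
    static final int READS_PER_BLOCK = 48; // assumed block size

    @Override
    public void main() throws Exception {
        int myId = PCJ.myId();
        int threads = PCJ.threadCount();
        List<String> block = new ArrayList<>();
        int blockNo = 0;
        int readsInBlock = 0;
        try (BufferedReader in = Files.newBufferedReader(Paths.get("input.fasta"))) {
            String line;
            while ((line = in.readLine()) != null) {
                // a new header closes the current block once it holds enough reads
                if (line.startsWith(">") && readsInBlock == READS_PER_BLOCK) {
                    if (blockNo % threads == myId) {
                        processBlock(block); // this PCJ thread owns the block
                    }
                    block.clear();
                    readsInBlock = 0;
                    blockNo++;
                }
                if (line.startsWith(">")) {
                    readsInBlock++;
                }
                block.add(line);
            }
            if (!block.isEmpty() && blockNo % threads == myId) {
                processBlock(block); // last, possibly shorter block
            }
        }
    }

    private void processBlock(List<String> reads) {
        // write the reads to a temporary FASTA file and launch blastn
        // with -num_threads, as in the BlastRunner sketch above
    }
}

The class would be started through PCJ's usual deployment call (PCJ.start or PCJ.executionBuilder, depending on the PCJ version), with one JVM per node and several PCJ threads per JVM.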

pcj-blast [figure]

Multiple NCBI BLAST instances running on a single node (x86 cluster) [figure]

I/O performance (x86, single file read) [figure]

I/O performance (XC40, single file read) [figure]

BLAST (with PCJ) performance (x86 cluster), 1 mln reads:

  nodes   cores   Time [hours]   Time [days]
      1      28         1600.0         67.0   (estimated)
     32     768           49.9          2.0
     48    1152           34.1          1.4
     64    1536           26.5          1.1

The 32-node run is about 1600/49.9 ≈ 32 times faster than the estimated single-node time, i.e. close to linear scaling.

[Figure: performance (reads/s, approx. 1.8-6.8) at 32 nodes as a function of sequence length (number of reads, 0-35000), for NFS and Lustre.]

pcj-blast scalability (x86 cluster) [figure]

pcj-blast scalability (Cray XC40) [figure]

Workflow development (parallel_blast.py)
- A complicated workflow has been developed by Davit
- The bottleneck is sequence alignment with BLAST
- Typical data to process consist of ca. 1 mln reads: input size ca. 300 MB, output size (XML file) ca. 1 TB
- The BLAST output is then processed by Python scripts and R

New services for genetics applications

UNICORE RCP - BLAST [figure]

UNICORE portal [figure]

UNICORE portal [figure]

pcj-blast - highly parallel similarity search implementation
PCJ: pcj.icm.edu.pl
Piotr Bała, bala@icm.edu.pl, ICM University of Warsaw
Marek Nowicki, faramir@mat.umk.pl, ICM University of Warsaw, N. Copernicus University
Davit Bzhalava, davit.bzhalava@ki.se, Karolinska Institutet