NovoalignMPI User Guide


NovoalignMPI is a message passing version of Novoalign that allows the alignment process to be spread across multiple servers in a cluster or other network of computers.

1. Multiple servers can be used to align one read file.
2. NovoalignMPI on each server is multi-threaded to maximise server utilisation.
3. Linux and Mac OS X servers with X86-64 CPUs can be included in the network.

NovoalignMPI takes the same command line arguments and options as Novoalign, and operation is almost identical except that reads are distributed to multiple servers for alignment. The differences are:

1. The command line option -o Sync has no effect. Read alignments are always reported in the order that alignments complete rather than in the order they appear in the read files.
2. When using quality calibration with the -k option without a calibration file (i.e. not -k <file>), the accumulation of mismatch counts and calibration is done per slave process. This does not affect the -K <file> option, which will still record total mismatch counts for all alignments.

MPI API Version

NovoalignMPI uses MPICH version 3.1, an open source, high performance and widely portable implementation of the Message Passing Interface (MPI) standard, available for download from http://www.mpich.org

Setting Up MPICH

This is a basic introduction to running MPI jobs using the Hydra process manager; if you get stuck please refer to the MPICH Installation Guide.

Setting up host names

Every computer to be used must have a hostname that can be resolved via a DNS server and/or /etc/hosts. The IP addresses must resolve identically on every server!
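As a quick check you can ask each server how it resolves its own name; for example, assuming a typical Linux setup with getent available, the following should print the server's real network IP address rather than 127.0.0.1:

hostname -f                    # fully qualified name of this server
getent hosts $(hostname -f)    # the address this name resolves to on this server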

One common problem is that a server may have a /etc/hosts entry that uses the local loopback IP address, i.e. 127.0.0.1. This can cause problems...

/etc/hosts
#
# hosts         This file describes a number of hostname-to-address
#               mappings for the TCP/IP subsystem. It is mostly
#               used at boot time, when no name servers are running.
#               On small systems, this file can be used instead of a
#               "named" name server.
# Syntax:
#
# IP-Address   Full-Qualified-Hostname   Short-Hostname
#
127.0.0.1      localhost
127.0.0.1      colinspc colinspc

The entry for colinspc should have the IP address of the server, not the local loopback IP address. Also check /etc/hostname and ensure it matches the server name. Any mismatches in names and IP addresses will surely mess up MPI even if the names can be resolved correctly via DNS and via /etc/hosts on other servers.

You also need to create a file consisting of a list of server names, one per line. Name this file hosts.txt and save it in the folder you will be running from.

hosts.txt
server1
server2
server3

These host names will be used as targets for ssh or rsh, so include full domain names if necessary. Check that you can reach these machines with ssh or rsh without entering a password. You can test by doing

ssh othermachine date

Testing Hydra

To test that MPICH is installed and working correctly run

mpiexec -np <number to start> -f hosts.txt hostname

This will run hostname on each server and you can easily see that MPI is working on the servers in your hostfile.
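If the ssh test or the Hydra test above prompts for a password, one common way to enable key based logins is sketched below. This is a general OpenSSH recipe rather than part of NovoalignMPI, and the user and server names are placeholders:

ssh-keygen -t rsa            # create a key pair on the machine you run mpiexec from (accept the defaults)
ssh-copy-id user@server2     # copy the public key to each server listed in hosts.txt
ssh user@server2 date        # should now run without asking for a password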

Running NovoalignMPI

Preparation

For the mpiexec command below to work you have to ensure novoalignmpi and novoalign.lic are available on the PATH on all servers and that the novoindex is available using the same file path on every server. There's no need for the read files to be available on all servers as they are only read by the master process. If using quality calibration and the -k <infile> option then infile must be available on all servers in the ring.

Execution

Run:

mpiexec -f hosts.txt -np <number of jobs> novoalignmpi -d <novoindex> -f <readfiles> <options> >report.novo 2>run.log </dev/null

The number of jobs must be at least 2. The first job is the master process and is responsible for file I/O and for sending reads to the slave processes for alignment. The master process does not do any alignments and does not use a lot of processor time. The other processes are slaves; they accept reads from the master process, align them and then return the report lines to the master for output.

You can have more or fewer jobs than the number of hosts specified in the hosts file. If you have more jobs then some servers will have more than one process. This makes it possible to have a slave process running on the same server as the master by starting two processes on the master server. One way to do this is to specify the master server first in the hosts.txt file and then start one more job than there are hosts. As jobs are started round robin on the servers, the first server will have two jobs, the master and a slave.

Note that novoalignmpi is multi-threaded and will use all CPU cores on a server unless the -c option is used to restrict the number of threads.

Threading and CPU Affinity

NovoalignMPI has several features to improve maximum throughput.

1. The master process uses multiple threads to read, inflate and build the MPI messages. These threads are set to have affinity to all CPUs.
2. If a slave is running on the same node as the master process then it sets nice(12) so that the master process gets the CPU time it needs.
3. If the -c option uses all cores then slaves set CPU affinity for the alignment threads.
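As an illustration of the round robin placement described above, suppose hosts.txt lists three servers with the master server first. The server names, index file and read file names below are placeholders, not part of the guide:

hosts.txt
master1
node2
node3

mpiexec -f hosts.txt -np 4 novoalignmpi -c 8 -d hg19.nix -f reads_1.fq.gz reads_2.fq.gz >report.novo 2>run.log </dev/null

With four jobs on three hosts, the master and one slave run on master1 and one slave runs on each of node2 and node3; -c 8 limits each slave to 8 alignment threads.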

NUMA Servers

When using NovoalignMPI on servers with Non-Uniform Memory Access (NUMA) there can be contention for the index. NUMA servers have some memory directly attached to each processor chip and a high speed interconnect that connects memory between the processors. Accessing the memory directly attached to a processor chip is many times faster than accessing memory attached to a different processor. It's also possible to overload the interconnect bus.

Consider the case of running NovoalignMPI with 32 threads on 8 quad-core processors. The indexed reference genome will be loaded into the physical RAM local to the first process that accesses it. This can lead to overloading of the memory interconnect and reduced performance as the index is used very heavily. Keeping a copy of the index in each processor's local memory can improve performance.

With NovoalignMPI you can force an index to be loaded into the local memory of each processor chip. To do this we start one slave MPI job per processor chip and multi-thread each task using the -c option set to the number of cores in each processor. We also have to disable memory mapping of the index with the mmapoff option.

Note. We have seen some systems where this hasn't worked and all the slaves started on the same CPU, so we suggest testing to see if it improves performance.
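Before trying this it helps to confirm how many processor chips (NUMA nodes) and cores per chip a server actually has. A minimal check, assuming the standard Linux numactl and lscpu utilities are installed:

numactl --hardware      # lists the NUMA nodes and the memory attached to each
lscpu | grep -i numa    # NUMA node count and which CPUs belong to each node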

Example. For 8 quad-core processors we might use:

mpiexec -np 9 -f hosts novoalignmpi -c4 mmapoff

This assumes each processor has enough local memory to hold its own copy of the index.

Resolving Problems

1. Try running mpiexec -f hosts.txt -np <number of hosts> pwd to see what folder your MPI jobs are running in.
2. Check novoalignmpi is in PATH and all other files are accessible on all servers: mpirun -f hosts.txt -np <number of hosts> which novoalignmpi

Linking

As of V3.02.03 we have included in the release the files necessary to compile and link your own novoalign[CS]MPI. This should allow you to use a different release of MPICH, MPICH2, Cray MPICH and perhaps OpenMPI and Intel MPI. It also allows you to configure MPI to use your high speed interconnects.

Note. We have reports of novoalignmpi with OpenMPI hanging, so please check and if you have a problem use MPICH.

So if you are having trouble using the pre-linked versions of the MPI programs you can try linking your own. In the release folder there is a sub folder buildmpi with the following files...

README                  Documentation on how to compile and link the programs
Makefile                Makefile
mpidriver.cc            MPI specific source code files
mpidriver.h
trace.cc                Other system specific routines
lib/novoaligncsmpi.a    Object library for novoalignCSMPI
lib/novoalignmpi.a      Object library for novoalignMPI
lib/bamtools.lic        Bamtools license
lib/bamtools.a          Object library for Bamtools
lib/tcmalloc.lic        Tcmalloc license
lib/libtcmalloc.a       Object library for tcmalloc
lib/unwind.lic          libunwind license
lib/libunwind.a         Object library for libunwind

Requirements

1. Installed and working MPI development environment.
2. BZIP2 development libraries
3. OpenSSL development libraries
4. HDF5 libraries for novoaligncsmpi
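As a rough sketch only: on a Debian or Ubuntu build machine the requirements above typically correspond to packages along the following lines. Package names differ on other distributions, and on clusters the MPI development environment is often provided through a module system rather than a system package:

sudo apt-get install build-essential mpich libmpich-dev libbz2-dev libssl-dev libhdf5-dev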