Anthill User Group Meeting, 2015

Agenda
1. Introduction to the machines and the networks
2. Accessing the machines
3. Command line introduction
4. Setting up your environment to see the queues
5. The different queues on the system
6. Queues and jobs
7. Policies to ensure equitable availability of computes to all
8. Submitting simple jobs to the queue
9. Submitting array jobs to the queue
10. Local data vs. scratch data
11. Blast, lastal, rapsearch, diamond and similar programs
12. Assembly
13. prinseq
14. focus
15. Moving data on and off the cluster
16. Backing up data

Introduction to the machines and the networks
There are only a few machines that you need to worry about: anthill.sdsu.edu, rambox, edwardsdata.sdsu.edu and possibly rohan.sdsu.edu. I'll explain how everything fits together and when to use each. For most things, you will just want to use anthill.sdsu.edu for all your work.

Accessing the machines
SSH
    Mac OS X / Linux: use Terminal
    Windows: use PuTTY

Accessing from on campus:

    ssh username@anthill.sdsu.edu

or, in PuTTY, fill in username and anthill.sdsu.edu.

Accessing from off campus, use edwards-data.sdsu.edu or rohan.sdsu.edu. You have to use port 7010 to reach edwards-data from off campus:

    ssh -p 7010 username@edwards-data.sdsu.edu

(ssh takes the port with a lowercase -p; scp uses an uppercase -P.)

For the more advanced people (don't worry about this if it is too confusing): use screen to run virtual terminals, so that you can start a command, go away, and come back later to find it still running. Use screen -DR to reconnect to an existing screen. The main screen commands you need to know are:

    ctrl-a ctrl-c    create a new screen
    ctrl-a ctrl-n    move to the next screen
    ctrl-a ctrl-p    move to the previous screen

Basic Linux commands
All our servers run Linux (almost all run CentOS version 6), so you have to get used to moving around in Linux. We have a couple of cheat sheets to share with you. Here are a few resources that are worth working through:

    Learn UNIX in 10 minutes
    UNIX Tutorials for beginners
    Linux Commands

We will work through some of these commands in the workshop.

A simple text editor that is available on almost every system is nano.

Setting up your environment to see the queues
We have queues to work on the cluster, and you need to set up your environment so that you can see the queue. The main command to see what is running on the queue is qstat; for a more detailed view you can use qstat -f.

The different queues on the system
Anthill has three queues that you can use:

    default     35 machines, each with 16 processors and 128 GB RAM. Each processor runs an independent job, so you can run 560 jobs simultaneously on these machines. This queue is in eternally friendly mode, and all jobs are run on a first-in, first-out basis.
    important   4 machines, each with 16 processors and 128 GB RAM. This queue is for single jobs only. Do not run array jobs on this queue or they will be terminated! The queue is for testing and running individual programs.
    smallmem    9 machines, each with 8 processors (72 computes total). Each machine has 14 GB RAM, except node1, which has 24 GB RAM. People often forget about this queue, so sometimes it is worth checking!

Policies to ensure equitable availability of computes to all
Be mellow. Be nice to others. Listen to your mother.

Submitting simple jobs to the queue
You need a simple shell script (which can be very short), and then you use this command to submit the job:

    qsub -cwd ./scriptname.sh

The -cwd flag runs the job in the directory you submitted it from. You can also add other options. Common ones include:

    -q    use a different queue (e.g. -q important or -q smallmem)
    -o    where to put the output file
    -e    where to put the error file

Examples:
Submit a job to the cluster using the default queue:

    qsub -cwd ./scriptname.sh

Submit a job to the cluster using the important queue:

    qsub -cwd -q important ./scriptname.sh

Submitting array jobs to the queue
To submit an array job we add the -t flag to our qsub command. For example, -t 1-100:1 submits an array job with tasks numbered from 1 to 100 in increments of 1.

With an array job, scriptname.sh gets passed a special variable called $SGE_TASK_ID that holds the number of the task it is running. I will provide you with some template code to process every file in a directory.

Submit an array job to the cluster, and redirect output and error files to a directory:

    mkdir sge
    qsub -cwd -e sge -o sge -t 1-540:1 ./scriptname.sh
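The workshop's own template is not reproduced in this handout. As a minimal sketch, assuming all input files sit in one directory (called fasta here, a hypothetical name) and that task N should handle the N-th file in name order, an array-job script could look like the following. The function names pick_file and process_task are illustrative only, not part of the cluster's tooling; the #$ lines are SGE embedded options, equivalent to passing -cwd -o sge -e sge to qsub.

```shell
#!/bin/bash
# Hypothetical SGE array-job template (a sketch, not the workshop template):
# task N processes the N-th file in a directory.
# The #$ lines are SGE embedded options; the sge directory must already exist.
#$ -cwd
#$ -o sge
#$ -e sge

# pick_file DIR N: print the N-th regular file in DIR (1-based, sorted by name).
pick_file() {
    local dir=$1 n=$2
    find "$dir" -maxdepth 1 -type f | LC_ALL=C sort | sed -n "${n}p"
}

# process_task DIR: handle the file that matches this task's SGE_TASK_ID.
process_task() {
    local dir=$1 file
    file=$(pick_file "$dir" "$SGE_TASK_ID")
    if [[ -z $file ]]; then
        echo "No file number $SGE_TASK_ID in $dir" >&2
        return 1
    fi
    echo "Task $SGE_TASK_ID processing $file"
    # Replace the echo above with the real work, for example:
    # lastal DB "$file" -p BL80 -f 0 > "${file%.*}.lastal"
}

# SGE sets SGE_TASK_ID only for array tasks; do nothing otherwise.
if [[ -n ${SGE_TASK_ID:-} ]]; then
    process_task "${1:-fasta}"    # "fasta" is a hypothetical input directory
fi
```

Saved as, say, s_template.sh (following the s_ naming suggestion below), it could be submitted with mkdir sge followed by qsub -t 1-540:1 ./s_template.sh fasta, with the upper bound of -t set to the number of files in the directory. Sorting with LC_ALL=C keeps the file-to-task mapping the same for every task.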

Naming files
SGE submission scripts:
    Please don't use run.sh or job.sh.
    Limit file names to 10 meaningful characters.
    Include some notion of the command (rapsearch, blast, etc.).
    Potentially start everything with s_.
Fasta files:
    Use .fna for nucleotide.
    Use .faa for protein.
Output files:
    Append .blastp, .blastn, .rapsearch2, .lastal, etc. so that you remember what you did.

Remembering what you did
Make a file called how-to.txt and copy and paste your commands into it. Annotate the file so you remember what the commands are doing. If you forget to do that, you can make one like this:

    history > how-to.txt

and then edit the how-to.txt file:

    nano how-to.txt

NFS data vs. scratch data
There are two ways of housing your data. Your home directory is on a separate file server, so anything in it has to be transported to the machine where the computations are done, and the results then have to be moved back again. As an alternative, each of our computers has a local hard drive with space in /scratch. This is common space that anyone can use, and so it has the drawback that it fills up.

Opinions vary on the importance of using your home directory versus /scratch. I keep everything on the file server and compute off of that. Kate and Rob S. move their data to the /scratch space and compute off of that. Certainly if you have something large that you routinely compute against (e.g. the nr database), it would be good to house it on /scratch. Also, if you have a lot of jobs running that require network I/O, it is occasionally better to use /scratch to avoid network issues. However, if you don't know what you are doing, don't worry about it. The time saving will be very small (I think insignificant in the overall scheme of things), so don't waste your time!

Blast, lastal, rapsearch, diamond and similar programs
It's time to move away from BLAST; there are some really good alternatives that we have installed on the cluster.
    For DNA-DNA (instead of blastn) comparisons we (currently) recommend:
    For protein-protein (instead of blastp) comparisons we (currently) recommend: LASTAL
    For protein-DNA (instead of blastx) comparisons we (currently) recommend: rapsearch2
We will work through examples of all of these.

BLAST
Even though it is time to move away from it, I suppose people still want to use it. You will need to format your database using the makeblastdb command (unless you use one that is already formatted). The commands that you want to run are /usr/local/blast+/bin/blastn, /usr/local/blast+/bin/blastp, /usr/local/blast+/bin/blastx, etc.

We have a trivial BLAST solution for you: a script that takes your fasta file, splits it up into a series of smaller files, and then runs your specified BLAST program against each of those files:

    split_blast_queries_blastplus

If you run the command without any options you get this help output:

    /home3/redwards/bioinformatics/cluster/split_lastal_queries.pl <options>
        -f     file to split
        -n     number to break into
        -d     destination directory (default = ".")
        -p     matrix: BL62 (for protein/protein or BL80 for DNA/protein)
        -db    lastal database
        -ex    lastal executable location (default is /usr/local/last/bin/)
        -N     job name (default is s_lastal)
        -rev   reverse the order of files that are submitted to the queue (i.e. so you can run twice and start from the end backwards!)
        -v     verbose
    Other things will be used as lastal options. Unless -db is provided we will just split and stop.

Basically the main options you need are -f for the fasta file, -n for the number of pieces to break it into, -d for a directory to put the results in, -p for the BLAST program to run (blastn, blastp, blastx, etc.), and -db for the database to compare to. If you want to add options to blast+ you can add them at the end of the command. For example:

    split_blast_queries_blastplus -f Nudibranch_S7_L001_R1_001.fasta -n 200 -d nudi -p blastx -db /home/db/blast/nr/nr -evalue 1e-5

This creates a directory called nudi and writes files into it, including the BLAST output files. Since this is blastx, the BLAST output files all end with .blastx. (That will change if you use blastn or blastp, of course.) Once the BLAST jobs are complete you can concatenate all the output files using the cat command:

    cat nudi/*.blastx > nudibranch.blastx

Now I have a single file called nudibranch.blastx that has all the BLAST output.

LASTAL
To use LASTAL you first need to format the database. If you have a protein database you need to specify that the input file is protein with -p. Then specify the new database name and the location of the fasta-formatted input file:

    lastdb -p nr nr.faa

This will create several files, and may also break up the database into ~20 GB blocks.

To run LASTAL you need the location of the database, followed by the query file, followed by which score matrix to use (I originally was using BL62, but the developer recommended BL80 for short sequences), followed by the output format (0 for tabular):

    lastal nr QUERYFILE.faa -p BL80 -f 0

A sample SGE script looks like:

    #!/bin/bash
    lastal /usr/data/kate/nr/nr QUERYFILE.faa -p BL80 -f 0

We also have a trivial lastal command that is based on the BLAST command above, split_lastal_queries, which has a similar help profile:

    /home3/redwards/bin/split_lastal_queries <options>
        -f     file to split
        -n     number to break into
        -d     destination directory (default = ".")
        -p     matrix: BL62 (for protein/protein or BL80 for DNA/protein)
        -db    lastal database
        -ex    lastal executable location (default is /usr/local/last/bin/)
        -N     job name (default is s_lastal)
        -rev   reverse the order of files that are submitted to the queue (i.e. so you can run twice and start from the end backwards!)
        -v     verbose
    Other things will be used as lastal options. Unless -db is provided we will just split and stop.

You use the command in the same way, although the options are slightly different (-p for the pairwise matrix):

    split_lastal_queries -f Nudibranch_S7_L001_R1_001.fasta -n 200 -d nudi -p BL80 -db /home/db/lastal/nr-lastal/nr

Again, this will result in a directory of output files, and you can concatenate them as before.

RAPSearch2, Reduced Alphabet based Protein similarity Search
RAPSearch is about 100 times faster than BLAST and in single-thread mode requires up to 2 GB of memory. It's a great replacement for blastx, but it is slightly less sensitive than blastx (especially in the fast mode), so you may miss some rare matches. There are two steps to running rapsearch2.

Formatting the database:

    prerapsearch -d Fasta_File -n DatabaseName

Running RAPSearch2:

    rapsearch -a 1 -q FASTA/FASTQ -d DatabaseName -o OUTPUT -v NumberDbSequences -z NumberThreads -e evalue -b 0 -s f

    -a 1    set the program to its fast mode
    -a 0    run the program in its sensitive mode
    -b 0    do not write any sequence alignments
    -s f    report E-values in the same format as BLAST

Sequence assembly
We currently recommend (and use) the St. Petersburg assembler, SPAdes. This runs fine on the cluster for most sequences, including metagenomes. If you run out of memory we can run it on rambox; contact Rob and I'll work with you on that.

To run this on the cluster, I put this in my script file and then submit it to the queue:

    /home3/redwards/bin/spades/spades linux/bin/spades.py -o spades.assembly --careful --dataset files.yaml

and files.yaml contains:

    [
      {
        orientation: "fr",
        type: "paired-end",
        right reads: [
          "/home3/redwards/johnkirby/rob/1291/fastq/sample4-1291wt_s4_l001_r1_001.fastq",
          "/home3/redwards/johnkirby/rob/1291/fastq/1_1291wt_4_cttgta_l001_r1_001.fastq",
          "/home3/redwards/johnkirby/rob/1291/fastq/2_1291_0177_4_cgatgt_l001_r1_001.fastq",
        ],
        left reads: [
          "/home3/redwards/johnkirby/rob/1291/fastq/sample4-1291wt_s4_l001_r2_001.fastq",
          "/home3/redwards/johnkirby/rob/1291/fastq/1_1291wt_4_cttgta_l001_r2_001.fastq",
          "/home3/redwards/johnkirby/rob/1291/fastq/2_1291_0177_4_cgatgt_l001_r2_001.fastq",
        ]
      }
    ]
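With many read pairs, typing paths into files.yaml by hand is tedious and easy to get wrong. As a hedged sketch, assuming the paired files live in one directory and are named with _r1_ and _r2_ as in the example above, a small bash function could write the dataset file for you. write_spades_yaml is a hypothetical name; it keeps the example's layout (r1 files under right reads, r2 files under left reads, trailing commas inside the lists) and writes absolute paths, as the example does.

```shell
#!/bin/bash
# Hypothetical helper (not part of SPAdes): write a SPAdes dataset file for
# every *_r1_*.fastq file in a directory and its matching *_r2_*.fastq mate.
# Mirrors the files.yaml example: r1 -> "right reads", r2 -> "left reads".
write_spades_yaml() {
    local dir out=${2:-files.yaml}
    dir=$(cd "$1" && pwd) || return 1      # absolute path to the fastq directory
    local -a r1=() r2=()
    local f name mate
    for f in "$dir"/*_r1_*.fastq; do
        [[ -e $f ]] || continue            # the glob matched nothing
        name=${f##*/}
        mate=$dir/${name/_r1_/_r2_}        # swap _r1_ for _r2_ in the file name only
        if [[ ! -e $mate ]]; then
            echo "No mate found for $f" >&2
            return 1
        fi
        r1+=("$f")
        r2+=("$mate")
    done
    if (( ${#r1[@]} == 0 )); then
        echo "No *_r1_*.fastq files in $dir" >&2
        return 1
    fi
    {
        echo '['
        echo '  {'
        echo '    orientation: "fr",'
        echo '    type: "paired-end",'
        echo '    right reads: ['
        printf '      "%s",\n' "${r1[@]}"
        echo '    ],'
        echo '    left reads: ['
        printf '      "%s",\n' "${r2[@]}"
        echo '    ]'
        echo '  }'
        echo ']'
    } > "$out"
}
```

For example, write_spades_yaml fastq files.yaml would scan a (hypothetical) fastq directory; it is worth reading the generated file once before submitting the spades.py job. The function stops with an error if any r1 file lacks its r2 mate, rather than writing an unbalanced library.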

Prinseq
You can run prinseq-lite.pl on the cluster. For example, to generate the report, put this in your script file:

    perl prinseq-lite.pl -verbose -fastq test.fq -graph_data test.gd -out_good null -out_bad null

You can then upload the test.gd file to the website to see the report.

focus
Geni will show you!

Moving data on and off the cluster
Use scp from the command line. For Windows try WinSCP or SSH Secure; for Mac try Cyberduck or RBrowser.

Backing up data
Your data is NOT backed up. It is your responsibility to back it up to an external hard drive or another source. DO NOT RELY ON US TO PRESERVE YOUR DATA!!!

Parting thoughts!
Be mellow, be nice to others; everyone uses the resources.

More information

Notes for installing a local blast+ instance of NCBI BLAST F. J. Pineda 09/25/2017

Notes for installing a local blast+ instance of NCBI BLAST F. J. Pineda 09/25/2017 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 Notes for installing a local blast+ instance of NCBI BLAST F. J. Pineda 09/25/2017

More information

Introduction to HPC Resources and Linux

Introduction to HPC Resources and Linux Introduction to HPC Resources and Linux Burak Himmetoglu Enterprise Technology Services & Center for Scientific Computing e-mail: bhimmetoglu@ucsb.edu Paul Weakliem California Nanosystems Institute & Center

More information

2 Algorithm. Algorithms for CD-HIT were described in three papers published in Bioinformatics.

2 Algorithm. Algorithms for CD-HIT were described in three papers published in Bioinformatics. CD-HIT User s Guide Last updated: 2012-04-25 http://cd-hit.org http://bioinformatics.org/cd-hit/ Program developed by Weizhong Li s lab at UCSD http://weizhong-lab.ucsd.edu liwz@sdsc.edu 1 Contents 2 1

More information

Introduction: What is Unix?

Introduction: What is Unix? Introduction Introduction: What is Unix? An operating system Developed at AT&T Bell Labs in the 1960 s Command Line Interpreter GUIs (Window systems) are now available Introduction: Unix vs. Linux Unix

More information

2018/08/16 14:47 1/36 CD-HIT User's Guide

2018/08/16 14:47 1/36 CD-HIT User's Guide 2018/08/16 14:47 1/36 CD-HIT User's Guide CD-HIT User's Guide This page is moving to new CD-HIT wiki page at Github.com Last updated: 2017/06/20 07:38 http://cd-hit.org Program developed by Weizhong Li's

More information

Introduction to HPC Using zcluster at GACRC

Introduction to HPC Using zcluster at GACRC Introduction to HPC Using zcluster at GACRC On-class STAT8330 Georgia Advanced Computing Resource Center University of Georgia Suchitra Pakala pakala@uga.edu Slides courtesy: Zhoufei Hou 1 Outline What

More information

Tips from the experts: How to waste a lot of time on this assignment

Tips from the experts: How to waste a lot of time on this assignment Com S 227 Spring 2018 Assignment 1 100 points Due Date: Friday, September 14, 11:59 pm (midnight) Late deadline (25% penalty): Monday, September 17, 11:59 pm General information This assignment is to be

More information

Image Sharpening. Practical Introduction to HPC Exercise. Instructions for Cirrus Tier-2 System

Image Sharpening. Practical Introduction to HPC Exercise. Instructions for Cirrus Tier-2 System Image Sharpening Practical Introduction to HPC Exercise Instructions for Cirrus Tier-2 System 2 1. Aims The aim of this exercise is to get you used to logging into an HPC resource, using the command line

More information

Protected Environment at CHPC. Sean Igo Center for High Performance Computing September 11, 2014

Protected Environment at CHPC. Sean Igo Center for High Performance Computing September 11, 2014 Protected Environment at CHPC Sean Igo Center for High Performance Computing Sean.Igo@utah.edu September 11, 2014 Purpose of Presentation Overview of CHPC environment / access Actually this is most of

More information

CS 460 Linux Tutorial

CS 460 Linux Tutorial CS 460 Linux Tutorial http://ryanstutorials.net/linuxtutorial/cheatsheet.php # Change directory to your home directory. # Remember, ~ means your home directory cd ~ # Check to see your current working

More information

Why You Should Consider Grid Computing

Why You Should Consider Grid Computing Why You Should Consider Grid Computing Kenny Daily BIT Presentation 8 January 2007 Outline Motivational Story Electric Fish Grid Computing Overview N1 Sun Grid Engine Software Use of UCI's cluster My Research

More information

Introduction in Unix. Linus Torvalds Ken Thompson & Dennis Ritchie

Introduction in Unix. Linus Torvalds Ken Thompson & Dennis Ritchie Introduction in Unix Linus Torvalds Ken Thompson & Dennis Ritchie My name: John Donners John.Donners@surfsara.nl Consultant at SURFsara And Cedric Nugteren Cedric.Nugteren@surfsara.nl Consultant at SURFsara

More information

Supercomputing environment TMA4280 Introduction to Supercomputing

Supercomputing environment TMA4280 Introduction to Supercomputing Supercomputing environment TMA4280 Introduction to Supercomputing NTNU, IMF February 21. 2018 1 Supercomputing environment Supercomputers use UNIX-type operating systems. Predominantly Linux. Using a shell

More information

ACEnet for CS6702 Ross Dickson, Computational Research Consultant 29 Sep 2009

ACEnet for CS6702 Ross Dickson, Computational Research Consultant 29 Sep 2009 ACEnet for CS6702 Ross Dickson, Computational Research Consultant 29 Sep 2009 What is ACEnet? Shared resource......for research computing... physics, chemistry, oceanography, biology, math, engineering,

More information

Introduction to the Linux Command Line

Introduction to the Linux Command Line Introduction to the Linux Command Line May, 2015 How to Connect (securely) ssh sftp scp Basic Unix or Linux Commands Files & directories Environment variables Not necessarily in this order.? Getting Connected

More information

BLAST. Jon-Michael Deldin. Dept. of Computer Science University of Montana Mon

BLAST. Jon-Michael Deldin. Dept. of Computer Science University of Montana Mon BLAST Jon-Michael Deldin Dept. of Computer Science University of Montana jon-michael.deldin@mso.umt.edu 2011-09-19 Mon Jon-Michael Deldin (UM) BLAST 2011-09-19 Mon 1 / 23 Outline 1 Goals 2 Setting up your

More information

MERCED CLUSTER BASICS Multi-Environment Research Computer for Exploration and Discovery A Centerpiece for Computational Science at UC Merced

MERCED CLUSTER BASICS Multi-Environment Research Computer for Exploration and Discovery A Centerpiece for Computational Science at UC Merced MERCED CLUSTER BASICS Multi-Environment Research Computer for Exploration and Discovery A Centerpiece for Computational Science at UC Merced Sarvani Chadalapaka HPC Administrator University of California

More information

Introduction to Discovery.

Introduction to Discovery. Introduction to Discovery http://discovery.dartmouth.edu The Discovery Cluster 2 Agenda What is a cluster and why use it Overview of computer hardware in cluster Help Available to Discovery Users Logging

More information

Introduction to Discovery.

Introduction to Discovery. Introduction to Discovery http://discovery.dartmouth.edu The Discovery Cluster 2 Agenda What is a cluster and why use it Overview of computer hardware in cluster Help Available to Discovery Users Logging

More information

Essential Skills for Bioinformatics: Unix/Linux

Essential Skills for Bioinformatics: Unix/Linux Essential Skills for Bioinformatics: Unix/Linux SHELL SCRIPTING Overview Bash, the shell we have used interactively in this course, is a full-fledged scripting language. Unlike Python, Bash is not a general-purpose

More information

diamond Requirements Time Torque/PBS Examples Diamond with single query (simple)

diamond Requirements Time Torque/PBS Examples Diamond with single query (simple) diamond Diamond is a sequence database searching program with the same function as BlastX, but 1000X faster. A whole transcriptome search of the NCBI nr database, for instance, may take weeks using BlastX,

More information

Introduction to UNIX. SURF Research Boot Camp April Jeroen Engelberts Consultant Supercomputing

Introduction to UNIX. SURF Research Boot Camp April Jeroen Engelberts Consultant Supercomputing Introduction to UNIX SURF Research Boot Camp April 2018 Jeroen Engelberts jeroen.engelberts@surfsara.nl Consultant Supercomputing Outline Introduction to UNIX What is UNIX? (Short) history of UNIX Cartesius

More information

Introduction to UNIX

Introduction to UNIX PURDUE UNIVERSITY Introduction to UNIX Manual Michael Gribskov 8/21/2016 1 Contents Connecting to servers... 4 PUTTY... 4 SSH... 5 File Transfer... 5 scp secure copy... 5 sftp

More information

Introduction to Scripting using bash

Introduction to Scripting using bash Introduction to Scripting using bash Scripting versus Programming (from COMP10120) You may be wondering what the difference is between a script and a program, or between the idea of scripting languages

More information

Genomic Files. University of Massachusetts Medical School. October, 2015

Genomic Files. University of Massachusetts Medical School. October, 2015 .. Genomic Files University of Massachusetts Medical School October, 2015 2 / 55. A Typical Deep-Sequencing Workflow Samples Fastq Files Fastq Files Sam / Bam Files Various files Deep Sequencing Further

More information

Command-Line Data Analysis INX_S17, Day 10,

Command-Line Data Analysis INX_S17, Day 10, Command-Line Data Analysis INX_S17, Day 10, 2017-05-01 Assignment 4 (quiz). sort, head, tail Learning Outcome(s): Use `sort` to build filtering pipelines for bioinformatics data Matthew Peterson, OSU CGRB,

More information

High Performance Computing (HPC) Using zcluster at GACRC

High Performance Computing (HPC) Using zcluster at GACRC High Performance Computing (HPC) Using zcluster at GACRC On-class STAT8060 Georgia Advanced Computing Resource Center University of Georgia Zhuofei Hou, HPC Trainer zhuofei@uga.edu Outline What is GACRC?

More information

CSC209. Software Tools and Systems Programming. https://mcs.utm.utoronto.ca/~209

CSC209. Software Tools and Systems Programming. https://mcs.utm.utoronto.ca/~209 CSC209 Software Tools and Systems Programming https://mcs.utm.utoronto.ca/~209 What is this Course About? Software Tools Using them Building them Systems Programming Quirks of C The file system System

More information

Introduction to Linux Environment. Yun-Wen Chen

Introduction to Linux Environment. Yun-Wen Chen Introduction to Linux Environment Yun-Wen Chen 1 The Text (Command) Mode in Linux Environment 2 The Main Operating Systems We May Meet 1. Windows 2. Mac 3. Linux (Unix) 3 Windows Command Mode and DOS Type

More information

Introduction to HPC Using zcluster at GACRC

Introduction to HPC Using zcluster at GACRC Introduction to HPC Using zcluster at GACRC On-class PBIO/BINF8350 Georgia Advanced Computing Resource Center University of Georgia Zhuofei Hou, HPC Trainer zhuofei@uga.edu Outline What is GACRC? What

More information

Introduction to HPC Using zcluster at GACRC On-Class GENE 4220

Introduction to HPC Using zcluster at GACRC On-Class GENE 4220 Introduction to HPC Using zcluster at GACRC On-Class GENE 4220 Georgia Advanced Computing Resource Center University of Georgia Suchitra Pakala pakala@uga.edu Slides courtesy: Zhoufei Hou 1 OVERVIEW GACRC

More information

Assessing Transcriptome Assembly

Assessing Transcriptome Assembly Assessing Transcriptome Assembly Matt Johnson July 9, 2015 1 Introduction Now that you have assembled a transcriptome, you are probably wondering about the sequence content. Are the sequences from the

More information

Working with GIT. Florido Paganelli Lund University MNXB Florido Paganelli MNXB Working with git 1/47

Working with GIT. Florido Paganelli Lund University MNXB Florido Paganelli MNXB Working with git 1/47 Working with GIT MNXB01 2017 Florido Paganelli Lund University florido.paganelli@hep.lu.se Florido Paganelli MNXB01-2017 - Working with git 1/47 Required Software Git - a free and open source distributed

More information

Bioinformatics Facility at the Biotechnology/Bioservices Center

Bioinformatics Facility at the Biotechnology/Bioservices Center Bioinformatics Facility at the Biotechnology/Bioservices Center Co-Heads : J.P. Gogarten, Paul Lewis Facility Scientist : Pascal Lapierre Hardware/Software Manager: Jeff Lary Mandate of the Facility: To

More information

VERY SHORT INTRODUCTION TO UNIX

VERY SHORT INTRODUCTION TO UNIX VERY SHORT INTRODUCTION TO UNIX Tore Samuelsson, Nov 2009. An operating system (OS) is an interface between hardware and user which is responsible for the management and coordination of activities and

More information

Ruby on Rails Welcome. Using the exercise files

Ruby on Rails Welcome. Using the exercise files Ruby on Rails Welcome Welcome to Ruby on Rails Essential Training. In this course, we're going to learn the popular open source web development framework. We will walk through each part of the framework,

More information

Using the Yale HPC Clusters

Using the Yale HPC Clusters Using the Yale HPC Clusters Stephen Weston Robert Bjornson Yale Center for Research Computing Yale University Oct 2015 To get help Send an email to: hpc@yale.edu Read documentation at: http://research.computing.yale.edu/hpc-support

More information

Unix basics exercise MBV-INFX410

Unix basics exercise MBV-INFX410 Unix basics exercise MBV-INFX410 In order to start this exercise, you need to be logged in on a UNIX computer with a terminal window open on your computer. It is best if you are logged in on freebee.abel.uio.no.

More information

Introduction to HPC Using zcluster at GACRC

Introduction to HPC Using zcluster at GACRC Introduction to HPC Using zcluster at GACRC Georgia Advanced Computing Resource Center University of Georgia Zhuofei Hou, HPC Trainer zhuofei@uga.edu Outline What is GACRC? What is HPC Concept? What is

More information

Introduction to Linux for BlueBEAR. January

Introduction to Linux for BlueBEAR. January Introduction to Linux for BlueBEAR January 2019 http://intranet.birmingham.ac.uk/bear Overview Understanding of the BlueBEAR workflow Logging in to BlueBEAR Introduction to basic Linux commands Basic file

More information

Using UNIX. -rwxr--r-- 1 root sys Sep 5 14:15 good_program

Using UNIX. -rwxr--r-- 1 root sys Sep 5 14:15 good_program Using UNIX. UNIX is mainly a command line interface. This means that you write the commands you want executed. In the beginning that will seem inferior to windows point-and-click, but in the long run the

More information