Programming introduction part I: Perl, Unix/Linux and using the BlueHive cluster
Bio472 - Spring 2014
Amanda Larracuente

Text editor
Look for:
    Syntax coloring
    Recognizes several languages
    Line numbers
    Free!
Mac/Windows: GNU emacs http://www.gnu.org/software/emacs/
Mac: Xcode
OR: nano (type text, ctrl+o to save)

Log into BlueHive
    ssh username@bluehive.rochester.edu
OR
    ssh bluehive.crc.rochester.edu -l username
Go to the class directory: /scratch/bio472_2014/
Download problem set #1 from: http://blogs.rochester.edu/selfishdna/ (go to Courses)

Basic Unix/Linux commands
Reference sheet is under the Courses tab at: http://blogs.rochester.edu/selfishdna/
    cd dir                                   (change directory to dir)
    cd ..                                    (go up one directory)
    ls                                       (list contents of the directory)
    ls *.txt                                 (list all files ending in .txt)
    ls -s                                    (show file sizes)
    pwd                                      (show path to current directory)
    du                                       (show directory space usage)
    wc -l                                    (print the number of lines in a file)
    cat file.txt                             (print the contents of file.txt)
    cat file1.txt file2.txt > file3.txt      (concatenate file1 and file2 into file3.txt)
    grep pattern file                        (find all instances of pattern in file)
    grep '>' test.fa | wc -l                 (count # of fasta sequences in test.fa)

Practical Extraction and Report Language (Perl)
Free high-level programming language
Do you have Perl v5.0 or later on your system? Open a terminal and type:
    perl -v

Using BlueHive
Go here: https://www.circ.rochester.edu/wiki/index.php/getting_started

For graphical applications (e.g. R):
    Mac: Open XQuartz: Applications -> Terminal
         Log in to BlueHive with X forwarding:
             ssh -Y user@bluehive.crc.rochester.edu
             module load R-3.0.2
             R
    Windows: Get MobaXterm: http://mobaxterm.mobatek.net/
         Use ssh to log into BlueHive

For text applications:
    Use ssh to log into BlueHive and submit a PBS script with qsub
    OR work interactively, e.g.:
        qsub -I -q interactive -l nodes=1:ppn=1 -l walltime=1:00:00

Hello World!
Open a text editor, type the following lines and save as a file called Hello_world.pl:
    #!/usr/bin/perl -w
    print "Hello, world!\n";
Run the program in your terminal by typing:
    perl Hello_world.pl

Scalars
Strings of characters: "hello"
Numbers (integers, floating points): 10 or 10.3458 or 10e7
Can be acted on with operators and will return a scalar:
    +     addition               (2+3=5)
    -     subtraction            (5.1-2.4=2.7)
    *     multiplication         (3*12=36)
    /     division               (14/2=7)
    %     modulus (remainder)    (10%3=1)
    **    exponentiation         (2**3=8)
Store in a scalar variable
Declare a scalar variable with my:
    my $scalar;
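
To see the operators in action, here is a minimal sketch that stores each result in a scalar and prints it. The variable names are invented for this illustration and do not come from the slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical variable names, chosen only for this illustration.
my $sum       = 2 + 3;     # addition
my $product   = 3 * 12;    # multiplication
my $remainder = 10 % 3;    # modulus: what is left after dividing 10 by 3
my $quotient  = 14 / 2;    # division
my $power     = 2 ** 3;    # exponentiation

# A scalar can hold a string just as easily as a number.
my $word = "hello";

print "sum=$sum product=$product remainder=$remainder ",
      "quotient=$quotient power=$power\n";
print $word, "\n";
```

Each operator returns a scalar, so the results can be fed straight into further expressions or into print.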

Special characters and comparison operators

Special characters:
    \n    newline
    \t    tab
    \s    space (any whitespace; used in regular expressions)

Comparison operators:
    Comparison                  Numeric    String
    Equal                       ==         eq
    Not equal                   !=         ne
    Less than                   <          lt
    Greater than                >          gt
    Less than or equal to       <=         le
    Greater than or equal to    >=         ge
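
The two operator columns can give different answers for the same pair of values, which is why Perl keeps them separate. The sketch below uses made-up values to show the difference. It is only an illustration, not part of the original slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical values, chosen so the two kinds of comparison disagree.
my $x = "10";
my $y = "10.0";

# Numeric comparison: both strings are converted to the number 10.
my $numeric_equal = ($x == $y) ? "yes" : "no";
# String comparison: "10" and "10.0" differ character by character.
my $string_equal  = ($x eq $y) ? "yes" : "no";

# As numbers 9 is less than 10, but as strings "9" sorts after "10"
# because the character "9" comes after the character "1".
my $numeric_less = (9 < 10)      ? "yes" : "no";
my $string_less  = ("9" lt "10") ? "yes" : "no";

print "equal as numbers: $numeric_equal\tequal as strings: $string_equal\n";
print "9 < 10: $numeric_less\t\"9\" lt \"10\": $string_less\n";
```

Use the numeric column for counts and coordinates, and the string column for names and sequences.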

Loops
Perl counts from zero!

for loop:
    for (my $i=0; $i<10; $i++) {
        print $i, "\t";
    }

while loop:
    my $i=0;
    while ($i<10) {
        print $i, "\n";
        $i++;
    }

if / elsif / else:
    my $i=0;
    if ($i<=10 && $i>6) {
        print "High\n";
    }
    elsif ($i<=6 && $i>3) {
        print "Mid\n";
    }
    else {
        print "Low\n";
    }

Arrays
Variable that contains a list
Create an array called people with elements 0-3 and values Fisher, Wright, Haldane, Mayr.

    my @people;
    $people[0]="Fisher";
    $people[1]="Wright";
    $people[2]="Haldane";
    $people[3]="Mayr";

    #Get the size of the array ($#people is the index of the last element)
    my $size=$#people+1;
    print "size: ",$size,"\n";

    #Print the names stored in the array
    for (my $i=0; $i<$size; $i++) {
        print $people[$i],"\n";
    }

Hashes
Hold values indexed by strings
Look up values with keys (the index)
Create a hash called names, with the keys Fisher, Wright, Haldane and Mayr and the values Ronald, Sewall, J.B.S. and Ernst.

    my %names;
    $names{"Fisher"}="Ronald";
    $names{"Wright"}="Sewall";
    $names{"Haldane"}="J.B.S.";
    $names{"Mayr"}="Ernst";

    #Print the names stored in the hash
    for my $key (keys %names) {
        print $key,",",$names{$key},"\n";
    }
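
Two details often matter in practice: keys returns the keys in no particular order, and looking up a key that was never stored yields an undefined value. The sketch below (an illustration added here, using sort and exists, which the slides do not cover) prints the same hash in sorted order and checks for a key before using it. The query "Darwin" is a made-up example of a missing key.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %names = (
    Fisher  => "Ronald",
    Wright  => "Sewall",
    Haldane => "J.B.S.",
    Mayr    => "Ernst",
);

# Sorting the keys gives the same output order on every run.
my @lines;
for my $key (sort keys %names) {
    push @lines, "$key,$names{$key}";
}
print "$_\n" for @lines;

# exists tests whether a key is present without creating it.
my @missing;
for my $query ("Wright", "Darwin") {    # "Darwin" is deliberately absent
    if (exists $names{$query}) {
        print "$query: $names{$query}\n";
    }
    else {
        print "$query: not in hash\n";
        push @missing, $query;
    }
}
```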

Split
Create a string called line with the following text: "There is grandeur in this view of life..."
Split the line on spaces and store in an array
Print the elements of the array

    my $line="There is grandeur in this view of life...";
    my @array=split(/\s/,$line);
    for (my $j=0; $j<$#array+1; $j++) {
        print $array[$j],"\n";
    }

Substring
Grab a subset of characters in a string
    substr(string,position,length)
Positions count from zero.
Example: Extract the word grandeur from the following string: "There is grandeur in this view of life..."

    my $line="There is grandeur in this view of life...";
    my $subset=substr($line,9,8);
    print $subset,"\n";

Regular expressions
Substitutions: =~ s/target/replacement/
Matches:       =~ m/string/

    Char    Meaning
    ^       Beginning of string
    $       End of string
    .       Any character (except newline)
    *       Match 0+ times
    +       Match 1+ times
    ?       Match 0 or 1 times, or shortest match
    |       Alternative
    \       Special character (escape)
    \w      Matches an alphanumeric character (or underscore)
    \d      Matches a digit
    \s      Matches a whitespace character

Some examples
    if ($line=~m/^>/)        #if the string starts with >
    if ($line=~m/[ATCG]/)    #if the string contains A or T or C or G
    if ($line=~m/\w/)        #if the string contains a word character
    if ($line=~m/^\w\d+/)    #if the string starts with a word character followed by one or more digits
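
As a worked illustration of matching and substitution together, the sketch below scans a few FASTA-style lines, counts the headers, and rewrites each pure A/T/C/G line with T replaced by U. The input lines and variable names are invented for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input lines, made up for this example.
my @lines = (">seq1 test gene", "ATCGGTTA", ">seq2", "TTAGNN");

my $header_count = 0;
my @rna;
my @skipped;

for my $line (@lines) {
    if ($line =~ m/^>/) {                # starts with ">": a header line
        $header_count++;
    }
    elsif ($line =~ m/^[ATCG]+$/) {      # one or more of A, T, C, G and nothing else
        my $copy = $line;
        $copy =~ s/T/U/g;                # replace every T with U
        push @rna, $copy;
    }
    else {
        push @skipped, $line;            # anything else, e.g. a line containing N
    }
}

print "headers: $header_count\n";
print "converted: $_\n" for @rna;
print "skipped: $_\n"   for @skipped;
```

Anchoring the pattern with ^ and $ is what turns "contains A, T, C or G" into "consists only of A, T, C and G".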

Input/Output and Filehandles
Filehandle: an I/O connection between you and Perl
Special filehandle names:
    STDIN     DATA
    STDOUT    ARGV
    STDERR    ARGVOUT
Write a program that:
    1. Takes the name of a file on the command line
    2. Opens the file and iterates through each line
    3. For each line, creates a new string that replaces "most" with "MOST"

Example 1: Input and Output

    #!/usr/bin/perl -w
    ###############################################################################
    #
    # Amanda Larracuente
    # Program written to play with I/O and filehandles
    #
    # example usage: IO_lesson.pl file.txt file.out
    #
    ###############################################################################
    my $file=$ARGV[0];       #name of input file
    my $outfile=$ARGV[1];    #name of output file

    #open the input/output files or die and report the error
    open(FILE, "$file") || die ("Can't open $file!\n");
    open(OUT, ">>$outfile") || die ("Can't open $outfile!\n");

    foreach my $line (<FILE>)     #for each line in the input file
    {
        chomp($line);             #remove terminal \n
        $line=~s/most/MOST/g;     #replace most with MOST
        print $line,"\n";         #print the new line to the screen
        print OUT $line,"\n";     #write the new line to an output file
    }

    #close input/output files
    close(FILE);
    close(OUT);

Example 2: While

    #!/usr/bin/perl -w
    ###############################################################################
    #
    # Amanda Larracuente
    # Program written to play with I/O and filehandles
    #
    # example usage: IO_lesson.pl file.txt
    #
    ###############################################################################
    my $file=$ARGV[0];    #name of input file

    #open the input file or die and report the error
    open(FILE, "$file") || die ("Can't open $file!\n");

    while (<FILE>)                #for each line in the input file
    {
        chomp($_);                #remove terminal \n
        my $new=$_;
        $new=~s/most/MOST/g;      #replace most with MOST globally
        print $new,"\n";          #print the new line to the screen
    }

    close(FILE);    #close input file

Example 3: Grab_reads_from_sam.pl
Reconstruct a fastq file from an alignment file (SAM file)
Type: more TestGene.sam (we'll learn more about SAM files in the next lecture)
This is a tab-delimited file containing alignment information
Each line includes the read sequence and quality in the 10th and 11th columns, respectively
Split each line on the \t and store the elements in an array
Print out the columns that you need

    #!/usr/bin/perl
    use warnings;
    use strict;
    ###############################################################################
    #
    # Amanda Larracuente 11/21/13
    # Program written to recreate fastq from sam file
    #
    # example usage: perl Grab_reads_from_sam.pl MappedReads.sam
    #
    ###############################################################################
    my $samfile=$ARGV[0];    #name of SAM file to fetch reads from

    open(FILE, "$samfile") || die ("Can't open $samfile!\n");
    print "File ",$samfile," opened...\n";

    my $outfile=$samfile;
    $outfile=~s/\.sam$/.READS.fq/;    #make output file name by replacing .sam with .READS.fq
    open(OUT, ">>$outfile") || die ("Can't open $outfile!\n");

    foreach my $line (<FILE>)    #for each line in sam file
    {
        chomp($line);            #get rid of "\n" at the end of each line
        if ($line=~m/^@/) {next;}    #if the line starts with @, skip because this is a comment in the sam file
        else                     #this must be the lines containing alignment information
        {
            my @linearray=split(/\t/,$line);    #split the line on tabs and store in an array
            my $read_name=$linearray[0];        #so now the first element of the array corresponds to the read name
            my $seq=$linearray[9];              #get the read sequence
            my $qual=$linearray[10];            #get the base qualities
            #Make a new fastq file containing the reads
            print OUT '@',$read_name,"\n",$seq,"\n+\n",$qual,"\n";
        }
    }
    close(OUT);
    close(FILE);

Some useful and quick commands
What do these do? Try it with TestGene.READS.fq!

    cat test.fastq | perl -e '$i=0;while(<>){if(/^\@/&&$i==0){s/^\@/\>/;print;} elsif($i==1){print;$i=-3}$i++;}' > test.fasta

    cat test.fastq | perl -e '$i=0;while(<>){if(/^\+/&&$i==2){print;}elsif ($i==3){print;$i=-1}$i++;}' > test.qual
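
For readers who want to check their answer to the first command, here is a longer sketch of the same idea written as an ordinary script. It assumes every FASTQ record is exactly four lines and uses the line index modulo 4 instead of the one-liner's resetting counter. The two reads are made up and held in memory, so nothing is read from disk.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical two-read FASTQ, stored in memory so the sketch is self-contained.
my @fastq = (
    '@read1', 'ATCG', '+', 'IIII',
    '@read2', 'GGCA', '+', 'HHHH',
);

my @fasta;
for my $i (0 .. $#fastq) {
    my $line     = $fastq[$i];
    my $position = $i % 4;       # 0 = name, 1 = sequence, 2 = "+", 3 = qualities

    if ($position == 0) {
        $line =~ s/^\@/>/;       # "@read1" becomes ">read1"
        push @fasta, $line;
    }
    elsif ($position == 1) {
        push @fasta, $line;      # keep the sequence line unchanged
    }
    # the "+" line and the quality line are dropped
}

print "$_\n" for @fasta;
```

The second command follows the same pattern but keeps the "+" and quality lines instead of the name and sequence lines.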

Perl resources
O'Reilly Perl books
Perl Monks website: http://www.perlmonks.org/

Shell scripting
The shell is the interface between the user and the Linux/Unix system
We use BASH
Use PBS scripts to submit computationally intensive jobs to BlueHive

BlueHive: a typical PBS script
To run bowtie2:

    #!/bin/bash
    #PBS -q standard
    #PBS -l nodes=1:ppn=4
    #PBS -l walltime=4:00:00
    #PBS -l pvmem=4000mb
    #PBS -j oe
    #PBS -N bowtie2.align
    #PBS -o bowtie2.align.srr189053.log
    cd $PBS_O_WORKDIR
    source /usr/local/modules/init/bash
    module load bowtie-2.0.6
    mkdir bowtie_srr189053
    bowtie2 --phred64 --sensitive -p 4 -x RanGAP_generegion -q -1 SRR189053_1_val_1.fq -2 SRR189053_2_val_2.fq -U SRR189053_unpaired.fq -S bowtie_srr189053/srr189053.sam

BlueHive: a typical PBS script (annotated)
To run bowtie2:

    # Use the bash shell
    #!/bin/bash
    # Request 4 processors on a single standard node
    #PBS -q standard
    #PBS -l nodes=1:ppn=4
    # Request 4GB of RAM and 4 hours of wall time
    #PBS -l walltime=4:00:00
    #PBS -l pvmem=4000mb
    # qstat will show your job as bowtie2.align and create this log file when completed, with run details
    #PBS -j oe
    #PBS -N bowtie2.align
    #PBS -o bowtie2.align.srr189053.log
    # Change to pwd (the directory the job was submitted from)
    cd $PBS_O_WORKDIR
    # Load bowtie2 and dependencies
    source /usr/local/modules/init/bash
    module load bowtie-2.0.6
    # Make a directory to store your output
    mkdir bowtie_srr189053
    # Your bowtie command
    bowtie2 --phred64 --sensitive -p 4 -x RanGAP_generegion -q -1 SRR189053_1_val_1.fq -2 SRR189053_2_val_2.fq -U SRR189053_unpaired.fq -S bowtie_srr189053/srr189053.sam

The annotation lines are shown here for explanation only; in a real script, #!/bin/bash must be the first line.

BlueHive: the queue
The more resources you request, the longer you will wait in the queue
To submit a job, type:     qsub jobname.pbs
To check on jobs, type:    qstat -u user_name
To kill a job, type:       qdel job_id

BlueHive dos and don'ts
DO:
    Use PBS scripts and qsub to run all jobs
    Store all of your output in /scratch/username
    Remember that /scratch is not backed up, so move files that you need
DON'T:
    Run a script on the command line (this uses the head node) unless in interactive mode
    Store intermediate output in ~/username (limited space)

System commands in Perl
    system("command");
e.g.
    system("rm file");
    system("mv file dir");
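
It is usually worth checking whether the command worked. system returns the command's wait status: 0 means success, and the exit code sits in the upper byte. The sketch below is an illustration rather than part of the slides. It uses the list form of system and runs perl itself (via $^X, the path of the running interpreter) as a stand-in command, so that it works on any machine.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# $^X is the path of the perl that is running this script; it is used here
# only as a command that is guaranteed to exist (an assumption of this sketch).
my $ok_status   = system($^X, "-e", "exit 0");    # a command that succeeds
my $fail_status = system($^X, "-e", "exit 3");    # a command that fails with code 3

if ($ok_status == 0) {
    print "first command succeeded\n";
}

# The exit code is stored in the upper byte of the returned status.
my $exit_code = $fail_status >> 8;
if ($fail_status != 0) {
    print "second command failed with exit code $exit_code\n";
}
```

The same check works with the single-string form on the slide, e.g. testing the value returned by system("mv file dir") before carrying on.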

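File scripting in Perl
Store the contents of a file in a variable (a "here document"):

    my $filevar = <<ENDFILE;
    File contents
    ENDFILE

In a real script the closing ENDFILE must start at the beginning of its line.
Example: /scratch/bio472_2014/example_scripts/example_scripter.pl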
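
One common use of this construct is to generate one job script per sample. The following is a minimal sketch of that idea, not the contents of example_scripter.pl (which are not shown in these slides). The sample names, the job name, and the echo command are hypothetical placeholders, and the scripts are kept in a hash rather than written to disk.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample names, invented for this illustration.
my @samples = ("sampleA", "sampleB");
my %script_for;

for my $sample (@samples) {
    # Variables such as $sample are filled in inside the here document;
    # a backslash keeps \$PBS_O_WORKDIR literal so the shell sees it later.
    # The body and the closing ENDFILE start at the beginning of the line.
    my $filevar = <<ENDFILE;
#!/bin/bash
#PBS -q standard
#PBS -l nodes=1:ppn=1
#PBS -l walltime=1:00:00
#PBS -N $sample.job
cd \$PBS_O_WORKDIR
echo "Processing $sample"
ENDFILE
    $script_for{$sample} = $filevar;
}

# Show one of the generated scripts.
print $script_for{"sampleA"};
```

To turn each string into a file, open an output filehandle as in Example 1 and print the variable to it; the resulting file could then be submitted with qsub.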

sed
Compact, but powerful!
    sed 's/string1/string2/g'    Replace string1 with string2
    sed 's/[ \t]*$//'            Eliminate whitespace at end of line
    sed -n '10p'                 Print 10th line

Awk
Compact, but powerful!
Print every other line in file:
    awk '!(NR % 2)' testmatrix.txt
Print average of 2nd column:
    cat testmatrix.txt | awk 'BEGIN {max=0} {sum+=$2} END {print "Average qual: "sum/NR}'
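
For comparison with the Perl material earlier, the same two tasks can be sketched in Perl using split and a running sum. The four tab-delimited rows below are invented stand-ins for testmatrix.txt, whose real contents are not shown in these slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical tab-delimited rows standing in for testmatrix.txt.
my @rows = ("r1\t30", "r2\t20", "r3\t40", "r4\t10");

my @even_lines;
my $sum = 0;

for my $i (0 .. $#rows) {
    my $line_number = $i + 1;                 # plays the role of awk's NR
    push @even_lines, $rows[$i] unless $line_number % 2;    # keep lines 2, 4, ...

    my @fields = split(/\t/, $rows[$i]);      # $fields[1] is awk's $2
    $sum += $fields[1];
}

my $average = $sum / scalar(@rows);

print "$_\n" for @even_lines;
print "Average qual: $average\n";
```

The awk versions are shorter; the Perl version is easier to extend once more than one calculation per line is needed.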