Maruyama et al. SUPPLEMENTARY SCRIPTS. Script S1: PeakMarker.plx Script S2: SiteWriter_CFD.plx
To use: cut all text between (but not including) the divider tracts and paste it into a new file using the code/text editor of your choice. Save As using the script name. Create /in and /out directories and edit the paths in the SET THE VARIABLES BELOW AS REQUIRED section. Set other variables as required.

Script S1:

#!/usr/bin/perl
# Written: Nick Kent, Aug 2010
# Last updated: Nick Kent, 8th Apr 2012
#
# USAGE:- perl PeakMarker.plx
#
# This script takes an .sgr file as an input, and calls peak centre/summit
# bins above a single, but scalable, noise threshold. It is, therefore, a
# very simple peak calling program. It outputs an .sgr listing these bin
# positions with a y-axis value proportional to the scaled summit bin read
# frequency. The scaling value can be altered to reflect differences in
# read depth between two experiments.

use strict;
use warnings;
use Math::Round;

# SET THE VARIABLES BELOW AS REQUIRED
# $indir_path   - The directory containing the .sgr files to be processed
# $outdir_path  - The directory to store the .sgr peak output files
# $thresh       - The aligned read number noise threshold value
# $scale_factor - A proportion based on differences in read depth

my $indir_path ="/sgr_in";
my $outdir_path ="/peaks_out";
my $thresh = 10;
my $scale_factor = 1.00;

# MAIN PROGRAM

# define some variables
my (@files, $infile, $outfile, @line);

# store input file names in an array
opendir(DIR, $indir_path) || die "Unable to access file at: $indir_path: $!";
@files = readdir(DIR);

# process each input file within the indir_path in turn
foreach $infile (@files){

  # ignore hidden files and only get those ending .sgr
  if (($infile !~ /^\.+/) && ($infile =~ /.*\.sgr/)){

    # define outfile name from infile name
    $outfile = substr($infile,0,-4)."_peak_t".$thresh;
    $outfile .= '.sgr';

    # print out some useful info
    print("\nprocessing '".$infile."'\n");

    open(IN, "$indir_path/$infile") || die "Unable to open $infile: $!";

    # define three new arrays to store required values from infile
    my (@chr, @bins, @freq);

    # loop through infile to get values
    while(<IN>){

      # split line by delimiter and store elements in an array
      @line = split('\t',$_);

      # store the columns we want in the three new arrays
      push(@chr,$line[0]);
      push(@bins,$line[1]);
      push(@freq,$line[2]);
    }

    # close in file handle
    close(IN);

    # store size of array
    my $size = @freq;

    # try and open output file
    open(OUT,"> $outdir_path/$outfile") || die "Unable to open $outfile: $!";

    # need a variable to store line count
    my $count = 0;

    # this calls the peaks - giving an x-axis bin value ONLY for the peak
    # centre and a y-axis value as the peak height scaled to some value
    # proportionate to relative read depth for a relevant pair-wise
    # comparison. The logic here is the most simple definition of a "peak".
    # You can fiddle here to make the rules stricter.
    # (at the first bin, $freq[$count-1] is the last array element; at the
    # last bin, $freq[$count+1] is undefined)
    while ($count < $size){
      if (($freq[$count]>=$freq[$count-1]) &&
          ($freq[$count]>=$freq[$count+1]) &&
          ($freq[$count]*$scale_factor>=$thresh)){
        print(OUT $chr[$count]."\t".
                  $bins[$count]."\t".
                  round($freq[$count]*$scale_factor)."\n");
        $count++;
      }
      else{
        $count++;
      }
    }

    # close out file handle
    close(OUT);
  }
}

Script S2:
#!/usr/bin/perl
# Written: Nick Kent, 12th Sept 2010
# Last updated: Nick Kent, 19th Apr 2012
#
# USAGE:- perl SiteWriter_CFD.plx
#
# FUNCTION:
# This script takes .txt files containing a list of sites/genomic features
# (these could be TSSs or TF sites or whatever you want) and compares it
# with whole-genome, Partn.sgr files. It then outputs CUMULATIVE FREQUENCY
# DISTRIBUTION values over a user-specified bin range centered on, and
# surrounding the sites. The output file can be used to plot average
# chromatin particle environments for different sorts of TSS for example.
# Sites close to chromosome ends, which would not yield the full range of
# data, are ignored, but reported at the command line.
#
# INPUT AND OUTPUT (all tab-delimited):
# The input .txt files should have four columns:
#   chrn; Site ID; site dyad pos; strand.
# The input .sgr files should have three columns:
#   chrn; bin pos; paired-read dyad freq.
# The output .txt file has an input file header and column headers and
# returns 5 columns:
#   Bin (relative to Site); F strand cumulative freq; R strand cumulative
#   freq; summed F+R cumulative freq; normalised cumulative freq.
#
# The idea is to plot the first and last columns as a line graph to produce
# a TREND GRAPH for the data. Each bin's F+R frequencies are normalised to
# the average F+R frequency for the entire bin window.
#
# Note: Use multiple .sgrs and then cat the CFD .txt files for processing
# in R or Excel - particularly useful for plotting surface landscape graphs.
#
# Note: The script handles F and R strand data separately. If you give it
# all F (or all R) strand sites it will work just fine, however, it will
# also throw a load of uninitialised variable warnings at the command line.
# If you find this upsetting, stick a # in front of use warnings (below)
#
# For development see: Kent et al., (2011) Chromatin particle spectrum
# analysis: a method for comparative chromatin structure analysis using
# paired-end mode next-generation DNA sequencing. NAR 39: e26.

use strict;
use warnings;
use Cwd;
use List::Util;

# SET THE VARIABLES BELOW AS REQUIRED
# $sgr_indir_path    - The directory containing the full genome Partn.sgr files
# $siteid_indir_path - The directory containing the site list .txt file
# $outdir_path       - The directory to store the output files
# $bin_window        - number of bins surrounding the site of interest. E.g. if
#                      you set this to 40 then you will get 40 bins either side
#                      of your site - 400bp if you were using 10bp binned data.
# $bin_size          - binning interval of .sgr file in base pairs.
# $output_scale      - controls how many bins are included in the output file.
#                      If set to 1 you will get every bin (use this). Set to 3
#                      to output only every third bin in the series. You can use
#                      this feature to scale output files derived from input
#                      .sgr data with different bin intervals.

my $sgr_indir_path ="/Sgr_in";
my $siteid_indir_path ="/Site_in";
my $outdir_path ="/CFD_out";
my $bin_window = 40;
my $bin_size = 10;
my $output_scale = 1;

# MAIN PROGRAM

# define some variables
my $cwd = getcwd;
my $infile_sgr;
my $infile_siteid;
my $cfd_outfile;
my $sgr_size;
my $F_siteID_size;
my $R_siteID_size;
my %bin_map;
my $chr_count;
my $descriptor;

# Get site list and write to an array - from .txt format with four columns:
# chrn; siteid; site position; F/R

# store input file name in an array
opendir(DIR,$siteid_indir_path) || die "Unable to access file at: $siteid_indir_path: $!";
my @files_siteid = readdir(DIR);

# process the input file within siteid_indir_path
foreach $infile_siteid (@files_siteid){

  # ignore hidden files and only get those ending .txt
  if (($infile_siteid !~ /^\.+/) && ($infile_siteid =~ /.*\.txt/)){

    $descriptor = substr($infile_siteid,0, -4);
    print "Found, and processing, $infile_siteid \n";
    open(IN, "$siteid_indir_path/$infile_siteid") || die "Unable to open $infile_siteid: $!";

    # define strand-specific arrays to store site chromosome no., and position
    my (@F_site_chr, @F_site_pos, @R_site_chr, @R_site_pos);

    # loop through infile to get values
    while(<IN>){
      chomp;

      # split line by delimiter and store elements in an array
      my @line_siteid = split('\t',$_);

      # store the required chrn, position in two pairs of strand-specific arrays
      if($line_siteid[3] =~ "F"){ # infile if 1
        push(@F_site_chr,$line_siteid[0]);
        push(@F_site_pos,$line_siteid[2]);
      }
      elsif($line_siteid[3] =~ "R"){
        push(@R_site_chr,$line_siteid[0]);
        push(@R_site_pos,$line_siteid[2]);
      }
      else{
        print "Failed to match strand at $line_siteid[0], $line_siteid[1], $line_siteid[2]\n";
      } # infile if 1 closer
    }

    # close in file handle
    close(IN);
    closedir(DIR);

    # store sizes of the arrays
    $F_siteID_size = @F_site_chr;
    $R_siteID_size = @R_site_chr;
    print "Contains: $F_siteID_size forward strand site IDs; $R_siteID_size reverse strand site IDs\n";

    # Read in the .sgr file values to three enormous arrays
    opendir(DIR,$sgr_indir_path) || die "Unable to access file at: $sgr_indir_path: $!";
    my @files_sgr = readdir(DIR);

    # process the input file within sgr_indir_path
    foreach $infile_sgr (@files_sgr){

      # define some arrays that will be reset during each iteration
      my (@F_cfd_freqsum, @R_cfd_freqsum);

      # ignore hidden files and only get those ending .sgr
      if (($infile_sgr !~ /^\.+/) && ($infile_sgr =~ /.*\.sgr/)){

        print "Found, and processing, $infile_sgr \n";
        open(IN, "$sgr_indir_path/$infile_sgr") || die "Unable to open $infile_sgr: $!";

        # define three new arrays to store the .sgr values from infile
        my (@sgr_chr, @sgr_bin, @sgr_freq);

        # loop through infile to get values
        while(<IN>){
          chomp;

          # split line by delimiter and store elements in an array
          my @line_sgr = split('\t',$_);

          # store the columns we want in the three new arrays
          push(@sgr_chr,$line_sgr[0]);
          push(@sgr_bin,$line_sgr[1]);
          push(@sgr_freq,$line_sgr[2]);
        }

        # close in file handle
        close(IN);

        # store size of bin array
        $sgr_size = @sgr_freq;
        print "Contains a whopping: $sgr_size bin values\n";

        # BUILD THE BIN MAP
        my $map_count = 0; # a counter variable

        # Set bottom
        $bin_map{$sgr_chr[$map_count]} = 0;
        $map_count ++;

        # scan through array and mark the bins where each new chromosome starts
        until ($map_count == $sgr_size){
          if ($sgr_chr[$map_count] ne $sgr_chr[$map_count-1]){
            $bin_map{$sgr_chr[$map_count]} = $map_count;
            $map_count ++;
          }
          else{
            $map_count ++;
          }
        }

        # output the number of chromosome types found as the number of hash keys.
        $chr_count = keys %bin_map;
        print "The sgr file contains values for: $chr_count chromosomes\n";

        # FORWARD STRAND .sgr calculations:

        # some counter variables
        my $site_count = 0; # Counter for each site ID
        my $bin_count = 0;  # Counter for .sgr bin numbers
        my $cfd_count = 0;  # Counter for the cfd arrays
        my $top_limit = 0;  # A top limit for $bin_window
        my @F_out_chr;      # F .sgr output array for chr
        my @F_out_bin;      # F .sgr output array for bin pos
        my @F_out_freq;     # F .sgr output array for read freq
        my $F_out_size = 0; # Size of F .sgr output arrays
        my $i=0;            # An iterator variable

        until ($site_count == $F_siteID_size){ # until 1

          # Use %bin_map to jump to correct region of sgr arrays
          $bin_count = (int($F_site_pos[$site_count]/$bin_size) + $bin_map{$F_site_chr[$site_count]}) - 3;
          # this looks mad, but it allows me to recycle all the code from the
          # last version, and takes up any rounding slack which would come
          # from different $bin_size values

          # find an .sgr bin which contains the current site
          until ($F_site_chr[$site_count] eq $sgr_chr[$bin_count] &&
                 $F_site_pos[$site_count] >= $sgr_bin[$bin_count] &&
                 $F_site_pos[$site_count] < $sgr_bin[$bin_count +1]){ # until 2
            $bin_count ++;
          } # until 2 closer

          # now that we've found the match, let's write values to the output files
          # set the bin_counter BACK $bin_window places and set the $top_limit
          $bin_count -= $bin_window;
          $top_limit = $bin_count + ($bin_window*2);

          # Better test to see if match is close to ends of a chromosome. If
          # so, the reported bins and read freqs will be chimeric - we don't
          # want this so we will ditch such matches
          if($F_site_chr[$site_count] ne $sgr_chr[$bin_count] ||
             $F_site_chr[$site_count] ne $sgr_chr[$top_limit]){ # if 1
            print "Can't output forward strand values for $F_site_chr[$site_count] site: $F_site_pos[$site_count]\n";
          } # if 1 closer
          else { # else 1

            # Push the chrn, bin and freq values to the F .sgr arrays and add
            # values to F cfd freq array
            until ($bin_count == $top_limit+1){ # until 3
              push (@F_out_chr,$sgr_chr[$bin_count]);
              push (@F_out_bin,$sgr_bin[$bin_count]);
              push (@F_out_freq,$sgr_freq[$bin_count]);
              $F_cfd_freqsum[$cfd_count] += $sgr_freq[$bin_count];
              $bin_count ++;
              $cfd_count ++;
            } # until 3 closer
          } # else 1 closer

          $cfd_count = 0;
          $bin_count = 0;
          $site_count ++;
        } # until 1 closer

        $F_out_size = @F_out_chr;

        # REVERSE STRAND .sgr calculations:

        # reset the counter variables and define some more arrays
        $site_count = 0;    # Counter for each site ID
        $cfd_count = 0;     # Counter for the cfd arrays
        $bin_count = 0;
        my @R_out_chr;      # R .sgr output array for chr
        my @R_out_bin;      # R .sgr output array for bin pos
        my @R_out_freq;     # R .sgr output array for read freq
        my $R_out_size = 0; # Size of R .sgr output arrays

        until ($site_count == $R_siteID_size){ # until 1

          # Use %bin_map to jump to correct region of sgr arrays
          $bin_count = (int($R_site_pos[$site_count]/$bin_size) + $bin_map{$R_site_chr[$site_count]}) - 3;

          # find an .sgr bin which contains the current site
          until ($R_site_chr[$site_count] eq $sgr_chr[$bin_count] &&
                 $R_site_pos[$site_count] >= $sgr_bin[$bin_count] &&
                 $R_site_pos[$site_count] < $sgr_bin[$bin_count +1]){ # until 2
            $bin_count ++;
          } # until 2 closer

          # now that we've found the match, let's write values to the output files
          # set the bin_counter BACK $bin_window places and set the $top_limit
          $bin_count -= $bin_window;
          $top_limit = $bin_count + ($bin_window*2);

          # Better test to see if match is close to ends of a chromosome. If
          # so, the reported bins and read freqs will be chimeric - we don't
          # want this so we will ditch such matches
          if($R_site_chr[$site_count] ne $sgr_chr[$bin_count] ||
             $R_site_chr[$site_count] ne $sgr_chr[$top_limit]){ # if 1
            print "Can't output reverse strand values for $R_site_chr[$site_count] site: $R_site_pos[$site_count]\n";
          } # if 1 closer
          else { # else 1

            # Push the chrn, bin and freq values to the R .sgr arrays and add
            # values to R cfd freq array
            until ($bin_count == $top_limit+1){ # until 3
              push (@R_out_chr,$sgr_chr[$bin_count]);
              push (@R_out_bin,$sgr_bin[$bin_count]);
              push (@R_out_freq,$sgr_freq[$bin_count]);
              $R_cfd_freqsum[$cfd_count] += $sgr_freq[$bin_count];
              $bin_count ++;
              $cfd_count ++;
            } # until 3 closer
          } # else 1 closer

          $cfd_count = 0;
          $bin_count = 0;
          $site_count ++;
        } # until 1 closer

        $R_out_size = @R_out_chr;

        # The output file

        # define outfile name and set correct endings
        $cfd_outfile = substr($infile_sgr,0,-4)."_".$descriptor."_cfd";
        $cfd_outfile .= '.txt';

        # try and open the .cfd output file
        open(OUT,"> $outdir_path/$cfd_outfile") || die "Unable to open $cfd_outfile: $!";
        print "Have just created $cfd_outfile\n";

        # Set counter variables and define new arrays
        $bin_count = 0;
        $cfd_count = 0;
        my $cfd_sum = 0;                 # a sum of sums for normalizing the data
        my $norm_factor = 0;             # calced from $cfd_sum
        my $R_cfd_count = $bin_window*2;
        my @FandR_cfd;                   # array to hold summed F and R strand CFD values
        my @R_cfd;                       # array to hold ordered R strand CFD values

        $bin_count -= $bin_window;

        until ($bin_count == $bin_window+1){ # until 4

          # re-order reverse strand cfd freqsum values
          push (@R_cfd, $R_cfd_freqsum[$R_cfd_count]);

          # calculate summed value for both F and R cfd freqsums
          push (@FandR_cfd, $F_cfd_freqsum[$cfd_count] + $R_cfd_freqsum[$R_cfd_count]);

          $bin_count ++;
          $cfd_count ++;
          $R_cfd_count --;
        } # until 4 closer

        # Need to find average read values over bin_window to normalize data
        foreach (@FandR_cfd){
          $cfd_sum += $_;
        }
        $norm_factor = $cfd_sum/(($bin_window*2)+1);

        # reset counters once more
        $bin_count = (0-$bin_window);
        $cfd_count = 0;

        # print a header for the CFD .txt file so you can read it in Excel
        print(OUT "Values from $cfd_outfile\n");
        print(OUT "CFD sum: $cfd_sum\n");
        print(OUT "Normalization Factor: $norm_factor\n");

        # print column headers
        print(OUT "Bin"."\t"."F Freq"."\t"."R Freq"."\t"."Comb Freq"."\t"."Norm Freq"."\n");

        # print data values
        # ('>' rather than '==' so the loop still ends when $output_scale
        # does not step exactly onto $bin_window+1)
        until ($bin_count > $bin_window){ # until 5
          print(OUT $bin_count*$bin_size."\t".
                    $F_cfd_freqsum[$cfd_count]."\t".
                    $R_cfd[$cfd_count]."\t".
                    $FandR_cfd[$cfd_count]."\t".
                    $FandR_cfd[$cfd_count]/$norm_factor."\n");
          $bin_count += $output_scale;
          $cfd_count += $output_scale;
        } # until 5 closer

        # close .cfd out file handle
        close(OUT);
      }
    }
  }
}
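
The windowing and normalisation described in the Script S2 header can be illustrated on toy data. What follows is a minimal sketch under stated assumptions, not the script itself: the cfd_profile subroutine, its arguments and the toy values are hypothetical. It assumes a single chromosome, sites already given as bin indices, and it drops sites whose window runs off either end of the data (Script S2 instead compares chromosome names at the window edges). Reverse-strand windows are flipped before summing, which gives the same result as summing first and re-ordering afterwards.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Script S2): sum read frequencies in a
# window of +/- $bin_window bins around each site, flip reverse-strand
# windows so that every site reads in the same orientation, then divide
# each summed bin by the mean of all summed bins in the window.
#   $freq  - array ref of per-bin read frequencies for ONE chromosome
#   $sites - array ref of [bin_index, strand] pairs, strand 'F' or 'R'
sub cfd_profile {
    my ($freq, $sites, $bin_window) = @_;
    my $width = 2 * $bin_window + 1;
    my @sum = (0) x $width;

    foreach my $site (@$sites) {
        my ($centre, $strand) = @$site;
        my $first = $centre - $bin_window;
        my $last  = $centre + $bin_window;

        # skip sites whose window would run off either end of the data
        next if $first < 0 || $last > $#$freq;

        my @window = @{$freq}[$first .. $last];
        @window = reverse @window if $strand eq 'R';
        $sum[$_] += $window[$_] for 0 .. $width - 1;
    }

    # normalisation factor: average summed frequency over the whole window
    my $total = 0;
    $total += $_ for @sum;
    my $mean = $total / $width;
    my @norm = map { $mean ? $_ / $mean : 0 } @sum;

    return (\@sum, \@norm);
}

# toy data: 9 bins, a forward site at bin 4, a reverse site at bin 3,
# and a forward site at bin 0 that is too close to the end to be used
my @freq  = (0, 1, 2, 8, 3, 1, 0, 0, 0);
my @sites = ([4, 'F'], [3, 'R'], [0, 'F']);
my ($sum, $norm) = cfd_profile(\@freq, \@sites, 2);

print join("\t", @$sum), "\n";
print join("\t", map { sprintf "%.2f", $_ } @$norm), "\n";
```

With these toy values the summed window is 3, 11, 11, 3, 1 and the normalisation factor is 29/5 = 5.8, so a normalised value above 1 marks a bin with more reads than the window average.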
More informationA short Introduction to UCSC Genome Browser
A short Introduction to UCSC Genome Browser Elodie Girard, Nicolas Servant Institut Curie/INSERM U900 Bioinformatics, Biostatistics, Epidemiology and computational Systems Biology of Cancer 1 Why using
More informationProgramming Perls* Objective: To introduce students to the perl language.
Programming Perls* Objective: To introduce students to the perl language. Perl is a language for getting your job done. Making Easy Things Easy & Hard Things Possible Perl is a language for easily manipulating
More informationEasy visualization of the read coverage using the CoverageView package
Easy visualization of the read coverage using the CoverageView package Ernesto Lowy European Bioinformatics Institute EMBL June 13, 2018 > options(width=40) > library(coverageview) 1 Introduction This
More informationTn-seq Explorer 1.2. User guide
Tn-seq Explorer 1.2 User guide 1. The purpose of Tn-seq Explorer Tn-seq Explorer allows users to explore and analyze Tn-seq data for prokaryotic (bacterial or archaeal) genomes. It implements a methodology
More informationAnalysis of ChIP-seq Data with mosaics Package
Analysis of ChIP-seq Data with mosaics Package Dongjun Chung 1, Pei Fen Kuan 2 and Sündüz Keleş 1,3 1 Department of Statistics, University of Wisconsin Madison, WI 53706. 2 Department of Biostatistics,
More informationRunning SNAP. The SNAP Team October 2012
Running SNAP The SNAP Team October 2012 1 Introduction SNAP is a tool that is intended to serve as the read aligner in a gene sequencing pipeline. Its theory of operation is described in Faster and More
More informationThe Perl Debugger. Avoiding Bugs with Warnings and Strict. Daniel Allen. Abstract
1 of 8 6/18/2006 7:36 PM The Perl Debugger Daniel Allen Abstract Sticking in extra print statements is one way to debug your Perl code, but a full-featured debugger can give you more information. Debugging
More informationVersions. Overview. OU Campus Versions Page 1 of 6
Versions Overview A unique version of a page is saved through the automatic version control system every time a page is published. A backup version of a page can also be created at will with the use of
More information2.2 - Layouts. Bforartists Reference Manual - Copyright - This page is Public Domain
2.2 - Layouts Introduction...2 Switching Layouts...2 Standard Layouts...3 3D View full...3 Animation...3 Compositing...3 Default...4 Motion Tracking...4 Scripting...4 UV Editing...5 Video Editing...5 Game
More informationMatlab OTKB GUI Manual:
Matlab OTKB GUI Manual: Preface: This is the manual for the OTKB GUI. This GUI can be used to control stage position as well as perform sensitivity and stiffness calibrations on the trap. This manual will
More informationSpotter Documentation Version 0.5, Released 4/12/2010
Spotter Documentation Version 0.5, Released 4/12/2010 Purpose Spotter is a program for delineating an association signal from a genome wide association study using features such as recombination rates,
More informationLecture 5. Essential skills for bioinformatics: Unix/Linux
Lecture 5 Essential skills for bioinformatics: Unix/Linux UNIX DATA TOOLS Text processing with awk We have illustrated two ways awk can come in handy: Filtering data using rules that can combine regular
More informationCOMS 3101 Programming Languages: Perl. Lecture 1
COMS 3101 Programming Languages: Perl Lecture 1 Fall 2013 Instructor: Ilia Vovsha http://www.cs.columbia.edu/~vovsha/coms3101/perl What is Perl? Perl is a high level language initially developed as a scripting
More informationPerl for Biologists. Session 6 April 16, Files, directories and I/O operations. Jaroslaw Pillardy
Perl for Biologists Session 6 April 16, 2014 Files, directories and I/O operations Jaroslaw Pillardy Perl for Biologists 1.1 1 Reminder: What is a Hash? Array Hash Index Value Key Value 0 apple red fruit
More informationWeek January 27 January. From last week Arrays. Reading for this week Hashes. Files. 24 H: Hour 4 PP Ch 6:29-34, Ch7:51-52
Week 3 23 January 27 January From last week Arrays 24 H: Hour 4 PP Ch 6:29-34, Ch7:51-52 Reading for this week Hashes 24 H: Hour 7 PP Ch 6:34-37 Files 24 H: Hour 5 PP Ch 19: 163-169 Biol 59500-033 - Practical
More informationLinux Text Utilities 101 for S/390 Wizards SHARE Session 9220/5522
Linux Text Utilities 101 for S/390 Wizards SHARE Session 9220/5522 Scott D. Courtney Senior Engineer, Sine Nomine Associates March 7, 2002 http://www.sinenomine.net/ Table of Contents Concepts of the Linux
More informationImporting sequence assemblies from BAM and SAM files
BioNumerics Tutorial: Importing sequence assemblies from BAM and SAM files 1 Aim With the BioNumerics BAM import routine, a sequence assembly in BAM or SAM format can be imported in BioNumerics. A BAM
More informationOIW-EX 1000 Oil in Water Monitors
OIW-EX 1000 Oil in Water Monitors Spectrometer Handbook Document code: OIW-HBO-0005 Version: EX-002 www.advancedsensors.co.uk Tel: +44(0)28 9332 8922. FAX +44(0)28 9332 8669 Page 1 of 33 Document History
More information1. Introduction. 2. Scalar Data
1. Introduction What Does Perl Stand For? Why Did Larry Create Perl? Why Didn t Larry Just Use Some Other Language? Is Perl Easy or Hard? How Did Perl Get to Be So Popular? What s Happening with Perl Now?
More informationThe svn-multi.pl Script
The svn-multi.pl Script Martin Scharrer martin@scharrer-online.de http://latex.scharrer-online.de/svn-multi CTAN: http://tug.ctan.org/pkg/svn-multi Version 0.1a July 26, 2010 Note: This document is work
More informationIndian Institute of Technology Kharagpur. PERL Part III. Prof. Indranil Sen Gupta Dept. of Computer Science & Engg. I.I.T.
Indian Institute of Technology Kharagpur PERL Part III Prof. Indranil Sen Gupta Dept. of Computer Science & Engg. I.I.T. Kharagpur, INDIA Lecture 23: PERL Part III On completion, the student will be able
More informationUsing the DATAMINE Program
6 Using the DATAMINE Program 304 Using the DATAMINE Program This chapter serves as a user s manual for the DATAMINE program, which demonstrates the algorithms presented in this book. Each menu selection
More informationCTL mapping in R. Danny Arends, Pjotr Prins, and Ritsert C. Jansen. University of Groningen Groningen Bioinformatics Centre & GCC Revision # 1
CTL mapping in R Danny Arends, Pjotr Prins, and Ritsert C. Jansen University of Groningen Groningen Bioinformatics Centre & GCC Revision # 1 First written: Oct 2011 Last modified: Jan 2018 Abstract: Tutorial
More informationPerl for human linkage analysis
Perl for human linkage analysis Karl W. Broman Department of Biostatistics, Johns Hopkins University http://www.biostat.jhsph.edu/ kbroman Data A set of pedigrees Family, individual, mom, dad, sex Phenotypes
More informationData Walkthrough: Background
Data Walkthrough: Background File Types FASTA Files FASTA files are text-based representations of genetic information. They can contain nucleotide or amino acid sequences. For this activity, students will
More informationChen lab workshop. Christian Frech
GBrowse Generic genome browser Chen lab workshop Christian Frech January 18, 2010 1 A generic genome browser why do we need it? Genome databases have similar requirements View DNA sequence and its associated
More informationAC109/AT109 UNIX & SHELL PROGRAMMING DEC 2014
Q.2 a. Explain the principal components: Kernel and Shell, of the UNIX operating system. Refer Page No. 22 from Textbook b. Explain absolute and relative pathnames with the help of examples. Refer Page
More informationIT441. Subroutines. (a.k.a., Functions, Methods, etc.) DRAFT. Network Services Administration
IT441 Network Services Administration Subroutines DRAFT (a.k.a., Functions, Methods, etc.) Organizing Code We have recently discussed the topic of organizing data (i.e., arrays and hashes) in order to
More informationMapping Reads to Reference Genome
Mapping Reads to Reference Genome DNA carries genetic information DNA is a double helix of two complementary strands formed by four nucleotides (bases): Adenine, Cytosine, Guanine and Thymine 2 of 31 Gene
More information5/8/2012. Exploring Utilities Chapter 5
Exploring Utilities Chapter 5 Examining the contents of files. Working with the cut and paste feature. Formatting output with the column utility. Searching for lines containing a target string with grep.
More informationCNV-seq Manual. Xie Chao. May 26, 2011
CNV-seq Manual Xie Chao May 26, 20 Introduction acgh CNV-seq Test genome X Genomic fragments Reference genome Y Test genome X Genomic fragments Reference genome Y 2 Sampling & sequencing Whole genome microarray
More informationGalaxie Report Editor
Varian, Inc. 2700 Mitchell Drive Walnut Creek, CA 94598-1675/USA Galaxie Report Editor User s Guide Varian, Inc. 2008 Printed in U.S.A. 03-914949-00: Rev 6 Galaxie Report Editor i Table of Contents Introduction...
More informationChIP-Seq Tutorial on Galaxy
1 Introduction ChIP-Seq Tutorial on Galaxy 2 December 2010 (modified April 6, 2017) Rory Stark The aim of this practical is to give you some experience handling ChIP-Seq data. We will be working with data
More informationOutline. CS3157: Advanced Programming. Feedback from last class. Last plug
Outline CS3157: Advanced Programming Lecture #2 Jan 23 Shlomo Hershkop shlomo@cs.columbia.edu Feedback Introduction to Perl review and continued Intro to Regular expressions Reading Programming Perl pg
More informationThe Power of Perl. Perl. Perl. Change all gopher to World Wide Web in a single command
The Power of Perl Perl Change all gopher to World Wide Web in a single command perl -e s/gopher/world Wide Web/gi -p -i.bak *.html Perl can be used as a command Or like an interpreter UVic SEng 265 Daniel
More information