Appendix A. Example code output. Chapter 1. Chapter 3

Similar documents
Genome Reconstruction: A Puzzle with a Billion Pieces Phillip E. C. Compeau and Pavel A. Pevzner

by the Genevestigator program ( Darker blue color indicates higher gene expression.

HP22.1 Roth Random Primer Kit A für die RAPD-PCR

Pyramidal and Chiral Groupings of Gold Nanocrystals Assembled Using DNA Scaffolds

6 Anhang. 6.1 Transgene Su(var)3-9-Linien. P{GS.ry + hs(su(var)3-9)egfp} 1 I,II,III,IV 3 2I 3 3 I,II,III 3 4 I,II,III 2 5 I,II,III,IV 3

Supplementary Table 1. Data collection and refinement statistics

TCGR: A Novel DNA/RNA Visualization Technique

Genome Reconstruction: A Puzzle with a Billion Pieces. Phillip Compeau Carnegie Mellon University Computational Biology Department

SUPPLEMENTARY INFORMATION. Systematic evaluation of CRISPR-Cas systems reveals design principles for genome editing in human cells

warm-up exercise Representing Data Digitally goals for today proteins example from nature

Machine Learning Classifiers

Digging into acceptor splice site prediction: an iterative feature selection approach

Supplementary Materials:

Supplementary Data. Image Processing Workflow Diagram A - Preprocessing. B - Hough Transform. C - Angle Histogram (Rose Plot)

2 41L Tag- AA GAA AAA ATA AAA GCA TTA RYA GAA ATT TGT RMW GAR C K65 Tag- A AAT CCA TAC AAT ACT CCA GTA TTT GCY ATA AAG AA

A relation between trinucleotide comma-free codes and trinucleotide circular codes

Crick s Hypothesis Revisited: The Existence of a Universal Coding Frame

LABORATORY STANDARD OPERATING PROCEDURE FOR PULSENET CODE: PNL28 MLVA OF SHIGA TOXIN-PRODUCING ESCHERICHIA COLI

Sequence Assembly. BMI/CS 576 Mark Craven Some sequencing successes

MLiB - Mandatory Project 2. Gene finding using HMMs

Efficient Selection of Unique and Popular Oligos for Large EST Databases. Stefano Lonardi. University of California, Riverside

Degenerate Coding and Sequence Compacting

Supporting Information

Scalable Solutions for DNA Sequence Analysis

Structural analysis and haplotype diversity in swine LEP and MC4R genes

OFFICE OF RESEARCH AND SPONSORED PROGRAMS

de novo assembly Simon Rasmussen 36626: Next Generation Sequencing analysis DTU Bioinformatics Next Generation Sequencing Analysis

CSCI2950-C Lecture 4 DNA Sequencing and Fragment Assembly

de Bruijn graphs for sequencing data

Computational Methods for de novo Assembly of Next-Generation Genome Sequencing Data

DNA Sequencing The Shortest Superstring & Traveling Salesman Problems Sequencing by Hybridization

Graph Algorithms in Bioinformatics

DNA Sequencing. Overview

Sequencing. Computational Biology IST Ana Teresa Freitas 2011/2012. (BACs) Whole-genome shotgun sequencing Celera Genomics

QuasiAlign: Position Sensitive P-Mer Frequency Clustering with Applications to Genomic Classification and Differentiation

Computational Theory MAT542 (Computational Methods in Genomics) - Part 2 & 3 -

10/15/2009 Comp 590/Comp Fall

10/8/13 Comp 555 Fall

Assembly in the Clouds

Purpose of sequence assembly

3. The object system(s)

de novo assembly Rayan Chikhi Pennsylvania State University Workshop On Genomics - Cesky Krumlov - January /73

WSSP-10 Chapter 7 BLASTN: DNA vs DNA searches

This report is based on sampled data. Jun 1 Jul 6 Aug 10 Sep 14 Oct 19 Nov 23 Dec 28 Feb 1 Mar 8 Apr 12 May 17 Ju

Detecting Superbubbles in Assembly Graphs. Taku Onodera (U. Tokyo)! Kunihiko Sadakane (NII)! Tetsuo Shibuya (U. Tokyo)!

How to Run NCBI BLAST on zcluster at GACRC

1. PURPOSE: to describe the standardized laboratory protocol for molecular subtyping of Salmonella enterica serotype Enteritidis.

DNA Fragment Assembly

Algorithms for Bioinformatics

Global Alignment. Algorithms in BioInformatics Mandatory Project 1 Magnus Erik Hvass Pedersen (971055) Daimi, University of Aarhus September 2004

Sequence Assembly Required!

software.sci.utah.edu (Select Visitors)

HPE Security Data Security. HPE SecureData. Product Lifecycle Status. End of Support Dates. Date: April 20, 2017 Version:

Assignment 2. Summary. Some Important bash Instructions. CSci132 Practical UNIX and Programming Assignment 2, Fall Prof.

A Novel Implementation of an Extended 8x8 Playfair Cipher Using Interweaving on DNA-encoded Data

Genome 373: Genome Assembly. Doug Fowler

ICT PROFESSIONAL MICROSOFT OFFICE SCHEDULE MIDRAND

Problem statement. CS267 Assignment 3: Parallelize Graph Algorithms for de Novo Genome Assembly. Spring Example.

debgr: An Efficient and Near-Exact Representation of the Weighted de Bruijn Graph Prashant Pandey Stony Brook University, NY, USA

EECS2301. Example. Testing 3/22/2017. Linux/Unix Part 3. for SCRIPT in /path/to/scripts/dir/* do if [ -f $SCRIPT -a -x $SCRIPT ] then $SCRIPT fi done

Sequencing. Short Read Alignment. Sequencing. Paired-End Sequencing 6/10/2010. Tobias Rausch 7 th June 2010 WGS. ChIP-Seq. Applied Biosystems.

Graph Algorithms in Bioinformatics

GATB programming day

DELAMANID SUSCEPTIBILITY TESTING IN AN AUTOMATED LIQUID CULTURE SYSTEM

Asks for clarification of whether a GOP must communicate to a TOP that a generator is in manual mode (no AVR) during start up or shut down.

Multiple Sequence Alignment. With thanks to Eric Stone and Steffen Heber, North Carolina State University

Read Mapping. de Novo Assembly. Genomics: Lecture #2 WS 2014/2015

In Silico Modelling and Analysis of Ribosome Kinetics and aa-trna Competition


CSE 341 Section Handout #6 Cheat Sheet

Annex A to the DVD-R Disc and DVD-RW Disc Patent License Agreement Essential Sony Patents relevant to DVD-RW Disc

Technical University of Munich - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

Polycom Advantage Service Endpoint Utilization Report

All King County Summary Report

Algorithms and Data Structures

Polycom Advantage Service Endpoint Utilization Report

Parallel de novo Assembly of Complex (Meta) Genomes via HipMer

Computational Architecture of Cloud Environments Michael Schatz. April 1, 2010 NHGRI Cloud Computing Workshop

Omega Engineering Software Archive - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

UAE PUBLIC TRAINING CALENDAR

Genome 373: Intro to Python I. Doug Fowler

Solutions Exercise Set 3 Author: Charmi Panchal

Eulerian Tours and Fleury s Algorithm

Grade 4 Mathematics Pacing Guide

Seattle (NWMLS Areas: 140, 380, 385, 390, 700, 701, 705, 710) Summary

Fluidity Trader Historical Data for Ensign Software Playback

Section 1.2: What is a Function? y = 4x

Seattle (NWMLS Areas: 140, 380, 385, 390, 700, 701, 705, 710) Summary

SCI - software.sci.utah.edu (Select Visitors)

If Case Loop Next Exit

Seattle (NWMLS Areas: 140, 380, 385, 390, 700, 701, 705, 710) Summary

High Performance Computing

Sand Pit Utilization

Graphs and Puzzles. Eulerian and Hamiltonian Tours.

GWDG Software Archive - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

NOSEP: Non-Overlapping Sequence Pattern Mining with Gap Constraints

CMIS 102 Hands-On Lab

IKS Service GmbH - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

COURSE LISTING. Courses Listed. with SAP Hybris Marketing Cloud. 24 January 2018 (23:53 GMT) HY760 - SAP Hybris Marketing Cloud

Similarly, example of memory address of array and hash are: ARRAY(0x1a31d44) and HASH(0x80f6c6c) respectively.

Transcription:

Appendix A Example code output This is a compilation of output from selected examples. Some of these examples requires exernal input from e.g. STDIN, for such examples the interaction with the program is also shown. Chapter 1 Hello World hello.pl (page 9) hello.pl (page 9) Chapter 3 hsort.pl (page 54) http://cbbp.thep.lu.se http://www.biology.lu.se/education/courses/advanced-courses/bioinformatics-and-sequence-analysis-75 http://www.createhealth.lth.se/ http://www.lumi.lu.se http://www.lu.se/ mailto:mattias@thep.lu.se /~mattias/images/category_holiday_16.png /~mattias/images/working-16.png /~mattias/include/style.css publications/ teaching/binp13 teaching/fytn06 hsort.pl (page 54) 147

148 APPENDIX A. EXAMPLE CODE OUTPUT Chapter 4 splitfasta.pl (page 58) This FastA sequence comes from the species RATTUS NORVEGICUS and has accession number Q62671. splitfasta.pl (page 58) splitwhite.pl (page 59) Array numbers contain: 1 2 5 6 8 splitwhite.pl (page 59) map1.pl (page 62) JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC map1.pl (page 62) map2.pl (page 62) 1.73205080756888 3 3.16227766016838 map2.pl (page 62) $it = 11 $it = 5.5 subend.pl (page 66) subend.pl (page 66) The value was: 11 The value was: 5.5 subpar.pl (page 67) subpar.pl (page 67) The sum is: 9 The sum is: -1 The sum is: 5 subsum.pl (page 67) subsum.pl (page 67) The sum is: 10 The sum is: 10 subsums.pl (page 68) subsums.pl (page 68) The sum is: 10 Half the sum is: 5 sumfn.pl (page 68) sumfn.pl (page 68)

149 The sum of the 5 elements is: 10 sumnfn.pl (page 69) sumnfn.pl (page 69) sumnfn2.pl (page 69) The first list had 3 elements and the sum was: 18 The second list had 3 elements and the sum was: 18 sumnfn2.pl (page 69) sumnfn3.pl (page 70) The first list had 5 elements and the sum was: 10 The second list had 3 elements and the sum was: 18 sumnfn3.pl (page 70) 0 1 2 3 4 Sum = 15, Average = 2.5 Sum = 15, Average = 2.5 popchop.pl (page 72) popchop.pl (page 72) sum.pl (page 81) sum.pl (page 81) sum2.pl (page 81) sum2.pl (page 81) Chapter 5 A: GCA GCC GCG GCT C: TGC TGT D: GAC GAT E: GAA GAG F: TTC TTT G: GGA GGC GGG GGT H: CAC CAT I: ATA ATC ATT K: AAA AAG L: CTA CTC CTG CTT TTA TTG M: ATG N: AAC AAT P: CCA CCC CCG CCT Q: CAA CAG R: AGA AGG CGA CGC CGG CGT S: AGC AGT TCA TCC TCG TCT T: ACA ACC ACG ACT V: GTA GTC GTG GTT W: TGG Y: TAC TAT ref3.pl (page 98) ref3.pl (page 98)

150 APPENDIX A. EXAMPLE CODE OUTPUT Chapter 6 2 5 6-7 9 1 4-6 7 10 0 2 3 3 0 1 1 4 2 8 matrix1.pl (page 114) matrix1.pl (page 114) bp ex4.pl (page 138) ======= ID: ref XP_005022418.1 ======= ======= ID: ref XP_006137248.1 ======= +LDAPSQIEV+DVTDTTALITWFKPLAEID +EL+YG KDVPGDRTTIDL+EDE+QYSIG Subj: 804 KLDAPSQIEVRDVTDTTALITWFKPLAEIDDMELSYGPKDVPGDRTTIDLSEDESQYSIG 863 NLKP TEYEV+LISRRGDM+S+P KETF T Subj: 804 NLKPHTEYEVTLISRRGDMTSDPVKETFVT 833 ======= ID: ref XP_005022417.1 ======= ======= ID: ref XP_005022415.1 ======= ======= ID: ref XP_005022416.1 =======

151 ======= ID: gb EOA99266.1 ======= ======= ID: ref XP_005022414.1 ======= ======= ID: gb KFQ40359.1 ======= Subj: 454 KLDAPSQIEAKDVTDTTALITWSKPLAEIEGIELTYGPKDVPGDRTTIDLSEDENQYSIG 513 Subj: 454 NLRPHTEYEVTLISRRGDMESDPMKEVFVT 483 ======= ID: ref XP_005498448.1 ======= => Identities: 83.33% +LDAPSQIE KDVTDTTALITW KPLA+I+GIELTYG KDVPGDRTTIDL+EDENQYSIG Subj: 741 KLDAPSQIEAKDVTDTTALITWSKPLADIEGIELTYGPKDVPGDRTTIDLSEDENQYSIG 800 Subj: 741 NLRPHTEYEVTLISRRGDMESDPMKEVFVT 770 ======= ID: gb KFQ87172.1 ======= Subj: 456 KLDAPSQIEAKDVTDTTALITWSKPLAEIEGIELTYGPKDVPGDRTTIDLSEDENQYSIG 515 Subj: 456 NLRPHTEYEVTLISRRGDMESDPMKEVFVT 485

152 APPENDIX A. EXAMPLE CODE OUTPUT bp ex4.pl (page 138) bp ex5.pl (page 140) Database: All non-redundant GenBank CDStranslations+PDB+SwissProt+PIR+PRF excluding environmental s Query Name: 1TEN:_ ======= ID: ref XP_003264097.2 ======= Subj: 802 RLDAPSQVEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIG 861 Subj: 802 831 ======= ID: ref XP_009186466.1 ======= Subj: 802 RLDAPSQMEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIG 861 Subj: 802 831 ======= ID: ref XP_009186465.1 ======= Subj: 802 RLDAPSQMEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIG 861 Subj: 802 831 ======= ID: ref XP_007966465.1 ======= Subj: 802 RLDAPSQMEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIG 861 Subj: 802 831 ======= ID: ref XP_007966464.1 =======

153 Subj: 802 RLDAPSQMEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIG 861 Subj: 802 831 ======= ID: ref XP_007966462.1 ======= Subj: 802 RLDAPSQMEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIG 861 Subj: 802 831 bp ex5.pl (page 140) bp ex7.pl (page 142) CLUSTAL W (1.81) multiple sequence alignment -----------------------MEDEIAALVVDNGSGMCKAGFAGDDAPRAVFPSIVGR -----------------------MDDEIAALVVDNGSGMCKAGFAGDDAPRAVFPSIVGR -----------------------MEDEVASLVVDNGSGMCKAGFAGDDAPRAVFPSIVGR MDLRATSSNDSNATSGYSDTAAVDWDEGENATGSGSLPDPELSYQIITSLFLGALILCSI -------MEGTNNTTGWT-----HFDSTSNRTSKSFDEEVKLSYQVVTSFLLGALILCSI -------MELDNNSLDYFSSN--FTDIPSNTTVAHWTEATLLGLQISVSVVLAIVTLATM *.. : : PRHQGVMVGMGQK-------DSYVGDEAQS--KRGILTLKYPIEHGIVTNWDDMEKIWHH PRHQGVMVGMGQK-------DSYVGDEAQS--KRGILTLKYPIEHGIVTNWDDMEKIWHH PRHQGVMVGMGQK-------DSYVGDEAQS--KRGILTLKYPIEHGIVTNWDDMEKIWHH FGNSCVVAAIALERSLQNVANYLIGSLAVTDLMVSVLVLPMAALYQVLNKWTLGQDICDL FGNACVVAAIALERSLQNVANYLIGSLAVTDLMVSVLVLPMAALYQVLNRWTLGQIPCDI LSNAFVIATIFLTRKLHTPANFLIGSLAVTDMLVSILVMPISIVYTVSKTWSLGQIVCDI : *:. : : :*. * :.:*.:. : :. * :. TFYNELRVAPEEHPVLLTEAPLNPKANREKMTQIMFETFNTPAMYVAIQAVLSLYASGRT TFYNELRVAPEEHPVLLTEAPLNPKANREKMTQIMFETFNTPAMYVAIQAVLSLYASGRT TFYNELRVAPEEHPVLLTEAPLNPKANREKMTQIMFETFNTPAMYVAIQAVLSLYASGRT FIALDVLCCTSSILHLCAIALDRYWAITDPIDYVNKRTPRRAAVLISVTWLIGFSISIPP FISLDMLCCTSSILHLCVIALDRYWAITEPIDYMKKRTPRRAAVLISVTWLVGFSISIPP WLSSDITFCTASILHLCVIALDRYWAITDALEYSKRRTMRRAAVMVAVVWVISISISMPP : ::... *. *. * : :.*..*: ::: ::.: *. TGIVMDS----GDGVTHTVPIYEGYALPHAILRLDLAGRDLTDYLMKILTERGYSFTTTA

154 APPENDIX A. EXAMPLE CODE OUTPUT TGIVMDS----GDGVTHTVPIYEGYALPHAILRLDLAGRDLTDYLMKILTERGYSFTTTA TGIVMDS----GDGVTHTVPIYEGYALPHAILRLDLAGRDLTDYLMKILTERGYSFTTTA MLGWRS-----AEDRANPDACIISQDPG-YTIYSTFGAFYIPLILMLVLYGRIFKAARFR MLIMRSQPSSMAEDRANSKQCKITQDPW-YTIYSTFGAFYIPLTLMLVLYGRIFKAARFR LFWRQ------AKAHEELKECMVNTDQISYTLYSTFGAFYVPTVLLIILYGRIYVAARSR... : :.. :. *: :* * : : EREIVRDIKEK------LCYVALDFEQEMGTAASSSSLEKSYELPD------------GQ EREIVRDIKEK------LCYVALDFEQEMGTAASSSSLEKSYELPD------------GQ EREIVRDIKEK------LCYVALDFEQEMATASSSSSLEKSYELPD------------GQ IRKTVKKTEKAKASDMCLTLSPAVFHKRANGDAVSAEWKRGYKFKPSSPCANGAVRHGEE IRRTVRKTEKKKVSDTCLALSPAMFHRKTPGDAHGKSWKRSVEPRP-LPNVNGAVKHAGE IFKTPSYSGKR--------FTTAQLIQTSAGSSLCS--------------LNSASNQEAH. :. : : :. VITIGNERFRCPEALFQPSFLGMESCGIHETTYNSIMKCDVDIRKDLYANTVLSGGTTMY VITIGNERFRCPEALFQPSFLGMESCGIHETTFNSIMKCDVDIRKDLYANTVLSGGTTMY VITIGNERFRCPEALFQPSFLGMESSGIHETTYNSIMKCDVDIRKDLYANTVLSGGTTMY MESLEIIEVNSNSKTHLPLPNTP-QSSSHENINEKTTGTRRKIALARERKTVKTLGIIMG GESLDIIEVQSNSRCNLPLPNTPGTVPLFENRHEKATETKRKIALARERKTVKTLGIIMG LHSGAGGEGGGS-----PLFVNSVKVKLADNVLE-----RKRLCAARERKATKTLGIILG :. * :. : : ::. : * : PGIADRMQKEITALAPSTMKIKIIAPPERKYSVWIGGSILASLSTFQQMWISKQEYDESG PGIADRMQKEITSLAPTTMKIKIIAPPERKYSVWIGGSILASLSTFQQMWISKQEYDESG PGIADRMQKEITALAPSTMKIKIIAPPERKYSVWIGGSILASLSTFQQMWISKQEYDESG TFIFCWLPFFIVALVLPFCAENCYMPEWLGAVINWLGYSNSLLNPIIYAYFN-KDFQSAF TFILCWLPFFIVALVMPFCQESCFMPHWLKDVINWLGYSNSLLNPIIYAYFN-KDFQSAF AFIICWLPFFVVTLVWAICKECSFDP-LLFDVFTWLGYLNSLINPVIYTVFN-DEFKQAF. * : :.:*.. *. * : :... :..::..: PSIVHRKCF-- PSIVHRKCF-- PSIVHRKCF-- KKILRCKFHRH KKIIKCHFCRA QKLIK--FRR-.::: bp ex7.pl (page 142)

Appendix B Common Perl mistakes This is taken from the document Introduction to Perl 1 1. Testing all-at-once instead of incrementally, either bottom-up or top-down. 2. Optimistically skipping print scaffolding to dump values and show progress. 3. Not running with the perl -w switch to catch obvious typographical errors. 4. Leaving off $ or @ or % from the front of a variable, or omitting & when invoking a subroutine in Perl 4. 5. Forgetting the trailing semicolon. 6. Forgetting curly braces around a block. 7. Unbalanced (), {}, [],,,, and sometimes <>. 8. Mixing and, or / and \. 9. Using == instead of eq,!= instead of ne, = instead of ==, etc. ( White == Black ) and ($x = 5) evaluate as (0 == 0) and (5) and thus are true! 10. Using else if instead of elsif. 11. Not chopping the output of backquotes date or not chopping input: print "Enter y to proceed: "; $ans = <STDIN>; chop $ans; if ($ans eq y ) { print "You said y\n";} else { print "You did not say y \n";} 12. Putting a comma after the file handle in a print statement. 1 http://www.cclabs.missouri.edu/things/instruction/perl/perlcourse.html 155

156 APPENDIX B. COMMON PERL MISTAKES 13. Forgetting that Perl array subscripts and string indexes normally start at 0, not 1. 14. Using $, $1, or other side-effect variables, then modifying the code in a way that unknowingly affects or is affected by these. 15. Forgetting that regular expressions are greedy, seeking the longest match not the shortest match.