Microbial metabolisms in a new 2.5 km deep ecosystem created by hydraulic fracturing in shales


R.A. Daly 1, M.A. Borton 1, M.J. Wilkins 1,2, D.W. Hoyt 3, D.J. Kountz 1, R.A. Wolfe 1, S.A. Welch 2, D.N. Marcus 1, R.V. Trexler 4, J.D. MacRae 5, J.A. Krzycki 1, D.R. Cole 2, P.J. Mouser 4, K.C. Wrighton 1

1 Department of Microbiology, The Ohio State University, Columbus, OH, 43214, USA
2 School of Earth Sciences, The Ohio State University, Columbus, OH 43214, USA
3 EMSL, Pacific Northwest National Laboratory, Richland, WA 99352, USA
4 Department of Civil, Environmental, and Geodetic Engineering, The Ohio State University, Columbus, OH, 43214, USA
5 Department of Civil and Environmental Engineering, University of Maine, Orono, ME, 04469, USA

Daly et al. - Submission for consideration at Nature, Data File 3

Quality Control (Sickle)
sickle pe -f R1_All.fastq -r R2_All.fastq -t sanger -o R1_All_trimmed.fastq -p R2_All_trimmed.fastq -s R1R2_All_trimmed.fastq
fq2fa --merge --filter R1_All_trimmed.fastq R2_All_trimmed.fastq R1R2_All_trimmed.fa

Assembly (IDBA-UD)
fq2fa --merge --filter R1_All_trimmed.fastq R2_All_trimmed.fastq R1R2_All_trimmed.fa
idba_ud -r R1R2_All_trimmed.fa -o idba_assembled_output

Coverage calculation (Bowtie2)
bowtie2-build scaffold.fa scaffold_fa
bowtie2 --fast -p 33 -x scaffold_fa -S All_mappedtoall_paired.sam -1 R1_All_trimmed.fastq -2 R2_All_trimmed.fastq --un unmapped_paired.fq --al mapped_paired.fq
grep -v '^@' All_mappedtoall_paired.sam | awk '{count [$3]++} END { for ( j in count ) print j, "\t"count[j] }' | sort -rn -t $'\t' -k2,2 > R1R2_ALL_contig_reads_paired.txt

Subassemblies (using 10% and 8% of reads for T13 and T82 samples)

10% T13:
python /opt/scripts/bin/pullseq_random_fastq.py -i R1_All_trimmed.fastq -o R1_All_trimmed_10_percent.fastq -s 10
python /opt/scripts/bin/pullseq_random_fastq.py -i R2_All_trimmed.fastq -o R2_All_trimmed_10_percent.fastq -s 10
time fq2fa --merge --filter R1_All_trimmed_10_percent.fastq R2_All_trimmed_10_percent.fastq R1R2_All_trimmed_10_percent.fa
idba_ud -r R1R2_All_trimmed_10_percent.fa -o idba_assembled_10_percent_output

8% T82:
python /opt/scripts/bin/pullseq_random_fastq.py -i /home/projects/shale/marcellus_2013/sample_ /data_qc/ _tgtgaa _L003_R1_ALL_trimmed.fastq -o R1_All_trimmed_8_percent.fastq -s 12
python /opt/scripts/bin/pullseq_random_fastq.py -i /home/projects/shale/marcellus_2013/sample_ /data_qc/ _tgtgaa _L003_R2_ALL_trimmed.fastq -o R2_All_trimmed_8_percent.fastq -s 12
time fq2fa --merge --filter R1_All_trimmed_8_percent.fastq R2_All_trimmed_8_percent.fastq R1R2_All_trimmed_8_percent.fa
idba_ud -r R1R2_All_trimmed_8_percent.fa -o idba_assembled_8_percent_output

Annotation
pullseq.py -i scaffold.fa -m o contigs_1000.fa
prodigal -i contigs_1000.fa -o contigs_1000.genes -a contigs_1000.genes.faa -d contigs_1000.genes.fna -p meta -m
/opt/my_interproscan_ /interproscan /interproscan.sh -i contigs_1000.genes.faa -o combined.iprscan -f TSV -dp -appl TIGRFAM,Pfam,ProSiteProfiles,ProSitePatterns -iprlookup -goterms
usearch -ublast contigs_1000.genes.faa -db /ORG-Data/Database/UniRef/uniref90.udb -maxhits 1 -evalue blast6out renamed_ublast_uniref90.b6
renamed_ublast_uniref90.b6 > temp1
awk ' $12 > 60 { print $0 }' temp1 > renamed_ublast_uniref90.b6.bit_score60.b6
usearch -ublast contigs_1000.genes.faa -db /ORG-Data/Database/KEGG/kegg-allorgs_ pep.udb -maxhits 1 -evalue blast6out renamed_ublast_kegg.b6
renamed_ublast_kegg.b6 > temp2

awk ' $12 > 60 { print $0 }' temp2 > renamed_ublast_kegg.b6.bit_score60.b6
usearch -makeudb_ublast contigs_1000.genes.faa -output contigs_1000.genes.faa.udb
usearch -ublast /ORG-Data/Database/UniRef/uniref90.fasta -db contigs_1000.genes.faa.udb -maxhits 1 -evalue blast6out uniref90_ublast_renamed.b6
uniref90_ublast_renamed.b6 > temp4
awk ' $12 > 300 { print $0 }' temp4 > uniref90_ublast_renamed.b6_bit_score_300
usearch -ublast /ORG-Data/Database/KEGG/kegg-all-orgs_ pep -db contigs_1000.genes.faa.udb -maxhits 1 -evalue blast6out KEGG_ublast_renamed.b6
KEGG_ublast_renamed.b6 > temp3
awk ' $12 > 300 { print $0 }' temp3 > KEGG_ublast_renamed.b6_BIT_SCORE_300
rbh.rb --forward renamed_ublast_uniref90.b6.bit_score60.b6 --reverse uniref90_ublast_renamed.b6_bit_score_300 > renamed.unirbh.txt
rbh.rb --forward renamed_ublast_kegg.b6.bit_score60.b6 --reverse KEGG_ublast_renamed.b6_BIT_SCORE_300 > renamed.keggrbh.txt
perl /ORG-Data/scripts/bin/Phylogeny_Protpipe/perl1.pl renamed_ublast_uniref90.b6.bit_score60.b6 > renamed_ublast_uniref90.b6.bit_score60.b6.out1.txt
perl /ORG-Data/scripts/bin/Phylogeny_Protpipe/perl1.pl renamed_ublast_kegg.b6.bit_score60.b6 > renamed_ublast_kegg.b6.bit_score60.b6.out1.txt
perl /ORG-Data/scripts/bin/Phylogeny_Protpipe/perl2.pl renamed.unirbh.txt > renamed.unirbh.txt.out1.txt
perl /ORG-Data/scripts/bin/Phylogeny_Protpipe/perl2.pl renamed.keggrbh.txt > renamed.keggrbh.txt.out1.txt

perl /ORG-Data/scripts/bin/Phylogeny_Protpipe/perl4_NEW.pl renamed_ublast_uniref90.b6.bit_score60.b6.out1.txt renamed_ublast_kegg.b6.bit_score60.b6.out1.txt renamed.unirbh.txt.out1.txt renamed.keggrbh.txt.out1.txt combined.iprscan > ANNOTATION_OUT_contigs_1000.genes.faa.3.txt
grep "RBH" ANNOTATION_OUT_contigs_1000.genes.faa.3.txt | grep "db=kegg" > RBH_KEGG1
grep "RBH" ANNOTATION_OUT_contigs_1000.genes.faa.3.txt | grep "db=uniref" >> RBH_KEGG1
grep "BLAST" ANNOTATION_OUT_contigs_1000.genes.faa.3.txt | grep "db=kegg" >> RBH_KEGG1
grep "BLAST" ANNOTATION_OUT_contigs_1000.genes.faa.3.txt | grep "db=uniref" >> RBH_KEGG1
grep "IPRSCAN" ANNOTATION_OUT_contigs_1000.genes.faa.3.txt >> RBH_KEGG1
python /ORG-Data/scripts/bin/Phylogeny_Protpipe/pull_all_contig_annotations.py -i RBH_KEGG1 -o ANNOTATION_OUT_contigs_1000.genes.faa.3.txt_FINAL.txt
perl /ORG-Data/scripts/bin/Phylogeny_Protpipe/perl6.pl ANNOTATION_OUT_contigs_1000.genes.faa.3.txt_FINAL.txt > T0.ANNOTATION_OUT.txt_FINAL_RANKED.txt
python /ORG-Data/scripts/bin/Phylogeny_Protpipe/write_annotation_to_fasta.py -i contigs_1000.genes.faa -o contigs_1000.genes.faa.3.4 -a ANNOTATION_OUT_contigs_1000.genes.faa.3.txt_FINAL.txt -j T0
python /ORG-Data/scripts/bin/Phylogeny_Protpipe/write_annotation_to_fasta.py -i contigs_1000.genes.faa -o contigs_1000.genes.fna.3.4 -a ANNOTATION_OUT_contigs_1000.genes.fna.3.txt_FINAL.txt -j T0
sed -i "s/^/t0_/g" T0.ANNOTATION_OUT.txt_FINAL_RANKED.txt
grep 'Unknown_Function' contigs_1000.genes.faa.3.4 > T0.contigs_1000.genes.faa.3.4.5_unknown_headers
sed -i "s/>//g" T0.contigs_1000.genes.faa.3.4.5_unknown_headers
awk -F' Unknown_Function' '{print $1 "\tf\tunknown Function"}' T0.contigs_1000.genes.faa.3.4.5_unknown_headers >> T0.ANNOTATION_OUT.txt_FINAL_RANKED.txt

Estimated genome completion (Amphora2)
perl /opt/amphora2/scripts/markerscanner.pl -DNA ../Marinobacter_T13_10_percent.contigs_1000.genes.fna Evalue 1e-3

perl /opt/amphora2/scripts/markeraligntrim.pl -WithReference -OutputFormat phylip
perl /opt/amphora2/scripts/phylotyping.pl -CPUs 20 > Marinobacter_T13_10_percent _phylotype_1e-20.result
python /opt/scripts/bin/phylogeny_protpipe/single_copy_genes_make_table.py -i Marinobacter_T13_10_percent _phylotype_1e-20.result -t Mixed

16S rRNA gene reconstruction from reads (EMIRGE)
longest_sequence_fastq.py -i R1_All_trimmed.fastq
longest_sequence_fastq.py -i R2_All_trimmed.fastq
emirge.py DIR -1 ../R1_All_trimmed.fastq -2 ../R2_All_trimmed.fastq -f /opt/emirgemaster/ssuref_111_candidate_db.fasta -b /opt/emirgemaster/ssu_candidate_db_btindex -l 114 -i 500 -s 150 -n 50 -a 20 --phred33
emirge_rename_fasta.py iter.50 > T0_renamed.fasta

Identification of CRISPR repeat and spacer sequences (CRASS)
crass ../r1r2_all_trimmed.fa
crisprtools stat -aph crass.crispr > crisprtools_stat.out
crisprtools extract -o crisprtools_extract -s -xc -d -f crass.crispr
cat crisprtools_extract/*_direct_repeats.fa > All_direct_repeats.fa
pullseq.py -i /home/projects/shales/hilary_morrison/project_dco_wrighton/sample_kelly_wrighton_1/r1r2_trimmed_assembled/scaffold.fa -m o contigs_5000.fa
makeblastdb -in contigs_5000.fa -dbtype nucl
blastn -db contigs_5000.fa -query All_T0_direct_repeats.fa -out DR_to_scaffolds_5000_blastn -outfmt 6 -num_threads 10 -evalue 1e-8
awk '{print $2 }' DR_to_scaffolds_5000_blastn > DR_Scaffolds_5000.txt
pullseq_header_name.py -i contigs_5000.fa -o scaffolds_5000_bactdr.fa -n DR_Scaffolds_5000_bact.txt -e F
pullseq_header_name.py -i contigs_5000.fa -o scaffolds_5000_minusbactdr.fa -n DR_Scaffolds_5000_bact.txt -e T

makeblastdb -in scaffolds_5000_minusbactdr.fa -dbtype nucl
cat crisprtools_extract/*_spacers.fa > All_spacers.fa
blastn -db scaffolds_5000_minusbactdr.fa -query All_T0_spacers.fa -out SP_to_scaffolds_5000_blastn -outfmt 6
crass_parsing.pl All_T0_direct_repeats.fa DR_to_scaffolds_5000_blastn > crass_summary_dr.txt
sed -i '/^$/d' crass_summary_dr.txt
sort -u -k2,2 crass_summary_dr.txt > crass_summary_dr1.txt
crass_parsing.pl All_T0_spacers.fa SP_to_scaffolds_5000_blastn > crass_summary_sp.txt
sed -i '/^$/d' crass_summary_sp.txt
sort -u -k2,2 crass_summary_sp.txt > crass_summary_sp1.txt
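
Note on the reciprocal best hit (RBH) step in the Annotation section: rbh.rb is a local script whose internals are not shown in this file, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes tab-separated BLAST6 tables with at most one hit per query (as produced with -maxhits 1), where column 1 is the query and column 2 is the target. The file names, sequence names and the variable rbh_pairs are hypothetical and chosen for illustration.

```shell
# Toy BLAST6-style inputs (hypothetical data): col 1 = query, col 2 = target, col 12 = bit score.
workdir=$(mktemp -d)

# Forward search: predicted genes against the reference database.
printf 'geneA\tdbX\t90\t100\t0\t0\t1\t100\t1\t100\t1e-50\t180\n' >  "$workdir/forward.b6"
printf 'geneB\tdbY\t85\t100\t0\t0\t1\t100\t1\t100\t1e-40\t150\n' >> "$workdir/forward.b6"

# Reverse search: reference sequences against the predicted genes.
printf 'dbX\tgeneA\t90\t100\t0\t0\t1\t100\t1\t100\t1e-90\t350\n' >  "$workdir/reverse.b6"
printf 'dbY\tgeneC\t88\t100\t0\t0\t1\t100\t1\t100\t1e-80\t320\n' >> "$workdir/reverse.b6"

# Keep a gene/reference pair only if each is the other's hit.
# First file: remember the hit of every gene. Second file: check the reverse direction.
rbh_pairs=$(awk -F'\t' '
    NR == FNR { best[$1] = $2; next }
    ($2 in best) && best[$2] == $1 { print $2 "\t" $1 }
' "$workdir/forward.b6" "$workdir/reverse.b6")

printf '%s\n' "$rbh_pairs"
rm -r "$workdir"
```

In this toy data only geneA and dbX hit each other in both directions, so geneB (whose reference hit dbY points back to geneC) is not reported. Bit-score thresholds such as the 60 and 300 cut-offs used in the Annotation commands would be applied to the two tables before this comparison.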


More information

PHYLOGENOMICS WORKSHOP

PHYLOGENOMICS WORKSHOP PHYLOGENOMICS WORKSHOP This phylogenomics tutorial is divided into 3 major sections. The first section deals with identification of orthologs from closely related plasmodium species. Second section is

More information

Meraculous De Novo Assembly of the Ariolimax dolichophallus Genome. Charles Cole, Jake Houser, Kyle McGovern, and Jennie Richardson

Meraculous De Novo Assembly of the Ariolimax dolichophallus Genome. Charles Cole, Jake Houser, Kyle McGovern, and Jennie Richardson Meraculous De Novo Assembly of the Ariolimax dolichophallus Genome Charles Cole, Jake Houser, Kyle McGovern, and Jennie Richardson Meraculous Assembler Published by the US Department of Energy Joint Genome

More information

High-throughput sequencing: Alignment and related topic. Simon Anders EMBL Heidelberg

High-throughput sequencing: Alignment and related topic. Simon Anders EMBL Heidelberg High-throughput sequencing: Alignment and related topic Simon Anders EMBL Heidelberg Established platforms HTS Platforms Illumina HiSeq, ABI SOLiD, Roche 454 Newcomers: Benchtop machines: Illumina MiSeq,

More information

Lecture 3. Essential skills for bioinformatics: Unix/Linux

Lecture 3. Essential skills for bioinformatics: Unix/Linux Lecture 3 Essential skills for bioinformatics: Unix/Linux RETRIEVING DATA Overview Whether downloading large sequencing datasets or accessing a web application hundreds of times to download specific files,

More information

Sequence Preprocessing: A perspective

Sequence Preprocessing: A perspective Sequence Preprocessing: A perspective Dr. Matthew L. Settles Genome Center University of California, Davis settles@ucdavis.edu Why Preprocess reads We have found that aggressively cleaning and processing

More information

EBI patent related services

EBI patent related services EBI patent related services 4 th Annual Forum for SMEs October 18-19 th 2010 Jennifer McDowall Senior Scientist, EMBL-EBI EBI is an Outstation of the European Molecular Biology Laboratory. Overview Patent

More information

Database Searching Using BLAST

Database Searching Using BLAST Mahidol University Objectives SCMI512 Molecular Sequence Analysis Database Searching Using BLAST Lecture 2B After class, students should be able to: explain the FASTA algorithm for database searching explain

More information

Pandaseq Tutorial Documentation

Pandaseq Tutorial Documentation Pandaseq Tutorial Documentation Release 0.0 Adina Howe Aug 17, 2017 Contents 1 Merging paired-end Illumina reads with pandaseq 3 2 Indices and tables 5 i ii Pandaseq Tutorial Documentation, Release 0.0

More information

A generic and modular platform for automated sequence processing and annotation. Arthur Gruber

A generic and modular platform for automated sequence processing and annotation. Arthur Gruber 2 A generic and modular platform for automated sequence processing and annotation Arthur Gruber Instituto de Ciências Biomédicas Universidade de São Paulo AG-ICB-USP 2 Sequence processing and annotation

More information

11/8/2017 Trinity De novo Transcriptome Assembly Workshop trinityrnaseq/rnaseq_trinity_tuxedo_workshop Wiki GitHub

11/8/2017 Trinity De novo Transcriptome Assembly Workshop trinityrnaseq/rnaseq_trinity_tuxedo_workshop Wiki GitHub trinityrnaseq / RNASeq_Trinity_Tuxedo_Workshop Trinity De novo Transcriptome Assembly Workshop Brian Haas edited this page on Oct 17, 2015 14 revisions De novo RNA-Seq Assembly and Analysis Using Trinity

More information

Galaxy Platform For NGS Data Analyses

Galaxy Platform For NGS Data Analyses Galaxy Platform For NGS Data Analyses Weihong Yan wyan@chem.ucla.edu Collaboratory Web Site http://qcb.ucla.edu/collaboratory Collaboratory Workshops Workshop Outline ü Day 1 UCLA galaxy and user account

More information

DNA sequences obtained in section were assembled and edited using DNA

DNA sequences obtained in section were assembled and edited using DNA Sequetyper DNA sequences obtained in section 4.4.1.3 were assembled and edited using DNA Baser Sequence Assembler v4 (www.dnabaser.com). The consensus sequences were used to interrogate the GenBank database

More information

KEGGscape. Release 0.8.1

KEGGscape. Release 0.8.1 KEGGscape Release 0.8.1 Oct 21, 2018 Contents 1 Installing KEGGscape 3 2 How to import KEGG pathway xml(kgml) to Cytoscape 5 2.1 Importing kgml to Cytoscape with REST endpoint...........................

More information

INTRODUCTION TO BIOINFORMATICS

INTRODUCTION TO BIOINFORMATICS Molecular Biology-2019 1 INTRODUCTION TO BIOINFORMATICS In this section, we want to provide a simple introduction to using the web site of the National Center for Biotechnology Information NCBI) to obtain

More information

Workplace Risk Assessment System (WRAS) User Guide

Workplace Risk Assessment System (WRAS) User Guide Workplace Risk Assessment System (WRAS) User Guide This user guide provides a step by step walkthrough on the use of WRAS. Please contact the Office of Health and Safety @ : ohs@ntu.edu.sg If you have

More information

How to use KAIKObase Version 3.1.0

How to use KAIKObase Version 3.1.0 How to use KAIKObase Version 3.1.0 Version3.1.0 29/Nov/2010 http://sgp2010.dna.affrc.go.jp/kaikobase/ Copyright National Institute of Agrobiological Sciences. All rights reserved. Outline 1. System overview

More information

Shell Programming. Introduction to Linux. Peter Ruprecht Research CU Boulder

Shell Programming. Introduction to Linux. Peter Ruprecht  Research CU Boulder Introduction to Linux Shell Programming Peter Ruprecht peter.ruprecht@colorado.edu www.rc.colorado.edu Downloadable Materials Slides and examples available at https://github.com/researchcomputing/ Final_Tutorials/

More information

Linux command line basics III: piping commands for text processing. Yanbin Yin Fall 2015

Linux command line basics III: piping commands for text processing. Yanbin Yin Fall 2015 Linux command line basics III: piping commands for text processing Yanbin Yin Fall 2015 1 h.p://korflab.ucdavis.edu/unix_and_perl/unix_and_perl_v3.1.1.pdf 2 The beauty of Unix for bioinformagcs sort, cut,

More information

HymenopteraMine Documentation

HymenopteraMine Documentation HymenopteraMine Documentation Release 1.0 Aditi Tayal, Deepak Unni, Colin Diesh, Chris Elsik, Darren Hagen Apr 06, 2017 Contents 1 Welcome to HymenopteraMine 3 1.1 Overview of HymenopteraMine.....................................

More information

Genome Browser. Background and Strategy

Genome Browser. Background and Strategy Genome Browser Background and Strategy Contents What is a genome browser? Purpose of a genome browser Examples Structure Extra Features Contents What is a genome browser? Purpose of a genome browser Examples

More information

ASAP - Allele-specific alignment pipeline

ASAP - Allele-specific alignment pipeline ASAP - Allele-specific alignment pipeline Jan 09, 2012 (1) ASAP - Quick Reference ASAP needs a working version of Perl and is run from the command line. Furthermore, Bowtie needs to be installed on your

More information

Review of Recent NGS Short Reads Alignment Tools BMI-231 final project, Chenxi Chen Spring 2014

Review of Recent NGS Short Reads Alignment Tools BMI-231 final project, Chenxi Chen Spring 2014 Review of Recent NGS Short Reads Alignment Tools BMI-231 final project, Chenxi Chen Spring 2014 Deciphering the information contained in DNA sequences began decades ago since the time of Sanger sequencing.

More information

Accelerating Genomic Sequence Alignment Workload with Scalable Vector Architecture

Accelerating Genomic Sequence Alignment Workload with Scalable Vector Architecture Accelerating Genomic Sequence Alignment Workload with Scalable Vector Architecture Dong-hyeon Park, Jon Beaumont, Trevor Mudge University of Michigan, Ann Arbor Genomics Past Weeks ~$3 billion Human Genome

More information

Introduction to HPC Using zcluster at GACRC

Introduction to HPC Using zcluster at GACRC Introduction to HPC Using zcluster at GACRC Georgia Advanced Computing Resource Center University of Georgia Zhuofei Hou, HPC Trainer zhuofei@uga.edu 1 Outline What is GACRC? What is HPC Concept? What

More information

Sequencing Data. Paul Agapow 2011/02/03

Sequencing Data. Paul Agapow 2011/02/03 Webservices for Next Generation Sequencing Data Paul Agapow 2011/02/03 Aims Assumed parameters: Must have a system for non-technical users to browse and manipulate their Next Generation Sequencing (NGS)

More information

INTRODUCTION TO BIOINFORMATICS

INTRODUCTION TO BIOINFORMATICS Molecular Biology-2017 1 INTRODUCTION TO BIOINFORMATICS In this section, we want to provide a simple introduction to using the web site of the National Center for Biotechnology Information NCBI) to obtain

More information

Metagenome Processing and Analysis

Metagenome Processing and Analysis San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research Spring 2012 Metagenome Processing and Analysis Sheetal Gosrani Follow this and additional works at: http://scholarworks.sjsu.edu/etd_projects

More information

biokepler: A Comprehensive Bioinforma2cs Scien2fic Workflow Module for Distributed Analysis of Large- Scale Biological Data

biokepler: A Comprehensive Bioinforma2cs Scien2fic Workflow Module for Distributed Analysis of Large- Scale Biological Data biokepler: A Comprehensive Bioinforma2cs Scien2fic Workflow Module for Distributed Analysis of Large- Scale Biological Data Ilkay Al/ntas 1, Jianwu Wang 2, Daniel Crawl 1, Shweta Purawat 1 1 San Diego

More information

SEASHORE / SARUMAN. Short Read Matching using GPU Programming. Tobias Jakobi

SEASHORE / SARUMAN. Short Read Matching using GPU Programming. Tobias Jakobi SEASHORE SARUMAN Summary 1 / 24 SEASHORE / SARUMAN Short Read Matching using GPU Programming Tobias Jakobi Center for Biotechnology (CeBiTec) Bioinformatics Resource Facility (BRF) Bielefeld University

More information