Microbial metabolisms in a new 2.5 km deep ecosystem created by hydraulic fracturing in shales
R.A. Daly 1, M.A. Borton 1, M.J. Wilkins 1,2, D.W. Hoyt 3, D.J. Kountz 1, R.A. Wolfe 1, S.A. Welch 2, D.N. Marcus 1, R.V. Trexler 4, J.D. MacRae 5, J.A. Krzycki 1, D.R. Cole 2, P.J. Mouser 4, K.C. Wrighton 1

1 Department of Microbiology, The Ohio State University, Columbus, OH, 43214, USA
2 School of Earth Sciences, The Ohio State University, Columbus, OH 43214, USA
3 EMSL, Pacific Northwest National Laboratory, Richland, WA 99352, USA
4 Department of Civil, Environmental, and Geodetic Engineering, The Ohio State University, Columbus, OH, 43214, USA
5 Department of Civil and Environmental Engineering, University of Maine, Orono, ME, 04469, USA

Daly et al. - Submission for consideration at Nature, Data File 3

Quality Control (Sickle)

sickle pe -f R1_All.fastq -r R2_All.fastq -t sanger -o R1_All_trimmed.fastq -p R2_All_trimmed.fastq -s R1R2_All_trimmed.fastq
fq2fa --merge --filter R1_All_trimmed.fastq R2_All_trimmed.fastq R1R2_All_trimmed.fa

Assembly (IDBA-UD)

fq2fa --merge --filter R1_All_trimmed.fastq R2_All_trimmed.fastq R1R2_All_trimmed.fa
idba_ud -r R1R2_All_trimmed.fa -o idba_assembled_output

Coverage calculation (Bowtie2)

bowtie2-build scaffold.fa scaffold_fa
bowtie2 --fast -p 33 -x scaffold_fa -S All_mappedtoall_paired.sam -1 R1_All_trimmed.fastq -2 R2_All_trimmed.fastq --un unmapped_paired.fq --al mapped_paired.fq
grep -v '^@' All_mappedtoall_paired.sam | awk '{count[$3]++} END { for ( j in count ) print j, "\t"count[j] }' | sort -rn -t $'\t' -k2,2 > R1R2_ALL_contig_reads_paired.txt

Subassemblies (using 10% and 8% of reads for T13 and T82 samples)
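A rough sketch of what subsampling paired-end reads involves, on toy data. It does not reproduce pullseq_random_fastq.py, whose internals are not shown in this file. It assumes plain four-line FASTQ records and mate files with identical record order, and the function name subsample_fastq, the seed, and the toy file names are illustrative assumptions. Seeding awk's random generator identically for both mates makes the same record positions survive in each file.

```shell
#!/usr/bin/env bash
set -euo pipefail

# subsample_fastq FRACTION SEED IN.fastq OUT.fastq  (hypothetical helper)
# Keeps each 4-line FASTQ record with probability FRACTION.
subsample_fastq() {
    local fraction=$1 seed=$2 in=$3 out=$4
    awk -v frac="$fraction" -v seed="$seed" '
        BEGIN { srand(seed) }
        # Draw one random number per record, on its header line only,
        # so both mates consume the random stream in lockstep.
        NR % 4 == 1 { keep = (rand() < frac) }
        keep
    ' "$in" > "$out"
}

# Toy paired input: 200 records per mate, same order in both files.
workdir=$(mktemp -d)
for mate in 1 2; do
    for i in $(seq 1 200); do
        printf '@read%d/%d\nACGT\n+\nIIII\n' "$i" "$mate"
    done > "$workdir/R${mate}.fastq"
done

# Same fraction and same seed for both mates keeps the pairs in sync.
subsample_fastq 0.10 42 "$workdir/R1.fastq" "$workdir/R1_sub.fastq"
subsample_fastq 0.10 42 "$workdir/R2.fastq" "$workdir/R2_sub.fastq"
```

Because the selection is random per record, the retained fraction is only approximately the requested one. Interleaving the two subsampled files afterwards (as fq2fa --merge does in this workflow) relies on the mates still lining up record for record.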
10% T13:

python /opt/scripts/bin/pullseq_random_fastq.py -i R1_All_trimmed.fastq -o R1_All_trimmed_10_percent.fastq -s 10
python /opt/scripts/bin/pullseq_random_fastq.py -i R2_All_trimmed.fastq -o R2_All_trimmed_10_percent.fastq -s 10
time fq2fa --merge --filter R1_All_trimmed_10_percent.fastq R2_All_trimmed_10_percent.fastq R1R2_All_trimmed_10_percent.fa
idba_ud -r R1R2_All_trimmed_10_percent.fa -o idba_assembled_10_percent_output

8% T82:

python /opt/scripts/bin/pullseq_random_fastq.py -i /home/projects/shale/marcellus_2013/sample_ /data_qc/ _tgtgaa _L003_R1_ALL_trimmed.fastq -o R1_All_trimmed_8_percent.fastq -s 12
python /opt/scripts/bin/pullseq_random_fastq.py -i /home/projects/shale/marcellus_2013/sample_ /data_qc/ _tgtgaa _L003_R2_ALL_trimmed.fastq -o R2_All_trimmed_8_percent.fastq -s 12
time fq2fa --merge --filter R1_All_trimmed_8_percent.fastq R2_All_trimmed_8_percent.fastq R1R2_All_trimmed_8_percent.fa
idba_ud -r R1R2_All_trimmed_8_percent.fa -o idba_assembled_8_percent_output

Annotation

pullseq.py -i scaffold.fa -m o contigs_1000.fa
prodigal -i contigs_1000.fa -o contigs_1000.genes -a contigs_1000.genes.faa -d contigs_1000.genes.fna -p meta -m
/opt/my_interproscan_ /interproscan /interproscan.sh -i contigs_1000.genes.faa -o combined.iprscan -f TSV -dp -appl TIGRFAM,Pfam,ProSiteProfiles,ProSitePatterns -iprlookup -goterms
usearch -ublast contigs_1000.genes.faa -db /ORG-Data/Database/UniRef/uniref90.udb -maxhits 1 -evalue blast6out renamed_ublast_uniref90.b6
renamed_ublast_uniref90.b6 > temp1
awk ' $12 > 60 { print $0 }' temp1 > renamed_ublast_uniref90.b6.bit_score60.b6
usearch -ublast contigs_1000.genes.faa -db /ORG-Data/Database/KEGG/kegg-allorgs_ pep.udb -maxhits 1 -evalue blast6out renamed_ublast_kegg.b6
renamed_ublast_kegg.b6 > temp2
awk ' $12 > 60 { print $0 }' temp2 > renamed_ublast_kegg.b6.bit_score60.b6
usearch -makeudb_ublast contigs_1000.genes.faa -output contigs_1000.genes.faa.udb
usearch -ublast /ORG-Data/Database/UniRef/uniref90.fasta -db contigs_1000.genes.faa.udb -maxhits 1 -evalue blast6out uniref90_ublast_renamed.b6
uniref90_ublast_renamed.b6 > temp4
awk ' $12 > 300 { print $0 }' temp4 > uniref90_ublast_renamed.b6_bit_score_300
usearch -ublast /ORG-Data/Database/KEGG/kegg-all-orgs_ pep -db contigs_1000.genes.faa.udb -maxhits 1 -evalue blast6out KEGG_ublast_renamed.b6
KEGG_ublast_renamed.b6 > temp3
awk ' $12 > 300 { print $0 }' temp3 > KEGG_ublast_renamed.b6_BIT_SCORE_300
rbh.rb --forward renamed_ublast_uniref90.b6.bit_score60.b6 --reverse uniref90_ublast_renamed.b6_bit_score_300 > renamed.unirbh.txt
rbh.rb --forward renamed_ublast_kegg.b6.bit_score60.b6 --reverse KEGG_ublast_renamed.b6_BIT_SCORE_300 > renamed.keggrbh.txt
perl /ORG-Data/scripts/bin/Phylogeny_Protpipe/perl1.pl renamed_ublast_uniref90.b6.bit_score60.b6 > renamed_ublast_uniref90.b6.bit_score60.b6.out1.txt
perl /ORG-Data/scripts/bin/Phylogeny_Protpipe/perl1.pl renamed_ublast_kegg.b6.bit_score60.b6 > renamed_ublast_kegg.b6.bit_score60.b6.out1.txt
perl /ORG-Data/scripts/bin/Phylogeny_Protpipe/perl2.pl renamed.unirbh.txt > renamed.unirbh.txt.out1.txt
perl /ORG-Data/scripts/bin/Phylogeny_Protpipe/perl2.pl renamed.keggrbh.txt > renamed.keggrbh.txt.out1.txt
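The bit-score filters and the reciprocal-best-hit (RBH) step above can be illustrated on toy data. This is a minimal sketch of the general idea only: rbh.rb is not shown in this file, so the pairing logic below is an assumption about what "reciprocal best hit" means, and the gene names, reference names, and the row helper are hypothetical. It relies on the 12-column tabular hit format in which column 12 is the bit score, and on each query having a single hit per file (as with -maxhits 1).

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)

# Hypothetical helper: print its arguments as one tab-separated row.
row() { local IFS=$'\t'; printf '%s\n' "$*"; }

# Forward search (our genes as queries). Columns: query, target, %identity,
# length, mismatches, gaps, qstart, qend, tstart, tend, e-value, bit score.
{
    row geneA ref1 90.0 100 10 0 1 100 1 100 1e-50 180
    row geneB ref2 40.0 100 60 0 1 100 1 100 1e-5 45
    row geneC ref3 95.0 300 15 0 1 300 1 300 1e-150 550
} > "$workdir/forward.b6"

# Reverse search (reference proteins as queries against our genes).
{
    row ref1 geneA 90.0 100 10 0 1 100 1 100 1e-50 350
    row ref2 geneB 40.0 100 60 0 1 100 1 100 1e-5 120
    row ref3 geneX 95.0 300 15 0 1 300 1 300 1e-150 400
} > "$workdir/reverse.b6"

# Step 1: keep hits whose bit score (column 12) exceeds the threshold,
# using the same cutoffs as the commands above (60 forward, 300 reverse).
awk -F'\t' '$12 > 60'  "$workdir/forward.b6" > "$workdir/forward.filtered.b6"
awk -F'\t' '$12 > 300' "$workdir/reverse.b6" > "$workdir/reverse.filtered.b6"

# Step 2: a forward hit gene -> ref is reciprocal when the reverse file
# contains ref -> the same gene.
awk -F'\t' '
    FILENAME == ARGV[1] { best[$1] = $2; next }   # reverse: ref -> gene
    best[$2] == $1      { print $1 "\t" $2 }      # forward: gene -> ref
' "$workdir/reverse.filtered.b6" "$workdir/forward.filtered.b6" > "$workdir/rbh.txt"

cat "$workdir/rbh.txt"
```

In this toy case only geneA/ref1 survives: geneB fails the forward bit-score cutoff, and ref3's best hit is a different gene than geneC.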
perl /ORG-Data/scripts/bin/Phylogeny_Protpipe/perl4_NEW.pl renamed_ublast_uniref90.b6.bit_score60.b6.out1.txt renamed_ublast_kegg.b6.bit_score60.b6.out1.txt renamed.unirbh.txt.out1.txt renamed.keggrbh.txt.out1.txt combined.iprscan > ANNOTATION_OUT_contigs_1000.genes.faa.3.txt
grep "RBH" ANNOTATION_OUT_contigs_1000.genes.faa.3.txt | grep "db=kegg" > RBH_KEGG1
grep "RBH" ANNOTATION_OUT_contigs_1000.genes.faa.3.txt | grep "db=uniref" >> RBH_KEGG1
grep "BLAST" ANNOTATION_OUT_contigs_1000.genes.faa.3.txt | grep "db=kegg" >> RBH_KEGG1
grep "BLAST" ANNOTATION_OUT_contigs_1000.genes.faa.3.txt | grep "db=uniref" >> RBH_KEGG1
grep "IPRSCAN" ANNOTATION_OUT_contigs_1000.genes.faa.3.txt >> RBH_KEGG1
python /ORG-Data/scripts/bin/Phylogeny_Protpipe/pull_all_contig_annotations.py -i RBH_KEGG1 -o ANNOTATION_OUT_contigs_1000.genes.faa.3.txt_FINAL.txt
perl /ORG-Data/scripts/bin/Phylogeny_Protpipe/perl6.pl ANNOTATION_OUT_contigs_1000.genes.faa.3.txt_FINAL.txt > T0.ANNOTATION_OUT.txt_FINAL_RANKED.txt
python /ORG-Data/scripts/bin/Phylogeny_Protpipe/write_annotation_to_fasta.py -i contigs_1000.genes.faa -o contigs_1000.genes.faa.3.4 -a ANNOTATION_OUT_contigs_1000.genes.faa.3.txt_FINAL.txt -j T0
python /ORG-Data/scripts/bin/Phylogeny_Protpipe/write_annotation_to_fasta.py -i contigs_1000.genes.faa -o contigs_1000.genes.fna.3.4 -a ANNOTATION_OUT_contigs_1000.genes.fna.3.txt_FINAL.txt -j T0
sed -i "s/^/t0_/g" T0.ANNOTATION_OUT.txt_FINAL_RANKED.txt
grep 'Unknown_Function' contigs_1000.genes.faa.3.4 > T0.contigs_1000.genes.faa.3.4.5_unknown_headers
sed -i "s/>//g" T0.contigs_1000.genes.faa.3.4.5_unknown_headers
awk -F' Unknown_Function' '{print $1 "\tf\tunknown Function"}' T0.contigs_1000.genes.faa.3.4.5_unknown_headers >> T0.ANNOTATION_OUT.txt_FINAL_RANKED.txt

Estimated genome completion (Amphora2)

perl /opt/amphora2/scripts/markerscanner.pl -DNA ../Marinobacter_T13_10_percent.contigs_1000.genes.fna -Evalue 1e-3
perl /opt/amphora2/scripts/markeraligntrim.pl -WithReference -OutputFormat phylip
perl /opt/amphora2/scripts/phylotyping.pl -CPUs 20 > Marinobacter_T13_10_percent_phylotype_1e-20.result
python /opt/scripts/bin/phylogeny_protpipe/single_copy_genes_make_table.py -i Marinobacter_T13_10_percent_phylotype_1e-20.result -t Mixed

16S rRNA gene reconstruction from reads (EMIRGE)

longest_sequence_fastq.py -i R1_All_trimmed.fastq
longest_sequence_fastq.py -i R2_All_trimmed.fastq
emirge.py DIR -1 ../R1_All_trimmed.fastq -2 ../R2_All_trimmed.fastq -f /opt/emirgemaster/ssuref_111_candidate_db.fasta -b /opt/emirgemaster/ssu_candidate_db_btindex -l 114 -i 500 -s 150 -n 50 -a 20 --phred33
emirge_rename_fasta.py iter.50 > T0_renamed.fasta

Identification of CRISPR repeat and spacer sequences (CRASS)

crass ../r1r2_all_trimmed.fa
crisprtools stat -aph crass.crispr > crisprtools_stat.out
crisprtools extract -o crisprtools_extract -s -xc -d -f crass.crispr
cat crisprtools_extract/*_direct_repeats.fa > All_direct_repeats.fa
pullseq.py -i /home/projects/shales/hilary_morrison/project_dco_wrighton/sample_kelly_wrighton_1/r1r2_trimmed_assembled/scaffold.fa -m o contigs_5000.fa
makeblastdb -in contigs_5000.fa -dbtype nucl
blastn -db contigs_5000.fa -query All_T0_direct_repeats.fa -out DR_to_scaffolds_5000_blastn -outfmt 6 -num_threads 10 -evalue 1e-8
awk '{print $2 }' DR_to_scaffolds_5000_blastn > DR_Scaffolds_5000.txt
pullseq_header_name.py -i contigs_5000.fa -o scaffolds_5000_bactdr.fa -n DR_Scaffolds_5000_bact.txt -e F
pullseq_header_name.py -i contigs_5000.fa -o scaffolds_5000_minusbactdr.fa -n DR_Scaffolds_5000_bact.txt -e T
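The last two commands appear to split the scaffolds into those named in a header list and those that are not. A minimal sketch of that kind of split on toy data follows. It is not pullseq_header_name.py: the reading of -e as an include/exclude switch is an assumption based on the output file names, and the function name split_fasta_by_ids and all toy file names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# split_fasta_by_ids IDS.txt IN.fa LISTED.fa UNLISTED.fa  (hypothetical helper)
# Writes records whose ID is in IDS.txt to LISTED.fa, all others to UNLISTED.fa.
split_fasta_by_ids() {
    local ids=$1 fasta=$2 listed=$3 unlisted=$4
    : > "$listed"
    : > "$unlisted"
    awk -v listed="$listed" -v unlisted="$unlisted" '
        # First file: one sequence ID per line.
        FILENAME == ARGV[1] { if ($1 != "") want[$1] = 1; next }
        # Header line: the ID is the text after ">" up to the first whitespace.
        /^>/ {
            id = substr($1, 2)
            out = (id in want) ? listed : unlisted
        }
        # Header and sequence lines both go to the file chosen for this record.
        { print > out }
    ' "$ids" "$fasta"
}

# Toy input: three scaffolds, one of which carries a direct-repeat hit.
workdir=$(mktemp -d)
printf 'scaffold_2\n' > "$workdir/dr_scaffolds.txt"
printf '>scaffold_1 len=8\nACGTACGT\n>scaffold_2 len=4\nGGCC\n>scaffold_3 len=4\nTTAA\n' \
    > "$workdir/contigs.fa"

split_fasta_by_ids "$workdir/dr_scaffolds.txt" "$workdir/contigs.fa" \
    "$workdir/with_dr.fa" "$workdir/without_dr.fa"
```

Searching spacers only against the scaffolds without direct repeats, as the next step does, avoids spacers simply matching the CRISPR arrays they came from.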
makeblastdb -in scaffolds_5000_minusbactdr.fa -dbtype nucl
cat crisprtools_extract/*_spacers.fa > All_spacers.fa
blastn -db scaffolds_5000_minusbactdr.fa -query All_T0_spacers.fa -out SP_to_scaffolds_5000_blastn -outfmt 6
crass_parsing.pl All_T0_direct_repeats.fa DR_to_scaffolds_5000_blastn > crass_summary_dr.txt
sed -i '/^$/d' crass_summary_dr.txt
sort -u -k2,2 crass_summary_dr.txt > crass_summary_dr1.txt
crass_parsing.pl All_T0_spacers.fa SP_to_scaffolds_5000_blastn > crass_summary_sp.txt
sed -i '/^$/d' crass_summary_sp.txt
sort -u -k2,2 crass_summary_sp.txt > crass_summary_sp1.txt
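The final cleanup steps (dropping blank lines, then keeping one row per distinct value in column 2) can be seen on a toy table. This is only a sketch: the output format of crass_parsing.pl is not shown in this file, so the three columns and their values below are hypothetical. The sketch writes to new files instead of using sed -i, because the in-place flag takes different arguments on GNU and BSD sed.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)

# Hypothetical summary table: repeat ID, scaffold, percent identity,
# with blank lines between rows.
printf 'DR1\tscaffold_7\t100\n\nDR2\tscaffold_7\t98\n\nDR3\tscaffold_9\t100\n' \
    > "$workdir/summary.txt"

# Remove empty lines, then keep a single row for each distinct column-2 value.
# With -u, sort treats rows with equal keys as duplicates, so the comparison
# is on the key given by -k2,2 rather than on the whole line.
sed '/^$/d' "$workdir/summary.txt" | sort -u -k2,2 > "$workdir/summary_unique.txt"

cat "$workdir/summary_unique.txt"
```

Here the two scaffold_7 rows collapse into one, leaving one row per scaffold. Which of several rows with the same key is retained should not be relied on.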
More informationLinux command line basics III: piping commands for text processing. Yanbin Yin Fall 2015
Linux command line basics III: piping commands for text processing Yanbin Yin Fall 2015 1 h.p://korflab.ucdavis.edu/unix_and_perl/unix_and_perl_v3.1.1.pdf 2 The beauty of Unix for bioinformagcs sort, cut,
More informationHymenopteraMine Documentation
HymenopteraMine Documentation Release 1.0 Aditi Tayal, Deepak Unni, Colin Diesh, Chris Elsik, Darren Hagen Apr 06, 2017 Contents 1 Welcome to HymenopteraMine 3 1.1 Overview of HymenopteraMine.....................................
More informationGenome Browser. Background and Strategy
Genome Browser Background and Strategy Contents What is a genome browser? Purpose of a genome browser Examples Structure Extra Features Contents What is a genome browser? Purpose of a genome browser Examples
More informationASAP - Allele-specific alignment pipeline
ASAP - Allele-specific alignment pipeline Jan 09, 2012 (1) ASAP - Quick Reference ASAP needs a working version of Perl and is run from the command line. Furthermore, Bowtie needs to be installed on your
More informationReview of Recent NGS Short Reads Alignment Tools BMI-231 final project, Chenxi Chen Spring 2014
Review of Recent NGS Short Reads Alignment Tools BMI-231 final project, Chenxi Chen Spring 2014 Deciphering the information contained in DNA sequences began decades ago since the time of Sanger sequencing.
More informationAccelerating Genomic Sequence Alignment Workload with Scalable Vector Architecture
Accelerating Genomic Sequence Alignment Workload with Scalable Vector Architecture Dong-hyeon Park, Jon Beaumont, Trevor Mudge University of Michigan, Ann Arbor Genomics Past Weeks ~$3 billion Human Genome
More informationIntroduction to HPC Using zcluster at GACRC
Introduction to HPC Using zcluster at GACRC Georgia Advanced Computing Resource Center University of Georgia Zhuofei Hou, HPC Trainer zhuofei@uga.edu 1 Outline What is GACRC? What is HPC Concept? What
More informationSequencing Data. Paul Agapow 2011/02/03
Webservices for Next Generation Sequencing Data Paul Agapow 2011/02/03 Aims Assumed parameters: Must have a system for non-technical users to browse and manipulate their Next Generation Sequencing (NGS)
More informationINTRODUCTION TO BIOINFORMATICS
Molecular Biology-2017 1 INTRODUCTION TO BIOINFORMATICS In this section, we want to provide a simple introduction to using the web site of the National Center for Biotechnology Information NCBI) to obtain
More informationMetagenome Processing and Analysis
San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research Spring 2012 Metagenome Processing and Analysis Sheetal Gosrani Follow this and additional works at: http://scholarworks.sjsu.edu/etd_projects
More informationbiokepler: A Comprehensive Bioinforma2cs Scien2fic Workflow Module for Distributed Analysis of Large- Scale Biological Data
biokepler: A Comprehensive Bioinforma2cs Scien2fic Workflow Module for Distributed Analysis of Large- Scale Biological Data Ilkay Al/ntas 1, Jianwu Wang 2, Daniel Crawl 1, Shweta Purawat 1 1 San Diego
More informationSEASHORE / SARUMAN. Short Read Matching using GPU Programming. Tobias Jakobi
SEASHORE SARUMAN Summary 1 / 24 SEASHORE / SARUMAN Short Read Matching using GPU Programming Tobias Jakobi Center for Biotechnology (CeBiTec) Bioinformatics Resource Facility (BRF) Bielefeld University
More information