Microbial metabolisms in a new 2.5 km deep ecosystem created by hydraulic fracturing in shales


R.A. Daly 1, M.A. Borton 1, M.J. Wilkins 1,2, D.W. Hoyt 3, D.J. Kountz 1, R.A. Wolfe 1, S.A. Welch 2, D.N. Marcus 1, R.V. Trexler 4, J.D. MacRae 5, J.A. Krzycki 1, D.R. Cole 2, P.J. Mouser 4, K.C. Wrighton 1

1 Department of Microbiology, The Ohio State University, Columbus, OH, 43214, USA
2 School of Earth Sciences, The Ohio State University, Columbus, OH 43214, USA
3 EMSL, Pacific Northwest National Laboratory, Richland, WA 99352, USA
4 Department of Civil, Environmental, and Geodetic Engineering, The Ohio State University, Columbus, OH, 43214, USA
5 Department of Civil and Environmental Engineering, University of Maine, Orono, ME, 04469, USA

Daly et al. - Submission for consideration at Nature, Data File 3

Quality Control (Sickle)
sickle pe -f R1_All.fastq -r R2_All.fastq -t sanger -o R1_All_trimmed.fastq -p R2_All_trimmed.fastq -s R1R2_All_trimmed.fastq
fq2fa --merge --filter R1_All_trimmed.fastq R2_All_trimmed.fastq R1R2_All_trimmed.fa

Assembly (IDBA-UD)
fq2fa --merge --filter R1_All_trimmed.fastq R2_All_trimmed.fastq R1R2_All_trimmed.fa
idba_ud -r R1R2_All_trimmed.fa -o idba_assembled_output

Coverage calculation (Bowtie2)
bowtie2-build scaffold.fa scaffold_fa
bowtie2 --fast -p 33 -x scaffold_fa -S All_mappedtoall_paired.sam -1 R1_All_trimmed.fastq -2 R2_All_trimmed.fastq --un unmapped_paired.fq --al mapped_paired.fq
grep -v '^@' All_mappedtoall_paired.sam | awk '{count[$3]++} END { for ( j in count ) print j, "\t"count[j] }' | sort -rn -t $'\t' -k2,2 > R1R2_ALL_contig_reads_paired.txt

Subassemblies (using 10% and 8% of reads for T13 and T82 samples)

10% T13:
python /opt/scripts/bin/pullseq_random_fastq.py -i R1_All_trimmed.fastq -o R1_All_trimmed_10_percent.fastq -s 10
python /opt/scripts/bin/pullseq_random_fastq.py -i R2_All_trimmed.fastq -o R2_All_trimmed_10_percent.fastq -s 10
time fq2fa --merge --filter R1_All_trimmed_10_percent.fastq R2_All_trimmed_10_percent.fastq R1R2_All_trimmed_10_percent.fa
idba_ud -r R1R2_All_trimmed_10_percent.fa -o idba_assembled_10_percent_output

8% T82:
python /opt/scripts/bin/pullseq_random_fastq.py -i /home/projects/shale/marcellus_2013/sample_ /data_qc/ _tgtgaa _L003_R1_ALL_trimmed.fastq -o R1_All_trimmed_8_percent.fastq -s 12
python /opt/scripts/bin/pullseq_random_fastq.py -i /home/projects/shale/marcellus_2013/sample_ /data_qc/ _tgtgaa _L003_R2_ALL_trimmed.fastq -o R2_All_trimmed_8_percent.fastq -s 12
time fq2fa --merge --filter R1_All_trimmed_8_percent.fastq R2_All_trimmed_8_percent.fastq R1R2_All_trimmed_8_percent.fa
idba_ud -r R1R2_All_trimmed_8_percent.fa -o idba_assembled_8_percent_output

Annotation
pullseq.py -i scaffold.fa -m o contigs_1000.fa
prodigal -i contigs_1000.fa -o contigs_1000.genes -a contigs_1000.genes.faa -d contigs_1000.genes.fna -p meta -m
/opt/my_interproscan_ /interproscan /interproscan.sh -i contigs_1000.genes.faa -o combined.iprscan -f TSV -dp -appl TIGRFAM,Pfam,ProSiteProfiles,ProSitePatterns -iprlookup -goterms
usearch -ublast contigs_1000.genes.faa -db /ORG-Data/Database/UniRef/uniref90.udb -maxhits 1 -evalue blast6out renamed_ublast_uniref90.b6 renamed_ublast_uniref90.b6 > temp1
awk ' $12 > 60 { print $0 }' temp1 > renamed_ublast_uniref90.b6.bit_score60.b6
usearch -ublast contigs_1000.genes.faa -db /ORG-Data/Database/KEGG/kegg-allorgs_ pep.udb -maxhits 1 -evalue blast6out renamed_ublast_kegg.b6 renamed_ublast_kegg.b6 > temp2

awk ' $12 > 60 { print $0 }' temp2 > renamed_ublast_kegg.b6.bit_score60.b6
usearch -makeudb_ublast contigs_1000.genes.faa -output contigs_1000.genes.faa.udb
usearch -ublast /ORG-Data/Database/UniRef/uniref90.fasta -db contigs_1000.genes.faa.udb -maxhits 1 -evalue blast6out uniref90_ublast_renamed.b6 uniref90_ublast_renamed.b6 > temp4
awk ' $12 > 300 { print $0 }' temp4 > uniref90_ublast_renamed.b6_bit_score_300
usearch -ublast /ORG-Data/Database/KEGG/kegg-all-orgs_ pep -db contigs_1000.genes.faa.udb -maxhits 1 -evalue blast6out KEGG_ublast_renamed.b6 KEGG_ublast_renamed.b6 > temp3
awk ' $12 > 300 { print $0 }' temp3 > KEGG_ublast_renamed.b6_BIT_SCORE_300
rbh.rb --forward renamed_ublast_uniref90.b6.bit_score60.b6 --reverse uniref90_ublast_renamed.b6_bit_score_300 > renamed.unirbh.txt
rbh.rb --forward renamed_ublast_kegg.b6.bit_score60.b6 --reverse KEGG_ublast_renamed.b6_BIT_SCORE_300 > renamed.keggrbh.txt
perl /ORG-Data/scripts/bin/Phylogeny_Protpipe/perl1.pl renamed_ublast_uniref90.b6.bit_score60.b6 > renamed_ublast_uniref90.b6.bit_score60.b6.out1.txt
perl /ORG-Data/scripts/bin/Phylogeny_Protpipe/perl1.pl renamed_ublast_kegg.b6.bit_score60.b6 > renamed_ublast_kegg.b6.bit_score60.b6.out1.txt
perl /ORG-Data/scripts/bin/Phylogeny_Protpipe/perl2.pl renamed.unirbh.txt > renamed.unirbh.txt.out1.txt
perl /ORG-Data/scripts/bin/Phylogeny_Protpipe/perl2.pl renamed.keggrbh.txt > renamed.keggrbh.txt.out1.txt
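The rbh.rb calls above take a bit-score-filtered forward table (predicted genes searched against a reference database) and a reverse table (reference sequences searched against the predicted genes). The source of rbh.rb is not part of this file, so the following Python sketch is only an illustration of a generic reciprocal-best-hit filter over 12-column tabular hit files, not the script's actual logic. The function names, and the choice to pass the 60 and 300 bit-score cutoffs as parameters, are assumptions made for the example.

```python
import csv
import io


def best_hits(b6_text, min_bits=0.0):
    """Map each query to (subject, bit score) for its highest-scoring hit.

    Expects 12-column tabular hit text; the bit score is column 12.
    Hits are kept only when the bit score is strictly above min_bits,
    mirroring the awk '$12 > N' filters in the listing.
    """
    best = {}
    for row in csv.reader(io.StringIO(b6_text), delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        query, subject, bits = row[0], row[1], float(row[11])
        if bits <= min_bits:
            continue
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    return best


def reciprocal_best_hits(forward_text, reverse_text, fwd_min=60.0, rev_min=300.0):
    """Return sorted (gene, reference) pairs that are each other's best hit."""
    fwd = best_hits(forward_text, fwd_min)   # gene -> best reference
    rev = best_hits(reverse_text, rev_min)   # reference -> best gene
    pairs = []
    for gene, (ref, _bits) in sorted(fwd.items()):
        if ref in rev and rev[ref][0] == gene:
            pairs.append((gene, ref))
    return pairs
```

A gene is reported only when its best reference hit points back to the same gene in the reverse search, which is why the listing runs usearch in both directions before calling rbh.rb.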

perl /ORG-Data/scripts/bin/Phylogeny_Protpipe/perl4_NEW.pl renamed_ublast_uniref90.b6.bit_score60.b6.out1.txt renamed_ublast_kegg.b6.bit_score60.b6.out1.txt renamed.unirbh.txt.out1.txt renamed.keggrbh.txt.out1.txt combined.iprscan > ANNOTATION_OUT_contigs_1000.genes.faa.3.txt
grep "RBH" ANNOTATION_OUT_contigs_1000.genes.faa.3.txt | grep "db=kegg" > RBH_KEGG1
grep "RBH" ANNOTATION_OUT_contigs_1000.genes.faa.3.txt | grep "db=uniref" >> RBH_KEGG1
grep "BLAST" ANNOTATION_OUT_contigs_1000.genes.faa.3.txt | grep "db=kegg" >> RBH_KEGG1
grep "BLAST" ANNOTATION_OUT_contigs_1000.genes.faa.3.txt | grep "db=uniref" >> RBH_KEGG1
grep "IPRSCAN" ANNOTATION_OUT_contigs_1000.genes.faa.3.txt >> RBH_KEGG1
python /ORG-Data/scripts/bin/Phylogeny_Protpipe/pull_all_contig_annotations.py -i RBH_KEGG1 -o ANNOTATION_OUT_contigs_1000.genes.faa.3.txt_FINAL.txt
perl /ORG-Data/scripts/bin/Phylogeny_Protpipe/perl6.pl ANNOTATION_OUT_contigs_1000.genes.faa.3.txt_FINAL.txt > T0.ANNOTATION_OUT.txt_FINAL_RANKED.txt
python /ORG-Data/scripts/bin/Phylogeny_Protpipe/write_annotation_to_fasta.py -i contigs_1000.genes.faa -o contigs_1000.genes.faa.3.4 -a ANNOTATION_OUT_contigs_1000.genes.faa.3.txt_FINAL.txt -j T0
python /ORG-Data/scripts/bin/Phylogeny_Protpipe/write_annotation_to_fasta.py -i contigs_1000.genes.faa -o contigs_1000.genes.fna.3.4 -a ANNOTATION_OUT_contigs_1000.genes.fna.3.txt_FINAL.txt -j T0
sed -i "s/^/t0_/g" T0.ANNOTATION_OUT.txt_FINAL_RANKED.txt
grep 'Unknown_Function' contigs_1000.genes.faa.3.4 > T0.contigs_1000.genes.faa.3.4.5_unknown_headers
sed -i "s/>//g" T0.contigs_1000.genes.faa.3.4.5_unknown_headers
awk -F' Unknown_Function' '{print $1 "\tf\tunknown Function"}' T0.contigs_1000.genes.faa.3.4.5_unknown_headers >> T0.ANNOTATION_OUT.txt_FINAL_RANKED.txt

Estimated genome completion (Amphora2)
perl /opt/amphora2/scripts/markerscanner.pl -DNA ../Marinobacter_T13_10_percent.contigs_1000.genes.fna -Evalue 1e-3

perl /opt/amphora2/scripts/markeraligntrim.pl -WithReference -OutputFormat phylip
perl /opt/amphora2/scripts/phylotyping.pl -CPUs 20 > Marinobacter_T13_10_percent_phylotype_1e-20.result
python /opt/scripts/bin/phylogeny_protpipe/single_copy_genes_make_table.py -i Marinobacter_T13_10_percent_phylotype_1e-20.result -t Mixed

16S rRNA gene reconstruction from reads (EMIRGE)
longest_sequence_fastq.py -i R1_All_trimmed.fastq
longest_sequence_fastq.py -i R2_All_trimmed.fastq
emirge.py DIR -1 ../R1_All_trimmed.fastq -2 ../R2_All_trimmed.fastq -f /opt/emirgemaster/ssuref_111_candidate_db.fasta -b /opt/emirgemaster/ssu_candidate_db_btindex -l 114 -i 500 -s 150 -n 50 -a 20 --phred33
emirge_rename_fasta.py iter.50 > T0_renamed.fasta

Identification of CRISPR repeat and spacer sequences (CRASS)
crass ../r1r2_all_trimmed.fa
crisprtools stat -aph crass.crispr > crisprtools_stat.out
crisprtools extract -o crisprtools_extract -s -xc -d -f crass.crispr
cat crisprtools_extract/*_direct_repeats.fa > All_direct_repeats.fa
pullseq.py -i /home/projects/shales/hilary_morrison/project_dco_wrighton/sample_kelly_wrighton_1/r1r2_trimmed_assembled/scaffold.fa -m o contigs_5000.fa
makeblastdb -in contigs_5000.fa -dbtype nucl
blastn -db contigs_5000.fa -query All_T0_direct_repeats.fa -out DR_to_scaffolds_5000_blastn -outfmt 6 -num_threads 10 -evalue 1e-8
awk '{print $2 }' DR_to_scaffolds_5000_blastn > DR_Scaffolds_5000.txt
pullseq_header_name.py -i contigs_5000.fa -o scaffolds_5000_bactdr.fa -n DR_Scaffolds_5000_bact.txt -e F
pullseq_header_name.py -i contigs_5000.fa -o scaffolds_5000_minusbactdr.fa -n DR_Scaffolds_5000_bact.txt -e T

makeblastdb -in scaffolds_5000_minusbactdr.fa -dbtype nucl
cat crisprtools_extract/*_spacers.fa > All_spacers.fa
blastn -db scaffolds_5000_minusbactdr.fa -query All_T0_spacers.fa -out SP_to_scaffolds_5000_blastn -outfmt 6
crass_parsing.pl All_T0_direct_repeats.fa DR_to_scaffolds_5000_blastn > crass_summary_dr.txt
sed -i '/^$/d' crass_summary_dr.txt
sort -u -k2,2 crass_summary_dr.txt > crass_summary_dr1.txt
crass_parsing.pl All_T0_spacers.fa SP_to_scaffolds_5000_blastn > crass_summary_sp.txt
sed -i '/^$/d' crass_summary_sp.txt
sort -u -k2,2 crass_summary_sp.txt > crass_summary_sp1.txt
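In the CRASS section, scaffolds that match a direct repeat are separated from the rest before the spacers are searched against the remainder. The Python sketch below illustrates that partitioning on in-memory text: it collects subject IDs from column 2 of BLAST tabular output (as the awk '{print $2 }' step does) and splits FASTA records by membership. It is a stand-in under stated assumptions, not the pullseq_header_name.py script. In particular, reading -e F and -e T as "keep listed" and "exclude listed" is inferred only from the output file names, and all function names here are hypothetical.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def hit_subjects(blast6_text):
    """Collect subject IDs (column 2) from BLAST tabular (outfmt 6) text."""
    subjects = set()
    for line in blast6_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2:
            subjects.add(fields[1])
    return subjects


def partition_scaffolds(fasta_text, blast6_text):
    """Split scaffolds into (with_hits, without_hits) dicts keyed by scaffold ID.

    The scaffold ID is taken as the first whitespace-delimited token of the
    FASTA header, which is how BLAST reports subject IDs by default.
    """
    subjects = hit_subjects(blast6_text)
    with_hits, without_hits = {}, {}
    for header, seq in read_fasta(fasta_text):
        tokens = header.split()
        scaffold_id = tokens[0] if tokens else ""
        target = with_hits if scaffold_id in subjects else without_hits
        target[scaffold_id] = seq
    return with_hits, without_hits
```

Searching spacers only against the scaffolds without repeat hits, as the listing does, keeps matches to the CRISPR arrays themselves out of the spacer results.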
