
Skin fungal community: the effects of hosts, co-colonizing bacteria, and environmental fungi in shaping and expanding the continental pan-mycobiome

Marcus H. Y. Leung 1, Kelvin C. K. Chan 2, and Patrick K. H. Lee 1 *

1 School of Energy and Environment, City University of Hong Kong, Hong Kong
2 SeqMatic LLC, Fremont, CA, 94539, United States of America

*Correspondence: B5423-AC1, School of Energy and Environment, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong; patrick.kh.lee@cityu.edu.hk; Tel: (852) ; Fax: (852)

This document contains the original in-house codes and scripts used to generate the results (including Fig. 1, 2, 3, 5 and Additional Files 1-5 and 7-8) described in the manuscript. The document is divided into the following sections:

1) In-house script for read quality-filtering
2) In-house scripts for OTU-clustering and quality control, including chimera and contaminants removal
3) In-house scripts for alpha-diversity analyses
4) In-house scripts for beta-diversity analysis
5) In-house scripts for cross-kingdom alpha/beta-diversity comparisons
6) In-house script for taxonomic analysis
7) In-house scripts for Malassezia species-level taxonomic analysis
8) In-house script for multi-study comparison plot generation

Please note that although the in-house scripts required to generate the results and figures are presented in this document, some minor tasks (such as table reformatting) were performed in Microsoft Excel, as described below. For example, the average alpha-diversity values for each sample following ten rounds of rarefaction were determined using the Pivot Table function in Excel. Also note that the scripts will not necessarily run unmodified on a different computer. Readers are responsible for understanding the scripts included here and modifying them accordingly.

Some of the R packages required for the following scripts are:
- devtools
- wilkoxmisc
- ggplot2
- reshape2
- plyr
- pgirmess

They can be installed from within R, for example:

    install.packages("pgirmess")

1) In-house script for read quality-filtering

Preparation of fastq/fasta reads for read quality control and OTU clustering. After the forward and reverse reads of each sample were merged with FLASH (one merged-reads file per sample), the merged reads from all samples were concatenated into a single fastq file:

1.1)
    cat *.fastq > combined.fastq

The resulting combined fastq file was used as input for the read quality-control steps, which were run with usearch commands as described in the Methods section of the main text.

2) In-house scripts for OTU-clustering and quality control, including chimera and contaminants removal

OTU-clustering: following read quality-filtering and demultiplexing with usearch, the usearch cluster_otus command was used to generate an OTU fasta file. With the OTU fasta file as input, the following perl script was used to generate a fasta file in which the OTUs are renamed with sequential numbers, together with a tab-delimited file that maps each OTU number to its reference sequence and size:

2.1) assign_otu_numbers.pl

    #!/usr/bin/perl
    use Modern::Perl 2014;
    use autodie;
    $|++;

    open IN,  '<', 'OTUs.fasta';
    open OUT, '>', 'OTUs_numbered.fasta';
    open MAP, '>', 'OTU_to_reference_sequence.tidy.txt';
    say MAP "OTU\tReferenceSequence\tSize";

    my $OTU = 0;
    while (<IN>) {
        chomp;
        print "\r$. lines processed" unless $. % 1000;
        # Header lines (">read;size=N") are renamed; sequence lines are copied
        if (/^>(?<read>.+);size=(?<size>\d+)$/) {
            $OTU++;
            say MAP "$OTU\t$+{read}\t$+{size}";
            say OUT ">OTU_$OTU";
        }
        else {
            say OUT $_;
        }
    }
    say "\r$. lines processed";
    close IN;
    close OUT;
    close MAP;

The renamed fasta file was used to perform taxonomic classification with QIIME's assign_taxonomy.py script. Following chimera detection with the usearch uchime_ref command, the script below was used to generate a txt file containing a list of chimeric OTUs:

2.2) make_chimeras_list.pl

    #!/usr/bin/perl
    use Modern::Perl 2014;
    use autodie;
    $|++;

    open IN,  '<', './chimeras.fasta';
    open OUT, '>', 'chimeras.tidy.txt';
    say OUT "OTU";
    while (<IN>) {
        chomp;
        next unless /^>/;    # only header lines carry OTU names
        (my $OTU) = $_ =~ /^>(.+)/;
        say OUT $OTU;
    }
    close IN;
    close OUT;

The OTU table was prepared by compiling the following input files:
-OTU_to_reference_sequence.tidy.txt (output from 2.1)
-OTUs_numbered_tax_assignments.txt (output of the assign_taxonomy.py QIIME script)
-readmap.uc (output of the usearch usearch_global command)

The script generates the following output files:
-OTU_table.tidy.txt (OTU table, still including chimeric and contaminant OTUs)
-singletons.txt (txt file containing a list of singleton OTUs)

2.3) prepare_otu_table.pl

    #!/usr/bin/perl
    use Modern::Perl 2014;
    use autodie;
    $|++;

    # Load OTU reference sequences
    say "Loading OTU reference sequences";
    open OTUREFMAP, '<', './OTU_to_reference_sequence.tidy.txt';
    my %OTUofRefSeq;
    while (<OTUREFMAP>) {
        next if $. == 1;    # skip header line
        chomp;
        print "\r$. lines processed" unless $. % 1000;
        (my $OTU, my $refSeq) = split(/\t/, $_);
        $OTUofRefSeq{$refSeq} = $OTU;
    }
    say "\r$. lines processed";
    close OTUREFMAP;

    # Load OTU taxonomies
    say "Loading OTU taxonomies";
    open TAX, '<', './OTUs_numbered_tax_assignments.txt';
    my %taxonomy;
    while (<TAX>) {
        chomp;
        print "\r$. lines processed" unless $. % 1000;
        my @line = split /\t/, $_;
        my $OTU = $line[0];
        my @lineage;
        if ($line[1] eq 'Unassigned') {
            @lineage = ('') x 7;
        }
        else {
            @lineage = split(/;\s/, $line[1]);
            # NOTE: the source breaks off at "s/^." here. The rest of this loop
            # is an assumed reconstruction: rank prefixes such as "k__" are
            # stripped and the lineage is stored for each OTU.
            s/^.__// for @lineage;
        }
        $taxonomy{$OTU} = \@lineage;
    }
    say "\r$. lines processed";
    close TAX;

    # Count reads for each OTU
    say "Counting reads for each OTU";
    open READMAP, '<', 'readmap.uc';
    my %readCount;
    my %OTUReadCount;
    while (<READMAP>) {
        chomp;
        print "\r$. lines processed" unless $. % 1000;
        next if /^N/;    # skip reads with no hit
        my @line = split(/\t/, $_);
        # NOTE: the array slice and the delimiter in the sample pattern were
        # lost in the source. Assumed here: .uc columns 9 and 10 hold the read
        # and OTU labels, and the sample name precedes a "|" in the read label.
        (my $read, my $OTU) = @line[8, 9];
        (my $sample) = $read =~ /^([^\|]+)/;
        $readCount{$OTU}{$sample}++;
        $OTUReadCount{$OTU}++;
    }
    say "\r$. lines processed";
    close READMAP;

    # Produce list of singletons
    say "Producing list of singletons";
    my %singletons;
    open SINGLETONS, '>', 'singletons.txt';
    say SINGLETONS "OTU";
    foreach my $OTU (keys %OTUReadCount) {
        if ($OTUReadCount{$OTU} == 1) {
            say SINGLETONS $OTU;
            $singletons{$OTU} = 1;
        }
    }
    close SINGLETONS;

    # Produce OTU table
    say "Producing OTU table";
    open OTUTABLE, '>', 'OTU_table.tidy.txt';
    say OTUTABLE "OTU\tSample\tCount\tKingdom\tPhylum\tClass\tOrder\tFamily\tGenus\tSpecies";
    foreach my $OTU (sort keys %readCount) {
        # Skip singletons
        next if exists $singletons{$OTU};
        foreach my $sample (sort keys %{ $readCount{$OTU} }) {
            # NOTE: the end of this statement was lost in the source; appending
            # the tab-joined lineage is an assumed reconstruction.
            say OTUTABLE "$OTU\t$sample\t$readCount{$OTU}{$sample}\t",
                join("\t", @{ $taxonomy{$OTU} // [] });
        }
    }
    close OTUTABLE;

After the OTU table is created in 2.3), contaminant OTUs must be identified so that they can later be removed from the table. Contaminants are detected as lineages whose relative abundance in the negative controls exceeds 5%. The script below takes the OTU_table.tidy.txt file from 2.3) as input and generates two output files:
-contaminant_lineages.tidy.txt (a list of lineages deemed contaminants)
-contaminants.txt (a list of OTUs deemed contaminants)

2.4) classify_contaminants.r

    # Libraries
    library(wilkoxmisc)
    library(plyr)    # provides ddply()

    # List of blank samples
    BlankSamples <- c("name_of_negative_sample(s)")

    # Read in OTU table
    OTUTable <- read.tidy("OTU_table.tidy.txt")

    # Collapse by lineage
    OTUTable <- within(OTUTable, Lineage <- factor(paste(Kingdom, Phylum, Class, Order, Family, Genus, Species)))
    OTUsByLineage <- unique(OTUTable[, c("OTU", "Lineage")])
    OTUTable <- ddply(OTUTable, .(Sample, Lineage), summarise, Count = sum(Count), .progress = "time")

    # Select blank samples
    Blank <- subset(OTUTable, Sample %in% BlankSamples)

    # Add relative abundances
    Blank <- add.relative.abundance(Blank)

    # Aggregate
    Blank <- ddply(Blank, .(Lineage), summarise, RelativeAbundance = sum(RelativeAbundance))

    # Calculate value for cutoff
    Cutoff <- sum(Blank$RelativeAbundance) * 0.05

    # Trim contaminant list to lineages above cutoff
    Blank <- Blank[which(Blank$RelativeAbundance > Cutoff), ]

    # Write contaminant lineages to file
    write.tidy(Blank, "contaminant_lineages.tidy.txt")

    # Sort OTUs into Contaminant/Non-contaminant
    Contaminants <- within(OTUsByLineage, Contaminant <- ifelse(Lineage %in% Blank$Lineage, "Contaminant", "Noncontaminant"))
    Contaminants$Lineage <- NULL

    # Write to file
    write.tidy(Contaminants, "contaminants.txt")

Having identified chimeric OTUs (chimeras.tidy.txt from 2.2) and contaminant OTUs (contaminants.txt from 2.4), these files are used to identify the OTUs to be removed from OTU_table.tidy.txt. The output file, OTU_table_clean.tidy.txt, contains only high-quality, non-chimeric, and non-contaminant OTUs.
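The contaminant cutoff and the subsequent clean-up can be illustrated with a minimal sketch. It is written in Python (standard library only) rather than R, and is not the authors' script. The function names, the dictionary-based inputs, and the toy data are hypothetical, and relative abundance is assumed to be computed per blank sample before being summed per lineage.

```python
from collections import defaultdict


def contaminant_lineages(blank_counts, cutoff_fraction=0.05):
    """Return lineages whose summed relative abundance in the blanks exceeds the cutoff.

    blank_counts maps (blank_sample, lineage) -> read count.
    """
    sample_totals = defaultdict(int)
    for (sample, _), count in blank_counts.items():
        sample_totals[sample] += count

    # Relative abundance within each blank, summed per lineage across blanks
    lineage_abundance = defaultdict(float)
    for (sample, lineage), count in blank_counts.items():
        lineage_abundance[lineage] += count / sample_totals[sample]

    cutoff = sum(lineage_abundance.values()) * cutoff_fraction
    return {lin for lin, ab in lineage_abundance.items() if ab > cutoff}


def clean_otu_table(rows, chimeras, contaminants, blank_samples):
    """Drop blank samples, then drop rows whose OTU is chimeric or contaminant."""
    clean = []
    for row in rows:
        if row["Sample"] in blank_samples:
            continue
        if row["OTU"] in chimeras or row["OTU"] in contaminants:
            continue
        clean.append(row)
    return clean


# Toy data (hypothetical): one blank dominated by lineage "A"
blank_counts = {("Blk", "A"): 90, ("Blk", "B"): 8, ("Blk", "C"): 2}
flagged = contaminant_lineages(blank_counts)

rows = [
    {"OTU": "OTU_1", "Sample": "S1", "Count": 40},
    {"OTU": "OTU_2", "Sample": "S1", "Count": 5},
    {"OTU": "OTU_3", "Sample": "S1", "Count": 7},
    {"OTU": "OTU_1", "Sample": "Blk", "Count": 3},
]
clean = clean_otu_table(rows, chimeras={"OTU_2"}, contaminants={"OTU_3"},
                        blank_samples={"Blk"})
print(sorted(flagged), [row["OTU"] for row in clean])
```

In this toy example lineages "A" and "B" exceed the 5% cutoff, and only OTU_1 in sample S1 survives the clean-up.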

2.5) clean_otu_table.r

    # Libraries
    library(wilkoxmisc)

    # Read in OTU table
    OTUTable <- read.tidy("OTU_table.tidy.txt")

    # Add fate column
    OTUTable$Fate <- rep(NA, nrow(OTUTable))

    # BLANK SAMPLES (replace with the names of your own negative controls)
    BlankSamples <- c("Blk", "Blk2")

    # Remove blank samples
    OTUTable <- OTUTable[which(! OTUTable$Sample %in% BlankSamples), ]

    # CHIMERAS
    ## Read in list of chimeras
    Chimeras <- read.tidy("chimeras.tidy.txt")
    Chimeras <- as.character(Chimeras$OTU)

    # Mark chimeric OTUs
    OTUTable$Fate <- ifelse(OTUTable$OTU %in% Chimeras & is.na(OTUTable$Fate), 'Chimera', OTUTable$Fate)

    # CONTAMINANTS
    ## Read in list of contaminants
    Contaminants <- read.tidy("contaminants.txt")
    Contaminants <- as.character(Contaminants[which(Contaminants$Contaminant == 'Contaminant'), "OTU"])

    # Mark contaminant OTUs
    OTUTable$Fate <- ifelse(OTUTable$OTU %in% Contaminants & is.na(OTUTable$Fate), 'Contaminant', OTUTable$Fate)

    ## OUTPUT
    # Summarise OTUs by fate and write to file
    OTUFates <- unique(OTUTable[c("OTU", "Fate")])
    write.tidy(OTUFates, "OTU_fates.tidy.txt")

    # Remove failures from OTU table and write to file
    OTUTable <- OTUTable[which(is.na(OTUTable$Fate)), ]
    OTUTable$Fate <- NULL
    write.tidy(OTUTable, "OTU_table_clean.tidy.txt")

The clean OTU table is then cast into a format readable by the biom_convert command in QIIME:

2.6) cast_otu_table.r

    # Libraries
    library(wilkoxmisc)
    library(reshape2)

    # Read in OTU table
    OTUCounts <- read.tidy("OTU_table_clean.tidy.txt")

    # Cast from long format (one row per OTU and sample) to wide format (one column per sample)
    OTUCounts <- dcast(OTUCounts, OTU ~ Sample, value.var = "Count", fill = 0)

    # Write
    write.tidy(OTUCounts, "OTU_table_clean.cast.txt")

The output OTU_table_clean.cast.txt can be used as input for the biom_convert command in QIIME.
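The long-to-wide cast in 2.6) can be sketched as follows. This is a minimal Python illustration of the reshaping, not a replacement for dcast. The function name and toy rows are hypothetical, and each OTU/sample pair is assumed to occur at most once in the tidy table.

```python
def cast_otu_table(rows):
    """Turn tidy rows (OTU, Sample, Count) into a header and a wide table.

    Missing OTU/sample combinations are filled with 0, as with fill = 0 in 2.6).
    """
    samples = sorted({row["Sample"] for row in rows})
    otus = sorted({row["OTU"] for row in rows})
    counts = {(row["OTU"], row["Sample"]): row["Count"] for row in rows}

    header = ["OTU"] + samples
    table = [[otu] + [counts.get((otu, s), 0) for s in samples] for otu in otus]
    return header, table


# Toy data (hypothetical)
rows = [
    {"OTU": "OTU_1", "Sample": "S1", "Count": 4},
    {"OTU": "OTU_1", "Sample": "S2", "Count": 1},
    {"OTU": "OTU_2", "Sample": "S2", "Count": 9},
]
header, table = cast_otu_table(rows)
for line in [header] + table:
    print("\t".join(str(cell) for cell in line))
```

OTU_2 has no reads in S1, so its S1 cell is written as 0 rather than left empty.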

3) In-house scripts for alpha-diversity analyses

Following rarefaction with the QIIME scripts multiple_rarefactions.py and alpha_diversity.py, ten separate rarefied alpha-diversity txt files were created. The cat command was used to combine all ten files into a single file:

3.1)
    cat alpha_diversity_rarefied_files_*.txt > alpha_diversity_combined.txt

The table in alpha_diversity_combined.txt was reorganized in Microsoft Excel to obtain the average alpha-diversity measurements (Observed OTUs, Chao1, Simpson's, and Shannon) for each sample. This file of averages was saved as alpha_average_1175.txt. The data in Additional File 1 originate from this file. For the Mann-Whitney and Kruskal-Wallis statistical tests (data shown in Additional File 2), the commands wilcox.test and kruskal.test were used (these are standard R functions, not in-house code). The script also requires a Metadata txt file, which was created manually in Microsoft Excel during sample collection.

3.2) alpha_statistical_test.r

    library(wilkoxmisc)
    library(devtools)
    library(ggplot2)

    # Open data and metadata files and merge
    Alpha <- read.tidy("alpha_average_1175.txt")
    Meta <- read.tidy("Metadata.txt")
    Merge <- merge(Alpha, Meta, by = "Sample", all.x = TRUE)

    # Perform Kruskal-Wallis test or Mann-Whitney test for statistical
    # significance of average alpha-diversities between variables
    wilcox.test(Observed_OTU ~ Gender, data = Merge)
    kruskal.test(Observed_OTU ~ Age_Group, data = Merge)

    # For significant Kruskal-Wallis comparisons, the kruskalmc command in the
    # pgirmess package is used to perform multiple pairwise comparisons
    library(pgirmess)
    kruskalmc(Merge$Observed_OTU, Merge$Age_Group)

Using alpha_average_1175.txt as input, the following R script was used to generate plots with ggplot. The script generates the plots in Additional File 3. It also requires the Metadata txt file described above.

3.3) make_alpha_plot.r

    library(wilkoxmisc)
    library(devtools)
    library(ggplot2)

    # Open data and metadata files and merge
    Alpha <- read.tidy("alpha_average_1175.txt")
    Meta <- read.tidy("Metadata.txt")
    Merge <- merge(Alpha, Meta, by = "Sample", all.x = TRUE)

    # Plot boxplot based on comparisons (single example provided)
    Plot <- ggplot(Merge, aes(x = Age_Group, y = Observed_f))
    Plot <- Plot + geom_boxplot()
    Plot <- Plot + coord_flip()
    Plot <- Plot + xlab("Age Group") + ylab("Observed Number of OTUs")
    Plot <- Plot + theme_classic()
    Plot <- Plot + theme(legend.title = element_blank())
    Plot <- Plot + theme(axis.title = element_text(size = 18, face = "bold"))
    Plot <- Plot + theme(axis.text = element_text(size = 14, face = "bold"))
    Plot <- Plot + theme(legend.text = element_text(size = 14, face = "bold"))
    ggsave("age_group_observed_1175.png")
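The averaging over the ten rarefactions, performed with an Excel Pivot Table in the original workflow, can be sketched in a few lines of Python. This is an illustrative alternative, not the procedure the authors ran. The function name, the tuple layout, and the toy values are hypothetical.

```python
from collections import defaultdict


def average_alpha(records):
    """Average each alpha-diversity metric per sample across rarefactions.

    records is an iterable of (sample, metric, value) tuples, one per
    sample, metric, and rarefaction round.
    """
    sums = defaultdict(float)
    rounds = defaultdict(int)
    for sample, metric, value in records:
        sums[(sample, metric)] += value
        rounds[(sample, metric)] += 1
    return {key: sums[key] / rounds[key] for key in sums}


# Toy data (hypothetical): two rarefaction rounds for S1, one for S2
records = [
    ("S1", "Observed", 10.0),
    ("S1", "Observed", 12.0),
    ("S2", "Observed", 7.0),
    ("S1", "Shannon", 2.0),
    ("S1", "Shannon", 3.0),
]
averages = average_alpha(records)
print(averages[("S1", "Observed")])  # 11.0
```

Keying on (sample, metric) keeps the result in the same long layout as the other tidy tables in this document.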

4) In-house scripts for beta-diversity analysis

The QIIME script beta_diversity.py generates two output matrix files, one for Bray-Curtis (abundance-weighted, taxonomy-based beta-diversity analysis) and one for Jaccard (unweighted, taxonomy-based beta-diversity analysis). For each matrix file, the following script was used to compute the ANOSIM statistic and its statistical significance for predictive variables (data in Additional File 2).

4.1) anosim.r

    # Load libraries (these packages are required to draw PCoA plot and run ANOSIM)
    library(wilkoxmisc)
    library(ggplot2)
    library(devtools)
    library(vegan)    # provides anosim()

    # Open beta-diversity output matrix file, as well as metadata file
    Beta <- read.dist("bray_curtis_rarefied_otu_table_clean.txt")
    Meta <- read.tidy("../metadata/metadata_rarefied_1175_dot.txt")
    BetaMatrix <- as.matrix(Beta)

    # Perform ANOSIM
    # First list samples from both the beta-diversity matrix and the metadata,
    # to check that the two lists are identical
    AllSamples <- data.frame(Sample = row.names(BetaMatrix))
    AllSamples <- merge(AllSamples, Meta, by = "Sample", all.x = TRUE)

    # Check that the two lists are identical and in the same order
    # (output is either TRUE or FALSE)
    sum(row.names(BetaMatrix) == AllSamples$Sample) == length(AllSamples$Sample)

    # Perform ANOSIM based on grouping of your choice
    # (e.g. Gender/Household/Age_Group etc.)
    Beta <- as.dist(Beta)
    ANOSIM <- anosim(Beta, grouping = AllSamples$Gender)

    # Show ANOSIM statistic and significance
    ANOSIM$statistic
    ANOSIM$signif

4.2) inter_vs_intra_comparison.r

    # This script rearranges beta-diversity distance matrix files to compare
    # intra/inter-group community dissimilarities
    library(wilkoxmisc)
    library(reshape2)

    # Load binary (unweighted) Jaccard distances
    Beta <- as.matrix(read.dist("binary_jaccard_rarefied_otu_table_clean.txt"))

    # Melt
    Beta <- melt(Beta, value.name = "Distance")
    names(Beta)[1:2] <- c("Sample1", "Sample2")

    # Add household and type
    Samples <- read.tidy("Metadata.txt")[c("Sample", "Location", "Individual", "Area", "Anatomy", "Age_Group", "Gender")]
    Beta <- merge(Beta, Samples, by.x = "Sample1", by.y = "Sample", all.x = TRUE)
    names(Beta)[4:5] <- c("Location1", "Individual1")
    Beta <- merge(Beta, Samples, by.x = "Sample2", by.y = "Sample", all.x = TRUE)
    names(Beta)[6:7] <- c("Area1", "Anatomy1")
    names(Beta)[8:9] <- c("Age_Group1", "Gender1")
    names(Beta)[10:11] <- c("Location2", "Individual2")
    names(Beta)[12:13] <- c("Area2", "Anatomy2")
    names(Beta)[14:15] <- c("Age_Group2", "Gender2")

    # Remove self-self samples
    Beta <- Beta[which(! Beta$Sample1 == Beta$Sample2), ]

    # Label each pair as within-group or between-group for every factor
    Beta$HouseholdType <- ifelse(Beta$Location1 == Beta$Location2, "Within households", "Between households")
    Beta$IndividualType <- ifelse(Beta$Individual1 == Beta$Individual2, "Within Individuals", "Between Individuals")
    Beta$SiteType <- ifelse(Beta$Area1 == Beta$Area2, "Same Site", "Different Site")
    Beta$AnatomyType <- ifelse(Beta$Anatomy1 == Beta$Anatomy2, "Same Anatomical Site", "Different Anatomical Site")
    Beta$AgeType <- ifelse(Beta$Age_Group1 == Beta$Age_Group2, "Same Age Group", "Different Age Group")
    Beta$GenderType <- ifelse(Beta$Gender1 == Beta$Gender2, "Same Gender", "Different Gender")
    write.tidy(Beta, "Beta_Comparison_Jaccard.txt")

The resulting Beta_Comparison_Jaccard.txt file contains the Jaccard distance for each pair of samples, together with columns indicating whether the two samples belong to the same group for each factor. The average values for each group can be determined in Microsoft Excel, and the statistical significance of differences in mean beta dissimilarity between comparison groups can be determined with the wilcox.test and kruskal.test commands. Using Beta_Comparison_Jaccard.txt and the corresponding Bray-Curtis file, density plots can be constructed. The example below produces the density plot shown in Fig. 1b.

4.3) draw_density_plot.r

    library(wilkoxmisc)
    library(plyr)
    library(ggplot2)

    Beta <- read.tidy("Beta_Comparison_Jaccard.txt")
    mu <- ddply(Beta, "IndividualType", summarise, grp.mean = mean(Distance))

    # NOTE: this assumes IndividualType has been recoded to the three levels
    # below; 4.2) as written produces only two levels
    Beta$IndividualType <- factor(Beta$IndividualType, levels = c("Within Individuals", "Between Cohabitants", "Between Households"))

    Plot <- ggplot(Beta, aes(x = Distance, colour = IndividualType))
    Plot <- Plot + stat_density(aes(group = IndividualType, color = IndividualType), position = "identity", geom = "line", size = 2) + scale_y_continuous(expand = c(0, 0), limits = c(0, 10))
    # Vertical lines marking the mean of each group
    Plot <- Plot + geom_segment(aes(x = 0.744, y = 0, xend = 0.744, yend = 4.5), size = 1, linetype = "dotdash", color = "#e41a1c")
    Plot <- Plot + geom_segment(aes(x = 0.792, y = 0, xend = 0.792, yend = 7.0), size = 1, linetype = "dotdash", color = "#377eb8")
    Plot <- Plot + geom_segment(aes(x = 0.822, y = 0, xend = 0.822, yend = 7.0), size = 1, linetype = "dotdash", color = "#4daf4a")
    Plot <- Plot + theme_classic()
    Plot <- Plot + ylab("Density (%)") + xlab("Normalized Binary Jaccard Distance")
    Plot <- Plot + theme(axis.title = element_text(size = 20))
    Plot <- Plot + theme(legend.title = element_blank())
    Plot <- Plot + theme(legend.text = element_text(size = 14, face = "bold"))
    Plot <- Plot + theme(axis.text = element_text(size = 16, face = "bold"))
    Plot <- Plot + theme(legend.position = "bottom")
    Plot <- Plot + scale_colour_brewer(palette = "Set1")
    Plot <- Plot + guides(fill = guide_legend(override.aes = list(colour = NULL)))
    # Labels for the group means (geom_text takes fontface, not face)
    Plot <- Plot + geom_text(aes(0.70, 6, label = "mean = 0.744", angle = 315), color = "#e41a1c", size = 6, fontface = "bold")
    Plot <- Plot + geom_text(aes(0.75, 8.5, label = "mean = 0.792", angle = 315), color = "#377eb8", size = 6, fontface = "bold")
    Plot <- Plot + geom_text(aes(0.80, 8.5, label = "mean = 0.822", angle = 315), color = "#4daf4a", size = 6, fontface = "bold")
    ggsave("density_individual_jaccard.png", width = 8, height = 7)

To sub-select data by a particular factor (e.g. to divide the density plot data by skin site), create separate txt files containing only the data for each level of that factor (e.g. Additional Files 4 and 5).

4.4) subdivide_beta_data.r

    library(wilkoxmisc)

    Beta <- read.tidy("Beta_Comparison_Jaccard.txt")
    SameSite <- Beta[which(Beta$SiteType == "Same Site"), ]

    # NOTE: the source refers to a column Site1, which 4.2) does not create;
    # Area1 (the column behind SiteType) is assumed here
    Forehead <- SameSite[which(SameSite$Area1 == "Forehead"), ]
    LeftForearm <- SameSite[which(SameSite$Area1 == "Left Forearm"), ]
    RightForearm <- SameSite[which(SameSite$Area1 == "Right Forearm"), ]
    LeftPalm <- SameSite[which(SameSite$Area1 == "Left Palm"), ]
    RightPalm <- SameSite[which(SameSite$Area1 == "Right Palm"), ]

    write.tidy(Forehead, "Jaccard_Forehead.txt")
    write.tidy(LeftForearm, "Jaccard_LF.txt")
    write.tidy(RightForearm, "Jaccard_RF.txt")
    write.tidy(LeftPalm, "Jaccard_LP.txt")
    write.tidy(RightPalm, "Jaccard_RP.txt")

Each resulting output file can be used as input to 4.3) to generate the data and plots shown in Additional Files 2, 4, and 5.
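The within-group versus between-group comparison of 4.2), including the group means that the original workflow computed in Excel, can be sketched as below. This is a minimal Python illustration under stated assumptions, not the authors' code. The function name, the input dictionaries, and the toy distances are hypothetical.

```python
from statistics import mean


def summarise_pairs(distances, group_of):
    """Mean dissimilarity for within-group and between-group sample pairs.

    distances maps (sample1, sample2) tuples to a dissimilarity value.
    group_of maps each sample to its group (e.g. the individual sampled).
    """
    buckets = {"Within": [], "Between": []}
    for (sample1, sample2), distance in distances.items():
        if sample1 == sample2:
            continue  # remove self-self comparisons, as in 4.2)
        same = group_of[sample1] == group_of[sample2]
        buckets["Within" if same else "Between"].append(distance)
    return {label: mean(values) for label, values in buckets.items() if values}


# Toy data (hypothetical): samples A1 and A2 come from the same individual
group_of = {"A1": "A", "A2": "A", "B1": "B"}
distances = {
    ("A1", "A1"): 0.0,
    ("A1", "A2"): 0.5,
    ("A1", "B1"): 0.75,
    ("A2", "B1"): 1.0,
}
summary = summarise_pairs(distances, group_of)
print(summary)
```

The same function can be reused for any factor (household, site, age group, gender) by passing a different `group_of` mapping.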

5) In-house scripts for cross-kingdom alpha/beta-diversity comparisons

Cross-domain alpha analysis was performed by first merging the bacterial alpha-diversity data from a previous publication (footnote 1) with the fungal data from this work. The plot shown in Fig. 2a was produced by the code below. The Spearman correlation values shown in Additional File 1, tab Cross-Domain Alpha Correlation, were computed as shown below.

Footnote 1: Supplementary Data 5 file from Leung MHY, Wilkins D, and Lee PKH. Sci Rep. 2015;5:

5.1) cross_domain_alpha.r

    library(wilkoxmisc)
    library(ggplot2)

    Fungus <- read.tidy("alpha_average_1175.txt")
    Bacteria <- read.tidy("alpha_average_bacteria.txt")

    # Merge fungal and bacterial data, and include only samples with alpha data
    # for both domains (some samples were removed from both studies following
    # insufficient reads for normalization)
    Table <- merge(Fungus, Bacteria, by = "Sample", all.x = FALSE)

    # Add metadata
    Meta <- read.tidy("Metadata.txt")
    Table <- merge(Table, Meta, by = "Sample", all.x = FALSE)
    write.tidy(Table, "fun_bac_alpha.txt")

    # Plot
    Plot <- ggplot(Table, aes(x = Observed_f, y = Observed_b, colour = Anatomy))
    Plot <- Plot + geom_point(size = 4)
    Plot <- Plot + theme_classic()
    Plot <- Plot + xlab("Observed Number of Fungal OTUs") + ylab("Observed Number of Bacterial OTUs")
    Palette <- c("#9b30ff", "#20b2aa", "#ff4500")
    Plot <- Plot + scale_colour_manual(values = Palette)

    # Calculate slope, intercept, and draw linear regression line
    coef(lm(Observed_b ~ Observed_f, data = Table))
    # Output (numeric values missing in the source):
    # (Intercept)  Observed_f
    Plot <- Plot + geom_abline(intercept = 963, slope = 7.5, colour = "blue", size = 1)
    Plot <- Plot + theme(legend.title = element_blank())
    Plot <- Plot + theme(axis.title = element_text(size = 18, face = "bold"))
    Plot <- Plot + theme(axis.text = element_text(size = 14, face = "bold"))
    Plot <- Plot + theme(legend.text = element_text(size = 14, face = "bold"))
    ggsave("fun_bac_correlation.png")

    # Test for Spearman correlation
    cor.test(~ Observed_b + Observed_f, Table, method = "spearman")
    # Output (S and rho values missing in the source):
    #   Spearman's rank correlation rho
    #   data: Observed_b and Observed_f
    #   S = , p-value = 2.724e-07
    #   alternative hypothesis: true rho is not equal to 0
    #   sample estimates:
    #   rho

Correlation values and significance for the other comparisons shown in Additional File 1 (e.g. Chao1, Shannon, etc.) were computed in the same way, substituting the corresponding alpha indices in the cor.test command:

    cor.test(~ Chao1_b + Shannon_f, Table, method = "spearman")
    # Output (as transcribed; S, p-value and rho values missing in the source):
    #   Spearman's rank correlation rho
    #   data: Chao_b and Simpson_f
    #   S = , p-value =
    #   alternative hypothesis: true rho is not equal to 0
    #   sample estimates:
    #   rho

Similarly, for the cross-kingdom beta analysis, bacterial pairwise UniFrac data from the previous study (raw data from [link missing in the source]) were combined with the fungal data to perform correlation analysis. The bacterial and fungal beta-diversity dissimilarities for each sample pair are merged into one combined file:

5.2) cross_domain_beta.r

    library(wilkoxmisc)

    # Open fungal Bray-Curtis sample pairwise dissimilarity file
    betafung <- read.tidy("Beta_Comparison_Bray.txt")

    # Add column Comparison. This will be used as the common column for merging
    # with the bacterial beta-dissimilarity pairwise file
    betafung$Comparison <- paste0(betafung$Sample1, " vs. ", betafung$Sample2)
    write.tidy(betafung, "Beta_Comparison_Fungus.txt")

    # Open bacterial UniFrac sample pairwise dissimilarity file
    betabac <- read.tidy("Beta_Comparison_Bacteria.txt")

    # Add column Comparison. This will be used as the common column for merging
    # with the fungal beta-distance pairwise file
    betabac$Comparison <- paste0(betabac$Sample1, " vs. ", betabac$Sample2)
    write.tidy(betabac, "Beta_Comparison_Bacteria.txt")

In Microsoft Excel, keep only the columns Distance_b and Comparison of Beta_Comparison_Bacteria.txt, and save the result as Beta_Comparison_Bacteria_Simple.txt. Return to R, and merge Beta_Comparison_Fungus.txt and Beta_Comparison_Bacteria_Simple.txt:

    # Open fungal Bray-Curtis sample pairwise dissimilarity file
    # (the file written above, which contains the Comparison column; the
    # source reads the unmodified Bray-Curtis file here)
    betafung <- read.tidy("Beta_Comparison_Fungus.txt")

    # Open bacterial UniFrac sample pairwise dissimilarity file
    betabac <- read.tidy("Beta_Comparison_Bacteria_Simple.txt")

    Merge <- merge(betafung, betabac, by = "Comparison", all.x = TRUE)
    write.tidy(Merge, "Bacteria_Fungus_BC_Merged.txt")

The file Bacteria_Fungus_BC_Merged.txt is Additional File 1, tab Cross-Domain Beta Data. To calculate the cross-domain beta-diversity Spearman correlation for Within Individuals, Within Household, and Between Household, the cross-domain beta data need to be divided into these three groups. In all cases, Bacteria_Fungus_BC_Merged.txt was used as input:

5.3) cross_domain_beta_subselect.r

    library(wilkoxmisc)

    Table <- read.tidy("Bacteria_Fungus_BC_Merged.txt")

    # Within individual
    Individual <- Table[which(Table$IndividualType == "Within Individuals"), ]

    # Within household
    Household <- Table[which(Table$IndividualType == "Between Individuals Within Households"), ]

    # Between household
    Different <- Table[which(Table$IndividualType == "Between Households"), ]

    # Save each file
    write.tidy(Individual, "spearman_individual.txt")
    write.tidy(Household, "spearman_household.txt")
    write.tidy(Different, "spearman_different.txt")

For each output file, remove all columns except the two containing the actual beta values (one for bacteria, one for fungi). Rename the two columns as col1 and col2, and save under the same file name with _simple added (e.g. spearman_different_simple.txt). Return to R, and calculate the Spearman correlation for each comparison group:

5.4) cross_domain_spearman.r

    library(wilkoxmisc)

    Ind <- read.tidy("spearman_individual_simple.txt")
    cor.test(Ind$col1, Ind$col2, method = "spearman")
    # Output (S and rho values missing in the source):
    #   Spearman's rank correlation rho
    #   data: Ind$col1 and Ind$col2
    #   S = , p-value = 1.172e-12
    #   alternative hypothesis: true rho is not equal to 0
    #   sample estimates:
    #   rho

    House <- read.tidy("spearman_household_simple.txt")
    cor.test(House$col1, House$col2, method = "spearman")
    # Output (as transcribed; S and rho values missing in the source):
    #   Spearman's rank correlation rho
    #   data: House$beta_f and House$beta_b
    #   S = , p-value = 8.074e-11
    #   alternative hypothesis: true rho is not equal to 0
    #   sample estimates:
    #   rho

    Diff <- read.tidy("spearman_different_simple.txt")
    cor.test(Diff$col1, Diff$col2, method = "spearman")
    # Output (mantissa of S and rho value missing in the source):
    #   Spearman's rank correlation rho
    #   data: Diff$col1 and Diff$col2
    #   S = [missing]e+12, p-value < 2.2e-16
    #   alternative hypothesis: true rho is not equal to 0
    #   sample estimates:
    #   rho

For the other group comparisons and their correlations shown in Additional File 1, tab Cross-Domain Beta Correlation, sub-select the comparison group in question and repeat script 5.3).

To determine the linear regression slope and intercept, and to plot Fig. 2b-d (the example below is for Fig. 2d):

5.5) plot_beta_regression.r

    library(wilkoxmisc)
    library(ggplot2)

    Diff <- read.tidy("spearman_different_simple.txt")
    Fit <- coef(lm(col2 ~ col1, data = Diff))
    Fit
    # Output (numeric values missing in the source):
    # (Intercept)  col1

    Different <- read.tidy("spearman_different.txt")
    Plot <- ggplot(Different, aes(x = beta_f, y = beta_b, colour = IndividualType))
    Plot <- Plot + geom_point(size = 1)
    Palette <- c("orange")
    Plot <- Plot + scale_colour_manual(values = Palette)
    Plot <- Plot + theme_classic()
    Plot <- Plot + xlab("Fungal Bray-Curtis Dissimilarity Between Samples") + ylab("Bacterial UniFrac Distance Between Samples")
    Plot <- Plot + theme(legend.title = element_blank())
    Plot <- Plot + theme(axis.title = element_text(size = 18, face = "bold"))
    Plot <- Plot + theme(axis.text = element_text(size = 14, face = "bold"))
    Plot <- Plot + theme(legend.text = element_text(size = 14, face = "bold"))
    # NOTE: the slope value is missing in the source. It is taken here from the
    # fitted model, assuming col1 holds the fungal (x-axis) values
    Plot <- Plot + geom_abline(intercept = 0.236, slope = Fit[["col1"]], colour = "purple", size = 1)
    Plot <- Plot + theme(legend.position = "bottom")
    ggsave("spearman_bacteria_fungus_different.png")
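To make explicit what the Spearman correlation used throughout this section measures, the sketch below computes rho as the Pearson correlation of rank-transformed values, with tied values given the mean of their ranks. It is a standard-library Python illustration with hypothetical function names and toy data. It does not compute the p-value reported by cor.test.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks of x and y."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rank_x) / n, sum(rank_y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    var_x = sum((a - mean_x) ** 2 for a in rank_x)
    var_y = sum((b - mean_y) ** 2 for b in rank_y)
    return covariance / (var_x * var_y) ** 0.5


# Toy data (hypothetical): paired dissimilarities for five sample pairs
fungal = [0.61, 0.70, 0.75, 0.82, 0.90]
bacterial = [0.30, 0.35, 0.50, 0.55, 0.80]
print(spearman_rho(fungal, bacterial))  # 1.0, both series increase together
```

Because only ranks enter the calculation, rho is unaffected by the different scales of Bray-Curtis and UniFrac values.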

6) In-house script for taxonomic analysis

The R script takes the clean OTU table from 2.5) as input, and creates a txt file containing the top taxa at a particular taxonomic rank. This output can then be used to construct a plot with ggplot. The script also requires the Metadata txt file, containing sample information, which was created in Microsoft Excel during sample collection.

6.1) make_tax_plot.r

    library(wilkoxmisc)
    library(reshape2)
    library(plyr)    # provides ddply()
    library(ggplot2)
    library(grid)    # provides unit()

    # Open taxonomy OTU table
    OTU <- read.tidy("OTU_table_clean.tidy.txt")
    Meta <- read.tidy("Metadata.txt")

    # Tabulate read counts by genus
    OTU <- ddply(OTU, .(Sample, Genus), summarise, Count = sum(Count))

    # Convert count to relative abundance (%) and add column
    OTU <- ddply(OTU, .(Sample), mutate, RelativeAbundance = (Count * 100) / sum(Count))

    # Collapse taxa table to only the top phyla, genera, families, etc.
    # (here the top 10 genera; requires reshape2)
    OTUTable <- collapse.taxon.table(OTU, n = 10, Rank = "Genus")

    # Merge relative abundance table and metadata table together
    OTUTable <- merge(OTUTable, Meta, by = "Sample", all.x = TRUE)
    write.tidy(OTUTable, "Top10Genus.txt")

    # Plot
    OTUTable <- read.tidy("Top10Genus.txt")
    OTUTable$Genus <- factor(OTUTable$Genus, levels = c("Aspergillus", "Candida", "Cryptococcus", "Malassezia", "Penicillium", "Sporobolomyces", "Unclassified Basidiomycota Genus", "Unclassified Saccharomycetales Genus", "Unclassified Sporidiobolales Genus", "Minor/Unclassified"))
    Plot <- ggplot(OTUTable, aes(x = Sample, y = RelativeAbundance, fill = Genus, order = -as.numeric(Genus)))
    Plot <- Plot + geom_bar(aes(x = as.factor(Sample)), stat = "identity", width = 1.5) + scale_y_continuous(expand = c(0, 0))
    Plot <- Plot + facet_grid(Anatomy ~ Location, scales = "free_x")
    Plot <- Plot + xlab("Sample by Household and Anatomy")
    Plot <- Plot + theme_classic()
    Plot <- Plot + scale_fill_brewer(palette = "Set3")
    Plot <- Plot + theme(axis.text.y = element_text(size = 8))
    Plot <- Plot + theme(axis.text.x = element_blank())
    Plot <- Plot + theme(axis.ticks.x = element_blank())
    # panel.margin was renamed panel.spacing in later ggplot2 releases
    Plot <- Plot + theme(panel.margin = unit(0.5, "lines"))
    Plot <- Plot + theme(axis.title = element_text(size = 16))
    Plot <- Plot + theme(legend.title = element_text(size = 12))
    Plot <- Plot + theme(legend.text = element_text(size = 10))
    ggsave("taxonomy_by_anatomy_location.png")

The plot from this script is shown in Fig
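The two data steps of 6.1), converting counts to per-sample percentages and collapsing all but the top taxa into a single category, can be sketched as follows. This is a Python illustration with hypothetical function names and toy counts. Ranking taxa by their summed relative abundance is an assumption, and collapse.taxon.table in wilkoxmisc may rank them differently.

```python
from collections import defaultdict


def relative_abundance(counts):
    """Convert read counts to percentages within each sample.

    counts maps (sample, taxon) -> number of reads.
    """
    totals = defaultdict(int)
    for (sample, _), n_reads in counts.items():
        totals[sample] += n_reads
    return {key: 100 * n_reads / totals[key[0]] for key, n_reads in counts.items()}


def collapse_to_top(rel, n, other="Minor/Unclassified"):
    """Keep the n taxa with the highest summed abundance; pool the rest."""
    taxon_totals = defaultdict(float)
    for (_, taxon), value in rel.items():
        taxon_totals[taxon] += value
    top = set(sorted(taxon_totals, key=taxon_totals.get, reverse=True)[:n])

    collapsed = defaultdict(float)
    for (sample, taxon), value in rel.items():
        collapsed[(sample, taxon if taxon in top else other)] += value
    return dict(collapsed)


# Toy data (hypothetical read counts)
counts = {
    ("S1", "Malassezia"): 60,
    ("S1", "Candida"): 30,
    ("S1", "Aspergillus"): 10,
    ("S2", "Malassezia"): 50,
    ("S2", "Candida"): 50,
}
rel = relative_abundance(counts)
top2 = collapse_to_top(rel, n=2)
print(top2)
```

With n = 2, Aspergillus falls outside the top taxa and its 10% in S1 is reported under "Minor/Unclassified".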

7) In-house scripts for Malassezia species-level taxonomic analysis

OTUs that were classified as Malassezia according to the curated database were identified using Microsoft Excel's pivot table function, where OTUs were selected as rows and the data table was filtered to show only OTUs where Genus = Malassezia. This list of OTUs was saved to a txt file (Malassezia_OTU.txt). Reads clustered into these OTUs then need to be separated from the other, non-Malassezia reads in the OTU fasta file. This can be performed using an in-house Perl script. In order for the script to work properly, the following input files are required:
- list of OTUs to be selected out (Malassezia OTUs)
- original OTU fasta file (containing all OTUs, output of usearch cluster_otus command)
The output will be a fasta file containing only OTUs classified as Malassezia. Please note that this script was used either when analyzing only Hong Kong data, or in conjunction with data from the United States during the multi-study analysis (Additional Files 7 and 8).

7.1) select_otus.pl

#!/usr/bin/perl
use Modern::Perl 2014;
use autodie;
use Getopt::Long;
use File::Slurp qw(read_file);

#Disable output buffering so that progress messages appear immediately
$|++;

my $USAGE = q/USAGE: perl select_otus.pl -o <list of OTUs> -f <reads fasta file> -u <output fasta file>
/;

my $OTUFile;
my $fastafile;
my $outputfile;

GetOptions (
    'o=s' => \$OTUFile,
    'f=s' => \$fastafile,
    'u=s' => \$outputfile,
) or die $USAGE;
die $USAGE unless $OTUFile && $fastafile && $outputfile;

say "List of OTUs: $OTUFile";
say "Input fasta: $fastafile";
say "Output fasta: $outputfile";

#Read list of wanted OTUs (one OTU name per line; each name must match the full fasta header after ">")
say "Reading list of wanted OTUs...";
my %OTUs = map { $_ => 1 } read_file($OTUFile, chomp => 1);
say scalar(keys %OTUs), " OTUs wanted";

#Extract wanted reads
say "Extracting wanted reads from fasta file...";
open FASTA, '<', $fastafile;
open OUT, '>', $outputfile;
my $read;
while (<FASTA>) {
    chomp;
    print "$. lines processed\r" unless $. % 1000;
    #A header line sets the current record name; sequence lines inherit it
    if (/^>(.+)/) {
        $read = $1;
    }
    #Skip every line (header or sequence) that belongs to an unwanted OTU
    next unless defined $read && exists $OTUs{$read};
    say OUT;
}
say "$. lines processed";
close FASTA;
close OUT;
say "Wanted reads written to $outputfile";

The output of this script was then used to perform taxonomic classification using USEARCH as described in the Methods section of the main text. The OTU table containing read counts for each sample and each Malassezia OTU is merged with the usearch output file containing the species-level classification for each Malassezia OTU. The example script below contains inputs from sequences of the Hong Kong and United States studies.

7.2) taxonomy_species.r

library(plyr)
library(reshape2)
library(ggplot2)

#read.tidy, write.tidy and collapse.taxon.table are not provided by the packages loaded above;
#these helper functions must be defined before running this script

#Open OTU table with only OTUs classified as Malassezia
Table <- read.tidy("findley_otu_table_w_malassezia_only.txt")

#Open table with usearch species-level information for each OTU
OTU <- read.tidy("usearch_results_findley_malassezia97_blast.txt")
OTU <- merge(Table, OTU, by = "OTU", all.x = TRUE)
write.tidy(OTU, "findley_otu_w_malassezia_species.txt")

head(OTU)
#   OTU Sample    Study Count Kingdom        Phylum
# 1 SRR    SRR Bethesda     8   Fungi Basidiomycota
# 2 SRR    SRR Bethesda   105   Fungi Basidiomycota
# 3 SRR    SRR Bethesda    53   Fungi Basidiomycota
# 4 SRR    SRR Bethesda    52   Fungi Basidiomycota
# 5 SRR    SRR Bethesda   106   Fungi Basidiomycota
# 6 SRR    SRR Bethesda    23   Fungi Basidiomycota
#                                Class         Order
# 1 Basidiomycota_class_incertae_sedis Malasseziales
# 2 Basidiomycota_class_incertae_sedis Malasseziales
# 3 Basidiomycota_class_incertae_sedis Malasseziales
# 4 Basidiomycota_class_incertae_sedis Malasseziales
# 5 Basidiomycota_class_incertae_sedis Malasseziales
# 6 Basidiomycota_class_incertae_sedis Malasseziales
#                                Family      Genus BLASTSpecies
# 1 Malasseziales_family_incertae_sedis Malassezia   M. globosa
# 2 Malasseziales_family_incertae_sedis Malassezia   M. globosa
# 3 Malasseziales_family_incertae_sedis Malassezia   M. globosa
# 4 Malasseziales_family_incertae_sedis Malassezia   M. globosa
# 5 Malasseziales_family_incertae_sedis Malassezia   M. globosa
# 6 Malasseziales_family_incertae_sedis Malassezia   M. globosa

#Note: the species column appears as BLASTSpecies in the output above, whereas the calls below
#expect a column named Species; rename the column or adjust the calls if the names differ

#Tabulate read counts by species
OTU <- ddply(OTU, .(Sample, Species), summarise, Count = sum(Count))

#Convert count to relative abundance (%) within each sample and add it as a column
OTU <- ddply(OTU, .(Sample), mutate, RelativeAbundance = (Count * 100) / sum(Count))

#Collapse taxa table to only the top taxa at the chosen rank (e.g. top 5 or 8 phyla, genera, families; requires reshape2)
OTUTable <- collapse.taxon.table(OTU, n = 7, Rank = "Species")

#Open metatable
Meta <- read.tidy("Metadata.txt")

#Merge relative abundance table and metatable together
OTUTable <- merge(OTUTable, Meta, by = "Sample", all.x = TRUE)
write.tidy(OTUTable, "TopMalasseziaSpecies.txt")

The resulting output contains the relative abundance values for each sample, together with metadata containing sample information. Data were subsequently reorganized for Additional Files 7 and 8.
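The tabulation and normalisation steps of taxonomy_species.r can be illustrated without the in-house helper functions. The following is a minimal sketch in base R, assuming a long-format table with Sample, Species and Count columns. The data frame `toy` and all of its values are invented for illustration only and are not data from the study.

```r
# Hypothetical toy data (not from the study): one row per OTU per sample
toy <- data.frame(
  Sample  = c("S1", "S1", "S1", "S2", "S2"),
  Species = c("M. globosa", "M. globosa", "M. restricta",
              "M. globosa", "M. restricta"),
  Count   = c(8, 12, 20, 30, 10),
  stringsAsFactors = FALSE
)

# Sum read counts per sample and species (the role of the first ddply call)
by_species <- aggregate(Count ~ Sample + Species, data = toy, FUN = sum)

# Relative abundance (%) within each sample (the role of the second ddply call)
by_species$RelativeAbundance <-
  100 * by_species$Count / ave(by_species$Count, by_species$Sample, FUN = sum)

# Sort for readability
by_species <- by_species[order(by_species$Sample, by_species$Species), ]
print(by_species)
```

A quick sanity check on any such table is that the relative abundances within each sample sum to 100.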

8) In-house script for multi-study comparison plot generation

Following OTU table construction for each of the Hong Kong and United States studies, taxonomic information at the genus, family, order, and class levels was used to generate the pan-microbiome plot (Fig. 5). The number of different taxa at each of these ranks was manually tabulated, and a txt file, panmicrobiome.txt, was constructed in Microsoft Excel. This txt file is subsequently used as input to generate Fig. 5.

8.1) draw_pan_plot.r

library(ggplot2)

#read.tidy is not provided by ggplot2; this helper function must be defined before running this script
Table <- read.tidy("panmicrobiome.txt")

head(Table)
#                        Comparison   Type Number
# 1                  Hong Kong only  Genus
# 2            Hong Kong + Bethesda  Genus
# 3 Hong Kong + Bethesda + Berkeley  Genus
# 4                  Hong Kong only Family     67
# 5            Hong Kong + Bethesda Family     74
# 6 Hong Kong + Bethesda + Berkeley Family     78

#Fix the order of ranks and comparisons so that they are plotted in a meaningful sequence
Table$Type <- factor(Table$Type, levels = c("Genus","Family","Order","Class"))
Table$Comparison <- factor(Table$Comparison, levels = c("Hong Kong only","Hong Kong + Bethesda", "Hong Kong + Bethesda + Berkeley"))

#Grouped bar chart: one bar per taxonomic rank within each comparison
Plot <- ggplot(Table, aes(x=Comparison, y=Number, fill=Type))
Plot <- Plot + geom_bar(stat="identity", position="dodge")
Plot <- Plot + theme_classic()
Plot <- Plot + theme(axis.text.x = element_text(angle=45, hjust=1, size=14, face="bold"))
xlab <- paste0("Study Included")
ylab <- paste0("Total Number of Taxa")
Plot <- Plot + xlab(xlab) + ylab(ylab)
Plot <- Plot + theme(axis.text.y = element_text(size=14, face="bold"))
Plot <- Plot + theme(axis.title = element_text(size=18, face="bold"))
Plot <- Plot + theme(legend.text = element_text(size=14, face="bold"))
Plot <- Plot + theme(legend.title = element_blank())
ggsave("panmicrobiome.png", plot = Plot)
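The counts in panmicrobiome.txt were tabulated manually. For readers who prefer to derive such cumulative counts programmatically, the following is a minimal sketch in base R, assuming a table with one row per study and detected taxon. The data frame `taxa`, the placeholder genus names, and the helper function `count_pan` are hypothetical and are not part of the original scripts or data.

```r
# Hypothetical toy data (not from the study): genera detected in each study
taxa <- data.frame(
  Study = c("Hong Kong", "Hong Kong", "Bethesda", "Bethesda", "Berkeley"),
  Genus = c("GenusA", "GenusB", "GenusA", "GenusC", "GenusD"),
  stringsAsFactors = FALSE
)

# Order in which studies are added to the comparison
study_order <- c("Hong Kong", "Bethesda", "Berkeley")

# Hypothetical helper: number of distinct taxa after adding each study in turn
count_pan <- function(tab, rank, studies) {
  sapply(seq_along(studies), function(i) {
    length(unique(tab[[rank]][tab$Study %in% studies[1:i]]))
  })
}

# Assemble a table with the same three columns as panmicrobiome.txt
pan <- data.frame(
  Comparison = c("Hong Kong only", "Hong Kong + Bethesda",
                 "Hong Kong + Bethesda + Berkeley"),
  Type   = "Genus",
  Number = count_pan(taxa, "Genus", study_order),
  stringsAsFactors = FALSE
)
print(pan)
```

Repeating the call with a different rank column (for example a Family column, if present) would give the rows for the other taxonomic levels, which can then be combined with rbind before plotting.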


More information

FANTOM: Functional and Taxonomic Analysis of Metagenomes

FANTOM: Functional and Taxonomic Analysis of Metagenomes FANTOM: Functional and Taxonomic Analysis of Metagenomes User Manual 1- FANTOM Introduction: a. What is FANTOM? FANTOM is an exploratory and comparative analysis tool for Metagenomic samples. b. What is

More information

Data Science Essentials Lab 5 Transforming Data

Data Science Essentials Lab 5 Transforming Data Data Science Essentials Lab 5 Transforming Data Overview In this lab, you will learn how to use tools in Azure Machine Learning along with either Python or R to integrate, clean and transform data. Collectively,

More information

Package PathoStat. February 25, 2018

Package PathoStat. February 25, 2018 Type Package Package PathoStat February 25, 2018 Title PathoStat Statistical Microbiome Analysis Package Version 1.5.1 Date 2017-12-06 Author Solaiappan Manimaran , Matthew

More information

Fact Sheet No.1 MERLIN

Fact Sheet No.1 MERLIN Fact Sheet No.1 MERLIN Fact Sheet No.1: MERLIN Page 1 1 Overview MERLIN is a comprehensive software package for survey data processing. It has been developed for over forty years on a wide variety of systems,

More information

Applied Regression Modeling: A Business Approach

Applied Regression Modeling: A Business Approach i Applied Regression Modeling: A Business Approach Computer software help: SAS SAS (originally Statistical Analysis Software ) is a commercial statistical software package based on a powerful programming

More information

MetaPhyler Usage Manual

MetaPhyler Usage Manual MetaPhyler Usage Manual Bo Liu boliu@umiacs.umd.edu March 13, 2012 Contents 1 What is MetaPhyler 1 2 Installation 1 3 Quick Start 2 3.1 Taxonomic profiling for metagenomic sequences.............. 2 3.2

More information

Indian Institute of Technology Kharagpur. PERL Part II. Prof. Indranil Sen Gupta Dept. of Computer Science & Engg. I.I.T.

Indian Institute of Technology Kharagpur. PERL Part II. Prof. Indranil Sen Gupta Dept. of Computer Science & Engg. I.I.T. Indian Institute of Technology Kharagpur PERL Part II Prof. Indranil Sen Gupta Dept. of Computer Science & Engg. I.I.T. Kharagpur, INDIA Lecture 22: PERL Part II On completion, the student will be able

More information

Package OTUbase. R topics documented: January 28, Type Package

Package OTUbase. R topics documented: January 28, Type Package Type Package Package OTUbase January 28, 2019 Title Provides structure and functions for the analysis of OTU data Provides a platform for Operational Taxonomic Unit based analysis Version 1.32.0 Date 2010-09-10

More information

Right-click on whatever it is you are trying to change Get help about the screen you are on Help Help Get help interpreting a table

Right-click on whatever it is you are trying to change Get help about the screen you are on Help Help Get help interpreting a table Q Cheat Sheets What to do when you cannot figure out how to use Q What to do when the data looks wrong Right-click on whatever it is you are trying to change Get help about the screen you are on Help Help

More information

Robust Linear Regression (Passing- Bablok Median-Slope)

Robust Linear Regression (Passing- Bablok Median-Slope) Chapter 314 Robust Linear Regression (Passing- Bablok Median-Slope) Introduction This procedure performs robust linear regression estimation using the Passing-Bablok (1988) median-slope algorithm. Their

More information

HORIZONTAL GENE TRANSFER DETECTION

HORIZONTAL GENE TRANSFER DETECTION HORIZONTAL GENE TRANSFER DETECTION Sequenzanalyse und Genomik (Modul 10-202-2207) Alejandro Nabor Lozada-Chávez Before start, the user must create a new folder or directory (WORKING DIRECTORY) for all

More information