Package cosmiq. April 11, 2018

Type Package
Title cosmiq - COmbining Single Masses Into Quantities
Version 1.12.0
Author David Fischer <dajofischer@googlemail.com>, Christian Panse <cp@fgcz.ethz.ch>, Endre Laczko <endre.laczko@fgcz.uzh.ch>
Maintainer David Fischer <dajofischer@googlemail.com>, Christian Panse <cp@fgcz.ethz.ch>
Depends R (>= 3.0.2), Rcpp
Imports pracma, xcms, MassSpecWavelet, faahko
Suggests RUnit, BiocGenerics, BiocStyle
Description cosmiq is a tool for the preprocessing of liquid or gas chromatography mass spectrometry (LCMS/GCMS) data with a focus on metabolomics or lipidomics applications. To improve the detection of low-abundance signals, cosmiq generates master maps of the mz/rt space from all acquired runs before a peak detection algorithm is applied. The result is a more robust identification and quantification of low-intensity MS signals compared to conventional approaches where peak picking is performed in each LCMS/GCMS file separately. The cosmiq package builds on the xcmsSet object structure and can therefore be integrated well with the package xcms as an alternative preprocessing step.
License GPL-3
URL http://www.bioconductor.org/packages/devel/bioc/html/cosmiq.html
Collate combine_spectra.r peakdetection.r eicmatrix.r retention_time.r quantify_combined.r create_datamatrix.r cosmiq.r
LazyData false
biocViews MassSpectrometry, Metabolomics
NeedsCompilation yes

R topics documented:

combine_spectra
cosmiq

create_datamatrix
eicmatrix
peakdetection
quantify_combined
retention_time
Index

combine_spectra    Combine mass spectra of each scan and each file into a single master spectrum

Description

combine_spectra imports each raw file using xcmsRaw and assigns each ion to a previously defined mass vector, which is created using the bin size parameter mzbin. This process is repeated for each raw file.

Usage

    combine_spectra(xs, mzbin=0.003, linear=FALSE, continuum=FALSE)

Arguments

xs           xcmsSet object
mzbin        Bin size for the generation of the mass vector
linear       Logical. If TRUE, a linear vector will be generated with mzbin increments. If FALSE, the mass vector will be generated using a non-linear function. This option is recommended for TOF-type mass detectors
continuum    Boolean flag. Default value is FALSE.

Details

This processing step calculates a combined mass spectrum. Mass spectra are combined not only across all scans of a single LCMS run but across all acquired datasets. As a result, the signal-to-noise ratio increases with each additional LCMS run.

Author(s)

David Fischer 2013

Examples

    cdfpath <- file.path(find.package("faahko"), "cdf")
    my.input.files <- dir(c(paste(cdfpath, "WT", sep='/'),
        paste(cdfpath, "KO", sep='/')), full.names=TRUE)

    # create xcmsSet object
    xs <- new("xcmsSet")
    xs@filepaths <- my.input.files
    x<-combine_spectra(xs=xs, mzbin=0.25, linear=TRUE, continuum=FALSE)
    plot(x$mz, x$intensity, type='l', xlab='m/z', ylab='ion intensity')

cosmiq    cosmiq - main wrapper function

Description

This is the main wrapper function for the package cosmiq. Every processing step of cosmiq is carried out by this function, including mass spectra combination, detection of relevant masses, generation and combination of extracted ion chromatograms, detection of chromatographic peaks, and localisation and quantification of detected peaks.

combine_spectra imports each raw file using xcmsRaw and assigns each ion to a previously defined mass vector, which is created using the bin size parameter mzbin. This process is repeated for each raw file.

Usage

    cosmiq(files, RTinfo=TRUE, mzbin=0.003, center=0, linear=FALSE,
        profStep=1, retcorrect=TRUE, continuum=FALSE, rtcombine=c(0,0),
        scales=c(1:10), SNR.Th=10, SNR.area=20,
        RTscales=c(1:10, seq(12,32,2)), RTSNR.Th=20, RTSNR.area=20,
        mintr=0.5)

Arguments

files        String vector containing the exact location of each file
RTinfo       Logical indicating whether retention time should be used or not. If FALSE, the quantitative information will be retrieved as a sum of ion intensities within a retention time window. This retention time window is given by rtcombine.
rtcombine    Numerical, with two entries. If no retention time information is to be used, rtcombine provides the retention time window where ion intensities will be summed for quantification. If rtcombine = c(0,0), all scans will be used
mzbin        Bin size for the generation of the mass vector
center       Indicates the number of the file which will be used for retention time alignment. If zero, the file will be automatically selected based on the maximum summed ion intensity of the combined mass spectrum.
linear       Logical. If TRUE, a linear vector will be generated with mzbin increments. If FALSE, the mass vector will be generated using a non-linear function. This option is recommended for TOF-type mass detectors

profStep     Step size (in m/z) to use for profile generation from the raw data files.
retcorrect   Logical, should retention time correction be used? default = TRUE
continuum    Logical, is continuum data used? default = FALSE
scales       Peak detection of mass peaks in the combined mass spectrum: scales for continuous wavelet transformation (CWT).
SNR.Th       Peak detection of mass peaks in the combined mass spectrum: signal-to-noise ratio threshold
SNR.area     Peak detection of mass peaks in the combined mass spectrum: area around the peak for noise determination. Indicates the number of surrounding peaks on the first CWT scale. default = 20
RTscales     Peak detection of chromatographic peaks on the combined extracted ion chromatogram: scales for continuous wavelet transformation (CWT), see also the peakDetectionCWT function of the MassSpecWavelet package.
RTSNR.Th     Peak detection of chromatographic peaks on the combined extracted ion chromatogram: signal-to-noise ratio threshold
RTSNR.area   Peak detection of chromatographic peaks on the combined extracted ion chromatogram: area around the peak for noise determination. Indicates the number of surrounding peaks on the first CWT scale. default = 20
mintr        Minimal peak width intensity threshold (in percentage), for which two overlapping peaks are considered as separated. The default value is 0.5.

Details

Current data processing tools focus on a peak detection strategy where each LCMS run is initially treated independently of the others. Subsequent data processing for the alignment of different samples is then calculated on reduced peak tables. This function instead merges all LCMS datasets of a given experiment as a first step of raw data processing. The merged LCMS dataset contains an overlay and the sum of ion intensities from all LCMS runs. Peak detection is then performed only on this merged dataset, and quantification of the signals in each sample is guided by the peak location in the merged data.
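The merge-first idea can be illustrated without the package. The following is only a sketch on simulated data in base R, not the cosmiq implementation: the helper names (simulate_run, bin_spectrum, snr), the linear mass vector, and the crude signal-to-noise estimate are all assumptions made for the illustration. It bins each simulated run onto a common mass vector, sums the binned spectra over all runs, and compares the signal-to-noise ratio of a weak signal before and after combination.

```r
# Simulated illustration in base R; every name below is hypothetical.
set.seed(1)

mzbin   <- 0.25                          # bin size, as in the package examples
mz.grid <- seq(100, 110, by = mzbin)     # linear mass vector
n.runs  <- 12

# One simulated run: 400 background ions plus 20 signal ions near m/z 105.
simulate_run <- function() {
  data.frame(
    mz        = c(runif(400, min = 100, max = 110),
                  rnorm(20, mean = 105, sd = 0.05)),
    intensity = rexp(420, rate = 1)
  )
}

# Assign each ion to the nearest bin of the mass vector and sum intensities.
bin_spectrum <- function(run, grid, binsize) {
  breaks <- c(grid - binsize / 2, grid[length(grid)] + binsize / 2)
  idx    <- findInterval(run$mz, breaks)
  sums   <- tapply(run$intensity, factor(idx, levels = seq_along(grid)), sum)
  sums[is.na(sums)] <- 0                 # empty bins carry no intensity
  as.numeric(sums)
}

# Crude signal-to-noise estimate: height of the signal bin above the median
# background, relative to the spread (MAD) of the background bins.
snr <- function(spectrum, grid, target = 105) {
  sig <- which.min(abs(grid - target))
  bg  <- spectrum[-sig]
  (spectrum[sig] - median(bg)) / mad(bg)
}

runs     <- replicate(n.runs, simulate_run(), simplify = FALSE)
binned   <- sapply(runs, bin_spectrum, grid = mz.grid, binsize = mzbin)
combined <- rowSums(binned)              # master spectrum over all runs

snr.single   <- apply(binned, 2, snr, grid = mz.grid)
snr.combined <- snr(combined, mz.grid)
cat("mean single-run SNR:", round(mean(snr.single), 1), "\n")
cat("combined SNR:       ", round(snr.combined, 1), "\n")
```

Because the signal adds up linearly over runs while independent noise grows only with the square root of the number of runs, the combined spectrum is expected to show a clearly higher ratio than a typical single run. Real data will deviate from this idealised behaviour, for example when mass calibration drifts between runs.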
See also xcms::retcor.obiwarp and cosmiq::create_datamatrix, which executes each step of the cosmiq function. For running examples please read the package vignette or ?cosmiq::create_datamatrix.

Author(s)

Christian Panse, David Fischer 2014

Examples

    ## Not run:
    # the following lines consume 2m17.877s
    # on a Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
    library(faahko)
    cdfpath <- file.path(find.package("faahko"), "cdf")
    my.input.files <- dir(c(paste(cdfpath, "WT", sep='/'),
        paste(cdfpath, "KO", sep='/')), full.names=TRUE)
    x <- cosmiq(files=my.input.files, mzbin=0.25, SNR.Th=0, linear=TRUE)

    image(t(x$eicmatrix^(1/4)), col=rev(gray(1:20/20)),
        xlab='rt', ylab='m/z', axes=FALSE)
    head(xcms::peakTable(x$xcmsSet))
    ## End(Not run)

create_datamatrix    Quantifying mz/rt intensities using peak locations from master map

Description

create_datamatrix identifies mz/rt peak boundaries in each raw file using the information from a master mass spectrum and master EIC. For each mz/rt location, the peak volume is calculated and stored in a report table.

Usage

    create_datamatrix(xs,rxy)

Arguments

xs     xcmsSet object
rxy    matrix containing mz and RT boundaries for each detected peak

Details

With the information about their position in the combined datasets, each individual mz/rt feature is located in the raw data.

Author(s)

David Fischer 2013

Examples

    ## Not run:
    cdfpath <- file.path(find.package("faahko"), "cdf")
    my.input.files <- dir(c(paste(cdfpath, "WT", sep='/'),
        paste(cdfpath, "KO", sep='/')), full.names=TRUE)
    #
    # create xcmsSet object
    # todo
    xs <- new("xcmsSet")
    # consider only two files!!!
    xs@filepaths <- my.input.files[1:2]

    class<-as.data.frame(c(rep("ko",2),rep("wt", 0)))
    rownames(class)<-basename(my.input.files[1:2])
    xs@phenoData<-class
    x<-combine_spectra(xs=xs, mzbin=0.25, linear=TRUE, continuum=FALSE)
    plot(x$mz, x$intensity, type='l', xlab='m/z', ylab='ion intensity')
    xy <- peakdetection(x=x$mz, y=x$intensity, scales=1:10,
        SNR.Th=0.0, SNR.area=20, mintr=0.5)
    id.peakcenter<-xy[,4]
    plot(x$mz, x$intensity, type='l', xlim=c(440,460),
        xlab='m/z', ylab='ion intensity')
    points(x$mz[id.peakcenter], x$intensity[id.peakcenter],
        col='red', type='h')
    # create dummy object
    xs@peaks<-matrix(c(rep(1, length(my.input.files) * 6),
        1:length(my.input.files)), ncol=7)
    colnames(xs@peaks) <- c("mz", "mzmin", "mzmax", "rt",
        "rtmin", "rtmax", "sample")
    xs<-xcms::retcor(xs, method="obiwarp", profStep=1,
        distFunc="cor", center=1)
    eicmat<-eicmatrix(xs=xs, xy=xy, center=1)
    # process a reduced mz range for a better package build performance
    (eicmat.mz.range<-range(which(475 < xy[,1] & xy[,1] < 485)))
    eicmat.filter <- eicmat[eicmat.mz.range[1]:eicmat.mz.range[2],]
    xy.filter <- xy[eicmat.mz.range[1]:eicmat.mz.range[2],]
    #
    # determine the new range and plot the mz versus RT map
    (rt.range <- range(as.double(colnames(eicmat.filter))))
    (mz.range<-range(as.double(row.names(eicmat.filter))))
    image(log(t(eicmat.filter))/log(2), col=rev(gray(1:20/20)),
        xlab='rt [in seconds]', ylab='m/z', axes=FALSE,
        main='overlay of 12 samples using faahko')
    axis(1, seq(0,1, length=6),
        round(seq(rt.range[1], rt.range[2], length=6)))
    axis(2, seq(0,1, length=4),
        seq(mz.range[1], mz.range[2], length=4))
    #

    # determine the chromatographic peaks
    rxy<-retention_time(xs=xs, RTscales=c(1:10, seq(12,32, by=2)),
        xy=xy.filter, eicmatrix=eicmat.filter,
        RTSNR.Th=120, RTSNR.area=20)
    rxy.rt <- (rxy[,4] - rt.range[1])/diff(rt.range)
    rxy.mz <- (rxy[,1] - mz.range[1])/diff(mz.range)
    points(rxy.rt, rxy.mz, pch="x", lwd=2, col="red")
    xs<-create_datamatrix(xs=xs, rxy=rxy)
    peaktable <- xcms::peakTable(xs)
    head(peaktable)
    ## End(Not run)

eicmatrix    Generate matrix of combined extracted ion chromatograms (EICs)

Description

For each selected mass window, eicmatrix calculates EICs of every raw file and combines them. It is recommended to correct the retention time first using retcor.obiwarp.

Usage

    eicmatrix(xs, xy, center)

Arguments

xs        xcmsSet object
xy        table including mz location parameters
center    file number which is used as a template for retention time correction

Details

For each detected mass, an extracted ion chromatogram (EIC) is calculated. In order to determine the elution time for each detected mass, the EICs of every mass are combined across all acquired runs. Make sure that xcms retention time correction (retcor.obiwarp, as recommended above) was applied to the dataset. The output will be a matrix of EICs.

Author(s)

David Fischer 2013
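As a rough picture of what combining EICs means, here is a minimal base R sketch on simulated data. It is not the eicmatrix implementation: the raw-data layout (a data frame with mz, rt and intensity columns), the helper names, the fixed mass windows and the assumption that all files share the same scan times after retention time correction are hypothetical choices made for the illustration. Combination is done here by summation, in line with the description of the merged dataset as a sum of ion intensities.

```r
# Simulated illustration in base R; all names and data are hypothetical.
set.seed(42)
scan.times <- seq(0, 59, by = 1)          # shared retention time axis (s)
mz.center  <- c(200, 300)                 # masses found in the master spectrum
windows    <- data.frame(mzmin = mz.center - 0.1, mzmax = mz.center + 0.1)

# One simulated raw file: two compounds eluting as Gaussian peaks.
simulate_file <- function(apex1, apex2) {
  n <- length(scan.times)
  data.frame(
    mz        = rep(mz.center, each = n) + rnorm(2 * n, sd = 0.01),
    rt        = rep(scan.times, times = 2),
    intensity = c(100 * exp(-(scan.times - apex1)^2 / (2 * 3^2)),
                   50 * exp(-(scan.times - apex2)^2 / (2 * 3^2)))
  )
}

# EICs of one file: sum the intensities inside each mass window per scan.
eic_one_file <- function(raw, windows, scan.times) {
  eic <- matrix(0, nrow = nrow(windows), ncol = length(scan.times))
  for (i in seq_len(nrow(windows))) {
    in.win <- raw$mz >= windows$mzmin[i] & raw$mz <= windows$mzmax[i]
    scan   <- match(raw$rt[in.win], scan.times)
    sums   <- tapply(raw$intensity[in.win],
                     factor(scan, levels = seq_along(scan.times)), sum)
    sums[is.na(sums)] <- 0                # scans without ions in the window
    eic[i, ] <- as.numeric(sums)
  }
  eic
}

# Three files with slightly different elution times.
files <- list(simulate_file(20, 40), simulate_file(21, 40), simulate_file(20, 41))

eic.per.file <- lapply(files, eic_one_file,
                       windows = windows, scan.times = scan.times)
eic.combined <- Reduce(`+`, eic.per.file) # combine the EICs of all files
dimnames(eic.combined) <- list(mz.center, scan.times)

# Apex retention time of each combined EIC
apply(eic.combined, 1, function(y) scan.times[which.max(y)])
```

The row and column names carry the mass and retention time values, which mirrors how the package example recovers the plotted ranges from row.names and colnames of the EIC matrix. If retention times were not aligned first, the summed peaks would broaden or split, which is presumably why the retention time correction is recommended beforehand.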

Examples

    ## Not run:
    # see package vignette section
    # 'Generation and combination of extracted ion chromatograms'
    xs<-create_datamatrix(xs=xs, rxy=rxy)
    ## End(Not run)

peakdetection    An algorithm for the detection of peak locations and boundaries in mass spectra or ion chromatograms

Description

peakdetection uses continuous wavelet transformation (CWT) to determine the optimal peak location. A modified algorithm of Du et al. (2006) is used to localize peak positions.

Usage

    peakdetection(scales, y, x, SNR.Th, SNR.area, mintr)

Arguments

scales      vector with the scales to perform CWT
y           vector of ion intensities
x           vector of mz bins
SNR.Th      Signal-to-noise threshold
SNR.area    Window size for noise estimation
mintr       Minimal peak width intensity threshold (in percentage), for which two overlapping peaks are considered as separated. Default is set to 0.5.

Details

A peak detection algorithm based on continuous wavelet transformation (CWT) is used for this step (modified from Du et al., 2006). Peak detection based on CWT has the advantage that a sliding scale of wavelets is used instead of a single filter function with fixed wavelength. This allows for a flexible and automatic approximation of the peak width. As a result, it is possible to locate both narrow and broad peaks within a given dynamic range.

Author(s)

David Fischer 2013

References

Du, P., Kibbe, W. A., & Lin, S. M. (2006). Improved peak detection in mass spectrum by incorporating continuous wavelet transform-based pattern matching. Bioinformatics, 22(17), 2059-2065. doi:10.1093/bioinformatics/btl355
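To make the role of the wavelet scales more concrete, the sketch below runs a deliberately simplified CWT peak search in base R. It is neither the modified Du et al. algorithm used by peakdetection nor the MassSpecWavelet implementation: the Mexican hat kernel, the zero padding, the noise estimate from the smallest scale and the neighbourhood rule for local maxima are assumptions chosen for the illustration, and ridge-line linking across scales is omitted entirely.

```r
# Simplified CWT peak search on simulated data; all names are hypothetical.
set.seed(7)
x <- 1:500
y <- 100 * exp(-(x - 100)^2 / (2 * 2^2)) +     # narrow peak
      60 * exp(-(x - 300)^2 / (2 * 8^2)) +     # broad peak
     rnorm(length(x), sd = 1)                  # noise

mexican_hat <- function(t) (1 - t^2) * exp(-t^2 / 2)

# CWT coefficients: one row per position, one column per scale.
cwt <- function(y, scales) {
  n <- length(y)
  sapply(scales, function(s) {
    half   <- ceiling(5 * s)
    kernel <- mexican_hat((-half:half) / s) / sqrt(s)
    padded <- c(rep(0, half), y, rep(0, half)) # zero padding at the edges
    conv   <- stats::filter(padded, kernel, sides = 2)
    as.numeric(conv)[(half + 1):(half + n)]
  })
}

scales <- 1:10
coefs  <- cwt(y, scales)

ridge  <- apply(coefs, 1, max)   # strongest response over all scales
noise  <- mad(coefs[, 1])        # noise level taken from the smallest scale
SNR.Th <- 10
w      <- 10                     # neighbourhood (in points) for a local maximum

# A peak is a position whose response exceeds the threshold and is the
# largest value within +/- w points.
is.peak <- vapply(seq_along(ridge), function(i) {
  lo <- max(1, i - w)
  hi <- min(length(ridge), i + w)
  ridge[i] > SNR.Th * noise && ridge[i] == max(ridge[lo:hi])
}, logical(1))

peaks <- data.frame(
  position = x[is.peak],
  scale    = scales[apply(coefs[is.peak, , drop = FALSE], 1, which.max)]
)
print(peaks)
```

In this toy signal the narrow peak responds most strongly at a small scale and the broad peak at the largest scale offered, which is the sense in which a range of scales approximates the peak width automatically. A single fixed-width filter would have to compromise between the two.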

Examples

    cdfpath <- file.path(find.package("faahko"), "cdf")
    my.input.files <- dir(c(paste(cdfpath, "WT", sep='/'),
        paste(cdfpath, "KO", sep='/')), full.names=TRUE)
    # create xcmsSet object
    xs <- new("xcmsSet")
    xs@filepaths <- my.input.files
    op<-par(mfrow=c(3,1))
    x<-combine_spectra(xs=xs, mzbin=0.25, linear=TRUE, continuum=FALSE)
    plot(x$mz, x$intensity, type='h', main='original',
        xlab='m/z', ylab='ion intensity')
    xy <- peakdetection(x=x$mz, y=x$intensity, scales=1:10,
        SNR.Th=1.0, SNR.area=20, mintr=0.5)
    id.peakcenter<-xy[,4]
    plot(x$mz[id.peakcenter], x$intensity[id.peakcenter],
        type='h', main='filtered')
    plot(x$mz, x$intensity, type='l', xlim=c(400,450), main='zoom',
        log='y', xlab='m/z', ylab='ion intensity (log scale)')
    points(x$mz[id.peakcenter], x$intensity[id.peakcenter],
        col='red', type='h')

quantify_combined    Generate report with combined ion intensities

Description

quantify_combined calculates a data table of mz intensities from a sum of ions within a selected mz window and from all MS scans.

Usage

    quantify_combined(xs,xy, rtcombine)

Arguments

xs           xcmsSet object
xy           table including mz location parameters
rtcombine    Numerical, with two entries. If no retention time information is to be used, rtcombine provides the retention time window where ion intensities will be summed for quantification. If rtcombine = c(0,0), all scans will be used

Details

This function is only used when no retention time information should be used, for example with direct infusion experiments.

Author(s)

David Fischer 2013

Examples

    ## Not run:
    # see package vignette
    ## End(Not run)

retention_time    Detection of chromatographic peak locations from extracted ion chromatograms (EICs)

Description

For each EIC in the EIC matrix, retention_time localizes chromatographic peaks using peakdetection.

Usage

    retention_time(xy, xs, eicmatrix, RTscales, RTSNR.Th, RTSNR.area, mintr)

Arguments

xy           table including mz location parameters
xs           xcmsSet object
eicmatrix    Matrix containing extracted ion chromatograms (EICs)
RTscales     Peak detection of chromatographic peaks on the combined extracted ion chromatogram: scales for continuous wavelet transformation (CWT), see also the peakDetectionCWT function of the MassSpecWavelet package.
RTSNR.Th     Peak detection of chromatographic peaks on the combined extracted ion chromatogram: signal-to-noise ratio threshold

RTSNR.area   Peak detection of chromatographic peaks on the combined extracted ion chromatogram: area around the peak for noise determination. Indicates the number of surrounding peaks on the first CWT scale. default = 20
mintr        Minimal peak width intensity threshold (in percentage), for which two overlapping peaks are considered as separated (default = 0.5)

Details

Based on the combined extracted ion chromatograms, a second peak detection step is performed. The same algorithm as described for the peak picking of mass signals (continuous wavelet transformation) is also used for peak picking in the retention time domain.

Author(s)

David Fischer 2013

Examples

    ## Not run:
    # see package vignette
    ## End(Not run)
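The row-by-row nature of this step can be sketched in base R as follows. This is an illustration only: a plain local-maximum search stands in for the CWT-based peakdetection, the 5 percent boundary rule is an arbitrary choice, and the column names of the resulting table (mz, rt, rtmin, rtmax, area) are hypothetical, since the layout of the matrix returned by retention_time is not spelled out here beyond the example using its first and fourth columns as mz and rt.

```r
# Toy illustration in base R; all names and data are hypothetical.
scan.times <- seq(0, 99, by = 1)
gauss <- function(t, apex, width, height) {
  height * exp(-(t - apex)^2 / (2 * width^2))
}

# Toy combined EIC matrix: one row per mass, one column per scan.
eic <- rbind(gauss(scan.times, 30, 3, 100),
             gauss(scan.times, 25, 2, 80) + gauss(scan.times, 70, 4, 50))
rownames(eic) <- c("200.1", "350.2")

# Stand-in peak finder: local maxima above a minimum height.
find_apex <- function(y, min.height) {
  i <- 2:(length(y) - 1)
  i[y[i] > y[i - 1] & y[i] >= y[i + 1] & y[i] >= min.height]
}

# Peak boundaries: last points on each side that stay at or above a
# fraction of the apex intensity.
peak_bounds <- function(y, apex, frac = 0.05) {
  below <- which(y < frac * y[apex])
  c(left  = max(c(0, below[below < apex])) + 1,
    right = min(c(length(y) + 1, below[below > apex])) - 1)
}

# Apply the peak search to every EIC and collect one table row per peak.
rows <- lapply(seq_len(nrow(eic)), function(r) {
  y <- eic[r, ]
  do.call(rbind, lapply(find_apex(y, min.height = 10), function(a) {
    b <- peak_bounds(y, a)
    data.frame(mz    = as.numeric(rownames(eic)[r]),
               rt    = scan.times[a],
               rtmin = scan.times[b[["left"]]],
               rtmax = scan.times[b[["right"]]],
               area  = sum(y[b[["left"]]:b[["right"]]]))
  }))
})
peaks <- do.call(rbind, rows)
print(peaks)
```

A table of this kind, with one mass and one retention time window per peak, is the sort of information that a subsequent quantification step needs in order to locate each feature in the individual raw files.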

Index

combine_spectra
cosmiq
create_datamatrix
eicmatrix
peakdetection
quantify_combined
retention_time