Advanced XCMS Data Processing. H. Paul Benton


Reminder of what we're trying to do
Peak Detection
Grouping: groups similar peaks across replicates
Retention Time Alignment
Statistical Analysis of Classes

Total Ion Chromatograms
[Figure: overlaid total ion chromatograms (TIC) of 27 QC runs, files pav_20111003_pq AN_HILIC_Aq_tissue_pos_qc1_b01.CDF through pav_20111003_pq AN_HILIC_Aq_tissue_pos_qc4_b01.CDF. x-axis: Retention Time, 0 to 1400; y-axis: TIC, 0 to 250000.]

Parameters Matter!
[Figure: two panels, step = 1 and step = 0.1]

Peak Detection Easy!

Right?

Peak detection
Data comes in two types in MS: centroid and profile
Generally high resolution or low resolution ~ high mass accuracy or low mass accuracy
Two main choices in XCMS:
MatchedFilter: profile data, low resolution
CentWave: centroid data, high resolution

Hat fitting
Different hats for different heads (and faces, apparently)
A hat has to fit well, so it must be sized

MatchedFilter

MatchedFilter: wearing hats
Bin the data every X m/z
Apply a filter function to the data
Any peak above a signal-to-noise (s/n) ratio is selected
The peak is selected down to the filter baseline
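The filtering idea can be sketched in a few lines. This is a minimal illustration in Python rather than the xcms code itself: the kernel (a zero-sum, negated second derivative of a Gaussian), the median-based noise estimate, and all function names are assumptions made for the example.

```python
import math
from statistics import median


def second_derivative_gaussian(fwhm):
    """Zero-sum kernel shaped like the negated second derivative of a Gaussian."""
    sigma = fwhm / 2.3548  # FWHM = 2.3548 * sigma for a Gaussian
    half = int(math.ceil(4 * sigma))
    kernel = [(1 - (x / sigma) ** 2) * math.exp(-((x / sigma) ** 2) / 2)
              for x in range(-half, half + 1)]
    mean = sum(kernel) / len(kernel)
    return [k - mean for k in kernel]  # a flat baseline now filters to zero


def apply_filter(signal, kernel):
    """Correlate the signal with the kernel, repeating edge values as padding."""
    half = len(kernel) // 2
    last = len(signal) - 1
    return [sum(k * signal[min(max(i + j - half, 0), last)]
                for j, k in enumerate(kernel))
            for i in range(len(signal))]


def pick_peaks(signal, fwhm=7.0, snthresh=10.0):
    """Indices of local maxima of the filtered trace with s/n >= snthresh."""
    filtered = apply_filter(signal, second_derivative_gaussian(fwhm))
    noise = median(abs(v) for v in filtered)  # crude noise estimate (assumption)
    if noise == 0:
        return [], filtered
    peaks = [i for i in range(1, len(filtered) - 1)
             if filtered[i] > filtered[i - 1]
             and filtered[i] >= filtered[i + 1]
             and filtered[i] / noise >= snthresh]
    return peaks, filtered


# One bin's chromatogram: baseline near 100 with a small ripple, peak at index 30.
trace = [100 + 2 * ((i * 7) % 5 - 2)
         + 1000 * math.exp(-((i - 30) ** 2) / (2 * 3.0 ** 2))
         for i in range(60)]
peaks, filtered = pick_peaks(trace)
```

Because the kernel sums to zero, a constant baseline contributes nothing to the filtered trace, which is why the filter doubles as a baseline remover.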

Matched Filter: Sizing the hats
Profile data: profmethod
binlinbase - profile data
bin - centroid data
Bin size: profstep
Peak width: FWHM
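To see why the bin size matters, here is a small sketch of plain m/z binning, assuming simple summed bins (roughly the "bin" style of profile generation). The function name and the data points are made up for illustration.

```python
import math
from collections import defaultdict


def bin_profile(points, step):
    """Sum the intensities of (mz, intensity) points into m/z bins of width `step`."""
    bins = defaultdict(float)
    for mz, intensity in points:
        bins[math.floor(mz / step)] += intensity  # key = bin number
    return dict(bins)


# Four hypothetical signals that all sit between m/z 100 and 101.
points = [(100.04, 10.0), (100.26, 20.0), (100.48, 5.0), (100.93, 7.0)]

coarse = bin_profile(points, step=1.0)  # everything merges into a single bin
fine = bin_profile(points, step=0.1)    # each signal keeps its own bin
```

With step = 1 the four signals are indistinguishable; with step = 0.1 they stay separate, at the cost of ten times as many bins to filter.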


Missiles are like ions! Kalman filtering

Tracking missiles is like tracking LC-MS traces
Trace backward along the trace
This will define the area of the bin


CentWave - ppm
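A ppm tolerance is a relative m/z window, so its absolute width grows with mass. A minimal sketch of the arithmetic (the function names are illustrative, not part of xcms):

```python
def ppm_difference(mz_observed, mz_reference):
    """Deviation of an observed m/z from a reference m/z, in parts per million."""
    return abs(mz_observed - mz_reference) / mz_reference * 1e6


def mz_window(mz, ppm):
    """Absolute (low, high) m/z window corresponding to a ppm tolerance."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


deviation = ppm_difference(200.002, 200.0)  # 0.002 Da at m/z 200
low, high = mz_window(500.0, 20.0)          # 20 ppm at m/z 500 is +/- 0.01 Da
```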


CentWave 2: Regions of Interest (ROI)
Found using a Kalman filter
Often overestimates
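The tracking idea can be illustrated with a scalar Kalman update on the m/z of a trace. This is a simplified sketch, not the xcms ROI code: the constant-m/z model, the variance values, and the names are all assumptions for the example.

```python
def find_rois(scans, ppm=25.0, min_points=3, meas_var=1e-6, process_var=1e-8):
    """Link centroids across consecutive scans into m/z traces.

    scans: list of scans, each a list of (mz, intensity) centroids.
    Each trace keeps a Kalman estimate of its m/z; a trace ends as soon as
    a scan has no centroid inside the ppm window around that estimate.
    """
    active, finished = [], []
    for scan_index, centroids in enumerate(scans):
        still_active, used = [], set()
        for track in active:
            tol = track["mz"] * ppm / 1e6
            best = None
            for i, (mz, _) in enumerate(centroids):
                if i in used or abs(mz - track["mz"]) > tol:
                    continue
                if best is None or abs(mz - track["mz"]) < abs(centroids[best][0] - track["mz"]):
                    best = i
            if best is None:
                finished.append(track)
                continue
            used.add(best)
            mz, intensity = centroids[best]
            predicted_var = track["var"] + process_var         # predict step
            gain = predicted_var / (predicted_var + meas_var)  # Kalman gain
            track["mz"] += gain * (mz - track["mz"])           # update estimate
            track["var"] = (1 - gain) * predicted_var
            track["points"].append((scan_index, mz, intensity))
            still_active.append(track)
        for i, (mz, intensity) in enumerate(centroids):
            if i not in used:  # unmatched centroid starts a new trace
                still_active.append({"mz": mz, "var": meas_var,
                                     "points": [(scan_index, mz, intensity)]})
        active = still_active
    finished.extend(active)
    return [t for t in finished if len(t["points"]) >= min_points]


# Hypothetical scans: one ion near m/z 300.1 in every scan, plus isolated noise.
scans = [
    [(150.20, 50.0), (300.1000, 100.0)],
    [(300.1003, 400.0), (410.70, 30.0)],
    [(300.0998, 900.0)],
    [(220.05, 40.0), (300.1002, 500.0)],
    [(300.1001, 120.0), (512.33, 25.0)],
]
rois = find_rois(scans)
```

Each surviving trace defines a box: its first and last scan give the RT extent, and its m/z points give the m/z extent.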


Dynamic Binning

Find and integrate the peak
Wavelet transforms are then applied over the ROIs to find the peak
Several passes of wavelets at different scales are used until the correct fit is found (Mexican hat wavelets)
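The multi-pass wavelet idea can be sketched as follows: the same trace is transformed at several scales, and the scale with the strongest response indicates the peak position and roughly its width. The normalisation, the scale list, and the function names are assumptions for this illustration.

```python
import math


def mexican_hat(x, scale):
    """Mexican hat (Ricker) wavelet evaluated at offset x for a given scale."""
    u = x / scale
    return (1 - u * u) * math.exp(-u * u / 2)


def cwt_row(signal, scale):
    """One row of a continuous wavelet transform, values outside the signal treated as 0."""
    half = int(5 * scale)
    n = len(signal)
    row = []
    for i in range(n):
        total = 0.0
        for k in range(-half, half + 1):
            j = i + k
            if 0 <= j < n:
                total += signal[j] * mexican_hat(k, scale)
        row.append(total / math.sqrt(scale))  # keep scales comparable
    return row


def best_scale_peak(signal, scales):
    """Return (scale, index, response) of the strongest wavelet response."""
    best = None
    for scale in scales:
        row = cwt_row(signal, scale)
        index = max(range(len(row)), key=row.__getitem__)
        if best is None or row[index] > best[2]:
            best = (scale, index, row[index])
    return best


# Hypothetical ROI trace: a Gaussian peak (sigma = 4 scans) centred on scan 40.
roi_trace = [math.exp(-((i - 40) ** 2) / (2 * 4.0 ** 2)) for i in range(80)]
scale, apex, response = best_scale_peak(roi_trace, scales=[1, 2, 4, 8, 16])
```

A wavelet that is too narrow or too wide for the peak gives a weaker response, which is the "hat must be sized" point made earlier.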

General Principles
Peak Detection
Grouping: groups similar peaks across replicates
Retention Time Alignment
Statistical Analysis of Classes

Grouping
First time using all of the files
Looks for closely clustered/dense peaks across multiple files
This is a well-behaved group, with a peak in each replicate for each class
Once peaks are grouped they're known as a group or feature

This would have npeak > number of samples
You could play with the parameter settings
Global parameters :(
Use a different method :-)
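A rough sketch of grouping by density, and of how a group ends up with more peaks than samples. It uses simple gap-based clustering in m/z and then in RT rather than the kernel density estimate used by xcms, and the gap parameters and names are invented for the example.

```python
def split_on_gaps(items, key, max_gap):
    """Sort items by key and start a new cluster wherever the gap exceeds max_gap."""
    ordered = sorted(items, key=key)
    if not ordered:
        return []
    clusters, current = [], [ordered[0]]
    for item in ordered[1:]:
        if key(item) - key(current[-1]) > max_gap:
            clusters.append(current)
            current = [item]
        else:
            current.append(item)
    clusters.append(current)
    return clusters


def group_peaks(peaks, mz_gap=0.05, rt_gap=10.0):
    """Group (sample, mz, rt) peaks from all files into features."""
    groups = []
    for mz_cluster in split_on_gaps(peaks, key=lambda p: p[1], max_gap=mz_gap):
        for rt_cluster in split_on_gaps(mz_cluster, key=lambda p: p[2], max_gap=rt_gap):
            groups.append({
                "peaks": rt_cluster,
                "npeaks": len(rt_cluster),
                "nsamples": len({sample for sample, _, _ in rt_cluster}),
            })
    return groups


# Hypothetical peak lists from three files, a, b and c.
peaks = [
    ("a", 200.010, 100.0), ("b", 200.020, 102.0), ("c", 200.000, 99.0),
    ("a", 200.010, 300.0), ("b", 200.015, 303.0),
    ("c", 200.012, 301.0), ("c", 200.011, 306.0),  # sample c contributes twice
]
groups = group_peaks(peaks)
```

The second group has four peaks from three samples, the npeak > number of samples case: one global RT tolerance has merged two peaks from the same file.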

Grouping = Nearest
Based on the MZmine grouping/alignment algorithm
Uses nearest-neighbor estimation
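A minimal sketch of nearest-neighbour matching between a reference peak list and one sample. The distance score, the tolerances, and the greedy assignment are assumptions for illustration and are not taken from the MZmine or xcms source.

```python
def match_nearest(reference, sample, mz_tol=0.01, rt_tol=15.0):
    """Match each reference peak to its nearest sample peak within tolerance.

    reference, sample: lists of (mz, rt). Returns {reference_index: sample_index}.
    Each peak is used at most once; the closest pairs are assigned first.
    """
    candidates = []
    for i, (ref_mz, ref_rt) in enumerate(reference):
        for j, (mz, rt) in enumerate(sample):
            d_mz = abs(mz - ref_mz) / mz_tol  # distance in units of the tolerance
            d_rt = abs(rt - ref_rt) / rt_tol
            if d_mz <= 1 and d_rt <= 1:
                candidates.append((d_mz ** 2 + d_rt ** 2, i, j))
    matches, used_ref, used_sample = {}, set(), set()
    for _, i, j in sorted(candidates):
        if i not in used_ref and j not in used_sample:
            matches[i] = j
            used_ref.add(i)
            used_sample.add(j)
    return matches


# Hypothetical peak lists as (m/z, RT) pairs.
reference = [(200.010, 100.0), (350.200, 240.0)]
sample = [(350.203, 244.0), (200.012, 103.0), (500.000, 50.0)]
matches = match_nearest(reference, sample)
```

Because every peak can be matched only once, a group built this way cannot hold two peaks from the same file.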

Group.nearest
[Figure: peaks plotted as RT versus m/z]

General Principles
Peak Detection
Grouping: groups similar peaks across replicates
Retention Time Alignment
Statistical Analysis of Classes

Retention Time alignment
XCMS finds well-behaved groups
These include groups that have missing peaks, extra peaks, or perfect groups (set by parameters)
Missing < n/2!!
The median retention time is found for each group
Local regression is used for each sample to find the deviation profile
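Selecting the anchor groups can be sketched like this. The data layout and function name are hypothetical; the rule itself (at most `missing` samples without a peak, at most `extra` surplus peaks) follows the description above.

```python
from statistics import median


def well_behaved_groups(groups, n_samples, missing=1, extra=1):
    """Return (group_index, median_rt) for groups usable as alignment anchors.

    groups: list of groups, each a list of (sample, rt) peaks.
    """
    anchors = []
    for index, peaks in enumerate(groups):
        samples_present = {sample for sample, _ in peaks}
        n_missing = n_samples - len(samples_present)  # samples with no peak
        n_extra = len(peaks) - len(samples_present)   # surplus peaks
        if n_missing <= missing and n_extra <= extra:
            anchors.append((index, median(rt for _, rt in peaks)))
    return anchors


# Four hypothetical samples: s1..s4.
groups = [
    [("s1", 100.0), ("s2", 101.0), ("s3", 103.0), ("s4", 104.0)],  # perfect
    [("s1", 200.0), ("s2", 202.0), ("s3", 204.0)],                 # one missing
    [("s1", 300.0), ("s2", 301.0)],                                # two missing
    [("s1", 400.0), ("s1", 402.0), ("s1", 404.0),
     ("s2", 401.0), ("s3", 403.0), ("s4", 405.0)],                 # two extra
]
anchors = well_behaved_groups(groups, n_samples=4, missing=1, extra=1)
```

Only the first two groups survive; their median retention times become the targets that each sample is compared against.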

Retention time alignment - loess
Median rt of each well-behaved group vs. rt in each file
A good spread of anchors/well-behaved peak groups

Alignment Parameters:
missing = number of peaks allowed to be missing from a well-behaved peak group
extra = number of additional peaks allowed in a well-behaved peak group
span = amount of smoothing in the regression fitting. Very sensitive! A smaller value gives a more local alignment, a larger value a more global alignment.
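The role of span can be sketched with a small local-regression routine. This is a plain tricube-weighted local linear fit written for illustration, not R's loess, and the anchor values are invented.

```python
import math


def loess_value(xs, ys, x0, span=0.5):
    """Local linear fit at x0 using the nearest span * n points, tricube weights."""
    n = len(xs)
    k = min(n, max(2, math.ceil(span * n)))
    bandwidth = sorted(abs(x - x0) for x in xs)[k - 1] or 1.0
    weights = [(1 - min(abs(x - x0) / bandwidth, 1.0) ** 3) ** 3 for x in xs]
    sw = sum(weights)
    if sw == 0:  # every neighbour sits on the window edge: fall back to the mean
        return sum(ys) / n
    swx = sum(w * x for w, x in zip(weights, xs))
    swy = sum(w * y for w, y in zip(weights, ys))
    swxx = sum(w * x * x for w, x in zip(weights, xs))
    swxy = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    denom = sw * swxx - swx ** 2
    if abs(denom) < 1e-12:  # effectively one point: use the weighted mean
        return swy / sw
    slope = (sw * swxy - swx * swy) / denom
    intercept = (swy - slope * swx) / sw
    return intercept + slope * x0


def correct_rt(rt, anchor_rts, anchor_deviations, span=0.5):
    """Subtract the locally estimated deviation from a retention time."""
    return rt - loess_value(anchor_rts, anchor_deviations, rt, span)


# Anchors for one sample: its RT in each well-behaved group, and the deviation
# (sample RT minus group median RT). Here the drift grows linearly with time.
anchor_rts = [100.0, 200.0, 300.0, 400.0, 500.0, 600.0, 700.0, 800.0]
anchor_deviations = [0.01 * rt + 2.0 for rt in anchor_rts]
corrected = correct_rt(250.0, anchor_rts, anchor_deviations, span=0.5)
```

A small span fits only the few anchors nearest to each retention time, so it follows local wiggles; a span near 1 uses almost all anchors and gives a smoother, more global correction.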

Missing = 1 (4 samples each)

Extra = 1

Retention time alignment: obiwarp
John T. Prince and Edward M. Marcotte, "Chromatographic Alignment of ESI-LC-MS Proteomics Data Sets by Ordered Bijective Interpolated Warping", Analytical Chemistry, 2006, 78 (17), 6140-6152
obi-warp.sourceforge.net - original program
Retention time correction based on spectra similarity
Doesn't rely on detected features ~~ sort of
No initial grouping needed

Retention time alignment: obiwarp
Uses a warping technique to warp data to a median chromatogram. This acts as a mold to which the other spectra are warped
Uses dynamic programming to find the path of greatest similarity between the median chromatogram and the current chromatogram
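The dynamic-programming step can be illustrated with classic dynamic time warping on two one-dimensional traces. This is deliberately simpler than OBI-Warp, which scores whole spectra against each other and interpolates the resulting path; the names and traces are made up.

```python
def dtw_path(reference, query):
    """Dynamic time warping: lowest-cost monotonic path between two traces.

    Returns (total_cost, path) where path is a list of (reference_index, query_index).
    """
    n, m = len(reference), len(query)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            distance = abs(reference[i - 1] - query[j - 1])
            cost[i][j] = distance + min(cost[i - 1][j - 1],  # advance both
                                        cost[i - 1][j],      # advance reference
                                        cost[i][j - 1])      # advance query
    # Walk back from the end, always taking the cheapest predecessor.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


# The same peak, eluting one scan later in the second run.
median_trace = [0, 0, 1, 5, 1, 0, 0, 0]
current_trace = [0, 0, 0, 1, 5, 1, 0, 0]
total_cost, path = dtw_path(median_trace, current_trace)
```

The path pairs scan 3 of the median trace with scan 4 of the current trace, which is the shift a retention time correction would then remove.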

FillPeaks
Goes back to each file to find any intensity that wasn't peak picked
Peak intensity from fillPeaks
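Filling a missing peak amounts to integrating the raw signal inside the group's m/z and RT box. A minimal sketch, assuming raw data as (rt, mz, intensity) points and trapezoidal integration; the function name and the numbers are illustrative.

```python
def fill_peak(raw_points, mz_min, mz_max, rt_min, rt_max):
    """Integrate raw intensity inside an m/z and RT box.

    raw_points: iterable of (rt, mz, intensity) from one file.
    Intensities are summed per scan, then integrated over RT (trapezoid rule).
    """
    per_scan = {}
    for rt, mz, intensity in raw_points:
        if rt_min <= rt <= rt_max and mz_min <= mz <= mz_max:
            per_scan[rt] = per_scan.get(rt, 0.0) + intensity
    times = sorted(per_scan)
    area = 0.0
    for t0, t1 in zip(times, times[1:]):
        area += 0.5 * (per_scan[t0] + per_scan[t1]) * (t1 - t0)
    return area


# Raw data from a file in which no peak was picked for this group.
raw = [
    (10.0, 200.01, 100.0), (11.0, 200.01, 300.0), (12.0, 200.01, 100.0),
    (11.0, 250.00, 999.0),  # different m/z, outside the box
    (20.0, 200.01, 500.0),  # right m/z, outside the RT window
]
area = fill_peak(raw, mz_min=200.00, mz_max=200.02, rt_min=9.0, rt_max=13.0)
```

The box comes from the group, not from the file, so a weak signal that never passed the peak-picking threshold still gets a number instead of a missing value.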

Finally!!
We have all of our data corrected in a form we can use.
Let's look at some data processing:
heatmaps
PCA
Some stats

WAIT!!!!

Normalisation needed?

Correlation heat map of the Bonferroni-corrected ANOVA p-values
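For reference, the Bonferroni correction itself is a one-liner: each p-value is multiplied by the number of tests and capped at 1. The p-values below are invented.

```python
def bonferroni(p_values):
    """Bonferroni-adjust p-values: multiply by the number of tests, cap at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


adjusted = bonferroni([0.01, 0.04, 0.5])
```

With thousands of features per experiment the multiplier is large, so only very small raw p-values remain significant after correction.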

Summary
XCMS processes LC-MS data and is complex
XCMS processes LC-MS data and uses some simple algorithms.
There are multiple algorithms for different jobs/data types.

Boxes and Foxes
XCMS is all about boxes
Boxes are sly and slippery and are the main problem in data analysis
If you're having issues, try changing alignment methods and think about how much deviation in m/z or RT the data has before and after alignment

CAMERA
Ions from the same compound should be at the same retention time
Ions from the same compound should have a linear relationship
Using linear correlation and RT windows, adducts/isotopes are labeled
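The annotation logic can be sketched as: take feature pairs inside an RT window, require a high linear correlation of their intensities across samples, then check the m/z difference against known shifts. This is not the CAMERA implementation; the thresholds, the two-entry shift table, and the feature values are assumptions for illustration.

```python
import math

# m/z differences for singly charged ions (only two examples are listed here).
KNOWN_SHIFTS = {
    "13C isotope": 1.003355,      # 13C minus 12C
    "Na/H adduct pair": 21.981944,  # [M+Na]+ minus [M+H]+
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def annotate(features, rt_window=2.0, min_corr=0.9, mz_tol=0.005):
    """Label co-eluting, correlated feature pairs whose m/z gap is a known shift."""
    labels = []
    for i, a in enumerate(features):
        for j in range(i + 1, len(features)):
            b = features[j]
            if abs(a["rt"] - b["rt"]) > rt_window:
                continue
            if pearson(a["intensities"], b["intensities"]) < min_corr:
                continue
            gap = abs(a["mz"] - b["mz"])
            for name, shift in KNOWN_SHIFTS.items():
                if abs(gap - shift) <= mz_tol:
                    labels.append((i, j, name))
    return labels


# Hypothetical features with intensities across four samples.
features = [
    {"mz": 181.0707, "rt": 120.0, "intensities": [10.0, 20.0, 30.0, 40.0]},
    {"mz": 182.0741, "rt": 120.5, "intensities": [1.0, 2.0, 3.0, 4.1]},
    {"mz": 203.0526, "rt": 120.2, "intensities": [5.0, 10.0, 15.0, 20.0]},
    {"mz": 182.0741, "rt": 300.0, "intensities": [1.0, 2.0, 3.0, 4.0]},
]
labels = annotate(features)
```

The last feature has the right m/z gap and a high correlation but elutes far away, so it is not linked to the others.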

Onwards to biology
Network maps from related metabolites

Thank You! Questions?
Many more updates coming soon, including speed and more stats
Prof. Gary Siuzdak
The whole xcms team
Dr. Colin Smith