Real-time Data Compression for Mass Spectrometry Jose de Corral 2015 GPU Technology Conference

Similar documents
Large and Sparse Mass Spectrometry Data Processing in the GPU Jose de Corral 2012 GPU Technology Conference

MRMPROBS tutorial. Hiroshi Tsugawa RIKEN Center for Sustainable Resource Science

QuiC 1.0 (Owens) User Manual

CS/EE 217 Midterm. Question Possible Points Points Scored Total 100

MRMPROBS tutorial. Hiroshi Tsugawa RIKEN Center for Sustainable Resource Science MRMPROBS screenshot

Tutorial 2: Analysis of DIA/SWATH data in Skyline

Hardware-Accelerated 3D Visualization of Mass Spectrometry Data

Progenesis QI for proteomics User Guide. Analysis workflow guidelines for DDA data

To 3D or not to 3D? Why GPUs Are Critical for 3D Mass Spectrometry Imaging Eri Rubin SagivTech Ltd.

Reference Manual MarkerView Software Reference Manual Revision: February, 2010

MS data processing. Filtering and correcting data. W4M Core Team. 22/09/2015 v 1.0.0

Package cosmiq. April 11, 2018

New features and Workflow Enhancements Empower 3 Software

Automated Submission for Analysis Processing (ASAP) Operator s Guide Dionex Corporation

BIOINF 4399B Computational Proteomics and Metabolomics

Data preprocessing Functional Programming and Intelligent Algorithms

This manual describes step-by-step instructions to perform basic operations for data analysis.

Protein Deconvolution Quick Start Guide

Warped parallel nearest neighbor searches using kd-trees

Review of feature selection techniques in bioinformatics by Yvan Saeys, Iñaki Inza and Pedro Larrañaga.

Thermo Xcalibur Getting Started (Quantitative Analysis)

Statistical Process Control in Proteomics SProCoP

Lecture 4: Image Processing

L10 Layered Depth Normal Images. Introduction Related Work Structured Point Representation Boolean Operations Conclusion

TraceFinder Analysis Quick Reference Guide

GPGPU: Parallel Reduction and Scan

Data processing. Filters and normalisation. Mélanie Pétéra W4M Core Team 31/05/2017 v 1.0.0

Progenesis CoMet User Guide

Andreas Reimann Berlin im Juli 2009

Skyline Targeted MS/MS

GPU-accelerated data expansion for the Marching Cubes algorithm

What s New in Empower 3

Slide 1. Technical Aspects of Quality Control in Magnetic Resonance Imaging. Slide 2. Annual Compliance Testing. of MRI Systems.

Embedded real-time stereo estimation via Semi-Global Matching on the GPU

MSFragger Manual. (build )

Package CorrectOverloadedPeaks

REAL-TIME ADAPTIVITY IN HEAD-AND-NECK AND LUNG CANCER RADIOTHERAPY IN A GPU ENVIRONMENT

SequencePro Data Analysis Application. User Guide

CSE 599 I Accelerated Computing - Programming GPUS. Memory performance

QuiC 2.1 (Senna) User Manual

Cost Optimal Parallel Algorithm for 0-1 Knapsack Problem

MassLynx 4.1 SCN 804. Release Notes. SQ Detector 2

Annotation of LC-MS metabolomics datasets by the metams package

Improving Memory Space Efficiency of Kd-tree for Real-time Ray Tracing Byeongjun Choi, Byungjoon Chang, Insung Ihm

Impact Analysis for Software changes in OpenLAB CDS A.01.04

Convolution Soup: A case study in CUDA optimization. The Fairmont San Jose 10:30 AM Friday October 2, 2009 Joe Stam

2, the coefficient of variation R 2, and properties of the photon counts traces

FFT-Based Astronomical Image Registration and Stacking using GPU

Package envipick. June 6, 2016

03 - Reconstruction. Acknowledgements: Olga Sorkine-Hornung. CSCI-GA Geometric Modeling - Spring 17 - Daniele Panozzo

Xcalibur Library Browser

DRAM Bank Organization


An Evolutionary Programming Algorithm for Automatic Chromatogram Alignment

Practical OmicsFusion

Current Perspectives on Clinical Mass Spectrometry Auto-Data Review: An Innovative Solution

PROTEOMIC COMMAND LINE SOLUTION. Linux User Guide December, B i. Bioinformatics Solutions Inc.

MASPECTRAS Users Guide

SIMAT: GC-SIM-MS Analayis Tool

EXAM SOLUTIONS. Image Processing and Computer Vision Course 2D1421 Monday, 13 th of March 2006,

Process size is independent of the main memory present in the system.

Review for the Final

R-software multims-toolbox. (User Guide)

CUB. collective software primitives. Duane Merrill. NVIDIA Research

GPU-Accelerated Incremental Correlation Clustering of Large Data with Visual Feedback

Segmenting Lesions in Multiple Sclerosis Patients James Chen, Jason Su

Agilent Triple Quadrupole LC/MS Peptide Quantitation with Skyline

2006: Short-Range Molecular Dynamics on GPU. San Jose, CA September 22, 2010 Peng Wang, NVIDIA

Clustering. CE-717: Machine Learning Sharif University of Technology Spring Soleymani

Simulation of Flow Development in a Pipe

Slide credit: Slides adapted from David Kirk/NVIDIA and Wen-mei W. Hwu, DRAM Bandwidth

Project Report on. De novo Peptide Sequencing. Course: Math 574 Gaurav Kulkarni Washington State University

Data Representation. Types of data: Numbers Text Audio Images & Graphics Video

Harris: Quantitative Chemical Analysis, Eight Edition CHAPTER 05: QUALITY ASSURANCE AND CALIBRATION METHODS

X10 specific Optimization of CPU GPU Data transfer with Pinned Memory Management

Image Compression With Haar Discrete Wavelet Transform

6. Parallel Volume Rendering Algorithms

Automatic Identification of Application I/O Signatures from Noisy Server-Side Traces. Yang Liu Raghul Gunasekaran Xiaosong Ma Sudharshan S.

Retention Time Locking with the MSD Productivity ChemStation. Technical Overview. Introduction. When Should I Lock My Methods?

SE-292 High Performance Computing. Memory Hierarchy. R. Govindarajan Memory Hierarchy

OpenLynx User's Guide

Convolution Soup: A case study in CUDA optimization. The Fairmont San Jose Joe Stam

Agilent MassHunter Easy Access Software

Performance impact of dynamic parallelism on different clustering algorithms

Speedup Altair RADIOSS Solvers Using NVIDIA GPU

Local invariant features

LC-MS Data Pre-Processing. Xiuxia Du, Ph.D. Department of Bioinformatics and Genomics University of North Carolina at Charlotte

Matching and Recognition in 3D. Based on slides by Tom Funkhouser and Misha Kazhdan

SpectroDive 8 - Coelacanth. User Manual

Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging

Analytical Modeling of Parallel Systems. To accompany the text ``Introduction to Parallel Computing'', Addison Wesley, 2003.

Organic Computing DISCLAIMER

Agilent 6100 Series Quad LC/MS System

Skyline MS1 Full Scan Filtering

Challenges in data compression for current and future imagers and hyperspectral sounders

Very large searches present a number of challenges. These are the topics we will cover during this presentation.

Skyline Targeted Method Refinement

Sample Sizes: up to 1 X1 X 1/4. Scanners: 50 X 50 X 17 microns and 15 X 15 X 7 microns

PROTOCOL: GCP MS Analysis (MSA)

Prefix Scan and Minimum Spanning Tree with OpenCL

Transcription:

Real-time Data Compression for Mass Spectrometry Jose de Corral 2015 GPU Technology Conference 2015 Waters Corporation 1

Introduction to LC/IMS/MS LC/IMS/MS is the combination of three analytical techniques Liquid Chromatography (LC) Ion Mobility Separation (IMS) Mass Spectrometry (MS) Main applications Biopharmaceutical Life Sciences Food safety Environmental Protection Clinical Chemical Materials 2015 Waters Corporation 2

Introduction to LC/IMS/MS LC/IMS/MS instruments produce ions and generates 3D data with these three axes Mass to charge ratio of ions (m/z) (mass) Ion mobility drift time (drift) Chromatographic time (time) Ion intensity is recorded at each point in this volume In each axis, intensity takes the form of peaks (mostly Gaussian) 2015 Waters Corporation 3

LC/IMS/MS Data Visual Example IMS (drift axis) has been summed to display this 3D image (LC/MS data) 2015 Waters Corporation 4

LC/IMS/MS Data Visual Example A close up of same LC/MS data 2015 Waters Corporation 5

LC/IMS/MS Data Size Data volume from a single analysis is typically large (and growing). For example About 200,000 data points in the mass axis 200 data points in the drift axis About 3,000 data points in the time axis Volume contains 120 billion data points 480 GB of storage for a 4 byte recorded ion intensity Fortunately, the data is sparse Only non-zero intensity values are recorded Data size varies but still may be large (20GB ~ 30GB) Each instrument runs 10 or more analysis daily Two data volumes per analysis 2015 Waters Corporation 6

LC/IMS/MS Data Acquisition LC/IMS/MS instruments generate mass scans of m/z values (every ~5 msec) 200 consecutive scans represent 200 ion mobility values (drift) at a point in chromatographic time Acquisition time is the chromatographic time (time axis) Each acquisition sampling time (~1 sec) we get a full plane of data (mass x drift) Acquisition time 10 min to 2 hours drift 200,000 1 sec 200 mass time 2015 Waters Corporation 7

LC/IMS/MS Data Compression Specific for LC/IMS/MS data Not a general purpose compression method Takes advantage of properties of the data How the data is generated and its features (Gaussian peaks) How the data is processed to get analytical results Lossy compression method based on relevance Discards data points that do not contribute to the analytical results Keeps all other data points unmodified Real-time compression Data is compressed in real-time during acquisition 2015 Waters Corporation 8

Compression Algorithm Overview The algorithm aims to remove certain points with non-zero intensity Points deemed irrelevant for results The algorithm has four main steps Before start, data must have all zero intensity data points restored (full data) A. For each data point, compute the sum of all intensities inside a volume centered at the data point. B. Compare each sum value to a given threshold. C. If the sum is above the threshold, keep all points inside the volume. Otherwise, tag the points inside the volume as candidates to be discarded. D. Once all data points have been processed as above, remove all points that remain tagged as candidates to be discarded. 2015 Waters Corporation 9

Compression Algorithm Overview Step A sum volume dimensions are not constant Computed from the Gaussian peak width of the data in each axis Dimension in the mass and drift axes increase with the axis 2015 Waters Corporation 10

Compression Algorithm Computation Simple algorithm but compute intense For example, lets assume a volume of just 5x9x7 (mass x drift x time) for step A A naïve implementation would generate 314 sum operations per data point In a data set with 120 billion data points it means 37.7 trillion sum operations Of course, the actual implementation uses far less sum operations per data point But still close to a trillion sum operations Step B would require 120 billion compare operations in this data set example Step C requires a similar amount of computation as step A 2015 Waters Corporation 11

Compression Algorithm Memory A large amount of device memory is needed to buffer real-time data Consequence of requirement of full data (with zeros) to run the algorithm At a minimum, need to buffer the dimension of the sum volume in the time axis Device memory requirements are reduced by tiling the data Only one tile copied to device memory Not a trivial tiling Different tile size and boundary criteria in each axis Tiles in main memory have zero intensity data points removed o Faster transfers, less main memory Zero intensity data points must be restored after copying each tile to device memory 2015 Waters Corporation 12

Step A Implementation A. For each data point, compute the sum of all intensities inside a volume centered at the data point. The sums computation is a box filter convolution in 3D with a variable box volume Convolution implemented as a separate convolution A 1D convolution in each axis with a variable sum window Ring buffers in shared memory are used to speedup computation Coalesced data is read from device memory only once Make the computation independent of the window width Enable computation in-place to save device memory 2015 Waters Corporation 13

Step B Implementation B. Compare each sum value to a given threshold. The comparison result is stored in a single bit flag Flag is set if sum is above the threshold Effective use of the warp vote function ballot() Sets the data in adequate format for step C computation Saves device memory 2015 Waters Corporation 14

Step C Implementation C. If the sum is above the threshold, keep all points inside the volume. Otherwise, tag the points inside the volume as candidates to be discarded. Doing this computation traversing all volumes in parallel is inefficient Only volumes with the comparison flag set do some work All points inside volumes with the comparison flag set must be visited Some points may be visited multiple times Instead, approach the problem from the perspective of the point to be kept/discarded Check the comparison flag of all volumes the point contributes to. Keep the point if it contributes to any volume with the comparison flag set. Discard it otherwise. 2015 Waters Corporation 15

Step C Implementation A 1D example where point P contributes to six volumes Point V1 is the center of a volume with a window width of 5 (the red volume) Point V2 is the center of a volume with a window width of 7 (the green volume) The red volume is the furthest volume to the left P contributes The green volume is the furthest volume to the right P contributes P contributes to all volumes with center between V1 and V2 We have to examine only those six comparison flags 2015 Waters Corporation 16

Step C Implementation The implementation is similar to the sums computation in step A But summing comparison flags instead of ion intensities This is called the keeps computation For each data point, compute the sum of all comparison flags inside a volume The volume encloses the center point of all sum volumes the point contributes to. If the sum of the flags is above zero, the point is kept. Discarded otherwise. 2015 Waters Corporation 17

Step C Implementation The keeps computation is a box filter convolution in 3D with a variable box volume Convolution implemented as a separate convolution A 1D convolution in each axis with a variable sum window Ring buffers in shared memory are used the same way as in step A Flags are read from device memory only once Computation in-place and independent of the keep window width The sum of flags in each axis is, in turn, stored in a single bit flag Flag is set if sum is above zero Next axis convolution uses this flag as input 2015 Waters Corporation 18

Results As expected, data size reduction is data dependent Varies between 30% and 90% (data reduced to 70% - 10% of original) High computation performance on a Tesla K40 Compressing simultaneously two data volumes totaling 164 billion points (36.6 GB) takes about 90 seconds Well beyond the requirements for real-time compression (acquisition time was 75 min) At least a speed up of 70x over a CPU implementation 2015 Waters Corporation 19

Results Analytical results are essentially the same With correct volume dimensions in step A and the right threshold in step B Setting the threshold too high degrades the results Data Orig. size Comp. size Orig. result Comp. result 500 ng Ecoli 14.5 GB 9.1 GB 969 proteins 969 proteins 100 ng Ecoli 8.6 GB 3.9 GB 840 proteins 836 proteins Metabolomics set (10) 2.6 1.0 GB 1.2 0.06 GB Indistinguishable PCA scores Side benefit of compression algorithm Post-acquisition data processing takes less time 2015 Waters Corporation 20

Visual Results (LC/MS data) Original data (11.1 GB) Compressed data (3.4 GB) 2015 Waters Corporation 21

Visual Results (LC/MS data) Notice noise peaks near edge of larger peaks are preserved A simple thresholding algorithm would have removed them too 2015 Waters Corporation 22

Questions? jose.de.corral@waters.com 2015 Waters Corporation 23