The Parallel Bayesian Toolbox for High-performance Bayesian Filtering in Metrology

Similar documents
APPLICATION OF PREDICTION-BASED PARTICLE FILTERS FOR TELEOPERATIONS OVER THE INTERNET

APPLICATION OF PREDICTION-BASED PARTICLE FILTERS FOR TELEOPERATIONS OVER THE INTERNET

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

An Optimal Algorithm for Prufer Codes *

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

An Entropy-Based Approach to Integrated Information Needs Assessment

Wavefront Reconstructor

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Parallelism for Nested Loops with Non-uniform and Flow Dependences

The Codesign Challenge

Classifying Acoustic Transient Signals Using Artificial Intelligence

S1 Note. Basis functions.

High-Boost Mesh Filtering for 3-D Shape Enhancement

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Feature Reduction and Selection

Fusion Performance Model for Distributed Tracking and Classification

Hermite Splines in Lie Groups as Products of Geodesics

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

Smoothing Spline ANOVA for variable screening

Cluster Analysis of Electrical Behavior

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Mathematics 256 a course in differential equations for engineering students

A Binarization Algorithm specialized on Document Images and Photos

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated.

Wishing you all a Total Quality New Year!

Parallel matrix-vector multiplication

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Support Vector Machines

A Robust Method for Estimating the Fundamental Matrix

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Overview. Basic Setup [9] Motivation and Tasks. Modularization 2008/2/20 IMPROVED COVERAGE CONTROL USING ONLY LOCAL INFORMATION

An Accurate Evaluation of Integrals in Convex and Non convex Polygonal Domain by Twelve Node Quadrilateral Finite Element Method

Lecture 4: Principal components

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

Simulation Based Analysis of FAST TCP using OMNET++

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

X- Chart Using ANOM Approach

Object Contour Tracking Using Multi-feature Fusion based Particle Filter

CS 534: Computer Vision Model Fitting

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes

Synthesizer 1.0. User s Guide. A Varying Coefficient Meta. nalytic Tool. Z. Krizan Employing Microsoft Excel 2007

3D vector computer graphics

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data

Simulation: Solving Dynamic Models ABE 5646 Week 11 Chapter 2, Spring 2010

Lecture 5: Multilayer Perceptrons

Accounting for the Use of Different Length Scale Factors in x, y and z Directions

Backpropagation: In Search of Performance Parameters

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines

THE PULL-PUSH ALGORITHM REVISITED

APPLICATION OF AN AUGMENTED REALITY SYSTEM FOR DISASTER RELIEF

Monte Carlo Rendering

Problem Set 3 Solutions

Solving two-person zero-sum game by Matlab

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

What are the camera parameters? Where are the light sources? What is the mapping from radiance to pixel color? Want to solve for 3D geometry

y and the total sum of

The Research of Ellipse Parameter Fitting Algorithm of Ultrasonic Imaging Logging in the Casing Hole

TN348: Openlab Module - Colocalization

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

RADIX-10 PARALLEL DECIMAL MULTIPLIER

Methodology of optimal sampling planning based on VoI for soil contamination investigation

MOTION PANORAMA CONSTRUCTION FROM STREAMING VIDEO FOR POWER- CONSTRAINED MOBILE MULTIMEDIA ENVIRONMENTS XUNYU PAN

Analysis of Continuous Beams in General

Contours Planning and Visual Servo Control of XXY Positioning System Using NURBS Interpolation Approach

Classification / Regression Support Vector Machines

Cell Count Method on a Network with SANET

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints

Programming in Fortran 90 : 2017/2018

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z.

S.P.H. : A SOLUTION TO AVOID USING EROSION CRITERION?

The Man-hour Estimation Models & Its Comparison of Interim Products Assembly for Shipbuilding


Unsupervised Learning and Clustering

Edge Detection in Noisy Images Using the Support Vector Machines

METRIC ALIGNMENT OF LASER RANGE SCANS AND CALIBRATED IMAGES USING LINEAR STRUCTURES

Six-Band HDTV Camera System for Color Reproduction Based on Spectral Information

Structural Optimization Using OPTIMIZER Program

An Image Fusion Approach Based on Segmentation Region

Bayesian evidence for Finite element model updating

Comparison of Heuristics for Scheduling Independent Tasks on Heterogeneous Distributed Environments

User Authentication Based On Behavioral Mouse Dynamics Biometrics

Decision Strategies for Rating Objects in Knowledge-Shared Research Networks

A Similarity-Based Prognostics Approach for Remaining Useful Life Estimation of Engineered Systems

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Explicit Formulas and Efficient Algorithm for Moment Computation of Coupled RC Trees with Lumped and Distributed Elements

Anytime Predictive Navigation of an Autonomous Robot

Module 6: FEM for Plates and Shells Lecture 6: Finite Element Analysis of Shell

The Shortest Path of Touring Lines given in the Plane

VISUAL SELECTION OF SURFACE FEATURES DURING THEIR GEOMETRIC SIMULATION WITH THE HELP OF COMPUTER TECHNOLOGIES

Learning-Based Top-N Selection Query Evaluation over Relational Databases

Transcription:

10.2478/msr-2013-0047 MEASUREMENT SCIENCE REVIEW, Volume 13, No. 6, 2013 The Parallel Bayesan Toolbox for Hgh-performance Bayesan Flterng n Metrology E. Garca, T. Hausotte Insttute Manufacturng Metrology, Unversty Erlangen-Nuremberg, 91052 Erlangen, Germany, garca@fmt.fau.de The Bayesan theorem s the most used nstrument for stochastc nferencng n nonlnear dynamc systems and also the fundament of measurement uncertanty evaluaton n the GUM. Many powerful algorthms have been derved and appled to numerous problems. The most wdely used algorthms are the broad famly of Kalman flters (KFs), the grd-based flters and the more recent partcle flters (PFs). Over the last 15 years, especally PFs are ncreasngly the subject of researches and engneerng applcatons such as dynamc coordnate measurements, estmatng sgnals from nosy measurements and measurement uncertanty evaluaton. Ths s rooted n ther ablty to handle arbtrary nonlnear and/or non-gaussan systems as well as n ther easy codng. They are samplng-based sequental Monte-Carlo methods, whch generate a set of samples to compute an approxmaton of the Bayesan posteror probablty densty functon. Thus, the PF faces the problem of hgh computatonal burden, snce t converges to the true posteror when number of partcles N P. In order to solve these computatonal problems a hghly parallelzed C++ lbrary, called Parallel Bayesan Toolbox (PBT), for mplementng Bayes flters (BFs) was developed and released as open-source software, for the frst tme. In ths paper the PBT s presented, analyzed and verfed wth respect to effcency and performance appled to dynamc coordnate measurements of a photogrammetrc coordnate measurng machne (CMM) and ther onlne measurement uncertanty evaluaton. Keywords: Partcle flter, Kalman flter, Bayesan estmaton, parallel hgh-performance computng, CUDA, open source. T 1. INTRODUCTION RADITIONAL coordnate measurng machnes (CMM) were ncreasngly completed or superseded by moble, dynamc coordnate measurng machnes (DMM), such as laser tracer or photogrammetry systems (.e. camerabased systems). They allow non-contact, n-stu and onlne coordnate measurements of complex movng objects over large measurement volumes along wth excellent precson and hgh measurement rate drectly n the shop-floor [1, 2]. Typcal applcatons are dynamc moton measurements (traveled paths, postons, dsplacements, veloctes, acceleratons, jers, orentatons and vbratons), gudance and calbraton of machnes, robot trajectory measurements, deformaton analyss as well as assembly and nspecton of parts and structures. Often, DMM are portable optcal measurement systems, whch perform onlne measurements, mostly n unstable envronments. For these dynamc measurements,.e. tracng the temporal evoluton of an object s poston, t s n general not possble to model, quantfy or compensate sgnfcant error sources, such as optcal aberratons (e.g. reflecton errors, changng refractve ndex), magng errors or dynamc errors (e.g. moton artfacts), durng the measurement process. As a result the obtaned measurements exhbt a lower sgnal-to-nose rato and reduced measurement accuracy. Furthermore t s not possble to evaluate the measurement uncertanty accordng to the Gude to the expresson of uncertanty n measurement (GUM) due to the model mperfecton and dynamc characterstc of the measurement process. A soluton to mprove the measurement accuracy and estmate the uncertanty of such dynamc measurements s gven by the Bayesan nference. Based on a model of the process under nvestgaton, t allows the onlne computaton of the probablty densty functons (PDFs) of the output quanttes and thus dervng the optmal estmate for them. In the frst part of ths paper the prncpals of measurement uncertanty evaluaton accordng to the GUM and the fundamentals of Bayesan estmaton theory wll be brefly restated. The second part of the paper s devoted to ntroduce and demonstrate the Parallel Bayesan Toolbox (PBT) and ts features n order to mplement Bayesan estmaton technques n a straghtforward and effcent way. In order to determne the requred computaton tme for such post-processng, several benchmars for synthetc computaton tass problems as well as dynamc coordnate measurement tass wll be presented. 2. BASIC PRINCIPLES OF MEASUREMENT UNCERTAINTY EVALUATION A measurement result s generally expressed as a sngle measured quantty value and a measurement uncertanty [5, defnton 2.9]. The de facto nternatonal standards to evaluate, calculate and express uncertanty n all nds of measurements are the GUM and ts extenson the GS1 [3, 4]. These documents provde gudance on the uncertanty evaluaton as a two-stage process: formulaton and calculaton. The formulaton stage nvolves developng a measurement model relatng output (measurand) y to nput quanttes x 1,,x N, ncorporatng correctons and other effects as necessary and fnally assgnng probablty dstrbutons to the nput quanttes on the bass of avalable nowledge. The calculaton stage conssts of propagatng the probablty dstrbutons for the nput quanttes through the measurement model y=f(x 1,,x N ) to obtan the probablty dstrbuton for the output quantty. From ths probablty densty functon the expected value and the standard devaton of the output quantty as well as the coverage nterval are 315

calculated. In order to not go beyond the scope of ths paper, t s referred to GUM and GS1 [3, 4] for further detals of the propagaton of dstrbuton. Fnally, the measurement uncertanty s stated as uncertanty budget, whch should nclude the measurement model, estmates, and measurement uncertantes assocated wth the quanttes n the measurement model, covarances, type of appled probablty densty functons, degrees of freedom, type of evaluaton of measurement uncertanty and the coverage nterval. Here, the optmal soluton would be an analytcal calculaton of the output PDF. But as ths s not possble n general, especally for nonlnear dynamc processes, sequental Monte Carlo methods are utlzed to acheve a Bayesan estmaton of the output densty, as descrbed n the next secton. 3. BAYESIAN ESTIMATION A measurement process generates dsperse (uncertan) observatons of a true (but unnown) measurand. Ths s an erroneous transformaton due to ubqutous errors. Thus, a complete, exact or unque reconstructon of the true measurand s always mpossble. But usng Bayes theorem t s possble to estmate the true measurand from observatons, snce the functonal relaton between both s preserved. Ths estmaton problem can be solved as a sequental probablstc nference problem for nonlnear dynamc systems. The unnown measurand (e.g. coordnates) s modeled as a state of a dynamc system (e.g. movng object) that evolves over tme and s observed wth a partcular measurement process. Ths allows the estmaton of measurable states (e.g. poston) as well as derved or not drectly measurable states (e.g. veloctes, acceleraton and orentatons) of arbtrary dynamc systems n a statstcally sound way gven uncertan, nosy or ncomplete observatons. The nonlnear dynamc system s descrbed by the general dscrete-tme stochastc state space model respectvely. The stochastc state-space model, together wth the nown statstcs of the nose random varables as well as the pror dstrbutons of the system states, defnes a probablstc model of how the systems evolves over tme and how we naccurately observe the evoluton of ths hdden state. The objectve s to estmate the hdden system states x n a recursve fashon (.e. sequental update of prevous estmate) as measurement y becomes avalable. Ths s the central ssue of the sequental probablstc nference theory, whch s also referred to as onlne, sequental or teratve flterng. From a Bayesan perspectve, the complete soluton to ths problem s gven by the condtonal posteror densty p( x 1: ) of the state x, tang all observatons y = y, y, K, y nto account (Fg.1.). { } 1: 1 2 The computaton of the Bayesan posteror densty p( x ) requres the ncorporaton of all measurements 1: y 1: n one step (batch processng). A method to recursvely update the posteror densty as new observatons arrve s gven by the recursve Bayesan estmaton algorthm, whch s faster and allows an onlne processng of data wth lower storage costs and a quc adapton to changng data characterstcs. x Bayesan nference p( y ) p( x ) p( x ) = 1: 1: p( y ) x = f ( x, v ) 1 y = h( x, w ) 1: y Fg.1. Recursve Bayesan estmaton for dynamc systems. x = f ( x, v ), 1 1 y = h ( x, w ), (1) Usng the Bayes theorem and the dscrete-tme stochastc state space model (1), the posteror densty can be derved and factored nto the followng recursve update form where x s the hdden system state vector evolvng over tme (dscrete tme ndex) accordng to the possbly nonlnear state transton functon f, the last state x 1 and the process nose v 1. The observaton y related to the state vector va the nonlnear observaton functon and the measurement nose w, whch s corruptng the measurement of the state. Ths state space model corresponds to a frst order hdden Marov model [6, 7] wth an ntal probablty densty, p( x 0 ) the state transton probablty p( x 1) and the observaton probablty densty p( y ). Both the state transton densty and the observaton lelhood are fully specfed by f and the process nose probablty p( v ) and by h and the observaton nose densty p( w ), h p p( y ) p( x ) p( y ) p( x ) 1: 1: 1 ( x 1: ) = =. p( y1: ) p( y 1: 1) The computaton of the Bayesan recurson essentally conssts of two steps: the predcton step p( x ) p( x ) and the update step 1 1: 1 1: 1 p( x 1: 1, y ) p( x 1: ). The predcton step nvolves usng the process functon and the prevous posteror densty p( x 1 1: 1) to calculate the pror densty of the state x at tme va the Chapman- Kolmogorov equaton = 1: 1 1 1 1: 1 1 (2) p( x ) p( x ) p( x ) dx, (3) 316

where the state transton PDF s gven by 1 p( x ) = δ ( x f ( x, v )) p( v ) dv. (4) 1 1 The pror of the state x at tme does not ncorporate the latest measurement y,.e. the probablty for x s only based on prevous observatons. When the new uncertan measurement y becomes avalable the update step s carred out and the posteror PDF of the current state x s computed from the predcted pror (3) and the new measurement va the Bayes theorem (2), where the observaton lelhood PDF and the normalzng constant n the denomnator are gven by: p( y ) = δ ( y h ( x, w )) p( w ) dw (5) = p( y ) p( y ) p( x ) dx. (6) 1: 1 1: 1 The equatons (2-6) formulate the Bayesan soluton to recursvely estmate the unnown system states (measurands) of a nonlnear dynamc system from uncertan, nosy or ncomplete measurements. But ths s only a conceptual soluton n the sense that n general the ntegrals cannot be determned analytcally. Furthermore, the algorthmc mplementaton of ths soluton requres the storage of the entre (non-gaussan) posteror PDF as an nfnte dmensonal matrx, because generally t cannot be completely descrbed by a suffcent statstc of fnte dmenson. Only n some restrcted cases a closed-form recursve soluton s possble. For example wth restrcton to lnear, Gaussan systems a closed-form recursve soluton s gven by the famous Kalman flter [8]. In most stuatons, ether both the dynamc and the measurement process or only one of them s nonlnear. Thus the mult-dmensonal ntegrals are not tractable and approxmatve solutons such as sequental Monte Carlo methods have to be used [8]. 3.1. Sequental Monte Carlo methods Sequental Monte Carlo (SMC) methods are stochastc samplng-based approaches, whereby the Monte Carlo ntegraton wth sequental mportance samplng (SIS) s used to solve the hgh dmensonal ntegrals of the Bayesan recurson. They mae no explct assumpton about the form of the posteror densty and approxmate the Bayesan ntegrals wth fnte sums. These methods have the advantage of not beng subject to the curse of dmensonalty as well as not beng constraned to lnear or Gaussan models. The basc dea n SMC methods s to represent the Bayesan posteror densty p( x 1: ) by a set of random samples wth assocated weghts w (also referred to as partcles) and to recursvely compute the posteror based on these weghted samples, that s 1 δ( ) s the Drac-delta functon. w N s 1: w δ = 1 p( x ) ( x x ), (7) p( y ) p( x 1) = w 1, q( x 1, y ) where p( x 1: ) s the true posteror, q( ) s the mportance samplng densty, x and w wth = 1, K, N P are the random samples and assocated weghts, whsh normalze to w = 1. In every tme step the samples are drawn from the mportance densty, snce t s not possble to sample from the posteror densty drectly. For ths reason the choce of q( ) s a crucal desgn parameter n SIS methods. The objectve s to draw samples n the regon where the target state les n (regon of mportance ) to acheve good state estmatons and hgh computatonal effcency. Wth an ncreasng number of partcles N P the Monte Carlo Approxmaton becomes an equvalent representaton of the true posteror densty and the SMC methods converges to the true Bayesan estmate. An ntrnsc problem wth SIS s the crcumstance, that after a few teratons, only some partcles have sgnfcant weghts, and the others are almost zero, ths s called weght degeneracy or sample mpovershment problem. In order to mprove the sample effcency, usually some nd of resamplng (selecton) scheme s ntroduced, that avods the problem of degenerate partcles. The most common resamplng strateges are multnomal, systematc, stratfed or resdual resamplng [9]. The resamplng duplcates the partcles wth hgh weghts wth several chldren, and assgns them equal weghts. Ths elmnates partcles wth nsgnfcant weghts and chooses more partcles n more probable regons. In general, the total number of partcles s ept constant. Ths method of samplng s termed samplng mportance resamplng (SIR). The resultng algorthmc mplementaton consstng of predctng, updatng and resamplng the partcles (weghted samples) s called partcle flter. The PF s very general and very easy to code but faces the aforementoned problem of hgh computatonal burden, snce t only converges to the true posteror when the number of partcles goes to nfnty. Thus, there s a drect dependency between the number of partcles and the achevable estmaton accuracy. The more partcles were used the better s the approxmaton of the true Bayesan posteror densty. 4. PARALLEL BAYESIAN TOOLBOX The PF faces the problem of hgh computatonal burden, snce t converges to the true posteror when number of partcles N P. For typcal low dmensonal estmaton problems the PF requres 2 to 6 orders of magntude more computatonal throughput than the extended KF (EKF), to acheve the same accuracy [10]. In order to solve these computatonal problems and assst users n effcently mplementng Bayes flters (BFs), a hghly parallelzed C++ lbrary, called Parallel Bayesan (8) 317

Toolbox (PBT) was developed and released as open-source software, for the frst tme [11]. The PBT s a very flexble, hgh-performance and platform ndependent 2 C++ programmng lbrary for mplementng the most common Bayes flter - that s PFs, KFs and combnatons of both - n an easly understandable, hgh level language, followng the MATLAB/Octave programmng language, for flterng, smoothng and predctng applcatons. Ths s acheved by usng the Armadllo lbrary [12] for easy codng, optmzed lnear algebra lbrares 3 for numercal computatons on central processng unts (CPUs) and Nvda's Compute Unfed Devce Archtecture (CUDA) framewor for performng computatons on graphcs processng unts (GPUs). The source code of the PBT can be dstrbuted and/or modfed under the terms of the Bereley Software Dstrbuton (BSD) lcense or the GNU General Publc Lcense (GPL). Ths dual lcense allows the usage of the PBT n open source, but also n propretary applcatons. The PBT was desgned and mplemented that the followng requrements are met. Free and open source: In order to analyze, modfy and reuse software, ts source code has to be avalable under an approprate lcense. Independent and flexble: Software should be ndependent of hard- or software (.e. ndependent of CPUs, GPUs, operatng systems, software lbrares etc.) and ndependent of any partcular applcaton (such as tracng, robotcs, sgnal/mage processng, econometrcs etc.). Performance: For hgh computatonal throughput, all avalable CPUs and GPUs should be used optmally. Modularty: Software should consst of nterchangeable, modular functonal unts n order to mprove customzaton and reusablty. Smple to code and easy to understand: Source code must be easy to understand, smple to read and easy to mantan. Matlab/GNU Octave Interface: Software should offer an nterface to Matlab/GNU Octave, because they are de facto standards for techncal computng, data analyss and algorthm development. Although there s free and open source software avalable for mplementng Kalman flters (see [18-19] and references heren) and/or partcle flters (see [20-23] and references heren), none fulfls all requrements stated above 4. Especally the last four crtera are not addressed adequately. In partcular, no software exhbts an easy to use lnear algebra lbrary wth bndngs to optmzed numercal lbrares n order to transparently and optmally uses modern mult-core CPU/GPU archtectures. Also a proper modularzaton to easly customze nose sources, resamplng or state estmaton methods and a Matlab/GNU Octave nterface are mssng. Implementng an extra MEX/OCT nterface, whch converts data types between Matlab/GNU Octave and the software lbrary, s an addtonal tedous and error-prone 2 Currently supported are Mcrosoft Wndows and Lnux on 32-bt or 64-bt processors. 3 Supported lbrares are Basc Lnear Algebra Subprograms (BLAS), Lnear Algebra PACKage (LAPACK), Automatcally Tuned Lnear Algebra Software (ATLAS), AMD Core Math Lbrary and Intel Math Kernel Lbrary (MKL). 4 A very good overvew of KF software s gven n [17]. Some lbrares for PF are compared n chapter 7 of the desgn document of The Bayesan Flterng Lbrary avalable under [20]. wor for practtoners and decreases the overall computatonal throughput. The PBT tself uses CMae [13] as cross-platform buld system and conssts of fve major modules. Frst, a BF module that provdes all data management and nterfaces. Ths module receves all measured data and dstrbutes t to the other parts of the framewor. The other modules are the state space model (SSM), descrbng the process and measurement equaton, the resamplng module mplementng all common resamplng strateges, the nose sources module provdng samplng and evaluatng functons for all dstrbutons defned n [3, 4] and the state estmaton module for computng a state estmate from the Bayesan posteror densty n every tme step. Addtonally the toolbox features nterfaces to the numercal computaton systems MATLAB and GNU Octave as well as to the C++ development envronment Automotve Data and Tme Trggered Framewor (ADTF). The overall archtecture of the PBT s depcted n Fg.2. Example C++ code for mplementng EKF and PF usng PBT s gven n Fg.3. and 4 respectvely. Ths demonstrates the easy and straghtforward usage of PBT. In the followng sectons, the man components and ther subsystems wll be explaned n more detal. Fg.2. Archtecture of the Parallel Bayesan Toolbox. A. Bayes Flter module Ths module has the role of a coordnator. It provdes the flter algorthm and t organzes the data transfer between all other modules, as shown n Fg.3. The Bayes flter module calls the process and measurement functon of the SSM n the predcton and update step, respectvely. It s also used for the computaton of the partcle weghts. B. State space model Ths module contans the dynamc model of the processor system under nvestgaton and defnes ts process functon ffun() and measurement functon hfun(). The PBT offers predefned models for smple dynamcs le steady state, lnear, crcular or nematc trajectores [14] to assst the user. Dynamc coordnate measurements of all standard geometrcal elements can be descrbed wth these ncluded models. More complex models can be mplemented by the user. In the case of KFs the user has to provde system, covarance and Jacoban matrces of the process and measurement equatons. In the case of PFs the user has to specfy ffun()and hfun() wth all nose sources. 318

C. Resamplng module The resamplng module mplements multnomal, systematc, stratfed and resdual. The resamplng step s only appled to PFs. It s the only bottlenec n the whole toolbox, because some calculatons cannot be parallelzed, such as computng the cumulatve sum. Fg.3. C++ mplementaton of EKF usng the PBT. E. State estmaton module In every tme step, the state estmaton model s used to compute a state estmate from the partcle approxmaton of the Bayesan posteror densty. The standard estmator n every flter confguraton s the weghted mean estmaton. Other avalable estmators are medan, robust mean, - means, mean-shft and best partcle estmator. The latter are especally useful n applcatons wth mult-modal lelhood functons, e.g. n localzaton. 5. PERFORMANCE BENCHMARKS Three dfferent performance benchmars were performed, n order to test the computaton throughput of lnear algebra calculatons n PBT. Frst, the multplcaton of non-square matrces was analyzed, snce ths s the most often used mathematcal operaton n BFs. Furthermore, n lterature s only the case of square matrces consdered. On numerous systems the matrx-matrx multplcaton was performed wth the dmensons d = (3, 15)(100, 500, 2000, 10000). A representatve realzed runtme graph s shown n fgure 5 (a) and (b). In most confguratons CPU and GPU show relatvely equal performance. Ths s because both are optmzed for these operatons. However, the NVIDIA Tesla GPU generaton (or newer generatons), specalzed for scentfc computatons, outperforms all other hardware confguratons. Regardng ths, the matrx-matrx multplcaton on GPUs s only useful n combnaton wth new GPU generatons and/or wth other operatons that are more computatonally ntensve, e.g. coordnates transformatons presented next. Fg.4. C++ mplementaton of PF usng the PBT. D. Nose sources module Ths module s most useful n the case of PFs. It conssts of random number generators and probablty densty functons for numerous dstrbutons. It s ntended to help the user to model arbtrary nose sources n the SSM. For modelng nose sources/uncertantes accordng to GUM and GS1 [3, 4], the PBT ncludes functons for all PDFs that a specfed n the GUM. Fg.5. Performance comparson of (a)-(b) matrx-matrx multplcaton of numerous CPUs, (c) the transformaton of Cartesan to polar coordnates and (d) of Wener process acceleraton model (DWPA) of numerous CPUs (dashed lnes) and GPUs (sold lnes). 319

Second, the performance of the nonlnear transformaton between Cartesan and polar coordnates was analyzed. Ths s one of the most often nonlnear transformatons n flterng and localzaton applcatons. The results are gven n Fg.4.(c). It can be seen that wth ncreasng number of vector elements (random numbers) the computaton tme of CPUs gets hgher, whereas the computaton tme of GPUs remans nearly constant. The reasons for ths result are the specal functon unts of GPUs for computng sne and cosne as well as the hgh parallelzaton of modern GPUs. Whle CPUs can compute 8 operatons n parallel, unspecalzed manstream GPUs can compute 1536 operatons at once (assumng 48 CUDA cores each wth 32 threads). Ths comples wth the results n [15]. Thrd, the achevable performance of flterng the moton of a pont mass wth SIR PF usng the dscrete Wener process acceleraton model (DWPA) as nematc model [14] was examned. The benchmar results, for ths very often used model, can be found n Fg.5.(d). For small matrces the runtme per one operaton s relatvely equal for GPU and CPU. But begnnng wth approxmately 1000 partcles the GPUs are faster than the CPU mplementatons. The lower rse of GPU runtme leads to the concluson that GPUs should be used for PF confguratons wth more than 1000 partcles, at least. All benchmar results revealed that usng GPUs for computatons can yeld a sgnfcant performance mprovement, especally for modern GPU archtectures. over 10 runs. The followng software was used for complng the lbrares: Mcrosoft Wndows 7 (64-bt), Mcrosoft Vsual C++ 2008 Compler, Armadllo 3.2.2, Intel MKL 10.3 Update 9 (64-bt), CUDA Toolt v4.2 and Boost 1.45. Because Bayes++ and BFL do not offer bndngs for optmzed lnear algebra lbrares 5, they are outperformed by PBT runnng on CPU as well as on GPU n case of usng more than 500 partcles. In the case of few partcles the CPU-based computaton outperforms GPU computng, due to memory data management overhead and memory transfer tme between host and GPU devce. But n the long run, GPU tremendously outperform CPU computng, because the ntal small transfer tme occurs only once. Fg.7. Comparson of runtme of PBT, Bayes++ and BFL. As mentoned n the begnnng of secton 4, the PBT features nterfaces to MATLAB and GNU Octave. The runtme performance of the PBT usng the MATLAB MEX nterface was further analyzed n a next experment. Here, the same flter tas was used to examne the performance mprovement of PBT compared to a natve MATLAB v7.14.0.739 (64-bt) mplementaton. The tests were run on Intel Xeon E5620 2.40 GHz CPU wth Nvda Tesla C2075 GPU. The benchmar results n Fg.8. show that PBT-MEX realzes throughout better computaton tmes than the natve MATLAB mplementaton. As n fgure 7 t can be seen that n case of more than 2000 partcles PBT-MEX on GPU s superor to PBT-MEX on CPU. Fg.6. Measurement setup for dynamc coordnate measurements. After these systhetc benchmars the dynamc measurement tas of tracng a freely movng target (marer) wth a moble, photogrammetrc stereo-camera system s examned. The tracng target was moved by a hgh accuracy XY lnear stage as depcted n Fg.6. along a gven trajetory. 1000 data ponts were recorded and afterwards fltered wth a SIR PF usng the former DWPA model. Ths postprocessng was performed offlne. In order to analyze the runtme for such data processng for later onlne computatons, a SIR PF was mplemented wth PBT, Bayes++ [23] and the Bayesan Flterng Lbrary (BFL) [21]. These are the only avalable open source lbrares that are under actve development and feature nterfaces and modules to mplement partcle flters n a comparable way to PBT. The benchmar results of a sngle flterng step for all three lbrares are gven n Fg.7. Shown are the averaged values Fg.8. Comparson of runtme of MATLAB and PBT-MEX. 5 Both use the Boost lbrares for lnear algebra and statstcs, whereby BFL could also use LTI matrx lbrary and Newmat lbrary as external matrx lbrares, see [21]. 320

6. CONCLUSIONS In ths paper the open source Parallel Bayesan Toolbox (PBT) for mplementng Bayes flter was presented. The toolbox s a very flexble, hgh-performance and platform ndependent C++ programmng lbrary for mplementng KFs, PFs and combnatons of both flter types. The presented benchmar results have shown that PBT s the fastest avalable open-source lbrary for mplementng Bayesan flterng, currently. Due to the hgh computatonal burden of such flters, the runtme performance s crucal for realzng an onlne processng of measured data e.g. dynamc coordnate measurements as n [16, 17]. For the frst tme, an open-source lbrary s avalable that features a smple MATLAB-le hgh level language and effcently utlze standard CPU/GPU archtectures for mplementng hgh-performance Bayesan flterng wthout the need of specal computng hardware. ACKLOWLEDGMENT The research was supported by the German Research Foundaton (Deutsche Forschungsgemenschaft) under the DFG-Project Number WE 918/34-1. REFERENCES [1] Schwene, H., Neuschaefer-Rube, U., Pfefer, T., Kunzmann, H. (2002). Optcal methods for dmensonal metrology n producton engneerng. CIRP Annals - Manufacturng Technology, 51 (2), 685-699. [2] Estler, W.T., Edmundson, K.L., Peggs, G.N., Parer, D.H. (2002). Large-scale metrology an update. CIRP Annals - Manufacturng Technology, 51 (2), 587-609. [3] BIPM, IEC, IFCC, ILAC, ISO, IUPAC, IUPAP and OIML. (2008). Evaluaton of measurement data - Gude to the expresson of uncertanty n measurement (GUM 1995 wth mnor correctons). JCGM 100:2008. [4] BIPM, IEC, IFCC, ILAC, ISO, IUPAC, IUPAP and OIML. (2008). Evaluaton of measurement data - Supplement 1 to the Gude to the expresson of uncertanty n measurement - Propagaton of dstrbutons usng a Monte Carlo method. JCGM 101:2008. [5] BIPM, IEC, IFCC, ILAC, ISO, IUPAC, IUPAP and OIML. (2008). Internatonal vocabulary of metrology Basc and general concepts and assocated terms (VIM). JCGM 200:2008. [6] Cappé, O., Moulnes, E., Ryden, T. (2005). Inference n Hdden Marov Models. Sprnger. [7] Fraser, A.M. (2008). Hdden Marov Models and Dynamcal Systems (1st ed.). Socety for Industral and Appled Mathematcs. [8] Doucet, A., de Fretas, N., Gordon, N. (2001). Sequental Monte Carlo Methods n Practce. Sprnger. [9] Douc, R., Cappé, O., Moulnes, E. (2005). Comparson of resamplng schemes for partcle flterng. In Image and Sgnal Processng and Analyss : 4th Internatonal Symposum (ISPA 2005), 15-17 September 2005. IEEE, 64-69. [10] Daum, F., Huang, J. (2003). Curse of dmensonalty and partcle flters. In Aerospace Conference, 8-15 March 2003. IEEE, Vol. 4, 4-1979-4-1993. [11] Google Project Hostng. Parallel Bayesan Toolbox. http://code.google.com/p/parallel-bayesan-toolbox/. [12] Sanderson, C. (2010). Armadllo: An open source C++ lnear algebra lbrary for fast prototypng and computatonally ntensve experments. Techncal Report. NICTA. [13] CMae home page. http://www.cmae.org/. [14] Bar-Shalom, Y., L, X.-R., Krubarajan, T. (2001). Estmaton wth Applcatons to Tracng and Navgaton : Theory Algorthms and Software. Wley Blacwell. [15] Rosenband, D.L., Rosenband, T. (2009). A desgn case study: CPU vs. GPGPU vs. FPGA. In Formal Methods and Models for Co-Desgn : 7th IEEE/ACM Internatonal Conference (MEMOCODE 09), 13-15 July 2009. IEEE, 69-72. [16] Garca, E., Hausotte, T., Amthor, A. (2013). Bayes flter for dynamc coordnate measurements Accuracy mprovment, data fuson and measurement uncertanty evaluaton. Measurement, 46 (9), 3737-3744. [17] Garca, E., Zschegner, N., Hausotte, T. (2013). Parallel hgh-performance computng of Bayes estmaton for sgnal processng and metrology. In Computng, Management and Telecommuncatons (ComManTel), 21-24 January 2013. IEEE, 212-218. [18] Welch, G., Bshop, G. The Kalman Flter homepage. http://www.cs.unc.edu/~welch/alman. [19] Identfcaton and Decson Mang Research Group, Unversty of West Bohema. Nonlnear Estmaton Framewor homepage. http://nft.y.zcu.cz/nef. [20] Cambrdge Unversty. Sequental Monte Carlo methods (Partcle flterng) homepage. http://wwwsgproc.eng.cam.ac.u/smc/software.html. [21] The Orocos Project. The Bayesan Flterng Lbrary. http://www.orocos.org/bfl. [22] BPS Project. BPS (Bayesan nference wth nteractng Partcle Systems) homepage. http://alea. bordeaux.nra.fr/bps. [23] Mchael Stevens. Bayes++. Open source Bayesan Flterng classes. http://bayesclasses.sourceforge.net. Receved August 1, 2013. Accepted December 12, 2013. 321