CUDA accelerated fault tree analysis with C-XSC

Gabor Rebner (1), Michael Beer (2)
(1) Department of Computer and Cognitive Sciences (INKO), University of Duisburg-Essen, Duisburg, Germany
(2) Institute for Risk & Uncertainty, University of Liverpool, Liverpool, UK
19.09.2012
1 / 19

1
2 Verification, CUDA, Fault Tree Analysis
3 C++ and CUDA, Evaluation
4 Future Work
2 / 19

[...] of verified fault tree analysis in C++ using high-performance GPU computing

Issues
Using GPU-accelerated high-performance features to
1. Reduce the trade-off between computation accuracy and computation time
2. Use directed rounding based on the IEEE 754-2008 standard on the GPU

GPU: Graphics Processing Unit
3 / 19

Verification

Definition
We use verification in its narrow sense: a mathematical proof of the correctness of a result obtained by a computer calculation.

Tools
- Interval arithmetic provided by C-XSC
- Floating-point arithmetic with directed rounding
  - Central Processing Unit (CPU)
  - Compute Unified Device Architecture (CUDA)
4 / 19

A short introduction to CUDA

Compute Unified Device Architecture (CUDA)
- High-performance GPU architecture
- Single Instruction, Multiple Data (SIMD) implementation
- Up to $2^{10}$ CUDA cores on the NVIDIA GTX 590
- Restricted to NVIDIA graphics cards
- Support of IEEE 754 floating-point operations
  - Double precision
  - Directed rounding to the next floating-point number (such as $\mathrm{fl}_\nabla(x)$ and $\mathrm{fl}_\Delta(x)$ with $x \in \mathbb{R}$)
5 / 19

Fault Tree Analysis

Fundamentals
The implementation is based on
- The approach by Traczinsky et al. (2006)
- Verified on modern computer systems
- CUDA
6 / 19

Complexity

Computation step
Each computation of a logical gate (AND- or OR-gate) has a complexity of $O(n^3)$:
- Computation of each interval element ($O(n \cdot n)$)
- Computation of the mass assignment for each interval
Total complexity: $O(n^3)$

Improvements
The algorithm can be improved to obtain an upper bound of complexity slightly smaller than $O(n^3)$.
8 / 19

Verification under CUDA

Goal
Compute correct results on computer systems using finite floating-point arithmetic

Approach
- Directed rounding (GPU source code)
- Interval arithmetic (C-XSC in CPU source code)
9 / 19

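Interval Notation

Real Intervals ($\mathbb{IR}$)
$\mathbf{x} = [\underline{x}, \overline{x}]$, $\underline{x} \le x \le \overline{x}$, with $x$, $\underline{x}$ and $\overline{x} \in \mathbb{R}$

Machine Intervals ($\mathbb{IF}$)
$\mathbf{x} = [\underline{x}, \overline{x}]$, $\underline{x} \le x \le \overline{x}$, with $x$, $\underline{x}$ and $\overline{x} \in \mathbb{F} \setminus \{\text{Not a number}, \pm\infty\}$

Description
- $\mathbf{x}$ is an interval from the set $\mathbb{IR}$ or $\mathbb{IF}$
- $\underline{x}$ is the infimum/minimum of $\mathbf{x}$
- $\overline{x}$ is the supremum/maximum of $\mathbf{x}$
10 / 19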
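The machine-interval definition above maps onto a small data type. The following C++ sketch is only an illustration: `MachineInterval`, `is_valid` and `contains` are hypothetical names that appear neither in the slides nor in C-XSC, whose own interval class plays this role in the CPU code.

```cpp
#include <cmath>

// Hypothetical stand-in for a machine interval [inf, sup] over doubles.
struct MachineInterval {
    double inf;  // infimum (lower bound)
    double sup;  // supremum (upper bound)
};

// A machine interval must have finite bounds (no NaN, no +/- infinity)
// and its lower bound must not exceed its upper bound.
bool is_valid(const MachineInterval& x) {
    return std::isfinite(x.inf) && std::isfinite(x.sup) && x.inf <= x.sup;
}

// True if the value v lies inside the interval x.
bool contains(const MachineInterval& x, double v) {
    return x.inf <= v && v <= x.sup;
}
```

For example, `is_valid(MachineInterval{0.25, 0.5})` holds, while an interval with swapped bounds or an infinite bound is rejected.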

Verification under CUDA

Goal
Compute correct results on computer systems using finite floating-point arithmetic

Problem
Let $x = \frac{1}{3}$ and $x \in \mathbb{R}$. Then

$$\underbrace{x + x}_{\text{in floating point arithmetic}} \ne \frac{2}{3}$$

$$\frac{2}{3} \in [\underbrace{\mathrm{fl}_\nabla(x + x)}_{\text{lower bound}}, \underbrace{\mathrm{fl}_\Delta(x + x)}_{\text{upper bound}}]$$
11 / 19

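Verification under CUDA

Let $\mathbf{x}$ and $\mathbf{y}$ be two scale elements (intervals) and $m_x$ and $m_y$ the corresponding mass assignments.

Lower Failure Bound (OR-Gate)

$$lb = \mathrm{fl}_\nabla\big(\mathrm{fl}_\nabla(\underline{x} + \underline{y}) - \mathrm{fl}_\Delta(\underline{x} \cdot \underline{y})\big) \quad \text{with } \underline{x}, \underline{y} \in [0, 1],$$

$$m_{lb} = \mathrm{fl}(m_x \cdot m_y) \quad \text{with } m_x, m_y \in [0, 1].$$
12 / 19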
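A host-side sketch of the OR-gate lower bound may help to see why the rounding directions differ per operation: the sum and the final difference are rounded down, while the subtracted product is rounded up, so every rounding error pushes the result downward. The code assumes `std::fesetround` is honoured by the platform; `Op`, `rounded` and `or_gate_lower` are hypothetical names, and the talk performs this step in CUDA rather than in host C++.

```cpp
#include <cfenv>

enum class Op { Add, Sub, Mul };

// Apply one arithmetic operation under the given rounding mode.
// volatile operands prevent compile-time evaluation in round-to-nearest.
double rounded(Op op, double a, double b, int mode) {
    volatile double va = a, vb = b;
    volatile double r = 0.0;
    const int saved = std::fegetround();
    std::fesetround(mode);
    switch (op) {
        case Op::Add: r = va + vb; break;
        case Op::Sub: r = va - vb; break;
        case Op::Mul: r = va * vb; break;
    }
    std::fesetround(saved);  // restore the caller's rounding mode
    return r;
}

// Verified lower bound of x + y - x*y for lower interval bounds in [0, 1].
double or_gate_lower(double x_lo, double y_lo) {
    const double sum  = rounded(Op::Add, x_lo, y_lo, FE_DOWNWARD);
    const double prod = rounded(Op::Mul, x_lo, y_lo, FE_UPWARD);
    return rounded(Op::Sub, sum, prod, FE_DOWNWARD);
}
```

For inputs whose sum and product are exactly representable, such as 0.5 and 0.5, the bound coincides with the exact value 0.75; otherwise it never exceeds the true result.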

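Verification under CUDA

Let $\mathbf{x}$ and $\mathbf{y}$ be two scale elements (intervals) and $m_x$ and $m_y$ the corresponding mass assignments.

Lower Failure Bound (AND-Gate)

$$lb = \mathrm{fl}_\nabla(\underline{x} \cdot \underline{y}) \quad \text{with } \underline{x}, \underline{y} \in [0, 1],$$

$$ub = \mathrm{fl}_\Delta(\overline{x} \cdot \overline{y}) \quad \text{with } \overline{x}, \overline{y} \in [0, 1],$$

$$m = \mathrm{fl}(m_x \cdot m_y) \quad \text{with } m_x, m_y \in [0, 1].$$
13 / 19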
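The AND-gate bounds reduce to two products with opposite rounding directions. The following host-side sketch assumes that `std::fesetround` is honoured by the platform; `GateBounds`, `mul_rounded` and `and_gate` are hypothetical names, and in CUDA the intrinsics `__dmul_rd` and `__dmul_ru` would take the place of the helper.

```cpp
#include <cfenv>

// Lower and upper failure bound of a gate output.
struct GateBounds {
    double lb;
    double ub;
};

// Multiply under the given rounding mode; volatile blocks constant folding.
double mul_rounded(double a, double b, int mode) {
    volatile double va = a, vb = b;
    const int saved = std::fegetround();
    std::fesetround(mode);
    volatile double r = va * vb;
    std::fesetround(saved);  // restore the caller's rounding mode
    return r;
}

// AND gate for x = [x_lo, x_hi] and y = [y_lo, y_hi], all bounds in [0, 1]:
// the lower bound is rounded down, the upper bound is rounded up.
GateBounds and_gate(double x_lo, double x_hi, double y_lo, double y_hi) {
    return { mul_rounded(x_lo, y_lo, FE_DOWNWARD),
             mul_rounded(x_hi, y_hi, FE_UPWARD) };
}
```

Even for point intervals such as x = [0.1, 0.1] and y = [0.3, 0.3] the result is a proper interval, because the exact product of the two doubles is not representable and the two rounding directions land on neighbouring numbers.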

Computation time

Wall-clock time [s] spent on computation

Configurations:
- Benchmark 1 (B1): n = 200, f = 20, l = 100
- Benchmark 2 (B2): n = 5000, f = 100, l = 60

      C++ (LB) [a]   C++ (UB)   DSI [b] (LB)   DSI (UB)
B1    7              7          1685           1712
B2    721            654        48070          46160

[a] C++ utilizing C-XSC and CUDA
[b] DSI 3.5.2 and INTLAB V6
14 / 19

Computation time

Figure: Wall-clock time [s] spent on computation (logarithmic scale) for C++ & CUDA versus MATLAB & INTLAB on benchmark 1 (LB), benchmark 1 (UB), benchmark 2 (LB) and benchmark 2 (UB).
15 / 19

Future Work

Achievements
- Reduction of the trade-off between accuracy and computation time
- Verified computation on the GPU using CUDA
16 / 19

Future Work

Perspective
Using high-performance computing
- In MATLAB, utilizing the MEX interface with CUDA and C-XSC
- To compute Markov set chains (imprecise Markov chains)
17 / 19

[1] Auer, E.; Luther, W.; Rebner, G.; Limbourg, P.: A Verified MATLAB Toolbox for the Dempster-Shafer Theory. In: Proceedings of the Workshop on the Theory of Belief Functions. www.udue.de/DSIPaperone, http://www.udue.de/DSI, 2010
[2] Carreras, C.; Walker, I.: Interval Methods for Fault-Tree Analyses in Robotics. In: IEEE Transactions on Reliability 50 (2001), pp. 3-11. http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=00935010
[3] IEEE Computer Society: IEEE Standard for Floating-Point Arithmetic. In: IEEE Std 754-2008 (2008), 29, pp. 1-58. http://dx.doi.org/10.1109/ieeestd.2008.4610935. DOI 10.1109/IEEESTD.2008.4610935
[4] Krämer, H.: C-XSC 2.0: A C++ Library for Extended Scientific Computing. In: Lecture Notes in Computer Science vol. 2991/2004. Springer-Verlag, Heidelberg, 2004, pp. 15-35
[5] Krämer, W.; Zimmer, M.; Hofschuster, W.: Using C-XSC for High Performance Verified Computing. In: Jónasson, Kristján (ed.): Applied Parallel and Scientific Computing vol. 7134. Springer Berlin / Heidelberg, 2012. ISBN 978-3-642-28144-0, pp. 168-178. http://dx.doi.org/10.1007/978-3-642-28145-7_17. DOI 10.1007/978-3-642-28145-7_17
[6] NVIDIA: Plattform für Parallel-Programmierung und parallele Berechnungen. Website http://www.nvidia.de/object/cuda_home_new_de.html
[7] Rebner, G.; Auer, E.; Luther, W.: A verified realization of a Dempster-Shafer based fault tree analysis. In: Computing 94 (2012), pp. 313-324. http://dx.doi.org/10.1007/s00607-011-0179-3. DOI 10.1007/s00607-011-0179-3. ISSN 0010-485X
18 / 19

Thank you 19 / 19