Efficient implementation of interval matrix multiplication

Hong Diep Nguyen. Efficient implementation of interval matrix multiplication. Para 2010: State of the Art in Scientific and Parallel Computing, Jun 2010, Reykjavik, Iceland. HAL Id: inria-00469472, https://hal.inria.fr/inria-00469472. Submitted on 1 Apr 2010.

Efficient implementation of interval matrix multiplication

NGUYEN Hong Diep
INRIA, LIP (UMR 5668 CNRS - ENS de Lyon - INRIA - UCBL), Université de Lyon
hong.diep.nguyen@ens-lyon.fr

April 1, 2010

Abstract

The straightforward implementation of interval matrix product suffers from poor efficiency, far from the performance of highly optimized floating-point implementations. In this paper, we show how to reduce interval matrix multiplication to 9 floating-point matrix products, for the sake of performance, without sacrificing the quality of the result. We show that, compared to the straightforward implementation, the overestimation factor is at most 1.18.

1 Introduction

Interval arithmetic is a means to obtain guaranteed results: enclosures of the exact results are computed. Nonetheless, it suffers from lower performance than non-guaranteed floating-point computations. In theory, the performance factor between interval arithmetic and floating-point arithmetic is four; in practice it is much worse, especially when a large number of operations is involved. In this paper we study the case of matrix multiplication. An implementation based on the natural algorithm provides tight results, but at the cost of very low performance. The idea is therefore to exploit existing libraries, such as BLAS or ATLAS, which are highly optimized for floating-point matrix operations. Rump proposed a fast algorithm which uses only four floating-point matrix products [2]. It returns a result wider than the one obtained by replacing each floating-point operation between two numbers by its interval counterpart: the worst-case overestimation factor of this algorithm is 1.5. This paper proposes a new algorithm which costs nine floating-point matrix products, with a worst-case overestimation factor of 1.18.

This paper is organized as follows. Section 2 briefly presents interval arithmetic and some basic operations. These are extended to matrix operations, which are studied in Section 3. Section 4 recalls the idea of Rump's algorithm. Section 5 is devoted to our proposed algorithm.

Notations

In this paper, bold-face lower-case letters represent scalar intervals and bold-face upper-case letters represent interval matrices. Below are some additional notations:

$[\cdot]$ : an expression computed by interval arithmetic;
$\nabla(\cdot)$, $\Delta(\cdot)$ : expressions computed by floating-point arithmetic with downward and upward rounding mode, respectively;
$[i, s]$ : the interval bounded below by $i$ and above by $s$;
$\{m, r\}$ : the interval whose midpoint is $m$ and radius is $r$;
$\underline{a}$, $\overline{a}$ : the lower and upper bounds of $\mathbf{a}$;
$\mathrm{mag}(a)$ : the maximal magnitude of $\mathbf{a}$: $\mathrm{mag}(a) := \max \{ |a| : a \in \mathbf{a} \} = \max(|\underline{a}|, |\overline{a}|)$.

2 Interval Arithmetic

Intervals are used to represent connected closed sets of real values. Interval arithmetic defines operations between intervals: the result of an operation between intervals is an interval containing all the possible results between all pairs of real values taken from the input intervals,

  $\mathbf{r} = \mathbf{a} \, [\circ] \, \mathbf{b} = \{ a \circ b : a \in \mathbf{a}, \, b \in \mathbf{b} \}$.

By monotonicity, the sum $\mathbf{r} = \mathbf{a} + \mathbf{b}$ of two intervals can be computed from the sums of their respective lower and upper bounds:

  $\underline{r} = \underline{a} + \underline{b}, \qquad \overline{r} = \overline{a} + \overline{b}$.   (1)

The interval product $\mathbf{r} = \mathbf{a} \cdot \mathbf{b}$ is formed by taking the minimum and the maximum of the four products between the bounds of the two input intervals:

  $\underline{r} = \min(\underline{a}\underline{b}, \underline{a}\overline{b}, \overline{a}\underline{b}, \overline{a}\overline{b}), \qquad \overline{r} = \max(\underline{a}\underline{b}, \underline{a}\overline{b}, \overline{a}\underline{b}, \overline{a}\overline{b})$.   (2)

Hence, in theory, the performance factor between interval arithmetic and floating-point arithmetic is 4.
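As a concrete illustration, formulas (1) and (2) translate directly into code. The following C sketch (the type and function names are ours, purely illustrative) ignores rounding errors for the moment; they are the subject of the next paragraphs.

```c
#include <stdio.h>

/* A closed interval [lo, hi] with lo <= hi. */
typedef struct { double lo, hi; } interval;

/* Formula (1): add the lower bounds and the upper bounds. */
static interval iadd(interval a, interval b) {
    return (interval){ a.lo + b.lo, a.hi + b.hi };
}

/* Formula (2): min and max over the four cross products of the bounds. */
static interval imul(interval a, interval b) {
    double p1 = a.lo * b.lo, p2 = a.lo * b.hi,
           p3 = a.hi * b.lo, p4 = a.hi * b.hi;
    double lo = p1, hi = p1;
    if (p2 < lo) lo = p2; if (p2 > hi) hi = p2;
    if (p3 < lo) lo = p3; if (p3 > hi) hi = p3;
    if (p4 < lo) lo = p4; if (p4 > hi) hi = p4;
    return (interval){ lo, hi };
}

int main(void) {
    interval p = imul((interval){0, 2}, (interval){-1, 4});
    printf("[%g, %g]\n", p.lo, p.hi);  /* prints [-2, 8] */
    return 0;
}
```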

Implementation

Intervals are implemented on computers using floating-point numbers. To ensure the inclusion property of the result, rounding errors must be taken into account.

For the interval addition $\mathbf{r} = \mathbf{a} + \mathbf{b}$, the lower bound must be computed with downward rounding and the upper bound with upward rounding:

  $\underline{r} = \nabla(\underline{a} + \underline{b}), \qquad \overline{r} = \Delta(\overline{a} + \overline{b})$.   (3)

The interval product $\mathbf{r} = \mathbf{a} \cdot \mathbf{b}$ is computed following (2) by four floating-point products. Nevertheless, to handle rounding errors, each floating-point product must be computed twice, once with downward and once with upward rounding, for a total of eight floating-point products:

  $\underline{r} = \min(\nabla(\underline{a}\underline{b}), \nabla(\underline{a}\overline{b}), \nabla(\overline{a}\underline{b}), \nabla(\overline{a}\overline{b})), \qquad \overline{r} = \max(\Delta(\underline{a}\underline{b}), \Delta(\underline{a}\overline{b}), \Delta(\overline{a}\underline{b}), \Delta(\overline{a}\overline{b}))$.   (4)

The number of floating-point products can be reduced by inspecting the sign of each bound, but sign tests are costly as well.

Particular cases

If $\mathbf{a}$ and $\mathbf{b}$ are centered in zero, i.e. $\underline{a} = -\overline{a}$ and $\underline{b} = -\overline{b}$, then $\underline{a} + \underline{b} = -(\overline{a} + \overline{b})$. Hence the sum $\mathbf{r} = \mathbf{a} + \mathbf{b}$ can be computed by

  $\overline{r} = \Delta(\overline{a} + \overline{b}), \qquad \underline{r} = -\overline{r}$.   (5)

If $\mathbf{b}$ is centered in zero, then $\max(x\underline{b}, x\overline{b}) = |x|\,\overline{b}$ and $\min(x\underline{b}, x\overline{b}) = -|x|\,\overline{b}$ for all $x \in \mathbb{R}$. Hence, according to (2), $\mathbf{r} = \mathbf{a} \cdot \mathbf{b}$ can be computed by

  $\overline{r} = \max(|\underline{a}|, |\overline{a}|)\,\overline{b} = \mathrm{mag}(a)\,\overline{b}, \qquad \underline{r} = -\overline{r}$.

Using floating-point arithmetic, $\mathbf{r}$ can thus be computed by only one floating-point product, in upward rounding mode:

  $\overline{r} = \Delta(\mathrm{mag}(a)\,\overline{b}), \qquad \underline{r} = -\overline{r}$.   (6)
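Formulas (3), (4) and (6) can be realized with the C99 rounding-mode facility, as in the sketch below. This is an illustration, not a tuned implementation: it assumes an IEEE-754 environment where fesetround is honored and a compiler that does not fold or reorder the products across the rounding-mode changes (e.g. GCC/Clang with -frounding-math).

```c
#include <fenv.h>
#include <math.h>

#pragma STDC FENV_ACCESS ON

typedef struct { double lo, hi; } interval;

/* Formula (3): lower bound rounded down, upper bound rounded up. */
static interval iadd(interval a, interval b) {
    interval r;
    fesetround(FE_DOWNWARD); r.lo = a.lo + b.lo;
    fesetround(FE_UPWARD);   r.hi = a.hi + b.hi;
    fesetround(FE_TONEAREST);
    return r;
}

/* Formula (4): each of the four products is computed twice, once
   rounded down and once rounded up: eight products in total. */
static interval imul(interval a, interval b) {
    interval r;
    fesetround(FE_DOWNWARD);
    double d1 = a.lo*b.lo, d2 = a.lo*b.hi, d3 = a.hi*b.lo, d4 = a.hi*b.hi;
    r.lo = fmin(fmin(d1, d2), fmin(d3, d4));
    fesetround(FE_UPWARD);
    double u1 = a.lo*b.lo, u2 = a.lo*b.hi, u3 = a.hi*b.lo, u4 = a.hi*b.hi;
    r.hi = fmax(fmax(u1, u2), fmax(u3, u4));
    fesetround(FE_TONEAREST);
    return r;
}

/* Formula (6): when b is centered in zero, one upward product suffices;
   negating the upper bound is exact. */
static interval imul_b_centered(interval a, interval b) {
    double mag_a = fmax(fabs(a.lo), fabs(a.hi));   /* mag(a) */
    fesetround(FE_UPWARD);
    double hi = mag_a * b.hi;
    fesetround(FE_TONEAREST);
    return (interval){ -hi, hi };
}
```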

3 Interval matrix operations

Let us now study the case of matrix operations. Suppose that each interval matrix is of dimension $n \times n$ and is represented by two floating-point matrices, one for the lower bounds and one for the upper bounds.

The addition of two interval matrices $\mathbf{C} = \mathbf{A} + \mathbf{B}$ is computed by scalar additions between corresponding elements of the two matrices. Thus

  $\underline{C} = \nabla(\underline{A} + \underline{B}), \qquad \overline{C} = \Delta(\overline{A} + \overline{B})$.

For interval matrix multiplication, each element of the result matrix is computed by

  $\mathbf{C}_{i,j} = \sum_k \mathbf{A}_{i,k} \cdot \mathbf{B}_{k,j}$.   (7)

Using this natural formula, the computation of each element requires $n$ interval multiplications and $n$ interval additions, i.e., following (3) and (4), $8n$ floating-point products and $2n$ floating-point additions. Hence the overall cost is, in theory, $8n^3$ floating-point products and $2n^3$ floating-point additions. This count accounts for neither the min/max computations nor the cost of the rounding-mode changes.

Particular cases

$\mathbf{A}$ is centered in zero, i.e. $\mathbf{A}_{i,k}$ is centered in zero for all $i, k$. From (6) we get

  $\mathbf{A}_{i,k} \cdot \mathbf{B}_{k,j} = [\, -\overline{A}_{i,k}\,\mathrm{mag}(B)_{k,j}, \ \overline{A}_{i,k}\,\mathrm{mag}(B)_{k,j} \,]$,
  $\mathbf{C}_{i,j} = [\, -\textstyle\sum_k \overline{A}_{i,k}\,\mathrm{mag}(B)_{k,j}, \ \textstyle\sum_k \overline{A}_{i,k}\,\mathrm{mag}(B)_{k,j} \,]$,

hence

  $\overline{C} = \Delta(\overline{A} \cdot \mathrm{mag}(B)), \qquad \underline{C} = -\overline{C}$.

So $\mathbf{A} \cdot \mathbf{B}$ can be computed by one floating-point matrix product, in upward rounding mode.
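The following C sketch illustrates this centered case with a plain triple loop standing in for an optimized floating-point matrix product; in practice one would form mag(B) once and call a BLAS routine such as dgemm with the rounding mode set upward. The same caveats about fesetround as in the previous sketch apply.

```c
#include <fenv.h>
#include <math.h>

#pragma STDC FENV_ACCESS ON

/* A centered in zero: its lower-bound matrix is -Ahi, so only Ahi is
   passed. Row-major n*n arrays. Computes Chi = up(Ahi * mag(B)) and
   Clo = -Chi: a single floating-point matrix product. */
static void imatmul_A_centered(int n, const double *Ahi,
                               const double *Blo, const double *Bhi,
                               double *Clo, double *Chi) {
    fesetround(FE_UPWARD);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int k = 0; k < n; k++) {
                /* mag(B)_{k,j} = max(|Blo|, |Bhi|) */
                double mag_b = fmax(fabs(Blo[k*n + j]), fabs(Bhi[k*n + j]));
                s += Ahi[i*n + k] * mag_b;   /* accumulated upward */
            }
            Chi[i*n + j] = s;
            Clo[i*n + j] = -s;               /* exact negation */
        }
    fesetround(FE_TONEAREST);
}
```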

$\mathbf{A}$ is non-negative, i.e. $0 \le \underline{A}_{i,k}$ (and hence $0 \le \overline{A}_{i,k}$) for all $i, k$. Denoting $x^{pos} = \max(x, 0)$ and $x^{neg} = \min(x, 0)$, we have

  $\max(\underline{A}_{i,k}\overline{B}_{k,j}, \ \overline{A}_{i,k}\overline{B}_{k,j}) = \overline{A}_{i,k}\overline{B}^{pos}_{k,j} + \underline{A}_{i,k}\overline{B}^{neg}_{k,j}$,
  $\min(\underline{A}_{i,k}\underline{B}_{k,j}, \ \overline{A}_{i,k}\underline{B}_{k,j}) = \underline{A}_{i,k}\underline{B}^{pos}_{k,j} + \overline{A}_{i,k}\underline{B}^{neg}_{k,j}$,

hence

  $\underline{C} = \nabla(\underline{A}\,\underline{B}^{pos} + \overline{A}\,\underline{B}^{neg}), \qquad \overline{C} = \Delta(\overline{A}\,\overline{B}^{pos} + \underline{A}\,\overline{B}^{neg})$.

In this case, an interval matrix product can be computed by 4 floating-point matrix products. Similarly, if $\mathbf{A}$ is non-positive, 4 floating-point matrix products suffice.

$\mathbf{A}$ does not contain zero in its interior. We can then split each entry of $\mathbf{A}$ into a positive and a negative part:

  $\mathbf{a}^{pos} = \{ a \in \mathbf{a} : a \ge 0 \}$ if $\overline{a} \ge 0$, and $\mathbf{a}^{pos} = 0$ otherwise;
  $\mathbf{a}^{neg} = \{ a \in \mathbf{a} : a \le 0 \}$ if $\underline{a} \le 0$, and $\mathbf{a}^{neg} = 0$ otherwise.

Since $\mathbf{A}$ does not contain zero in its interior, for every entry either $\mathbf{A}^{pos}_{i,j} = \mathbf{A}_{i,j}$ and $\mathbf{A}^{neg}_{i,j} = 0$, or $\mathbf{A}^{pos}_{i,j} = 0$ and $\mathbf{A}^{neg}_{i,j} = \mathbf{A}_{i,j}$. Hence

  $\mathbf{A} \cdot \mathbf{B} = \mathbf{A}^{pos} \cdot \mathbf{B} + \mathbf{A}^{neg} \cdot \mathbf{B}$.

$\mathbf{A}^{pos}$ is non-negative and $\mathbf{A}^{neg}$ is non-positive, so in total this costs eight floating-point matrix products.

For these three particular cases, if all operations were performed in infinite precision, i.e. without rounding errors, the computed result would be exact. In general, when no assumption can be made about the input intervals, the natural algorithm is not efficient. One solution is to fall back on floating-point matrix multiplication, for which well-optimized libraries are available.

4 Rump's algorithm

Rump proposed an algorithm which makes use of floating-point operations for speed [2, 3]. This algorithm is based on the midpoint-radius representation of intervals. Let $\mathbf{A} = \{m_A, r_A\}$ and $\mathbf{B} = \{m_B, r_B\}$ be two interval matrices, with $m_A, m_B$ their midpoints and $r_A, r_B$ their radii. The product $\mathbf{A} \cdot \mathbf{B}$ is enclosed by an interval matrix $\mathbf{C}$ whose midpoint and radius are computed by

  $m_C = m_A \cdot m_B$,
  $r_C = (|m_A| + r_A) \cdot r_B + r_A \cdot |m_B|$.

In fact, because of rounding errors, $m_A \cdot m_B$ cannot be computed exactly: it must be computed with both downward and upward rounding to obtain an enclosure of the midpoint, and the radius $r_C$ must be computed with upward rounding to ensure the inclusion property:

  $[m_C] = [\, \nabla(m_A \cdot m_B), \ \Delta(m_A \cdot m_B) \,]$,
  $r_C = \Delta((|m_A| + r_A) \cdot r_B + r_A \cdot |m_B|)$.

Finally, the result is easily converted back to endpoint representation by two floating-point operations, using the upper and lower bounds of $[m_C]$ respectively:

  $\overline{C} = \Delta(\overline{m_C} + r_C), \qquad \underline{C} = \nabla(\underline{m_C} - r_C)$.

In total, this algorithm uses four floating-point matrix products.
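A sketch of Rump's enclosure in C, with a plain triple loop standing in for the four optimized floating-point matrix products (same caveats about fesetround as above):

```c
#include <fenv.h>
#include <math.h>
#include <stdlib.h>

#pragma STDC FENV_ACCESS ON

/* C = A*B with the rounding mode in effect at call time; a stand-in
   for a BLAS product run under directed rounding. Row-major n*n. */
static void matmul(int n, const double *A, const double *B, double *C) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int k = 0; k < n; k++) s += A[i*n + k] * B[k*n + j];
            C[i*n + j] = s;
        }
}

/* Rump's enclosure of {mA,rA} * {mB,rB}:
     [mC] = [down(mA*mB), up(mA*mB)],
     rC   = up((|mA| + rA)*rB + rA*|mB|).
   Four floating-point matrix products in total. */
static void rump_imatmul(int n,
                         const double *mA, const double *rA,
                         const double *mB, const double *rB,
                         double *mC_lo, double *mC_hi, double *rC) {
    double *t1 = malloc((size_t)n*n * sizeof *t1);
    double *t2 = malloc((size_t)n*n * sizeof *t2);
    fesetround(FE_DOWNWARD);
    matmul(n, mA, mB, mC_lo);                       /* product 1 */
    fesetround(FE_UPWARD);
    matmul(n, mA, mB, mC_hi);                       /* product 2 */
    for (int i = 0; i < n*n; i++) t1[i] = fabs(mA[i]) + rA[i];
    matmul(n, t1, rB, rC);                          /* product 3 */
    for (int i = 0; i < n*n; i++) t1[i] = fabs(mB[i]);
    matmul(n, rA, t1, t2);                          /* product 4 */
    for (int i = 0; i < n*n; i++) rC[i] += t2[i];   /* still upward */
    fesetround(FE_TONEAREST);
    free(t1); free(t2);
}
```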

Overestimation

Even without taking rounding errors into account, Rump's algorithm generally overestimates the result; the worst-case overestimation factor is 1.5. For example, by definition $[0,2] \cdot [0,4] = [0,8]$, whereas Rump's algorithm gives

  $[0,2] \cdot [0,4] = \{1,1\} \cdot \{2,2\} = \{1 \cdot 2, \ (1+1) \cdot 2 + 1 \cdot 2\} = \{2, 6\} = [-4, 8]$.

The overestimation factor in this case is $12/8 = 1.5$.

5 Proposition

Rump's algorithm can be seen as decomposing each of $\mathbf{A}$ and $\mathbf{B}$ into a sum of two components, representing its midpoint and its radius respectively; $\mathbf{A} \cdot \mathbf{B}$ is then replaced by its expansion, a sum of four sub-products. By the sub-distributivity of the interval product, this sum encloses the original product. Our idea is to find another decomposition such that the sub-products can be computed efficiently and the overestimation remains small. As shown in Section 3, an interval product can be computed efficiently when one of the two multipliers is either centered in zero or does not contain zero in its interior.

Proposition 1. Let $\mathbf{A}$ be an interval matrix, decomposed into two interval matrices $\mathbf{A}^0$ and $\mathbf{A}^*$ which satisfy

  $\mathbf{A}^0_{i,j} = 0, \quad \mathbf{A}^*_{i,j} = \mathbf{A}_{i,j}$   if $\underline{A}_{i,j}\,\overline{A}_{i,j} \ge 0$;
  $\mathbf{A}^0_{i,j} = [\underline{A}_{i,j}, -\underline{A}_{i,j}], \quad \mathbf{A}^*_{i,j} = [0, \ \overline{A}_{i,j} + \underline{A}_{i,j}]$   if $\underline{A}_{i,j} < 0 < -\underline{A}_{i,j} \le \overline{A}_{i,j}$;
  $\mathbf{A}^0_{i,j} = [-\overline{A}_{i,j}, \overline{A}_{i,j}], \quad \mathbf{A}^*_{i,j} = [\underline{A}_{i,j} + \overline{A}_{i,j}, \ 0]$   if $\underline{A}_{i,j} < 0 < \overline{A}_{i,j} < -\underline{A}_{i,j}$.

Then

- $\mathbf{A}^0$ is centered in zero,
- $\mathbf{A}^*$ does not contain zero in its interior, and

- $\mathbf{A}^0 + \mathbf{A}^* = \mathbf{A}$.

Proof. Easily deduced from the formulas.

Proposition 2. Let $\mathbf{A}$ and $\mathbf{B}$ be two interval matrices and $(\mathbf{A}^0, \mathbf{A}^*)$ a decomposition of $\mathbf{A}$ by Proposition 1. Let $\mathbf{C}$ be the interval matrix computed by

  $\mathbf{C} = \mathbf{A}^0 \cdot \mathbf{B} + \mathbf{A}^* \cdot \mathbf{B}$.   (8)

Then $\mathbf{A} \cdot \mathbf{B}$ is contained in $\mathbf{C}$.

Proof. Following Proposition 1, $\mathbf{A} = \mathbf{A}^0 + \mathbf{A}^*$. Interval multiplication is sub-distributive, hence

  $\mathbf{A} \cdot \mathbf{B} = (\mathbf{A}^0 + \mathbf{A}^*) \cdot \mathbf{B} \subseteq \mathbf{A}^0 \cdot \mathbf{B} + \mathbf{A}^* \cdot \mathbf{B} = \mathbf{C}$.

$\mathbf{A}^0$ is centered in zero and $\mathbf{A}^*$ does not contain zero in its interior, so, according to Section 3, $\mathbf{A}^0 \cdot \mathbf{B}$ and $\mathbf{A}^* \cdot \mathbf{B}$ can be computed using 1 and 8 floating-point matrix products respectively. Hence the overall cost is 9 floating-point matrix products. The rest of this section studies the worst-case overestimation factor of this operation.
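The entry-wise decomposition of Proposition 1 is direct to code. A sketch (names illustrative; note that the bound $\overline{a} + \underline{a}$ is taken at face value here, whereas a robust implementation would round it outward so that $\mathbf{a}^0 + \mathbf{a}^*$ still encloses $\mathbf{a}$ in the presence of rounding):

```c
typedef struct { double lo, hi; } interval;

/* Proposition 1, applied to one entry: a = a0 + as, with a0 centered
   in zero and as not containing zero in its interior. */
static void split(interval a, interval *a0, interval *as) {
    if (a.lo >= 0.0 || a.hi <= 0.0) {
        /* a does not contain zero in its interior: a0 = 0, as = a */
        *a0 = (interval){ 0.0, 0.0 };
        *as = a;
    } else if (-a.lo <= a.hi) {
        /* zero strictly inside, |lo| <= hi: a0 = [lo, -lo] */
        *a0 = (interval){ a.lo, -a.lo };
        *as = (interval){ 0.0, a.hi + a.lo };    /* non-negative */
    } else {
        /* zero strictly inside, |lo| > hi: a0 = [-hi, hi] */
        *a0 = (interval){ -a.hi, a.hi };
        *as = (interval){ a.lo + a.hi, 0.0 };    /* non-positive */
    }
}
```

Applying split to every entry of $\mathbf{A}$ and then computing $\mathbf{A}^0 \cdot \mathbf{B}$ (one upward product, as in Section 3) and $\mathbf{A}^* \cdot \mathbf{B}$ (eight products) realizes formula (8).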

Over-estimation factor

Let us first consider the product of two scalar intervals $\mathbf{a}$ and $\mathbf{b}$, with $\mathbf{a}$ decomposed, following Proposition 1, into two parts: $\mathbf{a} = \mathbf{a}^0 + \mathbf{a}^*$. Suppose that all computations are performed in infinite precision, i.e. rounding errors are not taken into account.

If $\mathbf{a}$ is centered in zero, then $\mathbf{a}^* = 0$ and $\mathbf{a} \cdot \mathbf{b} = \mathbf{a}^0 \cdot \mathbf{b}$: the result is exact.

If $\mathbf{a}$ does not contain zero in its interior, then $\mathbf{a}^0 = 0$ and $\mathbf{a} \cdot \mathbf{b} = \mathbf{a}^* \cdot \mathbf{b}$: the result is exact too.

If $\mathbf{a}$ contains zero in its interior, then $\underline{a} < 0 < \overline{a}$. The cases $-\underline{a} \le \overline{a}$ and $-\underline{a} > \overline{a}$ are symmetric, hence without loss of generality suppose $\overline{a} \ge -\underline{a}$. Then $\mathbf{a}^0 = [\underline{a}, -\underline{a}]$ and $\mathbf{a}^* = [0, \overline{a} + \underline{a}]$, and, following Proposition 2, the computed enclosure is

  $\mathbf{c} = \mathbf{a}^0 \cdot \mathbf{b} + \mathbf{a}^* \cdot \mathbf{b}$.

If $\mathbf{b} \ge 0$, then $\mathrm{mag}(b) = \overline{b}$ and

  $\mathbf{c} = [\underline{a}\,\overline{b}, \ -\underline{a}\,\overline{b}] + [0, \ (\overline{a} + \underline{a})\,\overline{b}] = [\underline{a}\,\overline{b}, \ \overline{a}\,\overline{b}]$.

Meanwhile, $\mathbf{b} \ge 0$ implies $\mathbf{a} \cdot \mathbf{b} = [\underline{a}\,\overline{b}, \overline{a}\,\overline{b}]$. Hence $\mathbf{c} = \mathbf{a} \cdot \mathbf{b}$: there is no overestimation. The same holds for $\mathbf{b} \le 0$.

If $\underline{b} < 0 < \overline{b}$ with $\overline{b} \ge -\underline{b}$, then $\mathrm{mag}(b) = \overline{b}$. Hence

  $\mathbf{c} = [\underline{a}\,\overline{b}, \ -\underline{a}\,\overline{b}] + [(\overline{a} + \underline{a})\,\underline{b}, \ (\overline{a} + \underline{a})\,\overline{b}] = [(\overline{a} + \underline{a})\,\underline{b} + \underline{a}\,\overline{b}, \ \overline{a}\,\overline{b}]$,

so

  $\mathrm{diam}(\mathbf{c}) = \overline{a}\,\overline{b} \left( 1 - \frac{\underline{a}}{\overline{a}} - \frac{\underline{b}}{\overline{b}} - \frac{\underline{a}\,\underline{b}}{\overline{a}\,\overline{b}} \right)$.

Since $\overline{a} \ge -\underline{a} > 0$ and $\overline{b} \ge -\underline{b} > 0$, the exact product is $\mathbf{a} \cdot \mathbf{b} = [\min(\underline{a}\,\overline{b}, \overline{a}\,\underline{b}), \ \overline{a}\,\overline{b}]$, whence

  $\mathrm{diam}(\mathbf{a} \cdot \mathbf{b}) = \overline{a}\,\overline{b} \left( 1 - \min\!\left( \frac{\underline{a}}{\overline{a}}, \frac{\underline{b}}{\overline{b}} \right) \right)$.

Denote

  $M = \max(-\underline{a}/\overline{a}, \ -\underline{b}/\overline{b}), \qquad m = \min(-\underline{a}/\overline{a}, \ -\underline{b}/\overline{b}), \qquad 0 < m \le M \le 1$.

Then $-\min(\underline{a}/\overline{a}, \underline{b}/\overline{b}) = M$, $-\underline{a}/\overline{a} - \underline{b}/\overline{b} = M + m$ and $\underline{a}\,\underline{b}/(\overline{a}\,\overline{b}) = Mm$, so the overestimation factor is

  $\frac{\mathrm{diam}(\mathbf{c})}{\mathrm{diam}(\mathbf{a} \cdot \mathbf{b})} = \frac{1 + M + m - Mm}{1 + M} = \frac{1 + M + m(1 - M)}{1 + M} \le \frac{1 + M + M(1 - M)}{1 + M}$.

Inspecting this last function of $M$ on $(0, 1]$, its maximum is $4 - 2\sqrt{2} \approx 1.18$, reached for $m = M = \sqrt{2} - 1$, i.e. $\underline{a}/\overline{a} = \underline{b}/\overline{b} = 1 - \sqrt{2}$. The case $\underline{b} < 0 < \overline{b}$ with $\overline{b} < -\underline{b}$ is symmetric.
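For completeness, here is the maximization step, which the argument above leaves implicit. With $f(M) = \frac{1 + 2M - M^2}{1 + M}$,

  $f'(M) = \frac{(2 - 2M)(1 + M) - (1 + 2M - M^2)}{(1 + M)^2} = \frac{1 - 2M - M^2}{(1 + M)^2}$,

which vanishes on $(0, 1]$ only at the positive root of $M^2 + 2M - 1 = 0$, namely $M = \sqrt{2} - 1$. There,

  $f(\sqrt{2} - 1) = \frac{1 + 2(\sqrt{2} - 1) - (\sqrt{2} - 1)^2}{\sqrt{2}} = \frac{4\sqrt{2} - 4}{\sqrt{2}} = 4 - 2\sqrt{2} \approx 1.1716$,

and since $f'$ changes sign from positive to negative at this point, it is indeed the maximum.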

Let us now extend this to matrix multiplication. Each element of the result matrix is computed by

  $\mathbf{C}_{i,j} = \sum_k \mathbf{A}^0_{i,k} \cdot \mathbf{B}_{k,j} + \sum_k \mathbf{A}^*_{i,k} \cdot \mathbf{B}_{k,j} = \sum_k \left( \mathbf{A}^0_{i,k} \cdot \mathbf{B}_{k,j} + \mathbf{A}^*_{i,k} \cdot \mathbf{B}_{k,j} \right)$.

Moreover, the diameter is additive:

  $\mathrm{diam}(\mathbf{a} + \mathbf{b}) = (\overline{a} + \overline{b}) - (\underline{a} + \underline{b}) = \mathrm{diam}(\mathbf{a}) + \mathrm{diam}(\mathbf{b})$,

hence

  $\mathrm{diam}(\mathbf{C}_{i,j}) = \sum_k \mathrm{diam}\!\left( \mathbf{A}^0_{i,k} \cdot \mathbf{B}_{k,j} + \mathbf{A}^*_{i,k} \cdot \mathbf{B}_{k,j} \right)$.

Under the assumption of no rounding error, the worst-case overestimation factor of each scalar term is $4 - 2\sqrt{2}$, so

  $\mathrm{diam}(\mathbf{C}_{i,j}) \le (4 - 2\sqrt{2}) \sum_k \mathrm{diam}(\mathbf{A}_{i,k} \cdot \mathbf{B}_{k,j}) = (4 - 2\sqrt{2}) \, \mathrm{diam}\!\left( \sum_k \mathbf{A}_{i,k} \cdot \mathbf{B}_{k,j} \right)$,

that is, $\mathrm{diam}(\mathbf{C}) \le (4 - 2\sqrt{2}) \, \mathrm{diam}(\mathbf{A} \cdot \mathbf{B})$. Hence the worst-case overestimation factor of the interval matrix product is also $4 - 2\sqrt{2} \approx 1.18$.
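The scalar bound can also be checked empirically. The following sketch (our addition; rounding is ignored, as in the analysis above) draws random zero-containing intervals, compares the width of the decomposed product $\mathbf{a}^0 \cdot \mathbf{b} + \mathbf{a}^* \cdot \mathbf{b}$ with the exact width, and reports the worst ratio observed; it should stay below $4 - 2\sqrt{2} \approx 1.1716$.

```c
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

static double urand(void) { return 2.0 * rand() / RAND_MAX - 1.0; }

int main(void) {
    srand(42);
    double worst = 1.0;
    for (int t = 0; t < 1000000; t++) {
        /* random intervals a and b, both containing zero */
        double alo = -fabs(urand()), ahi = fabs(urand());
        double blo = -fabs(urand()), bhi = fabs(urand());
        /* exact product (2): min/max of the four cross products */
        double p1 = alo*blo, p2 = alo*bhi, p3 = ahi*blo, p4 = ahi*bhi;
        double exact = fmax(fmax(p1, p2), fmax(p3, p4))
                     - fmin(fmin(p1, p2), fmin(p3, p4));
        if (exact == 0.0) continue;
        /* decomposition of Proposition 1 */
        double a0hi = fmin(-alo, ahi);          /* a0 = [-a0hi, a0hi] */
        double aslo = fmin(alo + ahi, 0.0);     /* a* = [aslo, ashi]  */
        double ashi = fmax(alo + ahi, 0.0);
        double magb = fmax(-blo, bhi);
        /* c = a0*b + a**b, per Proposition 2 */
        double q1 = aslo*blo, q2 = aslo*bhi, q3 = ashi*blo, q4 = ashi*bhi;
        double c_lo = -a0hi*magb + fmin(fmin(q1, q2), fmin(q3, q4));
        double c_hi =  a0hi*magb + fmax(fmax(q1, q2), fmax(q3, q4));
        double ratio = (c_hi - c_lo) / exact;
        if (ratio > worst) worst = ratio;
    }
    printf("worst observed ratio: %.6f (bound 4 - 2*sqrt(2) = %.6f)\n",
           worst, 4.0 - 2.0 * sqrt(2.0));
    return 0;
}
```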

6 Conclusion

The algorithm presented in this paper implements the product of interval matrices using floating-point operations. It constitutes a trade-off between the performance of optimized floating-point libraries and a slight overestimation of the result. We have proven that this overestimation factor is at most 1.18, which means that the width of the computed result is always less than 1.18 times the width of the exact result. In particular, if one of the two multipliers does not contain zero in its interior, then the computed result is exact in exact arithmetic.

The performance of the algorithm given above relies entirely on the performance of the employed floating-point library. Such a library must support directed rounding modes to be of use. This requirement is not really an issue, since libraries such as LAPACK or ATLAS, which are renowned for their performance, support directed roundings. However, the use of directed rounding modes restricts the basic algorithm to the natural one. Indeed, fast algorithms for matrix products [1], which have a complexity below $O(n^3)$, do not preserve the monotonicity of operations (they use subtraction as well as addition and multiplication) and hence cannot be used. Furthermore, such algorithms rely on algebraic properties, such as $(x + y) - y = x$, which do not hold in interval arithmetic.

References

[1] M. Ceberio and V. Kreinovich. Fast multiplication of interval matrices (interval version of Strassen's algorithm). Reliable Computing, 10(3):241-243, 2004.

[2] S. M. Rump. Fast and parallel interval arithmetic. BIT, 39(3):534-554, 1999.

[3] S. M. Rump. Computer-assisted proofs and self-validating methods. In B. Einarsson, editor, Handbook on Accuracy and Reliability in Scientific Computation, chapter 10, pages 195-240. SIAM, 2005.