Synthesis of fixed-point programs: the case of matrix multiplication Mohamed Amine Najahi To cite this version: Mohamed Amine Najahi. Synthesis of fixed-point programs: the case of matrix multiplication. EJCIM: École Jeunes Chercheurs en Informatique Mathématique, Apr 213, Perpignan, France. 13th École Jeunes Chercheurs en Informatique Mathématique (EJCIM 213) Perpignan, April 12th, 213, 213. <lirmm-1277362> HAL Id: lirmm-1277362 https://hal-lirmm.ccsd.cnrs.fr/lirmm-1277362 Submitted on 22 Feb 216 HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.
13 th École Jeunes Chercheurs en Informatique Mathématique (EJCIM 213) Perpignan, April 12 th, 213 Synthesis of fixed-point programs: the case of matrix multiplication Amine Najahi Advisers: M. Martel and G. Revy Équipe-projet DALI, Univ. Perpignan Via Domitia LIRMM, CNRS: UMR 556 - Univ. Montpellier 2 DALI A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 1/26
How easy it is to program a product of matrices? A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 2/26
How easy it is to program a product of matrices? Well, in floating-point, it is very easy!! #define N 8 int main () { int i,j,k; float A[N][N]={...}; float B[N][N]={...}; float C[N][N]={,...,}; for (i = ; i < N ; i++) for (j = ; j < N ; j++) for (k = ; k < N ; k++) /* dot product of row i and column j */ C[i][j]+=A[i][k]*B[k][j]; return ; } A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 2/26
How easy it is to program a product of matrices? Well, in floating-point, it is very easy!! #define N 8 int main () { int i,j,k; float A[N][N]={...}; float B[N][N]={...}; float C[N][N]={,...,}; for (i = ; i < N ; i++) for (j = ; j < N ; j++) for (k = ; k < N ; k++) /* dot product of row i and column j */ C[i][j]+=A[i][k]*B[k][j]; return ; } But, what if the target does not have a floating-point unit? A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 2/26
Motivation Embedded systems are ubiquitous microprocessors and/or DSPs dedicated to one or a few specific tasks satisfy constraints: area, energy consumption, conception cost Some embedded systems do not have any FPU (floating-point unit) Embedded systems No FPU Highly used in audio and video applications demanding on floating-point computations A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 3/26
Motivation Embedded systems are ubiquitous microprocessors and/or DSPs dedicated to one or a few specific tasks satisfy constraints: area, energy consumption, conception cost Some embedded systems do not have any FPU (floating-point unit) Applications Embedded systems FP computations No FPU Highly used in audio and video applications demanding on floating-point computations A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 3/26
Motivation Embedded systems are ubiquitous microprocessors and/or DSPs dedicated to one or a few specific tasks satisfy constraints: area, energy consumption, conception cost Some embedded systems do not have any FPU (floating-point unit) Applications FP computations Embedded systems No FPU Software implementing floating point arithmetic Highly used in audio and video applications demanding on floating-point computations A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 3/26
Motivation Embedded systems are ubiquitous microprocessors and/or DSPs dedicated to one or a few specific tasks satisfy constraints: area, energy consumption, conception cost Some embedded systems do not have any FPU (floating-point unit) Applications FP computations Embedded systems Conversion No FPU Fixed point Float to Fix conversion is tackled by the ANR project DEFIS LIP6, IRISA, CEA, LIRMM, THALES and INPIXAL A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 3/26
Outline of the talk 1. Background of fixed-point arithmetic 1.1 Basics of fixed-point arithmetic 1.2 Numerical and combinatorial issues in fixed-point programs 1.3 CGPE 2. Matrix multiplication in fixed-point 2.1 An accurate algorithm 2.2 A compact algorithm 2.3 Closest pair algorithm 3. Conclusion A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 4/26
Background of fixed-point arithmetic Outline of the talk 1. Background of fixed-point arithmetic 1.1 Basics of fixed-point arithmetic 1.2 Numerical and combinatorial issues in fixed-point programs 1.3 CGPE 2. Matrix multiplication in fixed-point 2.1 An accurate algorithm 2.2 A compact algorithm 2.3 Closest pair algorithm 3. Conclusion A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 5/26
Background of fixed-point arithmetic Basics of fixed-point arithmetic Principles of fixed-point arithmetic Main idea of fixed-point arithmetic: interpret bit words as integers coupled with a scale factor: z 2 n 1 1 z 2 7 + 2 1 = 13 Value in fixed-point 13 2 4 = 27 +2 1 2 4 = 2 3 + 2 3 = 8.125 Integer part Fractional part A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 6/26
Background of fixed-point arithmetic Basics of fixed-point arithmetic Principles of fixed-point arithmetic Main idea of fixed-point arithmetic: interpret bit words as integers coupled with a scale factor: z 2 n 1 1 z 2 7 + 2 1 = 13 Value in fixed-point 13 2 4 = 27 +2 1 2 4 = 2 3 + 2 3 = 8.125 Integer part Fractional part The scale factor (or fixed-point format) is implicit, only the programmer is aware of it A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 6/26
Background of fixed-point arithmetic Basics of fixed-point arithmetic Principles of fixed-point arithmetic Main idea of fixed-point arithmetic: interpret bit words as integers coupled with a scale factor: z 2 n 1 1 z 2 7 + 2 1 = 13 Value in fixed-point 13 2 4 = 27 +2 1 2 4 = 2 3 + 2 3 = 8.125 Integer part Fractional part The scale factor (or fixed-point format) is implicit, only the programmer is aware of it Let us denote by Q a,b a fixed-point format with a integer bits and b fractional bits 1 1 (1.15625) 1 Q 1,7 1 1 Q 2,6 1 1 (2.3125) 1 Q,8 1 1 (.578125) 1 A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 6/26
Background of fixed-point arithmetic Basics of fixed-point arithmetic Basic fixed-point operators Addition The two variables have to be in the same fixed-point format The sum of two Q a,b variables yields a Q a+1,b variable truncated 1 1 1 5.625 + 1 1 1 1 1 2.828125 1 1 1 1 1 1 1 7.89625 7.875 A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 7/26
Background of fixed-point arithmetic Basics of fixed-point arithmetic Basic fixed-point operators Addition The two variables have to be in the same fixed-point format The sum of two Q a,b variables yields a Q a+1,b variable truncated 1 1 1 5.625 + 1 1 1 1 1 2.828125 1 1 1 1 1 1 1 7.89625 7.875 Multiplication No need for the two variables to have the same fixed-point format The product of a Q a,b variable by a Q c,d variable yields a Q a+c,b+d variable 1 1 1 5.625 truncated 1 1 1 1 1 1.421875 1 1 1 1 1 1 1 1 7.198242187 7.125 A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 7/26
Background of fixed-point arithmetic Numerical and combinatorial issues in fixed-point programs First example: a size 3 dot product Let us consider the arithmetic expression: (a b ) +(a 1 b 1 ) +(a 2 b 2 ) and the following input fixed-point formats: a b a 1 b 1 a 2 b 2 Value [.1,1.57] [,1.98] [.1,.87] [1.1,1.86] [,15.4] [2,3.3] Fixed-point format Q 1,7 Q 1,7 Q,8 Q 1,7 Q 4,4 Q 2,6 A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 8/26
Background of fixed-point arithmetic Numerical and combinatorial issues in fixed-point programs First example: a size 3 dot product Let us consider the arithmetic expression: (a b ) +(a 1 b 1 ) +(a 2 b 2 ) and the following input fixed-point formats: a b a 1 b 1 a 2 b 2 Value [.1,1.57] [,1.98] [.1,.87] [1.1,1.86] [,15.4] [2,3.3] Fixed-point format Q 1,7 Q 1,7 Q,8 Q 1,7 Q 4,4 Q 2,6 Let us focus on 2 different schemes to compute the sum of products: Q 8,15 + Q 7,15 + Q 2,14 Q 7,15 a + b Q 1,15 Q 6,1 1 1 a b a 2 b 2 in full precision Q 3,15 + Q 6,1 Q 2,14 Q 1,15 a 2 b 2 a b a 1 b 1 (c +(c 1 + c 2 )) ((c + c 1 ) + c 2 ) A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 8/26
Background of fixed-point arithmetic Numerical and combinatorial issues in fixed-point programs First example: a size 3 dot product Let us consider the arithmetic expression: (a b ) +(a 1 b 1 ) +(a 2 b 2 ) and the following input fixed-point formats: a b a 1 b 1 a 2 b 2 Value [.1,1.57] [,1.98] [.1,.87] [1.1,1.86] [,15.4] [2,3.3] Fixed-point format Q 1,7 Q 1,7 Q,8 Q 1,7 Q 4,4 Q 2,6 Let us focus on 2 different schemes to compute the sum of products: Q 8,8 + Q 7,9 + Q 2,14 Q 7,9 a + b Q 1,15 Q 6,1 1 1 a b a 2 b 2 with 16 bits precision Q 3,13 + Q 6,1 Q 2,14 Q 1,15 a 2 b 2 a b a 1 b 1 (c +(c 1 + c 2 )) ((c + c 1 ) + c 2 ) A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 8/26
Background of fixed-point arithmetic CGPE The CGPE 1 software tool Written by Revy and Mouilleron to aid in emulating floating-point in software A tool that generates fast and certified code fast that reduce the evaluation latency on a given target, by using the target architecture features (as much as possible) certified for which we can bound the error entailed by the evaluation within the given target s arithmetic back-end middle-end front-end DAG computation Set of DAGs Filter 1 Filter n Decorated DAGs Code generator polynomial.xml <polynomial> <coefficient... >... <variable... >... <error... > </polynomial> architecture.xml <architecture> <instruction name="add" type="unsigned" inputs="32 32" output="32" latency="1" nodes="add dag 1..." macro="static inline..." gappa="..."... /> </architecture> C files Accuracy certificates 1 Code Generation for Polynomial Evaluation A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 9/26
Matrix multiplication in fixed-point Outline of the talk 1. Background of fixed-point arithmetic 1.1 Basics of fixed-point arithmetic 1.2 Numerical and combinatorial issues in fixed-point programs 1.3 CGPE 2. Matrix multiplication in fixed-point 2.1 An accurate algorithm 2.2 A compact algorithm 2.3 Closest pair algorithm 3. Conclusion A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 1/26
Matrix multiplication in fixed-point Defining the problem We are provided with a black box (CGPE) that synthesises code for dot-products in fixed-point arithmetic C code 2 real interval vectors Accuracy certificate 2 matrices A and B in I(R n n ) [ 4.54,7.78] [.789,.967]... A =...... [12.51,24.14] [.921,.791] [ 64,45.78] [.287,.7]... and, B =...... [125.1,245.14] [ 5.74,7.32] A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 11/26
Matrix multiplication in fixed-point Defining the problem We are provided with a black box (CGPE) that synthesises code for dot-products in fixed-point arithmetic C code 2 real interval vectors Accuracy certificate 2 matrices A and B in I(R n n ) [ 4.54,7.78] [.789,.967]... A =...... [12.51,24.14] [.921,.791] [ 64,45.78] [.287,.7]... and, B =...... [125.1,245.14] [ 5.74,7.32] We are asked to Generate code that evaluates all the products C = MN in fixed-point arithmetic where M A and N B A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 11/26
Matrix multiplication in fixed-point Tradeoffs to consider Remark: The suggested strategy should be efficient in terms of the tradeoffs below for all matrices of size smaller than 8 8 A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 12/26
Matrix multiplication in fixed-point Tradeoffs to consider Remark: The suggested strategy should be efficient in terms of the tradeoffs below for all matrices of size smaller than 8 8 1. Size of the generated code We are targeting embedded systems code size should be as tight as possible A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 12/26
Matrix multiplication in fixed-point Tradeoffs to consider Remark: The suggested strategy should be efficient in terms of the tradeoffs below for all matrices of size smaller than 8 8 1. Size of the generated code We are targeting embedded systems code size should be as tight as possible 2. Accuracy of the generated code Accuracy certificates should be produced that bound the absolute error The guaranteed absolute error should be as tight as possible A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 12/26
Matrix multiplication in fixed-point Tradeoffs to consider Remark: The suggested strategy should be efficient in terms of the tradeoffs below for all matrices of size smaller than 8 8 1. Size of the generated code We are targeting embedded systems code size should be as tight as possible 2. Accuracy of the generated code Accuracy certificates should be produced that bound the absolute error The guaranteed absolute error should be as tight as possible 3. Speed of generation A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 12/26
Matrix multiplication in fixed-point An accurate algorithm An accurate algorithm Main idea: Generate a dot product code for each coefficient of the resulting matrix AccurateProduct Inputs: Two square matrices A I(R n n ) and B I(R n n ) Outputs: C code to compute the product MN for all M A and N B Steps: 1: for 1 < i n do 2: for 1 < j n do 3: cgpegendotproduct(a i,b j ); 4: end for 5: end for A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 13/26
Matrix multiplication in fixed-point An accurate algorithm An accurate algorithm Main idea: Generate a dot product code for each coefficient of the resulting matrix AccurateProduct Inputs: Two square matrices A I(R n n ) and B I(R n n ) Outputs: C code to compute the product MN for all M A and N B Steps: 1: for 1 < i n do 2: for 1 < j n do 3: cgpegendotproduct(a i,b j ); 4: end for 5: end for Illustration on the product of two 2 2 matrices C = ( ) C1,1 = cgpegendotproduct(a 1,B 1 ) C 1,2 = cgpegendotproduct(a 1,B 2 ) C 2,1 = cgpegendotproduct(a 2,B 1 ) C 2,2 = cgpegendotproduct(a 2,B 2 ) A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 13/26
Matrix multiplication in fixed-point An accurate algorithm Analysis of AccurateProduct For square matrices of size n, n 2 calls to the cgpegendotproduct are issued Each dot product uses more than 2n instructions (n multiplications + n additions) The generated code for the product is proportional in size to 2n 3 More than 124 instructions for 8 8 matrices A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 14/26
Matrix multiplication in fixed-point An accurate algorithm Analysis of AccurateProduct For square matrices of size n, n 2 calls to the cgpegendotproduct are issued Each dot product uses more than 2n instructions (n multiplications + n additions) The generated code for the product is proportional in size to 2n 3 More than 124 instructions for 8 8 matrices Advantages Easy to generate code Two nested loops and n 2 calls to the routine cgpegendotproduct The reference in terms of numerical quality Drawbacks Code size is proportional to 2n 3 Similar code sizes are prohibitive in embedded systems A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 14/26
Matrix multiplication in fixed-point A compact algorithm A compact algorithm Main idea: Generate a unique dot product code for all the computations CompactProduct Inputs: Two square matrices A I(R n n ) and B I(R n n ) Outputs: C code to compute the product MN for all M A and N B Steps: 1: compute v such that v = A 1 A 2 A n 2: compute w such that w = B 1 B 2 B n 3: cgpegendotproduct(v,w); A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 15/26
Matrix multiplication in fixed-point A compact algorithm A compact algorithm Main idea: Generate a unique dot product code for all the computations CompactProduct Inputs: Two square matrices A I(R n n ) and B I(R n n ) Outputs: C code to compute the product MN for all M A and N B Steps: 1: compute v such that v = A 1 A 2 A n 2: compute w such that w = B 1 B 2 B n 3: cgpegendotproduct(v,w); Illustration on the product of two 2 2 matrices C = ( ) C1,1 = cgpegendotproduct(a 1 A 2,B 1 B 2 ) C 1,2 = cgpegendotproduct(a 1 A 2,B 1 B 2 ) C 2,1 = cgpegendotproduct(a 1 A 2,B 1 B 2 ) C 2,2 = cgpegendotproduct(a 1 A 2,B 1 B 2 ) A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 15/26
Matrix multiplication in fixed-point A compact algorithm Analysis of CompactProduct For square matrices of size n, only one call to the cgpegendotproduct is issued The dot product uses around 2n instructions (n multiplications + n additions) The generated code for the product is proportional in size to 2n Around 16 instructions for 8 8 matrices A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 16/26
Matrix multiplication in fixed-point A compact algorithm Analysis of CompactProduct For square matrices of size n, only one call to the cgpegendotproduct is issued The dot product uses around 2n instructions (n multiplications + n additions) The generated code for the product is proportional in size to 2n Around 16 instructions for 8 8 matrices Advantages Easy to generate code Compute the union of all vectors of A and B and call the routine cgpegendotproduct The reference in terms of code size Drawbacks Numerical quality deteriorates dramatically A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 16/26
Matrix multiplication in fixed-point Closest pair algorithm A closest pair algorithm Main idea: Fuse together only rows or columns that are close to each other The Hausdorff distance d H dh : I(R n ) I(R { n ) R d H (A,B) = max max ai, } b ai i b i 1 i n A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 17/26
Matrix multiplication in fixed-point Closest pair algorithm A closest pair algorithm Main idea: Fuse together only rows or columns that are close to each other The Hausdorff distance d H dh : I(R n ) I(R { n ) R d H (A,B) = max max ai, } b ai i b i 1 i n Example Let A = ( [ 4, 7] ) [ 11, 12] ( ) and B = [ 2, 88] [ 23, 1] be two vectors in I ( R 2), we have: d H (A,B) = 11 (A,B) = ( ) [ 4, 88] [ 23, 12] a 1 a1 b 1 b1 R (A 1,B 1 ) d H (A 1,B 1 ) A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 17/26
Matrix multiplication in fixed-point Closest pair algorithm ClosestPairFusion ClosestPairFusion Inputs: n vectors, v 1,...,v n in I(R m ) a routine findclosestpair based on d H a routine Union that applies the union operator the number k of output vectors Outputs: k vectors in I(R m ) Steps: 1: B = {v 1,...,v n } 2: while size(b) > k do 3: (u 1,u 2 ) = findclosestpair(b) 4: remove(u 1,B) 5: remove(u 2,B) 6: add(union(u 1,u 2 ),B) 7: end while A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 18/26
Matrix multiplication in fixed-point Closest pair algorithm Illustration of the ClosestPairFusion v 1 [ 4,4] [ 5,5] [ 5,5] [ 6,6] v 2 [ 2,2] [ 1,1] [ 3,3] [ 9,9] v 3 [ 7,7] [ 4,4] [ 12,12] [ 11,11] v 4 [ 8,8] [ 1,1] [ 1,1] [ 9,9] w 1 w 2 w 3 w 4 [ 3,3] [ 14,14] [ 5,5] [ 6,6] [ 1,1] [ 11,11] [ 3,3] [ 9,9] [ 4,4] [ 8,8] [ 11,11] [ 1,1] [ 9,9] [ 7,7] [ 1,1] [ 2,2] d H (v 1,v 2) d H (v 1,v 3) d H (v 1,v 4) d H (v 2,v 3) d H (v 2,v 4) d H (v 3,v 4) 4 7 5 9 7 3 d H (w 1,w 2) d H (w 1,w 3) d H (w 1,w 4) d H (w 2,w 3) d H (w 2,w 4) d H (w 3,w 4) 11 7 8 9 8 1 A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 19/26
Matrix multiplication in fixed-point Closest pair algorithm Illustration of the ClosestPairFusion v 1 [ 4,4] [ 5,5] [ 5,5] [ 6,6] v 2 [ 2,2] [ 1,1] [ 3,3] [ 9,9] v 3 [ 7,7] [ 4,4] [ 12,12] [ 11,11] v 4 [ 8,8] [ 1,1] [ 1,1] [ 9,9] w 1 w 2 w 3 w 4 [ 3,3] [ 14,14] [ 5,5] [ 6,6] [ 1,1] [ 11,11] [ 3,3] [ 9,9] [ 4,4] [ 8,8] [ 11,11] [ 1,1] [ 9,9] [ 7,7] [ 1,1] [ 2,2] d H (v 1,v 2) d H (v 1,v 3) d H (v 1,v 4) d H (v 2,v 3) d H (v 2,v 4) d H (v 3,v 4) 4 7 5 9 7 3 d H (w 1,w 2) d H (w 1,w 3) d H (w 1,w 4) d H (w 2,w 3) d H (w 2,w 4) d H (w 3,w 4) 11 7 8 9 8 1 v 1 [ 4,4] [ 5,5] [ 5,5] [ 6,6] v 2 [ 2,2] [ 1,1] [ 3,3] [ 9,9] v 3 v 4 [ 8,8] [ 4,4] [ 12,12] [ 11,11] w 1 w 3 w 2 w 4 [ 5,5] [ 14,14] [ 6,6] [ 3,3] [ 11,11] [ 9,9] [ 11,11] [ 8,8] [ 1,1] [ 1,1] [ 7,7] [ 2,2] d H (v 1,v 2) d H (v 1,v 3 v 4) d H (v 2,v 3 v 4) 4 7 9 d H (w 1 w 3,w 2) d H (w 1 w 3,w 4) d H (w 2,w 4) 9 1 8 A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 19/26
Matrix multiplication in fixed-point Closest pair algorithm Illustration of the ClosestPairFusion v 1 [ 4,4] [ 5,5] [ 5,5] [ 6,6] v 2 [ 2,2] [ 1,1] [ 3,3] [ 9,9] v 3 [ 7,7] [ 4,4] [ 12,12] [ 11,11] v 4 [ 8,8] [ 1,1] [ 1,1] [ 9,9] w 1 w 2 w 3 w 4 [ 3,3] [ 14,14] [ 5,5] [ 6,6] [ 1,1] [ 11,11] [ 3,3] [ 9,9] [ 4,4] [ 8,8] [ 11,11] [ 1,1] [ 9,9] [ 7,7] [ 1,1] [ 2,2] d H (v 1,v 2) d H (v 1,v 3) d H (v 1,v 4) d H (v 2,v 3) d H (v 2,v 4) d H (v 3,v 4) 4 7 5 9 7 3 d H (w 1,w 2) d H (w 1,w 3) d H (w 1,w 4) d H (w 2,w 3) d H (w 2,w 4) d H (w 3,w 4) 11 7 8 9 8 1 v 1 [ 4,4] [ 5,5] [ 5,5] [ 6,6] v 2 [ 2,2] [ 1,1] [ 3,3] [ 9,9] v 3 v 4 [ 8,8] [ 4,4] [ 12,12] [ 11,11] w 1 w 3 w 2 w 4 [ 5,5] [ 14,14] [ 6,6] [ 3,3] [ 11,11] [ 9,9] [ 11,11] [ 8,8] [ 1,1] [ 1,1] [ 7,7] [ 2,2] d H (v 1,v 2) d H (v 1,v 3 v 4) d H (v 2,v 3 v 4) 4 7 9 d H (w 1 w 3,w 2) d H (w 1 w 3,w 4) d H (w 2,w 4) 9 1 8 v 1 v 2 ( [ 4,4] [ 5,5] [ 5,5] ) [ 9,9] v 3 v 4 [ 8,8] [ 4,4] [ 12,12] [ 11,11] w 1 w 3 w 2 w 4 [ 5,5] [ 14,14] [ 3,3] [ 11,11] [ 11,11] [ 8,8] [ 1,1] [ 7,7] A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 19/26
Matrix multiplication in fixed-point Closest pair algorithm Analysis of the closest pair algorithm For square matrices of size n, k l calls to the cgpegendotproduct are issued Each dot product uses more than 2n instructions (n multiplications + n additions) The generated code for the product is proportional in size to 2nkl For 8 8 matrices, the table below gives the number of instructions l k 1 2 4 5 8 1 16 2 4 8 1 16 32 64 8 128 16 256 32 64 128 2 32 64 128 16 256 32 512 64 128 256 4 64 128 256 32 512 64 124 128 256 512 5 8 16 32 4 64 8 128 16 32 64 8 128 256 512 64 124 128 248 256 512 124 1 16 32 64 8 128 16 256 32 64 128 16 256 512 124 128 248 256 496 512 124 248 2 32 64 128 16 256 32 512 64 128 256 4 64 128 256 32 512 64 124 128 256 512 8 128 256 512 64 124 128 248 256 512 124 Advantages Code size can be controlled through the parameters k and l Drawbacks Numerical quality deteriorates with small values of k and l A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 2/26
Matrix multiplication in fixed-point Closest pair algorithm Analysis of the closest pair algorithm For square matrices of size n, k l calls to the cgpegendotproduct are issued Each dot product uses more than 2n instructions (n multiplications + n additions) The generated code for the product is proportional in size to 2nkl For 8 8 matrices, the table below gives the number of instructions l k 1 2 4 5 8 1 16 2 4 8 1 16 32 64 8 128 16 256 32 64 128 2 32 64 128 16 256 32 512 64 128 256 4 64 128 256 32 512 64 124 128 256 512 5 8 16 32 4 64 8 128 16 32 64 8 128 256 512 64 124 128 248 256 512 124 1 16 32 64 8 128 16 256 32 64 128 16 256 512 124 128 248 256 496 512 124 248 2 32 64 128 16 256 32 512 64 128 256 4 64 128 256 32 512 64 124 128 256 512 8 128 256 512 64 124 128 248 256 512 124 Advantages Code size can be controlled through the parameters k and l Drawbacks Numerical quality deteriorates with small values of k and l A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 2/26
Matrix multiplication in fixed-point Closest pair algorithm Let us compare these algorithms These results were produced for interval matrices of size 8 8 The center of each interval is randomly selected in [ 1, 1] The diameter of the intervals is fixed to 1 1.5 1.5 2 3 4 5 6.4.3.2 2 3 4 5 6.4.3.2 7.1 7.1 1 2 3 4 5 6 7 1 2 3 4 5 6 7 AccurateProduct Largest certified error:.1254 Mean certified error:.865 Number of instructions: 124 CompactProduct Largest certified error:.5585 Mean certified error:.5585 Number of instructions: 16 A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 21/26
1 2 3 4 5 6 7 Matrix multiplication in fixed-point 1 2 3 4 5 6 7.5.4.3.2.1 Closest pair algorithm Let us compare these algorithms, cont d A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 22/26
1 2 3 4 5 6 7 Matrix multiplication in fixed-point 1 2 3 4 5 6 7.5.4.3.2.1 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1 Closest pair algorithm Let us compare these algorithms, cont d A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 22/26
1 2 3 4 5 6 7 Matrix multiplication in fixed-point 1 2 3 4 5 6 7.5.4.3.2.1 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1 Closest pair algorithm 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1 Let us compare these algorithms, cont d A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 22/26
1 2 3 4 5 6 7 1 2 3 4 5 6 7 Matrix multiplication in fixed-point 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1 Closest pair algorithm 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1 Let us compare these algorithms, cont d A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 22/26
1 2 3 4 5 6 7 1 2 3 4 5 6 7 Matrix multiplication in fixed-point 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1 Closest pair algorithm 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1 Let us compare these algorithms, cont d A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 22/26
1 2 3 4 5 6 7 1 2 3 4 5 6 7 Matrix multiplication in fixed-point 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1 Closest pair algorithm 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1 Let us compare these algorithms, cont d A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 22/26
1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 Matrix multiplication in fixed-point 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1.5.4.3.2.1 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1 Closest pair algorithm 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1 Let us compare these algorithms, cont d A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 22/26
1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 Matrix multiplication in fixed-point 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1.5.4.3.2.1 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1.5.4.3.2.1 Closest pair algorithm 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1 Let us compare these algorithms, cont d A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 22/26
1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 Matrix multiplication in fixed-point 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1.5.4.3.2.1 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1.5.4.3.2.1 Closest pair algorithm 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1.5.4.3.2.1 Let us compare these algorithms, cont d A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 22/26
1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 Matrix multiplication in fixed-point 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1.5.4.3.2.1 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1.5.4.3.2.1.5.4.3.2.1 Closest pair algorithm 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1.5.4.3.2.1 Let us compare these algorithms, cont d A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 22/26
Matrix multiplication in fixed-point Closest pair algorithm Let us compare these algorithms, cont d.6 124.55.5 256 Certified bound for the absolute error.45.4.35.3.25.2.15 Maximum certified error Mean certified error Code size 64 496 16 124 4 256 64 Approximate number of instructions.1.5 16 1 5 1 16 2 4 8 k (Number of fused rows and columns) A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 23/26
Conclusion Outline of the talk 1. Background of fixed-point arithmetic 1.1 Basics of fixed-point arithmetic 1.2 Numerical and combinatorial issues in fixed-point programs 1.3 CGPE 2. Matrix multiplication in fixed-point 2.1 An accurate algorithm 2.2 A compact algorithm 2.3 Closest pair algorithm 3. Conclusion A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 24/26
Conclusion In this talk: We suggested 3 strategies to generate code for matrix product in fixed-point arithmetic The accurate algorithm performs well in terms of numerical quality but is prohibitive The compact algorithm generates concise codes but deteriorates the numerical quality The Closest Pair algorithm enables the tradeoffs between code size and numerical quality A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 25/26
Conclusion In this talk: We suggested 3 strategies to generate code for matrix product in fixed-point arithmetic The accurate algorithm performs well in terms of numerical quality but is prohibitive The compact algorithm generates concise codes but deteriorates the numerical quality The Closest Pair algorithm enables the tradeoffs between code size and numerical quality For the future, we will be working on: Suggesting similar algorithms for the discrete convolution in fixed-point arithmetic Investigating the synthesis of VHDL code for building blocks like matrix multiplication A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 25/26
Conclusion 13 th École Jeunes Chercheurs en Informatique Mathématique (EJCIM 213) Perpignan, April 12 th, 213 Synthesis of fixed-point programs: the case of matrix multiplication Amine Najahi Advisers: M. Martel and G. Revy Équipe-projet DALI, Univ. Perpignan Via Domitia LIRMM, CNRS: UMR 556 - Univ. Montpellier 2 DALI A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 26/26
Conclusion Example of code generated by CGPE // File automatically generated by CGPE (Code Generation for Polynomial Evaluation ) // Scheme : (((( a+(s*(a1+(t*a2))))+((s*(t*t))*(a3+(t*a4))))+((s*(t*t))*((t*t)*(a5+(t*a6)))))+(((s*(t*t))*((t*t)*( T*T)))*(( a7+(t*a8))+((t*t)*(a9+(t*a1)))))) // Degree : [9,1] uint32_t func_(uint32_t T, uint32_t S) { uint32_t r = mul(t, x5a82685d ); // 1.31 uint32_t r1 = xb54f31f - r; // 1.31 uint32_t r2 = mul(s, r1); // 2.3 uint32_t r3 = x2 + r2; // 2.3 uint32_t r4 = mul(t, T); //.32 uint32_t r5 = mul(s, r4); // 1.31 uint32_t r6 = mul(t, x386fd5f4 ); // 1.31 uint32_t r7 = x43df72f7 - r6; // 1.31 uint32_t r8 = mul(r5, r7); // 2.3 uint32_t r9 = r3 + r8; // 2.3 uint32_t r1 = mul(t, x287241 ); // 1.31 uint32_t r11 = x38b1798 - r1; // 1.31 uint32_t r12 = mul(r4, r11); // 1.31 uint32_t r13 = mul(r5, r12); // 2.3 uint32_t r14 = r9 + r13; // 2.3 uint32_t r15 = mul(r4, r4); //.32 uint32_t r16 = mul(r5, r15); // 1.31 uint32_t r17 = mul(t, x16c5cd9 ); // 1.31 uint32_t r18 = x1d7bf968 - r17; // 1.31 uint32_t r19 = mul(t, xfa9aa4 ); // 1.31 uint32_t r2 = x5dfffa4 - r19; // 1.31 uint32_t r21 = mul(r4, r2); // 1.31 uint32_t r22 = r18 + r21; // 1.31 uint32_t r23 = mul(r16, r22); // 2.3 uint32_t r24 = r14 + r23; // 2.3 return r24; } /* Error bound computed using MPFI: * [ -11431649573949877792342647494613841987878211621348969225563371141715b-283, * 25586666393682615675492452639411998875812711854693922364875444185767b -282] * ~ [ -2^{ -27.191},2^{ -28.1781}] */ A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 27/26