Synthesis of fixed-point programs: the case of matrix multiplication

Similar documents
BoxPlot++ Zeina Azmeh, Fady Hamoui, Marianne Huchard. To cite this version: HAL Id: lirmm

Relabeling nodes according to the structure of the graph

DANCer: Dynamic Attributed Network with Community Structure Generator

The optimal routing of augmented cubes.

Efficient implementation of interval matrix multiplication

Self-optimisation using runtime code generation for Wireless Sensor Networks Internet-of-Things

An Experimental Assessment of the 2D Visibility Complex

The Proportional Colouring Problem: Optimizing Buffers in Radio Mesh Networks

Fault-Tolerant Storage Servers for the Databases of Redundant Web Servers in a Computing Grid

Scan chain encryption in Test Standards

Stream Ciphers: A Practical Solution for Efficient Homomorphic-Ciphertext Compression

Linux: Understanding Process-Level Power Consumption

LaHC at CLEF 2015 SBS Lab

Hardware support for UNUM floating point arithmetic

Moveability and Collision Analysis for Fully-Parallel Manipulators

Regularization parameter estimation for non-negative hyperspectral image deconvolution:supplementary material

Catalogue of architectural patterns characterized by constraint components, Version 1.0

Study on Feebly Open Set with Respect to an Ideal Topological Spaces

Branch-and-price algorithms for the Bi-Objective Vehicle Routing Problem with Time Windows

Teaching Encapsulation and Modularity in Object-Oriented Languages with Access Graphs

Efficient Gradient Method for Locally Optimizing the Periodic/Aperiodic Ambiguity Function

Taking Benefit from the User Density in Large Cities for Delivering SMS

Real-Time Collision Detection for Dynamic Virtual Environments

Mokka, main guidelines and future

Multimedia CTI Services for Telecommunication Systems

Computing and maximizing the exact reliability of wireless backhaul networks

SDLS: a Matlab package for solving conic least-squares problems

Setup of epiphytic assistance systems with SEPIA

Tacked Link List - An Improved Linked List for Advance Resource Reservation

Induced minors and well-quasi-ordering

Linked data from your pocket: The Android RDFContentProvider

How to simulate a volume-controlled flooding with mathematical morphology operators?

THE COVERING OF ANCHORED RECTANGLES UP TO FIVE POINTS

XBenchMatch: a Benchmark for XML Schema Matching Tools

Assisted Policy Management for SPARQL Endpoints Access Control

Every 3-connected, essentially 11-connected line graph is hamiltonian

Representation of Finite Games as Network Congestion Games

Comparison of spatial indexes

Robust IP and UDP-lite header recovery for packetized multimedia transmission

X-Kaapi C programming interface

Light field video dataset captured by a R8 Raytrix camera (with disparity maps)

Blind Browsing on Hand-Held Devices: Touching the Web... to Understand it Better

Comparison of radiosity and ray-tracing methods for coupled rooms

An Efficient Numerical Inverse Scattering Algorithm for Generalized Zakharov-Shabat Equations with Two Potential Functions

The Connectivity Order of Links

Very Tight Coupling between LTE and WiFi: a Practical Analysis

Comparator: A Tool for Quantifying Behavioural Compatibility

YAM++ : A multi-strategy based approach for Ontology matching task

A Generic Architecture of CCSDS Low Density Parity Check Decoder for Near-Earth Applications

Service Reconfiguration in the DANAH Assistive System

A Voronoi-Based Hybrid Meshing Method

Malware models for network and service management

Deformetrica: a software for statistical analysis of anatomical shapes

[Demo] A webtool for analyzing land-use planning documents

QuickRanking: Fast Algorithm For Sorting And Ranking Data

Traffic Grooming in Bidirectional WDM Ring Networks

Acyclic Coloring of Graphs of Maximum Degree

Implementing decimal floating-point arithmetic through binary: some suggestions

IntroClassJava: A Benchmark of 297 Small and Buggy Java Programs

An FCA Framework for Knowledge Discovery in SPARQL Query Answers

ASAP.V2 and ASAP.V3: Sequential optimization of an Algorithm Selector and a Scheduler

Simulations of VANET Scenarios with OPNET and SUMO

Implementing an Automatic Functional Test Pattern Generation for Mixed-Signal Boards in a Maintenance Context

The New Territory of Lightweight Security in a Cloud Computing Environment

From medical imaging to numerical simulations

lambda-min Decoding Algorithm of Regular and Irregular LDPC Codes

Reverse-engineering of UML 2.0 Sequence Diagrams from Execution Traces

Multi-mode Operator for SHA-2 Hash Functions

Weed Leaf Recognition in Complex Natural Scenes by Model-Guided Edge Pairing

Accurate Conversion of Earth-Fixed Earth-Centered Coordinates to Geodetic Coordinates

Workspace and joint space analysis of the 3-RPS parallel robot

Real-Time and Resilient Intrusion Detection: A Flow-Based Approach

Modularity for Java and How OSGi Can Help

Framework for Hierarchical and Distributed Smart Grid Management

Kernel perfect and critical kernel imperfect digraphs structure

HySCaS: Hybrid Stereoscopic Calibration Software

YANG-Based Configuration Modeling - The SecSIP IPS Case Study

BugMaps-Granger: A Tool for Causality Analysis between Source Code Metrics and Bugs

FIT IoT-LAB: The Largest IoT Open Experimental Testbed

Quasi-tilings. Dominique Rossin, Daniel Krob, Sebastien Desreux

RecordMe: A Smartphone Application for Experimental Collections of Large Amount of Data Respecting Volunteer s Privacy

A N-dimensional Stochastic Control Algorithm for Electricity Asset Management on PC cluster and Blue Gene Supercomputer

IMPLEMENTATION OF MOTION ESTIMATION BASED ON HETEROGENEOUS PARALLEL COMPUTING SYSTEM WITH OPENC

Constraint Programming and Combinatorial Optimisation in Numberjack

Zigbee Wireless Sensor Network Nodes Deployment Strategy for Digital Agricultural Data Acquisition

Optimizing correctly-rounded reciprocal square roots for embedded VLIW cores

Approximate Computing with Runtime Code Generation on Resource-Constrained Embedded Devices

Scalewelis: a Scalable Query-based Faceted Search System on Top of SPARQL Endpoints

Primitive roots of bi-periodic infinite pictures

A 64-Kbytes ITTAGE indirect branch predictor

Fuzzy sensor for the perception of colour

A Methodology for Improving Software Design Lifecycle in Embedded Control Systems

Quality of Service Enhancement by Using an Integer Bloom Filter Based Data Deduplication Mechanism in the Cloud Storage Environment

Time-Division Multiplexing Architecture for Hybrid Filter Bank A/D converters

Lossless and Lossy Minimal Redundancy Pyramidal Decomposition for Scalable Image Compression Technique

Multidimensional Data Streams Summarization Using Extended Tilted-Time Windows

Combined video and laser camera for inspection of old mine shafts

Open Digital Forms. Hiep Le, Thomas Rebele, Fabian Suchanek. HAL Id: hal

Modelling approaches for anion-exclusion in compacted Na-bentonite

An Object Oriented Finite Element Toolkit

Transcription:

Synthesis of fixed-point programs: the case of matrix multiplication Mohamed Amine Najahi To cite this version: Mohamed Amine Najahi. Synthesis of fixed-point programs: the case of matrix multiplication. EJCIM: École Jeunes Chercheurs en Informatique Mathématique, Apr 213, Perpignan, France. 13th École Jeunes Chercheurs en Informatique Mathématique (EJCIM 213) Perpignan, April 12th, 213, 213. <lirmm-1277362> HAL Id: lirmm-1277362 https://hal-lirmm.ccsd.cnrs.fr/lirmm-1277362 Submitted on 22 Feb 216 HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

13 th École Jeunes Chercheurs en Informatique Mathématique (EJCIM 213) Perpignan, April 12 th, 213 Synthesis of fixed-point programs: the case of matrix multiplication Amine Najahi Advisers: M. Martel and G. Revy Équipe-projet DALI, Univ. Perpignan Via Domitia LIRMM, CNRS: UMR 556 - Univ. Montpellier 2 DALI A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 1/26

How easy it is to program a product of matrices? A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 2/26

How easy it is to program a product of matrices? Well, in floating-point, it is very easy!! #define N 8 int main () { int i,j,k; float A[N][N]={...}; float B[N][N]={...}; float C[N][N]={,...,}; for (i = ; i < N ; i++) for (j = ; j < N ; j++) for (k = ; k < N ; k++) /* dot product of row i and column j */ C[i][j]+=A[i][k]*B[k][j]; return ; } A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 2/26

How easy it is to program a product of matrices? Well, in floating-point, it is very easy!! #define N 8 int main () { int i,j,k; float A[N][N]={...}; float B[N][N]={...}; float C[N][N]={,...,}; for (i = ; i < N ; i++) for (j = ; j < N ; j++) for (k = ; k < N ; k++) /* dot product of row i and column j */ C[i][j]+=A[i][k]*B[k][j]; return ; } But, what if the target does not have a floating-point unit? A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 2/26

Motivation Embedded systems are ubiquitous microprocessors and/or DSPs dedicated to one or a few specific tasks satisfy constraints: area, energy consumption, conception cost Some embedded systems do not have any FPU (floating-point unit) Embedded systems No FPU Highly used in audio and video applications demanding on floating-point computations A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 3/26

Motivation Embedded systems are ubiquitous microprocessors and/or DSPs dedicated to one or a few specific tasks satisfy constraints: area, energy consumption, conception cost Some embedded systems do not have any FPU (floating-point unit) Applications Embedded systems FP computations No FPU Highly used in audio and video applications demanding on floating-point computations A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 3/26

Motivation Embedded systems are ubiquitous microprocessors and/or DSPs dedicated to one or a few specific tasks satisfy constraints: area, energy consumption, conception cost Some embedded systems do not have any FPU (floating-point unit) Applications FP computations Embedded systems No FPU Software implementing floating point arithmetic Highly used in audio and video applications demanding on floating-point computations A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 3/26

Motivation Embedded systems are ubiquitous microprocessors and/or DSPs dedicated to one or a few specific tasks satisfy constraints: area, energy consumption, conception cost Some embedded systems do not have any FPU (floating-point unit) Applications FP computations Embedded systems Conversion No FPU Fixed point Float to Fix conversion is tackled by the ANR project DEFIS LIP6, IRISA, CEA, LIRMM, THALES and INPIXAL A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 3/26

Outline of the talk 1. Background of fixed-point arithmetic 1.1 Basics of fixed-point arithmetic 1.2 Numerical and combinatorial issues in fixed-point programs 1.3 CGPE 2. Matrix multiplication in fixed-point 2.1 An accurate algorithm 2.2 A compact algorithm 2.3 Closest pair algorithm 3. Conclusion A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 4/26

Background of fixed-point arithmetic Outline of the talk 1. Background of fixed-point arithmetic 1.1 Basics of fixed-point arithmetic 1.2 Numerical and combinatorial issues in fixed-point programs 1.3 CGPE 2. Matrix multiplication in fixed-point 2.1 An accurate algorithm 2.2 A compact algorithm 2.3 Closest pair algorithm 3. Conclusion A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 5/26

Background of fixed-point arithmetic Basics of fixed-point arithmetic Principles of fixed-point arithmetic Main idea of fixed-point arithmetic: interpret bit words as integers coupled with a scale factor: z 2 n 1 1 z 2 7 + 2 1 = 13 Value in fixed-point 13 2 4 = 27 +2 1 2 4 = 2 3 + 2 3 = 8.125 Integer part Fractional part A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 6/26

Background of fixed-point arithmetic Basics of fixed-point arithmetic Principles of fixed-point arithmetic Main idea of fixed-point arithmetic: interpret bit words as integers coupled with a scale factor: z 2 n 1 1 z 2 7 + 2 1 = 13 Value in fixed-point 13 2 4 = 27 +2 1 2 4 = 2 3 + 2 3 = 8.125 Integer part Fractional part The scale factor (or fixed-point format) is implicit, only the programmer is aware of it A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 6/26

Background of fixed-point arithmetic Basics of fixed-point arithmetic Principles of fixed-point arithmetic Main idea of fixed-point arithmetic: interpret bit words as integers coupled with a scale factor: z 2 n 1 1 z 2 7 + 2 1 = 13 Value in fixed-point 13 2 4 = 27 +2 1 2 4 = 2 3 + 2 3 = 8.125 Integer part Fractional part The scale factor (or fixed-point format) is implicit, only the programmer is aware of it Let us denote by Q a,b a fixed-point format with a integer bits and b fractional bits 1 1 (1.15625) 1 Q 1,7 1 1 Q 2,6 1 1 (2.3125) 1 Q,8 1 1 (.578125) 1 A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 6/26

Background of fixed-point arithmetic Basics of fixed-point arithmetic Basic fixed-point operators Addition The two variables have to be in the same fixed-point format The sum of two Q a,b variables yields a Q a+1,b variable truncated 1 1 1 5.625 + 1 1 1 1 1 2.828125 1 1 1 1 1 1 1 7.89625 7.875 A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 7/26

Background of fixed-point arithmetic Basics of fixed-point arithmetic Basic fixed-point operators Addition The two variables have to be in the same fixed-point format The sum of two Q a,b variables yields a Q a+1,b variable truncated 1 1 1 5.625 + 1 1 1 1 1 2.828125 1 1 1 1 1 1 1 7.89625 7.875 Multiplication No need for the two variables to have the same fixed-point format The product of a Q a,b variable by a Q c,d variable yields a Q a+c,b+d variable 1 1 1 5.625 truncated 1 1 1 1 1 1.421875 1 1 1 1 1 1 1 1 7.198242187 7.125 A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 7/26

Background of fixed-point arithmetic Numerical and combinatorial issues in fixed-point programs First example: a size 3 dot product Let us consider the arithmetic expression: (a b ) +(a 1 b 1 ) +(a 2 b 2 ) and the following input fixed-point formats: a b a 1 b 1 a 2 b 2 Value [.1,1.57] [,1.98] [.1,.87] [1.1,1.86] [,15.4] [2,3.3] Fixed-point format Q 1,7 Q 1,7 Q,8 Q 1,7 Q 4,4 Q 2,6 A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 8/26

Background of fixed-point arithmetic Numerical and combinatorial issues in fixed-point programs First example: a size 3 dot product Let us consider the arithmetic expression: (a b ) +(a 1 b 1 ) +(a 2 b 2 ) and the following input fixed-point formats: a b a 1 b 1 a 2 b 2 Value [.1,1.57] [,1.98] [.1,.87] [1.1,1.86] [,15.4] [2,3.3] Fixed-point format Q 1,7 Q 1,7 Q,8 Q 1,7 Q 4,4 Q 2,6 Let us focus on 2 different schemes to compute the sum of products: Q 8,15 + Q 7,15 + Q 2,14 Q 7,15 a + b Q 1,15 Q 6,1 1 1 a b a 2 b 2 in full precision Q 3,15 + Q 6,1 Q 2,14 Q 1,15 a 2 b 2 a b a 1 b 1 (c +(c 1 + c 2 )) ((c + c 1 ) + c 2 ) A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 8/26

Background of fixed-point arithmetic Numerical and combinatorial issues in fixed-point programs First example: a size 3 dot product Let us consider the arithmetic expression: (a b ) +(a 1 b 1 ) +(a 2 b 2 ) and the following input fixed-point formats: a b a 1 b 1 a 2 b 2 Value [.1,1.57] [,1.98] [.1,.87] [1.1,1.86] [,15.4] [2,3.3] Fixed-point format Q 1,7 Q 1,7 Q,8 Q 1,7 Q 4,4 Q 2,6 Let us focus on 2 different schemes to compute the sum of products: Q 8,8 + Q 7,9 + Q 2,14 Q 7,9 a + b Q 1,15 Q 6,1 1 1 a b a 2 b 2 with 16 bits precision Q 3,13 + Q 6,1 Q 2,14 Q 1,15 a 2 b 2 a b a 1 b 1 (c +(c 1 + c 2 )) ((c + c 1 ) + c 2 ) A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 8/26

Background of fixed-point arithmetic CGPE The CGPE 1 software tool Written by Revy and Mouilleron to aid in emulating floating-point in software A tool that generates fast and certified code fast that reduce the evaluation latency on a given target, by using the target architecture features (as much as possible) certified for which we can bound the error entailed by the evaluation within the given target s arithmetic back-end middle-end front-end DAG computation Set of DAGs Filter 1 Filter n Decorated DAGs Code generator polynomial.xml <polynomial> <coefficient... >... <variable... >... <error... > </polynomial> architecture.xml <architecture> <instruction name="add" type="unsigned" inputs="32 32" output="32" latency="1" nodes="add dag 1..." macro="static inline..." gappa="..."... /> </architecture> C files Accuracy certificates 1 Code Generation for Polynomial Evaluation A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 9/26

Matrix multiplication in fixed-point Outline of the talk 1. Background of fixed-point arithmetic 1.1 Basics of fixed-point arithmetic 1.2 Numerical and combinatorial issues in fixed-point programs 1.3 CGPE 2. Matrix multiplication in fixed-point 2.1 An accurate algorithm 2.2 A compact algorithm 2.3 Closest pair algorithm 3. Conclusion A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 1/26

Matrix multiplication in fixed-point Defining the problem We are provided with a black box (CGPE) that synthesises code for dot-products in fixed-point arithmetic C code 2 real interval vectors Accuracy certificate 2 matrices A and B in I(R n n ) [ 4.54,7.78] [.789,.967]... A =...... [12.51,24.14] [.921,.791] [ 64,45.78] [.287,.7]... and, B =...... [125.1,245.14] [ 5.74,7.32] A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 11/26

Matrix multiplication in fixed-point Defining the problem We are provided with a black box (CGPE) that synthesises code for dot-products in fixed-point arithmetic C code 2 real interval vectors Accuracy certificate 2 matrices A and B in I(R n n ) [ 4.54,7.78] [.789,.967]... A =...... [12.51,24.14] [.921,.791] [ 64,45.78] [.287,.7]... and, B =...... [125.1,245.14] [ 5.74,7.32] We are asked to Generate code that evaluates all the products C = MN in fixed-point arithmetic where M A and N B A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 11/26

Matrix multiplication in fixed-point Tradeoffs to consider Remark: The suggested strategy should be efficient in terms of the tradeoffs below for all matrices of size smaller than 8 8 A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 12/26

Matrix multiplication in fixed-point Tradeoffs to consider Remark: The suggested strategy should be efficient in terms of the tradeoffs below for all matrices of size smaller than 8 8 1. Size of the generated code We are targeting embedded systems code size should be as tight as possible A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 12/26

Matrix multiplication in fixed-point Tradeoffs to consider Remark: The suggested strategy should be efficient in terms of the tradeoffs below for all matrices of size smaller than 8 8 1. Size of the generated code We are targeting embedded systems code size should be as tight as possible 2. Accuracy of the generated code Accuracy certificates should be produced that bound the absolute error The guaranteed absolute error should be as tight as possible A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 12/26

Matrix multiplication in fixed-point Tradeoffs to consider Remark: The suggested strategy should be efficient in terms of the tradeoffs below for all matrices of size smaller than 8 8 1. Size of the generated code We are targeting embedded systems code size should be as tight as possible 2. Accuracy of the generated code Accuracy certificates should be produced that bound the absolute error The guaranteed absolute error should be as tight as possible 3. Speed of generation A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 12/26

Matrix multiplication in fixed-point An accurate algorithm An accurate algorithm Main idea: Generate a dot product code for each coefficient of the resulting matrix AccurateProduct Inputs: Two square matrices A I(R n n ) and B I(R n n ) Outputs: C code to compute the product MN for all M A and N B Steps: 1: for 1 < i n do 2: for 1 < j n do 3: cgpegendotproduct(a i,b j ); 4: end for 5: end for A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 13/26

Matrix multiplication in fixed-point An accurate algorithm An accurate algorithm Main idea: Generate a dot product code for each coefficient of the resulting matrix AccurateProduct Inputs: Two square matrices A I(R n n ) and B I(R n n ) Outputs: C code to compute the product MN for all M A and N B Steps: 1: for 1 < i n do 2: for 1 < j n do 3: cgpegendotproduct(a i,b j ); 4: end for 5: end for Illustration on the product of two 2 2 matrices C = ( ) C1,1 = cgpegendotproduct(a 1,B 1 ) C 1,2 = cgpegendotproduct(a 1,B 2 ) C 2,1 = cgpegendotproduct(a 2,B 1 ) C 2,2 = cgpegendotproduct(a 2,B 2 ) A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 13/26

Matrix multiplication in fixed-point An accurate algorithm Analysis of AccurateProduct For square matrices of size n, n 2 calls to the cgpegendotproduct are issued Each dot product uses more than 2n instructions (n multiplications + n additions) The generated code for the product is proportional in size to 2n 3 More than 124 instructions for 8 8 matrices A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 14/26

Matrix multiplication in fixed-point An accurate algorithm Analysis of AccurateProduct For square matrices of size n, n 2 calls to the cgpegendotproduct are issued Each dot product uses more than 2n instructions (n multiplications + n additions) The generated code for the product is proportional in size to 2n 3 More than 124 instructions for 8 8 matrices Advantages Easy to generate code Two nested loops and n 2 calls to the routine cgpegendotproduct The reference in terms of numerical quality Drawbacks Code size is proportional to 2n 3 Similar code sizes are prohibitive in embedded systems A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 14/26

Matrix multiplication in fixed-point A compact algorithm A compact algorithm Main idea: Generate a unique dot product code for all the computations CompactProduct Inputs: Two square matrices A I(R n n ) and B I(R n n ) Outputs: C code to compute the product MN for all M A and N B Steps: 1: compute v such that v = A 1 A 2 A n 2: compute w such that w = B 1 B 2 B n 3: cgpegendotproduct(v,w); A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 15/26

Matrix multiplication in fixed-point A compact algorithm A compact algorithm Main idea: Generate a unique dot product code for all the computations CompactProduct Inputs: Two square matrices A I(R n n ) and B I(R n n ) Outputs: C code to compute the product MN for all M A and N B Steps: 1: compute v such that v = A 1 A 2 A n 2: compute w such that w = B 1 B 2 B n 3: cgpegendotproduct(v,w); Illustration on the product of two 2 2 matrices C = ( ) C1,1 = cgpegendotproduct(a 1 A 2,B 1 B 2 ) C 1,2 = cgpegendotproduct(a 1 A 2,B 1 B 2 ) C 2,1 = cgpegendotproduct(a 1 A 2,B 1 B 2 ) C 2,2 = cgpegendotproduct(a 1 A 2,B 1 B 2 ) A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 15/26

Matrix multiplication in fixed-point A compact algorithm Analysis of CompactProduct For square matrices of size n, only one call to the cgpegendotproduct is issued The dot product uses around 2n instructions (n multiplications + n additions) The generated code for the product is proportional in size to 2n Around 16 instructions for 8 8 matrices A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 16/26

Matrix multiplication in fixed-point A compact algorithm Analysis of CompactProduct For square matrices of size n, only one call to the cgpegendotproduct is issued The dot product uses around 2n instructions (n multiplications + n additions) The generated code for the product is proportional in size to 2n Around 16 instructions for 8 8 matrices Advantages Easy to generate code Compute the union of all vectors of A and B and call the routine cgpegendotproduct The reference in terms of code size Drawbacks Numerical quality deteriorates dramatically A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 16/26

Matrix multiplication in fixed-point Closest pair algorithm A closest pair algorithm Main idea: Fuse together only rows or columns that are close to each other The Hausdorff distance d H dh : I(R n ) I(R { n ) R d H (A,B) = max max ai, } b ai i b i 1 i n A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 17/26

Matrix multiplication in fixed-point Closest pair algorithm A closest pair algorithm Main idea: Fuse together only rows or columns that are close to each other The Hausdorff distance d H dh : I(R n ) I(R { n ) R d H (A,B) = max max ai, } b ai i b i 1 i n Example Let A = ( [ 4, 7] ) [ 11, 12] ( ) and B = [ 2, 88] [ 23, 1] be two vectors in I ( R 2), we have: d H (A,B) = 11 (A,B) = ( ) [ 4, 88] [ 23, 12] a 1 a1 b 1 b1 R (A 1,B 1 ) d H (A 1,B 1 ) A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 17/26

Matrix multiplication in fixed-point Closest pair algorithm ClosestPairFusion ClosestPairFusion Inputs: n vectors, v 1,...,v n in I(R m ) a routine findclosestpair based on d H a routine Union that applies the union operator the number k of output vectors Outputs: k vectors in I(R m ) Steps: 1: B = {v 1,...,v n } 2: while size(b) > k do 3: (u 1,u 2 ) = findclosestpair(b) 4: remove(u 1,B) 5: remove(u 2,B) 6: add(union(u 1,u 2 ),B) 7: end while A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 18/26

Matrix multiplication in fixed-point Closest pair algorithm Illustration of the ClosestPairFusion v 1 [ 4,4] [ 5,5] [ 5,5] [ 6,6] v 2 [ 2,2] [ 1,1] [ 3,3] [ 9,9] v 3 [ 7,7] [ 4,4] [ 12,12] [ 11,11] v 4 [ 8,8] [ 1,1] [ 1,1] [ 9,9] w 1 w 2 w 3 w 4 [ 3,3] [ 14,14] [ 5,5] [ 6,6] [ 1,1] [ 11,11] [ 3,3] [ 9,9] [ 4,4] [ 8,8] [ 11,11] [ 1,1] [ 9,9] [ 7,7] [ 1,1] [ 2,2] d H (v 1,v 2) d H (v 1,v 3) d H (v 1,v 4) d H (v 2,v 3) d H (v 2,v 4) d H (v 3,v 4) 4 7 5 9 7 3 d H (w 1,w 2) d H (w 1,w 3) d H (w 1,w 4) d H (w 2,w 3) d H (w 2,w 4) d H (w 3,w 4) 11 7 8 9 8 1 A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 19/26

Matrix multiplication in fixed-point Closest pair algorithm Illustration of the ClosestPairFusion v 1 [ 4,4] [ 5,5] [ 5,5] [ 6,6] v 2 [ 2,2] [ 1,1] [ 3,3] [ 9,9] v 3 [ 7,7] [ 4,4] [ 12,12] [ 11,11] v 4 [ 8,8] [ 1,1] [ 1,1] [ 9,9] w 1 w 2 w 3 w 4 [ 3,3] [ 14,14] [ 5,5] [ 6,6] [ 1,1] [ 11,11] [ 3,3] [ 9,9] [ 4,4] [ 8,8] [ 11,11] [ 1,1] [ 9,9] [ 7,7] [ 1,1] [ 2,2] d H (v 1,v 2) d H (v 1,v 3) d H (v 1,v 4) d H (v 2,v 3) d H (v 2,v 4) d H (v 3,v 4) 4 7 5 9 7 3 d H (w 1,w 2) d H (w 1,w 3) d H (w 1,w 4) d H (w 2,w 3) d H (w 2,w 4) d H (w 3,w 4) 11 7 8 9 8 1 v 1 [ 4,4] [ 5,5] [ 5,5] [ 6,6] v 2 [ 2,2] [ 1,1] [ 3,3] [ 9,9] v 3 v 4 [ 8,8] [ 4,4] [ 12,12] [ 11,11] w 1 w 3 w 2 w 4 [ 5,5] [ 14,14] [ 6,6] [ 3,3] [ 11,11] [ 9,9] [ 11,11] [ 8,8] [ 1,1] [ 1,1] [ 7,7] [ 2,2] d H (v 1,v 2) d H (v 1,v 3 v 4) d H (v 2,v 3 v 4) 4 7 9 d H (w 1 w 3,w 2) d H (w 1 w 3,w 4) d H (w 2,w 4) 9 1 8 A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 19/26

Matrix multiplication in fixed-point Closest pair algorithm Illustration of the ClosestPairFusion v 1 [ 4,4] [ 5,5] [ 5,5] [ 6,6] v 2 [ 2,2] [ 1,1] [ 3,3] [ 9,9] v 3 [ 7,7] [ 4,4] [ 12,12] [ 11,11] v 4 [ 8,8] [ 1,1] [ 1,1] [ 9,9] w 1 w 2 w 3 w 4 [ 3,3] [ 14,14] [ 5,5] [ 6,6] [ 1,1] [ 11,11] [ 3,3] [ 9,9] [ 4,4] [ 8,8] [ 11,11] [ 1,1] [ 9,9] [ 7,7] [ 1,1] [ 2,2] d H (v 1,v 2) d H (v 1,v 3) d H (v 1,v 4) d H (v 2,v 3) d H (v 2,v 4) d H (v 3,v 4) 4 7 5 9 7 3 d H (w 1,w 2) d H (w 1,w 3) d H (w 1,w 4) d H (w 2,w 3) d H (w 2,w 4) d H (w 3,w 4) 11 7 8 9 8 1 v 1 [ 4,4] [ 5,5] [ 5,5] [ 6,6] v 2 [ 2,2] [ 1,1] [ 3,3] [ 9,9] v 3 v 4 [ 8,8] [ 4,4] [ 12,12] [ 11,11] w 1 w 3 w 2 w 4 [ 5,5] [ 14,14] [ 6,6] [ 3,3] [ 11,11] [ 9,9] [ 11,11] [ 8,8] [ 1,1] [ 1,1] [ 7,7] [ 2,2] d H (v 1,v 2) d H (v 1,v 3 v 4) d H (v 2,v 3 v 4) 4 7 9 d H (w 1 w 3,w 2) d H (w 1 w 3,w 4) d H (w 2,w 4) 9 1 8 v 1 v 2 ( [ 4,4] [ 5,5] [ 5,5] ) [ 9,9] v 3 v 4 [ 8,8] [ 4,4] [ 12,12] [ 11,11] w 1 w 3 w 2 w 4 [ 5,5] [ 14,14] [ 3,3] [ 11,11] [ 11,11] [ 8,8] [ 1,1] [ 7,7] A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 19/26

Matrix multiplication in fixed-point Closest pair algorithm Analysis of the closest pair algorithm For square matrices of size n, k l calls to the cgpegendotproduct are issued Each dot product uses more than 2n instructions (n multiplications + n additions) The generated code for the product is proportional in size to 2nkl For 8 8 matrices, the table below gives the number of instructions l k 1 2 4 5 8 1 16 2 4 8 1 16 32 64 8 128 16 256 32 64 128 2 32 64 128 16 256 32 512 64 128 256 4 64 128 256 32 512 64 124 128 256 512 5 8 16 32 4 64 8 128 16 32 64 8 128 256 512 64 124 128 248 256 512 124 1 16 32 64 8 128 16 256 32 64 128 16 256 512 124 128 248 256 496 512 124 248 2 32 64 128 16 256 32 512 64 128 256 4 64 128 256 32 512 64 124 128 256 512 8 128 256 512 64 124 128 248 256 512 124 Advantages Code size can be controlled through the parameters k and l Drawbacks Numerical quality deteriorates with small values of k and l A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 2/26

Matrix multiplication in fixed-point Closest pair algorithm Analysis of the closest pair algorithm For square matrices of size n, k l calls to the cgpegendotproduct are issued Each dot product uses more than 2n instructions (n multiplications + n additions) The generated code for the product is proportional in size to 2nkl For 8 8 matrices, the table below gives the number of instructions l k 1 2 4 5 8 1 16 2 4 8 1 16 32 64 8 128 16 256 32 64 128 2 32 64 128 16 256 32 512 64 128 256 4 64 128 256 32 512 64 124 128 256 512 5 8 16 32 4 64 8 128 16 32 64 8 128 256 512 64 124 128 248 256 512 124 1 16 32 64 8 128 16 256 32 64 128 16 256 512 124 128 248 256 496 512 124 248 2 32 64 128 16 256 32 512 64 128 256 4 64 128 256 32 512 64 124 128 256 512 8 128 256 512 64 124 128 248 256 512 124 Advantages Code size can be controlled through the parameters k and l Drawbacks Numerical quality deteriorates with small values of k and l A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 2/26

Matrix multiplication in fixed-point Closest pair algorithm Let us compare these algorithms These results were produced for interval matrices of size 8 8 The center of each interval is randomly selected in [ 1, 1] The diameter of the intervals is fixed to 1 1.5 1.5 2 3 4 5 6.4.3.2 2 3 4 5 6.4.3.2 7.1 7.1 1 2 3 4 5 6 7 1 2 3 4 5 6 7 AccurateProduct Largest certified error:.1254 Mean certified error:.865 Number of instructions: 124 CompactProduct Largest certified error:.5585 Mean certified error:.5585 Number of instructions: 16 A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 21/26

1 2 3 4 5 6 7 Matrix multiplication in fixed-point 1 2 3 4 5 6 7.5.4.3.2.1 Closest pair algorithm Let us compare these algorithms, cont d A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 22/26

1 2 3 4 5 6 7 Matrix multiplication in fixed-point 1 2 3 4 5 6 7.5.4.3.2.1 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1 Closest pair algorithm Let us compare these algorithms, cont d A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 22/26

1 2 3 4 5 6 7 Matrix multiplication in fixed-point 1 2 3 4 5 6 7.5.4.3.2.1 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1 Closest pair algorithm 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1 Let us compare these algorithms, cont d A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 22/26

1 2 3 4 5 6 7 1 2 3 4 5 6 7 Matrix multiplication in fixed-point 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1 Closest pair algorithm 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1 Let us compare these algorithms, cont d A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 22/26

1 2 3 4 5 6 7 1 2 3 4 5 6 7 Matrix multiplication in fixed-point 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1 Closest pair algorithm 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1 Let us compare these algorithms, cont d A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 22/26

1 2 3 4 5 6 7 1 2 3 4 5 6 7 Matrix multiplication in fixed-point 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1 Closest pair algorithm 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1 Let us compare these algorithms, cont d A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 22/26

1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 Matrix multiplication in fixed-point 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1.5.4.3.2.1 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1 Closest pair algorithm 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1 Let us compare these algorithms, cont d A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 22/26

1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 Matrix multiplication in fixed-point 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1.5.4.3.2.1 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1.5.4.3.2.1 Closest pair algorithm 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1 Let us compare these algorithms, cont d A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 22/26

1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 Matrix multiplication in fixed-point 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1.5.4.3.2.1 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1.5.4.3.2.1 Closest pair algorithm 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1.5.4.3.2.1 Let us compare these algorithms, cont d A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 22/26

1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 Matrix multiplication in fixed-point 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1.5.4.3.2.1 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1.5.4.3.2.1.5.4.3.2.1 Closest pair algorithm 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7.5.4.3.2.1.5.4.3.2.1.5.4.3.2.1 Let us compare these algorithms, cont d A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 22/26

Matrix multiplication in fixed-point Closest pair algorithm Let us compare these algorithms, cont d.6 124.55.5 256 Certified bound for the absolute error.45.4.35.3.25.2.15 Maximum certified error Mean certified error Code size 64 496 16 124 4 256 64 Approximate number of instructions.1.5 16 1 5 1 16 2 4 8 k (Number of fused rows and columns) A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 23/26

Conclusion Outline of the talk 1. Background of fixed-point arithmetic 1.1 Basics of fixed-point arithmetic 1.2 Numerical and combinatorial issues in fixed-point programs 1.3 CGPE 2. Matrix multiplication in fixed-point 2.1 An accurate algorithm 2.2 A compact algorithm 2.3 Closest pair algorithm 3. Conclusion A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 24/26

Conclusion In this talk: We suggested 3 strategies to generate code for matrix product in fixed-point arithmetic The accurate algorithm performs well in terms of numerical quality but is prohibitive The compact algorithm generates concise codes but deteriorates the numerical quality The Closest Pair algorithm enables the tradeoffs between code size and numerical quality A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 25/26

Conclusion In this talk: We suggested 3 strategies to generate code for matrix product in fixed-point arithmetic The accurate algorithm performs well in terms of numerical quality but is prohibitive The compact algorithm generates concise codes but deteriorates the numerical quality The Closest Pair algorithm enables the tradeoffs between code size and numerical quality For the future, we will be working on: Suggesting similar algorithms for the discrete convolution in fixed-point arithmetic Investigating the synthesis of VHDL code for building blocks like matrix multiplication A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 25/26

Conclusion 13 th École Jeunes Chercheurs en Informatique Mathématique (EJCIM 213) Perpignan, April 12 th, 213 Synthesis of fixed-point programs: the case of matrix multiplication Amine Najahi Advisers: M. Martel and G. Revy Équipe-projet DALI, Univ. Perpignan Via Domitia LIRMM, CNRS: UMR 556 - Univ. Montpellier 2 DALI A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 26/26

Conclusion Example of code generated by CGPE // File automatically generated by CGPE (Code Generation for Polynomial Evaluation ) // Scheme : (((( a+(s*(a1+(t*a2))))+((s*(t*t))*(a3+(t*a4))))+((s*(t*t))*((t*t)*(a5+(t*a6)))))+(((s*(t*t))*((t*t)*( T*T)))*(( a7+(t*a8))+((t*t)*(a9+(t*a1)))))) // Degree : [9,1] uint32_t func_(uint32_t T, uint32_t S) { uint32_t r = mul(t, x5a82685d ); // 1.31 uint32_t r1 = xb54f31f - r; // 1.31 uint32_t r2 = mul(s, r1); // 2.3 uint32_t r3 = x2 + r2; // 2.3 uint32_t r4 = mul(t, T); //.32 uint32_t r5 = mul(s, r4); // 1.31 uint32_t r6 = mul(t, x386fd5f4 ); // 1.31 uint32_t r7 = x43df72f7 - r6; // 1.31 uint32_t r8 = mul(r5, r7); // 2.3 uint32_t r9 = r3 + r8; // 2.3 uint32_t r1 = mul(t, x287241 ); // 1.31 uint32_t r11 = x38b1798 - r1; // 1.31 uint32_t r12 = mul(r4, r11); // 1.31 uint32_t r13 = mul(r5, r12); // 2.3 uint32_t r14 = r9 + r13; // 2.3 uint32_t r15 = mul(r4, r4); //.32 uint32_t r16 = mul(r5, r15); // 1.31 uint32_t r17 = mul(t, x16c5cd9 ); // 1.31 uint32_t r18 = x1d7bf968 - r17; // 1.31 uint32_t r19 = mul(t, xfa9aa4 ); // 1.31 uint32_t r2 = x5dfffa4 - r19; // 1.31 uint32_t r21 = mul(r4, r2); // 1.31 uint32_t r22 = r18 + r21; // 1.31 uint32_t r23 = mul(r16, r22); // 2.3 uint32_t r24 = r14 + r23; // 2.3 return r24; } /* Error bound computed using MPFI: * [ -11431649573949877792342647494613841987878211621348969225563371141715b-283, * 25586666393682615675492452639411998875812711854693922364875444185767b -282] * ~ [ -2^{ -27.191},2^{ -28.1781}] */ A. Najahi (DALI UPVD/LIRMM, CNRS, UM2) Synthesis of fixed-point programs: the case of matrix multiplication 27/26