Lecture 10. Floating point arithmetic GPUs in perspective

Similar documents
CSE 160 Lecture 14. Floating Point Arithmetic Condition Variables

Lecture 17 Floating point

Lecture 6. Floating Point Arithmetic Stencil Methods Introduction to OpenMP

Lecture 15 Floating point

FP_IEEE_DENORM_GET_ Procedure

Floating-point representations

Floating-point representations

Floating Point Numbers

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666

System Programming CISC 360. Floating Point September 16, 2008

2 Computation with Floating-Point Numbers

Representing and Manipulating Floating Points

Representing and Manipulating Floating Points. Jo, Heeseung

Binary floating point encodings

Foundations of Computer Systems

Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition. Carnegie Mellon

Floating Point Puzzles. Lecture 3B Floating Point. IEEE Floating Point. Fractional Binary Numbers. Topics. IEEE Standard 754

Floating point. Today. IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties Next time.

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for

Classes of Real Numbers 1/2. The Real Line

The Sign consists of a single bit. If this bit is '1', then the number is negative. If this bit is '0', then the number is positive.

Floating Point January 24, 2008

Floating Point : Introduction to Computer Systems 4 th Lecture, May 25, Instructor: Brian Railing. Carnegie Mellon

2 Computation with Floating-Point Numbers

Up next. Midterm. Today s lecture. To follow

Floating Point Puzzles. Lecture 3B Floating Point. IEEE Floating Point. Fractional Binary Numbers. Topics. IEEE Standard 754

Representing and Manipulating Floating Points

Efficient floating point: software and hardware

Floating Point. CSE 351 Autumn Instructor: Justin Hsia

Floating point. Today! IEEE Floating Point Standard! Rounding! Floating Point Operations! Mathematical properties. Next time. !

Representing and Manipulating Floating Points. Computer Systems Laboratory Sungkyunkwan University

Chapter 2 Float Point Arithmetic. Real Numbers in Decimal Notation. Real Numbers in Decimal Notation

Floating Point (with contributions from Dr. Bin Ren, William & Mary Computer Science)

Floating Point Numbers. Lecture 9 CAP

Systems I. Floating Point. Topics IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties

Representing and Manipulating Floating Points

EE878 Special Topics in VLSI. Computer Arithmetic for Digital Signal Processing

Floating-point numbers. Phys 420/580 Lecture 6

Floating-Point Arithmetic

Data Representation Floating Point

Floating Point. CSE 238/2038/2138: Systems Programming. Instructor: Fatma CORUT ERGİN. Slides adapted from Bryant & O Hallaron s slides

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666

Data Representation Floating Point

Floating Point Representation. CS Summer 2008 Jonathan Kaldor

Giving credit where credit is due

CS 261 Fall Floating-Point Numbers. Mike Lam, Professor.

Giving credit where credit is due

Data Representation Floating Point

Scientific Computing. Error Analysis

IEEE Standard 754 Floating Point Numbers

Finite arithmetic and error analysis

ecture 25 Floating Point Friedland and Weaver Computer Science 61C Spring 2017 March 17th, 2017

COMP2611: Computer Organization. Data Representation

Floating-point representation

CS 261 Fall Floating-Point Numbers. Mike Lam, Professor.

Floating Point Puzzles The course that gives CMU its Zip! Floating Point Jan 22, IEEE Floating Point. Fractional Binary Numbers.

Floating Point Numbers

MAT128A: Numerical Analysis Lecture Two: Finite Precision Arithmetic

CSCI 402: Computer Architectures

Today: Floating Point. Floating Point. Fractional Binary Numbers. Fractional binary numbers. bi bi 1 b2 b1 b0 b 1 b 2 b 3 b j

Computer Organization: A Programmer's Perspective

Floating Point Numbers

Floating Point Numbers

Number Representations

Lecture 13: (Integer Multiplication and Division) FLOATING POINT NUMBERS

Floating-Point Numbers in Digital Computers

CSCI 402: Computer Architectures. Arithmetic for Computers (3) Fengguang Song Department of Computer & Information Science IUPUI.

CS 61C: Great Ideas in Computer Architecture Floating Point Arithmetic

Computer Systems C S Cynthia Lee

Floating-Point Numbers in Digital Computers

Floating-point operations I

AMTH142 Lecture 10. Scilab Graphs Floating Point Arithmetic

Floating Point. CSE 351 Autumn Instructor: Justin Hsia

Module 12 Floating-Point Considerations

Floating Point. CSE 351 Autumn Instructor: Justin Hsia

CS 33. Data Representation (Part 3) CS33 Intro to Computer Systems VIII 1 Copyright 2018 Thomas W. Doeppner. All rights reserved.

Decimal & Binary Representation Systems. Decimal & Binary Representation Systems

Most nonzero floating-point numbers are normalized. This means they can be expressed as. x = ±(1 + f) 2 e. 0 f < 1

CO212 Lecture 10: Arithmetic & Logical Unit

C NUMERIC FORMATS. Overview. IEEE Single-Precision Floating-point Data Format. Figure C-0. Table C-0. Listing C-0.

NAN propagation versus fault trapping in floating point code

Floating Point Representation in Computers

Basics of Computation. PHY 604:Computational Methods in Physics and Astrophysics II

What we need to know about error: Class Outline. Computational Methods CMSC/AMSC/MAPL 460. Errors in data and computation

Module 12 Floating-Point Considerations

.. \ , Floating Point. Arithmetic. (The IEEE Standard) :l.

CS 6210 Fall 2016 Bei Wang. Lecture 4 Floating Point Systems Continued

Precision and Accuracy

Computational Economics and Finance

CS429: Computer Organization and Architecture

Divide: Paper & Pencil

Computational Methods CMSC/AMSC/MAPL 460. Representing numbers in floating point and associated issues. Ramani Duraiswami, Dept. of Computer Science

UC Berkeley CS61C : Machine Structures

Co-processor Math Processor. Richa Upadhyay Prabhu. NMIMS s MPSTME February 9, 2016

CO Computer Architecture and Programming Languages CAPL. Lecture 15

The Perils of Floating Point

CS 61C: Great Ideas in Computer Architecture Performance and Floating Point Arithmetic

Computer Arithmetic Ch 8

High Performance Computing - Floating Point Numbers. Prof Matt Probert

Scientific Computing: An Introductory Survey

Transcription:

Lecture 10 Floating point arithmetic GPUs in perspective

Announcements Interactive use on Forge Trestles accounts? A4 2012 Scott B. Baden /CSE 260/ Winter 2012 2

Today s lecture Floating point arithmetic GPU vs CPUs 2012 Scott B. Baden /CSE 260/ Winter 2012 3

Floating Point Arithmetic 2/7/12 2012 Scott B. Baden /CSE 260/ Winter 2012 4

A representation ±2.5732 10 22 What is floating point? NaN Single, double, extended precision A set of operations + = * / rem Comparison < = > Conversions between different formats, binary to decimal Exception handling IEEE Floating point standard P754 Universally accepted W. Kahan received the Turing Award in 1989 for design of IEEE Floating Point Standard Revision in 2008 2012 Scott B. Baden /CSE 260/ Winter 2012 5

IEEE Floating point standard P754 Normalized representation ±1.d d 2 eps -#significand bits Macheps = Machine epsilon = ε = 2 relative error in each operation OV = overflow threshold = largest number UN = underflow threshold = smallest number ±Zero: ±significand and exponent = 0 Format # bits #significand bits macheps #exponent bits exponent range ---------- -------- ---------------------- ------------ -------------------- ---------------------- Single 32 23+1 2-24 (~10-7 ) 8 2-126 - 2 127 (~10 +-38 ) Double 64 52+1 2-53 (~10-16 ) 11 2-1022 - 2 1023 (~10 +-308 ) Double 80 64 2-64 (~10-19 ) 15 2-16382 - 2 16383 (~10 +-4932 ) Jim Demmel 2012 Scott B. Baden /CSE 260/ Winter 2012 6

What happens in a floating point operation? Round to the nearest representable floating point number that corresponds to the exact value(correct rounding) Round to nearest value with the lowest order bit =0 (rounding toward nearest even) Others are possible We don t need the exact value to work this out! Applies to + = * / rem Error formula: fl(a op b) = (a op b)*(1 + δ) where op one of +, -, *, / δ ε assuming no overflow, underflow, or divide by zero Addition example fl( x i ) = i=1:n x i *(1+e i ) e i < (n-1)ε 2012 Scott B. Baden /CSE 260/ Winter 2012 7

Exception Handling An exception occurs when the result of a floating point operation is not representable as a normalized floating point number 1/0, -1 P754 standardizes how we handle exceptions Overflow: - exact result > OV, too large to represent Underflow: exact result nonzero and < UN, too small to represent Divide-by-zero: nonzero/0 Invalid: 0/0, -1, log(0), etc. Inexact: there was a rounding error (common) Two possible responses Stop the program, given an error message Tolerate the exeption 2012 Scott B. Baden /CSE 260/ Winter 2012 8

Graph the function An example f(x) = sin(x) / x But we get a singularity @ x=0: 1/x = This is an accident in how we represent the function (W. Kahan) f(0) = 1 We catch the exception (divide by 0) Substitute the value f(0) = 1 2012 Scott B. Baden /CSE 260/ Winter 2012 9

Denormalized numbers We compute if (a b) then x = a/(a-b) We should never divide by 0, even if a-b is tiny Underflow exception occurs when exact result a-b < underflow threshold UN We return a denormalized number for a-b ±0.d d x 2 min_exp sign bit, nonzero significand, minimum exponent value Fill in the gap between 0 and UN Jim Demmel 2012 Scott B. Baden /CSE 260/ Winter 2012 10

Invalid exception NaN (Not a Number) Exact result is not a well-defined real number We can have a quiet NaN or an snan Quiet does not raise an exception, but propagates a distinguished value E.g. missing data: max(3,nan) = 3 Signaling - generate an exception when accessed Detect uninitialized data 2012 Scott B. Baden /CSE 260/ Winter 2012 11

Exception handling Each of the 5 exceptions manipulates 2 flags Sticky flag set by an exception, can be read and cleared by the user Exception flag: should a trap occur? If so, we can enter a trap handler But requires precise interrupts, causes problems on a parallel computer We can use exception handling to build faster algorithms Try the faster but riskier algorithm Rapidly test for accuracy (possibly with the aid of exception handling) Substitute slower more stable algorithm as needed 2012 Scott B. Baden /CSE 260/ Winter 2012 12

When compiler optimizations alter precision Let s say we support 79 + bit extended format in registers When we store values into memory, values a converted to the lower precision format Compilers can keep things in registers and we may lose referential transparency An example float x, y, z; int j;. x = y + z; if (x >= j) replace x by something smaller than j // x < j y=x; With optimization turned on, x is computed to extra precision; it is not an ordinary float If x is in a register, x y, no guarantee that the condition x < j will be preserved when x is stored in y, i.e. y >= j 2012 Scott B. Baden /CSE 260/ Winter 2012 13

P754 on the GPU Cuda Programming Guide (4.1) All compute devices follow the IEEE 754-2008 standard for binary floating-point arithmetic with the following deviations There is no mechanism for detecting that a floating-point exception has occurred and all operations behave as if the exceptions are always masked SNaN are handled as quiet Cap. 2.x: FFMA is an IEEE-754-2008 compliant fused multiplyadd instruction the full-width product used in the addition & a single rounding occurs during generation of the final result rnd(a A + B) with FFMA (2.x) vs rnd(rnd(a A) + B) FMAD for 1.x FFMA can avoid loss of precision during subtractive cancellation when adding quantities of similar magnitude but opposite signs Also see Precision & Performance: Floating Point and IEEE 754 Compliance for NVIDIA GPUs, by N. Whitehead and A. Fit-Florea 2012 Scott B. Baden /CSE 260/ Winter 2012 14

Today s lecture Floating point arithmetic GPU vs CPUs 2012 Scott B. Baden /CSE 260/ Winter 2012 15

Comparative results with Panfilov Method Runs on Forge, single precision, 16 threads on the CPU GPU: n= (1024, 1536); t=5.0 (84, 98) GF; Double prec: (49,54) CPU: (62,65); Double precision: (35, 17) 2012 Scott B. Baden /CSE 260/ Winter 2012 16

Computing Platforms NVIDIA GTX-280 200 series, like Lilliput Intel core i7 Superscalar, branch cache, out of order, 2-way hyperthreading Each core: 32 KB I+D L1, 256KB L2 Shared 8MB L3 2012 Scott B. Baden /CSE 260/ Winter 2012 17

Workloads (from Lee et al.) 2012 Scott B. Baden /CSE 260/ Winter 2012 18

Normalized performance 2012 Scott B. Baden /CSE 260/ Winter 2012 19

Fin