Code optimization techniques

Similar documents
Short Notes of CS201

CS201 - Introduction to Programming Glossary By

C++ (Non for C Programmer) (BT307) 40 Hours

Contents. Preface. Introduction. Introduction to C Programming

Introduction to C++ with content from

Fixed-Point Math and Other Optimizations

Introduction to C++ Introduction. Structure of a C++ Program. Structure of a C++ Program. C++ widely-used general-purpose programming language

Review of the C Programming Language for Principles of Operating Systems

QUIZ. 1. Explain the meaning of the angle brackets in the declaration of v below:

Kampala August, Agner Fog


Absolute C++ Walter Savitch

C Programming. Course Outline. C Programming. Code: MBD101. Duration: 10 Hours. Prerequisites:

In examining performance Interested in several things Exact times if computable Bounded times if exact not computable Can be measured

Bindel, Fall 2011 Applications of Parallel Computers (CS 5220) Tuning on a single core

Writing an ANSI C Program Getting Ready to Program A First Program Variables, Expressions, and Assignments Initialization The Use of #define and

CSE 452: Programming Languages. Outline of Today s Lecture. Expressions. Expressions and Control Flow

SWAR: MMX, SSE, SSE 2 Multiplatform Programming

Review of the C Programming Language

VALLIAMMAI ENGINEERING COLLEGE

CS201 Some Important Definitions

Name :. Roll No. :... Invigilator s Signature : INTRODUCTION TO PROGRAMMING. Time Allotted : 3 Hours Full Marks : 70

C-LANGUAGE CURRICULAM

SRM ARTS AND SCIENCE COLLEGE SRM NAGAR, KATTANKULATHUR

An introduction to Scheme

CERTIFICATE IN WEB PROGRAMMING

Overview Implicit Vectorisation Explicit Vectorisation Data Alignment Summary. Vectorisation. James Briggs. 1 COSMOS DiRAC.

COMP2611: Computer Organization. Data Representation

ENGINEERING 1020 Introduction to Computer Programming M A Y 2 6, R E Z A S H A H I D I

CS3157: Advanced Programming. Outline

Aryan College. Fundamental of C Programming. Unit I: Q1. What will be the value of the following expression? (2017) A + 9

Preface... (vii) CHAPTER 1 INTRODUCTION TO COMPUTERS

CHAPTER 1 Introduction to Computers and Programming CHAPTER 2 Introduction to C++ ( Hexadecimal 0xF4 and Octal literals 031) cout Object

Scientific Programming in C X. More features & Fortran interface

Fast Introduction to Object Oriented Programming and C++

Contents. 1 Introduction to Computers, the Internet and the World Wide Web 1. 2 Introduction to C Programming 26

Welcome to Teach Yourself Acknowledgments Fundamental C++ Programming p. 2 An Introduction to C++ p. 4 A Brief History of C++ p.

Problem Solving with C++

C Programming SYLLABUS COVERAGE SYLLABUS IN DETAILS

QUIZ How do we implement run-time constants and. compile-time constants inside classes?

C Language Part 1 Digital Computer Concept and Practice Copyright 2012 by Jaejin Lee

Algorithms and Computation in Signal Processing

CPSC 3740 Programming Languages University of Lethbridge. Data Types

CODE TIME TECHNOLOGIES. Abassi RTOS MISRA-C:2004. Compliance Report

Data Types. Data Types. Integer Types. Signed Integers

1 Chapter Plan...1 Exercise - Simple Program...2

Programming Languages Third Edition. Chapter 9 Control I Expressions and Statements

Languages and Compiler Design II IR Code Generation I

Programming Languages and Compilers. Jeff Nucciarone AERSP 597B Sept. 20, 2004

WA1278 Introduction to Java Using Eclipse

A Fast Review of C Essentials Part I

Crash Course into. Prof. Dr. Renato Pajarola

Page 1. Stuff. Last Time. Today. Safety-Critical Systems MISRA-C. Terminology. Interrupts Inline assembly Intrinsics

Borland 105, 278, 361, 1135 Bounded array Branch instruction 7 break statement 170 BTree 873 Building a project 117 Built in data types 126

CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading)

Floating Point Arithmetic

Java Primer 1: Types, Classes and Operators

Type Conversion. and. Statements

Optimising with the IBM compilers

Introduction to Programming Using Java (98-388)

Introduction to Computers and C++ Programming p. 1 Computer Systems p. 2 Hardware p. 2 Software p. 7 High-Level Languages p. 8 Compilers p.

The Waite Group's. New. Primer Plus. Second Edition. Mitchell Waite and Stephen Prata SAMS

Course Text. Course Description. Course Objectives. StraighterLine Introduction to Programming in C++

Introduction to Parallel Computing

C Refresher, Advance C, Coding Standard, Misra C Compliance & Real-time Programming

CSCI 171 Chapter Outlines

Intel C++ Compiler Professional Edition 11.1 for Mac OS* X. In-Depth

Practical C++ Programming

Advance CPU Design. MMX technology. Computer Architectures. Tien-Fu Chen. National Chung Cheng Univ. ! Basic concepts

C++ for System Developers with Design Pattern

OBJECT ORIENTED PROGRAMMING

Optimisation p.1/22. Optimisation

Midterm 1 topics (in one slide) Bits and bitwise operations. Outline. Unsigned and signed integers. Floating point numbers. Number representation

COMPUTER ORGANIZATION & ARCHITECTURE

Midterm Sample Answer ECE 454F 2008: Computer Systems Programming Date: Tuesday, Oct 28, p.m. - 5 p.m.


ME 461 C review Session Fall 2009 S. Keres

Intel C++ Compiler Professional Edition 11.0 for Linux* In-Depth

Language Extensions for Vector loop level parallelism

Overview of C. Basic Data Types Constants Variables Identifiers Keywords Basic I/O

C Review. MaxMSP Developers Workshop Summer 2009 CNMAT

Object Oriented Programming with c++ Question Bank

BLAS. Christoph Ortner Stef Salvini

Final CSE 131B Winter 2003

CS 6456 OBJCET ORIENTED PROGRAMMING IV SEMESTER/EEE

Computer Programming C++ (wg) CCOs

Design Issues. Subroutines and Control Abstraction. Subroutines and Control Abstraction. CSC 4101: Programming Languages 1. Textbook, Chapter 8

Axivion Bauhaus Suite Technical Factsheet MISRA

Bits, Words, and Integers

Expressions. Arithmetic expressions. Logical expressions. Assignment expression. n Variables and constants linked with operators

Tutorial 1: Introduction to C Computer Architecture and Systems Programming ( )

Automatic Performance Tuning. Jeremy Johnson Dept. of Computer Science Drexel University

Malloc Lab & Midterm Solutions. Recitation 11: Tuesday: 11/08/2016

Intel Performance Libraries

Programming Languages. Streams Wrapup, Memoization, Type Systems, and Some Monty Python

19 Much that I bound, I could not free; Much that I freed returned to me. Lee Wilson Dodd

High-Performance Cryptography in Software

c 2013 Alexander Jih-Hing Yee

Ch. 12: Operator Overloading

Advanced School in High Performance and GRID Computing November Mathematical Libraries. Part I

Transcription:

& Alberto Bertoldo Advanced Computing Group Dept. of Information Engineering, University of Padova, Italy cyberto@dei.unipd.it May 19, 2009 The Four Commandments 1. The Pareto principle 80% of the effects comes from 20% of the causes 80% of the performance is affected by 20% of the code 80% of the final performance is obtained by 20% of the efforts Do you really need/want to spend 4x the time to get 25% speedup? 2. The Clever Man principle Exploit compiler s optimization flags 3. The Occam s razor Entia non sunt multiplicanda praeter necessitatem a a Entities should not be multiplied beyond necessity 4. The Weel Reinvention principle Reuse already optimized code as much as possible

Development tools Choosing compilers Obey the Clever Man principle GNU Compiler Collection a a http://gcc.gnu.org/onlinedocs/gcc/option-summary.html -On enable machine-independent optimizations (e.g. -O3) -fx manually enable/disable machine-independent optimizations -mx automatic machine-dependent optimizations -mtune=cpu-type restructuring without using a specific ISA -march=cpu-type use specific ISA (e.g. SSE2) Intel Compilers a a http://www.intel.com/cd/software/products/asmo-na/eng/ compilers/284264.htm -On enable machine-independent optimizations (e.g. -O3) -xt automatic machine-dependent optimizations -parallel automatic parallelization for multithreaded architectures

Using libraries Obey the Weel Reinvention principle Math libraries BLAS & LAPACK, FFT, math operations, random number generation Vendor libraries (IBM ESSL, Intel MKL, AMD CML) Generated libraries (ATLAS, Spiral, FFTW, Goto) Other libraries Image and signal processing, string processing, compression, cryptography, etc. Vendor libraries (Intel IPP, AMD PL) Generated libraries (ATLAS, Spiral, FFTW, Goto) Template Libraries Containers, iterators, algorithms, and functors C++ Standard Library Boost C++ Other compiler techniques Use compiler-specific directives to ease compiler s job (pragmas) Use machine-specific instructions Use intrinsics

C/C++ optimizations Floating-point operations Use the f or F suffix (for example, 3.14f) to specify a constant value of type float Use function prototypes for all functions that accept arguments of type float Extract common subexpressions, especially costly ones (e.g. divisions) Addition are quicker than multiplications Use array notation instead of pointer notation when working with arrays

Conditional instructions Make the fall-through path more probable in if statements Put high-probability case before switch statements Exploit short circuiting Exploit bitwise operators instead of logic operators Avoid postincrement and postdecrement in conditions Remember that compound conditions are translated into a series of conditional branches Use tables indexed on conditions instead of switch or if statements Exploit else-clause removal Function calls General optimizations Avoid function pointers Declare a nonmember function as static Use the const specifier for member functions Avoid virtual functions and virtual inheritance Fully prototype all functions Don t define a return value if not used

Function calls Arguments Using global variables may prevent local optimizations Constant arguments improve optimizations Put frequently used parameters in the leftmost positions Pass or return a pointer to structure/union/class, or pass it by reference Pass non-aggregate types (i.e. int and short) by value rather than passing by reference If you bind functions, use the same order for parameters Inline functions or macros Best candidates Small size Frequently called Few callers One or more compile-time constant parameters used in if, switch, for Perform only a load/store or simple comparisons/operations Drawbacks Increased code size Poor I-cache efficiency

Memory management Dynamic memory Minimize the use of malloc Use efficient memory managers Enforce memory alignment Check for memory leaks Structures Declare the largest members first Place variables near each other if they are frequently used together Adjust structure sizes to power of two Sort and pad C and C++ structures to achieve natural alignment Memory management Variables Prefer local automatic variables Declare local variables in the inner most scope Sort local variables in decreasing order Prefer static global variables to external variables Group external variables into structures or arrays If a global variable is needed, copy its value to a local variable and use it Avoid taking the address of a variable with & operator Use constants instead of variables, especially as loop bounds Use register-sized integers Use the smallest floating-point precision

Expressions Reduce store-to-load dependencies Assign common subexpressions to local automatic variables Avoid both explicit and implicit (pay attention!) integer/floating-point conversion Code the integer and floating-point arithmetic in separate computations Transform division by the same denominator in multiplications Avoid exception handling Prefer initialization over assignment Minimize both implicit and explicit type casting Pay attention using volatile and register 64-bit mode Use 64-bit mode only if you need to access large data Avoid performing mixed 32- and 64-bit operations Use long for variables which will be frequently accessed Loops Compilers already do most loop transformations Recall the Occam s razor: your clever optimization may prevent compiler s (much more clever) optimizations Take constant expressions out of the loop Count down instead of up Minimize stride Consider store VS recompute tradeoffs Keep array index expressions as simple as possible Keep index type compatible to pointer type Simplify loop expressions as much as possible If loop expressions ARE complicated, try manual transformations 1 1 http://en.wikipedia.org/wiki/loop_optimization

Input/Output Use binary streams instead of text streams Use the low-level I/O functions, such as open and close Design your own buffering for the low-level functions Access multiple of 4K, which is the size of a page