C++ Named Return Value Optimization

Similar documents
Machine Program: Procedure. Zhaoguo Wang

CS429: Computer Organization and Architecture

Princeton University Computer Science 217: Introduction to Programming Systems. Assembly Language: Function Calls

Machine-level Programs Procedure

Machine-Level Programming III: Procedures

%r8 %r8d. %r9 %r9d. %r10 %r10d. %r11 %r11d. %r12 %r12d. %r13 %r13d. %r14 %r14d %rbp. %r15 %r15d. Sean Barker

Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition. Carnegie Mellon

Assembly Language: Function Calls

Procedures and the Call Stack

Stack Frame Components. Using the Stack (4) Stack Structure. Updated Stack Structure. Caller Frame Arguments 7+ Return Addr Old %rbp

6.1. CS356 Unit 6. x86 Procedures Basic Stack Frames

The Stack & Procedures

University of Washington

Mechanisms in Procedures. CS429: Computer Organization and Architecture. x86-64 Stack. x86-64 Stack Pushing

Assembly III: Procedures. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

C to Assembly SPEED LIMIT LECTURE Performance Engineering of Software Systems. I-Ting Angelina Lee. September 13, 2012

Machine- Level Representa2on: Procedure

Areas for growth: I love feedback

EXAMINATIONS 2014 TRIMESTER 1 SWEN 430. Compiler Engineering. This examination will be marked out of 180 marks.

Machine/Assembler Language Putting It All Together

Machine-Level Programming (2)

Changelog. Assembly (part 1) logistics note: lab due times. last time: C hodgepodge

Function Calls and Stack

Corrections made in this version not seen in first lecture:

Binghamton University. CS-220 Spring X86 Debug. Computer Systems Section 3.11

The Stack & Procedures

Assembly Programming IV

Machine Programming 3: Procedures

x86 Programming I CSE 351 Winter

Credits and Disclaimers

Credits and Disclaimers

void P() {... y = Q(x); print(y); return; } ... int Q(int t) { int v[10];... return v[t]; } Computer Systems: A Programmer s Perspective

CSE351 Spring 2018, Midterm Exam April 27, 2018

CS 261 Fall Machine and Assembly Code. Data Movement and Arithmetic. Mike Lam, Professor


Brief Assembly Refresher

COMP 210 Example Question Exam 2 (Solutions at the bottom)

Systems Programming and Computer Architecture ( )

Assembly II: Control Flow. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

x86-64 Programming III

Credits and Disclaimers

Assembly II: Control Flow

Changelog. Brief Assembly Refresher. a logistics note. last time

7.1. CS356 Unit 7. Data Layout & Intermediate Stack Frames

Data Representa/ons: IA32 + x86-64

1 Number Representation(10 points)

Assembly Programming IV

Lecture 4 CIS 341: COMPILERS

Implementing Threads. Operating Systems In Depth II 1 Copyright 2018 Thomas W. Doeppner. All rights reserved.

CS 261 Fall Mike Lam, Professor. x86-64 Control Flow

What is concurrency? Concurrency. What is parallelism? concurrency vs parallelism. Concurrency: (the illusion of) happening at the same time.

University*of*Washington*

CS356: Discussion #8 Buffer-Overflow Attacks. Marco Paolieri

Optimization part 1 1

x86 Programming I CSE 351 Autumn 2016 Instructor: Justin Hsia

Machine Language CS 3330 Samira Khan

Structs and Alignment

1. A student is testing an implementation of a C function; when compiled with gcc, the following x86-64 assembly code is produced:

The von Neumann Machine

CS 2505 Computer Organization I Test 2. Do not start the test until instructed to do so! printed

CSCI 2021: x86-64 Control Flow

CSE351 Autumn 2014 Midterm Exam (29 October 2014)

Question Points Score Total: 100

CS356: Discussion #15 Review for Final Exam. Marco Paolieri Illustrations from CS:APP3e textbook

Computer Organization: A Programmer's Perspective

CS 107 Lecture 10: Assembly Part I

CS356: Discussion #7 Buffer Overflows. Marco Paolieri

x64 Cheat Sheet Fall 2014

C to Machine Code x86 basics: Registers Data movement instructions Memory addressing modes Arithmetic instructions

Assembly I: Basic Operations. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

SYSTEMS PROGRAMMING AND COMPUTER ARCHITECTURE Assignment 5: Assembly and C

How Software Executes

Compiling C Programs Into X86-64 Assembly Programs

Corrections made in this version not in first posting:

How Software Executes

Concurrency. Johan Montelius KTH

The Compilation Process

6.035 Project 3: Unoptimized Code Generation. Jason Ansel MIT - CSAIL

18-600: Recitation #4 Exploits

Machine-Level Programming I: Basics

Credits and Disclaimers

Building an Executable

Generation. representation to the machine

Test I Solutions MASSACHUSETTS INSTITUTE OF TECHNOLOGY Fall Department of Electrical Engineering and Computer Science

CS 107. Lecture 13: Assembly Part III. Friday, November 10, Stack "bottom".. Earlier Frames. Frame for calling function P. Increasing address

Sungkyunkwan University

x86-64 Programming III & The Stack

Lecture 3 CIS 341: COMPILERS

Recitation: Attack Lab

CS 261 Fall Mike Lam, Professor. x86-64 Control Flow

Assembly Language: Function Calls

Where We Are. Optimizations. Assembly code. generation. Lexical, Syntax, and Semantic Analysis IR Generation. Low-level IR code.

Controlling Program Flow

CSE P 501 Exam Sample Solution 12/1/11

Changelog. Corrections made in this version not in first posting:

Changes made in this version not seen in first lecture:

Assembly Language: Function Calls" Goals of this Lecture"

CSE 351: Week 4. Tom Bergan, TA

x86 64 Programming I CSE 351 Autumn 2018 Instructor: Justin Hsia

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2017 Lecture 12

Transcription:

C++ Named Return Value Optimization

post. Robert.Schneider@hotmail.de meetup.com/c-user-group-karlsruhe

auto create() -> T { T castor /* initialize */; //... return castor; } void use() { T pollux = create(); //... }

variable object resource

{ } Type variable /* initialize */; //... //...

{ } Type variable /* initialize */; //... return stuff;

{ } Type variable /* initialize */; //... throw stuff;

variable object clean-up scope left destroyed executed

class Type { //... }; ~Type() { // clean-up code }

{ } Type variable /* init */ ; Type other_var = variable; //...

variable object clean-up different independent independent

{ } Type variable /* init */ ; { Type other_var = variable; //... } //...

variable object resource

int MyInt array<int, 10> vector<int>

{ int castor = 0; int pollux = castor; assert(castor == pollux); castor = 1729; pollux = -1; } assert(castor!= pollux);

{ vector<int> castor(1000); vector<int> pollux = castor; assert(castor == pollux); castor.at(42) = 1729; pollux.at(42) = -1; } assert(castor!= pollux);

int MyInt array<int, 10> vector<int> special

vector<int> memory allocation object resource

{ vector<int> castor(1000); vector<int> pollux = castor; assert(castor == pollux); castor.at(42) = 1729; pollux.at(42) = -1; } assert(castor!= pollux);

variable object resource different independent independent

variable object resource initialize from copy copy clone variable object resource

auto create() -> vector<int> { vector<int> castor(1000); //... return castor; } void use() { vector<int> pollux = create(); //... }

variable object resource

int_pair p() { int_pair r = {0}; return r; } int_quadruple q() { int_quadruple r = {0}; return r; } movl ret movl movl ret $0, %eax $0, %eax $0, %edx

... sets register 0 register 1 register 2 reads callee (called function) calls caller

int_pair p() { int_pair r = {0}; return r; } int_quadruple q() { int_quadruple r = {0}; return r; } int_octuple o() { int_octuple r = {0}; return r; } movl ret movl movl ret movq movl movl movl movl movl movl movl movl ret $0, %eax $0, %eax $0, %edx %rdi, %rax $0, (%rdi) $0, 4(%rdi) $0, 8(%rdi) $0, 12(%rdi) $0, 16(%rdi) $0, 20(%rdi) $0, 24(%rdi) $0, 28(%rdi)

... reads address register 0 register 1 register 2 sets address callee (called function) calls caller writes to address space for return value

RAM register register 0 register 1 register 2

RAM register object @ address 2 size 4 register 0 register 1 register 2

RAM register object @ address 2 size 4 register 0 register 1 register 2

object object

trivially copyable

... reads address register 0 register 1 register 2 sets address callee (called function) calls caller writes to address space for return value

variable object resource indirect return for non-trivially-copyable

void use() { vector<int> pollux = create(); } //... auto create() -> vector<int> { vector<int> castor(1000); //... return castor; }

void use() { buffer< vector<int> > future_pollux; create(&future_pollux); vector<int>& pollux = future_pollux; //... } void create(buffer< vector<int> >* ret) { vector<int> castor(1000); //... new(ret) vector<int>(castor); }

void create( buffer< vector<int> >* ret) { } vector<int> castor(1000); new(ret) vector<int>(castor); pushq subq movq movl movq call movq movq call movq call movq addq popq ret %rbx $32, %rsp %rdi, %rbx $1000, %esi %rsp, %rdi vector<int>::vector(int) %rsp, %rsi %rbx, %rdi vector<int> ::vector(vector<int> const&) %rsp, %rdi vector<int>::~vector() %rbx, %rax $32, %rsp %rbx

void create( buffer< vector<int> >* ret) { new(ret) vector<int>(1000); pushq movq movl call %rbx %rdi, %rbx $1000, %esi vector<int>::vector(int) } movq popq ret %rbx, %rax %rbx

NRVO = indirect return - unneccesary operations

void use() { vector<int> pollux = create(); } //... auto create() -> vector<int> { vector<int> castor(1000); //... return castor; }

pollux castor object resource different same same

observable behaviour is guaranteed but not how it is achieved

int x = 0; for(int i = 0; i < 100; ++i) x += i; cout << x; mov edi, offset std::cout mov esi, 4950 call ostream::operator<<(int)

observable behaviour is guaranteed but not how it is achieved *except NRVO / copy elision

resource management is potentially observable

limits of NRVO local variable same type as return type

auto no_nrvo() -> vector<int> { array< vector<int>, 2 > ogre; } return ogre.at(0);

limits of NRVO local variable same type as return type complete object

auto nrvo(bool b) -> vector<int> { vector<int> thor(500); vector<int> loki(900); } if(b) return thor; else return loki;

limits of NRVO local variable same type as return type complete object compiler smartness

auto nrvo() -> vector<int> { vector<int> thor(500); vector<int> loki(900); } if(read_char() == 'a') return thor; else return loki;

limits of NRVO local variable same type as return type complete object compiler smartness observable behaviour

variable object resource variable object

moving is great! but... for some types it is still slow don t want to write it explicitly

auto nrvo() -> vector<int> { vector<int> thor(500); vector<int> loki(900); } if(read_char() == 'a') return thor; else return loki;

NRVO > move > copy

auto no_nrvo() -> vector<int> { vector<int> thor(500); vector<int> loki(900); } if(read_char() == 'a') return move(thor); else return move(loki);