TILE PROCESSOR APPLICATION BINARY INTERFACE

MULTICORE DEVELOPMENT ENVIRONMENT
TILE PROCESSOR APPLICATION BINARY INTERFACE
RELEASE 4.2.-MAIN.1534
DOC. NO. UG513
MARCH 3, 2013
TILERA CORPORATION

Copyright 2012 Tilera Corporation. All rights reserved. Printed in the United States of America.

The information contained in this document is the property of Tilera Corporation. It has been released under Non-Disclosure Agreement. Any unauthorized review, use, disclosure or distribution is strictly prohibited. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, except as may be expressly permitted by the applicable copyright statutes or in writing by the Publisher.

The following are registered trademarks of Tilera Corporation: Tilera and the Tilera logo. The following are trademarks of Tilera Corporation: Embedding Multicore, The Multicore Company, Tile Processor, TILE Architecture, TILE64, TILEPro, TILEPro36, TILEPro64, TILExpress, TILExpress-64, TILExpressPro-64, TILExpress-2G, TILExpressPro-2G, TILExpressPro-22G, imesh, TileDirect, TILEmpower, TILEmpower-Gx, TILEncore, TILEncorePro, TILEncore-Gx, TILE-Gx, TILE-Gx16, TILE-Gx36, TILE-Gx64, TILE-Gx1, TILE-Gx3, TILE-Gx5, TILE-Gx8, DDC (Dynamic Distributed Cache), Multicore Development Environment, Gentle Slope Programming, ilib, TMC (Tilera Multicore Components), hardwall, Zero Overhead Linux (ZOL), MiCA (Multicore imesh Coprocessing Accelerator), and mpipe (multicore Programmable Intelligent Packet Engine). All other trademarks and/or registered trademarks are the property of their respective owners.

Third-party software: The Tilera IDE makes use of the BeanShell scripting library. Source code for the BeanShell library can be found at the BeanShell website (http://www.beanshell.org/developer.html). The following is a trademark of Marvell Semiconductor, Inc.: Distributed Switching Architecture (DSA).

This document contains advance information on Tilera products that are in development, sampling or initial production phases. This information and specifications contained herein are subject to change without notice at the discretion of Tilera Corporation. No license, express or implied by estoppels or otherwise, to any intellectual property is granted by this document. Tilera disclaims any express or implied warranty relating to the sale and/or use of Tilera products, including liability or warranties relating to fitness for a particular purpose, merchantability or infringement of any patent, copyright or other intellectual property right. Products described in this document are NOT intended for use in medical, life support, or other hazardous uses where malfunction could result in death or bodily injury. THE INFORMATION CONTAINED IN THIS DOCUMENT IS PROVIDED ON AN AS IS BASIS. Tilera assumes no liability for damages arising directly or indirectly from any use of the information contained in this document.

Sun Mar 3 18:16:25 EST 2013
Tilera Corporation
Information: info@tilera.com
Website: http://www.tilera.com

Contents

1 Introduction
2 Machine Interface
  2.1 Instruction Set Architecture
  2.2 Data Representation
    2.2.1 Byte Ordering
    2.2.2 Scalar Types
    2.2.3 Aggregates and Unions
  2.3 Function Call Specification
    2.3.1 Register Usage
    2.3.2 Stack
    2.3.3 Argument Passing
    2.3.4 Variadic Functions
    2.3.5 Return Values
  2.4 Binary Image Format
3 System Interface
  3.1 System Calls

Chapter 1 Introduction

The TILE-Gx Processor provides a programmer with a rectangular grid of tiled processors. Each processor supports a 48-bit virtual address space, mapped onto 64-bit physical addresses.

This document describes the Application Binary Interface, or ABI, for programs running on Tile Processors. The ABI specifies how an application, stored in binary form, will be executed on the machine. It specifies the function calling convention, how an application is stored on disk, how data structures are represented in memory, and how programs interface with the operating system.

Chapter 2 Machine Interface

This chapter describes the binary formats and calling conventions used with the TILE-Gx Processor ABI. This information is particularly important to compiler writers and assembly programmers, as it specifies how structures, arrays, and unions are formed and how functions are called.

2.1 Instruction Set Architecture

ABI-compliant programs use the TILE-Gx Processor ISA, as specified in the TILE-Gx Instruction Set Architecture Specification.

2.2 Data Representation

The Tile Processor Architecture is little-endian with strict alignment requirements. This section describes how data structures should be arranged in memory to meet these requirements in a standardized fashion. Arranging data as described below will also guarantee interoperation between binaries compiled in different environments.

2.2.1 Byte Ordering

The Tile Processor Architecture is little-endian; the least significant byte in a multi-byte data item is stored at the lowest address. Figure 2.1 illustrates how data bytes are ordered in data types of different sizes.

2.2.2 Scalar Types

Table 2.1 defines the mapping from ANSI C types to Tile Processor data types. Long integers occupy eight bytes, and floating-point types must be stored in standard IEEE format.

Figure 2.1: Halfword, word, and doubleword byte ordering. Numbers indicate the byte offset in memory. [Diagram: in each type the least significant byte (lsb) is at offset 0; the most significant byte (msb) is at offset 1 in a halfword, 3 in a word, and 7 in a doubleword.]

C type       sizeof   byte alignment   machine type
char         1        1                byte
short        2        2                halfword
int          4        4                word
enum
float        4        4                word
pointer      8        8                doubleword
long int     8        8                doubleword
double       8        8                doubleword
long long    8        8                doubleword

Table 2.1: Mapping from C types to machine data types. A machine type must be aligned to an address which is a multiple of the type's size.
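As an illustration of the byte ordering described in Section 2.2.1, the following C sketch writes and reads a doubleword one byte at a time in little-endian order, so it behaves the same on any host. The helper names store_le64 and load_le64 are hypothetical and are not part of the ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 64-bit value so that byte offset 0 holds the least significant
   byte and byte offset 7 holds the most significant byte. */
void store_le64(uint8_t *dst, uint64_t value)
{
    assert(dst != NULL);
    for (size_t i = 0; i < 8; i++) {
        dst[i] = (uint8_t)(value >> (8 * i));
    }
}

/* Reassemble a 64-bit value from eight bytes stored in little-endian order. */
uint64_t load_le64(const uint8_t *src)
{
    uint64_t value = 0;

    assert(src != NULL);
    for (size_t i = 0; i < 8; i++) {
        value |= (uint64_t)src[i] << (8 * i);
    }
    return value;
}
```

Storing 0x0102030405060708 with store_le64 places 0x08 at offset 0 and 0x01 at offset 7, which is the doubleword ordering that Figure 2.1 describes.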

2.2.3 Aggregates and Unions

As seen in Table 2.1, the Tile Processor Architecture requires that all primitive data types be aligned to addresses that are integral multiples of their size. Consequently, aggregate types (structures and arrays) must be carefully arranged in order to meet the alignment requirements. An aggregate or union must be aligned in the same way as its largest component primitive type (including components in nested structures), and it must be padded to a size that is a multiple of the alignment. Figures 2.2 through 2.4 illustrate the alignment requirements.

    struct {
        char   c;
        short  s;
        long   l;
        double d;
    };

Figure 2.2: Each primitive type is aligned to a multiple of its size. Internal padding is added to assure alignment. [Layout diagram: c, padding, s, then l, then d.]

2.2.3.1 Bit Fields

When arranging bit fields within a structure, the compiler should satisfy the following rules.

First, each bit field must fit within a region with the size and alignment of its declared storage type. Thus, a bit field declared with a short storage type must fit entirely within an aligned halfword region.

Second, the bits should be located at the lowest possible bit address such that they come after any previously declared fields and satisfy the first rule. Thus, storage regions may overlap, but the bit locations must not interfere with any previously declared data.

Third, aggregates and unions must still be aligned and padded according to the size of their largest member.

Figures 2.5 through 2.7 illustrate the use of these rules.
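The first two bit-field rules can be sketched as a small placement function. This is an illustrative reading of the rules, not code taken from any Tilera tool; the name place_bitfield and its parameters are hypothetical.

```c
#include <assert.h>

/* Return the bit offset at which a bit field of `width` bits is placed.
   `next_free_bit` is the first bit after all previously declared fields;
   `storage_bits` is the size in bits of the declared storage type
   (8 for char, 16 for short, 32 for int). */
unsigned place_bitfield(unsigned next_free_bit, unsigned width,
                        unsigned storage_bits)
{
    assert(storage_bits != 0 && width != 0 && width <= storage_bits);

    unsigned first_region = next_free_bit / storage_bits;
    unsigned last_region = (next_free_bit + width - 1) / storage_bits;

    /* Rule 1: the field must lie inside one aligned storage region.
       If it would straddle a boundary, start at the next aligned region. */
    if (first_region != last_region) {
        return (first_region + 1) * storage_bits;
    }

    /* Rule 2: otherwise take the lowest bit address after earlier fields. */
    return next_free_bit;
}
```

With 8-bit storage, a 3-bit field placed at bit 0 is followed by a 4-bit field at bit 3, so both share one byte. With 16-bit storage, a hypothetical 7-bit field that follows a 10-bit field cannot fit in the same halfword and is placed at bit 16.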

    struct {
        double d;
        char   c;
    };

Figure 2.3: Structures must be padded at the end so that their size is a multiple of the largest alignment requirement. [Layout diagram: d, then c followed by padding.]

    struct {
        char s1;
        char s2;
    };

Figure 2.4: This structure has a size of two bytes, but it need only be aligned to one byte because its largest primitive has an alignment requirement of one byte. Thus, the structure could be allocated at even or odd byte addresses. [Layout diagram: s1, then s2.]

    struct {
        char b:3;
        char c:4;
    };

Figure 2.5: Multiple bit fields may be compressed into a single storage region. [Layout diagram: b in bits 2 to 0, c in bits 6 to 3.]

    struct {
        short b:10;
        short c:7;
    };

Figure 2.6: If a bit field cannot fit in overlapped storage without overwriting previously allocated bits, it must be allocated at the next aligned location. This structure must be stored with halfword alignment. [Layout diagram: b in bits 9 to 0, c in bits 22 to 16.]

    struct {
        int    a:2;
        double b;
        int    c:2;
    };

Figure 2.7: Fields must be properly aligned, and structures must be post-padded to a size that is a multiple of their alignment. [Layout diagram: a, padding, then b, then c, padding.]
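The aggregate rules of Section 2.2.3 can be sketched as a short layout calculation. The sketch below handles only structures whose members are primitive types and relies on the Table 2.1 property that a primitive's alignment equals its size; the function names align_up and layout_struct are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round `offset` up to the next multiple of `align`. */
size_t align_up(size_t offset, size_t align)
{
    assert(align != 0);
    return (offset + align - 1) / align * align;
}

/* Compute member offsets and the padded size of a structure of primitives.
   sizes[i] is the size in bytes of member i (also its alignment).
   Offsets are written to offsets[0..count-1]; the return value is the
   structure size including tail padding. */
size_t layout_struct(const size_t *sizes, size_t count, size_t *offsets)
{
    size_t offset = 0;
    size_t max_align = 1;

    for (size_t i = 0; i < count; i++) {
        offset = align_up(offset, sizes[i]);    /* internal padding */
        offsets[i] = offset;
        offset += sizes[i];
        if (sizes[i] > max_align) {
            max_align = sizes[i];
        }
    }
    return align_up(offset, max_align);         /* tail padding */
}
```

Using the Table 2.1 sizes, a structure declared as char, short, long, double gets offsets 0, 2, 8, and 16 and a total size of 24 bytes, while a structure declared as double, char is padded from 9 to 16 bytes.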

2.3 Function Call Specification

This section defines the conventions to be used when a program makes function calls on the Tile Processor Architecture. These conventions are designed to support C-style function calls while enabling backtracing for debugging purposes.

2.3.1 Register Usage

Table 2.2 defines the register conventions used by ABI-compliant programs. Registers may be designated as dedicated, caller-saved, callee-saved, or network. Caller-saved registers may be used for any purpose and may change value across a function call; the caller must therefore save them if it wants to preserve their values. Callee-saved registers must have the same value when a function returns as when it was entered. Dedicated registers are reserved for a particular data item, such as a stack pointer. ABI-compliant programs should use dedicated registers as specified; even temporary usage of a dedicated register for another purpose may lead to exposed, inconsistent state if a trap occurs. Dedicated registers are always callee-saved. Network registers correspond to hardware FIFOs and are not relevant to the calling convention.

Register   Assembler name   Type           Purpose
0 - 9      r0 - r9          Caller-saved   Parameter passing / return values
10 - 29    r10 - r29        Caller-saved
30 - 51    r30 - r51        Callee-saved
52         r52              Callee-saved   Optional frame pointer
53         tp               Dedicated      Thread-local data
54         sp               Dedicated      Stack pointer
55         lr               Caller-saved   Return address
56         sn                              Always zero
57         idn0             Network        IO dynamic network 0
58         idn1             Network        IO dynamic network 1
59         udn0             Network        User dynamic network 0
60         udn1             Network        User dynamic network 1
61         udn2             Network        User dynamic network 2
62         udn3             Network        User dynamic network 3
63         zero                            Always zero

Table 2.2: Register assignments for the Tile Processor ABI

2.3.2 Stack

Unlike some other architectures, the TILE-Gx Processor does not include dedicated instructions for stack manipulation. The stack is managed entirely by the software program, which stores a stack pointer in sp.
The ABI requires that a program's stack grow downward. Thus, the stack pointer starts at a high address and decreases as stack frames are pushed on by decrementing the stack pointer. The stack pointer is always aligned to a doubleword (64-bit) boundary.

Region           Purpose                                    Size
Locals           Local variables and register spill slots   Variable
Dynamic space    Dynamic stack space (e.g. alloca())        Variable
Argument space   Callee arguments beyond first 10 words     Variable
Frame pointer    Caller space to spill incoming sp          One word
Callee lr        Callee space to spill incoming lr          One word

Table 2.3: The regions in a stack frame, with locals at the highest address and lr spill space at the lowest. On entry to a function the stack pointer points to the callee lr spill location set up by the caller.

A stack frame is divided into several regions:

Locals: The function may allocate this frame space for any required local variables, temporaries, register spill targets, etc.

Dynamic space: This region contains memory whose size cannot be known statically, e.g. alloca() memory, variable-length arrays, etc. As memory is dynamically allocated, this region grows and the following regions are effectively slid over to make room for it. If dynamic memory is allocated, the offset from sp to the Locals region is no longer known, so the function copies its initial sp value to r52 and uses that to access the Locals.

Argument space: When a subroutine is called, if it requires more arguments than fit in ten registers, then the calling routine must pass those excess arguments by storing them here. Arguments are stored with the first argument at the lowest address, according to the argument passing convention described below. This region holds the maximum argument space needed by any call in the owning function, so the total stack frame size can be determined at compile time.

Frame pointer: To assist backtracing, a function must store its own frame pointer here before calling any subroutines. The frame pointer is the value of sp on entry to the caller. Callees are not allowed to modify this memory, so the caller can store its frame pointer here once and then make many calls. Leaf functions, by definition, do not need to store anything here.

Callee lr: Non-leaf functions, as well as those that modify lr by using it as a general register, must store their incoming lr value here. The backtracer expects to see the instruction sw sp, lr perform the store. Functions that do not modify lr (including through a jal) can ignore this memory location.

2.3.2.1 Allocating a stack frame

There are two different ways to allocate a stack frame. Functions with fixed-size frames less than 32K in size should use a single addi sp, sp, -N or addli sp, sp, -N instruction to allocate the frame, and a single addi sp, sp, N or addli sp, sp, N to deallocate it. The backtracer looks for these specific instructions to determine if a PC is in the prolog or epilog.
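As a rough sketch of how a compiler might pick the frame size N for a fixed-size frame, the C code below adds up the regions of Table 2.3 and checks the 32K limit. It assumes that the frame pointer and callee lr slots are 8 bytes each, which is inferred from stack arguments starting at sp + 16 rather than stated outright, and that the limit is 32 * 1024 bytes. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: each linkage slot (frame pointer, callee lr) is one 8-byte
   doubleword, so the two slots together occupy 16 bytes. */
enum {
    STACK_ALIGN = 8,
    LINKAGE_BYTES = 16,
    SINGLE_ADD_LIMIT = 32 * 1024
};

/* Round a byte count up to the doubleword alignment required for sp. */
size_t round_up_to_stack_align(size_t bytes)
{
    return (bytes + STACK_ALIGN - 1) / STACK_ALIGN * STACK_ALIGN;
}

/* Size of a frame with no dynamic space: locals, the largest outgoing
   argument area needed by any call, and the two linkage slots. */
size_t fixed_frame_size(size_t locals_bytes, size_t outgoing_arg_bytes)
{
    return round_up_to_stack_align(locals_bytes)
         + round_up_to_stack_align(outgoing_arg_bytes)
         + LINKAGE_BYTES;
}

/* True if the frame is smaller than 32K and can therefore be allocated
   with a single addi or addli on sp. */
bool can_use_single_add(size_t frame_bytes)
{
    assert(frame_bytes % STACK_ALIGN == 0);
    return frame_bytes < SINGLE_ADD_LIMIT;
}
```

For example, a function with 20 bytes of locals and no stack arguments for its callees would get a 40-byte frame under these assumptions, well within the single-instruction limit.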

If the single-instruction technique cannot be used (i.e., the frame size is not statically known, or is too large for a single addli), then the function must set up an explicit frame pointer with move r52, sp in the prolog, and must decrement the stack pointer only with an instruction other than addi or addli, such as sub.

2.3.3 Argument Passing

The first ten words of arguments are passed in r0 through r9. Any arguments beyond that are passed in the argument space region of the caller's stack frame, meaning the receiving function will find them at sp + 16 on entry.

No parameter is passed partially in registers and partially on the stack. If a struct parameter will not fit in the remaining registers, it is passed entirely on the stack, and those remaining registers go unused.

If a function returns a struct too large to fit in the return registers, the caller passes a pointer to that struct in r0, appropriately sliding over the other parameters to make room. The sliding-over process is performed just as if the function took an extra pointer value as its first parameter, so all of the alignment and other constraints listed above are properly maintained.

As this is a little-endian processor, the least-significant half of a doubleword value is always passed in the lowest-numbered register or stack address.

2.3.4 Variadic Functions

Arguments to variadic functions (those taking "...") should be passed using the standard calling convention. However, arguments passed in "..." should be fully promoted as per the usual C promotion rules. Specifically, the caller must convert byte and halfword integers to word-size integers and convert single-precision float values to double.

2.3.5 Return Values

A function returns a value in r0 through r9, just as if it were passing that value as the first argument to a function. If the returned value cannot fit in these registers, it is returned indirectly through a special pointer passed to that function in r0, as described earlier.
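The register assignment described in Section 2.3.3 can be sketched as follows. The sketch measures each argument in register-sized words and makes one assumption that the text does not spell out: once an argument has been placed on the stack, every later argument goes on the stack as well. The names assign_arguments, NUM_ARG_REGS, and ON_STACK are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_ARG_REGS 10u   /* r0 through r9 */
#define ON_STACK (-1)

/* For each argument, given its size in register-sized words, record the
   first register it occupies (0 for r0 up to 9 for r9) or ON_STACK.
   Returns the number of words placed in the caller's argument space. */
size_t assign_arguments(const size_t *arg_words, size_t count, int *first_reg)
{
    size_t next_reg = 0;
    size_t stack_words = 0;

    for (size_t i = 0; i < count; i++) {
        size_t remaining = NUM_ARG_REGS - next_reg;

        assert(arg_words[i] > 0);
        if (arg_words[i] <= remaining) {
            first_reg[i] = (int)next_reg;
            next_reg += arg_words[i];
        } else {
            /* An argument is never split between registers and stack. */
            first_reg[i] = ON_STACK;
            stack_words += arg_words[i];
            /* Assumption: the leftover registers stay unused for all
               later arguments too. */
            next_reg = NUM_ARG_REGS;
        }
    }
    return stack_words;
}
```

With eight one-word arguments followed by a three-word struct and one more one-word argument, the first eight land in r0 through r7, the struct goes entirely on the stack, and under the stated assumption the final argument follows it there, for four stack words in total.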
2.4 Binary Image Format

Tile Processor binaries are distributed and loaded in the standard ELF64 format.
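A loader or tool can recognize such a binary from the identification bytes at the start of the file. The sketch below checks the ELF magic number, the 64-bit class, and little-endian data encoding using the standard ELF e_ident layout; the function name and the enum names are hypothetical, and the check deliberately ignores the machine type and all other header fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Positions and values within the ELF identification bytes (e_ident). */
enum {
    EI_CLASS_INDEX = 4,   /* file class byte */
    EI_DATA_INDEX = 5,    /* data encoding byte */
    ELF_CLASS_64 = 2,     /* ELFCLASS64 */
    ELF_DATA_LSB = 1,     /* ELFDATA2LSB, little-endian */
    ELF_IDENT_MIN = 6     /* bytes needed for the checks below */
};

/* Return true if `ident` begins with a little-endian ELF64 identification. */
bool is_little_endian_elf64(const unsigned char *ident, size_t len)
{
    assert(ident != NULL);
    if (len < ELF_IDENT_MIN) {
        return false;
    }
    return ident[0] == 0x7f && ident[1] == 'E'
        && ident[2] == 'L' && ident[3] == 'F'
        && ident[EI_CLASS_INDEX] == ELF_CLASS_64
        && ident[EI_DATA_INDEX] == ELF_DATA_LSB;
}
```

A caller would typically read the first bytes of the file into a buffer and pass the buffer and the number of bytes actually read.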

Chapter 3 System Interface

3.1 System Calls

System calls are invoked by setting up the argument registers as usual for a subroutine call, storing the syscall number in r10, and then executing a swint1 instruction.