Portability. Professor Jennifer Rexford

Similar documents
Portability. Goals of this Lecture. The Real World is Heterogeneous

Continued from previous lecture

Byte Ordering. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

CS429: Computer Organization and Architecture

Pointers (1A) Young Won Lim 2/10/18

Byte Ordering. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Pointers (1A) Young Won Lim 2/6/18

We would like to find the value of the right-hand side of this equation. We know that

Goals of C "" The Goals of C (cont.) "" Goals of this Lecture"" The Design of C: A Rational Reconstruction"

CS 61C: Great Ideas in Computer Architecture Intro to Assembly Language, MIPS Intro

CS240: Programming in C. Lecture 2: Overview

Representation of Information

Computer Organization & Systems Exam I Example Questions

Pointers (1A) Young Won Lim 1/22/18

Why Don t Computers Use Base 10? Lecture 2 Bits and Bytes. Binary Representations. Byte-Oriented Memory Organization. Base 10 Number Representation

Computer Science 324 Computer Architecture Mount Holyoke College Fall Topic Notes: Bits and Bytes and Numbers

The Design of C: A Rational Reconstruction (cont.)" Jennifer Rexford!

Distributed Systems 8. Remote Procedure Calls

CS C Primer. Tyler Szepesi. January 16, 2013

Bits, Bytes, and Integers Part 2

Contents of Lecture 3

The Design of C: A Rational Reconstruction: Part 2

Bits and Bytes. Why bits? Representing information as bits Binary/Hexadecimal Byte representations» numbers» characters and strings» Instructions

Section Notes - Week 1 (9/17)

CSE 12 Spring 2016 Week One, Lecture Two

Reverse Engineering II: Basics. Gergely Erdélyi Senior Antivirus Researcher

Why Don t Computers Use Base 10? Lecture 2 Bits and Bytes. Binary Representations. Byte-Oriented Memory Organization. Base 10 Number Representation

Lecture 4: Instruction Set Architecture

CS 326 Operating Systems C Programming. Greg Benson Department of Computer Science University of San Francisco

EC 413 Computer Organization

Work relative to other classes

The Design of C: A Rational Reconstruction (cont.)

Page 1. Where Have We Been? Chapter 2 Representing and Manipulating Information. Why Don t Computers Use Base 10?

Topic Notes: Bits and Bytes and Numbers

Chapter 7. Basic Types

Operating Systems. 18. Remote Procedure Calls. Paul Krzyzanowski. Rutgers University. Spring /20/ Paul Krzyzanowski

Princeton University Computer Science 217: Introduction to Programming Systems The Design of C

Reverse Engineering II: The Basics

Distributed Systems. 03. Remote Procedure Calls. Paul Krzyzanowski. Rutgers University. Fall 2017

Intermediate Programming, Spring 2017*

Hardware: Logical View

Bits and Bytes January 13, 2005

CS 261 Fall Mike Lam, Professor. Structs and I/O

CSE 12 Spring 2018 Week One, Lecture Two

Topics of this Slideset. CS429: Computer Organization and Architecture. It s Bits All the Way Down. Why Binary? Why Not Decimal?

CS429: Computer Organization and Architecture

Type (1A) Young Won Lim 2/17/18

CS 417 9/18/17. Paul Krzyzanowski 1. Socket-based communication. Distributed Systems 03. Remote Procedure Calls. Sample SMTP Interaction

LESSON 4. The DATA TYPE char

SWEN-250 Personal SE. Introduction to C

Data Representation and Storage. Some definitions (in C)

17. Instruction Sets: Characteristics and Functions

Lecture 02 C FUNDAMENTALS

Low-Level Programming

CSC C69: OPERATING SYSTEMS

9/3/2015. Data Representation II. 2.4 Signed Integer Representation. 2.4 Signed Integer Representation

Features of C. Portable Procedural / Modular Structured Language Statically typed Middle level language

The type of all data used in a C (or C++) program must be specified

Lecture 8: Structs & File I/O

Bits, Bytes and Integers

Under the Hood: Data Representations, Memory and Bit Operations. Computer Science 104 Lecture 3

Representing numbers on the computer. Computer memory/processors consist of items that exist in one of two possible states (binary states).

Topic Notes: Bits and Bytes and Numbers

Static Analysis I PAOLO PALUMBO, F-SECURE CORPORATION

Virtual Machines and Dynamic Translation: Implementing ISAs in Software

CS107, Lecture 3 Bits and Bytes; Bitwise Operators

Other array problems. Integer overflow. Outline. Integer overflow example. Signed and unsigned

Lecture 5: Outline. I. Multi- dimensional arrays II. Multi- level arrays III. Structures IV. Data alignment V. Linked Lists

Data Storage. August 9, Indiana University. Geoffrey Brown, Bryce Himebaugh 2015 August 9, / 19

C Compilation Model. Comp-206 : Introduction to Software Systems Lecture 9. Alexandre Denault Computer Science McGill University Fall 2006

CSCI S-Q Lecture #12 7/29/98 Data Structures and I/O

For Your Amusement. Beware of bugs in the above code; I have only proved it correct, not tried it. Donald Knuth. Program Verification

CS 61C: Great Ideas in Computer Architecture Intro to Assembly Language, MIPS Intro

Pointers (1A) Young Won Lim 1/14/18

More in-class activities / questions

Arithmetic Operators. Portability: Printing Numbers

CSCI-243 Exam 2 Review February 22, 2015 Presented by the RIT Computer Science Community

Files and Streams Opening and Closing a File Reading/Writing Text Reading/Writing Raw Data Random Access Files. C File Processing CS 2060

COMP3221: Microprocessors and. and Embedded Systems. Instruction Set Architecture (ISA) What makes an ISA? #1: Memory Models. What makes an ISA?

scanf erroneous input Computer Programming: Skills & Concepts (CP) Characters Last lecture scanf error-checking our input This lecture

Practical Malware Analysis

When an instruction is initially read from memory it goes to the Instruction register.

EE 209: Programming Structures for Electrical Engineering

ICS Instructor: Aleksandar Kuzmanovic TA: Ionut Trestian Recitation 2

Computer Architecture and System Software Lecture 02: Overview of Computer Systems & Start of Chapter 2

COMP2121: Microprocessors and Interfacing. Number Systems

Harry H. Porter, 2006

Testing! The material for this lecture is drawn, in part, from! The Practice of Programming (Kernighan & Pike) Chapter 6!

Memory, Arrays & Pointers

Reverse Engineering II: The Basics

IO = Input & Output 2

Topics Power tends to corrupt; absolute power corrupts absolutely. Computer Organization CS Data Representation

Memory Organization and Addressing

Roadmap. Java: Assembly language: OS: Machine code: Computer system:

CS 265. Computer Architecture. Wei Lu, Ph.D., P.Eng.

ELEC 377 C Programming Tutorial. ELEC Operating Systems

So far, system calls have had easy syntax. Integer, character string, and structure arguments.

CSE351 Spring 2018, Final Exam June 6, 2018

M1 Computers and Data

C Programming. Course Outline. C Programming. Code: MBD101. Duration: 10 Hours. Prerequisites:

Transcription:

Portability Professor Jennifer Rexford http://www.cs.princeton.edu/~jrex The material for this lecture is drawn, in part, from The Practice of Programming (Kernighan & Pike) Chapter 8 1 Goals of this Lecture Learn to write code that works with multiple: Hardware platforms Operating systems Compilers Human cultures Why? Moving existing code to a new context is easier/cheaper than writing new code for the new context Code that is portable is (by definition) easier to move; portability reduces software costs Relative to other high-level languages (e.g., Java), C is notoriously non-portable 2 1

The Real World is Heterogeneous Multiple kinds of hardware 32-bit Intel Architecture 64-bit IA, PowerPC, Sparc, MIPS, Arms, Multiple operating systems Linux Windows, Mac, Sun, AIX, Multiple character sets ASCII Latin-1, Unicode, Multiple human alphabets and languages 3 Portability Goal: Run program on any system No modifications to source code required Program continues to perform correctly Ideally, the program performs well too 4 2

C is Notoriously Non-Portable Recall C design goals Create Unix operating system and associated software Reasonably high level, but Close to the hardware for efficiency So C90 is underspecified Compiler designer has freedom to reflect the design of the underlying hardware But hardware systems differ! So C compilers differ Extra care is required to write portable C code 5 General Heuristics Some general portability heuristics 6 3

Intersection (1) Program to the intersection Use only features that are common to all target environments I.e., program to the intersection of features, not the union When that s not possible 7 Encapsulation (2) Encapsulate Localize and encapsulate features that are not in the intersection Use parallel source code files -- so non-intersection code can be chosen at link-time Use parallel data files so non-intersection data (e.g. textual messages) can be chosen at run-time When that s not possible, as a last resort 8 4

Conditional Compilation (3) Use conditional compilation #ifdef UNIX /* Unix-specific code */ #endif #ifdef WINDOWS /* MS Windows-specific code */ #endif And above all 9 Test!!! (4) Test the program with multiple: Hardware (Intel, MIPS, SPARC, ) Operating systems (Linux, Solaris, MS Windows, ) Compilers (GNU, MS Visual Studio, ) Cultures (United States, Europe, Asia, ) 10 5

Hardware Differences Some hardware differences, and corresponding portability heuristics 11 Natural Word Size Obstacle: Natural word size In some systems, natural word size is 4 bytes In some (esp. older) systems, natural word size is 2 bytes In some (esp. newer) systems, natural word size is 8 bytes C90 intentionally does not specify sizeof(int); depends upon natural word size of underlying hardware 12 6

Natural Word Size (cont.) (5) Don t assume data type sizes int *p; p = malloc(4); Portable: int *p; p = malloc(sizeof(int)); 13 Right Shift Obstacle: Right shift operation In some systems, right shift operation is logical Right shift of a negative signed int fills with zeroes In some systems, right shift operation is arithmetic Right shift of a negative signed int fills with ones C90 intentionally does not specify semantics of right shift; depends upon right shift operator of underlying hardware 14 7

Right Shift (cont.) (6) Don t right-shift signed ints -3 >> 1 Logical shift => 2147483646 Arithmetic shift => -2 Portable: /* Don't do that!!! */ 15 Byte Order Obstacle: Byte order Some systems (e.g. Intel) use little endian byte order Least significant byte of a multi-byte entity is stored at lowest memory address The int 5 at address 1000: 1000 00000101 1001 00000000 1002 00000000 1003 00000000 Some systems (e.g. SPARC) use big endian byte order Most significant byte of a multi-byte entity is stored The int 5 at address 1000: at lowest memory address 1000 1001 1002 1003 00000000 00000000 00000000 00000101 16 8

Byte Order (cont.) (7) Don t rely on byte order in code int i = 5; char c; c = *(char*)&i; /* Silly, but legal */ Portable: int i = 5; char c; /* Don't do that! Or... */ c = (char)i; Little endian: c = 5 Big endian: c = 0; 17 Byte Order (cont.) (8) Use text for data exchange Run on a little endian computer fwrite()writes raw data to a file unsigned short s = 5; FILE *f = fopen("myfile", "w"); fwrite(&s, sizeof(unsigned short), 1, f); myfile 00000101 00000000 Run on a big endian computer: Reads 1280!!! fread() reads raw data from a file unsigned short s; FILE *f = fopen("myfile", "r"); fread(&s, sizeof(unsigned short), 1, f); 18 9

Byte Order (cont.) Portable: Run on a big or little endian computer unsigned short s = 5; FILE *f = fopen("myfile", "w"); fprintf(f, "%hu", s); fprintf() converts raw data to ASCII text myfile 00110101 ASCII code for 5 Run on a big or little endian computer: Reads 5 unsigned short s; FILE *f = fopen("myfile", "r"); fscanf(f, "%hu", &s); fscanf()reads ASCII text and converts to raw data 19 Byte Order (cont.) If you must exchange raw data (9) Write and read one byte at a time Run on a big or little endian computer unsigned short s = 5; FILE *f = fopen("myfile", "w"); fputc(s >> 8, f); /* high-order byte */ fputc(s & 0xFF, f); /* low-order byte */ Decide on big-endian data exchange format myfile 00000000 00000101 Run on a big or little endian computer: Reads 5 unsigned short s; FILE *f = fopen("myfile", "r"); s = fgetc(f) << 8; /* high-order byte */ s = fgetc(f) & 0xFF; /* low-order byte */ 20 10

OS Differences Some operating system differences, and corresponding portability heuristics 21 End-of-Line Characters Obstacle: Representation of end-of-line Unix (including Mac OS/X) represents end-of-line as 1 byte: 00001010 (binary) Mac OS/9 represents end-of-line as 1 byte: 00001101 (binary) MS Windows represents end-of-line as 2 bytes: 00001101 00001010 (binary) 22 11

End-of-Line Characters (cont.) (10) Use binary mode for textual data exchange FILE *f = fopen("myfile", "w"); fputc('\n', f); Open the file in ordinary text mode Run on Unix Run on Mac OS/9 Run on MS Windows 00001010 00001101 00001101 00001010 \n \r \r \n Trouble if read via fgetc() on wrong operating system 23 End-of-Line Characters (cont.) Portable: FILE *f = fopen("myfile", "wb"); fputc('\n', f); Open the file in binary mode 00001010 \n Run on Unix, Mac OS/9, or MS Windows No problem if read via fgetc() in binary mode on wrong operating system I.e., there is no wrong operating system! 24 12

Data Alignment Obstacle: Data alignment Some hardware requires data to be aligned on particular boundaries Some operating systems impose additional constraints: OS char short int double Linux 1 2 4 4 MS Windows 1 2 4 8 Start address must be evenly divisible by: Moreover If a structure must begin on an x-byte boundary, then it also must end on an x-byte boundary Implication: Some structures must contain padding 25 Data Alignment (cont.) (11) Don t rely on data alignment Allocates 12 bytes; too few bytes on MS Windows struct S { int i; double d; } struct S *p; p = (struct S*)malloc(sizeof(int)+sizeof(double)); Windows: i pad Linux: i d d 26 13

Data Alignment (cont.) Portable: Allocates struct S { 12 bytes on Linux int i; 16 bytes on MS Windows double d; } struct S *p; p = (struct S*)malloc(sizeof(struct S)); Windows: i pad Linux: i d d 27 Character Codes Obstacle: Character codes Some operating systems (e.g. IBM OS/390) use the EBCDIC character code Some systems (e.g. Unix, MS Windows) use the ASCII character code 28 14

Character Codes (cont.) (12) Don t assume ASCII if ((c >= 65) && (c <= 90)) A little better: if ((c >= 'A') && (c <= 'Z')) Portable: #include <ctype.h> if (isupper(c)) Assumes ASCII Assumes that uppercase char codes are contiguous; not true in EBCDIC For ASCII: (c >= 'A') && (c <= 'Z') For EBCDIC: ((c >= 'A') && (c <= 'I')) ((c >= 'J') && (c <= 'R')) ((c >= 'S') && (c <= 'Z')) 29 Compiler Differences Compilers may differ because they: Implement underspecified features of the C90 standard in different ways, or Extend the C90 standard Some compiler differences, and corresponding portability heuristics 30 15

Compiler Extensions Obstacle: Non-standard extensions Some compilers offer non-standard extensions 31 Compiler Extensions (13) Stick to the standard language For now, stick to C90 (not C99) for (int i = 0; i < 10; i++) Portable: int i; for (i = 0; i < 10; i++) Many systems allow definition of loop control variable within for statement, but a C90 compiler reports error 32 16

Evaluation Order Obstacle: Evaluation order C90 specifies that side effects and function calls must be completed at ; But multiple side effects within the same expression can have unpredictable results 33 Evaluation Order (cont.) (14) Don t assume order of evaluation i is incremented before indexing names; but has i strings[i] = names[++i]; been incremented before indexing strings? C90 doesn t say Portable (either of these, as intended): i++; strings[i] = names[i]; strings[i] = names[i+1]; i++; 34 17

Evaluation Order (cont.) Which call of getchar() is executed first? C90 doesn t say printf("%c %c\n", getchar(), getchar()); Portable (either of these, as intended): i = getchar(); j = getchar(); printf("%c %c\n", i, j); i = getchar(); j = getchar(); printf("%c %c\n", j, i); 35 Char Signedness Obstacle: Char signedness C90 does not specify signedness of char On some systems, char means signed char On other systems, char means unsigned char 36 18

Char Signedness (cont.) (15) Don t assume signedness of char If necessary, specify signed char or unsigned char int a[256]; char c; c = (char)255; a[c] If char is unsigned, then a[c] is a[255] => fine If char is signed, then a[c] is a[-1] => out of bounds Portable: int a[256]; unsigned char c; c = 255; a[c] 37 Char Signedness (cont.) int i; char s[max+1]; for (i = 0; i < MAX; i++) if ((s[i] = getchar()) == '\n') (s[i] == EOF)) break; s[i] = \0 ; Portable: int c, i; char s[max+1]; for (i = 0; i < MAX; i++) { if ((c = getchar()) == '\n') (c == EOF)) break; s[i] = c; } s[i] = \0 ; If char is unsigned, then this always is FALSE 38 19

Library Differences Some library differences, and corresponding portability heuristics 39 Library Extensions Obstacle: Non-standard functions Standard libraries bundled with some development environments (e.g. GNU, MS Visual Studio) offer nonstandard functions 40 20

Library Extensions (16) Stick to the standard library functions For now, stick to the C90 standard library functions char *s = "hello"; char *copy; copy = strdup(s); Portable: strdup() is available in many standard libraries, but is not defined in C90 char *s = "hello"; char *copy; copy = (char*)malloc(strlen(s) + 1); strcpy(copy, s); 41 Cultural Differences Some cultural differences, and corresponding portability heuristics 42 21

Character Code Size Obstacle: Character code size United States Alphabet requires 7 bits => 1 byte per character Popular character code: ASCII Western Europe Alphabets require 8 bits => 1 byte per character Popular character code: Latin-1 China, Japan, Korea, etc. Alphabets require 16 bits => 2 bytes per character Popular character code: Unicode 43 Character Code Size (17) Don t assume 1-byte character code size char c = 'a'; Portable: C90 has no good solution C99 has wide character data type, constants, and associated functions #include <stddef.h> wchar_t c = L'\x3B1'; /* Greek lower case alpha */ But then beware of byte-order portability problems! Future is not promising 44 22

Human Language Obstacle: Humans speak different natural languages! 45 Human Language (cont.) (18) Don t assume English /* somefile.c */ printf("bad input"); Can t avoid natural language! So 46 23

Human Language (cont.) Encapsulate code /* somefile.c */ #include "messages.h" printf(getmsg(5)); Choose appropriate message.c file at link-time Messages module, with multiple implementations /* messages.h */ char *getmsg(int msgnum); /* englishmessages.c */ char *getmsg(int msgnum) { switch(msgnum) { case 5: return "Bad input"; } } /* spanishmessages.c */ char *getmsg(int msgnum) { switch(msgnum) { case 5: return "Mala entrada"; } } 47 Human Language (cont.) Maybe even better: encapsulate data /* messages.h */ char *getmsg(int msgnum); Messages module /* messages.c */ enum {MSG_COUNT = 100}; char *getmsg(int msgnum) { static char *msg[msg_count]; static int firstcall = 1; if (firstcall) { <Read all messages from appropriate messages.txt file into msg> firstcall = 0; } return msg[msgnum]; } /* englishmessages.txt */ Bad input /* spanishmessages.txt */ Mala entrada Choose appropriate message.txt file at run-time 48 24

Summary General heuristics (1) Program to the intersection (2) Encapsulate (3) Use conditional compilation (as a last resort) (4) Test!!! 49 Summary (cont.) Heuristics related to hardware differences (5) Don t assume data type sizes (6) Don t right-shift signed ints (7) Don t rely on byte order in code (8) Use text for data exchange (9) Write and read 1 byte at a time Heuristics related to OS differences (10) Use binary mode for textual data exchange (11) Don t rely on data alignment (12) Don t assume ASCII 50 25

Summary (cont.) Heuristics related to compiler differences (13) Stick to the standard language (14) Don t assume evaluation order (15) Don t assume signedness of char Heuristic related to library differences (16) Stick to the standard library Heuristics related to cultural differences (17) Don t assume 1-byte char code size (18) Don t assume English 51 26