Hacking in C. Pointers. Radboud University, Nijmegen, The Netherlands. Spring 2019

Similar documents
Pointers (continued), arrays and strings

Pointers (continued), arrays and strings

Limitations of the stack

Pointer Basics. Lecture 13 COP 3014 Spring March 28, 2018

Arrays and Pointers in C. Alan L. Cox

CS 261 Fall C Introduction. Variables, Memory Model, Pointers, and Debugging. Mike Lam, Professor

Secure Programming I. Steven M. Bellovin September 28,

C: Pointers. C: Pointers. Department of Computer Science College of Engineering Boise State University. September 11, /21

Hacking in C. The C programming language. Radboud University, Nijmegen, The Netherlands. Spring 2018

Last week. Data on the stack is allocated automatically when we do a function call, and removed when we return

Princeton University Computer Science 217: Introduction to Programming Systems The C Programming Language Part 1

Hacking in C. The C programming language. Radboud University, Nijmegen, The Netherlands. Spring 2018

C++ Data Types. 1 Simple C++ Data Types 2. 3 Numeric Types Integers (whole numbers) Decimal Numbers... 5

First of all, it is a variable, just like other variables you studied

A brief introduction to C programming for Java programmers

Administrivia. Introduction to Computer Systems. Pointers, cont. Pointer example, again POINTERS. Project 2 posted, due October 6

Parameter passing. Programming in C. Important. Parameter passing... C implements call-by-value parameter passing. UVic SEng 265

Agenda. Peer Instruction Question 1. Peer Instruction Answer 1. Peer Instruction Question 2 6/22/2011

C Introduction. Comparison w/ Java, Memory Model, and Pointers

Lecture 4 September Required reading materials for this class

Memory, Arrays & Pointers

PIC 10A Pointers, Arrays, and Dynamic Memory Allocation. Ernest Ryu UCLA Mathematics

Language comparison. C has pointers. Java has references. C++ has pointers and references

CS 61C: Great Ideas in Computer Architecture C Pointers. Instructors: Vladimir Stojanovic & Nicholas Weaver

Section Notes - Week 1 (9/17)

Hacking in C. Memory layout. Radboud University, Nijmegen, The Netherlands. Spring 2018

Memory, Data, & Addressing II CSE 351 Spring

2/12/17. Goals of this Lecture. Historical context Princeton University Computer Science 217: Introduction to Programming Systems

Lecture Notes on Memory Layout

Contents of Lecture 3

Pointers. 1 Background. 1.1 Variables and Memory. 1.2 Motivating Pointers Massachusetts Institute of Technology

Chapter IV Introduction to C for Java programmers

Pointers, Dynamic Data, and Reference Types

CSCI-243 Exam 1 Review February 22, 2015 Presented by the RIT Computer Science Community

C Language Part 1 Digital Computer Concept and Practice Copyright 2012 by Jaejin Lee

Lecture Notes on Types in C

Programming refresher and intro to C programming

CS 61C: Great Ideas in Computer Architecture. Lecture 3: Pointers. Bernhard Boser & Randy Katz

Agenda. Components of a Computer. Computer Memory Type Name Addr Value. Pointer Type. Pointers. CS 61C: Great Ideas in Computer Architecture

Basic Types and Formatted I/O

Page 1. Today. Last Time. Is the assembly code right? Is the assembly code right? Which compiler is right? Compiler requirements CPP Volatile

CS 61C: Great Ideas in Computer Architecture. Lecture 3: Pointers. Krste Asanović & Randy Katz

Undefined Behaviour in C

CS 61C: Great Ideas in Computer Architecture. C Arrays, Strings, More Pointers

Aliasing restrictions of C11 formalized in Coq

HW1 due Monday by 9:30am Assignment online, submission details to come

Pointers, Arrays, and Strings. CS449 Spring 2016

Exercise 3 / Ch.7. Given the following array, write code to initialize all the elements to 0: int ed[100]; Hint: It can be done two different ways!

Software and Web Security 1. Root Cause Analysis. Abstractions Assumptions Trust. sws1 1

Princeton University. Computer Science 217: Introduction to Programming Systems. Data Types in C

C: Pointers, Arrays, and strings. Department of Computer Science College of Engineering Boise State University. August 25, /36

Fundamental of Programming (C)

Heap Arrays. Steven R. Bagley

PRINCIPLES OF OPERATING SYSTEMS

CS 430 Spring Mike Lam, Professor. Data Types and Type Checking

Memory and Addresses. Pointers in C. Memory is just a sequence of byte-sized storage devices.

High Performance Computing in C and C++

Chapter 2 (Dynamic variable (i.e. pointer), Static variable)

EL2310 Scientific Programming

Pointers and Arrays 1

Outline. Computer programming. Debugging. What is it. Debugging. Hints. Debugging

Chapter 5. Section 5.1 Introduction to Strings. CS 50 Hathairat Rattanasook

Malloc Lab & Midterm Solutions. Recitation 11: Tuesday: 11/08/2016

CS 31: Intro to Systems Pointers and Memory. Kevin Webb Swarthmore College October 2, 2018

Formal C semantics: CompCert and the C standard

In Java we have the keyword null, which is the value of an uninitialized reference type

Pointers (part 1) What are pointers? EECS We have seen pointers before. scanf( %f, &inches );! 25 September 2017

Announcements. assign0 due tonight. Labs start this week. No late submissions. Very helpful for assign1

Class Information ANNOUCEMENTS

Ch. 3: The C in C++ - Continued -

QUIZ on Ch.8. What is kit?

Reflections on using C(++) Root Cause Analysis

CS 330 Lecture 18. Symbol table. C scope rules. Declarations. Chapter 5 Louden Outline

Lecture 4: Outline. Arrays. I. Pointers II. III. Pointer arithmetic IV. Strings

Physics 306 Computing Lab 5: A Little Bit of This, A Little Bit of That

EC312 Chapter 4: Arrays and Strings

Chapter 1 INTRODUCTION SYS-ED/ COMPUTER EDUCATION TECHNIQUES, INC.

Part I Part 1 Expressions

Lecture 8 Dynamic Memory Allocation

Arrays in C C Programming and Software Tools. N.C. State Department of Computer Science

Writing Program in C Expressions and Control Structures (Selection Statements and Loops)

Today s lecture. Pointers/arrays. Stack versus heap allocation CULTURE FACT: IN CODE, IT S NOT CONSIDERED RUDE TO POINT.

CS107 Handout 08 Spring 2007 April 9, 2007 The Ins and Outs of C Arrays

CS 61c: Great Ideas in Computer Architecture

D Programming Language

Today s lecture. Continue exploring C-strings. Pointer mechanics. C arrays. Under the hood: sequence of chars w/ terminating null

Systems Programming and Computer Architecture ( )

ECE 15B COMPUTER ORGANIZATION

LESSON 5 FUNDAMENTAL DATA TYPES. char short int long unsigned char unsigned short unsigned unsigned long

Important From Last Time

Page 1. Today. Important From Last Time. Is the assembly code right? Is the assembly code right? Which compiler is right?

Annotation Annotation or block comments Provide high-level description and documentation of section of code More detail than simple comments

Intermediate Programming, Spring 2017*

C Review. MaxMSP Developers Workshop Summer 2009 CNMAT

VARIABLES AND TYPES CITS1001

Course organization. Course introduction ( Week 1)

Hardware: Logical View

Slide Set 2. for ENCM 335 in Fall Steve Norman, PhD, PEng

C - Basics, Bitwise Operator. Zhaoguo Wang

Important From Last Time

Transcription:

Hacking in C Pointers Radboud University, Nijmegen, The Netherlands Spring 2019

Allocation of multiple variables Consider the program main(){ char x; int i; short s; char y;... } What will the layout of this data in memory be? Assuming 4-byte ints, 2-byte shorts, and little endian architecture 2

Printing addresses where data is located We can use & to see where data is located char x; int i; short s; char y; printf("x is allocated at %p \n", &x); printf("i is allocated at %p \n", &i); printf("s is allocated at %p \n", &s); printf("y is allocated at %p \n", &y); // Here %p is used to print pointer values Compiling with or without -O2 will reveal different alignment strategies 3

Data alignment Memory as a sequence of bytes... x i 0 i 1 i 2 i 3 s 0 s 1 y... But on a 32-bit machine, the memory is a sequence of 4-byte words x i 0 i 1 i 2 i 3 s 0 s 1 y... Now the data elements are not nicely aligned with the words, which will make execution slow, since CPU instructions act on words. 4

Data alignment Different allocations, with better/worse alignment x i 0 i 1 i 2 i 3 s 0 s 1 y... Lousy alignment, but uses minimal memory 5

Data alignment Different allocations, with better/worse alignment x i 0 i 1 i 2 i 3 s 0 s 1 y... Lousy alignment, but uses minimal memory x i 0 i 1 i 2 i 3 s 0 s 1 y Optimal alignment, but wastes memory 5

Data alignment Different allocations, with better/worse alignment x i 0 i 1 i 2 i 3 s 0 s 1 y... x i 0 i 1 i 2 i 3 s 0 s 1 y s 0 s 1 x y i 0 i 1 i 2 i 3... Lousy alignment, but uses minimal memory Optimal alignment, but wastes memory Possible compromise 5

Data alignment Compilers may introduce padding or change the order of data in memory to improve alignment. There are trade-offs here between speed and memory usage. 6

Data alignment Compilers may introduce padding or change the order of data in memory to improve alignment. There are trade-offs here between speed and memory usage. Most C compilers can provide many optional optimizations. E.g., use man gcc to check out the many optimization options of gcc. 6

Arrays 7

Arrays An array contains a collection of data elements with the same type. The size is constant. int test_array[10]; int a[] = {30,20}; test_array[0] = a[1]; printf("oops %d \n", a[2]); //will compile & run 8

Arrays An array contains a collection of data elements with the same type. The size is constant. int test_array[10]; int a[] = {30,20}; test_array[0] = a[1]; printf("oops %d \n", a[2]); //will compile & run Array bounds are not checked. Anything may happen when accessing outside array bounds. The program may crash, usually with a segmentation fault (segfault). 8

Array bounds checking The historic decision not to check array bounds is responsible for in the order of 50% of all the security vulnerabilities in software, in the form of so-called buffer overflow attacks. Other languages took a different (more sensible?) choice here. E.g. ALGOL60, defined in 1960, already included array bound checks. 9

Typical software security vulnerabilities Security bugs found in Microsoft s first security bug fix month (2002) Here buffer overflows are platform-specific. Some of the code defects and input validation problems might also be. Crypto problems are much more rare, but can be of very high impact. 10

Array bounds checking Tony Hoare in Turing Award speech on the design principles of ALGOL 60 The first principle was security:... A consequence of this principle is that every subscript was checked at run time against both the upper and the lower declared bounds of the array. Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interests of efficiency. Unanimously, they urged us not to they knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. I note with fear and horror that even in 1980, language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law. [ C.A.R.Hoare, The Emperor s Old Clothes, Communications of the ACM, 1980] 11

Overrunning arrays Consider the program int y = 7; char a[2]; int x = 6; printf("oops %d \n", a[2]); What would you expect this program to print? 12

Overrunning arrays Consider the program int y = 7; char a[2]; int x = 6; printf("oops %d \n", a[2]); What would you expect this program to print? If the compiler allocates x directly after a, then (on a little-endian machine) it will print 6. There are no guarantees! The program could simply crash, or return any other number, re-format the hard drive, explode,... 12

Overrunning arrays Consider the program int y = 7; char a[2]; int x = 6; printf("oops %d \n", a[2]); What would you expect this program to print? If the compiler allocates x directly after a, then (on a little-endian machine) it will print 6. There are no guarantees! The program could simply crash, or return any other number, re-format the hard drive, explode,... By overrunning an array we can try to reverse-engineer the memory layout 12

Arrays and alignment The memory space allocated for an array is guaranteed to be contiguous, i.e. a[1] is allocated right after a[0]. 13

Arrays and alignment The memory space allocated for an array is guaranteed to be contiguous, i.e. a[1] is allocated right after a[0]. For good alignment, a compiler could again add padding at the end of arrays. 13

Arrays and alignment The memory space allocated for an array is guaranteed to be contiguous, i.e. a[1] is allocated right after a[0]. For good alignment, a compiler could again add padding at the end of arrays. E.g. a compiler might allocate 16 bytes rather than 15 bytes for char text[15]; 13

Arrays are passed by reference Arrays are always passed by reference. For example, given the function void increase_elt(int x[]) { x[1] = x[1]+23; } What is the value of a[1] after executing the following code? int a[2] = {1, 2}; increase_elt(a); 14

Arrays are passed by reference Arrays are always passed by reference. For example, given the function void increase_elt(int x[]) { x[1] = x[1]+23; } What is the value of a[1] after executing the following code? int a[2] = {1, 2}; increase_elt(a); 25 Recall call by reference from Imperative Programming 14

Pointers 15

Retrieving addresses of pointers using & We can find out where some data is allocated using the & operation. If int x = 12; then &x is the memory address where the value of x is stored, aka a pointer to x. 12 &x 16

Retrieving addresses of pointers using & We can find out where some data is allocated using the & operation. If int x = 12; then &x is the memory address where the value of x is stored, aka a pointer to x. 12 &x It depends on the underlying architecture how many bytes are needed to represent addresses: 4 on 32-bit machines, 8 on a 64-bit machine. 16

Declaring pointers Pointers are typed: the compiler keeps track of what data type a pointer points to int *p; // p is a pointer that points to an int float *f; // f is a pointer that points to a float 17

Creating and dereferencing pointers Suppose int y, z; int *p; // i.e. p points to an int How can we create a pointer to some variable? 18

Creating and dereferencing pointers Suppose int y, z; int *p; // i.e. p points to an int How can we create a pointer to some variable? Using & y = 7 p = &y; // assign the address of y to p How can we get the value that a pointer points to? 18

Creating and dereferencing pointers Suppose int y, z; int *p; // i.e. p points to an int How can we create a pointer to some variable? Using & y = 7 p = &y; // assign the address of y to p How can we get the value that a pointer points to? Using * y = 7 p = &y; // pointer p now points to y z = *p; // give z the value of what p points to 18

Creating and dereferencing pointers Suppose int y, z; int *p; // i.e. p points to an int How can we create a pointer to some variable? Using & y = 7 p = &y; // assign the address of y to p How can we get the value that a pointer points to? Using * y = 7 p = &y; // pointer p now points to y z = *p; // give z the value of what p points to Looking up what a pointer points to, with *, is called dereferencing. 18

Confused? draw pictures! int y = 7; int *p = &y; // pointer p now points to cell y int z = *p; // give z the value of what p points to 19

Pointer quiz What is the value of y? int y = 2; int x = y; y++; x++; 20

Pointer quiz What is the value of y? int y = 2; int x = y; y++; x++; 3 20

Pointer quiz What is the value of y? int y = 2; int x = y; y++; x++; 3 What is the value of y? int y = 2; int *x = &y; y++; (*x)++; 20

Pointer quiz What is the value of y? int y = 2; int x = y; y++; x++; 3 What is the value of y? int y = 2; int *x = &y; y++; (*x)++; 4 20

Note that * is used for 3 different purposes, with 3 different meanings 21

Note that * is used for 3 different purposes, with 3 different meanings 1. In declarations, to declare pointer types int *p; // p is a pointer to an int // i.e. *p is an int 21

Note that * is used for 3 different purposes, with 3 different meanings 1. In declarations, to declare pointer types int *p; // p is a pointer to an int // i.e. *p is an int 2. As a prefix operator on pointers int z = *p; 21

Note that * is used for 3 different purposes, with 3 different meanings 1. In declarations, to declare pointer types int *p; // p is a pointer to an int // i.e. *p is an int 2. As a prefix operator on pointers int z = *p; 3. Multiplication of numeric values 21

Note that * is used for 3 different purposes, with 3 different meanings 1. In declarations, to declare pointer types int *p; // p is a pointer to an int // i.e. *p is an int 2. As a prefix operator on pointers int z = *p; 3. Multiplication of numeric values Some legal C code can get confusing, e.g. z = 3 * *p 21

Style debate: int* p or int *p? What can be confusing in int *p = &y; is that this is an assignment to p, not *p 22

Style debate: int* p or int *p? What can be confusing in int *p = &y; is that this is an assignment to p, not *p Some people prefer to write int* p = &y; 22

Style debate: int* p or int *p? What can be confusing in int *p = &y; is that this is an assignment to p, not *p Some people prefer to write int* p = &y; but C purists will argue this is C++ style. 22

Style debate: int* p or int *p? What can be confusing in int *p = &y; is that this is an assignment to p, not *p Some people prefer to write int* p = &y; but C purists will argue this is C++ style. Downside of writing int* int* x, y, z; declares x as a pointer to an int and y and z as int... 22

Still not confused? x = 3; p1 = &x; p2 = &p1; z = **p2 + 1; What will the value of z be? 23

Still not confused? x = 3; p1 = &x; p2 = &p1; z = **p2 + 1; What will the value of z be? What should the types of p1 and p2 be? 23

Still not confused? pointers to pointers int x = 3; int *p1 = &x; // p1 points to an int int **p2 = &p1; // p2 points to a pointer to an int int z = **p2 + 1; 24

Pointer test (Hint: example exam question) int y = 2; int z = 3; int* p = &y; int* q = &z; (*q)++; *p = *p + *q; q = q + 1; printf("y is %d\n", y); What is the value of y at the end? 25

Pointer test (Hint: example exam question) int y = 2; int z = 3; int* p = &y; int* q = &z; (*q)++; *p = *p + *q; q = q + 1; printf("y is %d\n", y); What is the value of y at the end? 6 25

Pointer test (Hint: example exam question) int y = 2; int z = 3; int* p = &y; int* q = &z; (*q)++; *p = *p + *q; q = q + 1; printf("y is %d\n", y); What is the value of y at the end? 6 What is the value of *p at the end? 25

Pointer test (Hint: example exam question) int y = 2; int z = 3; int* p = &y; int* q = &z; (*q)++; *p = *p + *q; q = q + 1; printf("y is %d\n", y); What is the value of y at the end? 6 What is the value of *p at the end? 6 25

Pointer test (Hint: example exam question) int y = 2; int z = 3; int* p = &y; int* q = &z; (*q)++; *p = *p + *q; q = q + 1; printf("y is %d\n", y); What is the value of y at the end? 6 What is the value of *p at the end? 6 What is the value of *q at the end? 25

Pointer test (Hint: example exam question) int y = 2; int z = 3; int* p = &y; int* q = &z; (*q)++; *p = *p + *q; q = q + 1; printf("y is %d\n", y); What is the value of y at the end? 6 What is the value of *p at the end? 6 What is the value of *q at the end? We don t know! q points to some memory cell after z in the memory 25

Pointer arithmetic You can use + and - with pointers. The semantics depends on the type of the pointer: adding 1 to a pointer will go to the next location, given the size and the data type that it points to. 26

Pointer arithmetic You can use + and - with pointers. The semantics depends on the type of the pointer: adding 1 to a pointer will go to the next location, given the size and the data type that it points to. For example, if int *ptr; char *str; then ptr + 2 means Add 2 * sizeof(int) to the address in ptr str + 2 means Add 2 to the address in str (because sizeof(char) is 1) 26

Using pointers as arrays The way pointer arithmetic works means that a pointer to the head of an array behaves like an array. Suppose int a[10] = {1,2,3,4,5,6,7,8,9,19}; int *p = (int *) &a; // the address of the head of a // treated as pointer to an int 27

Using pointers as arrays The way pointer arithmetic works means that a pointer to the head of an array behaves like an array. Suppose int a[10] = {1,2,3,4,5,6,7,8,9,19}; int *p = (int *) &a; // the address of the head of a // treated as pointer to an int Now p + 3 points to a[3] So we use addition to pointer p to move through the array 27

Pointer arithmetic for strings What is the output of char *msg = "hello world"; char *t = msg + 6; printf("t points to the string %s.", t); 28

Pointer arithmetic for strings What is the output of char *msg = "hello world"; char *t = msg + 6; printf("t points to the string %s.", t); This will print t points to the string world. 28

Arrays vs pointers Arrays and pointers behave similarly, but are very different in memory Consider int a[]; int *p 29

Arrays vs pointers Arrays and pointers behave similarly, but are very different in memory Consider int a[]; int *p A difference: a will always refer to the same array, whereas p can point to different arrays over time 29

Using pointers as arrays Suppose Then int a[10] = {1,2,3,4,5,6,7,8,9,10}; int sum = 0; for (int i = 0; i!= 10; i++) { sum = sum + a[i]; } can also be implemented using pointer arithmetic 30

Using pointers as arrays Suppose int a[10] = {1,2,3,4,5,6,7,8,9,10}; This cast is needed Then because a is an integer array, so int sum = 0; &a is a pointer to for (int i = 0; i!= 10; i++) { int[], not pointer sum = sum + a[i]; to an int. } An alternative would be to write *p = &(a[0]) can also be implemented using pointer arithmetic int sum = 0; for (int *p = (int sum = sum + *p; } *)&a; p!= &(a[10]); p++) { Instead of p!= &(a[10]) we could also write p!= ((int *)&a)+10 30

Using pointers as arrays Suppose int a[10] = {1,2,3,4,5,6,7,8,9,10}; This cast is needed Then because a is an integer array, so int sum = 0; &a is a pointer to for (int i = 0; i!= 10; i++) { int[], not pointer sum = sum + a[i]; to an int. } An alternative would be to write *p = &(a[0]) can also be implemented using pointer arithmetic int sum = 0; for (int *p = (int *)&a; p!= &(a[10]); p++) { sum = sum + *p; } Instead of p!= &(a[10]) we could also write p!= ((int *)&a)+10 But nobody in their right mind would 30

A problem with pointers:... int i; int j; int *x;... // lots of code omitted i = 5; j++ // what is the value of i here? (*x)++; // what is the value of i here? 31

A problem with pointers:... int i; int j; int *x;... // lots of code omitted i = 5; j++ // what is the value of i here? (*x)++; // what is the value of i here? 5 5 or 6, depending on whether *x points to i 31

Two pointers are called aliases if they point to the same location int i = 5; int *x = &i; int *y = &i; // x and y are aliases now (*x)++; // now i and *y have also changed to 6 32

Two pointers are called aliases if they point to the same location int i = 5; int *x = &i; int *y = &i; // x and y are aliases now (*x)++; // now i and *y have also changed to 6 Keeping track of pointers, in the presence of potential aliasing, can be really confusing, and really hard to debug... 32

Recap so far We have seen pointers, e.g. of type char *p with the operations * and & These are tricky to understand, unless you draw pictures 33

Recap so far We have seen pointers, e.g. of type char *p with the operations * and & These are tricky to understand, unless you draw pictures We can have aliasing, where two names, say *p and c, can refer to the same variable (location in memory) 33

Recap so far We have seen pointers, e.g. of type char *p with the operations * and & These are tricky to understand, unless you draw pictures We can have aliasing, where two names, say *p and c, can refer to the same variable (location in memory) We can use pointer arithmetic, and e.g. write *(p+1), and use this to access arrays 33

Recap so far We have seen pointers, e.g. of type char *p with the operations * and & These are tricky to understand, unless you draw pictures We can have aliasing, where two names, say *p and c, can refer to the same variable (location in memory) We can use pointer arithmetic, and e.g. write *(p+1), and use this to access arrays Confusingly, the meaning of addition for pointers depends on their type, as +1 for pointers of type int * really means +sizeof(int) 33

The potential of pointers: inspecting raw memory To inspect a piece of raw memory, we can cast it to a unsigned char * and then inspect the bytes float f = 3.14; unsigned char *p = (unsigned char *) &f; printf("the representation of float %f is", f); for (int i = 0; i < sizeof(float); i++, p++);) { printf("%d", *p); } printf("\n"); 34

Turning pointers into numbers intptr_t defined in stdint.h is an integral type that is guaranteed to be wide enough to hold pointers. int *p; // p points to an int intptr_t i = (intptr_t) p; // the address as a number p++; i++; // Will i and p be the same? 35

Turning pointers into numbers intptr_t defined in stdint.h is an integral type that is guaranteed to be wide enough to hold pointers. int *p; // p points to an int intptr_t i = (intptr_t) p; // the address as a number p++; i++; // Will i and p be the same? // No! i++ increases by 1, p++ with sizeof(int) 35

Turning pointers into numbers intptr_t defined in stdint.h is an integral type that is guaranteed to be wide enough to hold pointers. int *p; // p points to an int intptr_t i = (intptr_t) p; // the address as a number p++; i++; // Will i and p be the same? // No! i++ increases by 1, p++ with sizeof(int) There is also an unsigned version of intptr_t: uintptr_t 35

Strings 36

Strings Having seen arrays and pointers, we can now understand C strings char *s = "hello world\n"; 37

Strings Having seen arrays and pointers, we can now understand C strings char *s = "hello world\n"; C strings are char arrays, which are terminated by a special null character, aka a null terminator, which is written as \0 37

Strings Having seen arrays and pointers, we can now understand C strings char *s = "hello world\n"; C strings are char arrays, which are terminated by a special null character, aka a null terminator, which is written as \0 There is a special notation for string literals, between double quotes, where the null terminator is implicit. 37

Strings Having seen arrays and pointers, we can now understand C strings char *s = "hello world\n"; C strings are char arrays, which are terminated by a special null character, aka a null terminator, which is written as \0 There is a special notation for string literals, between double quotes, where the null terminator is implicit. As other arrays, we can use both the array type char[] and the pointer type char * for them. 37

String problems Working with C strings is highly error prone! There are two problems 38

String problems Working with C strings is highly error prone! There are two problems 1. As for any array, there are no array bounds checks so it s the programmer s responsibility not to go outside the array bounds 38

String problems Working with C strings is highly error prone! There are two problems 1. As for any array, there are no array bounds checks so it s the programmer s responsibility not to go outside the array bounds 2. It is also the programmer s responsibility to make sure that the string is properly terminated with a null character. If a string lacks its null terminator, e.g. due to problem 1, then standard functions to manipulate strings will go off the rails. 38

Safer strings and array? There is no reason why programming language should not provide safe versions of strings (or indeed arrays). Other languages offer strings and arrays which are safer in that: 39

Safer strings and array? There is no reason why programming language should not provide safe versions of strings (or indeed arrays). Other languages offer strings and arrays which are safer in that: 1. Going outside the array bounds will be detected at runtime (e.g. Java) 39

Safer strings and array? There is no reason why programming language should not provide safe versions of strings (or indeed arrays). Other languages offer strings and arrays which are safer in that: 1. Going outside the array bounds will be detected at runtime (e.g. Java) 2. Which will be resized automatically if they do not fit (e.g. Python) 39

Safer strings and array? There is no reason why programming language should not provide safe versions of strings (or indeed arrays). Other languages offer strings and arrays which are safer in that: 1. Going outside the array bounds will be detected at runtime (e.g. Java) 2. Which will be resized automatically if they do not fit (e.g. Python) 3. The language will ensure that all strings are null-terminated (e.g. C++, Java and Python) 39

Safer strings and array? There is no reason why programming language should not provide safe versions of strings (or indeed arrays). Other languages offer strings and arrays which are safer in that: 1. Going outside the array bounds will be detected at runtime (e.g. Java) 2. Which will be resized automatically if they do not fit (e.g. Python) 3. The language will ensure that all strings are null-terminated (e.g. C++, Java and Python) More precisely, the programmer does not even have to know how strings are represented, and whether null-terminator exists and what they look like: the representation of strings is completely transparant/invisible to the programmer. 39

Safer strings and array? There is no reason why programming language should not provide safe versions of strings (or indeed arrays). Other languages offer strings and arrays which are safer in that: 1. Going outside the array bounds will be detected at runtime (e.g. Java) 2. Which will be resized automatically if they do not fit (e.g. Python) 3. The language will ensure that all strings are null-terminated (e.g. C++, Java and Python) More precisely, the programmer does not even have to know how strings are represented, and whether null-terminator exists and what they look like: the representation of strings is completely transparant/invisible to the programmer. Moral of the story: if you can, avoid using standard C strings. E.g. in C++, use C++ type strings; in C, use safer string libraries. 39

A final string pecularity Strig literals, as in char *msg = "hello, world"; are meant to be constant or read-only: you are not supposed to change the character that made up a string literal. 40

A final string pecularity Strig literals, as in char *msg = "hello, world"; are meant to be constant or read-only: you are not supposed to change the character that made up a string literal. Unfortunately, this does not mean that C will prevent this. It only means that the C standard defines changing a character in a string literal as having undefined behaviour 40

A final string pecularity Strig literals, as in char *msg = "hello, world"; are meant to be constant or read-only: you are not supposed to change the character that made up a string literal. Unfortunately, this does not mean that C will prevent this. It only means that the C standard defines changing a character in a string literal as having undefined behaviour E.g. char *t = msg + 6; *t = ; ; Has undefined behaviour, i.e. anything may happen. 40

A final string pecularity Strig literals, as in char *msg = "hello, world"; are meant to be constant or read-only: you are not supposed to change the character that made up a string literal. Unfortunately, this does not mean that C will prevent this. It only means that the C standard defines changing a character in a string literal as having undefined behaviour E.g. char *t = msg + 6; *t = ; ; Has undefined behaviour, i.e. anything may happen. Compilers can emit warnings if you change string literals, e.g. gcc -Wwrite-strings 40

Recap We have seen 41

Recap We have seen The different C types 41

Recap We have seen The different C types primitive types (unsigned) char, short, int, long, long long, float... 41

Recap We have seen The different C types primitive types (unsigned) char, short, int, long, long long, float... implicit conversions and explicit conversions (casts) between them 41

Recap We have seen The different C types primitive types (unsigned) char, short, int, long, long long, float... implicit conversions and explicit conversions (casts) between them arrays int[] 41

Recap We have seen The different C types primitive types (unsigned) char, short, int, long, long long, float... implicit conversions and explicit conversions (casts) between them arrays int[] pointers int * with the operations * and & 41

Recap We have seen The different C types primitive types (unsigned) char, short, int, long, long long, float... implicit conversions and explicit conversions (casts) between them arrays int[] pointers int * with the operations * and & C strings, as special char arrays 41

Recap We have seen The different C types primitive types (unsigned) char, short, int, long, long long, float... implicit conversions and explicit conversions (casts) between them arrays int[] pointers int * with the operations * and & C strings, as special char arrays Their representation 41

Recap We have seen The different C types primitive types (unsigned) char, short, int, long, long long, float... implicit conversions and explicit conversions (casts) between them arrays int[] pointers int * with the operations * and & C strings, as special char arrays Their representation How these representations can be broken, i.e. how we can inspect and manipulate the underlying representation (e.g. with casts) 41

Recap We have seen The different C types primitive types (unsigned) char, short, int, long, long long, float... implicit conversions and explicit conversions (casts) between them arrays int[] pointers int * with the operations * and & C strings, as special char arrays Their representation How these representations can be broken, i.e. how we can inspect and manipulate the underlying representation (e.g. with casts) Some things that can go wrong e.g. due to access outside array bounds or integer under/overflow 41