CS61 Lecture II: Data Representation! with Ruth Fong, Stephen Turban and Evan Gastman! Abstract Machines vs. Real Machines!

Similar documents
When you add a number to a pointer, that number is added, but first it is multiplied by the sizeof the type the pointer points to.

CS 11 C track: lecture 5

CS 31: Intro to Systems Pointers and Memory. Kevin Webb Swarthmore College October 2, 2018

Learning Objectives. A Meta Comment. Exercise 1. Contents. From CS61Wiki

CSE 374 Programming Concepts & Tools. Brandon Myers Winter 2015 Lecture 11 gdb and Debugging (Thanks to Hal Perkins)

QUIZ How do we implement run-time constants and. compile-time constants inside classes?

Lecture 07 Debugging Programs with GDB

CS 31: Intro to Systems Pointers and Memory. Martin Gagne Swarthmore College February 16, 2016

Heap Arrays. Steven R. Bagley

GDB Tutorial. A Walkthrough with Examples. CMSC Spring Last modified March 22, GDB Tutorial

Lab 8. Follow along with your TA as they demo GDB. Make sure you understand all of the commands, how and when to use them.

CS61, Fall 2012 Section 2 Notes

int $0x32 // call interrupt number 50

A heap, a stack, a bottle and a rack. Johan Montelius HT2017

CSE 374 Programming Concepts & Tools

Introduction to Programming in C Department of Computer Science and Engineering. Lecture No. #29 Arrays in C

Heap Arrays and Linked Lists. Steven R. Bagley

CS 103 Lab - Party Like A Char Star

CS 241 Honors Memory

CMSC 341 Lecture 2 Dynamic Memory and Pointers

Outline. Computer programming. Debugging. What is it. Debugging. Hints. Debugging

CSE 303: Concepts and Tools for Software Development

Intermediate Programming, Spring 2017*

Class Information ANNOUCEMENTS

Section Notes - Week 1 (9/17)

PIC 10A Pointers, Arrays, and Dynamic Memory Allocation. Ernest Ryu UCLA Mathematics

6.S096: Introduction to C/C++

18-349: Introduction to Embedded Real-Time Systems

Programs in memory. The layout of memory is roughly:

Week 5, continued. This is CS50. Harvard University. Fall Cheng Gong

printf( Please enter another number: ); scanf( %d, &num2);

Agenda. Peer Instruction Question 1. Peer Instruction Answer 1. Peer Instruction Question 2 6/22/2011

CS102 Software Engineering Principles

1 Dynamic Memory continued: Memory Leaks

CSCI0330 Intro Computer Systems Doeppner. Lab 02 - Tools Lab. Due: Sunday, September 23, 2018 at 6:00 PM. 1 Introduction 0.

Dynamic Allocation in C

CSci 4061 Introduction to Operating Systems. Programs in C/Unix

CS354 gdb Tutorial Written by Chris Feilbach

Recitation 10: Malloc Lab

Computer Science 50: Introduction to Computer Science I Harvard College Fall Quiz 0 Solutions. Answers other than the below may be possible.

Chapter 17 vector and Free Store. Bjarne Stroustrup

CS61C Machine Structures. Lecture 4 C Pointers and Arrays. 1/25/2006 John Wawrzynek. www-inst.eecs.berkeley.edu/~cs61c/

Consider the above code. This code compiles and runs, but has an error. Can you tell what the error is?

18-600: Recitation #3

Programming Studio #9 ECE 190

Principles of Programming Pointers, Dynamic Memory Allocation, Character Arrays, and Buffer Overruns

C++ for Java Programmers

ECS 153 Discussion Section. April 6, 2015

C Review. MaxMSP Developers Workshop Summer 2009 CNMAT

CS61C : Machine Structures

DAY 3. CS3600, Northeastern University. Alan Mislove

LAB #8. Last Survey, I promise!!! Please fill out this really quick survey about paired programming and information about your declared major and CS.

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2017 Lecture 7

We would like to find the value of the right-hand side of this equation. We know that

Parallel Programming: Background Information

Parallel Programming: Background Information

Announcements. assign0 due tonight. Labs start this week. No late submissions. Very helpful for assign1

In Java we have the keyword null, which is the value of an uninitialized reference type

Structure of Programming Languages Lecture 10

CS107 Handout 13 Spring 2008 April 18, 2008 Computer Architecture: Take II

Pointers and Memory 1

In further discussion, the books make other kinds of distinction between high level languages:

TI2725-C, C programming lab, course

Memory Allocation in C C Programming and Software Tools. N.C. State Department of Computer Science

Intro to x86 Binaries. From ASM to exploit

CS 61C: Great Ideas in Computer Architecture C Pointers. Instructors: Vladimir Stojanovic & Nicholas Weaver

CS 103 Lab 6 - Party Like A Char Star

Run Time Environment

Software Design and Analysis for Engineers

MPATE-GE 2618: C Programming for Music Technology. Unit 4.1

CS113: Lecture 5. Topics: Pointers. Pointers and Activation Records

Scientific Programming in C IX. Debugging

Chapter 17 vector and Free Store

CSCI-1200 Data Structures Spring 2016 Lecture 6 Pointers & Dynamic Memory

Ch. 11: References & the Copy-Constructor. - continued -

Lecture07: Strings, Variable Scope, Memory Model 4/8/2013

Limitations of the stack

Pointers, Dynamic Data, and Reference Types

Programming. Pointers, Multi-dimensional Arrays and Memory Management

LAB #8. GDB can do four main kinds of things (plus other things in support of these) to help you catch bugs in the act:

NEXT SET OF SLIDES FROM DENNIS FREY S FALL 2011 CMSC313.

All Paging Schemes Depend on Locality. VM Page Replacement. Paging. Demand Paging

CA31-1K DIS. Pointers. TA: You Lu

Debugging! The material for this lecture is drawn, in part, from! The Practice of Programming (Kernighan & Pike) Chapter 5!

EL2310 Scientific Programming

Princeton University Computer Science 217: Introduction to Programming Systems. Data Structures

Pointer Casts and Data Accesses

Computer Programming. The greatest gift you can give another is the purity of your attention. Richard Moss

QUIZ. What is wrong with this code that uses default arguments?

ECE 250 / CS 250 Computer Architecture. C to Binary: Memory & Data Representations. Benjamin Lee

Computer Architecture and System Software Lecture 07: Assembly Language Programming

Deep C (and C++) by Olve Maudal

Practical Malware Analysis

CS61C : Machine Structures

Recitation #12 Malloc Lab - Part 2. November 14th, 2017

CS 61C: Great Ideas in Computer Architecture. C Arrays, Strings, More Pointers

We first learn one useful option of gcc. Copy the following C source file to your

Dynamic Allocation in C

CS 11 C track: lecture 6

377 Student Guide to C++

Transcription:

CS61 Lecture II: Data Representation with Ruth Fong, Stephen Turban and Evan Gastman Abstract Machines vs. Real Machines Abstract machine refers to the meaning of a program in a high level language (i.e. C) Real machine refers to the meaning of a program executed on real hardware (laptop, phone, car, etc.) Real model: eg. (1) You want code a program that prints the letter A (2) You code the program in c and save it as a.c file. (3) Your compiler processes the.c file and turns it into machine instructions. (4) Those machine instructions are taken by your processor, which directs your screen, disk, memory, CPU, etc. (5) You have printed the letter A onto your screen. Abstract model consists only of steps (1), (2), and (5). Why do we care about steps 3 and 4? If we know the whole route, we can make our code become faster, more robust and more powerful and prevent against the dangerous lapses that can happen in our code. I. Memory - Eddie's metaphor: Memory is like an enormous post office with a bunch of boxes where each box can only hold a number between 0 and 255. To first order, we treat memory as random access. What does that mean? You can look in one box and then immediately look at another box as look as we know it's address. That is, we can jump from box to box in a unit of time (we don't have to iterate through boxes sequentially). -Address access happens in 1 unit of time. How are addresses written? Numbers in the interval [0 and 2^32). Exercise: Open up mexplore.c Add a line: char* ptr = &ch; Change following line: fprintf(stderr, "%c\n", *ptr); What's this doing compared to the original function? Answer #1: The same thing. Answer #2: Memory is not handled the same way.

A smart compiler can make the pointer disappear, thus optimizing your program. Optimized meant that the compiler recognizes meaning of the program (print A ). Let's use GDB to see what's going on. GDB is a program that hooks up to a different program and allows us to run it but also pause it and snoop around in its execution. gdb./mexplore Make sure you run "make" first b mexplore:c:7 Set a breakpoint in mexplore.c at line 7 What happened? GDB changed the program in order to allow us to stop and look at the state of the real machine r run p ptr print the value of ptr p *ptr = 'B' set the variable c continue prints B Now, let s use an optimized program. emacs -mw mexplore.c How to create the optimized version? make clean make mexplore See "-O0"? That means don't optimize make mexplore.optimized "-O3": optimized Change program again to just print ptr, not *ptr. The optimized version doesn t print the pointer. The compiler understands the meaning of the program and so we don t need the pointer. C lets us bounce between the abstract machine and the real machine (that's how we can print the address of the pointer). In OCaml, the abstract machine has very strong boundaries. The value of the pointer keeps changing when we run the program over and over again. Break

Current code: char ch = 'A'; char* ptr = &ch; fprintf(stderr, "%c\n", ptr[8]); Real machine? Print out the character that's 8 spaces away from the pointer's address. Abstract machine? Undefined behavior. Nasal Demons: If the compiler can prove that your program has undefined behavior, then they can make demons fly out of your nose. Why did the segmentation fault core dump when we run./mexplore.c? Writing to memory that I haven t been allowed to use on the stack (student answer) Eddie: How do we figure out what s explicitly going on? GDB 0x42 in?? () Segmentation Fault Eddie flails to mimic what happens when you run GDB. bt Take a look at the backtrace. How did we end up executing code at an address that doesn t exist? b main next r char ch = 'A'; char* ptr = &ch; ptr[8] = B ; fprintf(stderr, "%c\n", ptr[8]); Run the code again and step through each line (using GDB) Things look okay as we go line-by-line, we re about to exit main after we return from main, we execute some weird code somewhere. Return address What if we overwrite the return address to contain garbage? It goes into garbage land (this is all happening in the real machine; in the abstract machine, the meaning can be anything at all)

To come in future lectures: return value bugs. Second Example: Change the execution of the program (Change b to c ) The segmentation fault occurs in a different address. (42 changes to 43 when we change b to c ; the address printed out by GDB) fprintf(stderr, %c\n, ptr[8]); //c stands for character fprintf(stderr, %d\n, ptr[8]); // d for decimal fprintf(stderr, %o\n, ptr[8]); //o stands for octodecimal fprintf(stderr, %x\n, ptr[8]); // x for hexadecimal Output: C 67 103 43 Set ptr[8] = 255; // ff == Super-important. Another ff in the address. We re figuring out the relationship between the input numbers and the crash site. This is what you can use to debug char ch = A ; hexdump(strderr, &ch, 1); hexdump function: print out contents of memory &: address of operator; it takes an object and turns it into a pointer (the C representation of an address) [reference] *: address => object [dereference] Output of hexdump: address hexidecimal character correspondence #include <stdlib.h> //need to include the library that has malloc char ch_global = B ; const char ch_constant = C ; // const means this variable can never be changed int main() { char ch = A ; char* ch_heap = malloc(1); *ch_heap = D ; //the lifetime of this object is independent of the running of the program. The function that created it can return (i.e. end), and the dynamically-allocated object will continue to exist until it is freed. // C&P hexdump fprintf(stderr, %c\n, ch);

hexdump(stderr, &ch, 1); // stderr prints the output of the function to other hexdump(stderr, &ch_global, 1); hexdump(stderr, &ch_constant, 1); hexdump(stderr, &ch_heap, 1); } There s only one version of the global variable for every program. Heap variables are dynamically allocated: the lifetime of the variable that was allocated is independent of the function structure of the program. The object lasts between malloc and free (irregardless of returned functions). (remember to include <stdlib.h>, which defines malloc) Expectation: ABCD B and C are right next to each other (addresses were 0804a040 and 08048b48, respectively); A looks enormous (address was bfc08143); D looks far away, but not that far away. In address land: address run from 0 to fffffff (2^32-1). 0 (2^32)-1 -------(global)--------(heap )--------------------------------------(<-- local, stack)------------------------ 0 0x400 0x800 0xC00.. (12) 0xffffffffff (one fourth) (one half) (three fourths) (largest value) local variable near 0xC00.. (Do change: are in the stack) global variables near 0x080 (Do not change) heap variables near 0x090. (Do change: are in the heap) *note stack and heap addresses change because when they do change it is harder to hack into your computer x86: name of the architecture of Intel chip The first Intel chip that matters to us is the 8086, second 8186,, Pentium Pro x86 (because x is a variable) Local variables are stored on the stack; they are stored roughly 3/4ths of memory and grown downwards (towards 0x00) Global variables start close to 0. What happens when they crash into each other? seg fault Heap variables start close to 0 as well, and grow upwards (towards the stack)

Object: local variable, global variable, something allocated dynamically The compiler defines that every object is in separate memory (with a few exceptions), objects are stored in separate locations. Program behaves as if we have an infinite amount of memory. But this is not actually true. The compiler is not allowed to let things overlap, so it will cause a segmentation fault. We saw a stack overflow already when we had an infinite recursive stack. The abstract machine says that memory will never overlap. However, in the real machine we see that heap values will fill up the heap and local values will fill up the stack. (Cannot assign a value to a constant variable) // trying really hard to set the constant * (char*) &ch_constant = E ; // casting, telling the compiler to not report an error What s in between the heap and stack? How can we find out? hexdump(stderr, (&ch + &ch_heap), 1) // error We can t add pointers in C, BUT we can subtract pointers and we can add numbers. Still won t compile though because we can t subtract pointer types that aren t the same. &ch_heap + (&ch - (char *) &ch_heap ) / 2, ch_heap is a variable on the stack hexdump prints what s at the address passed in hexdump(&ch_heap) // print the entire value of the pointer address of ch_heap (in reverse) ch_heap + (&h - ch_heap) / 2 // core dumps Summary: We began with a metaphor of a postman in a post office. Bottom line: Nothing between the heap and the stack (except maybe dragons) Why? However, if the post office touches many of the boxes he dies Because, there are other people in the world who are spambots; lots of CS folks trying to stop them. Best way to stop them: As soon as a program is going wrong - stop them (why seg fault happens and why novice C programmers hate it) Good property for improved protection of your programs

Problem Set: You will create a better memory allocator that helps detect when you access invalid programs. let s gooooo Love, Evan, Stephen, Ruth