CA341 - Comparative Programming Languages
David Sinclair

Dynamic Data Structures

Generally we do not know how much data a program will have to process. There are 2 ways to handle this:

- Create a fixed data structure. This is very inefficient and wasteful, especially when the individual data elements vary in size.
- Create a dynamic data structure that grows and shrinks as required. As more data arrives we request memory from a memory pool. When we no longer need the memory holding the data, it is returned to the memory pool.

A pointer is a value that is a reference to a block of memory. Many languages support pointers, either explicitly (such as C, C++ and Ada) or implicitly (such as Java).

In languages designed for system-level programming, pointers are also used to hold the addresses of memory-mapped hardware resources.
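To make the idea of a structure that grows as data arrives concrete, here is a minimal sketch in C. It assumes the standard library calls realloc and free as the interface to the memory pool; the type IntBuffer and the functions buffer_init, buffer_push and buffer_release are hypothetical names chosen for this illustration, as is the policy of doubling the capacity each time the block fills up.

```c
#include <stdlib.h>

/* Hypothetical growable buffer of ints: capacity is the number of slots
   currently allocated, length is the number of slots in use. */
struct IntBuffer {
    int *data;
    size_t length;
    size_t capacity;
};

/* Start empty: no memory is requested until data arrives. */
void buffer_init(struct IntBuffer *b)
{
    b->data = NULL;
    b->length = 0;
    b->capacity = 0;
}

/* Append one value, growing the block when it is full.
   Returns 1 on success, 0 if the request for memory fails. */
int buffer_push(struct IntBuffer *b, int value)
{
    if (b->length == b->capacity) {
        /* Assumed growth policy: start at 4 slots, then double. */
        size_t new_capacity = (b->capacity == 0) ? 4 : b->capacity * 2;
        int *bigger = realloc(b->data, new_capacity * sizeof *bigger);
        if (bigger == NULL)
            return 0;           /* the old block is still valid */
        b->data = bigger;
        b->capacity = new_capacity;
    }
    b->data[b->length++] = value;
    return 1;
}

/* Return the block to the memory pool and reset the buffer. */
void buffer_release(struct IntBuffer *b)
{
    free(b->data);
    buffer_init(b);
}
```

A caller would invoke buffer_init once, buffer_push for each arriving value, and buffer_release when the data is no longer needed. A fixed-size array would instead have to be dimensioned for the worst case in advance.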
Managing Dynamic Memory

There are 2 operations on dynamic memory:

- Allocate memory. Most languages handle this in essentially the same way: you request a block of memory big enough to hold a defined data element. The run-time system allocates a sufficient block of memory from the memory pool and returns a pointer to the allocated block.
- Release memory. The allocated memory is released back to the memory pool. This is where languages differ. In some languages, like C, the release of memory is controlled explicitly by the programmer. If memory is not released correctly, it leaks and becomes lost. In other languages, like C++, it is a mix of explicit programmer control and data structure lifetime. In some languages, like Java, it is totally automatic, and this process is called garbage collection.

Pointers and Dynamic Memory in C

In C, a pointer is declared as type *name; A pointer holds an address in memory.

- The address of a variable called name is &name.
- *ptr accesses the memory that a pointer ptr points to.
- p->x accesses the element x of a struct that p points to.

    int j = 1, k = 3;
    int *ip;

    ip = &j;
    printf("%d\n", (*ip)++);

There are several library functions (declared in stdlib.h) that request a block of dynamic memory. The most common is malloc(size), which returns a pointer to the allocated block. If there is insufficient memory, it returns the value NULL.

The library function that releases the dynamic memory pointed to by p is free(p).
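The pointer operations and the malloc/free pair described above can be sketched together as follows. The struct Point type and the helper names post_increment, point_create and point_destroy are assumptions made for this illustration and are not part of the lecture's own code.

```c
#include <stdlib.h>

/* Hypothetical record type used only for this illustration. */
struct Point {
    int x;
    int y;
};

/* Increment the int that ip points to and return its previous value.
   The caller passes an address obtained with the & operator. */
int post_increment(int *ip)
{
    return (*ip)++;
}

/* Request a block big enough for one struct Point.
   Returns NULL when malloc cannot supply the memory. */
struct Point *point_create(int x, int y)
{
    struct Point *p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->x = x;       /* p->x is shorthand for (*p).x */
    p->y = y;
    return p;
}

/* Release the block; the pointer must not be used afterwards. */
void point_destroy(struct Point *p)
{
    free(p);
}
```

Checking the result of malloc against NULL before using it, and pairing every successful allocation with exactly one call to free, are the two habits this sketch is meant to show.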
Lists in C

The following is one way to implement lists in C. It is not the best, as it is specific to a list of ints. C++ offers a more general alternative in the Standard Template Library (STL), which allows the programmer to create and manipulate lists of different types.

    /* Example list program with pointers */
    /* and memory management in C */
    #include <stdio.h>
    #include <stdlib.h>

    struct Node {
        int item;
        struct Node *next;
    };

    void add_to_list(struct Node **list, int value)
    {
        struct Node **ptr = list;
        struct Node *new_node = malloc(sizeof(struct Node));

        if (new_node == NULL)    /* malloc failed: leave list unchanged */
            return;
        new_node->item = value;
        new_node->next = *ptr;   /* new node points to the old head */
        *ptr = new_node;         /* new node becomes the head */
        return;
    }

    void print_list(struct Node *list)
    {
        struct Node *ptr = list;

        printf("[ ");
        while (ptr != (struct Node *)NULL) {
            printf("%d ", ptr->item);
            ptr = ptr->next;
        }
        printf("]\n");
        return;
    }
    void delete_item(struct Node **list, int item)
    {
        struct Node *ptr, *prev = (struct Node *)NULL;

        ptr = *list;
        while (ptr != (struct Node *)NULL && ptr->item != item) {
            /* relies on order of evaluation */
            prev = ptr;
            ptr = ptr->next;
        }

        if (ptr != (struct Node *)NULL) {        /* found */
            if (prev == (struct Node *)NULL)     /* 1st in list? */
                *list = ptr->next;
            else
                prev->next = ptr->next;          /* if not, previous node */
                                                 /* points to next node */
            free(ptr);                           /* release current node */
        }
        return;
    }

    int main(int argc, char **argv)
    {
        struct Node *my_list = (struct Node *)NULL;

        delete_item(&my_list, 1);
        print_list(my_list);
        add_to_list(&my_list, 1);
        add_to_list(&my_list, 2);
        add_to_list(&my_list, 3);
        add_to_list(&my_list, 4);
        print_list(my_list);
        delete_item(&my_list, 4);
        delete_item(&my_list, 2);
        print_list(my_list);
        delete_item(&my_list, 5);
        print_list(my_list);
        return (0);
    }
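The functions above that modify the list take the address of the head pointer instead of the head pointer itself. The reason is that C passes arguments by value. The following sketch contrasts the two choices; push_by_value and push_by_address are hypothetical names used only here, and the Node type is repeated so that the fragment stands alone.

```c
#include <stdlib.h>

struct Node {
    int item;
    struct Node *next;
};

/* Receives a copy of the head pointer: the caller's head is unchanged.
   The new node is returned so that the caller can still reach it. */
struct Node *push_by_value(struct Node *list, int value)
{
    struct Node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->item = value;
    n->next = list;
    list = n;           /* updates the local copy only */
    return n;
}

/* Receives the address of the head pointer, so it can update the
   caller's variable directly. Returns 1 on success, 0 on failure. */
int push_by_address(struct Node **list, int value)
{
    struct Node *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    n->item = value;
    n->next = *list;
    *list = n;          /* updates the caller's head */
    return 1;
}
```

After calling push_by_value(head, 1) with head equal to NULL, head is still NULL in the caller, whereas push_by_address(&head, 2) leaves head pointing at the new node.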
Some things to note:

- This is a singly linked list where each Node in the list contains a pointer to the next Node in the list. The next pointer in the last node is NULL.
- struct Node **list defines list as a pointer to a pointer to a Node. The first argument of add_to_list and delete_item is the address of the list, &my_list.
- Every time we add an element we request a block of dynamic memory to hold a Node.
- Very careful programme structure is required to ensure that free() is called whenever a block of dynamic memory is no longer needed. This programme leaks memory. Where?

Lists in C++

The same programme, but this time in C++. In C++, new allocates dynamic memory and delete releases dynamic memory.

    #include <iostream>
    using namespace std;

    class List {
        struct Node {
            int item;
            Node *next;
        };
        Node *head;

    public:
        List()    // constructor, called when an instance comes alive
        {
            head = NULL;
        }
        ~List()   // destructor, called when an instance's lifetime ends
        {
            while (head != NULL) {
                Node *n = head->next;
                delete head;
                head = n;
            }
        }

        void add(int value)
        {
            Node *n = new Node;
            n->item = value;
            n->next = head;
            head = n;
        }

        void remove(int item)
        {
            Node *n = head, *prev = NULL;

            while (n != NULL && n->item != item) {
                prev = n;
                n = n->next;
            }
            if (n != NULL) {
                if (prev == NULL)
                    head = n->next;
                else
                    prev->next = n->next;
                delete n;
            }
        }
        void print(void)
        {
            Node *n = head;

            cout << "[ ";
            while (n != NULL) {
                cout << n->item << " ";
                n = n->next;
            }
            cout << "]\n";
        }
    };

    int main(int argc, char **argv)
    {
        List my_list;

        my_list.remove(1);
        my_list.print();
        my_list.add(1);
        my_list.add(2);
        my_list.add(3);
        my_list.add(4);
        my_list.print();
        my_list.remove(4);
        my_list.remove(2);
        my_list.print();
        my_list.remove(5);
        my_list.print();
        return (0);
    }

This C++ programme doesn't leak memory like the equivalent C programme. Why?
Lists in Java

The same programme again, but this time in Java. In Java, new allocates dynamic memory, but there is no explicit memory release. Instead, dynamic memory that is no longer used is recovered by a garbage collection process.

File: LinkedList.java

    class Node {
        public int item;
        public Node next;

        public Node() {
            next = null;
        }

        public void display() {
            System.out.print(item + " ");
        }
    }

    public class LinkedList {
        private Node head;

        public LinkedList() {
            head = null;
        }

        public void add(int item) {
            Node newnode = new Node();
            newnode.item = item;
            newnode.next = head;
            head = newnode;
        }
        public void remove(int item) {
            Node temp = head, prev = null;

            while (temp != null && temp.item != item) {
                prev = temp;
                temp = temp.next;
            }
            if (temp != null) {
                if (prev == null)
                    head = temp.next;
                else
                    prev.next = temp.next;
            }
        }

        public void print() {
            Node temp = head;

            System.out.print("[ ");
            while (temp != null) {
                temp.display();
                temp = temp.next;
            }
            System.out.println("]");
        }
    }

File: LinkedListMain.java

    public class LinkedListMain {
        public static void main(String args[]) {
            LinkedList mylist = new LinkedList();

            mylist.remove(1);
            mylist.print();
            mylist.add(1);
            mylist.add(2);
            mylist.add(3);
            mylist.add(4);
            mylist.print();
            mylist.remove(4);
            mylist.remove(2);
            mylist.print();
            mylist.remove(5);
            mylist.print();
            return;
        }
    }

All the dynamic memory recovery happens quietly in the background. The cost is that you never know when garbage collection will take place or how long it will last. This is an issue for real-time applications.

Garbage Collection

We will focus on the Java garbage collection process in order to explain the basic operation of garbage collection in other languages.

The memory used to store Java objects is called the heap. Whenever the new operator is called to create a new object, sufficient memory is allocated from the heap. The new operator returns a reference to the allocated heap memory.

There are two major approaches to garbage collection:

- reference counts
- mark and sweep

Reference Counts

Every time a reference to a block of allocated heap memory is assigned to a variable, the reference count for that block of allocated heap memory is incremented.
When a variable that contains a reference to a block of allocated heap memory goes out of scope, the reference count for that block of allocated heap memory is decremented.

When the reference count for a block of heap memory becomes zero, that block of memory is no longer used and is available for garbage collection.

Mark and Sweep

Mark and sweep is the approach used in Java's garbage collection. Java's garbage collection occurs in the Java Virtual Machine (JVM). The JVM can implement garbage collection any way it wishes as long as it meets the JVM Specification. The most common JVM is Oracle's HotSpot JVM. HotSpot has several garbage collectors, each optimised for different use cases. However, they all follow the same basic structure:

They are stop-the-world events. All other threads/activities are suspended until the garbage collection process finishes.

They initially perform a mark and sweep. The mark phase starts with every garbage collection root. These are:

- local variables and input parameters of currently executing methods
- active threads
- static fields of loaded classes
- JNI references

The garbage collector traverses the object graph from the garbage collection roots. Every object that the garbage collector visits is marked as alive. In the sweep phase, unmarked objects are ready for garbage collection and their memory is returned to the heap.
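The mark and sweep phases described above can be sketched in a few lines of C. This is a toy model under stated assumptions, not how HotSpot is implemented: the heap is a fixed pool of eight slots, each object holds at most two references, and the roots are passed in as an explicit array. All names (Object, gc_alloc, mark, gc_collect) are hypothetical.

```c
#include <stddef.h>

#define POOL_SIZE 8
#define MAX_REFS  2

/* Hypothetical heap object: in_use says whether the slot is allocated,
   marked is set during the mark phase, refs are outgoing references. */
struct Object {
    int in_use;
    int marked;
    struct Object *refs[MAX_REFS];
};

static struct Object pool[POOL_SIZE];   /* stands in for the heap */

/* Take a free slot from the pool, or return NULL if none is left. */
struct Object *gc_alloc(void)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = 1;
            pool[i].marked = 0;
            for (size_t r = 0; r < MAX_REFS; r++)
                pool[i].refs[r] = NULL;
            return &pool[i];
        }
    }
    return NULL;
}

/* Mark phase: visit everything reachable from obj. The marked flag
   also stops the traversal from looping on cyclic references. */
static void mark(struct Object *obj)
{
    if (obj == NULL || obj->marked)
        return;
    obj->marked = 1;
    for (size_t r = 0; r < MAX_REFS; r++)
        mark(obj->refs[r]);
}

/* Run one collection from the given roots and return how many
   objects were reclaimed. */
size_t gc_collect(struct Object **roots, size_t root_count)
{
    size_t reclaimed = 0;

    for (size_t i = 0; i < root_count; i++)
        mark(roots[i]);

    /* Sweep phase: unmarked slots go back to the pool; marks are
       cleared on the survivors, ready for the next collection. */
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (pool[i].in_use && !pool[i].marked) {
            pool[i].in_use = 0;
            reclaimed++;
        }
        pool[i].marked = 0;
    }
    return reclaimed;
}
```

In this model, two objects that refer only to each other and are not reachable from any root are both reclaimed, because reachability from the roots, not the existence of a reference, decides what survives.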
Memory can optionally be compacted after the garbage collector deletes objects, so that the remaining allocated blocks occupy a contiguous region of memory at the start of the heap. This makes it easier to allocate memory to new objects sequentially, directly after the memory allocated to existing objects.

They are generational. Generational garbage collectors work on the assumption that most objects are short-lived and will be ready for garbage collection soon after creation.

In the HotSpot JVM, the heap is divided into three sections.

- The first section is called the Young Generation. It consists of Eden, where all new objects are created, and two survivor areas to which objects are moved from Eden if they survive a garbage collection cycle.
- The second section is called the Old Generation, to which long-lived objects from the Young Generation are eventually moved.
- The third section is called the Permanent Generation. This contains the programme's classes and methods. Classes that are no longer in use may be garbage collected from the Permanent Generation. (This applies to HotSpot versions before Java 8; from Java 8 onwards class metadata is held in a separate area called Metaspace.)

A key feature of the garbage collection process in Java is that it is non-deterministic: there is no way to predict when it will occur at run time. It is possible to give the JVM a hint in the code to run the garbage collector by calling System.gc() or Runtime.getRuntime().gc(), but these calls provide no guarantee that the garbage collector will actually run.
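For contrast with the tracing collectors above, the reference-counting approach introduced at the start of the garbage collection discussion can be sketched in C as follows. C has no automatic hook for assignment or scope exit, so this sketch assumes the programmer calls rc_retain and rc_release by hand at those points; the type RcInt and all function names are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical reference-counted block holding a single int. */
struct RcInt {
    int value;
    unsigned count;     /* number of live references to this block */
};

/* Allocate a block; the reference returned to the caller counts as one. */
struct RcInt *rc_new(int value)
{
    struct RcInt *p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->value = value;
    p->count = 1;
    return p;
}

/* Called when the reference is assigned to another variable. */
struct RcInt *rc_retain(struct RcInt *p)
{
    if (p != NULL)
        p->count++;
    return p;
}

/* Called when a variable holding the reference goes out of scope.
   Returns 1 if this was the last reference and the block was freed. */
int rc_release(struct RcInt *p)
{
    if (p == NULL)
        return 0;
    if (--p->count == 0) {
        free(p);
        return 1;
    }
    return 0;
}
```

Unlike mark and sweep, the block is reclaimed at a predictable moment, the instant its count reaches zero. The known weakness of a pure reference-counting scheme is that objects referring to each other in a cycle never reach a count of zero.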