Linked Lists and Abstract Data
A brief comparison
24 March 2011
Outline
Data
Data structures are a key idea in programming. It's just as important how you store the data as what you do to it. A firm grasp of these structures aids efficient coding, and is critical to more complex algorithms.
Some more
All data is stored as bits. As you will see, you can impose some clever structures (patterns) on how you interpret them, to make things easier (or harder). The key is that these data structures are custom and abstracted: they define ways of extracting, adding and removing elements without worrying about the implementation.
Abstract Data
We will be looking at: arrays, linked lists, priority queues, hash tables, and sorted sets. For each: concept, analysis, application.
Arrays: Concept
Can be thought of as many individual variables in memory. These are stored in one long list, and each one has a unique index (computers count from 0). A key component in nearly all languages. In general, there are no restrictions on what each element can be; that is, arrays nest arbitrarily.

Index:    0    1    2    3   ...
Element:  e0   e1   e2   e3  ...
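As a quick illustration (a sketch using Python lists, which behave like dynamic arrays; the variable names are my own):

```python
# A Python list behaves like an array: contiguous and zero-indexed.
scores = [72, 85, 91, 64]

first = scores[0]      # index 0 holds the first element
third = scores[2]      # direct access by index -> 91

# Elements can themselves be lists, so arrays nest arbitrarily.
grid = [[1, 2], [3, 4]]
corner = grid[1][0]    # row 1, column 0 -> 3
```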
Analysis
We know the start of an array, and if we know the size of each element (e.g. an IEEE 32-bit int), we can jump directly to each memory location. Thus, retrieving/setting at an index is O(1). What about resizing? If the array has more allocated space than it currently uses, adding to the end is O(1); anywhere else is O(n). Why is this the case? Determining if something is in the array is O(n).
Application
Keep track of many related variables. Manipulate data that is not likely to grow or shrink too much, where fast (essentially constant-time) access for read and write is necessary. Allows iterative application of some operation to a whole range of variables without calling it on each one explicitly.
Singly-linked List: Concept
In essence, one long chain comprised of nodes. Each node stores some data and points to the next node in the list. You have a variable that points to the first node, and thus the entire list is accessed through it. The last node does not point to anything of interest:

Head → … → Tail → null
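A minimal sketch of such a node in Python (the class and field names are my own choice, not fixed by the slides):

```python
class Node:
    """One link in the chain: some data, plus a pointer to the next node."""
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

# Build the list Head -> "A" -> "B" -> null by chaining nodes together.
head = Node("A", Node("B"))
# head.next.next is None: the last node points to nothing of interest.
```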
Retrieving
Clearly, the only item we can get to directly is the first one; to get to the others we need to walk the list. We create an iterator and point it at the first node. Then, whenever we want the next value: iterator ← iterator.next. This can continue until iterator.next = null. However, we can only do this in one direction, so be sure not to miss something the first time.
Inserting
Consider adding a new node C into the singly-linked list A → B → D → null, between B and D. Seeing as there is only one link in each node, we only need to ensure that the thing before the new item points to it. That means: C.next ← B.next, then B.next ← C (patch it in). Does order matter? Why? It doesn't matter where the node sits in memory. Adding to the end (tail) is easiest.
Deleting
Out of sight, out of memory. We most often depend on a garbage collector to delete items (though some languages deallocate explicitly). Consider deleting the node B in A → B → C → null. Only one node points to it, so: A.next ← A.next.next (the "skip"). Now either the garbage collector or the programmer removes it from memory. Clearly, deleting from the front (head) is easiest.
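The three operations above (walking, patching in, skipping) can be put together in one hedged Python sketch; the Node class and helper names are my own, not from the slides:

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

def traverse(head):
    """Walk the list from the head, collecting each node's data."""
    out, it = [], head
    while it is not None:        # one direction only!
        out.append(it.data)
        it = it.next
    return out

def insert_after(node, data):
    """Patch a new node in: new.next <- node.next, then node.next <- new."""
    node.next = Node(data, node.next)

def delete_after(node):
    """Skip the next node; the garbage collector reclaims it."""
    node.next = node.next.next

head = Node("A", Node("B", Node("D")))
insert_after(head.next, "C")     # A -> B -> C -> D
delete_after(head)               # A -> C -> D  (B is skipped, then collected)
```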
Analysis
Deleting the first node, and adding to the front or end (given a tail pointer): O(1). Otherwise it is O(n) to insert and delete. Retrieving the first and last nodes: O(1)*. Otherwise it is O(n) (we need to traverse the list). Therefore, you can treat this as a queue or a stack and get read and write in O(1)! Determining if something is in the list is O(n).
(* assuming we also keep a pointer to the tail)
Applications
A good way to implement many abstract data structures (stacks, queues, associative arrays, ...), and thus can be used as a buffer. Simulates sequential-access devices. Represents a hierarchical relationship well (with objects). Limited by its one-way-ness.
Implementation
How would we implement this with what we already know (arrays)? Recall that arrays can nest arbitrarily, and that a node has some data and points to another node. Consider A → B → C → null:
A = [data_a, B]
B = [data_b, C]
C = [data_c, null]
Or, in general:
SLL = [dat1, [dat2, [dat3, [... [datn, null] ...]]]]
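This array-only encoding runs as-is in Python, with `None` standing in for null (a sketch; the placeholder data values are arbitrary):

```python
# A singly-linked list built only from two-element lists: [data, next].
C = ["data_c", None]
B = ["data_b", C]
A = ["data_a", B]

# The general nested form from the slide, innermost node last:
sll = ["dat1", ["dat2", ["dat3", None]]]

# Walking it is the same pointer-chasing as with node objects.
node, items = A, []
while node is not None:
    items.append(node[0])   # the data slot
    node = node[1]          # the "next" slot
```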
Doubly-linked List: Concept
Almost entirely like the SLL; however, each node also has a prev pointer. This makes it far easier to traverse the list (in either direction).
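A minimal sketch of the extra pointer (again, names are my own):

```python
class DNode:
    """A doubly-linked node: data plus next AND prev pointers."""
    def __init__(self, data):
        self.data = data
        self.next = None
        self.prev = None

# Link two nodes in both directions.
a, b = DNode("A"), DNode("B")
a.next, b.prev = b, a

# We can now walk backwards from the tail, which an SLL cannot do.
```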
Analysis Clearly, accessing the first and last elements is still O(1) Inserting at the end and front, and deleting from front are still O(1) However, now deleting from end is O(1) Traversing is still O(n) Deleting or inserting elsewhere is also O(n) (technically Θ(1)+search time, but this is O(n) for our purposes)
Stacks and Queues: Concept
A stack is a data structure that only allows you to stack things on top of each other: push and pop only the top element (FILO). Used everywhere: compilers, interpreters, CPUs, ... A queue is similar in concept to the queues you have stood in: enqueue at the back, dequeue at the front (FIFO). Used mostly as buffers.
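Both disciplines in a short Python sketch (using a plain list for the stack and the standard library's `collections.deque` for the queue):

```python
from collections import deque

# Stack: push and pop at the same end (FILO).
stack = []
stack.append(1)
stack.append(2)
stack.append(3)
top = stack.pop()          # 3: the last thing pushed comes off first

# Queue: enqueue at the back, dequeue at the front (FIFO).
queue = deque()
queue.append("a")
queue.append("b")
front = queue.popleft()    # "a": the first thing enqueued comes out first
```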
Priority Queues: Concept
Related to a queue, save that when an element is added, it is added with some priority. So when dequeueing/popping, the element with the next highest priority is extracted (VIP service, so to speak). We can now see that stacks and queues are just special cases of priority queues. Queue: priority is monotonically decreasing (last in has lowest priority). Stack: priority is monotonically increasing (last in has highest priority).
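A sketch using Python's standard-library `heapq` (a binary min-heap, matching the heap row of the analysis: lower numbers are served first, so I encode "highest priority" as the smallest number):

```python
import heapq

# Store (priority, item) pairs; heapq always pops the smallest pair first.
pq = []
heapq.heappush(pq, (2, "regular"))
heapq.heappush(pq, (0, "VIP"))
heapq.heappush(pq, (1, "member"))

served = heapq.heappop(pq)[1]   # "VIP": priority 0 beats 1 and 2
```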
Analysis
Performance is implementation specific:

Type           Insert     Delete          Access
Unsorted list  O(1)       O(n)            O(n)
Sorted list    O(n)       O(1) {+ O(n)}   O(1) {+ O(n)}
Heap           O(log n)   O(log n)        O(log n)
Hash-Table: Concept
Arrays are indexed by position; what if we want to store (key, value) pairs? Hash-tables provide the capacity to store a value of some type by referencing an associated key of some type. It is, in essence, an array plus a function that transforms the key into a (mostly) unique index. In Python we have the dictionary, and in Java it's called a HashMap.
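A sketch of the idea via Python's built-in dictionary (keys and values here are made up for illustration):

```python
# A dict hashes each key to a (mostly) unique slot in an internal array.
ages = {}
ages["alice"] = 30         # insert by key, not by position
ages["bob"] = 25
ages["alice"] = 31         # same key hashes to the same slot: overwrite

has_bob = "bob" in ages    # membership via the hash, not a linear scan
```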
Analysis
Assuming the hash function (key → index) takes constant time: Inserting a new element ("anywhere", so to speak) is O(1) on average (O(n) in the worst case, when many keys collide or the table must grow). Accessing an element from any key is O(1). Deleting an element ("anywhere") is O(1). Determining whether any key is in the table is also O(1). It is pointless to traverse the items (by this I mean non-trivial).
Applications Whenever data needs to be indexed by a non-integral key That is, anything that relates some data to some other data Transformation tables Spell checking
Ordered Set: Concept
Again analogous to an array, but we need to keep some set of data in a strict total ordering. Through magic and clever structures, whenever some data is added to the set, it is placed in the correct location. This also makes getting data out greatly simplified in some cases.
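Python's standard library has no balanced search tree, so here is a minimal (unbalanced) binary-search-tree sketch of the idea; the O(log n) bounds from the analysis hold only while the tree stays balanced, which real implementations (e.g. red-black trees) guarantee and this sketch does not:

```python
class BSTNode:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Place key in its sorted position; returns the (possibly new) root."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root                  # equal keys are ignored: it is a set

def contains(root, key):
    """Walk down the tree, halving the search space at each step."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False

def in_order(root):
    """Left-root-right traversal yields the keys in sorted order."""
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)

root = None
for k in [5, 2, 8, 1, 5]:
    root = insert(root, k)
```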
Analysis
Inserting an element is O(log n). Deletion is also of this order; in fact, so is accessing an element, and so is determining whether or not an element is in the set.
Applications Whenever we need to maintain an ordered list of things Simplify queries on, and insertions and deletions of ordered data
Access, Insert, Delete, N.I.D.

DS              Access     Insert     Delete     Not In Data
Array           O(1)       O(n)       O(n)       O(n) (S/D)
Linked List     O(n)*      O(n)*      O(n)*      O(n)
Hash-Table      O(1)       O(1) avg   O(1)       O(1)
Ordered Set     O(log n)   O(log n)   O(log n)   O(log n)
Priority Queue  O(log n)   O(log n)   O(log n)   O(log n)

(* O(1) at the head or tail)
Input/Output
I have given you plenty of input. Now is the best time of the lecture (no, not the end): I get to ask you things and get your output. Based on the previous slide, and your opinions, what would you use for each of the following, and why?
(Software) interrupt request storage?
Representing the vertices of a 3D polygon?
Keeping track of prime numbers for fast factorisation?
Function and argument application? (think RPN)
Retrieving the color of a sprite depending on its shape?
Emulating an old tape-drive?
Calculating stats on class test marks?