CSCI S-Q Lecture #12 7/29/98 Data Structures and I/O

Similar documents
Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering Indian Institute of Technology, Delhi.

Computer Science 210 Data Structures Siena College Fall Topic Notes: Trees

printf( Please enter another number: ); scanf( %d, &num2);

CS113: Lecture 7. Topics: The C Preprocessor. I/O, Streams, Files

Binary Trees

DATA STRUCTURE : A MCQ QUESTION SET Code : RBMCQ0305

Postfix (and prefix) notation

Binary Search Trees Treesort

Lecture 12 CSE July Today we ll cover the things that you still don t know that you need to know in order to do the assignment.

There are many other applications like constructing the expression tree from the postorder expression. I leave you with an idea as how to do it.

CS113: Lecture 5. Topics: Pointers. Pointers and Activation Records

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Sorting lower bound and Linear-time sorting Date: 9/19/17

CSCI 171 Chapter Outlines

Section 05: Solutions

CS112 Lecture: Working with Numbers

Intro to Programming. Unit 7. What is Programming? What is Programming? Intro to Programming

Intro. Scheme Basics. scm> 5 5. scm>

Variables and literals

Lecture Notes on Memory Layout

These are notes for the third lecture; if statements and loops.

CS112 Lecture: Variables, Expressions, Computation, Constants, Numeric Input-Output

Trees Slow Insertion in an Ordered Array If you re going to be doing a lot of insertions and deletions, an ordered array is a bad choice.

Trees. Chapter 6. strings. 3 Both position and Enumerator are similar in concept to C++ iterators, although the details are quite different.

March 20/2003 Jayakanth Srinivasan,

Lecture 15 Notes Binary Search Trees

Section 05: Solutions

CS 161 Computer Security

For instance, we can declare an array of five ints like this: int numbers[5];

Lecture 15 Binary Search Trees

Lecture 7. Binary Search Trees and Red-Black Trees

CS61C Machine Structures. Lecture 4 C Pointers and Arrays. 1/25/2006 John Wawrzynek. www-inst.eecs.berkeley.edu/~cs61c/

Data Structure. IBPS SO (IT- Officer) Exam 2017

CS301 - Data Structures Glossary By

Access Intermediate

Why Use Binary Trees? Data Structures - Binary Trees 1. Trees (Contd.) Trees

DDS Dynamic Search Trees

QUIZ. What is wrong with this code that uses default arguments?

Treaps. 1 Binary Search Trees (BSTs) CSE341T/CSE549T 11/05/2014. Lecture 19

CSE100. Advanced Data Structures. Lecture 13. (Based on Paul Kube course materials)

COMP 250 Fall priority queues, heaps 1 Nov. 9, 2018

Data Structures - Binary Trees 1

Lecture 5. Treaps Find, insert, delete, split, and join in treaps Randomized search trees Randomized search tree time costs

Lecture Notes on Binary Decision Diagrams

Comp 11 Lectures. Mike Shah. June 26, Tufts University. Mike Shah (Tufts University) Comp 11 Lectures June 26, / 57

Binghamton University. CS-211 Fall Syntax. What the Compiler needs to understand your program

CS106B Handout 34 Autumn 2012 November 12 th, 2012 Data Compression and Huffman Encoding

Hash Tables. CS 311 Data Structures and Algorithms Lecture Slides. Wednesday, April 22, Glenn G. Chappell

Trees and Tree Traversals. Binary Trees. COMP 210: Object-Oriented Programming Lecture Notes 8. Based on notes by Logan Mayfield

Computational Optimization ISE 407. Lecture 16. Dr. Ted Ralphs

Computer Science 210 Data Structures Siena College Fall Topic Notes: Priority Queues and Heaps

COSC 2P95. Procedural Abstraction. Week 3. Brock University. Brock University (Week 3) Procedural Abstraction 1 / 26

What is version control? (discuss) Who has used version control? Favorite VCS? Uses of version control (read)

Introduction. for large input, even access time may be prohibitive we need data structures that exhibit times closer to O(log N) binary search tree

HW1 due Monday by 9:30am Assignment online, submission details to come

A set of nodes (or vertices) with a single starting point

V.S.B ENGINEERING COLLEGE DEPARTMENT OF INFORMATION TECHNOLOGY I IT-II Semester. Sl.No Subject Name Page No. 1 Programming & Data Structures-I 2

Slide Set 4. for ENCM 335 in Fall Steve Norman, PhD, PEng

Coding Workshop. Learning to Program with an Arduino. Lecture Notes. Programming Introduction Values Assignment Arithmetic.

Visit ::: Original Website For Placement Papers. ::: Data Structure

CS61C : Machine Structures

Binary Trees. Height 1

CSE 373 OCTOBER 11 TH TRAVERSALS AND AVL

Lec 17 April 8. Topics: binary Trees expression trees. (Chapter 5 of text)

Discussion 2C Notes (Week 8, February 25) TA: Brian Choi Section Webpage:

Interactive MATLAB use. Often, many steps are needed. Automated data processing is common in Earth science! only good if problem is simple

CSE 12 Abstract Syntax Trees

CSE373: Data Structures & Algorithms Lecture 5: Dictionary ADTs; Binary Trees. Linda Shapiro Spring 2016

Trees. (Trees) Data Structures and Programming Spring / 28

CSE 143, Winter 2013 Programming Assignment #8: Huffman Coding (40 points) Due Thursday, March 14, 2013, 11:30 PM

Introduction to Programming in C Department of Computer Science and Engineering. Lecture No. #33 Pointer Arithmetic

Agenda. Peer Instruction Question 1. Peer Instruction Answer 1. Peer Instruction Question 2 6/22/2011

12 Abstract Data Types

Representation of Non Negative Integers

QUIZ. Source:

3137 Data Structures and Algorithms in C++

Provided by - Microsoft Placement Paper Technical 2012

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING UNIT-1

Uses for Trees About Trees Binary Trees. Trees. Seth Long. January 31, 2010


CSE373: Data Structures & Algorithms Lecture 9: Priority Queues and Binary Heaps. Nicki Dell Spring 2014

Physics 306 Computing Lab 5: A Little Bit of This, A Little Bit of That

CS 61c: Great Ideas in Computer Architecture

Variables and Data Representation

In the case of the dynamic array, the space must be deallocated when the program is finished with it. free(x);

9/29/2016. Chapter 4 Trees. Introduction. Terminology. Terminology. Terminology. Terminology

Section 5.5. Left subtree The left subtree of a vertex V on a binary tree is the graph formed by the left child L of V, the descendents

The tree data structure. Trees COL 106. Amit Kumar Shweta Agrawal. Acknowledgement :Many slides are courtesy Douglas Harder, UWaterloo

CSE100. Advanced Data Structures. Lecture 8. (Based on Paul Kube course materials)

Multiple choice questions. Answer on Scantron Form. 4 points each (100 points) Which is NOT a reasonable conclusion to this sentence:

Data Abstractions. National Chiao Tung University Chun-Jen Tsai 05/23/2012

DS ata Structures Aptitude

QUIZ. What are 3 differences between C and C++ const variables?

Slide Set 4. for ENCM 339 Fall 2017 Section 01. Steve Norman, PhD, PEng

Pointers cause EVERYBODY problems at some time or another. char x[10] or char y[8][10] or char z[9][9][9] etc.

(Refer Slide Time: 06:01)

Lecture Notes 16 - Trees CSS 501 Data Structures and Object-Oriented Programming Professor Clark F. Olson

Access Intermediate

File System Interface. ICS332 Operating Systems

Principles of Computer Science

Multi-way Search Trees! M-Way Search! M-Way Search Trees Representation!

Transcription:

CSCI S-Q Lecture #12 7/29/98 Data Structures and I/O Introduction The WRITE and READ ADT Operations Case Studies: Arrays Strings Binary Trees Binary Search Trees Unordered Search Trees Page 1

Introduction So far, we ve focused on how data structures can be represented by C constructs (structures, pointers, etc.) in memory. Today we will explore issues related to representations that can be used to store these data structures (or send them over a network) in such a way that we can recover them later. Note that we are not concerned with any other kind of operation. We will not explore issues such as how a binary tree can be implemented and manipulated as a sequence of disk blocks (this is interesting, but a separate topic) Page 2

The Write and Read ADT Operations Our goal is to add WRITE and READ operations to each ADT. WRITE - given an instance of an ADT, produce a FILE that represents that instance. READ - given a FILE produced by WRITE, construct a new instance of the ADT which is functionally equivalent to the original instance. Some ADT Write FILE Read Note: A FILE is an ADT that supports CREATE, DESTROY, READ and WRITE operations of its own. It can be implemented by an array of bits, disk file, backup device, network connection, etc. For our purposes, we ll use C s "stdio" library as our implementation of the FILE ADT. Page 3

Why Bother with Read and Write? WRITE and READ may be separated by time or events that the original data structure used to implement the ADT could not survive. WRITE and READ may be separated by space, i.e. be on different hosts. WRITE and READ may be separated by representational issues: different hardware or software architectures. What is meant by functionally identical? In this context, if X and Y are instances of the same ADT, X is functionally identical to Y if and only if all of the ADT operations of X produce the same results for Y. This doesn t mean that X and Y have to be implemented in precisely the same way! For example, pointer fields in X and Y are unlikely to contain the same addresses. In our examples, we will use the same implementation for X and Y - the READ operations will always reconstruct the same structure as the WRITE operations were performed on. Remember that our goal is to preserve ADT properties - avoid unrelated implementation details. Page 4

Case Study - Arrays Fixed-length "auto" arrays in C are easy to deal with: they are stored in contiguous memory, so if we know the address of the 0 th element in the array, the length of the array (in however many dimensions) and the type of each element, then a single fwrite will write it and a single fread will read it. Even if we re not so sure about the location of each element, we can simply write/read them one-at-a-time. Assumption - reader and writer implicitly know the size and type of the array. Page 5

Arrays of Unknown Size We can deal with arrays of unknown size by recording the size of the array (its length in each dimension) before the data. The information about the array is sometimes referred to as meta-data: data about the data. More commonly, this information is called the header of the data. Since we do not know how long the array data is before we read in the lengths, the only known location in the data is at the front. Therefore, it is easiest to store the length information at the beginning (or head) of the file. Later, we will see techniques that would allow us to put the header practically anywhere in the file - but this is an unnecessary complication for a simple structure like an array. For a 2-D array: R = number of rows C = number of columns R C data (Rows * Cols) data elements Assumption - reader and writer implicitly know the number of dimensions and type of each element. Page 6

Arrays of Unknown Dimension To remove the assumption that we already know the number of dimensions, we can also record this at the start of the file. Two methods: Method 1: The first value in the file is the number of dimensions. Method 2: The extents are listed, one by one, until a sentinel (i.e. -1, which is an illegal extent) is seen. The first method is easier to program, in general, and easier is better! For a 4x5x6 array: Method #1 3 4 5 6 data Method #2 4 5 6-1 data There is a third method that adds the number of dimensions in such a way that it usually requires no extra space, but it s a hack. Assumption - reader and writer implicitly know the type of each element. Page 7

Arrays of Unknown Type What kind of thing is each element in the array? Is it an int (and if so, how many bytes does it use - and how many bits per byte, what byte order, what representation, etc.) These are very important considerations when transferring information between machines. Q: But now we have a chicken-and-egg problem - how can the writer tell the reader what the reader needs to know in order to read what the writer has written, without reading this info? A: The reader and writer must agree on a canonical form to represent this info, or represent their data in canonical form! As the number of distinct representations dwindle, the canonical representation approach is getting more popular and practical. Examples: Java - defines all basic types completely WWW - HTML, JPEG, GIF, etc. Page 8

Data Descriptions However, it is still necessary to include some info about the type, so the reader can decipher it. type info data "What is it?" "I don t know, but it came with instructions!" Note: Use the header type info to describe as much as you can! And once described, don t repeat yourself... BAD: GOOD: "Here are 5 things. The first is a 32-bit integer, big-endian, two s complement notation. The second is a 32-bit integer, big-endian, two s..." "All numbers in this file are 32-bit integer, bigendian, two s complement notation..." "Here are 5 numbers." Page 9

Case Study - Strings It s easy to visualize a FILE as an array, but this is not always the most convenient way of considering the data. Another way of visualizing a FILE is as a string of bytes (or even as a string of bits). Unlike most of the examples of data structures that we have explored, which contain only keys, in real use the data stored in our data structures consist of records consisting of several fields. For this case study, we will assume that each of the fields is an ASCII string of unknown length. We could manage this by treating each string as an array, and using the methods we have already explored. Instead, we ll explore another approach - treating each record as a string itself, storing the information about the record inside the string itself. We will use a delimiter to represent the boundary between each field in a record (and possibly a different delimiter to represent the boundary between records). A delimiter must be a string that is distinct from any substring in the data (so we won t get confused about whether a particular substring is a delimiter, or part of the data). If the data is unrestricted (and might actually contain the delimiter itself), this can be challenging, but we will see how this can be done. Page 10

Delimited Strings I am using the convention of using a colon : as the delimiter. Any character (or string) could be used instead. (Colons are often used as delimiters in UNIX.) Schmoe:Joe:555-0246 \n Smith:Jane:555-1234 \n Smith:Joe:555-5678 \n Field delimiters special end-of-field char Record delimiter end-of-record char Contrast with a "header"-based representation: 6 3 8 SchmoeJoe555-0256 5 4 8 SmithJane555-1234 5 3 8 SmithJoe55 5-5678 headers Both represent the same info, but the string representation seems more "natural" in many situations. Page 11

Escaped Delimited Strings What do we do if the delimiter (or any of the delimiters, if there is more than one) appears in the data? Add an "escape" character that overrides the meaning of the delimiter. If a delimiter immediately follows the escape, then the delimiter is treated as an ordinary character. Note - I m using the convention of using a backslash as the escape character. Any character could be used instead. Examples abc:def abc\:def fields "abc", "def" field "abc:def" Q: So what do we do if the escape character appears in the data - especially immediately prior to the end of a field so that it is immediately followed by a delimiter)? A: Q: Are there any special cases where this will cause trouble? A: Note that in the worst case, if we pick a bad escape character, the escaped string could be twice as long as the original. Page 12

Annotated Strings Annotations are a combination of strings and what we ve done with arrays - they are a way to embed header-like information in the midst of a string (with this header itself possibly being a string of unknown length). Example: HTML tags (the "markups") <IMG SRC = "foobar.gif" ALT = "foo"> start delimiter end delimiter <H1> </H1> The escape is & However: The escaped form of < is < The escaped form of > is > The escaped form of & is & They chose not to escape the escape in the manner we have learned. Not very pretty. The HTML designers combined the escape mechanism with the ability to represent non-standard characters (by allowing multi-character representations of single characters, an enormous number of characters can be represented). Page 13

More Complicated Data Structures Using the mechanisms we have already seen, we can generalize to handle a number of other data structures: Lists, arrays, stacks, queues In these ADTs, order matters - the purpose of these ADTs is to represent an ordering of the elements. The details of the implementation (as an array, linked list, etc), are completely unimportant. Hash tables WRITE as if it was an array (or a delimited string, depending on data type considerations). READ must reconstruct the original ordering, using whatever implementation it prefers. Inclusion/Exclusion important - the only thing that matters is whether a key/value pair is present in the table, not where in the table it is. If we want to preserve the location of elements in the table, then we must record the hash function as well! WRITE - store key/value pairs as an array. READ - reconstruct table as needed directly from key/value pairs. (If need be, we can reconstruct a completely identical table.) Page 14

Writing and Reading Binary Trees We have seen several kinds of binary trees - expression trees, binary search trees, decision trees... What we need to preserve when we WRITE each of these is slightly different - what we need to preserve depends on what the tree represents. Binary Search Trees set of keys/values "balance" - if the tree is balanced, we may want to preserve this balance (or we might not care, as long as the resulting tree is as balanced). structure may be important, so if possible we will preserve it. Expression/Animal Trees the structure of the tree is absolutely essential! We know we can construct every data structure we ve seen in terms of these data structures, so once we have these we ll be ready for anything! Page 15

Binary Search Trees Let s imagine that we want to WRITE the contents of a tree in such a way that we get exactly the same tree when we READ it and reconstruct. Q: How can we do this without suffering too much? What tools do we already have? A: Consider the way that the ordinary TREEINSERT routine constructs the tree. When we construct a tree using ordinary TREEINSERT, we always add new nodes as leaves. In fact: The first node inserted is the root. The node s ancestors are always added to the tree before the node. The node s descendants are always added after the node. So, if we find a traversal that we can use to write out each node of a tree after its ancestors and before its descendants, we can just read in the keys, insert them into a new tree, and get back precisely the same tree! Q: What traversals have we seen that have this property? A: Q: Given your experiences with these traversals, which would you pick for your implementation? A: Page 16

Implementations of Tree Write and Read void treewrite (tree_t *tree, FILE *fout) { int rc; if (tree == NULL) return ; } /* Write the key (and value, if any). */ rc = fwrite (&tree->key, sizeof (int), 1, fout); assert (rc == 1); treewrite (tree->left, fout); treewrite (tree->right, fout); return ; void treeread (FILE *fin) { tree_t *tree = NULL; int key, rc; } for (;;) { /* Read the key (and value, if any). */ rc = fread (&key, sizeof (int), 1, fin); if (rc!= 1) break; tree = treeinsert (tree, key); } return (tree); Q: What is the big-o of treewrite? A: Q: What is the big-o of treeread? A: Page 17

Unordered Binary Trees - Three Methods With unordered binary trees (i.e. animal trees or expression trees), there is essential information in the structure of the tree itself. Typically, the values or keys stored in a node do not determine the location of the node in the tree - that information was determined by some other mechanism - perhaps a mechanism we (as writers and readers) don t even know. Method 1: If we do know the mechanism, then we can "record" the process of building the tree (or just the input to the process). For example, why store an expression tree when we can simply store the expression, if building the tree from an expression is easy? Page 18

Unordered Binary Trees - Method 2 Method 2: We can record the structure of the tree explicitly, as a string: A binary tree is either: A leaf - record the key/value of the leaf. An internal node (one or two non-null children) - This can be represented by a delimiter, the key/ value of the node, and then the representation of the children, followed by another delimiter, i.e. start end B A C ( B A C ) children We can use a special token to represent a NULL child. For this example, I ll use the string nil. Page 19

Unordered Binary Trees (ctd 2) Writing this is just another preorder walk: if (node == NULL) printf ("nil"); else if (!haschildren (node)) printf ("%s", node->key); else { printf ("(%s ", node->key); treewrite (node->left); printf (" "); treewrite (node->right); printf (")"); } Strictly speaking, the final ) is not necessary, but it can serve a useful purpose as a check. This "looks" nice (especially for expression trees, if you enjoy looking at prefix notation) / + 2 (/ (+ 12 (* 6 7))2) 12 * 7 6 Do not forget that this format was chosen for convenience, not because of necessity. The essential property of this format is that it can be created easily by a preorder walk, using delimiters to mark whether nodes are internal or external. Page 20

Unordered Binary Trees - Animal Trees Let s consider another example: animal trees - the same creatures you met in assignment 3. The algorithm is fundamentally the same as for the previous example, but the format is changed to make the result (perhaps) more human-readable: void anitreewrite (tree_t *tree, FILE *fout) { if (tree == NULL) return ; else if ((tree->yes == NULL) && (tree->no == NULL)) { fprintf (fout, "%c%s\n", ANIMAL_NODE_CHAR, tree->animal); } else { fprintf (fout, "%c%s\n", QUESTION_NODE_CHAR, tree->question); anitreewrite (tree->yes, fout); anitreewrite (tree->no, fout); } } Q: What is the big-o of anitreewrite? A: Page 21

Reading Animal Trees tree_t *anitreeread (FILE *fin) { tree_t *node = aninodecreate (NULL, NULL, NULL, NULL); char type = getchar (fin); /* get the type */ char *text = getline (fin); /* read a line... */ } switch (type) { case ANIMAL_NODE_CHAR: node->animal = text; break; case QUESTION_NODE_CHAR : node->question = text; node->yes = anitreeread (fin); node->no = anitreeread (fin); break } return (node); Q: What is the big-oh of anitreeread? A: Page 22

Unordered Binary Trees - Method 3 Method 3: Add a new field to each node which is used only by the WRITE and READ operations. (During all of the other tree operations, this field is completely ignored.) This field is used to store the inorder position of the node. (As an implementation detail, this field need not actually reside in the tree, and only needs to exist during the WRITE and READ operations. Conceptually, it s easier to think of it as always residing in each node.) WRITE READ Do an inorder traversal, filling in the position field of each node. Write the tree, using a preorder traversal (just as with a binary search tree). Along with the key/ value of each node, record its inorder position. Same as an ordinary binary search tree. INSERT each key/value using its inorder position as the key (instead of the real key). Requires extra storage, and is O(n log n) (best case), O(n 2 ) in the worst case, but is easy to implement. Page 23