
CS 135 Winter 2018
Graham, Nijjar

Assignment: 7
Due: Tuesday, March 13th, 2018 9:00pm
Language level: Beginning Student with List Abbreviations
Allowed recursion: Pure Structural and Structural Recursion with an Accumulator, unless otherwise specified
Files to submit: filedir.rkt, binencoding.rkt, encodingbonus.rkt
Warmup exercises: HtDP 14.2.1, 14.2.2, 15.1.1, 15.1.3, 18.1.1, 18.1.2, 18.1.3, 18.1.4, 18.1.5
Practice exercises: HtDP 14.2.3, 14.2.4, 14.2.6, 15.1.2, 15.1.4, 18.1.5

Make sure you read the OFFICIAL A07 post on Piazza for the answers to frequently asked questions.

Unless stated otherwise, all policies from Assignment 06 carry forward.

You may use the following list functions and constants: cons, cons?, empty, empty?, first, second, third, rest, list, append, length, string->list, list->string. You may not use reverse unless specified in the question.

Some correctness marks will be allocated for avoiding exponential blowups, as demonstrated by the max-list example in slide 07-6. (But note that not every repeated function invocation leads to an exponential blowup.)

You may not use local in this assignment. Sorry.

1. Note: For this question, you must use the provided a07lib.rkt file, which includes the structure definitions, data definitions, some sample data, and some functions to display data. To make use of these definitions and functions, you must place the a07lib.rkt file in the same directory (folder) as your assignment file for Question 1. At the beginning of your filedir.rkt, include the following:

    (require "a07lib.rkt")

As a test, in your assignment file, try using the function (fs-print sample-fs) in a file that you have saved, which uses the provided print function to display the information in the Dir that has been consumed. You may reuse the examples provided in the question specifications below or our sample data from a07lib.rkt, but you should ensure you have an appropriate number of examples and tests.
The structure definitions for a dir and a filedata are defined in the a07lib.rkt file so you must not include them in your file. It will cause an error if you try to include a structure definition using define-struct in your own file, as they are already defined in a07lib.rkt. If you want to include the structure definitions in your file for easy reference, be sure to have them commented out in your code.

For this question you will be working with files and directories similar to those you saw in A05. However, in this case the directory structure will be represented by a general tree. We call this tree a file system. The file data structure no longer requires a path as one of its fields, since the path of the file will be determined by its location within the tree.

We will use the following data definitions for representing a file system (file tree) on a computer. Note that file names and directory names may be repeated in multiple places within the tree, but pathnames must be unique. A name of a directory may be the same as the name of a file. Do not include the define-struct in your file as code (comment it out, see above note):

    (define-struct dir (name contents))
    ;; A Dir is a (make-dir Str FDList)

    ;; An FDList is one of:
    ;; * empty
    ;; * (cons File/Dir FDList)
    ;; requires: full names of files (name and format) within a
    ;;           given directory must be unique
    ;;           names of directories within a given directory must be unique

    ;; A File/Dir is one of:
    ;; * FileData
    ;; * Dir

    (define-struct filedata (name format size))
    ;; A FileData is a (make-filedata Str Sym Num)
    ;; requires:
    ;;   name is a non-empty string
    ;;     that contains only alphanumeric characters
    ;;   format is one of 'doc, 'txt, 'rtf, 'zip,
    ;;     'raw, 'jpg, 'wav, 'mp3
    ;;   size > 0, and represents the size of the file in kilobytes

The following constant definitions are used in the question specifications below.

    (define file1 (make-filedata "oldfile" 'txt 1))
    (define file2 (make-filedata "newfile" 'doc 10))
    (define dir0 (make-dir "emptydir" empty))
    (define dir1 (make-dir "onefile" (list file1)))
    (define dir1a (make-dir "onefilea" (list file1)))
    (define dir1b (make-dir "onefileb" (list file1)))
    (define dir2 (make-dir "twofiles" (list file1 file2)))
    (define dir2b (make-dir "twofiles" (list file1)))
    (define dir3 (make-dir "onesies" (list dir1 dir1a dir1b)))
    (define fs1 (make-dir "rootdir" (list file1 dir0 dir1 dir2 dir3 file2)))
    (define fs2 (make-dir "u" (list file2 dir0 dir1)))

Place your solutions in filedir.rkt.

(a) Provide templates for FileData, Dir, File/Dir and FDList.

(b) Write a function count-files that consumes a Dir and produces the total number of files in the entire file system tree. For example, (count-files dir0) produces 0, (count-files dir3) produces 3, and (count-files fs1) produces 8.

(c) A directory is empty if it does not contain any sub-directory or file. Write a function empty-dir-exist? that consumes a Dir and produces true if an empty directory exists anywhere in the tree and produces false otherwise.

(d) Write a function largest-filesize that consumes a Dir and produces the size of the largest FileData in the entire tree, or zero if there are no files.

(e) Every file and directory in a file system has a hierarchical name. For a target file or directory fd, the hierarchical name consists of the names of all the ancestor directories one must traverse to reach fd, followed by the name of fd itself, with the names separated by slashes (/). There is also a slash at the beginning of every file's hierarchical name. Write a function list-file-paths that consumes a Dir and produces a (listof Str) that contains the hierarchical names for all of the files in the entire file system. The function should produce the files' hierarchical names in the same order as generated by the provided fs-print function. You may use string-append in this question. For example, (list-file-paths fs2) produces (list "/u/newfile" "/u/onefile/oldfile").

(f) Write a function backup-fs that consumes a Dir and a Sym representing a file's format and produces a Dir containing only the files whose format matches the consumed symbol. Note that some of the directories of the file system will be empty if they only contained files that did not match the format being backed up. The backup Dir should have the same directory structure as the consumed Dir except any directory that contains no files and no subdirectories should be removed.
If there are no files to be backed up, the backup Dir should have an empty FDList. For example, (backup-fs fs1 'txt) will produce (make-dir "rootdir" (list file1 dir1 dir2b dir3)).

2. All information stored in a computer (characters, pixels, sound, etc.) is represented by a sequence of zeros and ones, known as a binary code. Each zero or one is known as a bit. There are many different encoding schemes used by computer applications to represent data. In some applications, the binary codes for different data values are all the same length. ASCII is an example of this type of encoding, where each character on a standard English keyboard is represented by an 8-bit code. For example, an A is represented by the ASCII code 01000001. Other applications may use encoding schemes where the codes for the data are different lengths. For example, you might represent an A with the code 1011 and represent a B with the code 01110. This question will have you working with binary encodings of characters where the codes have variable lengths.
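The fixed-length code quoted above for A can be checked in a scratch file. The following is only an illustrative sketch, not part of any solution; the names a-code, a-bits and a-byte are made up here, and the `#lang racket` line is an assumption so that the file runs on its own outside the teaching languages.

```racket
#lang racket
;; Scratch example only; not part of filedir.rkt or binencoding.rkt.

;; char->integer produces the numeric code of a character; for #\A it is 65.
(define a-code (char->integer #\A))

;; number->string with radix 2 writes that number in binary.
;; 65 = 64 + 1, so this is the 7-digit string "1000001".
(define a-bits (number->string a-code 2))

;; Padding on the left with one 0 gives the 8-bit form quoted in the text.
(define a-byte (string-append "0" a-bits))
```

Here a-byte is "01000001", the same sequence of bits given in the text. Every character takes 8 bits in this scheme, which is what the variable-length codes in this question relax.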

One way to assign variable length, binary codes to characters is to use a binary, leaf-labelled tree. The leaves of the tree contain characters you want to represent. A binary encoding of a character is determined by following a path from the root of the tree to the leaf that contains that character. As you follow the path, when you take a left branch, that represents a 0 in the code, and when you take a right branch, that represents a 1 in the code. For example, consider the binary tree that has the leaves with values #\A, #\B, #\C, and #\space.

[Figure: binary tree with leaves #\space, #\C, #\A and #\B; the position of each leaf follows from the table of encodings below.]

Given this tree, the character encodings are as follows:

    Character   Encoding
    #\A         100
    #\B         101
    #\C         11
    #\space     0

For this question you will work with the following data definitions:

    ;; An EncodingTree is a tree with at least two leaf nodes, and is one of:
    ;; * a Char
    ;; * (list EncodingTree EncodingTree)
    ;; In addition, an EncodingTree has the following properties:
    ;; * the internal nodes of an EncodingTree always have exactly
    ;;   two children
    ;; * the leaf node values are unique,
    ;;   i.e. there will be no duplicate values in the tree.

    ;; A BinaryCode is a non-empty string that contains
    ;; only the characters #\0 and #\1.

A starter file (binencoding.rkt) has been provided for you that includes the data definitions described above, as well as some constants that may be used for some of your tests and examples. It is a good idea to draw the trees that are represented by the constants, on paper, so it is easier to predict the expected values produced by your functions as you test them.
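As a small illustration of how the data definition lines up with the table of encodings, the example tree can be written out as an EncodingTree and navigated by hand. This is a sketch reconstructed from the table above, not a constant from the starter file; the name abc-tree and the four leaf names are made up here, and the `#lang racket` line is an assumption so that the file runs on its own.

```racket
#lang racket
;; Scratch example only. abc-tree is a hypothetical name; the shape of
;; the tree is reconstructed from the table of encodings in the text.
(define abc-tree
  (list #\space                     ; left branch of the root
        (list (list #\A #\B)        ; right branch, then left
              #\C)))                ; right branch, then right

;; Taking a left branch (a 0) corresponds to first;
;; taking a right branch (a 1) corresponds to second.
(define space-leaf (first abc-tree))                    ; code 0
(define c-leaf     (second (second abc-tree)))          ; code 11
(define a-leaf     (first (first (second abc-tree))))   ; code 100
(define b-leaf     (second (first (second abc-tree))))  ; code 101
```

Reading each chain of first and second from the outside in, right to left, gives the bits of the code in order: for a-leaf, second then first then first, which is 100.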

Place your solutions in binencoding.rkt.

(a) Write a function called max-code-length that consumes an EncodingTree and produces a Nat which is the length of the longest code for any of the characters in the EncodingTree. For example, (max-code-length big-nlt) produces 5, and (max-code-length (list #\q #\z)) produces 1.

(b) Write a function called bincode->char that consumes a BinaryCode and an EncodingTree in that order. The function should produce the Char that is determined by the BinaryCode value and the EncodingTree. If the path represented by the BinaryCode does not exist in the EncodingTree, or following the path indicated by the code ends at an internal node of the tree, the function should produce the symbol 'Error. For example, if you were using the tree in the diagram above, bincode->char for the code "100" would produce #\A, but bincode->char for the code "10" would produce 'Error, and bincode->char for the code "01" would produce 'Error.

(c) Write a function called decode-msg that consumes a BinaryCode and an EncodingTree in that order. The function should produce either a string that is formed by the characters represented in the BinaryCode, or should produce the symbol 'Error if it is not possible to decode the message with the given EncodingTree. For example, (decode-msg big-binary-code big-nlt) would produce "The cat in the hat.", and (decode-msg "101101" 7-char-nlt) would produce 'Error.

Notes: This solution may be done using generative recursion. It can be completed with a single, relatively small (< 20 lines) helper function with a carefully selected set of parameters. If you are finding yourself writing many and/or large functions to solve this problem, then you should rethink your approach. The function bincode->char may not be directly useful as a helper function here. However, a modified version could be very useful.

(d) Write a function called encode-msg that consumes a Str and an EncodingTree in that order.
The function should produce a BinaryCode that is formed by concatenating the BinaryCode values of the individual characters of the consumed Str. You may assume that all of the characters in the Str appear in the EncodingTree. If the string consumed is empty, the function should produce the empty string. For example, (encode-msg "The cat in the hat." big-nlt) produces big-binary-code.

This concludes the list of questions for which you need to submit solutions. As always, do not forget to check your email for the basic test results after making a submission.

2% Bonus: For this question place your solution in the file encodingbonus.rkt. This question is an extension of question 2 on the assignment. You may use any of the code you have written for that question in your solution for the bonus question.

The encoding scheme described in question 2 can be used to compress files. For example, suppose we had a file that contained 10 different characters, and we were using a fixed-length binary code to represent those characters. The code length would have to be at least four bits (each bit being a 0 or 1) to generate enough unique codes to represent the different values in the file. Note that there are 16 different 4-bit codes. Now suppose that in that file, one of the characters appears 60 times, another character appears 32 times, and the other eight characters each appear once. Using the 4-bit encoding, this would mean we need 100*4, or 400 bits in total to represent all of the characters in the file. However, if we assigned a 1-bit code to the most frequent character in the file, a 2-bit code to the second most frequent character, and a 5-bit code to each of the other characters, we would have a total of (1*60) + (2*32) + (5*8), or 164 bits to represent all of the characters in the file.

The compression ratio is the total number of bits required to encode the entire file using the variable-length coding scheme, divided by the total number of bits required using the fixed-length encoding scheme. In the example above the compression ratio would be 164/400, or 0.41. A smaller number means better compression.

Note that you can compress files that contain data other than characters. For example, JPEG is a way of compressing digital image files, and MP3 is a way of compressing sound files. You could use an EncodingTree to generate binary codes for any type of data, not just characters. To create these codes, the different data values in the file (for example the colour of a pixel) would appear as the values in the leaf nodes of the tree.

Write a function called compression-ratio that consumes an EncodingTree and a non-empty association list in that order.
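The arithmetic of the worked example above (the file with 10 different characters) can be checked directly. The following sketch only reproduces those numbers; it is not the compression-ratio function, the names are made up here, and the `#lang racket` line is an assumption so that the file runs on its own.

```racket
#lang racket
;; Scratch example only: the numbers from the worked example in the text.

;; 60 + 32 + eight characters appearing once each = 100 characters.
(define total-chars (+ 60 32 (* 8 1)))

;; Fixed-length scheme: every character costs 4 bits.
(define fixed-bits (* total-chars 4))

;; Variable-length scheme: 1-bit, 2-bit and 5-bit codes.
(define variable-bits (+ (* 1 60) (* 2 32) (* 5 8)))

;; Racket keeps this as an exact fraction in lowest terms.
(define ratio (/ variable-bits fixed-bits))
```

Here fixed-bits is 400, variable-bits is 164, and ratio is the exact number 41/100, that is, 0.41.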
The keys in the association list are the same type as the data found in the leaves of the EncodingTree, and each associated value in the list is a Nat greater than 0. In the association list, the key represents a unique data value that occurs in a file, and the Nat is the frequency (i.e. number of occurrences) of that data value in the file.

The function compression-ratio will produce the compression ratio that is achieved by taking the total number of bits required by the variable length codes generated by the EncodingTree to represent all of the data in the file, divided by the total number of bits required by a fixed-length encoding scheme on the same data. The minimum number of bits required for a single, fixed-length code will depend on the number of different data values that appear in the file. You will need to figure out how to compute this number based on the values consumed by the function. You may assume that all keys in the association list appear in the EncodingTree.

Enhancements: Reminder: enhancements are for your interest and are not to be handed in.

The material below first explores the implications of the fact that Racket programs can be viewed as Racket data, before reaching back seventy years to work which is at the root of both the Scheme language and of computer science itself.

The text introduces structures as a gentle way to talk about aggregated data, but anything that can be done with structures can also be done with lists. Section 14.4 of HtDP introduces a representation of Scheme expressions using structures, so that the expression (+ (* 3 3) (* 4 4)) is represented as

    (make-add (make-mul 3 3) (make-mul 4 4))

But, as discussed in lecture, we can just represent it as the hierarchical list (+ (* 3 3) (* 4 4)). Scheme even provides a built-in function eval which will interpret such a list as a Scheme expression and evaluate it. Thus a Scheme program can construct another Scheme program on the fly, and run it. This is a very powerful (and consequently somewhat dangerous) technique.

Sections 14.4 and 17.7 of HtDP give a bit of a hint as to how eval might work, but the development is more awkward because nested structures are not as flexible as hierarchical lists. Here we will use the list representation of Scheme expressions instead. In lecture, we saw how to implement eval for expression trees, which only contain operators such as +, -, *, /, and do not use constants.

Continuing along this line of development, we consider the process of substituting a value for a constant in an expression. For instance, we might substitute the value 3 for x in the expression (+ (* x x) (* y y)) and get the expression (+ (* 3 3) (* y y)). Write the function subst which consumes a symbol (representing a constant), a number (representing its value), and the list representation of a Scheme expression. It should produce the resulting expression.

Our next step is to handle function definitions. A function definition can also be represented as a hierarchical list, since it is just a Scheme expression. Write the function interpret-with-one-def which consumes the list representation of an argument (a Scheme expression) and the list representation of a function definition. It evaluates the argument, substitutes the value for the function parameter in the function's body, and then evaluates the resulting expression using recursion. This last step is necessary because the function being interpreted may itself be recursive.
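For the substitution step described above, one possible shape for subst is sketched below. It is only a sketch under stated assumptions: it treats an expression as a symbol, a number, or a list of expressions, it replaces every matching symbol wherever it occurs (including in operator position), and the `#lang racket` line and the name example-result are assumptions made so that the file runs on its own.

```racket
#lang racket
;; Sketch only. An expression is assumed to be a symbol, a number,
;; or a list of expressions.

;; subst: Sym Num Expr -> Expr
;; Replaces every occurrence of the symbol name in expr with val.
(define (subst name val expr)
  (cond [(symbol? expr)
         (cond [(symbol=? expr name) val]
               [else expr])]
        [(cons? expr)
         ;; Substitute in the first sub-expression and in the rest of the list.
         (cons (subst name val (first expr))
               (subst name val (rest expr)))]
        ;; Numbers and the empty list are left unchanged.
        [else expr]))

;; The example from the text: substitute 3 for x.
(define example-result (subst 'x 3 '(+ (* x x) (* y y))))
```

Here example-result is the list (+ (* 3 3) (* y y)), matching the example in the text; the occurrences of y are untouched because only x was substituted.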
The next step would be to extend what you have done to the case of multiple function definitions and functions with multiple parameters. You can take this as far as you want; if you follow this path beyond what we've suggested, you'll end up writing a complete interpreter for Scheme (what you've learned of it so far, that is) in Scheme. This is treated at length in Chapter 4 of the classic textbook Structure and Interpretation of Computer Programs, which you can read on the Web in its entirety at http://mitpress.mit.edu/sicp/. So we'll stop making suggestions in this direction and turn to something completely different, namely one of the greatest ideas of computer science.

Consider the following function definition, which doesn't correspond to any of our design recipes, but is nonetheless syntactically valid:

    (define (eternity x)
      (eternity x))

Think about what happens when we try to evaluate (eternity 1) according to the semantics we learned for Scheme. The evaluation never terminates. If an evaluation does eventually stop (as is the case for every other evaluation you will see in this course), we say that it halts.

The non-halting evaluation above can easily be detected, as there is no base case in the body of the function eternity. Sometimes non-halting evaluations are more subtle. We'd like to be able to write a function halting?, which consumes the list representation of the definition of a function with one parameter, and something meant to be an argument for that function. It produces true if and only if the evaluation of that function with that argument halts. Of course, we want an application of halting? itself to always halt, for any arguments it is provided.

This doesn't look easy, but in fact it is provably impossible. Suppose someone provided us with code for halting?. Consider the following function of one argument:

    (define (diagonal x)
      (cond [(halting? x x) (eternity 1)]
            [else true]))

What happens when we evaluate an application of diagonal to a list representation of its own definition? Show that if this evaluation halts, then we can show that halting? does not work correctly for all arguments. Show that if this evaluation does not halt, we can draw the same conclusion. As a result, there is no way to write correct code for halting?.

This is the celebrated halting problem, which is often cited as the first function proved (by Alan Turing in 1936) to be mathematically definable but uncomputable. However, while this is the simplest and most influential proof of this type, and a major result in computer science, Turing learned after discovering it that a few months earlier someone else had shown another function to be uncomputable. That someone was Alonzo Church, about whom we'll hear more shortly.
For a real challenge, definitively answer the question posed at the end of Exercise 20.1.3 of the text, with the interpretation that function=? consumes two lists representing the code for the two functions. This is the situation Church considered in his proof.