C Programming
Pointer Accesses to Memory and Bitwise Manipulation

This assignment consists of two parts, the second extending the solution to the first.

Q1 [80%] Accessing Data in Memory

Here is a hexdump of a memory region containing a scrambled quotation:

    00000000  34 00 0f 3a 00 69 6e 64 69 66 66 65 72 65 6e 63  4..:.indifferenc
    00000010  65 05 3f 00 62 65 05 47 00 62 79 06 02 00 66 6f  e.?.be.G.by...fo
    00000020  72 0a 68 00 70 65 6e 61 6c 74 79 09 74 00 70 75  r.h.penalty.t.pu
    00000030  62 6c 69 63 06 21 00 54 68 65 05 2b 00 74 6f 08  blic.!.The.+.to.
    00000040  16 00 72 75 6c 65 64 07 4e 00 65 76 69 6c 07 55  ..ruled.N.evil.U
    00000050  00 6d 65 6e 2e 05 5a 00 2d 2d 08 00 00 50 6c 61  .men..Z.--...Pla
    00000060  74 6f 06 83 00 6d 65 6e 07 62 00 67 6f 6f 64 05  to...men.b.good.
    00000070  11 00 74 6f 0a 7e 00 61 66 66 61 69 72 73 05 6f  ..to.~.affairs.o
    00000080  00 69 73 06 1b 00 70 61 79                       .is...pay

The first two bytes of the memory region contain the offset at which you will begin processing records: 0x0034. This offset of the first record is followed by a sequence of word records, each consisting of a positive integer value, another positive integer value, and a sequence of characters:

    Length of record    Offset of next record    Characters in word
    uint8_t             uint16_t                 chars

The first value in each record specifies the total number of bytes in the record. Since words are relatively short, this value will be stored as a uint8_t, which has a range of 0 to 255.

The record length is followed immediately by a uint16_t value specifying the offset of the next word record in the list.

This is followed by a sequence of ASCII codes for the characters that make up the word. (The term "word" is used a bit loosely here.) There is no terminator after the final character of the string, so be careful about that.

Note that the length of the record depends upon the number of characters in the word, and so these records vary in length. That's one reason we must store the offset for each record.
Since I'm using x86 hardware, integer values are stored in memory in little-endian order; that is, the low-order byte is stored first (at the smallest address) and the high-order byte is stored last (at the largest address). So the bytes of a multibyte integer value appear to be reversed.

For example, if we have an int32_t variable holding the base-10 value 85147, the corresponding base-16 representation would be 0x14C9B, and the in-memory representation would look like this:

    9B 4C 01 00
    low    high

or, represented in pure binary:

    10011011 01001100 00000001 00000000
    low                            high

The least-significant byte (corresponding to the lowest powers of 2) is stored at the lowest address, and the most-significant byte (corresponding to the highest powers of 2) is stored at the highest address.

Version 2.00 This is a purely individual assignment!
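As a rough illustration of the byte ordering just described, the sketch below computes which byte of a value lands at each address in a little-endian layout, and rebuilds a 16-bit value from two bytes stored low-order first. The helper names le_byte and read_u16_le are hypothetical; they are not part of the supplied assignment code.

```c
#include <stdint.h>

/* Hypothetical helper: the byte that a little-endian machine stores at
   address (base + i) for a 32-bit value; i = 0 is the lowest address. */
static uint8_t le_byte(uint32_t value, unsigned i) {
    return (uint8_t)((value >> (8u * i)) & 0xFFu);
}

/* Hypothetical helper: rebuild a 16-bit value from two bytes stored
   low-order byte first. Uses shifts, so it does not depend on the
   byte order of the machine running it. */
static uint16_t read_u16_le(const uint8_t *p) {
    return (uint16_t)(*p | (*(p + 1) << 8));
}
```

For the value 85147 (0x14C9B), le_byte yields 0x9B, 0x4C, 0x01, 0x00 for i = 0 through 3, matching the layout shown above; applying read_u16_le to the bytes 34 00 gives 0x0034.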
As a programmer, you usually do not need to take the byte-ordering into account since the compiler will generate machine language compatible with your hardware, and that will make use of the bytes in the appropriate manner. But, when you're reading memory displays, you must take the byte-ordering into account.

So, looking at the first two bytes of the memory block, we see that the word record we will process first occurs at relative offset 0x0034 from the beginning of the memory block.

Let's consider how to interpret the hexdump shown earlier:

    00000000  34 00 0f 3a 00 69 6e 64 69 66 66 65 72 65 6e 63  4..:.indifferenc
    00000010  65 05 3f 00 62 65 05 47 00 62 79 06 02 00 66 6f  e.?.be.G.by...fo
    00000020  72 0a 68 00 70 65 6e 61 6c 74 79 09 74 00 70 75  r.h.penalty.t.pu
    00000030  62 6c 69 63 06 21 00 54 68 65 05 2b 00 74 6f 08  blic.!.The.+.to.
    00000040  16 00 72 75 6c 65 64 07 4e 00 65 76 69 6c 07 55  ..ruled.N.evil.U
    00000050  00 6d 65 6e 2e 05 5a 00 2d 2d 08 00 00 50 6c 61  .men..Z.--...Pla
    00000060  74 6f 06 83 00 6d 65 6e 07 62 00 67 6f 6f 64 05  to...men.b.good.
    00000070  11 00 74 6f 0a 7e 00 61 66 66 61 69 72 73 05 6f  ..to.~.affairs.o
    00000080  00 69 73 06 1b 00 70 61 79                       .is...pay

The first word record consists of the bytes:

    06 21 00 54 68 65

The length of the first record is 0x06 or 6 in base-10, which means that the string is 3 characters long, since the length field occupies 1 byte and the offset of the next record occupies 2 bytes. The ASCII codes are 54 68 65, which represent the characters "The". The offset of the next record is 0x0021.

The second word record consists of the bytes:

    0a 68 00 70 65 6e 61 6c 74 79

The length is 0x0a (10 in base-10), so the string is 7 characters long (the ASCII codes represent "penalty"), and the next word record is at the offset 0x0068.

And so forth...
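The interpretation of a single record walked through above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: decode_record is a hypothetical helper name, the record is assumed to be well formed, and pointer arithmetic is used instead of array brackets in keeping with the restriction on the assignment.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: decode the single word record that starts at rec.
   Stores the offset of the next record in *next and returns a malloc'd,
   NUL-terminated copy of the word (the caller must free it).
   Returns NULL if the length is too small or allocation fails. */
static char *decode_record(const uint8_t *rec, uint16_t *next) {
    uint8_t reclen = *rec;               /* total number of bytes in record */
    if (reclen < 3) {
        return NULL;                     /* no room for length + offset     */
    }

    /* The 2-byte offset is stored low-order byte first. */
    *next = (uint16_t)(*(rec + 1) | (*(rec + 2) << 8));

    /* 1 byte of length and 2 bytes of offset precede the characters. */
    size_t nchars = (size_t)reclen - 3;
    char *word = malloc(nchars + 1);
    if (word == NULL) {
        return NULL;
    }
    memcpy(word, rec + 3, nchars);       /* the record has no terminator    */
    *(word + nchars) = '\0';             /* so add one to form a C-string   */
    return word;
}
```

Applied to the bytes 06 21 00 54 68 65, this sketch produces the string "The" and the next offset 0x0021, in agreement with the hand decoding above.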
The complete quotation, with word record offsets, is shown below. Note that each offset listed is 3 bytes past the start of the corresponding record (for example, the first record begins at 0x0034 and is listed as 0x0037); that is, it is the offset of the first character of the word.

    0x0037: The
    0x0024: penalty
    0x006B: good
    0x0065: men
    0x0086: pay
    0x001E: for
    0x0005: indifference
    0x003D: to
    0x002E: public
    0x0077: affairs
    0x0081: is
    0x0072: to
    0x0014: be
    0x0042: ruled
    0x0019: by
    0x004A: evil
    0x0051: men.
    0x0058: --
    0x005D: Plato
To indicate the end of the sequence of word records, the final word record specifies that its successor is at an offset of 0, which is invalid (since that's the offset of the pointer to the first word record).

For Q1, you will implement a C function that parses a tangled linked list of binary word records in memory, as described above. Your function will process the sequence of word records, and populate an array of struct variables with your interpretation of the original data.

You will use the following data type to represent a parsed word record:

    struct _WordRecord {
        uint16_t offset;   // offset at which word record was found in memory
        char*    word;     // malloc'd C-string containing the "word"
    };
    typedef struct _WordRecord WordRecord;

You will create one of these struct variables whenever you parse a word record, and place that struct variable into an array supplied by the caller of your function.

The function you implement must conform to the following interface specification:

    /**
     *  Untangle() parses a chain of records stored in the memory region
     *  pointed to by pbuffer, and stores WordRecord objects representing
     *  the given data into the array supplied by the caller.
     *
     *  Pre:     pbuffer points to a region of memory formatted as specified
     *           wordlist points to an empty array large enough to hold all
     *              the WordRecord objects you'll need to create
     *  Post:    wordlist[0:nwords-1] hold WordRecord objects
     *  Returns: the number of "words" found in the supplied quotation.
     *
     *  Restrictions: you may not use any array bracket notation in your
     *                solution
     */
    uint8_t Untangle(const uint8_t* pbuffer, WordRecord* wordlist);

As usual, the tar file that is posted for the assignment contains testing/grading code.
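The termination rule described above (a next-record offset of 0 marks the final record) can be illustrated with a small sketch that only counts the records in a chain. The name count_records is hypothetical, and the sketch assumes a well-formed buffer: it does not guard against cycles or offsets that fall outside the region, and it does not build any WordRecord objects.

```c
#include <stdint.h>

/* Hypothetical helper: follow the chain of word records in pbuffer and
   return how many records it contains. Assumes the buffer is well formed. */
static uint8_t count_records(const uint8_t *pbuffer) {
    uint8_t count = 0;

    /* The first two bytes hold the offset of the first record,
       stored low-order byte first. */
    uint16_t offset = (uint16_t)(*pbuffer | (*(pbuffer + 1) << 8));

    while (offset != 0) {                     /* offset 0 ends the chain */
        const uint8_t *rec = pbuffer + offset;
        /* Bytes 1 and 2 of each record give the next record's offset. */
        offset = (uint16_t)(*(rec + 1) | (*(rec + 2) << 8));
        count++;
    }
    return count;
}
```

As a made-up example (not taken from the assignment data), the 12-byte region 07 00 05 00 00 68 69 05 02 00 4f 68 holds a record at 0x0007 ("Oh") that points to a record at 0x0002 ("hi"), which points to 0; the sketch counts 2 records.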
In particular, the following files are in the q1 directory:

    driver.c         test driver
    Untangle.h       declarations for specified function
    checkanswer.h    declarations for answer-checking and grading function
    checkanswer.o    64-bit Linux binary for checking/grading code
    Generator.h      declarations for test data generator
    Generator.o      64-bit Linux binary for test data generator

Create Untangle.c and implement your version of it, then compile it with the files above. Read the header comments in driver.c for instructions on using it.

What to Submit

You will submit your file Untangle.c to the Curator, via the collection point C06Q1. That file must include any helper functions you have written and called from your version of Untangle(); any such functions must be declared (as static) in the file you submit. You must not include any extraneous code (such as an implementation of main()) in that file.

Your submission will be graded by running the supplied test/grading code on it.
Q2 [20%] Simple Bitwise Encryption

Read the posted notes on bitwise operations in C, and the related sections in your C reference.

For this question, you will modify the C function you wrote for Q1 so that it processes a tangled list of mildly encrypted binary records in memory, processing them non-sequentially, and produces a simple text report. The function must conform to the following interface specification (which is the same as for Q1):

    /**
     *  Untangle() parses a chain of records stored in the memory region
     *  pointed to by pbuffer, and stores WordRecord objects representing
     *  the given data into the array supplied by the caller.
     *
     *  Pre:     pbuffer points to a region of memory formatted as specified
     *           wordlist points to an empty array large enough to hold all
     *              the WordRecord objects you'll need to create
     *  Post:    wordlist[0:nwords-1] hold WordRecord objects
     *  Returns: the number of "words" found in the supplied quotation.
     *
     *  Restrictions: you may not use any array bracket notation in your
     *                solution
     */
    uint8_t Untangle(const uint8_t* pbuffer, WordRecord* wordlist);

The memory region pointed to by pbuffer will be formatted in exactly the same way as for Q1, except that the bytes that represent the offset of the next record and the characters in the word will have been "masked":

    Length of record    Masked offset of next record    Masked characters in word
    uint8_t             uint16_t                        chars

Each of the ASCII codes has been XORed with the number of bytes in the word. Each byte of the offset field has been XORed with the unmasked first byte in the word.

You must "unmask" the masked bytes in order to properly display the quotation. Part of the assignment is for you to determine what operation(s) you can use to perform this unmasking. I will not answer any questions about how to do that, except to say that you should consider the properties of the various bitwise operations available in C. This is a good opportunity for you to discover the value of the Boolean algebra rules covered in Discrete Mathematics.
Aside from the issue of unmasking the encrypted bytes, the logic of this question is identical to Q1, so we will not repeat the detailed description given there. However, we will give you an example illustrating what must be done:

    00000000  39 00 06 61 69 6a 77 2d 05 20 2d 2f 2f 0c 41 41  9..aijw-. -//.AA
    00000010  48 7b 60 7a 7d 66 7d 65 6c 05 4f 61 63 6c 0a 08  H{`z}f}el.Oacl..
    00000020  74 73 6f 68 72 60 6f 73 06 1f 74 77 6b 66 0b 5b  tsohr`os..twkf.[
    00000030  65 6d 6c 7d 6b 69 7c 6d 6c 05 3e 49 4b 76 07 37  eml}ki|ml.>IKv.7
    00000040  6d 69 6d 6a 60 05 28 62 60 67 07 34 61 65 66 68  mimj`.(b`g.4aefh
    00000050  61 04 7f 61 60 05 2b 74 76 6d 05 31 74 76 6d 0c  a..a`.+tvm.1tvm.
    00000060  34 65 6c 67 7d 6c 7b 7d 68 60 67 07 1f 6d 69 65  4elg}l{}h`g..mie
    00000070  76 6f 05 76 6f 6d 64 05 41 69 6b 71 0a f1 77 70  vo.vomd.Aikq..wp
    00000080  6e 73 6f 68 72 73 0c 63 61 68 6a 6a 6c 79 7d 60  nsohrs.cahjjly}`
    00000090  67 6e                                            gn
The first word record begins at offset 0x0039, and contains the bytes:

    05 3e 49 4b 76

The record length is 0x05, so the length of the word is 0x02. XORing that with each masked byte of the string (4b 76) yields:

    49 74

which represents the character string "It". And, XORing the unmasked first byte of the word (0x49) with the bytes of the offset (3e 49) yields:

    77 00

which is the offset of the next record (0x0077, since the low-order byte is stored first).

In this case, the encrypted quotation decodes to the following (as in the Q1 listing, each offset shown is that of the first character of the word):

    0x003C: It
    0x007A: is
    0x002B: the
    0x006E: mark
    0x0075: of
    0x001C: an
    0x0031: educated
    0x0041: mind
    0x005D: to
    0x0048: be
    0x004D: able
    0x0058: to
    0x0062: entertain
    0x0054: a
    0x0021: thought
    0x007F: without
    0x0089: accepting
    0x0005: it.
    0x000B: --
    0x0010: Aristotle

The q2 directory of the posted tar contains the following files:

    driver.c         test driver
    Untangle.h       declarations for specified function
    checkanswer.h    declarations for answer-checking and grading function
    checkanswer.o    64-bit Linux binary for checking/grading code
    Generator.h      declarations for test data generator
    Generator.o      64-bit Linux binary for test data generator

The only differences between these files and those for Q1 are in the data generator (which must encrypt the word records as they are created), and in the checking/grading code (which must decrypt those same word records).

Create Untangle.c and implement your version of it, then compile it with the files above. Read the header comments in driver.c for instructions on using it.

What to Submit

You will submit your file Untangle.c to the Curator, via the collection point C06Q2. That file must include any helper functions you have written and called from your version of Untangle(); any such functions must be declared (as static) in the file you submit. You must not include any extraneous code (such as an implementation of main()) in that file.

Your submission will be graded by running the supplied test/grading code on it.
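The worked example above can be reproduced with a short sketch. It relies on the fact that XOR undoes itself, that is, (a ^ k) ^ k == a, so applying the same key a second time recovers the original byte. The name unmask_record is hypothetical, the record is assumed to be well formed and to contain at least one character, and this is one possible way to organize the steps rather than a required design.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: unmask the single Q2-style record that starts at rec.
   Stores the unmasked offset of the next record in *next and returns a
   malloc'd, NUL-terminated copy of the unmasked word (caller frees it).
   Returns NULL if the record holds no characters or allocation fails. */
static char *unmask_record(const uint8_t *rec, uint16_t *next) {
    uint8_t reclen = *rec;                   /* the length byte is not masked */
    if (reclen < 4) {
        return NULL;                         /* need a first character as key */
    }
    uint8_t nchars = (uint8_t)(reclen - 3);  /* number of bytes in the word   */

    char *word = malloc((size_t)nchars + 1);
    if (word == NULL) {
        return NULL;
    }

    /* Each character was XORed with the word length; XOR again to undo. */
    for (uint8_t i = 0; i < nchars; i++) {
        *(word + i) = (char)(*(rec + 3 + i) ^ nchars);
    }
    *(word + nchars) = '\0';

    /* Each offset byte was XORed with the unmasked first character. */
    uint8_t key = (uint8_t)*word;
    *next = (uint16_t)((*(rec + 1) ^ key) | ((*(rec + 2) ^ key) << 8));
    return word;
}
```

For the record 05 3e 49 4b 76 this gives the word "It" and the next offset 0x0077, matching the hand calculation above.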
Pledge:

Each of your program submissions must be pledged to conform to the Honor Code requirements for this course. Specifically, you must include the following pledge statement in the submitted file:

    On my honor:

    - I have not discussed the C language code in my program with
      anyone other than my instructor or the teaching assistants
      assigned to this course.

    - I have not used C language code obtained from another student,
      or any other unauthorized source, either modified or unmodified.

    - If any C language code or documentation used in my program
      was obtained from an allowed source, such as a text book or course
      notes, that has been clearly noted with a proper citation in
      the comments of my program.

    <Student Name>

Failure to include this pledge in a submission may result in a score of zero being assigned.