Creating a Data Type in C Bytes For this assignment, you will use the struct mechanism in C to implement a data type that represents an array of bytes. This data structure could be used kind of like a base class for a string type, or a more general dynamic array type. An array of bytes can be modeled using the C struct: Bytes type, represents an array of byte-sized values. struct bytes_t Automatically allocated default buffer. This is used to store the bytes until usage > DEFAULT_SIZE, then we switch to using dynamic memory (data). dflt is never used again once the switch occurs. uint8_t dflt[default_size]; Initially NULL, points to the dynamically allocated array of bytes. uint8_t *data; The number of slots or bytes in use, initially 0. size_t usage; }; Actual size of the array, initially DEFAULT_SIZE. size_t dim; typedef struct bytes_t bytes_t; Since C arrays don t automatically keep track of their dimensions or usage, it seems completely reasonable to capture all of that information inside of a struct, and use that to represent a flexible bytes type. Your bytes_t type must be able grow and shrink as necessary, and will require you to dynamically allocate and free memory. For example, inserting some bytes into a bytes_t variable may cause dynamic allocation to occur, and when you no longer need a bytes_t variable you must free the associated memory (in the appropriate function). Note: This is a two-part assignment, you will use the bytes_t type again so it's imperative that your code functions properly. Operations A data type consists of a collection of values and a set of operations that may be performed on those values. For a bytes type, it would make sense to provide common operations that let you manipulate bytes; for example: Makes a deep copy of *src and stores it in *dest. Similar to *dest = *src. Pre: src - Points to valid bytes_t variable and is not NULL. dest - Points to valid bytes_t variable and is not NULL. Post: If successful then *dest contains a copy of *src. Both *src and *dest should be completely separate entities. Changing one should not affect the other. Returns: If copying *src into *dest succeeds the function will return true. If dynamic allocation is required, but fails this function will return false. bool bytes_copy(bytes_t * const dest, const bytes_t * const src); This is a purely individual assignment! 1
The design of bytes_copy() follows the expected semantics of an assignment statement. The function will make a copy of the contents of *src and store it in *dest, allocating memory as necessary and setting dest->usage and dest- >dim as required. We will copy one aspect of an OO design; it's useful to provide a function that will create a new bytes_t object: Initializes a bytes_t variable. Pre: bytes - Points to valid bytes_t variable and is not NULL. Post: data Is initialized to NULL. usage - Is initialized to 0. dim - Is initialized to DEFAULT_SIZE. void bytes_init(bytes_t * const bytes); To some degree, this plays the role of a constructor in an OO implementation; the basic difference is that this is actually responsible for creating the new object. In contrast, in a Java implementation, you would call new to allocate memory for the object dynamically and then call the constructor to initialize that memory correctly. For this assignment an initialization function, a de-allocation function, and a print function are provided, your task is to complete the rest of the implementation. The required functions are: Status functions, these are one-liners: bool bytes_empty(const bytes_t * const bytes); uint8_t * bytes_data(const bytes_t * const bytes); size_t bytes_usage(const bytes_t * const bytes); More complex functions that provide common operations for the bytes_t. bool bytes_set(bytes_t * const bytes, size_t index, uint8_t val); bool bytes_get(const bytes_t * const bytes, size_t index, width_t width, endian_t order, uint64_t * const result); bool bytes_insert(bytes_t * const bytes, size_t index, uint8_t * buf, size_t len); bool bytes_erase(bytes_t * const bytes, size_t index, size_t len); uint8_t * bytes_range(const bytes_t * const bytes, range_t * const range); bool bytes_copy(bytes_t * const dest, const bytes_t * const src); The file bytes.h, posted on the course website, includes header comments for each of these functions. Pay attention to the comments in the header file. All the stated pre- and post-conditions are part of the assignment. Pay particular attention to avoiding memory leaks. You should place your implementation in the bytes.c file posted on the website. You should consider implementing additional "helper" functions. Those should be private to your C file, so make them static; they will be invisible to the testing code and never called directly from outside your file. Your solution will be compiled with a test driver, using the supplied header file, so if you do not conform to the specified interfaces there will very likely be compilation and/or linking errors. Design Given the C structure above, a natural question is, why should we use an automatically allocated buffer by default? Why not just dynamically allocate all the time? The short answer is that it makes the assignment more interesting, but the longer answer is that there's some overhead to using dynamic memory. To give you an idea of the penalty, below is a simple function that creates and then frees a few bytes_t variables. As simple benchmark, this function was executed 2 32-1 times in a tight loop. Using only the automatically allocated buffer, the This is a purely individual assignment! 2
program finished in about 5 minutes and 30 seconds, while using only dynamic memory required approximately 25 minutes to complete same task. This may or may not be a fair benchmark and may or may not represent real word use, however, it does illustrate that there is some additional overhead when dynamically allocating memory. If we choose DEFAULT_SIZE carefully, then most of our bytes_t variables won't require dynamically allocated memory at all, and when we do need a large amount of bytes we only have to pay the penalty when it's required. Using the default automatically allocated buffer is part of the assignment. You should only dynamically allocate when the usage grows beyond DEFAULT_SIZE. When this happens dynamically allocate memory and copy the contents from the default buffer, dlft. Expect do poorly on the tests if you don't follow this requirement. Rough benchmark function. Run me a bunch of times. void test_alloc() bytes_t a, b, c, d; bytes_init(&a); bytes_init(&b); bytes_init(&c); bytes_init(&d); } Larger default size than the assignment. for (int x = 0; x < 50; x++) Do some work a.data[x] = x; b.data[x] = x; c.data[x] = x; d.data[x] = x; } bytes_free(&a); bytes_free(&b); bytes_free(&c); bytes_free(&d); Helpful Hints and Information As you start working on the bytes project, here are some helpful hints and information that will be relevant for any assignment that uses dynamic memory in this class: If the curator produces no output while grading your assignment, your code is crashing. The test suite is a wrapper that calls your functions and prints out the results. It doesn't do anything else, so the problem is most likely inside of your code. Given the nature of dynamic memory errors your program can crash in unexpected (unrelated) places and also run fine on one machine and then crash on another. To expand on those points, where the code crashes is sometimes different than where the problem is actually located. For example, if the instructor provided bytes_free() function sometimes crashes on the curator, chances are there is something wrong with an another function. Most likely, the previously tested function broke something that is only now being triggered inside of bytes_free(). So long story short, you need to backtrack, look at the previous function calls etc. Further, depending on the nature of the bug, the provided driver will run fine on your machine but crash on the curator. This is often due to very small array bounds issues, i.e. going 1 out of bounds, or improperly using realloc (ignoring the return value). Further, these errors may happen sporadically making the problems difficult to find. Look closely at your bytes.c file for the issues described above, gdb may also be helpful. If you really can't find the issue come to office hours or post on forum. This is a purely individual assignment! 3
Other Tips Don't do things like this: bytes_t *new; new->usage = etc; return *new; bytes_t *new doesn't point to anything (or we don't know what it points to), so dereferencing it on the next line will likely cause problems. Surprisingly, this may actually run locally, but is a terrible idea, a logic error, and almost certainly won't work on the curator. Generally, the functions have bytes_t * as the parameters and you should use those preferentially rather than declaring new bytes_t variables. Also, remember there is nothing wrong with just declaring a bytes_t type and returning it: bytes_t b; fine b.data = malloc(...); return b; also fine, returns a copy of the values in b return &b; BAD! returning a pointer to local variable, DON'T do this, it's a terrible idea Valgrind Valgrind is a helpful tool for finding many array bounds issues as well as memory leaks. While it's available on Ubuntu/CentOS, it's already installed on the rlogin cluster. So if you are developing/testing on rlogin you should be to use Valgrind with no additional work. To invoke Valgrind: (assuming the executable is called driver): valgrind --leak-check=yes driver A quick start guide is here: http:valgrind.org/docs/manual/quick-start.html What to Submit You will submit a single.c file, containing nothing but the implementation of the specified C functions. Be sure to conform to the specified function interfaces. Your submission will be compiled with a test driver and graded according to how many cases your solution handles correctly. This assignment will be graded automatically. You will be allowed up to ten submissions for this assignment. Test your function thoroughly before submitting it. Make sure that your function produces correct results for every test case you can think of. Any additional requirements specified above will be checked manually by a TA after the assignment is closed. Any shortcomings will be assessed a penalty, which will be applied to your score from the Curator. The course policy is that the submission that yields the highest score will be checked. If several submissions are tied for the highest score, the latest of those will be checked. The Student Guide and other pertinent information, such as the link to the proper submit page, can be found at: http:www.cs.vt.edu/curator/ This is a purely individual assignment! 4
Pledge Each of your program submissions must be pledged to conform to the Honor Code requirements for this course. Specifically, you must include the following pledge statement in the submitted file: On my honor: - I have not discussed the C language code in my program with anyone other than my instructor or the teaching assistants assigned to this course. - I have not used C language code obtained from another student, or any other unauthorized source, either modified or unmodified. - If any C language code or documentation used in my program was obtained from another source, such as a text book or course notes, that has been clearly noted with a proper citation in the comments of my program. - I have not designed this program in such a way as to defeat or interfere with the normal operation of the Curator System. <Student Name> Failure to include this pledge in a submission is a violation of the Honor Code. This is a purely individual assignment! 5