A Capacity: 10 Usage: 4 Data: - PDF Free Download

Creating a Data Type in C Integer Set For this assignment, you will use the struct mechanism in C to implement a data type that represents sets of integers. A set can be modeled using the C struct: struct _CSet { uint32_t Capacity; // dimension of the set's array uint32_t Usage; // number of elements in the set int32_t Data; // pointer to the set's array }; typedef struct _CSet CSet; So, a CSet object A that represents the set 17, 9, 23, 44 could be initialized like this: A Capacity: 10 17 9 23 44 We will say that a CSet object A is proper if and only if it satisfies each of the following conditions: 1. If A.Capacity == 0 then A.Usage == 0 and A.Data == NULL. 2. If A.Capacity > 0 then A.Data points to an array of dimension A.Capacity. 3. A.Data[0 : A.Usage-1] are the values stored in the set (in unspecified order). 4. A.Data[A.Usage : A.Capacity-1] equal INT32_MIN. A data type consists of a collection of values and a set of operations that may be performed on those values. For a set type, it would make sense to provide the common set operations; for example: / Sets punion to be the union of the sets pa and pb. Pre: punion, pa and pb are proper Post: pa and pb are unchanged For every integer x, x is contained in punion iff x is contained in pa or in pb (or both). punion->capacity == pa->capacity + pb->capacity punion->usage == pa->usage + pb->usage - number of elements that occur in both pa and pb punion is proper Returns: true if the union is successfully created; false otherwise / bool CSet_Union(CSet const punion, const CSet const pa, const CSet const pb); Version 2.00 This is a purely individual assignment! 1

Some Notes on Function Interface Design In this case, the use of a bool return type allows the function to indicate whether the operation was completed successfully. The functions we will implement are designed to pass pointers to CSet objects, rather than to pass CSet objects by copy. While that would not duplicate the array (see the deep copy discussion later in this specification), passing a CSet object would still entail copying the actual members of the object, and that's wasteful of both time and space. The interface allows the union to be a different CSet object than either of the CSet objects being combined. That has semantic integrity, since in mathematics writing A B does not imply that either of the two sets will be changed. That is, is the name of a different set than A or B (although the union could equal one of them). A B On the other hand, the preconditions do not require that punion point to a different CSet object than pa or pb (in fact, all three could point to the same object). That allows the user the flexibility to use the function to combine computing the union with overwriting one of the source sets with the union, analogous to writing "let " in mathematics. It should be noted that this flexibility also creates some potential problems in implementing the computation of the union. A A B The interface also illustrates the complexities of using the const qualifier in C. The specification of the parameter const CSet const pa implies that the value of the parameter pa cannot be modified within the function, nor can the value of the pointer's target be changed via a use of that pointer. Any violations of those restrictions are to be flagged as errors at compile time. On the other hand, if the target of the pointer is modified via some other means, the compiler will not be expected to detect that possibility, much less to flag it as an error, or even with a warning. So, if the user calls CSet_Union() and the first and second parameters are aliases (that is, pointers to the same CSet object), there is no violation of the const specifier. This doesn't indicate there's something "wrong" with the way const works in C, just that there are subtleties which need to be understood. Note also that it's possible to avoid the issue by changing the preconditions (and enforcing them). If the preconditions for CSet_Union() specified that punion could not alias either pa or pb, then the implementation could simply check those conditions and return false if either were violated. The function does provide an excuse to make good use of the const qualifier, applied to the pointer and/or its target, as appropriate. In the case of the function above, there is no logical reason the function should modify either of the CSet objects whose union is being found, unless the caller deliberately creates an alias, so the pointers to those are both are qualified as const CSet const. But the function does need to modify the CSet object that will hold the union, so the pointer to that object is qualified as CSet const. Note that all three pointers are const -qualified so that the function cannot make any of them point to a different location in memory. Obviously there's no reason for the function to give any of those pointers a new target; qualifying them as const pointers ensures that we won't make any mistakes of that sort within the definition of the function, unless we were to do something disreputable... The first post-condition might seem to be superfluous; since the use of const would seem to prevent any modifications to pa or pb. However, C allows many typecasts that are potentially unwise, including casting away const-ness. The stated post-condition assures the caller that the implementation of the function does not violate the intent indicated by the way const was applied to the parameters. Version 2.00 This is a purely individual assignment! 2

Other Operations The most basic operation for a complex data type is the initialization of a new object of that type. Even though C does not provide for automatic initialization via a "constructor", it is still useful to supply a function that will allow the client to control the initialization of new objects explicitly. For the CSet type we have the following function, which allows the user to specify the initial capacity of the set, and provides a guaranteed filler value in each position in the set's array: / Initializes a raw pset object, with capacity Sz. Pre: pset points to a CSet object, but pset may be proper or raw Sz has been initialized Post: If successful: pset->capacity == Sz pset->usage == 0 pset->data points to an array of dimension Sz, or is NULL if Sz == 0 if Sz!= 0, pset->data[0 : Sz-1] == INT32_MIN else: pset->capacity == 0 pset->usage == 0 pset->data == NULL Returns: true if successful, false otherwise / bool CSet_Init(CSet const pset, uint32_t Sz); The CSet type raises a deep-copy issue, since the data array is allocated dynamically. Since a user might very well need to make a copy of a CSet object, we will provide a deep-copy function to support that. The cost of making a deep copy of a CSet object may also be significant, due to the potential size of that array, so we should take care to optimize the implementation: / Makes a deep copy of a CSet object. Pre: ptarget and psource are proper Post: psource is unchanged If successful: ptarget->capacity == psource->capacity ptarget->usage == psource->usage ptarget[0:ptarget->capacity-1] == psource[0:psource->capacity-1] ptarget->data!= psource->data, unless ptarget == psource ptarget is proper. else: ptarget is unchanged Returns: true if successful, false otherwise / bool CSet_Copy(CSet const ptarget, const CSet const psource); In this case, there is no requirement that ptarget and psource are different. The rationale is that it's easy to handle that case (do nothing), and while it's strange to specify copying an object to itself, a user may choose to do so, or even do so accidentally. See the Appendix: Deep Copy for a full discussion of the issues this function must solve. Version 2.00 This is a purely individual assignment! 3

Here's another support function we'll include: / Adds Value to a pset object. Pre: pset is proper Value has been initialized Post: If successful: Value is a member of pset pset->usage has been incremented pset->capacity has been increased, if necessary pset is proper else: pset is unchanged Returns: true if successful, false otherwise / bool CSet_Insert(CSet const pset, int32_t Value); One of the post-conditions deserves some comments. If the CSet object pset is full, its Capacity is supposed to be increased. That requires allocating a new array dynamically, which could fail; that's why the return type is bool. Once the new array is created, the values must be copied into it from the old array, the new element must be added, and the old array must be deallocated in order to avoid a memory leak. This may seem to be a good place to consider using the C Standard Library function realloc(). However, you must be careful to ensure that the resulting CSet object is proper. How should you determine the size of the new array? On the one hand, it is good to avoid wasting memory. On the other hand, the resizing operation described above is fairly expensive, and it is desirable to avoid doing it often. If we double the size of the array, we can show that the average cost of an insertion is O(1). In essence, if we currently have a full array of size N, we get one very expensive insertion as we create a new array of size 2N, and then we get up to N more insertions that are very cheap. So, the average cost is quite good. Here are the (mostly mathematical) set operations we will support: inserting a new value into a set removing a value from a set determining if a value is a member of a set forming the union of two sets forming the (nonsymmetric) difference of two sets determining if two sets are equal determining if one set is a subset of another set determining if a set is empty making a copy of a set There are many other operations that would be included in a production version of a set type (intersection, cardinality, isproper-subset-of, etc.). Pay attention to the comments in the header file. All the stated pre- and post-conditions are part of the assignment. Pay particular attention to avoiding memory leaks. Version 2.00 This is a purely individual assignment! 4

Testing/Grading Harness Download the posted tar file, C01.tar from the course website and unpack it on a CentOS 7 system. You should receive the following files: driver.c CSet.h CSet.c gradecset.h gradecset.o driver for running all the tests and scoring; see embedded comments! declaration of CSet type shell for implementation of CSet type interface for testing/grading code 64-bit CentOS 7 binary for testing/grading code Unpack the posted tar file, and complete the implementation file (CSet.c). You can then compile with the following commands: gcc o driver std=c99 Wall ggdb3.c.o This will probably not work on CentOS 6, or other Linux distros, due to the presence of incompatible libraries, but it does work on CentOS 7 and on rlogin. You should also note that the posted code will, indeed, compile. And, if you execute it as is, it will not perform correctly, because the given implementations in CSet.c are almost entirely incomplete. You will find it valuable to implement some static ("private") helper functions. Those must be placed in your.c file. Your solution will be compiled with a test driver, using the supplied header file, so if you do not conform to the specified interfaces there will very likely be compilation and/or linking errors. Your solution is expected to achieve a clean run on Valgrind (see the Appendix: Valgrind). Failure to do so will result in a penalty of up to 10%. The requirements related to memory management and computational complexity, will be checked manually by a TA after the assignment is closed. Any shortcomings will be assessed a penalty, which will be applied to your score from the testing harness. What to Submit You will submit your completed version of CSet.c, containing nothing but the implementation of the specified C functions and any static helper functions you write. Be sure to conform to the specified function interfaces. Your submission will be compiled with the test driver: gcc std=c99 Wall m64 ggdb3.c.o The test/grading harness writes a collection of text files to report the results. The file Summary.txt contains overall score information. There are also individual log files for each tested feature, and those contain verbose output showing the tests and results. The Student Guide and other pertinent information, such as the link to the proper submit page, can be found at: http://www.cs.vt.edu/curator/ Version 2.00 This is a purely individual assignment! 5

Pledge Each of your program submissions must be pledged to conform to the Honor Code requirements for this course. Specifically, you must include the following pledge statement in the submitted file: // On my honor: // // - I have not discussed the C language code in my program with // anyone other than my instructor or the teaching assistants // assigned to this course. // // - I have not used C language code obtained from another student, // or any other unauthorized source, including the Internet, // either modified or unmodified. // // - If any C language code or documentation used in my program // was obtained from an allowed source, such as a text book or course // notes, that has been clearly noted with a proper citation in // the comments of my program. // // <Student Name> Failure to include this pledge in a submission may result in being assigned a score of zero. Version 2.00 This is a purely individual assignment! 6

Appendix: Deep Copy When an assignment is performed with C struct variables, the operation is carried out (logically) by copying the values of the data members from the object on the right side to the corresponding data members of the object on the left side: struct Point { int x; int y; }; P x: 19 y: 71 struct Point P = {19, 71}; struct Point Q; Q = P; Q x: 19 y: 71 So, what's wrong with that? Nothing. But, suppose we have a struct type that includes a data member that is a pointer to something that's allocated dynamically (like a CSet object); assume we have the CSet object A from page 1 and we perform an assignment: CSet B; B = A; The result will be that B and A will "share" the dynamically-allocated array, because the value of A.Data (which is the address of the array) will have been copied to B. A Capacity: 10 17 9 23 44 B Capacity: 10 The effect of the assignment is to make a shallow copy of A. That is, the data members of A are duplicated, but if A contains a pointer, the target of that pointer is not duplicated. In most cases, we would expect that performing an assignment like the one above would result in producing a full (or deep) copy of A. That is, we'd expect the result would have been: A Capacity: 10 17 9 23 44 B Capacity: 10 17 9 23 44 This is often referred to as maintaining the semantic integrity of an operation. That is, certain operations carry certain expectations (e.g., addition of variables yields a new value but does not modify either variable being added). An implementation that does not satisfy such common, reasonable expectations is likely to cause confusion. Version 2.00 This is a purely individual assignment! 7

The possible consequences of making a shallow copy are severe. Since the two objects share their dynamic content, changes to that content via one object will be visible via the other object. (Of course, there may be a circumstance in which that's exactly what we would want.) Worse, if the dynamic content is deallocated via one of the objects, the other object will have lost its dynamic content, but that object will still hold a pointer to the no-longer-accessible memory. That's likely to lead to runtime errors. In C solutions, we fix the shallow copy problem by implementing a specialized copy function that makes a deep copy of an object. For the CSet type, we would implement the following function: / Makes a deep copy of a CSet object. Pre: ptarget and psource are proper Post: psource is unchanged If successful: ptarget->capacity == psource->capacity ptarget->usage == psource->usage ptarget[0:ptarget->capacity-1] == psource[0:psource->capacity-1] ptarget->data!= psource->data, unless ptarget == psource ptarget is proper. else: ptarget is unchanged Returns: true if successful, false otherwise / bool CSet_Copy(CSet const ptarget, const CSet const psource); A deep copy function must: deallocate any previous deep content in the target object duplicate the purely shallow content from the source object into the target object allocate memory for the necessary dynamic content in the target object duplicate the deep content from the source object into the target object It should be noted that the deep copy problem is not unique to C. The same scenario would arise in the other languages you are likely to use, including Java (where it is most commonly handled by supplying a clone() method) and C++ (where it is most commonly handled with a copy constructor and an assignment operator overload). In all cases, the same basic steps must be carried out. Version 2.00 This is a purely individual assignment! 8

Appendix: Valgrind Valgrind is a tool for detecting certain memory-related errors, including out of bounds accessed to dynamically-allocated arrays and memory leaks (failure to deallocate memory that was allocated dynamically). A short introduction to Valgrind is posted on the Resources page, and an extensive manual is available at the Valgrind project site (www.valgrind.org). For best results, you should compile your C program with a debugging switch (-g or ggdb3); this allows Valgrind to provide more precise information about the sources of errors it detects. For example, I ran my solution for this project, with one of the test cases, on Valgrind: wmcquain@centosvm CSet > valgrind --leak-check=full --show-leak-kinds=all --log-file=vlog.txt --track-origins=yes -v driver And, I got good news... there were no detected memory-related issues with my code: Memcheck, a memory error detector Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info Command: driver HEAP SUMMARY: in use at exit: 0 bytes in 0 blocks total heap usage: 236 allocs, 236 frees, 23,200 bytes allocated ==10669== All heap blocks were freed -- no leaks are possible ==10669== ==10669== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2) ==10669== ==10669== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2) And, I got good news... there were no detected memory-related issues with my code. That's the sort of results you want to see when you try your solution with Valgrind. On the other hand, here s what I got from a preliminary version of the solution, before I cleaned it up: Memcheck, a memory error detector Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info Command: driver HEAP SUMMARY: in use at exit: 548 bytes in 9 blocks total heap usage: 236 allocs, 227 frees, 23,200 bytes allocated Searching for pointers to 9 not-freed blocks Checked 73,848 bytes 12 bytes in 1 blocks are definitely lost in loss record 1 of 9 at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x4081FF: Init (gradecset.c:2346) by 0x404950: testcsetsubset (gradecset.c:1096) by 0x400B86: main (driver.c:103) 48 bytes in 1 blocks are definitely lost in loss record 2 of 9 at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x407A5B: Copy (gradecset.c:2140) by 0x405B88: testcsetunion (gradecset.c:1473) by 0x400BE5: main (driver.c:110)... Valgrind may also detect out-of-bounds accesses to arrays and uses of uninitialized values; a Valgrind analysis of your solution should not show any of those either. Version 2.00 This is a purely individual assignment! 9