Structures and Unions in C

Leo Ferres
Department of Computer Science
Universidad de Concepción
leo@inf.udec.cl

July 5, 2010

1 Introduction

2 Structures

[W1] Structures in C are data containers consisting of a sequence of named members of various types. For instance, take the code in <<struct1>>. It defines a struct that consists of four members: gender, age, money and nice_hairdo.

<<struct1>>=
    struct Customer {
        char  gender;
        short age;
        int   money;
        char  nice_hairdo;
    };

Structs are declared and initialized like ordinary variables, except that the type name is preceded by the reserved word struct. So, to define variables of type struct Customer, we can write

<<declinitstruct>>=
    struct Customer c1;                        /* just declared, not initialized */
    struct Customer c2 = {'m', 20, 200, 'y'};  /* declared and initialized */

We could also have declared the struct variables in the struct declaration itself

<<struct2>>=
    struct Customer {
        char  gender;
        short age;
        int   money;
        char  nice_hairdo;
    } c1, c2, c3[10];

and initialized them later in the program. Note that c3 is an array of ten Customer structs.

The members of a structure are stored in consecutive locations in memory, in the order in which they are declared, although the compiler is allowed to insert padding between or after members (but not before the first member) for efficiency. The size of a structure is equal to the sum of the sizes of its members, plus the size of the padding.

[W2] When a modern computer reads from or writes to a memory address, it does so in word-sized chunks (e.g. 4-byte chunks on a 32-bit system). Data alignment means putting the data at a memory offset equal to some multiple of the word size, which increases the system's performance due to the way the CPU handles memory. To align the data, it may be necessary to insert some meaningless bytes between the end of one data item and the start of the next; this is data structure padding. For example, when the computer's word size is 4 bytes, the data to be read should be at a memory offset which is some multiple of 4. Consider the following struct:

<<alignedstruct>>=
    struct MyData {
        short Data1;
        short Data2;
        short Data3;
    } myd;

If we ask for sizeof(myd), the answer is 6 bytes (the members sit at offsets 0, 2 and 4). On a typical x86 machine a short is two bytes wide and only needs 2-byte alignment, so 3 * sizeof(short) = 6 and no padding is required. So this is fine. However, sizeof(c1), with c1 declared as in <<struct2>> above, typically yields 12, even though the members only add up to 2 * char + short + int = 8 bytes. Where does this twelve come from? The compiler lays the structure out as if it had been written like this:

<<realstruct>>=
    struct Customer {
        char  gender;
        char  Padding1[1];  /* for the following short to be aligned on a 2-byte boundary */
        short age;
        int   money;
        char  nice_hairdo;
        char  Padding2[3];  /* rounds the total size up to a multiple of 4 */
    };

The compiled size of the structure is now 12 bytes, with the members at offsets 0, 2, 4 and 8. It is important to note that padding is also added after the last member, so that the total size is a multiple of the alignment of the largest member type; this keeps every element of an array of such structures aligned. In this case 3 bytes are added after the last member to pad the structure to a multiple of the size of a long word (4 bytes).
<<alignedchunk>>=
    struct Customer {  /* after reordering */
        char  gender;
        char  nice_hairdo;
        short age;
        int   money;
    };

The compiled size of the structure now matches the pre-compiled size of 8 bytes. Note that Padding1[1] has been replaced (and thus eliminated) by the second char member, nice_hairdo, and Padding2[3] is no longer necessary, as the structure is already aligned to the size of a long word [W2].
The way we refer to any particular member of a structure is

    structure-name.member

as in

    c1.age = 22;
    if (c1.gender == 'f') ...

The names of structure members never stand alone, and they only have to be unique within their own structure: a different structure may have an age member of its own without any conflict.

So far we haven't gained much. The advantages of structures start to show when we have arrays of structures, or when we want to pass complicated data layouts between functions. Suppose we wanted to keep a table of up to 100 customers. We could extend our definitions like

    char  gender[100];
    char  nice_hairdo[100];
    short age[100];
    int   money[100];

but a structure lets us rearrange this spread-out information so that all the data about a single customer is collected into one lump:

    struct Customer sym[100];

Thus, to print the age and gender of every customer aged 30 or under (with nsym holding the number of entries in use),

    for (i = 0; i < nsym; i++)
        if (sym[i].age <= 30)
            printf("%d\t%c\n", sym[i].age, sym[i].gender);

What if we want to use pointers?

    struct Customer *psym;

The way we refer to a member of a structure through a pointer is

    ptr->structure-member

The symbol -> means we are pointing at a member of a structure; -> is only used in that context. ptr is a pointer to the (base of) a structure that contains the structure member. The expression ptr->structure-member refers to the indicated member of the pointed-to structure. Thus we have constructions like

    psym->gender = 'f';
    psym->age = 56;
    (*psym).age = 56;  /* equivalent to the line above */

and so on (pointer arithmetic still works here).

Let's look at some real code. Here is a simple skeleton of a program that reads up to 100 students from stdin and then prints them to stdout [O1].

<<students.c>>=
    #include <stdio.h>
    #include <string.h>  /* strcpy */
    #define SIZE 100     /* max 100 students */
    <<student type>>
    <<print students func>>
    <<read students>>
    int main(int argc, char *argv[])
    {
        <<declarations and inits>>
        <<reading loop>>
        <<writing loop>>
        return 0;
    }

We will first define our student struct using typedef:

<<student type>>=
    typedef struct student_type {
        char name[20];
        int  id;
    } student_t;

The above code only means that we have defined a new type, student_t, from struct student_type. We can now use student_t as a type, which bundles everything we know about a student under a single name. Thus, in order to print a student, we need a pointer to a student_t, and we access its two members, name and id, using the -> operator, since we are passing a pointer to a student_t, as we do in <<print students func>>.

<<print students func>>=
    void print_student(student_t *s)
    {
        printf("name: %s\n", s->name);
        printf("id: %d\n", s->id);
    }

Perhaps the only thing we have not seen yet in the following piece of code is the strcpy function (declared in <string.h>), which takes a char * for the destination and a const char * for the source. In our case below, the destination is the name member of the student_t that s points to. The function returns s on success, and NULL if it could not read both an ID and a name.

<<read students>>=
    student_t *read_student(student_t *s)
    {
        int ID;
        char name[20];

        printf("enter ID and name\n");
        if (scanf("%d %19s", &ID, name) != 2)  /* EOF or malformed input */
            return NULL;
        s->id = ID;
        strcpy(s->name, name);
        return s;
    }
count will hold the number of student_t entries stored in students[SIZE], while n will be the index for the iteration. Notice that at the end we know how many students there are because of count.

<<declarations and inits>>=
    student_t students[SIZE];
    int count = 0;
    int n;

Note that students + count adds count to the students pointer, which gives the address of students[count]! If for some reason the function read_student returns NULL, we break out of the while loop.

<<reading loop>>=
    while (count < SIZE) {
        if (read_student(students + count) == NULL)
            break;
        count++;
    }

Similarly, in the code below students + n adds n to the students pointer and passes the result to the printing function.

<<writing loop>>=
    for (n = 0; n < count; n++)
        print_student(students + n);

3 Unions

Reference symbols

[W1]: http://en.wikipedia.org/wiki/c_syntax#structures_and_unions
[W2]: http://en.wikipedia.org/wiki/data_structure_alignment
[O1]: http://jan.newmarch.name/os/l5_2.html