Data Structures and Algorithms

The material for this lecture is drawn, in part, from
The Practice of Programming (Kernighan & Pike), Chapter 2

Goals of this Lecture

Help you learn (or refresh your memory) about:
   Common data structures and algorithms
Why? Shallow motivation:
   Provide examples of pointer-related C code
Why? Deeper motivation:
   Common data structures and algorithms serve as "high level building blocks"
   A power programmer:
      Rarely creates programs from scratch
      Often creates programs using high level building blocks

A Common Task

Maintain a table of key/value pairs
   Each key is a string; each value is an int
   Unknown number of key/value pairs
   For simplicity, allow duplicate keys (client responsibility)
      In the assignment, you must check for duplicate keys!
Examples
   (student name, grade)
      ("john smith", 8), ("jane doe", 9), ("bill clinton", 81)
   (baseball player, number)
      ("Ruth", 3), ("Gehrig", 4), ("Mantle", 7)
   (variable name, value)
      ("maxlength", 2000), ("i", ), ("j", -10)

Data Structures and Algorithms

Data structures
   Linked list of key/value pairs
   Hash table of key/value pairs
Algorithms
   Create: Create the data structure
   Add: Add a key/value pair
   Search: Search for a key/value pair, by key
   Free: Free the data structure

Data Structure #1: Linked List

Data structure: Nodes; each contains a key/value pair and a pointer to the next node
Algorithms:
   Create: Allocate Table structure to point to first node
   Add: Insert new node at front of list
   Search: Linear search through the list
   Free: Free nodes while traversing; free Table structure

Linked List: Data Structure

   struct Node {
      const char *key;
      int value;
      struct Node *next;
   };

   struct Table {
      struct Node *first;
   };

Linked List: Create

   struct Table *Table_create(void) {
      struct Table *t;
      t = (struct Table*)malloc(sizeof(struct Table));
      t->first = NULL;
      return t;
   }

   struct Table *t;
   t = Table_create();

Linked List: Add

   void Table_add(struct Table *t, const char *key, int value) {
      struct Node *p = (struct Node*)malloc(sizeof(struct Node));
      p->key = key;
      p->value = value;
      p->next = t->first;
      t->first = p;
   }

   Table_add(t, "Ruth", 3);
   Table_add(t, "Gehrig", 4);
   Table_add(t, "Mantle", 7);

The keys passed here are pointers to strings that exist in the RODATA section
Each call links the new node in at the front, so the list ends up in the order "Mantle", "Gehrig", "Ruth"

Linked List: Search

   int Table_search(struct Table *t, const char *key, int *value) {
      struct Node *p;
      for (p = t->first; p != NULL; p = p->next)
         if (strcmp(p->key, key) == 0) {
            *value = p->value;
            return 1;
         }
      return 0;
   }

   int value;
   int found;
   found = Table_search(t, "Gehrig", &value);

Linked List: Free

   void Table_free(struct Table *t) {
      struct Node *p;
      struct Node *nextp;
      for (p = t->first; p != NULL; p = nextp) {
         nextp = p->next;
         free(p);
      }
      free(t);
   }

   Table_free(t);
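
The four linked-list operations can be collected into one compilable unit. The following is a minimal sketch rather than the lecture's exact code: it assumes that Table_add reports allocation failure through an int return value (the slides declare it void), and it uses assert to check arguments. Both choices are illustrative.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct Node {
   const char *key;      /* not owned: points at the caller's string */
   int value;
   struct Node *next;
};

struct Table {
   struct Node *first;
};

/* Return a new empty table, or NULL if memory is exhausted. */
struct Table *Table_create(void) {
   struct Table *t = malloc(sizeof *t);
   if (t != NULL)
      t->first = NULL;
   return t;
}

/* Insert a node at the front of the list.
   Return 1 on success, 0 if memory is exhausted (assumed convention). */
int Table_add(struct Table *t, const char *key, int value) {
   struct Node *p;
   assert(t != NULL && key != NULL);
   p = malloc(sizeof *p);
   if (p == NULL)
      return 0;
   p->key = key;
   p->value = value;
   p->next = t->first;
   t->first = p;
   return 1;
}

/* Linear search. On a match store the value in *value and return 1;
   otherwise return 0 and leave *value untouched. */
int Table_search(const struct Table *t, const char *key, int *value) {
   const struct Node *p;
   assert(t != NULL && key != NULL && value != NULL);
   for (p = t->first; p != NULL; p = p->next)
      if (strcmp(p->key, key) == 0) {
         *value = p->value;
         return 1;
      }
   return 0;
}

/* Free every node, then the Table structure itself. */
void Table_free(struct Table *t) {
   struct Node *p;
   struct Node *nextp;
   if (t == NULL)
      return;
   for (p = t->first; p != NULL; p = nextp) {
      nextp = p->next;   /* save the link before the node is freed */
      free(p);
   }
   free(t);
}
```

A client would call Table_create(), add the three players, and then Table_search(t, "Gehrig", &value) returns 1 with value set to 4. Saving nextp before free(p) matters because reading p->next after the node has been freed is undefined behavior.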

Linked List Performance

   Create:  fast
   Add:     fast
   Search:  slow
   Free:    slow

What are the asymptotic run times (big-oh notation)?
Would it be better to keep the nodes sorted by key?

Data Structure #2: Hash Table

Fixed-size array where each element points to a linked list
   struct Node *array[ARRAYSIZE];   (indices 0 to ARRAYSIZE-1)
Function maps each key to an array index
   For example, for an integer key h
      Hash function: i = h % ARRAYSIZE (mod function)
   Go to array element i, i.e., the linked list hashtab[i]
      Search for element, add element, remove element, etc.

Hash Table Example

Integer keys, array of size 5 with hash function "h mod 5"
   1776 % 5 is 1
   1861 % 5 is 1
   1939 % 5 is 4

   Bucket   Keys in the bucket's list
   0
   1        1776 Revolution; 1861 Civil
   2
   3
   4        1939 WW2

How Large an Array?

Large enough that average "bucket" size is 1
   Short buckets mean fast search
   Long buckets mean slow search
Small enough to be memory efficient
   Not an excessive number of elements
   Fortunately, each array element is just storing a pointer
This is OK: [figure: array indexed 0 to ARRAYSIZE-1]

What Kind of Hash Function?

Good at distributing elements across the array
   Distribute results over the range 0, 1, ..., ARRAYSIZE-1
   Distribute results evenly to avoid very long buckets
This is not so good: [figure: array indexed 0 to ARRAYSIZE-1]
What would be the worst possible hash function?

Hashing String Keys to Integers

Simple schemes don't distribute the keys evenly enough
   Number of characters, mod ARRAYSIZE
   Sum the ASCII values of all characters, mod ARRAYSIZE
Here's a reasonably good hash function
   Weighted sum of characters x_i in the string
      $\left(\sum_i a^i x_i\right) \bmod \text{ARRAYSIZE}$
   Best if a and ARRAYSIZE are relatively prime
      E.g., a = 65599, ARRAYSIZE = 1024
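
To see why the simple schemes cluster keys, the sketch below compares the sum-of-characters scheme with a weighted sum that uses the multiplier 65599 and ARRAYSIZE = 1024 from the slide. The function names hash_sum and hash_weighted are hypothetical, chosen only for this illustration. Any two anagrams, such as "tab" and "bat", must collide under the plain sum because addition ignores character order.

```c
#include <assert.h>

enum { ARRAYSIZE = 1024 };

/* Simple scheme: sum of the character codes, mod ARRAYSIZE.
   Character order has no effect on the result. */
unsigned int hash_sum(const char *x) {
   unsigned int h = 0U;
   int i;
   for (i = 0; x[i] != '\0'; i++)
      h += (unsigned char)x[i];
   return h % ARRAYSIZE;
}

/* Weighted scheme: each step multiplies the running value by 65599
   before adding the next character, so position matters. */
unsigned int hash_weighted(const char *x) {
   unsigned int h = 0U;
   int i;
   for (i = 0; x[i] != '\0'; i++)
      h = h * 65599U + (unsigned char)x[i];
   return h % ARRAYSIZE;
}
```

Under these assumptions hash_sum sends both "tab" and "bat" to index 311, while hash_weighted sends them to different indices. This is one pair of keys, not a measurement of how evenly either scheme spreads a real key set.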

Implementing Hash Function

Potentially expensive to compute a^i for each value of i
   Instead, do (((x[0] * 65599 + x[1]) * 65599 + x[2]) * 65599 + x[3]) * ...

   unsigned int hash(const char *x) {
      int i;
      unsigned int h = 0U;
      for (i = 0; x[i] != '\0'; i++)
         h = h * 65599 + (unsigned char)x[i];
      return h % 1024;
   }

Can be more clever than this for powers of two! (Described in Appendix)

Hash Table Example

Example: ARRAYSIZE = 7
Lookup (and enter, if not present) these strings: the, cat, in, the, hat
Hash table initially empty.
First word: the. hash("the") = 965156977. 965156977 % 7 = 1.
Search the linked list table[1] for the string "the"; not found.

Hash Table Example (cont.)

Now: table[1] = makelink(key, value, table[1])

Second word: cat. hash("cat") % 7 = 2.
Search the linked list table[2] for the string "cat"; not found.
Now: table[2] = makelink(key, value, table[2])

Hash Table Example (cont.)

Third word: in. hash("in") = 6888005. 6888005 % 7 = 5.
Search the linked list table[5] for the string "in"; not found.
Now: table[5] = makelink(key, value, table[5])

Fourth word: the. hash("the") = 965156977. 965156977 % 7 = 1.
Search the linked list table[1] for the string "the"; found it!

Hash Table Example (cont.)

Fifth word: hat. hash("hat") = 865559739. 865559739 % 7 = 2.
Search the linked list table[2] for the string "hat"; not found.
Now, insert "hat" into the linked list table[2].
At beginning or end? Doesn't matter.
Inserting at the front is easier, so add "hat" at the front.

Resulting table:
   table[1]: the
   table[2]: hat -> cat
   table[5]: in
   (table[0], table[3], table[4], table[6] are empty)
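
The walk-through above can be replayed in code. This is a sketch under stated assumptions: hash_full, bucket_of, and lookup_or_enter are hypothetical names, the node stores only a key, and uint32_t is used so that the multiplication wraps at 32 bits regardless of platform (the values quoted on the slides are consistent with a 32-bit unsigned int).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { ARRAYSIZE = 7 };

struct Node {
   const char *key;
   struct Node *next;
};

/* Full hash value before reduction to a bucket index.
   uint32_t pins the wraparound to 32 bits (an assumption of this sketch). */
uint32_t hash_full(const char *x) {
   uint32_t h = 0U;
   int i;
   for (i = 0; x[i] != '\0'; i++)
      h = (uint32_t)(h * 65599U + (unsigned char)x[i]);
   return h;
}

/* Bucket index in the range 0 to ARRAYSIZE-1. */
unsigned int bucket_of(const char *x) {
   return (unsigned int)(hash_full(x) % ARRAYSIZE);
}

/* Look up key; if it is absent, link a new node at the front of its bucket.
   Return 1 if the key was already present, 0 if it was entered,
   -1 if memory is exhausted. */
int lookup_or_enter(struct Node *table[], const char *key) {
   unsigned int i = bucket_of(key);
   struct Node *p;
   for (p = table[i]; p != NULL; p = p->next)
      if (strcmp(p->key, key) == 0)
         return 1;
   p = malloc(sizeof *p);
   if (p == NULL)
      return -1;
   p->key = key;
   p->next = table[i];   /* insert at the front, as on the slide */
   table[i] = p;
   return 0;
}
```

Starting from an array of seven NULL pointers and entering the, cat, in, the, hat in that order, the second "the" is reported as already present, and bucket 2 ends up holding "hat" followed by "cat".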

Hash Table: Data Structure

   enum {BUCKET_COUNT = 1024};

   struct Node {
      const char *key;
      int value;
      struct Node *next;
   };

   struct Table {
      struct Node *array[BUCKET_COUNT];
   };

Hash Table: Create

   struct Table *Table_create(void) {
      struct Table *t;
      t = (struct Table*)calloc(1, sizeof(struct Table));
      return t;
   }

   struct Table *t;
   t = Table_create();

Hash Table: Add

   void Table_add(struct Table *t, const char *key, int value) {
      struct Node *p = (struct Node*)malloc(sizeof(struct Node));
      int h = hash(key);
      p->key = key;
      p->value = value;
      p->next = t->array[h];
      t->array[h] = p;
   }

   Table_add(t, "Ruth", 3);
   Table_add(t, "Gehrig", 4);
   Table_add(t, "Mantle", 7);

The keys passed here are pointers to strings that exist in the RODATA section
Pretend that "Ruth" hashed to 23 and "Gehrig" to 723
Pretend that "Mantle" hashed to 806, and so h = 806
   The new node is linked in at the front of the list at array[806]

Hash Table: Search

   int Table_search(struct Table *t, const char *key, int *value) {
      struct Node *p;
      int h = hash(key);
      for (p = t->array[h]; p != NULL; p = p->next)
         if (strcmp(p->key, key) == 0) {
            *value = p->value;
            return 1;
         }
      return 0;
   }

   int value;
   int found;
   found = Table_search(t, "Gehrig", &value);
Pretend that "Gehrig" hashed to 723, and so h = 723
   Only the list at array[723] is traversed; the search returns 1

Hash Table: Free

   void Table_free(struct Table *t) {
      struct Node *p;
      struct Node *nextp;
      int b;
      for (b = 0; b < BUCKET_COUNT; b++)
         for (p = t->array[b]; p != NULL; p = nextp) {
            nextp = p->next;
            free(p);
         }
      free(t);
   }

   Table_free(t);

Hash Table: Free (cont.)

   b = 0: the bucket is empty, so the inner loop does nothing
   b = 1, ..., 23: empty buckets are passed over until array[23] is reached
   b = 23: nextp = p->next is saved, then the node is freed
   b = 24, ..., 723: the list at array[723] is freed the same way
   b = 724, ..., 806: the list at array[806], holding "Mantle", is freed
   b = 807, ..., 1023: the remaining buckets are empty
   b = 1024: the loop ends, and free(t) frees the Table structure
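
The hash-table operations fit together as shown below. This is a sketch under assumptions, not the lecture's exact code: Table_add returns an int to report allocation failure (the slides declare it void), and the bucket index is kept in an unsigned int.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { BUCKET_COUNT = 1024 };

struct Node {
   const char *key;      /* not owned by the table in this version */
   int value;
   struct Node *next;
};

struct Table {
   struct Node *array[BUCKET_COUNT];
};

/* Hash function from the slides: result is in 0 to BUCKET_COUNT-1. */
unsigned int hash(const char *x) {
   unsigned int h = 0U;
   int i;
   for (i = 0; x[i] != '\0'; i++)
      h = h * 65599U + (unsigned char)x[i];
   return h % BUCKET_COUNT;
}

/* calloc zero-fills the array; on mainstream platforms an all-zero
   pointer is NULL, so every bucket starts out empty. */
struct Table *Table_create(void) {
   return calloc(1, sizeof(struct Table));
}

/* Return 1 on success, 0 if memory is exhausted (assumed convention). */
int Table_add(struct Table *t, const char *key, int value) {
   struct Node *p;
   unsigned int h;
   assert(t != NULL && key != NULL);
   p = malloc(sizeof *p);
   if (p == NULL)
      return 0;
   h = hash(key);
   p->key = key;
   p->value = value;
   p->next = t->array[h];   /* insert at the front of bucket h */
   t->array[h] = p;
   return 1;
}

/* Search only the one bucket that the key hashes to. */
int Table_search(const struct Table *t, const char *key, int *value) {
   const struct Node *p;
   assert(t != NULL && key != NULL && value != NULL);
   for (p = t->array[hash(key)]; p != NULL; p = p->next)
      if (strcmp(p->key, key) == 0) {
         *value = p->value;
         return 1;
      }
   return 0;
}

/* Visit every bucket, freeing each list, then free the table. */
void Table_free(struct Table *t) {
   struct Node *p;
   struct Node *nextp;
   int b;
   if (t == NULL)
      return;
   for (b = 0; b < BUCKET_COUNT; b++)
      for (p = t->array[b]; p != NULL; p = nextp) {
         nextp = p->next;
         free(p);
      }
   free(t);
}
```

Table_free must touch all 1024 buckets even when only three keys are stored, which is one way to read the "slow" entry for Free.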

Hash Table Performance

   Create:  fast
   Add:     fast
   Search:  fast
   Free:    slow

What are the asymptotic run times (big-oh notation)?
Is hash table search always fast?

Key Ownership

Note: Table_add() functions contain this code:

   void Table_add(struct Table *t, const char *key, int value) {
      ...
      struct Node *p = (struct Node*)malloc(sizeof(struct Node));
      p->key = key;
      ...
   }

Caller passes key, which is a pointer to memory where a string resides
Table_add() function stores within the table the address where the string resides

Key Ownership (cont.)

Problem: Consider this calling code:

   struct Table *t = Table_create();
   char k[100] = "Ruth";
   Table_add(t, k, 3);
   strcpy(k, "Gehrig");

Via Table_add(), table contains memory address k
Client changes string at memory address k
Thus client changes key within table
What happens if the client searches for "Ruth"?
What happens if the client searches for "Gehrig"?
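
The aliasing problem can be reproduced with a stripped-down linked list. In this sketch the helper names add, search, and alias_demo are hypothetical, and the list is a bare node pointer rather than a Table structure, purely to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct Node {
   const char *key;   /* stores the caller's address, as in the slides */
   int value;
   struct Node *next;
};

/* Push a node onto the list headed by *first.
   Return 1 on success, 0 if memory is exhausted. */
int add(struct Node **first, const char *key, int value) {
   struct Node *p = malloc(sizeof *p);
   if (p == NULL)
      return 0;
   p->key = key;
   p->value = value;
   p->next = *first;
   *first = p;
   return 1;
}

/* Linear search; return 1 and set *value on a match, else return 0. */
int search(const struct Node *first, const char *key, int *value) {
   const struct Node *p;
   for (p = first; p != NULL; p = p->next)
      if (strcmp(p->key, key) == 0) {
         *value = p->value;
         return 1;
      }
   return 0;
}

/* Return 1 if the stored key silently changed from "Ruth" to "Gehrig",
   0 if it did not, -1 if memory is exhausted. */
int alias_demo(void) {
   struct Node *first = NULL;
   char k[100] = "Ruth";
   int v = 0;
   int foundRuth;
   int foundGehrig;
   if (!add(&first, k, 3))
      return -1;
   strcpy(k, "Gehrig");   /* client overwrites its own buffer */
   foundRuth = search(first, "Ruth", &v);
   foundGehrig = search(first, "Gehrig", &v);
   free(first);
   return foundRuth == 0 && foundGehrig == 1 && v == 3;
}
```

In the linked-list version the entry is now found under "Gehrig" with Ruth's value 3 and is no longer found under "Ruth". With a hash table the outcome can be worse: the node stays in the bucket chosen when the key still read "Ruth", while a search for "Gehrig" looks in the bucket for "Gehrig", so the entry may be unreachable under either name.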

Key Ownership (cont.)

Solution: Table_add() saves copy of given key

   void Table_add(struct Table *t, const char *key, int value) {
      ...
      struct Node *p = (struct Node*)malloc(sizeof(struct Node));
      p->key = (const char*)malloc(strlen(key) + 1);
      strcpy((char*)p->key, key);
      ...
   }

Why add 1?
If client changes string at memory address k, data structure is not affected
Then the data structure "owns" the copy, that is:
   The data structure is responsible for freeing the memory in which the copy resides
   The Table_free() function must free the copy

Summary

Common data structures and associated algorithms
   Linked list
      Fast insert, slow search
   Hash table
      Fast insert, (potentially) fast search
      Invaluable for storing key/value pairs
      Very common
Related issues
   Hashing algorithms
   Memory ownership
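
The key-copying fix from the Key Ownership slides can be sketched for the linked-list table as follows. Two details are assumptions of this sketch: the key field is declared char * instead of const char * so that no casts are needed, and Table_add returns an int to report allocation failure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct Node {
   char *key;            /* owned copy; released by Table_free */
   int value;
   struct Node *next;
};

struct Table {
   struct Node *first;
};

struct Table *Table_create(void) {
   return calloc(1, sizeof(struct Table));
}

/* Store a private copy of key. Return 1 on success, 0 on failure. */
int Table_add(struct Table *t, const char *key, int value) {
   struct Node *p;
   assert(t != NULL && key != NULL);
   p = malloc(sizeof *p);
   if (p == NULL)
      return 0;
   p->key = malloc(strlen(key) + 1);   /* +1 for the terminating '\0' */
   if (p->key == NULL) {
      free(p);
      return 0;
   }
   strcpy(p->key, key);
   p->value = value;
   p->next = t->first;
   t->first = p;
   return 1;
}

int Table_search(const struct Table *t, const char *key, int *value) {
   const struct Node *p;
   assert(t != NULL && key != NULL && value != NULL);
   for (p = t->first; p != NULL; p = p->next)
      if (strcmp(p->key, key) == 0) {
         *value = p->value;
         return 1;
      }
   return 0;
}

/* The table owns each key copy, so it frees the copy with the node. */
void Table_free(struct Table *t) {
   struct Node *p;
   struct Node *nextp;
   if (t == NULL)
      return;
   for (p = t->first; p != NULL; p = nextp) {
      nextp = p->next;
      free(p->key);
      free(p);
   }
   free(t);
}
```

With this version a client can add the contents of a buffer k, overwrite k afterwards, and still find the entry under its original key. The cost is one extra allocation and one string copy per Table_add.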

Appendix

"Stupid programmer tricks" related to hash tables

Revisiting Hash Functions

Potentially expensive to compute "mod c"
   Involves division by c and keeping the remainder
   Easier when c is a power of 2 (e.g., 16 = 2^4)
An alternative (by example)
   53 = 32 + 16 + 4 + 1

      128  64  32  16   8   4   2   1
        0   0   1   1   0   1   0   1

   53 % 16 is 5, the last four bits of the number

      128  64  32  16   8   4   2   1
        0   0   0   0   0   1   0   1

   Would like an easy way to isolate the last four bits

Recall: Bitwise Operators in C

Bitwise AND (&)

      &  | 0  1
      ---+------
      0  | 0  0
      1  | 0  1

   Mod on the cheap! E.g., h = 53 & 15;

        53    0 0 1 1 0 1 0 1
      & 15    0 0 0 0 1 1 1 1
        --    ---------------
         5    0 0 0 0 0 1 0 1

Bitwise OR (|)

      |  | 0  1
      ---+------
      0  | 0  1
      1  | 1  1

One's complement (~)
   Turns 0 to 1, and 1 to 0
   E.g., set last three bits to 0
      x = x & ~7;

A Faster Hash Function

Previous version:

   unsigned int hash(const char *x) {
      int i;
      unsigned int h = 0U;
      for (i = 0; x[i] != '\0'; i++)
         h = h * 65599 + (unsigned char)x[i];
      return h % 1024;
   }

Faster:

   unsigned int hash(const char *x) {
      int i;
      unsigned int h = 0U;
      for (i = 0; x[i] != '\0'; i++)
         h = h * 65599 + (unsigned char)x[i];
      return h & 1023;
   }

What happens if you mistakenly write "h & 1024"?
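
The equivalence between the remainder and the mask, and the effect of the mistaken mask, can be checked directly. The function names index_mod, index_mask, and index_wrong are hypothetical; the sketch assumes BUCKET_COUNT is a power of two, which is what makes the trick valid.

```c
#include <assert.h>

enum { BUCKET_COUNT = 1024 };   /* assumed to be a power of two */

/* Reduce a full hash value to a bucket index with the remainder operator. */
unsigned int index_mod(unsigned int h) {
   return h % BUCKET_COUNT;
}

/* Same result by masking: BUCKET_COUNT - 1 is 1023, i.e. ten low 1 bits. */
unsigned int index_mask(unsigned int h) {
   return h & (BUCKET_COUNT - 1U);
}

/* The mistaken mask: 1024 has a single 1 bit, so the result
   can only be 0 or 1024. */
unsigned int index_wrong(unsigned int h) {
   return h & BUCKET_COUNT;
}
```

With the mistaken mask every key lands in bucket 0, except those whose hash has bit 10 set, which yield 1024. That value is one past the last valid index of a 1024-element array, so using it as a subscript is an out-of-bounds access.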

Speeding Up Key Comparisons

Speeding up key comparisons
   For any non-trivial value comparison function
   Trick: store full hash result in structure

   int Table_search(struct Table *t, const char *key, int *value) {
      struct Node *p;
      unsigned int h = hash(key);  /* No % in hash function */
      for (p = t->array[h % 1024]; p != NULL; p = p->next)
         if ((p->hash == h) && strcmp(p->key, key) == 0) {
            *value = p->value;
            return 1;
         }
      return 0;
   }

Why is this so much faster?
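
For the search above to work, Table_add has to record the full hash value in each node. The sketch below shows one way the pieces could fit together; the hash field in struct Node and the int return value of Table_add are assumptions of this example, since the slides show only the search function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { BUCKET_COUNT = 1024 };

struct Node {
   const char *key;
   unsigned int hash;    /* full hash value, before reduction to an index */
   int value;
   struct Node *next;
};

struct Table {
   struct Node *array[BUCKET_COUNT];
};

/* Full hash value: no % here, the caller reduces it to a bucket index. */
unsigned int hash(const char *x) {
   unsigned int h = 0U;
   int i;
   for (i = 0; x[i] != '\0'; i++)
      h = h * 65599U + (unsigned char)x[i];
   return h;
}

struct Table *Table_create(void) {
   return calloc(1, sizeof(struct Table));
}

/* Return 1 on success, 0 if memory is exhausted (assumed convention). */
int Table_add(struct Table *t, const char *key, int value) {
   struct Node *p;
   unsigned int h;
   assert(t != NULL && key != NULL);
   p = malloc(sizeof *p);
   if (p == NULL)
      return 0;
   h = hash(key);
   p->key = key;
   p->hash = h;                          /* remember the full value */
   p->value = value;
   p->next = t->array[h % BUCKET_COUNT];
   t->array[h % BUCKET_COUNT] = p;
   return 1;
}

/* Compare the stored integers first; call strcmp only when they match. */
int Table_search(const struct Table *t, const char *key, int *value) {
   const struct Node *p;
   unsigned int h;
   assert(t != NULL && key != NULL && value != NULL);
   h = hash(key);
   for (p = t->array[h % BUCKET_COUNT]; p != NULL; p = p->next)
      if (p->hash == h && strcmp(p->key, key) == 0) {
         *value = p->value;
         return 1;
      }
   return 0;
}
```

Two keys in the same bucket agree only in the low ten bits of their hash values, so the integer comparison usually rejects a non-matching node without touching its string. The strcmp call is still required, because distinct keys can share a full hash value.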