SYSC 2006 C Winter 2012 String Processing in C D.L. Bailey, Systems and Computer Engineering, Carleton University
References
Hanly & Koffman, Chapter 9
Some examples adapted from code in The C Programming Language, Second Edition, Kernighan & Ritchie, Prentice Hall, 1988
Objectives
Understand how C implements character strings
Look at a few string functions from the C standard library (caller's view)
Illustrate string processing algorithms by reimplementing some of the standard library functions
String Types
Unlike C++, Java and Python, C does not have a named string type
    C++: type string
    Java: type String
    Python: type str
C strings are implemented using arrays of characters
String Constants
A sequence of characters in double quotes is a string constant or string literal
Example: "SYSC"
Stored as an array of characters, terminated by '\0' (the null character)
The C compiler creates the array and initializes it
String Constants
Note 1: '\0' is not the same as '0' (the character zero)
Note 2: the number of elements in the array is 1 more than the number of chars between the quotes
Adjacent string constants are concatenated at compile time
"Hello, " "world!" and "Hello, world!" are equivalent
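The two notes above can be checked with a short sketch. The function names literal_array_size and adjacent_literals_match are hypothetical helpers introduced only for this illustration; strcmp is the standard library comparison function covered later in these notes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: size in bytes of the array the compiler creates
   for the literal "SYSC", including the terminating '\0'. */
size_t literal_array_size(void)
{
    return sizeof("SYSC");
}

/* Hypothetical helper: returns 1 if the adjacent literals
   "Hello, " "world!" form the same string as "Hello, world!",
   and 0 otherwise. */
int adjacent_literals_match(void)
{
    return strcmp("Hello, " "world!", "Hello, world!") == 0;
}
```

Under these assumptions, literal_array_size() yields 5 (four visible characters plus '\0'), and adjacent_literals_match() yields 1.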
String Variables
This declaration:
    char dept[] = "SYSC";
allocates an array called dept, initialized with 5 chars: 'S', 'Y', 'S', 'C', '\0'
String Variables
    char dept[] = "SYSC";
is equivalent to:
    char dept[5];
    dept[0] = 'S';
    dept[1] = 'Y';
    dept[2] = 'S';
    dept[3] = 'C';
    dept[4] = '\0';
String Variables
We don't need to initialize all the elements in a character array
    char dept[5];
    dept[0] = 'I';
    dept[1] = 'M';
    dept[2] = 'D';
    dept[3] = '\0';
dept[4] is uninitialized (that's o.k., because the string is properly terminated with '\0')
String Variables
A string literal can initialize a character array in its declaration, but it can't be assigned to the array afterwards
This isn't permitted:
    char dept[5];
    ...
    dept = "SYSC"; // Error!
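One common way to get the effect of the forbidden assignment is to copy the characters with strcpy, a standard library function described later in these notes. The sketch below assumes the caller's array has room for at least 5 chars; set_dept is a hypothetical helper name, not part of the course code.

```c
#include <string.h>

/* Hypothetical helper: stores "SYSC" in dept by copying characters.
   Assumption: dept has room for at least 5 chars (4 letters plus '\0'). */
void set_dept(char dept[])
{
    strcpy(dept, "SYSC");   /* copies 'S', 'Y', 'S', 'C', '\0' */
}
```

A caller would declare char dept[5]; and then call set_dept(dept); instead of writing dept = "SYSC";.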
String Variables
The const qualifier tells the compiler that the array elements should never be altered (the compiler should flag any attempt to do so)
    const char dept[] = "SYSC";
String Operations
C's operators are not overloaded to support string operations
Example: in C++, Java and Python, + is the string concatenation operator
In C, + cannot be used to concatenate two character strings
<string.h>
The C standard library provides several functions that perform common string operations
Prototypes are found in <string.h>
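strlen
    size_t strlen(const char s[]);
Returns the length of its character string argument, excluding '\0'
The standard return type is size_t, an unsigned integer type; for short strings the result can be stored in an int
    #include <string.h>
    ...
    char greeting[] = "Hello";
    int len;
    len = strlen(greeting); // returns 5 (not 6)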
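The difference between a string's length and the size of the array that holds it is easy to mix up, so here is a minimal sketch contrasting strlen with sizeof. The helper names are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of chars before '\0' in "Hello". */
size_t greeting_length(void)
{
    char greeting[] = "Hello";
    return strlen(greeting);
}

/* Hypothetical helper: number of elements in the array, including '\0'. */
size_t greeting_array_size(void)
{
    char greeting[] = "Hello";
    return sizeof(greeting);
}

/* Hypothetical helper: a larger array holding the same short string.
   strlen still stops at the first '\0'. */
size_t padded_length(void)
{
    char buffer[20] = "Hello";
    return strlen(buffer);
}
```

Here greeting_length() and padded_length() both give 5, while greeting_array_size() gives 6.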
strcmp
    int strcmp(const char s[], const char t[]);
Returns:
    negative value if s < t
    0 if s == t
    positive value if s > t
strcmp
Example:
    char name1[30];
    char name2[30];
    // Initialization of name1 and name2
    // not shown
    if (strcmp(name1, name2) != 0) {
        // strings are different
    }
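Only the sign of strcmp's result is specified, not its magnitude, so code should test for < 0, == 0 or > 0 rather than for particular values such as -1 or 1. The sketch below wraps that idea in a hypothetical helper, compare_sign, which is an illustration and not a library function. Note also that writing name1 == name2 would compare the arrays' addresses, not their contents.

```c
#include <string.h>

/* Hypothetical helper: reduces strcmp's result to -1, 0 or 1.
   -1 means s comes before t, 0 means equal, 1 means s comes after t. */
int compare_sign(const char s[], const char t[])
{
    int result = strcmp(s, t);

    if (result < 0) {
        return -1;
    }
    if (result > 0) {
        return 1;
    }
    return 0;
}
```

For example, compare_sign("apple", "banana") gives -1, and compare_sign("abc", "ab") gives 1 because the longer string has a character where the shorter one has already ended.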
strstr
    char *strstr(const char s[], const char t[]);
Returns the location of substring t in string s as a character pointer (we'll study pointers later)
If substring t isn't found in s, returns the value NULL
NULL is defined in several header files
If all you need to know is whether or not t is in s, but you don't care where, just check if the function returns NULL
strstr
Example:
    char phrase[] = "quick brown fox jumped";
    if (strstr(phrase, "fox") == NULL) {
        printf("fox is not in the string");
    } else {
        printf("fox is in the string");
    }
Output is:
    fox is in the string
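When the position of the substring matters, the pointer returned by strstr can be turned into an array index by subtracting the start of the string. This relies on pointer arithmetic, which these notes defer until later, so treat the following as a preview sketch. The name find_index is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: returns the index of the first occurrence of t
   in s, or -1 if t does not occur in s. */
int find_index(const char s[], const char t[])
{
    const char *location = strstr(s, t);

    if (location == NULL) {
        return -1;              /* substring not found */
    }
    return (int)(location - s); /* distance from the start of s */
}
```

With the phrase from the example, find_index("quick brown fox jumped", "fox") gives 12.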
strcpy
    char *strcpy(char s[], const char t[]);
Ignore the char * return type for now
Copies all chars in t to s, including '\0'
The programmer is responsible for ensuring that s is big enough to hold all chars copied from t
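Because strcpy does no bounds checking, one way to meet the "big enough" requirement is to compare lengths before copying. The sketch below is one possible approach, not the only one; copy_if_fits and its size parameter (the total capacity of s in chars) are hypothetical names introduced here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies t into s only if s can hold every char
   of t plus the terminating '\0'.
   size is the total number of elements in the array s.
   Returns 1 if the copy was made, 0 if t would not fit. */
int copy_if_fits(char s[], size_t size, const char t[])
{
    if (strlen(t) + 1 > size) {
        return 0;       /* not enough room; s is left unchanged */
    }
    strcpy(s, t);
    return 1;
}
```

For a 5-element array, copying "SYSC" succeeds (4 chars plus '\0'), while copying "Hello" is refused because it needs 6 elements.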
strcat
    char *strcat(char s[], const char t[]);
Ignore the char * return type for now
Concatenates t to the end of s, including '\0'
The programmer is responsible for ensuring that s is big enough to hold its original contents plus all chars copied from t
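For strcat the destination must hold what it already contains plus the appended chars plus one '\0'. A minimal sketch of that check follows; append_if_fits and its size parameter (the total capacity of s) are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends t to s only if the combined string,
   including the terminating '\0', fits in s.
   size is the total number of elements in the array s.
   Returns 1 if t was appended, 0 if it would not fit. */
int append_if_fits(char s[], size_t size, const char t[])
{
    if (strlen(s) + strlen(t) + 1 > size) {
        return 0;       /* not enough room; s is left unchanged */
    }
    strcat(s, t);
    return 1;
}
```

Starting from a 14-element array holding "Hello, ", appending "world!" succeeds (7 + 6 + 1 = 14), and a further append of "!" is refused.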
A strlen Implementation
Loop over the string, counting characters until we reach null
    int CU_strlen(const char s[])
    {
        int i = 0;
        while (s[i] != '\0') {
            i = i + 1;
        }
        return i;
    }
A strcmp Implementation
Loop, comparing the two strings on a character-by-character basis, until we find two characters that differ
Calculate the difference of those two characters to determine whether one string is greater than or less than the other
If while looping we reach the end of both strings before finding chars that differ, the two strings are equal
A strcmp Implementation
    int CU_strcmp(const char s[], const char t[])
    {
        int i;
        for (i = 0; s[i] == t[i]; i = i + 1) {
            if (s[i] == '\0')
                return 0;
        }
        // i is first position where s and t differ
        return s[i] - t[i];
    }
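One detail worth noting: whether plain char is signed or unsigned depends on the compiler, so for characters outside the 7-bit ASCII range the subtraction in CU_strcmp can give a different sign than the library's strcmp, which compares characters as unsigned char. The variant below is a sketch of that adjustment; the name CU_strcmp_unsigned is hypothetical and is not part of the original code.

```c
/* Hypothetical variant of CU_strcmp that compares the differing
   characters as unsigned char, matching the standard library's rule. */
int CU_strcmp_unsigned(const char s[], const char t[])
{
    int i = 0;

    while (s[i] == t[i]) {
        if (s[i] == '\0') {
            return 0;   /* reached the end of both strings: equal */
        }
        i = i + 1;
    }
    /* i is the first position where s and t differ */
    return (unsigned char)s[i] - (unsigned char)t[i];
}
```

For plain ASCII text both versions behave the same; they can differ only when one of the differing characters has its high bit set.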
A strcpy Implementation
Loop over the source string, copying characters into the destination string, until we reach the end of the source string
Null terminate the destination string
A strcpy Implementation
    void CU_strcpy(char s[], const char t[])
    {
        int i = 0;
        while (t[i] != '\0') {
            s[i] = t[i];
            i = i + 1;
        }
        // Terminate s
        s[i] = '\0';
    }
A strcat Implementation
Loop over the destination string, until we find null
Loop over the source string, copying characters into the destination string, until we reach the end of the source string
The first character copied from the source overwrites the null in the destination
Null terminate the destination string
A strcat Implementation
    void CU_strcat(char s[], const char t[])
    {
        int i, j;
        for (i = 0; s[i] != '\0'; i = i + 1)
            ; // find end of s
        // Copy t to s, except for null
        for (j = 0; t[j] != '\0'; j = j + 1) {
            s[i] = t[j];
            i = i + 1;
        }
        s[i] = '\0'; // Terminate s
    }