60-141 Lecture 9: File Processing
Quazi Rahman
Outline
  Files
  Data Hierarchy
  File Operations
  Types of File
  Accessing Files
FILES
Data stored in variables, arrays or any other data structure is temporary: it is lost when the program terminates.
Files are used for permanent retention of data.
Computers store files on secondary (permanent) storage devices.
Primary vs. Secondary Storage
Primary: faster, volatile, more expensive, smaller storage capacity. Used during the execution of a program.
  Register: Fastest access. Data exists for a few instructions.
  Cache: Faster access. Frequently accessed data exists for a longer period.
  RAM: Fast access. Data exists for the period of program execution.
Secondary: slower, permanent, cheaper, larger storage capacity. Used to store data permanently.
  Hard Disk, CD-ROM, Flash Drives
Program I/O Model
[Figure: block diagram of the program I/O model. Labels: DISK, I/O Bus, Data Cache, Cache, Register, CPU, Data Bus, Program, RAM]
Data Hierarchy
All data items processed by a computer are reduced to binary data, just combinations of 0s and 1s.
A bit (short for binary digit, 0 or 1) is the smallest data item in a computer.
It is very hard for humans to read and manipulate bits, so we group bit sequences in meaningful ways.
One such grouping is the byte (8 bits = 1 byte).
A byte is a very useful data unit. It is used to represent the ASCII character set, such as a, b, c, A, B, C, 1, 2, 3, etc.
Data Hierarchy
Simpler data can be combined to represent more complex, but more meaningful and easier to handle, data structures, such as:
  bits -> bytes (smallest usable data unit)
  bytes -> characters / numbers
  characters / numbers -> fields (struct members)
  fields -> records (structures)
  records -> file
Files
Essentially, a file is a sequence of bytes on secondary storage. One can think of a file as an array of bytes.
Each file ends with an end-of-file (EOF) marker. From the keyboard, EOF is typed as:
  <ctrl>d in Unix.
  <ctrl>z in Windows.
DISK (Logical Layout): H e l l o W o r l d ! EOF
Simplified File Management
DISK (Logical Layout): H e l l o W o r l d ! EOF
Other files may be stored in the space next to this file. Accessing this space directly will cause errors.
File Allocation Table (FAT): File Name, File Path, Date / Time, Owner / Permission.
The O/S uses low-level system calls to create, maintain and access files as requested by an application.
File Operations
  OPEN (fopen()): Opens a file and returns a pointer to the beginning of the file (unless otherwise specified).
  READ (fscanf()): Reads data from an opened file.
  WRITE (fprintf()): Writes data into an opened file.
  CLOSE (fclose()): Closes an opened file and flushes the data stream buffers.
File and Streams
Although the actual data (bytes) stored on the disk may be physically scattered all over the disk cylinders, we can safely assume a sequential and continuous layout of the data in a given file.
Reading data from a file can be viewed as a stream of bytes flowing from the disk into memory (the program).
Writing data into a file is the opposite stream, from the program to the disk.
[Figure: DISK to Program is the Input Stream (Reading Data); Program to DISK is the Output Stream (Writing Data); sample bytes 10011110 00011110 11001010]
File Types
Two formats of files to store data:
Text Files
  The content of the file is represented by ASCII characters.
  Ex: Any file we can open with a text editor such as pico, nedit, notepad, etc.
Binary Files
  Store binary values directly (as a series of bytes).
  Opening these files with a text editor results in a meaningless representation (unreadable symbols).
  Ex: Database files, pictures, music, etc.
File Types
Two ways to access a file:
Sequential-Access
  Data can be read in one direction only, from the beginning of the file to the end of the file.
  The size of a block of data may vary.
Random-Access
  Allows the program to seek, that is, to move the read/write head to a particular byte position.
  Data are stored in blocks of the same size.
Sequential Access Files
Two main operations:
  Reading from a file.
  Writing into a file.
Steps: Reading from a file
  Request to open an existing file.
  Check if the file was actually opened.
  Read something from the file and put the values into some variables.
  When reading is done, close the file.
Sequential Access Files
Steps: Writing into a file
  Request to open an existing file or to create a new file.
  Check if the file was actually opened or created.
  Write something to the file, such as the values of variables.
  When writing is done, close the file.
Sequential Access Files
For both reading and writing we have to establish a connection (data stream) between the program and the file.
We need a file pointer (FILE*) to point to the file.
We open the file with the standard library function fopen(), which returns the file pointer.
/* prototype of fopen() */
FILE* fopen(const char* filename, const char* mode);
(mode: why open? "r" = read, "w" = write, "a" = append.)
Example:
FILE* pfile;
pfile = fopen("myfile.dat", "r");
Writing into a Sequential File
#include <stdio.h>

int main(void) {
    int num1 = 5;
    int num2 = 15;
    int num3 = 25;
    FILE *outfileptr;

    outfileptr = fopen("myfile.dat", "w");
    if (outfileptr == NULL) {
        printf("ERROR: File could not be opened!\n");
    } else {
        fprintf(outfileptr, "%d %d %d\n", num1, num2, num3);
        fclose(outfileptr);
    }
    return 0;
}
Writing into a Sequential File (step by step)
#include <stdio.h>
  Include <stdio.h> to use the file I/O functions.
FILE *outfileptr;
  Declare a pointer to the FILE structure. This pointer represents the file to the program.
outfileptr = fopen("myfile.dat", "w");
  Open the file for writing. If the file does not exist, a new file will be created in the current directory. If the file exists, it will be opened and its contents overwritten.
File Opening Modes
  r    Open a file for reading.
  w    Open (create) a file for writing. Contents of an existing file will be overwritten.
  a    Open (create) a file for appending. For an existing file, writing will be done at the end.
  r+   Open for reading and writing, start at the beginning.
  w+   Open for reading and writing (overwrite the existing file).
  a+   Open for reading and writing (append if the file exists).
if (outfileptr == NULL)
  Check if outfileptr is NULL. If it is, then the file was not opened or created as requested!
What Can Go Wrong?
  Trying to open a file for writing when no disk space is available (or your quota is exceeded).
  Trying to open a file from a path/directory that does not exist.
  Trying to open a file for which you do not have the proper permissions.
  Trying to open a file whose contents are corrupted.
  Sharing violation, when some other program is writing to the file at the same time! (An exclusive lock is imposed to avoid data overwriting.)
fprintf(outfileptr, "%d %d %d\n", num1, num2, num3);
  fprintf() works the same way as printf(), except that it also takes the FILE pointer to which you print (write) the output data.
fclose(outfileptr);
  fclose() closes the opened file, flushes all data buffers and releases the file for other programs to use.
File Contents: myfile.dat
DISK (Logical Layout): 1 5 2 4 6 7 5 0 \n EOF
Delimiters
  There are four fields (four integers).
  The space is used as the delimiter (separator) between these fields.
  Other delimiters may also be used, such as \n, comma or \t.
Reading from a Sequential File
#include <stdio.h>

int main(void) {
    int num;
    FILE *infileptr;

    infileptr = fopen("myfile.dat", "r");
    if (infileptr == NULL) {
        printf("ERROR: File could not be opened!\n");
    } else {
        /* Read once before testing feof(), so the last number is not printed twice. */
        fscanf(infileptr, "%d", &num);
        while (!feof(infileptr)) {
            printf("Number: %d\n", num);
            fscanf(infileptr, "%d", &num);
        }
        fclose(infileptr);
    }
    return 0;
}
Reading from a Sequential File (step by step)
FILE *infileptr;
  Declare a file pointer.
infileptr = fopen("myfile.dat", "r");
  Open the file for reading.
while (!feof(infileptr))
  Continue reading the file until the end, or until there is nothing (more) to read.
fscanf(infileptr, "%d", &num);
  Read an integer from the file and store it into a variable (num).
printf("Number: %d\n", num);
  Print the value of num on the console.
fclose(infileptr);
  Close the file.
Accessing Sequential File
The standard library provides many functions for opening files, closing files, reading data from files and writing data to files. Some of them are:
  fopen(): Opens a file in the desired mode of operation.
  fclose(): Closes an opened file and flushes the buffer.
  fprintf() / fscanf(): Similar to printf() / scanf(), except that they take a FILE pointer as the first parameter to indicate where the data is being streamed to/from.
Accessing Sequential File
  feof(): Tests whether reading has reached the end-of-file (EOF) marker. Reading past the EOF is a serious error!
  fgetc(): Reads one character from a file.
  fputc(): Writes one character to a file.
  fgets(): Reads a line from a file.
  fputs(): Writes a line to a file.
Sequential vs. Random Access Files
As we stated previously, records in a sequential file created with the formatted output function fprintf() are not necessarily the same length.
However, individual records of a random-access file are normally fixed in length and may be accessed directly (and thus quickly) without searching through other records.
This makes random-access files appropriate for transaction processing systems that require rapid access to specific data:
  Airline reservation systems, banking systems, point-of-sale systems, and many more.
Random Access Files
In random-access file processing, we can access a record directly.
Random-access files are highly structured and well organized.
Operations such as fwrite, fread and fseek are used.
We need to keep track of specific byte positions to calculate where a given record is located in the file, or where to write a given record.
  Ex: calculate the byte offset starting from a given byte position.
Data can be added easily to a random-access file without destroying other data in the file.
Random Access Files
To facilitate random access, data is written in fixed-length records.
Since every record is the same length, the computer can quickly calculate (as a function of the record key) the exact location of a record relative to the beginning of the file.
Data stored previously in a file with fixed-length records can also be changed and deleted without rewriting the entire file.
Fixed-Length Records
[Figure: records 1, 2, 3, ..., k laid out one after another, 100 bytes each, at byte offsets 0, 100, 200, 300, 400, 500]
To find record k:
  fseek(FilePtr, (k - 1) * sizeof(struct rec), SEEK_SET)
    FilePtr: which file
    (k - 1) * sizeof(struct rec): after how many bytes
    SEEK_SET: starting byte
Random-Access File I/O
Writing to a random-access file involves positioning the write head at the starting offset of the record and then writing the entire record at that offset.
Example to write one record:
  fseek(FilePtr, (ID - 1) * sizeof(student), SEEK_SET);
  fwrite(&s1, sizeof(student), 1, FilePtr);
    &s1: the structure to be written
    sizeof(student): byte size of the structure
    1: how many structures
    FilePtr: to which file
Random-Access File I/O
Reading involves positioning the read head at the starting offset of the record and then reading the entire record into a structure.
Example to seek and read one record:
  fseek(FilePtr, (ID - 1) * sizeof(student), SEEK_SET);
  fread(&s2, sizeof(student), 1, FilePtr);
Other Variations
When storing fixed-size records we often waste a lot of unused disk space within each record.
If space efficiency is a concern, we can use variable-size records, that is, use only the space we actually need to store our data.
This, however, complicates the way we organize the file.
  The location of a record can no longer be calculated easily.
  We have to keep additional index tables to maintain the offset position of each record in the file.
This discussion is beyond the scope of this course and will be covered in other data structure courses.
More on Files
Three files and their associated streams are automatically opened when program execution begins:
  Standard input file (FILE* stdin)
  Standard output file (FILE* stdout)
  Standard error file (FILE* stderr)
More on Files
fseek(): Positions the read/write head at a byte offset measured from one of:
  SEEK_SET: beginning of the file
  SEEK_CUR: current location of the head
  SEEK_END: end of the file
More on Files
We can check the return values of fscanf, fseek and fwrite to see whether they performed properly.
  fscanf returns the number of fields successfully read, or EOF if the end of the file (or a read error) is reached before any field is read.
  fseek returns 0 if the operation was successful; otherwise it returns a non-zero value.
  fwrite returns the number of elements successfully written.
Creating a Sequential Access File
Problem 1: Write a simple C program to access (read/write) a sequential file to be used as a client account file for a bank. Each client record has only three fields:
  Account number (an integer)
  Name (a string)
  Balance (a double)
Problem 1:
#include <stdio.h>

int main(void) {
    int account;
    char name[30];
    double balance;
    char ch;
    FILE *cfile;

    cfile = fopen("alldata/clients.dat", "w");
    if (cfile == NULL)
        printf("file could not be opened!\n");
    else {
        printf("enter account, name and balance.\n");
        printf("enter '0' to end input.\n? ");
        scanf("%d", &account);
        while (account != 0) {
            scanf("%29s%lf", name, &balance);   /* %29s keeps name within its 30 bytes */
            fprintf(cfile, "%d %s %.2f\n", account, name, balance);
            printf("? ");
            scanf("%d", &account);
        }
        fclose(cfile);
    }

    printf("data entry done...\n");
    printf("read the file? [y/n]: ");
    /* The leading space skips the newline left in the input buffer;
       fflush(stdin) is not defined by the C standard. */
    scanf(" %c", &ch);
    if (ch == 'y') {
        cfile = fopen("alldata/clients.dat", "r");
        if (cfile == NULL)
            printf("file could not be opened!\n");
        else {
            fscanf(cfile, "%d%29s%lf", &account, name, &balance);
            while (!feof(cfile)) {
                printf("%d %s %.2f\n", account, name, balance);
                fscanf(cfile, "%d%29s%lf", &account, name, &balance);
            }
            fclose(cfile);
        }
    }
    return 0;
}
Reading Sequential File
Problem 2: Write a simple C program to read input from a sequential file.
Example 2
#include <stdio.h>

int main(void) {
    int account;
    char name[50];
    double balance;

    /* Reads from standard input. */
    scanf("%d%49s%lf", &account, name, &balance);
    while (!feof(stdin)) {
        printf("%d %s %.2f\n", account, name, balance);
        scanf("%d%49s%lf", &account, name, &balance);
    }
    return 0;
}
Creating a Random Access File
Problem 3: Write a simple C program to access (read/write) a random-access file to be used as a client account file for a bank.
The bank has only 20 clients.
Each client record has only three fields:
  Account number (an integer)
  Name (a string)
  Balance (a double)
Creating a Random Access File
#include <stdio.h>
#define NUM 20

typedef struct {
    int account;
    char name[30];
    double balance;
} Client;

int main(void) {
    int i;
    Client blankclient = {0, "", 0.0};
    FILE *fptr;

    if ((fptr = fopen("rclient.dat", "wb")) == NULL)
        printf("File could not be opened.\n");
    else {
        /* Fill the file with NUM blank records. */
        for (i = 0; i < NUM; i++)
            fwrite(&blankclient, sizeof(Client), 1, fptr);
        fclose(fptr);
    }
    return 0;
}
Accessing a Random Access File
#include <stdio.h>
#define NUM 20

typedef struct {
    int account;
    char name[30];
    double balance;
} Client;

int main(void) {
    Client client = {0, "", 0.0};
    FILE *fptr;

    if ((fptr = fopen("rclient.dat", "rb+")) == NULL)
        printf("File could not be opened.\n");
    else {
        printf("Enter account number (1 to 20, 0 to end)\n? ");
        scanf("%d", &client.account);
        while (client.account != 0) {
            printf("Enter name and balance\n? ");
            scanf("%29s%lf", client.name, &client.balance);
            /* Move to the slot of this account, then overwrite the record there. */
            fseek(fptr, (client.account - 1) * sizeof(Client), SEEK_SET);
            fwrite(&client, sizeof(Client), 1, fptr);
            printf("Enter account number\n? ");
            scanf("%d", &client.account);
        }
        fclose(fptr);
    }
    return 0;
}
Reading a Random Access File
#include <stdio.h>
#define NUM 20

typedef struct {
    int account;
    char name[30];
    double balance;
} Client;

int main(void) {
    Client client = {0, "", 0.0};
    FILE *fptr;

    if ((fptr = fopen("rclient.dat", "rb")) == NULL)
        printf("File could not be opened.\n");
    else {
        printf("%-10s%-10s%10s\n", "Account", "Name", "Balance");
        rewind(fptr);
        /* fread returns 1 while a whole record was read; it returns 0 at the end of the file. */
        while (fread(&client, sizeof(Client), 1, fptr) == 1) {
            if (client.account != 0)   /* skip blank records */
                printf("%-10d%-10s%10.2f\n",
                       client.account, client.name, client.balance);
        }
        fclose(fptr);
    }
    return 0;
}
Access Random Access File
#include <stdio.h>

typedef struct {
    int account;
    char name[30];
    double balance;
} Client;

int main(void) {
    int i;
    FILE *fptr;
    Client clientarray[3];

    for (i = 0; i < 3; i++) {
        clientarray[i].account = i + 1;
        printf("enter name and balance\n? ");
        scanf("%29s%lf", clientarray[i].name, &clientarray[i].balance);
    }
    if ((fptr = fopen("rclientarray.dat", "wb+")) == NULL)
        printf("file could not be opened.\n");
    else {
        /* Write all three records with a single call. */
        fwrite(clientarray, sizeof(Client), 3, fptr);
        fclose(fptr);
    }
    return 0;
}
Next topics: Dynamic Data Structures
Read chapter 12.