SQL for Palm Zhiye LIU MSc in Information Systems 2002/2003


The candidate confirms that the work submitted is their own and that appropriate credit has been given where reference has been made to the work of others. I understand that failure to attribute material which is obtained from another source may be considered as plagiarism.

(Signature of student)

Summary

Most Palm applications need to use databases. Although there are some commercial and non-commercial DBMSs for Palm, these systems focus on end users rather than programmers. A system is needed to help programmers manage databases on Palm. The objective of this project is to design and implement a C library for using SQL on Palm OS.

The project starts with background research into the Palm database structure. A Palm database is quite different from a relational database: it has no concept of a table. System design and implementation began after a review of a related system, SQL Anywhere. The system implements the SQL fragment with CLI. Users include the library in their C programs and make a function call, passing the SQL fragment as a parameter. The SQL fragment is then parsed, and the database operations are carried out according to the result of the syntax parsing. Finally, the result of the database operations is stored in the result set and returned to the users.

The evaluation suggests that the system is limited in usability and functionality. Further improvements are needed before it can be developed into a system that is valuable in practice.

Acknowledgments

I would like to thank my supervisor, Mr. Peter Mott, for his help, guidance and advice throughout the project. I also want to thank Dr. Julika Matravers for her useful advice on the interim report.

Content Pages

SUMMARY
ACKNOWLEDGMENTS
CONTENT PAGES
1 INTRODUCTION
  1.1 BACKGROUND AND MOTIVATION OF THE PROJECT
  1.2 ORIGINAL AND REVISED OBJECTIVES
    1.2.1 Original Objectives
    1.2.2 Revised Objectives
  1.3 MAIN CHALLENGES
  1.4 MSC PROJECT SCHEDULE
  1.5 OVERVIEW OF PROJECT PLAN
2 REVIEW OF CURRENT SYSTEM
  2.1 SQL ANYWHERE
    2.1.1 Always Available
    2.1.2 Adaptability
  2.2 DISCUSSION
3 SYSTEM DESIGN AND IMPLEMENTATION
  3.1 SYSTEM ARCHITECTURE
    3.1.1 Design Decision
    3.1.2 Late Changes 31/07/03
  3.2 EMBEDDED SQL VS CLI
    3.2.1 Embedded SQL
    3.2.2 CLI
    3.2.3 Comparison of the Two Standards
  3.3 SQL GRAMMAR DESIGN AND IMPLEMENTATION
    3.3.1 SQL Grammar Design
    3.3.2 SQL Grammar Implementation
  3.4 PARSER DESIGN AND IMPLEMENTATION
    3.4.1 Parser Design
    3.4.2 Procedure of Parsing and Executing the SQL Query
  3.5 DATA STRUCTURE DESIGN AND IMPLEMENTATION
    3.5.1 Data Type
    3.5.2 Database Content
    3.5.3 Storing the Data
    3.5.4 Retrieving the Data
    3.5.5 Result Set Design and Implementation
4 TESTING
  4.1 TESTING ENVIRONMENT AND METHOD
  4.2 TESTING PROCEDURE
5 EVALUATION
  5.1 EVALUATION CRITERIA
  5.2 RESULT AND ANALYSIS
6 LIMITATION AND FUTURE WORKS
  6.1 SQL GRAMMAR
  6.2 DATABASE OPERATION
  6.3 RESULT SET DESIGN
  6.4 OTHER LIMITATIONS AND FUTURE WORKS
7 CONCLUSION
REFERENCES
APPENDIX A. PERSONAL REFLECTION
APPENDIX B. OBJECTIVES AND DELIVERABLE FROM
APPENDIX C. MARKING SCHEME AND INTERIM REPORT HEADER SHEET

1 Introduction

1.1 Background and Motivation of the Project

In recent years, mobile computing has grown rapidly. As discussed in [1], although it is still in its infancy, businesses and consumers around the world are looking for new ways to integrate mobile products into their everyday work and lives. Meanwhile, the development of Palm OS is enabling people to build their own solutions on the Palm platform. Because of its openness and simplicity, thousands of programmers are developing applications on Palm OS. [1]

Nearly all Palm OS applications use a database. Many commercial and non-commercial database management systems are available on the PC, such as DB2, SQL Server and Oracle. The situation is quite different on Palm OS: there is no such database management system. Palm OS has its own data structure, and the way it manages a database is quite different from a relational database. The aim of this project is to help programmers with their database programming on Palm OS.

First, the data structure of a Palm database is different. In Palm OS there are two kinds of database: record databases and resource databases. Record database files end with .pdb and resource database files end with .prc. This project uses the record database. As introduced in [10], a record database contains the following components (Figure 1.1):

- A database header, which stores the database description. It also stores fields that refer to the information blocks and the raw record data in the database.
- A list of record entries, each of which describes a block of raw record data.
- The application information block and the sort information block. Both are optional.
- The raw record data. This data is stored in linear format and is referenced from the record list in the database header.
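The record list can be made concrete with a small sketch in standard C. It assumes the commonly documented PDB layout in which each record entry occupies 8 bytes in big-endian order: a 4-byte offset to the raw record data, a 1-byte attribute field and a 3-byte unique ID. The names PdbRecordEntry and pdb_parse_record_entry are hypothetical and are not part of Palm OS or of this project's library.

```c
#include <stdint.h>

/* One entry of the PDB record list (assumed layout: 8 bytes, big-endian). */
typedef struct {
    uint32_t local_offset; /* offset from the top of the file to the raw record data */
    uint8_t  attributes;   /* record attribute flags */
    uint32_t unique_id;    /* 24-bit unique ID of the record */
} PdbRecordEntry;

/* Read a big-endian 32-bit value from four bytes. */
static uint32_t read_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Decode one 8-byte record entry starting at p. */
PdbRecordEntry pdb_parse_record_entry(const unsigned char *p) {
    PdbRecordEntry e;
    e.local_offset = read_be32(p);
    e.attributes   = p[4];
    e.unique_id    = ((uint32_t)p[5] << 16) | ((uint32_t)p[6] << 8) | (uint32_t)p[7];
    return e;
}
```

A reader of a .pdb file would call pdb_parse_record_entry once per record and then use local_offset to locate the raw data of that record.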

Figure 1.1: PDB database format [10]

The database header contains the database name, type, creator, record list and so on. Each entry in the list of record entries contains a local ID, attributes and a unique ID. For example, suppose there is a record database student created by PALM. The header will store the database name student, the type record database, the creator PALM and so on. If there are three records in this database, there will be three record entries. Each entry stores the local ID, which is the offset from the top of the PDB to the start of the raw record data for that entry, together with the attributes and the unique ID of the record. [10]

Second, unlike a relational database, a Palm database has no concept of tables. Users have full control over the structure of each record in a Palm database. In other words, each record can have a different structure, or all records can have the same one. Users who are used to working with relational databases will find it hard to get started with the Palm database. It is therefore necessary to have a database manipulation application to help users manage databases on Palm OS.

Some commercial applications are now available to deal with this problem, but they are all end-user oriented. Programmers working on Palm OS still have to learn and adapt to a brand-new database programming environment. Is it possible for programmers who do not know the details of the Palm database to program with standard SQL on Palm OS?

1.2 Original and Revised Objectives

1.2.1 Original Objectives

The overall objective is to design and implement a simple SQL preprocessor for use on the Palm OS. The minimum requirement is the same as the objective, but a very basic system will meet the minimum requirement. Again, an incomplete implementation is permitted for the minimum.

1.2.2 Revised Objectives

This project aims to help programmers use SQL on Palm OS. It is not necessary to parse the whole C code written by the programmer; the system only has to process the SQL fragment. The objectives were therefore changed. The overall objective of this project is to design and implement a C library for using SQL on Palm OS. The minimum requirement is to provide a design of a C-language library for a non-trivial fragment of SQL for use on the Palm OS, and to describe fully how relational tables and fields will be represented within the Palm OS.

1.3 Main Challenges

There are three main challenges in this project. The first is how to process the C code to take out the SQL fragment and parse it. The second is how to translate the SQL query into database operations on Palm OS. The third is the data structure in Palm. All of these challenges relate closely to the success of this project.

1.4 MSc Project Schedule

To make sure that the project proceeded smoothly and finished on time, a project plan was set up at the beginning of the project. Table 1.1 below shows the brief schedule of the project.

Table 1.1: MSc Project Schedule

Date        Milestone
20/03/03    First meeting; brief introduction to the project.
27/03/03    Agree on the objectives of the project; set up the steps for completing the project.
03/04/03    Complete the embedded SQL fragment.
10/04/03    Complete the CLI SQL fragment; compare it with the embedded one.
17/04/03    Understand the database operation on Palm OS.
01/05/03    Finish the draft edition of the interim report and discuss it.
08/05/03    Finish the final edition of the interim report.
05/05/03    Complete the experiment on database operation on Palm OS.
13/06/03    Understand how to build a compiler and what compiler technology is needed for building the pre-processor.
16/06/03    Referring to the ADO model, decide the functions, arguments and return values.
20/06/03    Design the architecture of the pre-processor.
04/07/03    Finish designing the pre-processor.
01/08/03    Try to implement the design.
15/08/03    Finish the initial project report. Continue working on the implementation.
06/09/03    Finish the final project report.

The project log is available on

1.5 Overview of Project Plan

At the beginning of this project, background reading and research into related systems were done to make the problem clear. Further research was then carried out to help the design of the system. This research covered the architecture of related systems, the techniques used to design the SQL fragment, the design of the SQL grammar, and the design of the data structure.

After that, the system architecture was set up according to the system requirements, and the SQL fragment and grammar were designed. Once the syntax parser and the data structure had been designed, the implementation of the system started. Some of the designs were changed during implementation.

When the coding was finished, testing was carried out. Most of the testing was to debug the code and test the reliability of the system. Finally, the functionality and usability of the system were evaluated to identify its weaknesses and limitations, and suggestions for future work were made.

2 Review of Current System

2.1 SQL Anywhere

One of the best approaches to providing database management on Palm OS is SQL Anywhere by Sybase Inc. As introduced by Sybase in [2], it is a comprehensive package that provides data management and enterprise data synchronization. It does not have the same focus as this project. It is a whole package providing a server and client solution. The server side runs on a PC, and the client side can run on either a PC or a handheld device. The application that runs on Palm OS is just one part of the whole package. However, it is industry-leading software and carries some good ideas that were referred to in this project.

2.1.1 Always Available

The basic idea of SQL Anywhere is described in [2]. It is to enable people to access data and corporate applications from anywhere at any time they want, regardless of the connection or application type. It also provides offline operation and data synchronization.

2.1.2 Adaptability

SQL Anywhere supports most standard database access interfaces, such as ADO.NET and ODBC, and it supports most development tools. Users can develop applications on it using the techniques and tools they are familiar with.

2.2 Discussion

SQL Anywhere is designed with a totally different idea from this project. It is not used to manage databases in Palm memory, but to manage remote databases. Which approach is better: storing the databases on a remote server, or storing them in Palm memory? Both approaches have their advantages and disadvantages.

Storing the databases on a remote server obviously saves a lot of memory on the Palm device. Most Palm devices do not have much memory, so storing the databases on the remote server and running only the managing application on the Palm saves memory space. This approach is becoming feasible with the development of mobile networks. What is more, it lets users work with the same database from anywhere. Users do not have to export data from one database to another if they want to carry the data they are working with.

The advantage of storing the databases in Palm memory is quite straightforward: it is simple and convenient. Users do not have to worry about network quality, or even about whether there is a network connection at all. They just work with the database in memory and save the data whenever they want. With the memory of Palm devices getting bigger and bigger, the memory problem will not exist in the near future.

Of course, the latter approach leaves programmers to handle the difficulty of dealing with the data structure of the Palm database. Helping programmers with database programming on Palm is the most important motive of this project.

3 System Design and Implementation

3.1 System Architecture

3.1.1 Design Decision

At the beginning, the decision was made to design and implement a C code pre-processor (Figure 3.1). The reason was that the objective of the project was to process the C code written by programmers. These C programs contain some lines of code that form the SQL fragment, so a pre-processor is needed to parse the original C code and take out the SQL fragment. Another reason for building a pre-processor was that the SQL fragment was first designed as embedded SQL, which needs several lines for one SQL query.

The pre-processor parses the original C source code written by the programmers to find the SQL fragment. This SQL fragment is then parsed by the SQL syntax parser. The table name, attribute names and search condition are stored after parsing. After that, the replacer replaces the SQL fragment with Palm database operation functions. These functions are created according to the table name, attribute names, search condition and the symbol table. Finally, the new C code is generated.


Figure 3.1: Old system architecture design

3.1.2 Late Changes 31/07/03

As suggested by Mr. Peter Mott, the supervisor of this project, the pre-processor is not necessary to meet the objective of this project; a C library would be enough. It was decided to design the SQL fragment with CLI rather than embedded SQL. The SQL fragment therefore no longer appears in the C code as a multi-line fragment. Instead it becomes a single line of code: one SQL query sentence. This change made it possible to pass the SQL fragment as a parameter of a C function call, so there is no need to parse the whole C source code any more.

The basic objective concerns how the SQL fragment is processed. Parsing the whole C code to find the SQL fragment does not relate much to this project. The most important part, the SQL syntax parser, remains the same in both designs.

It was then decided that the design should be changed to a C library (Figure 3.2). Programmers include the C library in their program and make the function call. The SQL query is parsed first to get information such as the table name and search condition. After that, the database operation is carried out and the data is stored in the result set. Finally, the result set is returned to the users.

Figure 3.2: New system architecture design

The flow of the whole procedure of making the function call (Figure 3.3) is shown below:


Figure 3.3: Overview of the function call

3.2 Embedded SQL vs CLI

There are two standards for designing the SQL fragment: embedded SQL and CLI. Both have proved successful in practice. In this project, both standards have their advantages and disadvantages.

3.2.1 Embedded SQL

Advantages:

- Embedded SQL is an old standard, which means it is quite stable.
- There are many successful examples of using embedded SQL in C. It is always easier to follow the successful way of others.
- This method allows the programmer to write plain SQL statements right into the source code of the client program. [12, 15]

Disadvantages:

- To distinguish the embedded SQL from the programming language, all SQL statements must be prefixed with the special command EXEC SQL.
- To process the SQL statements, there must be a pre-processor to read all the lines that start with EXEC SQL. Only when all the lines have been read can the pre-processor know the whole SQL query. This is clearly more complex than CLI.

3.2.2 CLI

Advantages:

- The SQL statements are treated as strings.
- To use CLI, programmers just need to make a function call and supply the string of SQL statements. This makes the operation really easy for programmers.
- Programmers can do multiple-row operations using one single function call. [15]

Disadvantages:

- How to take the SQL statements out of the function call is the main problem.
- Suppose that the programmer stores the string of SQL statements in a variable and makes the function call with the variable. How to recognize the variable and take out its value is another problem.
- CLI is quite new, so there are few examples to refer to. If problems are encountered when implementing it, finding the solution may not be easy.

3.2.3 Comparison of the Two Standards

Comparing these two standards, a conclusion can be drawn. Embedded SQL is an old standard and is complex to implement. CLI is a fashionable way of doing database connection and operation, and it is clearly much simpler and easier to implement. But considering that CLI is very new, embedded SQL may be the safer choice. Both sides have their advantages and disadvantages.

At the beginning, the design of the system was a pre-processor, and it was decided that embedded SQL was easier to implement. The pre-processor to be built had to take the SQL fragment out of the C source code, so how easily it could find and retrieve the SQL fragment was one of the most important problems. Embedded SQL, in which every SQL sentence follows EXEC SQL, seemed a much more straightforward way to achieve this goal. With CLI, all the SQL sentences are stored in variables, so it might not be easy for the pre-processor to find the SQL fragment.

It was hard to decide which way to go at the beginning, so experiments were made to try both ways. The more experiments implementing the SQL fragment were done, the clearer the picture became: CLI was found to be easier to implement. With CLI, a single function call satisfies the requirement of executing the SQL statement, whereas embedded SQL needs a few lines of code (Figure 3.4) to include the whole SQL statement, as discussed in [11]. CLI does not need a pre-processor to parse the whole C code to find the SQL fragment, but embedded SQL does.

    EXEC SQL SELECT *
    EXEC SQL FROM emp
    EXEC SQL WHERE age > 30 AND name = 'John'

Figure 3.4: Example Embedded SQL code

The problem with variables, which had seemed hard to implement, was solved by referring to the programming guides in [13, 14]. An example explains it. Since the design has been changed to a C library, the SQL query is now executed through a function call. If programmers make the function call, their code will look like this:

    ExecuteSQL("select * from emp");

The SQL fragment select * from emp is passed as a parameter. Or the code will look like this:

    ExecuteSQL(expr);

Here expr is a variable storing the SQL fragment select * from emp. The function call passes the value of this variable, which is just the same as using the string of the SQL fragment directly.

So the decision was made at this stage that CLI was easier and more suitable than embedded SQL for this project.

3.3 SQL Grammar Design and Implementation

The most important part of this system is the SQL syntax parser. The syntax parser parses the SQL fragment and takes out the important information, such as the table name, attribute names and search condition. The database operation can then be carried out with this information. So there must be a standard for the SQL grammar.

3.3.1 SQL Grammar Design

This section gives the SQL grammar designed for this system. The first design of the SQL grammar only tries to prove the concept of the design; it is not a complete SQL grammar. It includes two parts: statements and elements. The statements are designed mainly with reference to [4]; [3, 5, 6, 7, 18, 19] are also important references for the design.

Statement:

    statement ::= select-statement

All SQL fragments are statements. In the first design, a statement can only be a select-statement. The SELECT statement is the most useful one; it selects the desired attributes from a table.

    select-statement ::= SELECT select-list FROM table-name [WHERE search-condition]

The SELECT statement starts with the SELECT keyword, which corresponds to the projection operation in relational algebra. It is followed by the select-list, which is the list of desired attributes from the table. Then comes the FROM keyword. The table-name that comes after it is the table to be scanned for the attributes. At the end, there is a WHERE clause, which is optional.

Users can put their search-condition after the WHERE keyword. The search condition consists of a predicate involving attributes of the table that appears in the FROM clause. This predicate returns a Boolean value: TRUE, FALSE (or UNKNOWN). One search condition may contain another search condition. In this implementation, to keep it simple, only one search condition and no more than two predicates are allowed in the WHERE clause.

The SELECT syntax processed by the parser is slightly different from, and simpler than, standard SQL. First of all, it allows duplicates. There are no ALL / DISTINCT keywords in this grammar, and duplicates are not removed. Secondly, there are no ORDER BY / GROUP BY keywords. The parser only performs the basic database operation; to keep the program simple, these functions are not provided. Most importantly, only single-table operation is provided. Since there is no concept of a table in a Palm database, multiple-table operation would be somewhat more complex than in a relational database system, so it was decided to provide only single-table operation in the basic system. Apart from these points, the syntax is the same as standard SQL.

To perform the SELECT operation, the whole SELECT sentence must be put in ExecuteSQL(). For example, if the following operation is to be done:

    select name, age from emp where ID = 1000

the expression will be:

    ExecuteSQL("select name, age from emp where ID = 1000")

The function call ExecuteSQL() passes the SQL fragment as a parameter. The attribute names, table name and search condition are parsed and stored for further processing.

Element:

The elements used in the grammar definition above are explained here:

    select-list ::= * | select-sublist [, select-sublist]

select-list is either * or the aggregation of select-sublists.

    select-sublist ::= column-name

select-sublist is a column name of the table.

    table-name ::= user-defined-name

table-name is a table name defined by the user.

    search-condition ::= boolean-term [OR search-condition]

search-condition is a Boolean term, or a Boolean term combined by an OR operation with another search-condition. The search condition will first be implemented with the fixed format column-name + operator + constant. Other formats will not be implemented in the basic system.

    boolean-term ::= boolean-factor [AND boolean-term]

boolean-term is a Boolean factor, or a Boolean factor combined by an AND operation with another boolean-term.

    boolean-factor ::= [NOT] boolean-primary

boolean-factor is a boolean-primary with or without the NOT operator.

    boolean-primary ::= comparison-predicate | ( search-condition )

boolean-primary is either a comparison-predicate or a search-condition in parentheses.

    comparison-predicate ::= expression comparison-operator expression

comparison-predicate is an expression, a comparison-operator and another expression.

    comparison-operator ::= < | > | <= | >= | = | <>

comparison-operator is one of <, >, <=, >=, = or <>.

    expression ::= column-name | primary

expression is either a column-name or a primary.

    primary ::= digit | literal | ( expression )

primary is a digit, a literal or an expression in parentheses.

    column-name ::= [table-name.]column-identifier

column-name is a column-identifier with or without a table-name as a prefix.

    column-identifier ::= user-defined-name

column-identifier is a column name defined by the user.

    literal ::= ''{character}''

literal is any character in the character set.

    digit ::=

digit is any digital number.

3.3.2 SQL Grammar Implementation

The grammar above is only the part of the design that has been implemented in the basic system. Since the first system to be built is a very simple one, not all of the standard SQL grammar is implemented. Only the most important operation is implemented first. Once this operation works in the system, implementing the others should be quite straightforward and just a matter of time.

How to retrieve data from a database is always the critical problem in database operation. The SELECT operation is therefore the one chosen to be implemented first. It is also the most complex operation, so its implementation is a good example for implementing the others.

The grammar standard has been set up. But how does the parser work, and how is the SQL query executed? The procedure of performing the SQL query is also important in this system.

3.4 Parser Design and Implementation

3.4.1 Parser Design

The parser used to parse the SQL syntax is a recursive-descent parser, and the grammar it uses is an LL(1) grammar: the SQL grammar discussed above. An LL(1) grammar is a grammar whose predictive parsing table contains no duplicate entries. The name stands for Left-to-right parse, Leftmost derivation, 1-symbol lookahead. [20]

The parser examines the SQL fragment from left to right in one pass. It expands non-terminals into their right-hand sides following the order of a leftmost derivation; that is, it calls the functions corresponding to the non-terminals. The parser reads one token of the input at a time and never looks more than one token ahead. [20, 21]

3.4.2 Procedure of Parsing and Executing the SQL Query

The select query is the most common query and is the one to be implemented first. The procedure of performing the select query serves as an example of how an SQL query is parsed and executed.

There are several important steps in processing the SQL query. First of all, users call the predefined C function and pass the SQL query as a parameter. The syntax parser then parses the SQL fragment to get its syntax (Figure 3.5).
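The recursive-descent approach described in the parser design can be sketched as follows. This is a minimal recognizer for the search-condition part of the grammar only, written in standard C rather than against the Palm OS API. It is an illustration under assumptions, not the parser built in this project: the function names and the Parser structure are hypothetical, an expression is restricted to a column name, a number or a quoted literal, and the parenthesized form of primary is left out so that one token of lookahead is always enough.

```c
#include <ctype.h>
#include <string.h>

typedef enum { T_IDENT, T_NUMBER, T_STRING, T_CMP, T_AND, T_OR, T_NOT,
               T_LPAREN, T_RPAREN, T_END, T_ERROR } TokenKind;

typedef struct {
    const char *p;   /* current position in the input text */
    TokenKind tok;   /* the single lookahead token */
} Parser;

/* Case-insensitive match of the n-character word s against keyword kw. */
static int word_is(const char *s, size_t n, const char *kw) {
    size_t i;
    if (strlen(kw) != n) return 0;
    for (i = 0; i < n; i++)
        if (toupper((unsigned char)s[i]) != kw[i]) return 0;
    return 1;
}

/* Scanner: consume one token and store its kind in ps->tok. */
static void advance(Parser *ps) {
    const char *s = ps->p;
    while (isspace((unsigned char)*s)) s++;
    if (*s == '\0') ps->tok = T_END;
    else if (*s == '(') { s++; ps->tok = T_LPAREN; }
    else if (*s == ')') { s++; ps->tok = T_RPAREN; }
    else if (*s == '<') { s++; if (*s == '=' || *s == '>') s++; ps->tok = T_CMP; }
    else if (*s == '>') { s++; if (*s == '=') s++; ps->tok = T_CMP; }
    else if (*s == '=') { s++; ps->tok = T_CMP; }
    else if (isdigit((unsigned char)*s)) {
        while (isdigit((unsigned char)*s)) s++;
        ps->tok = T_NUMBER;
    } else if (*s == '\'') {
        s++;
        while (*s && *s != '\'') s++;
        if (*s == '\'') { s++; ps->tok = T_STRING; } else ps->tok = T_ERROR;
    } else if (isalpha((unsigned char)*s) || *s == '_') {
        const char *start = s;
        size_t n;
        while (isalnum((unsigned char)*s) || *s == '_' || *s == '.') s++;
        n = (size_t)(s - start);
        if (word_is(start, n, "AND")) ps->tok = T_AND;
        else if (word_is(start, n, "OR")) ps->tok = T_OR;
        else if (word_is(start, n, "NOT")) ps->tok = T_NOT;
        else ps->tok = T_IDENT;
    } else ps->tok = T_ERROR;
    ps->p = s;
}

static int search_condition(Parser *ps);

/* expression ::= column-name | number | literal (simplified) */
static int expression(Parser *ps) {
    if (ps->tok != T_IDENT && ps->tok != T_NUMBER && ps->tok != T_STRING) return 0;
    advance(ps);
    return 1;
}

/* boolean-primary ::= comparison-predicate | ( search-condition ) */
static int boolean_primary(Parser *ps) {
    if (ps->tok == T_LPAREN) {
        advance(ps);
        if (!search_condition(ps) || ps->tok != T_RPAREN) return 0;
        advance(ps);
        return 1;
    }
    if (!expression(ps) || ps->tok != T_CMP) return 0;
    advance(ps);
    return expression(ps);
}

/* boolean-factor ::= [NOT] boolean-primary */
static int boolean_factor(Parser *ps) {
    if (ps->tok == T_NOT) advance(ps);
    return boolean_primary(ps);
}

/* boolean-term ::= boolean-factor [AND boolean-term] */
static int boolean_term(Parser *ps) {
    if (!boolean_factor(ps)) return 0;
    if (ps->tok == T_AND) { advance(ps); return boolean_term(ps); }
    return 1;
}

/* search-condition ::= boolean-term [OR search-condition] */
static int search_condition(Parser *ps) {
    if (!boolean_term(ps)) return 0;
    if (ps->tok == T_OR) { advance(ps); return search_condition(ps); }
    return 1;
}

/* Returns 1 when text is a complete, well-formed search condition, else 0. */
int is_valid_search_condition(const char *text) {
    Parser ps;
    ps.p = text;
    advance(&ps);
    return search_condition(&ps) && ps.tok == T_END;
}
```

Each non-terminal of the grammar maps to one C function, and every decision is taken by looking only at ps->tok, which is what the one-symbol lookahead of LL(1) means in practice. A full parser would record the column name, operator and constant as it goes instead of only returning success or failure.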

Figure 3.5: SQL syntax parse

How the syntax parser works is the most critical problem here. To make it more understandable, consider a simple example. The syntax parser is going to parse the SQL query SELECT name, age FROM emp WHERE age>25. It has to go through the following steps to finish the job. [8]

First of all, the scanner retrieves the names of the columns and the name of the table from the query. To do this, the scanner scans through the query, targeting the SELECT, FROM and WHERE keywords. The list of column names lies between SELECT and FROM. In this example there are only two column names, name and age. The table name comes right after FROM and before WHERE; here it is emp. Finally, the search condition age>25 comes after WHERE. This data is then passed to the parser, which expresses it in relational algebra. For this example, the relational algebra is project(select(emp, age>25), name, age).

Secondly, the relational algebra is parsed into relations. Each operator in the relational algebra takes one or two relations and returns one result relation. In this example, relation R1 is defined for the operator select and relation R2 is defined for the operator project. So there will be:

    R1 = select(emp, age>25);
    R2 = project(R1, name, age);

emp, R1 and R2 are pointers to relations. The parser can then carry out further operations on these relations.

After outputting the relations, the parser implements them in C. The expressions above are only for design; they are not in the format of real C code, so they still have to be implemented in C. To do this, a format similar to three-argument form is used. This means that the parameters of select and project are replaced, again using pointers. For example, the search condition age>25 is represented as a pointer condition, which points to a structure of elements (operator, field, constant) that has the value (>, age, 25). The list of column names is represented as a pointer attributes, which points to an array {attribute1, attribute2, ...} that has the value {name, age} here. So the relations become:

    R1 = select(emp, condition);
    R2 = project(R1, attributes);

They can then be implemented in C. The select function passes the relation emp and the structure condition to the operation code. Each tuple of the relation is taken out and tested, and only the tuples that satisfy the search condition are returned as the result relation R1. The project function passes the relation R1 and the array attributes. The desired columns are projected into a new table and returned as the result relation R2. R2 is the actual result of the query.

After these three steps, the execution of the SQL query is finished and the result is ready to be returned to the users.
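The select and project steps above can be illustrated with a small in-memory sketch in standard C. It is not the Palm OS implementation: the Relation and Condition types, the size limits and the function names rel_select and rel_project are assumptions made for the example. Every value is kept as a string, and the comparison is done numerically when the constant starts with a digit, which is also an assumption of this sketch.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_COLS 8
#define MAX_ROWS 16

/* A relation held in memory; every value is a string. */
typedef struct {
    int ncols, nrows;
    const char *cols[MAX_COLS];
    const char *rows[MAX_ROWS][MAX_COLS];
} Relation;

/* Search condition in the fixed form (operator, field, constant). */
typedef struct {
    const char *op;
    const char *field;
    const char *constant;
} Condition;

/* Position of the named column, or -1 when it does not exist. */
static int column_index(const Relation *r, const char *name) {
    int i;
    for (i = 0; i < r->ncols; i++)
        if (strcmp(r->cols[i], name) == 0) return i;
    return -1;
}

/* Compare numerically when the constant starts with a digit, else as text. */
static int compare_values(const char *value, const char *constant) {
    if (constant[0] >= '0' && constant[0] <= '9') {
        long a = strtol(value, NULL, 10), b = strtol(constant, NULL, 10);
        return (a > b) - (a < b);
    }
    return strcmp(value, constant);
}

/* Test one value against the condition's operator and constant. */
static int satisfies(const char *value, const Condition *c) {
    int d = compare_values(value, c->constant);
    if (strcmp(c->op, "=") == 0)  return d == 0;
    if (strcmp(c->op, "<>") == 0) return d != 0;
    if (strcmp(c->op, "<") == 0)  return d < 0;
    if (strcmp(c->op, ">") == 0)  return d > 0;
    if (strcmp(c->op, "<=") == 0) return d <= 0;
    if (strcmp(c->op, ">=") == 0) return d >= 0;
    return 0;
}

/* R1 = select(in, condition): keep the tuples that satisfy the condition.
   Returns 0 on success, -1 when the field is not a column of the relation. */
int rel_select(const Relation *in, const Condition *c, Relation *out) {
    int col = column_index(in, c->field), i, j;
    if (col < 0) return -1;
    out->ncols = in->ncols;
    out->nrows = 0;
    for (j = 0; j < in->ncols; j++) out->cols[j] = in->cols[j];
    for (i = 0; i < in->nrows; i++) {
        if (!satisfies(in->rows[i][col], c)) continue;
        for (j = 0; j < in->ncols; j++) out->rows[out->nrows][j] = in->rows[i][j];
        out->nrows++;
    }
    return 0;
}

/* R2 = project(in, attributes): keep only the named columns, in the given order.
   Returns 0 on success, -1 when an attribute is unknown or there are too many. */
int rel_project(const Relation *in, const char *const *attrs, int nattrs, Relation *out) {
    int idx[MAX_COLS], i, j;
    if (nattrs > MAX_COLS) return -1;
    for (j = 0; j < nattrs; j++) {
        idx[j] = column_index(in, attrs[j]);
        if (idx[j] < 0) return -1;
        out->cols[j] = attrs[j];
    }
    out->ncols = nattrs;
    out->nrows = in->nrows;
    for (i = 0; i < in->nrows; i++)
        for (j = 0; j < nattrs; j++) out->rows[i][j] = in->rows[i][idx[j]];
    return 0;
}
```

For the example query, the caller would build a Condition holding (>, age, 25) and an attribute array {name, age}, call rel_select on emp to obtain R1, and then call rel_project on R1 to obtain R2. Duplicates are kept, in line with the grammar, which has no DISTINCT keyword.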

3.5 Data Structure Design and Implementation

This system enables users to work on the Palm database, and the desired design is that users should feel they are working on a relational database. All operations on the Palm database must therefore be based on predefined data structures. The design includes the data structures of the data stored in the database and of the data in the result set.

3.5.1 Data Type

Before designing the data structure, the data type must be decided. It was decided that the data stored in the database would have only one data type: string.

3.5.2 Database Content

The data structure of the database is the most critical problem. A Palm database is not like a relational database: it has no concept of tables. The structure of each record in a Palm database is controlled by the user. In other words, each record can have a different structure, or all records can have the same one. So, to distinguish each column in a table, the metadata of the table must be stored as well.

In the implementation, two specific databases store the metadata of a database. The first stores the information about all the tables in the object database, together with the information needed to use the second. This information includes the name of each table, the number of records, the starting index of the table, the number of columns, and the starting index of the column names in the second database. For example, suppose there is a table named student which has 5 columns and 10 records, whose records start from index 4 and whose column names start from index 6. Its metadata would be (student, 10, 4, 5, 6). Each record in the Palm database has an index number, which represents its position in the database, and all the tables in a database are stored together. The starting index and the number of records therefore enable users to calculate the position of each table in the database.
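The index arithmetic implied by this metadata can be sketched in a few lines of standard C. The TableMeta structure and the function names are hypothetical; they only mirror the five fields listed above (table name, number of records, starting record index, number of columns, starting column-name index) and do not use the Palm OS API.

```c
#include <string.h>

/* Metadata of one table, mirroring (name, records, start, columns, column start). */
typedef struct {
    const char *table_name;
    int num_records;   /* number of records in the table */
    int first_record;  /* index of the table's first record in the object database */
    int num_columns;   /* number of columns in the table */
    int first_column;  /* index of the first column name in the column-name database */
} TableMeta;

/* Look a table up by name; returns NULL when no such table is recorded. */
const TableMeta *find_table(const TableMeta *catalog, int count, const char *name) {
    int i;
    for (i = 0; i < count; i++)
        if (strcmp(catalog[i].table_name, name) == 0) return &catalog[i];
    return NULL;
}

/* Index of the table's last data record (inclusive). */
int last_record_index(const TableMeta *t) {
    return t->first_record + t->num_records - 1;
}

/* Index of the table's last column name (inclusive). */
int last_column_index(const TableMeta *t) {
    return t->first_column + t->num_columns - 1;
}
```

With the metadata (student, 10, 4, 5, 6), the records of student occupy indexes 4 to 13 and its column names occupy indexes 6 to 10.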
And the other specific database is used to store the names of columns of each table Storing the Data As mentioned above, the data and the content of a database will be stored separately. And all the data in a database, although this data belongs to different tables, is storing together. Data belonging to the same table is storing together. For example, there is a table emp, which has 5 21

records and the first record of it has the index 3. So in this database, index 3 to index 7 hold the data of table emp.

Retrieving the Data

The data structure design ensures that the data can be stored in the database. The next problem is how to retrieve this data, that is, how to select and project it into the desired result.

The whole procedure of retrieving the data consists mainly of opening three databases one by one and retrieving the data from them. First of all, the two content databases are opened according to the name of the object database. If a user wants to perform the SQL query select name, age from student where level = 1 on the database school, the databases which store the content of the database school are opened first, and the metadata of table student is retrieved. The Palm database function DmNumRecordsInCategory() is used to get the maximum index of the content database. Then the database is gone through, reading the records one by one to check whether each one stores the content of the desired table. [9, 17]

    /* tablename is a string, so it is compared rather than assigned */
    if (StrCompare(findrecord->tablename, TName) == 0) {
        numberofrecords = findrecord->numberofrecords;
        index = findrecord->index;
    }

Here the variable findrecord is used to store the record. If its tablename has the same value as TName, it stores the desired table content. Then the number of records in this table and its index are taken out and stored. For example, the metadata is (student, 100, 40, 5, 30) here. The column names are also retrieved from the other content database.

Then the database school is opened. All the records from index 40 to 139 are taken out of the database. After that, a check of these records is performed to find out the ones that

have the field level equal to 1. All records that satisfy this condition have their fields name and age stored in the result set. The result set is returned to the user as the result of the function call.

The procedures of opening the three databases are almost the same. The procedure of retrieving the data from the object database (Figure 3.6) is the most important one. The operations on the other two databases only prepare for this step.

Figure 3.6: Retrieving data from object database

After selecting and projecting the records from the database, the result must be stored and returned to the users. The result is presented to the users as a result set.

Result Set Design and Implementation

There are two main problems with the result set. The first and most critical one is that the number of records and the number of fields in the result set are dynamic. The reason is obvious: each select query returns a different result, so either the number of records or the number of fields can change each time.

The second problem is that the structure of the result set must be simple enough for the users to use in their programs, and there must be an easy-to-use function to access data in the result set.

To tackle these two problems, a linked list is designed for the structure of the result set. A result set is a list of tuples. A tuple is a list of strings. An example result set could be:

    ( (Mike, Jones, year 1), (Jean, Stone, year 3), (Tom, Johnson, year 1) )

The linked list is actually a structure in C. Here are the three structures of fields, tuples and result set:

    typedef struct Cell {
        char *data;
        struct Cell *next;
    } Cell;

The structure Cell is the basic structure. It has two members. The data is a pointer to the string of the data. The next is a pointer to the next Cell in the list. Many Cells linked together make a tuple. Diagrammatically, a tuple is represented as a list (Figure 3.7):

Figure 3.7: The list of a tuple

    typedef struct Tuple {
        struct Cell *data;
        struct Tuple *next;
    } Tuple;

The structure of tuple is similar to the structure Cell. The first member of it is a pointer to the

data in this tuple. The second member of it is a pointer to the next tuple in the list. Many tuples linked together make a result set.

    typedef struct Result {
        int records;
        struct Tuple *last;
        struct Tuple *next;
        struct Tuple *first;
    } Result;

The structure Result is the structure of the result set. It has four members. The records is the number of tuples in the result set. The last is the pointer to the last tuple in the result set. The next is the pointer to the start of the next tuple to be retrieved; it moves from the first tuple to the last tuple as the users go through the whole result set. The first is the pointer to the first tuple in the result set. To represent a complete result set, a list of lists is used (Figure 3.8):

Figure 3.8: An example result set (Tuple 1, Tuple 2, Tuple 3)

The result of an SQL query (SELECT queries only) is stored in the structures above and returned to the users. Here is example code using the result set:

    Result *result;

    result = SQL_select(DBName, "select forename, surname from student where marks > 60");
    for (i = 0; i < result->records; i++) {
        printf("Forename: %s", next(result, 0));
    }

The result here is the result set, which stores the forenames and surnames of the students whose marks are higher than 60. All the forenames of these students are then printed out. The next() here is the function defined for the users to use the result set. Its first parameter is the pointer to the result set. Its second parameter is the index of the column that is going to be read. The users pass the pointer to the result set and the index to the function, and the function returns the string of the desired field. How the next() function works is shown below (Figure 3.9):

Figure 3.9: Procedure of reading the result set
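The result set and its access function can be sketched in C as follows. This is a minimal sketch under assumptions, not the library's actual code: the helper names result_new, result_append and result_free are hypothetical, and next() is assumed to return the requested field of the current tuple and then advance the cursor to the following tuple, which matches the loop shown above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct Cell  { char *data; struct Cell *next; } Cell;
typedef struct Tuple { Cell *data; struct Tuple *next; } Tuple;
typedef struct Result {
    int records;    /* number of tuples */
    Tuple *last;    /* last tuple, for appending */
    Tuple *next;    /* cursor: next tuple to be retrieved */
    Tuple *first;   /* first tuple */
} Result;

/* Free one tuple together with its cells and strings. */
static void tuple_free(Tuple *t) {
    Cell *c = t->data;
    while (c != NULL) {
        Cell *following = c->next;
        free(c->data);
        free(c);
        c = following;
    }
    free(t);
}

/* Create an empty result set (hypothetical helper). */
Result *result_new(void) {
    return calloc(1, sizeof(Result));
}

/* Append a tuple built from n strings; returns 0 on success, -1 on failure. */
int result_append(Result *rs, const char *const *fields, int n) {
    assert(rs != NULL && fields != NULL);
    Tuple *t = calloc(1, sizeof *t);
    if (t == NULL) return -1;
    Cell **link = &t->data;             /* where the next cell is attached */
    for (int i = 0; i < n; i++) {
        Cell *c = calloc(1, sizeof *c);
        if (c == NULL) { tuple_free(t); return -1; }
        *link = c;
        link = &c->next;
        c->data = malloc(strlen(fields[i]) + 1);
        if (c->data == NULL) { tuple_free(t); return -1; }
        strcpy(c->data, fields[i]);
    }
    if (rs->first == NULL) rs->first = t;
    else rs->last->next = t;
    if (rs->next == NULL) rs->next = t; /* cursor starts at the first tuple */
    rs->last = t;
    rs->records++;
    return 0;
}

/* Return the field at `column` of the current tuple and advance the cursor.
   Returns NULL when the result set is exhausted or the column is missing. */
char *next(Result *rs, int column) {
    if (rs == NULL || rs->next == NULL) return NULL;
    Cell *c = rs->next->data;
    for (int i = 0; i < column && c != NULL; i++) c = c->next;
    rs->next = rs->next->next;
    return c != NULL ? c->data : NULL;
}

/* Release the whole result set (hypothetical helper). */
void result_free(Result *rs) {
    if (rs == NULL) return;
    Tuple *t = rs->first;
    while (t != NULL) {
        Tuple *following = t->next;
        tuple_free(t);
        t = following;
    }
    free(rs);
}
```

Because every call advances the cursor in this sketch, reading two columns of the same tuple would need a different interface, for example one function to fetch a field and another to move to the next tuple.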

After returning the result set to the users and providing the function next() to access data in the result set, the system ends its task.
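Looking back at the retrieval procedure of this section, the metadata lookup and the calculation of a table's record range can be sketched as follows. This is only an illustration under assumptions: the TableMeta layout and the functions find_table and last_record_index are hypothetical, and the content database is modelled as a plain array instead of Palm database records.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical layout of one metadata entry:
   (table name, number of records, starting index,
    number of columns, starting index of the column names). */
typedef struct {
    char tablename[32];
    int numberofrecords;
    int index;            /* first record of the table in the object database */
    int numberofcolumns;
    int columnindex;      /* first column name in the second content database */
} TableMeta;

/* Go through the content entries one by one until the table name matches. */
const TableMeta *find_table(const TableMeta *content, int count,
                            const char *TName) {
    assert(content != NULL && TName != NULL);
    for (int i = 0; i < count; i++)
        if (strcmp(content[i].tablename, TName) == 0) return &content[i];
    return NULL;
}

/* Index of the last record that belongs to the table. */
int last_record_index(const TableMeta *m) {
    return m->index + m->numberofrecords - 1;
}
```

For the metadata (student, 100, 40, 5, 30) used in the example, the table occupies index 40 up to 40 + 100 - 1 = 139, which is the range read from the object database.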

4 Testing

After the development of a software system is finished, there are always some unavoidable questions in front of the programmer: is the system workable, and is it bug-free? So, implementation is always followed by testing in the software life cycle.

4.1 Testing Environment and Method

The compiling and testing environment used is CodeWarrior Development Studio for Palm OS Version 8.0 Evaluation Edition by Metrowerks [16]. It is one of the best-known development tools and environments for Palm OS. The C program in this project is debugged and compiled in this environment.

Since this system is implemented as a C library, it is not like a usual executable program. It cannot be compiled and run by itself. To test this system, testing programs must be used. The C library is included in these testing programs, which are compiled first and then run on Palm OS. The testing of the C library is based on these programs.

4.2 Testing Procedure

The process of testing can be divided into several parts according to the different modules of the system. First of all, the syntax parser itself can be taken out and built into an independent program, so it can be tested as a module of the system first. The database operation is executed after the syntax parsing is finished, so it is tested when the testing of the parser is finished. The data structure of the result set is used in the final step of the database operation, and only then is the function to access the result set useful. So the testing of the result set is carried out at the end.

The testing of the syntax parser is simple. The code of the parser is taken out first. Then simple input and output code is added to make it a workable program. After that, files containing SQL queries are used as the input to the program. And the output of the program

should be the table name, attribute names and search condition. If the parser works correctly, it will produce correct outputs for different inputs. For example, the input select name, department from employee where salary = 1000 will have the output table name = employee, attribute[1] = name, attribute[2] = department, field = salary and constant = 1000.

After testing the syntax parser, the testing of the database operation starts. The aim of this is to prove that the database operation can work with the results from the parser. That is, the operation is carried out and correct data is retrieved from the database. So, the input of this module is the output of the syntax parser, which includes the table name, attribute names and search condition. The testing is based on a simple object database and two content databases. The SQL query is changed to test different operations on the database. The output of this module is the desired tuples from the database.

Then the result set is tested. The main testing in this part covers two functions. One is used to write data into the result set, the other is used to access data in the result set. This part is also tested separately. The first function writes some simple data into the result set. Then the second function is called to access the data and output the result. In this way both the structure of the result set and the functions are tested. The output should be the same as the input data.

Finally, all of the modules are integrated and tested as a whole system. Although the modules that passed the tests above should work correctly on their own, there may still be unexpected problems when they are integrated and working together, so this step is necessary. This part of the testing is based on simple databases and SQL queries on the tables in these databases. The result from the database operation is stored in the result set. The result set is then accessed and the data is displayed. The final output shows whether the system is in working order or not.
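To make the parser test concrete, a minimal parser for the single-table SELECT form used in the example can be sketched in C. It is not the parser of the system: the ParsedQuery structure, the function parse_select and the limits MAX_ATTRS and NAME_LEN are hypothetical, only the form select a, b from t [where field op constant] with op being =, < or > is accepted, and the attribute array is 0-based while the output shown above counts from 1.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_ATTRS 8     /* hypothetical limits for this sketch */
#define NAME_LEN  32

typedef struct {
    char table[NAME_LEN];
    char attribute[MAX_ATTRS][NAME_LEN];
    int numAttributes;
    char field[NAME_LEN];       /* empty when there is no WHERE clause */
    char op;                    /* '=', '<' or '>' */
    char constant[NAME_LEN];
} ParsedQuery;

/* Read the next token: a word, or one of the characters , = < >.
   Returns 0 at the end of the input, 1 otherwise. */
static int next_token(const char **p, char *tok) {
    size_t n = 0;
    while (isspace((unsigned char)**p)) (*p)++;
    if (**p == '\0') return 0;
    if (strchr(",=<>", **p) != NULL) {
        tok[n++] = *(*p)++;
    } else {
        while (**p != '\0' && !isspace((unsigned char)**p) &&
               strchr(",=<>", **p) == NULL) {
            if (n < NAME_LEN - 1) tok[n++] = **p;   /* truncate long words */
            (*p)++;
        }
    }
    tok[n] = '\0';
    return 1;
}

/* Case-insensitive comparison with a lower-case keyword. */
static int is_keyword(const char *tok, const char *kw) {
    while (*tok != '\0' && *kw != '\0') {
        if (tolower((unsigned char)*tok) != *kw) return 0;
        tok++;
        kw++;
    }
    return *tok == '\0' && *kw == '\0';
}

/* Parse the query into q; returns 0 on success, -1 on a syntax error. */
int parse_select(const char *sql, ParsedQuery *q) {
    char tok[NAME_LEN];
    const char *p = sql;
    assert(sql != NULL && q != NULL);
    memset(q, 0, sizeof *q);
    if (!next_token(&p, tok) || !is_keyword(tok, "select")) return -1;
    for (;;) {                                  /* name {, name} */
        if (!next_token(&p, tok) || strchr(",=<>", tok[0]) != NULL) return -1;
        if (q->numAttributes == MAX_ATTRS) return -1;
        strcpy(q->attribute[q->numAttributes++], tok);
        if (!next_token(&p, tok)) return -1;
        if (strcmp(tok, ",") != 0) break;
    }
    if (!is_keyword(tok, "from")) return -1;
    if (!next_token(&p, q->table)) return -1;
    if (!next_token(&p, tok)) return 0;         /* no WHERE clause */
    if (!is_keyword(tok, "where")) return -1;
    if (!next_token(&p, q->field)) return -1;
    if (!next_token(&p, tok) || tok[1] != '\0' ||
        strchr("=<>", tok[0]) == NULL) return -1;
    q->op = tok[0];
    if (!next_token(&p, q->constant)) return -1;
    return next_token(&p, tok) ? -1 : 0;        /* reject trailing input */
}
```

Feeding the example query to parse_select fills in the table name employee, the attributes name and department, the field salary, the operator = and the constant 1000, which are the values the parser test checks.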

5 Evaluation

Evaluation is one of the most crucial stages in the life cycle of software development. It is just as important in this project, because it is the stage that checks whether the requirements of the system have been met.

5.1 Evaluation Criteria

The overall objective of this project is to design and implement a C library for using SQL on Palm OS. Measured against this objective the system is not completely finished, although the minimum requirement is met.

The minimum requirement is to provide a design of a C-language library for a non-trivial fragment of SQL for use on the Palm OS, and to fully describe how relational tables and fields will be represented within the Palm OS. The design of the whole system is finished, but not all of the design is implemented. With the data structure described in section 3.5, the relational tables and fields are represented in Palm OS.

However, the system is not completely implemented. It only implements parts of the SQL language. The SELECT statement is the one that has been implemented most fully. The other statements are designed but not implemented because of time restrictions. Nevertheless, the evaluation can be carried out focusing on the three main challenges mentioned in section 1. These three challenges concern parsing the SQL fragment, Palm database operation and data structure.

5.2 Result and Analysis

Parsing the SQL fragment. Since the design of the system has been changed to a C library, the SQL fragment is passed as a parameter. There is no need to parse the whole C source code to find the SQL fragment, so the job of the syntax parser is only to parse the SQL fragment. Although the function of the parser is very limited in this system, the parser can parse the SQL fragment quickly and correctly. Its limitation is that it cannot be used to parse all SQL fragments. Since the system only implements the design of the SELECT

statement, the parser can only parse the SELECT query. Also, the system only provides selection from a single table. Table joining has not been implemented yet.

Palm database operation. Since the example databases used do not hold a great quantity of data, the system seems to perform well when doing the SELECT query. But the search method used in the SELECT query is just a sequential search. That is, each record in the database must be opened and checked to see whether it satisfies the search condition. It can be predicted that the procedure will become very slow when the amount of data becomes large.

Data structure. The data structure designed and implemented for the system proved to be successful in practice. It can represent the relational tables and fields in a Palm database without any problems, and the operations on it also work correctly for the basic system.

From the evaluation results, it is obvious that the system is very limited. This system is just a prototype of a complete system. Further discussion of it is carried out in the next section.

6 Limitation and Future Works

Without doubt, the current system is a very limited one. Further work needs to be done before the system becomes useful in practice. The main limitations of the system can be summarized in a few points.

6.1 SQL Grammar

The SQL grammar was designed to fully support standard SQL, so a complete system should allow the users to work with standard SQL. But the current system only implements the SELECT statement, and this limitation severely restricts the usefulness of the current system. To make the system practically valuable, most of the standard SQL grammar must be implemented. So future work will try to make the grammar the same as a standard one, such as the one introduced in [4]:

Statement:

    statement ::= delete-statement-searched
                | drop-table-statement
                | insert-statement
                | select-statement
                | update-statement-searched

All the SQL fragments must be executed using the function call ExecuteSQL(). For example, ExecuteSQL("select age from emp") performs the select operation of attribute age from table emp.

A statement can be a delete-statement-searched, which represents the operation of deleting records from a table. It can be a drop-table-statement, which represents the operation of deleting a table. It can be an insert-statement, which represents inserting new records into a table. It can be a select-statement, which is the most useful and selects the desired attributes from a table. Finally, it can be an update-statement-searched, which updates the records in a table.

    select-statement ::= SELECT select-list

        FROM table-list [WHERE search-condition]

The SELECT statement should also be changed a little from the first design. Selection from multiple tables should be provided, so that users can also perform the table join operation.

    delete-statement-searched ::= DELETE FROM table-name [WHERE search-condition]

The DELETE statement starts with the DELETE keyword and the FROM keyword. The table-name that comes after them is the table to be operated on. The WHERE clause in the final place is optional. Without it, all the records are deleted from the table. With it, only the records that meet the search-condition are deleted. Some SQL dialects, such as MySQL, provide a LIMIT keyword for the DELETE statement. Considering that it is usually not that useful, this keyword is not provided in this SQL grammar.

Here is an example. It deletes the records that have the name John and an age greater than 30 from the table emp:

    ExecuteSQL("delete from emp where name = John and age > 30")

    drop-table-statement ::= DROP TABLE table-name

The DROP statement starts with the keywords DROP and TABLE. Then comes the table-name, which is the table to be deleted.

Here is an example. It deletes the table student:

    ExecuteSQL("drop table student")

    insert-statement ::=

        INSERT INTO table-name [( column-identifier [, column-identifier]...)]
        VALUES (insert-value [, insert-value]... )

The INSERT statement starts with the keywords INSERT INTO. The table-name following them is the table to be operated on. After the table name, users can specify the column names. The column names are necessary in some cases, namely when the order of the columns in the database is not as expected, or when the users are not inserting values into every column. After that comes the VALUES keyword. The insert-values in the brackets are the values to be inserted. If a value is not specified for a column, the default value NULL is used for that column.

Unlike standard SQL, this is the only form of the INSERT operation in this syntax. So it is not possible to copy information from a SELECT query, and it is also not possible to use the form that takes a list of column SET assignments.

Here is an example. It inserts a new student record into the student table with the value (Tom, 2013, 23, computing, NULL):

    ExecuteSQL("insert into student (name, ID, age, major, region) values(tom, 2013, 23, computing, NULL)")

    update-statement-searched ::=
        UPDATE table-name
        SET column-identifier = {expression | NULL}
            [, column-identifier = {expression | NULL}]...
        [WHERE search-condition]

The UPDATE statement starts with the UPDATE keyword. The table-name coming after it is the table to be updated. Then there is a list of column SET assignments, which describes how the columns of the records matched by the WHERE clause are to be updated. Columns that have not been assigned in the SET clause are left unchanged. Again, the WHERE clause here is optional. Without it means all the specified columns are to be updated. With it, only the columns match the
