Chapter 18: Persistence

This chapter introduces persistent data and methods for storing information in a file and in a database. You'll learn the basics of SQL and how App Engine lets you use objects to store and retrieve database data.

When one defines a variable in Python, e.g., "x=5", a memory cell is allocated to the variable x. That memory cell is in the random access memory (RAM) of the computer. When the program ends, that memory cell is de-allocated, and for all intents and purposes the data (e.g., the 5) is no longer available. We call such data transient data.

Persistent data, on the other hand, is information stored on the hard disk of a computer, in a database or a file. It is data that lives on even after a program or web session ends. Your profile information on Facebook is persistent data. When you submit data in a web form, the data you submit is stored in a database on the web site's servers. This data is persistent since it will be there the next time you visit the site.

As an ordinary computer user, you work with persistent data each time you save or open files using your computer's operating system (e.g., Windows or Mac). When you save a file in a word processor, the word processor takes care of saving your data persistently.

Programmers are given access to persistent data through file system library code, which allows for access to the same operations that users perform with the operating system, or through higher-level database management code. In this chapter, we'll focus mainly on applications that use database management code, but we'll begin by introducing some simple code for direct file access.

Files

Python provides direct access to the file system on your computer. Here's an example of writing to a file:

    f = open('test.dat','w')
    f.write("note 1\n")
    f.write("note 2")
    f.close()

In the example, the open function is used to gain access to the file 'test.dat' on the hard disk. The second parameter of the open function specifies the type of access desired: "r" for read, "w" for write, or "a" for append. In this case we are creating a new file, so "w" is specified. (If the file already exists, "w" erases its previous contents.) The write function just writes out its string parameter to the file. '\n' is the newline character, so the two write calls cause two lines of text to be written to the file. close tells the OS that your program is done with the file so that other programs or users can gain access to it.

Conversely, you can read from an existing file with code like the following:

    f = open("test.dat","r")
    while True:
        line = f.readline()
        if line=="":    # empty string means end of file
            break
        print line
    f.close()

This code loops through a file, reading each line and printing it to the command-line console. The file object f keeps track of a file pointer that marks the current location in the file. The file method readline returns the next line in the file, starting at that current location, and moves the file pointer to the start of the following line. readline returns the empty string if the file pointer is at the end of the file. The sample code uses this fact to break out of the loop.

DBMS and SQL

A database management system (DBMS) provides higher-level access to persistent data than the direct access code shown above. Instead of working directly with files, the programmer uses the database system's facilities for creating and modifying tables of data, and then querying that data to find records that meet given criteria. The fact that the data in the tables is stored in files is hidden from the database programmer.

Most DBMSs provide SQL access. SQL stands for Structured Query Language and is the standard language by which data is stored in and retrieved from a relational database.
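To make the idea of table-level access concrete, here is a minimal sketch that issues SQL from Python. It assumes Python's built-in sqlite3 module with an in-memory database as the DBMS; that choice is only for illustration, since the chapter does not prescribe a particular DBMS, and the three-field table is a shortened version of the one used in this section. The SQL statements themselves are explained in the rest of this section.

```python
import sqlite3

# Assumption for illustration: an in-memory SQLite database stands in
# for "a DBMS". Nothing here is specific to App Engine.
conn = sqlite3.connect(":memory:")

# Create a table: this defines structure only, no records yet.
conn.execute("CREATE TABLE Customer "
             "(FirstName char(50), LastName char(50), City char(50))")

# Add two records to the table.
conn.execute("INSERT INTO Customer (FirstName, LastName, City) "
             "VALUES ('David', 'Wolber', 'San Francisco')")
conn.execute("INSERT INTO Customer (FirstName, LastName, City) "
             "VALUES ('Bob', 'Jones', 'Oakland')")
conn.commit()

# Query for the records that meet a criterion.
rows = conn.execute("SELECT FirstName, LastName FROM Customer "
                    "WHERE City = 'San Francisco'").fetchall()
for first_name, last_name in rows:
    print(first_name + " " + last_name)   # prints: David Wolber

conn.close()
```

Notice that the program never opens or reads a file itself. Where and how the records are stored is the DBMS's business.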
Data is stored in tables. Each table has fields and each entry in the table is a record. The following SQL command:

    CREATE TABLE Customer (FirstName char(50), LastName char(50),
        Address char(50), City char(50), Country char(25), BirthDate date)

creates a table with six fields. At this point, only the table structure is created, and there are not yet any records (you have a cookie cutter, but no cookies):

    Customer
    FirstName  LastName  Address  City  Country  BirthDate
    ---------  --------  -------  ----  -------  ---------

After such a table is created, you can add records (cookies) to it with the SQL Insert statement:

    INSERT INTO Customer (FirstName, LastName, Address, City, Country, BirthDate)
        VALUES ('David', 'Wolber', '1329 Willard Street', 'San Francisco',
                'USA', 'Jan-10-1999')

Such Insert statements populate the table with records of actual data:

    FirstName  LastName  Address              City           Country  BirthDate
    ---------  --------  -------------------  -------------  -------  -----------
    David      Wolber    1329 Willard Street  San Francisco  USA      Jan-10-1999
    Bob        Jones     800 Rose St          Oakland        USA      Jan-12-2000

So the SQL Create statement defines the structure for tables of data, and the Insert statement adds records of concrete data to that table. SQL also provides an Update command for modifying a pre-existing record.

Along with Create and Insert, the third fundamental SQL command is the Select command. It allows you to query the database to find all records in a table that meet given criteria. For instance, the command:

    SELECT * FROM Customer WHERE City='San Francisco'

would return a list of all customer records whose City field has the value 'San Francisco'. Since a query is the most common SQL operation, let's consider the syntax
of Select in a bit more detail. The Select command is of the form:

    SELECT <fields> FROM <Table> WHERE <FilterExpression>

The job of Select is to return records, or portions of records, from a table. Commonly, we select all fields using the syntax 'SELECT *'. We could also limit the fields in the records sent back with a command such as:

    SELECT LastName FROM Customer WHERE City='San Francisco'

The Where clause is the filter: it specifies which records of the table should be returned. It can contain references to any of the fields and use AND, OR, and NOT. If we wanted San Franciscans born in the 21st century, we would use a query such as:

    SELECT * FROM Customer WHERE City='San Francisco' AND BirthDate > 'Dec-31-1999'

Note that the date must be quoted; unquoted, 12-31-1999 would be read as a subtraction. The date formats that are accepted vary from one DBMS to another.

W3Schools has a great on-line tutorial for learning the basics of SQL and trying out queries. You can access it at http://www.w3schools.com/sql/sql_tryit.asp

Object-Relational Mapping

Most programming languages provide library code that allows the programmer to access a DBMS and make SQL statements. For instance, JDBC is a library that allows one to call SQL from a Java program.

Some programming environments, such as Google's App Engine, hide much of the SQL layer and allow the programmer to access a database in an object-oriented manner. Instead of working with tables, the programmer defines classes. Instead of working with records, the programmer works with objects (instances of classes). Such a scheme is referred to as an object-relational mapping because items in an object-oriented language are 'mapped' to relational database items:

    class  <==> table
    object <==> record

In most programming environments (e.g., Ruby on Rails, Java's Hibernate) the programmer uses separate tools or languages to set up the database and the object-relational mapping. For instance, with Ruby on Rails, the programmer creates a database using a DBMS such as MySQL, then creates the tables using migrations, which are written in Ruby on Rails' DBMS-independent
language for table creation. Google's App Engine, on the other hand, does not require the programmer to be familiar with a DBMS or even table creation statements. It allows the database setup and table creation to be specified using the same language, Python, as is used for the rest of the programming work. Though DBMS setup is not rocket science, it is not trivial either, so App Engine's framework significantly eases the task of database programming for beginners.

Google's Datastore API

Google's Datastore API allows a programmer to work in the world of objects with no database setup. Google supplies the servers and the underlying database. The programmer just creates and uses persistent objects with the Datastore API, and Google takes care of the plumbing. The ease with which a system can be built this way is one key to the appeal of the technology commonly referred to as 'cloud computing'.

Here's an App Engine example of how to create a class that represents a database table:

    from google.appengine.ext import db

    class Customer(db.Model):
        lastname = db.StringProperty(required=True)
        firstname = db.StringProperty(required=True)
        city = db.StringProperty(required=True)

Any class that inherits from App Engine's library class db.Model by definition represents a persistent object. The fields are defined directly in the class (as class attributes) and must be of some property type defined in the db module. In this sample, all the fields are strings. The "required=True" means that all created records must contain non-empty values for these fields.

Class definitions like Customer above only define the structure of a table, and just as with the SQL Create statement, no records (objects) are created. To create objects, and perform the equivalent task of the SQL Insert statement, we just create instances of the class in an ordinary object-oriented manner.
The following code creates a record in the Customer table:

    # object creation; required fields must be supplied here
    customer1 = Customer(lastname="Wolber",
                         firstname="David",
                         city="San Francisco")
    customer1.put()  # this stores to database
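The App Engine code only runs inside Google's environment. To see roughly what an object-relational mapping does on your behalf, here is a small stand-alone sketch that uses Python's built-in sqlite3 module instead. The class LocalCustomer, its put method, and the helper count_records are hypothetical names invented for this illustration. They are not part of the Datastore API, and Google's actual implementation is certainly different.

```python
import sqlite3

# Assumption for illustration: an in-memory SQLite table plays the role
# of the Customer table that App Engine would manage for us.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (lastname TEXT, firstname TEXT, city TEXT)")


class LocalCustomer(object):
    """Hypothetical persistent class; NOT App Engine's db.Model.

    The class corresponds to the table, each instance to one record.
    """

    def __init__(self, lastname, firstname, city):
        # Creating the object touches memory only, not the database.
        self.lastname = lastname
        self.firstname = firstname
        self.city = city

    def put(self):
        # Only now is a record written: the object's data members
        # become the fields of a new row.
        conn.execute(
            "INSERT INTO Customer (lastname, firstname, city) VALUES (?, ?, ?)",
            (self.lastname, self.firstname, self.city))
        conn.commit()


def count_records():
    """Return the number of rows currently in the Customer table."""
    return conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0]


customer1 = LocalCustomer("Wolber", "David", "San Francisco")
before = count_records()   # object exists in memory; table is still empty
customer1.put()
after = count_records()    # the record has now been stored
print("%d record(s) before put, %d after" % (before, after))
```

The point of the sketch is the division of labor: the programmer writes only the last few lines, while the SQL lives inside the class. With App Engine, Google supplies that inner layer.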
In the App Engine code, the object creation statement, customer1 = Customer(...), creates an object in memory only. Because Customer is a persistent class (a subclass of db.Model), all instances have the capability of being stored persistently. However, no database operation occurs until the put function is called. The put function creates a new persistent record with the object's data members as fields in that record.

The Datastore API also allows you to query the database with SQL-like statements, written in a query language called GQL. We use the class db.GqlQuery, which essentially allows us to make SQL Select statements and get back a result that can be looped over like a list of objects. Here are some examples of queries to the Customer table:

    # get all customers
    customers = db.GqlQuery("SELECT * FROM Customer")

    # get all with lastname of Wolber
    customers = db.GqlQuery("SELECT * FROM Customer WHERE lastname = 'Wolber'")

    # get all customers from San Francisco
    city = "San Francisco"
    customers = db.GqlQuery("SELECT * FROM Customer WHERE city = :1", city)

The first query returns all objects of type Customer. The second returns only those whose lastname field is set to 'Wolber'. Note that string values inside a query are written in single quotes (e.g., 'Wolber'), so it is simplest to put double quotes around the whole query. Matching is case-sensitive, so 'Wolber' does not match a record stored as 'wolber'.

The third query returns all customers from San Francisco. The ':1' is a positional parameter of the query language, not Python string replacement: it stands for the first additional argument passed to db.GqlQuery, and allows for the placement of variables into a query. In the example, the value of the variable city is used in place of the :1.

If the queries above were executed within a web application controller, one could send the return value of the query, 'customers', as a template value in a dictionary, e.g.,

    template_values = {'customers': customers}

In the HTML template, customers can be referred to within a for loop:

    {% for customer in customers %}
        {{customer.lastname}}<br/>
        {{customer.city}}
    {% endfor %}

Note that the variable within the double curly brackets should not be the entire element (each customer), as a customer is a complex object with no useful text form of its own. Instead, we display parts of the customer (lastname, city) within the double curly brackets.

Summary

Google's App Engine eliminates a major part of programming persistence: the setup and administration of a database. Instead, programmers are allowed to work with tables and records stored in the cloud (Google's servers), and not worry about maintaining a local database. They're also allowed to work primarily within an object-oriented framework, using SQL-like statements only to specify queries.

Problems

1. Write a program that copies the contents of one file into another.

2. Go to http://sites.google.com/site/usfcomputerscience/persistence-withbig-data and download the files attached at the bottom. Then make the following changes:

   Change the CustomerT table so that it includes a field for email. You'll need to modify the database (model.py), the index.html file, and the controller. Be sure that the email is displayed in the list of customers.

   Below the list of customers, provide a form that allows the user to list only customers with a certain first name. You'll need a new controller class to "handle" submission of the form, and you'll need to use a query with a "Where" clause.