processing data with a database

1 MySQL and MySQLdb
    MySQL: an open source database
    running MySQL for database creation
    MySQLdb: an interface to MySQL for Python
2 CTA Tables in MySQL
    files in GTFS feed are tables in database
    filling a table with a Python script
    storing the connections

MCS 507 Lecture 23
Mathematical, Statistical and Scientific Software
Jan Verschelde, 19 October 2012
Scientific Software (MCS 507) processing data with a database 19 Oct 2012 1 / 39
MySQL & MySQLdb
MySQL is an open source database, developed by the company MySQL AB. In February 2008, Sun Microsystems acquired MySQL AB, and with it the GPL software and its expertise, for $1 billion. In January 2010, Oracle acquired Sun for $7.38 billion.
MySQL can be downloaded for free from http://www.mysql.com/downloads. Install MySQL first, following the instructions given there.
MySQLdb is an API (Application Programming Interface) that connects Python to MySQL: with MySQLdb, Python scripts can work with MySQL databases.
client/server computing
MySQL is the M in LAMP: Linux, Apache, MySQL, and Python (the P may also stand for PHP or Perl).
[diagram: your computer with the MySQL client, the MySQL server, and the server's disk]
starting and stopping the daemon
We will run MySQL as root on Linux or Mac OS X; otherwise, users must be managed. On Linux, log in as root; on Mac OS X, use sudo.
Starting the MySQL server:
$ mysqld_safe
Shutting the MySQL server down:
$ mysqladmin shutdown
On Mac OS X, put sudo in front of the commands.
creating and deleting databases
The command mysqladmin is used for MySQL server administration. We first use it to create a database. At the prompt $:
$ mysqladmin create Book
We have created a database with name Book. To delete the database Book:
$ mysqladmin drop Book
On Mac OS X, put sudo in front of the commands.
MySQL to create the table Book
$ mysqladmin create Library
$ mysql...
mysql> use Library
Database changed
mysql> create table Book
    -> (id INT, title CHAR(80), available SMALLINT)
    -> ;
Query OK, 0 rows affected (0.08 sec)
We created a table Book with attributes (1) id of domain INT; (2) title of domain CHAR, 80 characters wide; and (3) available of domain SMALLINT. The statement drop table Book; removes the table.
show and describe
mysql> show tables;
+-------------------+
| Tables_in_Library |
+-------------------+
| Book              |
+-------------------+
1 row in set (0.00 sec)

mysql> describe Book;
+-----------+-------------+------+-----+---------+-------+
| Field     | Type        | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+-------+
| id        | int(11)     | YES  |     | NULL    |       |
| title     | char(80)    | YES  |     | NULL    |       |
| available | smallint(6) | YES  |     | NULL    |       |
+-----------+-------------+------+-----+---------+-------+
3 rows in set (0.00 sec)
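The show and describe commands are specific to MySQL. As a hedged sketch, the same information can be read from SQLite, used here only because Python's standard sqlite3 module needs no server; sqlite_master and PRAGMA table_info belong to SQLite's catalog, not MySQL's, and the helper names show_tables and describe are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for the MySQL database Library.
con = sqlite3.connect(":memory:")
con.execute("create table Book (id INT, title CHAR(80), available SMALLINT)")

def show_tables(con):
    """Rough analog of 'show tables': the names of all tables."""
    rows = con.execute(
        "select name from sqlite_master where type = 'table' order by name")
    return [name for (name,) in rows]

def describe(con, table):
    """Rough analog of 'describe': (field, declared type) for each column."""
    # Each row of table_info is (cid, name, type, notnull, dflt_value, pk).
    # The table name is formatted in directly, so it must be trusted.
    rows = con.execute("pragma table_info(%s)" % table)
    return [(row[1], row[2]) for row in rows]

print(show_tables(con))
print(describe(con, "Book"))
```

Unlike MySQL, SQLite keeps the declared type text CHAR(80) but does not enforce its width.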
entering data
mysql> insert into Book values
    -> (1,"primer on scientific programming",1);
Query OK, 1 row affected (0.00 sec)

mysql> select * from Book;
+------+----------------------------------+-----------+
| id   | title                            | available |
+------+----------------------------------+-----------+
|    1 | primer on scientific programming |         1 |
+------+----------------------------------+-----------+
1 row in set (0.00 sec)

mysql> select title from Book;
+----------------------------------+
| title                            |
+----------------------------------+
| primer on scientific programming |
+----------------------------------+
1 row in set (0.00 sec)
using MySQLdb in Python
MySQLdb is an interface to use MySQL from within a Python session. As root, or with sudo python:
>>> import MySQLdb
>>> L = MySQLdb.connect(db="Library")
>>> c = L.cursor()
Observe:
run Python as superuser, otherwise there is no access;
with connect(), we identify the database Library;
L.cursor() returns a new object representing a database cursor, which is used to manage all operations.
retrieving information
Any command typed in a session with mysql can be passed as a string to execute() on a cursor; execute() returns the number of rows in the result:
>>> c.execute("show tables")
1L
>>> c.fetchone()
('Book',)
To show all records in the table Book:
>>> c.execute("select * from Book")
1L
>>> c.fetchall()
((1L, 'primer on scientific programming', 1),)
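MySQLdb follows the Python Database API (PEP 249), and so does the sqlite3 module of the standard library. The sketch below uses sqlite3 in place of MySQLdb, so that it runs without a server, and repeats the session above. Two differences to keep in mind: sqlite3's execute() returns the cursor rather than a row count, and fetchall() returns a list of tuples rather than a tuple. Values are passed through placeholders (? in sqlite3, %s in MySQLdb) instead of being pasted into the SQL string.

```python
import sqlite3

# sqlite3 stands in for MySQLdb; both implement the Python DB-API.
L = sqlite3.connect(":memory:")
c = L.cursor()
c.execute("create table Book (id INT, title CHAR(80), available SMALLINT)")

# The module quotes the values bound to the ? placeholders.
c.execute("insert into Book values (?, ?, ?)",
          (1, "primer on scientific programming", 1))
L.commit()

c.execute("select title from Book")
print(c.fetchone())    # one row, as a tuple
c.execute("select * from Book")
print(c.fetchall())    # all rows, as a list of tuples
```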
GTFS of our CTA
We can download the schedules of the CTA (Chicago Transit Authority):
http://www.transitchicago.com/developers/gtfs.aspx
GTFS, the General Transit Feed Specification, is an open format for packaging scheduled service data. A GTFS feed is a series of text files whose lines hold comma-separated values (csv format). Each file is a table in a relational database.
We call our database CTA and add tables to it, starting with the information from stops.txt.
fields in stops.txt
The first line in stops.txt lists:
1 stop_id: type INT
2 stop_code: type INT
3 stop_name: type CHAR(80)
4 stop_lat: type FLOAT
5 stop_lon: type FLOAT
6 location_type: type INT
7 parent_station: type INT
8 wheelchair_boarding: type SMALLINT
Note that FLOAT in MySQL is single precision (about 7 significant digits), so coordinates such as 41.87632184 lose their last digits; DOUBLE would keep them.
creating database and table
$ mysqladmin create CTA
Then we start mysql:
mysql> use CTA;
Database changed
mysql> create table stops
    -> (id INT, code INT, name CHAR(80),
    -> lat FLOAT, lon FLOAT, tp INT,
    -> ps INT, wb SMALLINT);
Query OK, 0 rows affected (0.37 sec)
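The short column names in the create statement (tp, ps, wb, and so on) stand for the GTFS field names listed in stops.txt. A hypothetical helper can keep that correspondence in one place and generate the statement; this is only a sketch, and the names STOPS_COLUMNS, create_statement and check_header are illustrative, not part of MySQLdb or GTFS. Checking the header line guards against a feed whose column order differs from the one assumed here.

```python
# Hypothetical mapping: GTFS field of stops.txt -> (column name, SQL type),
# following the choices made on the slides.
STOPS_COLUMNS = [
    ("stop_id", "id", "INT"),
    ("stop_code", "code", "INT"),
    ("stop_name", "name", "CHAR(80)"),
    ("stop_lat", "lat", "FLOAT"),
    ("stop_lon", "lon", "FLOAT"),
    ("location_type", "tp", "INT"),
    ("parent_station", "ps", "INT"),
    ("wheelchair_boarding", "wb", "SMALLINT"),
]

def create_statement(table, columns):
    """Builds 'create table <table> (col TYPE, ...)' from the mapping."""
    body = ", ".join("%s %s" % (col, sqltype) for (_, col, sqltype) in columns)
    return "create table %s (%s)" % (table, body)

def check_header(header_line, columns):
    """True if the first line of the file lists the expected GTFS fields."""
    fields = header_line.strip().split(",")
    return fields == [field for (field, _, _) in columns]

print(create_statement("stops", STOPS_COLUMNS))
```

Passing the returned string to execute() on a cursor would create the table.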
describe
mysql> describe stops;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| id    | int(11)     | YES  |     | NULL    |       |
| code  | int(11)     | YES  |     | NULL    |       |
| name  | char(80)    | YES  |     | NULL    |       |
| lat   | float       | YES  |     | NULL    |       |
| lon   | float       | YES  |     | NULL    |       |
| tp    | int(11)     | YES  |     | NULL    |       |
| ps    | int(11)     | YES  |     | NULL    |       |
| wb    | smallint(6) | YES  |     | NULL    |       |
+-------+-------------+------+-----+---------+-------+
8 rows in set (0.01 sec)
manual insertion
The first data line in stops.txt contains
1,1,"Jackson & Austin Terminal",41.87632184,-87.77410482,0,,1
The empty seventh field (parent_station) is entered as 0:
mysql> insert into stops values
    -> (1,1,"Jackson & Austin Terminal",
    -> 41.87632184,-87.77410482,0,0,1);
Query OK, 1 row affected (0.00 sec)
mysql> select name from stops where id = 1;
+---------------------------+
| name                      |
+---------------------------+
| Jackson & Austin Terminal |
+---------------------------+
1 row in set (0.00 sec)
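Splitting a line on commas breaks as soon as a quoted stop name contains a comma. A sketch with Python's csv module, which respects the double quotes, converts the first data line into the values of the manual insertion; the function name parse_stop is hypothetical, and mapping empty fields to 0 follows the slide (NULL, that is None in Python, would be the alternative that keeps missing values distinguishable).

```python
import csv

def parse_stop(line):
    """
    Converts one data line of stops.txt into a tuple for the table stops.
    Empty fields become 0, as in the manual insertion above.
    """
    # csv.reader removes the double quotes and keeps commas inside them.
    fields = next(csv.reader([line.rstrip("\r\n")]))

    def number(text, convert):
        text = text.strip()
        return 0 if text == "" else convert(text)

    return (number(fields[0], int), number(fields[1], int), fields[2],
            number(fields[3], float), number(fields[4], float),
            number(fields[5], int), number(fields[6], int),
            number(fields[7], int))

line = '1,1,"Jackson & Austin Terminal",41.87632184,-87.77410482,0,,1\r\n'
print(parse_stop(line))
```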
deleting rows
To delete a row, given its id:
mysql> delete from stops where id = 1;
Query OK, 1 row affected (0.65 sec)
mysql> select * from stops;
Empty set (0.01 sec)
If the where clause is omitted, then all rows in the table are deleted.
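From Python, the number of deleted rows is available as the cursor's rowcount attribute, which is part of the DB-API. A sketch with sqlite3 standing in for MySQL (the helper delete_stop is hypothetical) also shows that a delete without a where clause empties the table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table stops (id INT, name CHAR(80))")
con.executemany("insert into stops values (?, ?)",
                [(1, "Jackson & Austin Terminal"),
                 (3021, "California & Augusta"),
                 (17154, "California & Augusta")])

def delete_stop(con, stop_id):
    """Deletes the rows with the given id and returns how many were deleted."""
    cur = con.execute("delete from stops where id = ?", (stop_id,))
    return cur.rowcount

print(delete_stop(con, 1))   # prints 1: the row with id 1 is deleted
print(delete_stop(con, 1))   # prints 0: no row with id 1 is left
con.execute("delete from stops")    # no where clause: all rows are deleted
con.commit()
print(con.execute("select count(*) from stops").fetchone()[0])   # prints 0
```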
filling a table
Typing in 12,165 lines by hand is rather tedious... so we fill the table stops with a Python script. After filling the table stops of the database CTA, we query the table for a name:
mysql> select name from stops where id = 3021;
+----------------------+
| name                 |
+----------------------+
| California & Augusta |
+----------------------+
1 row in set (0.00 sec)
the main program
import MySQLdb

def main():
    """
    Opens the file with name filename, reads every line
    and inserts the data into the table stops.
    """
    L = MySQLdb.connect(db="CTA")
    c = L.cursor()
    # filename is set elsewhere in the script (not shown on this slide)
    print 'opening', filename, '...'
    file = open(filename, 'r')
    # we skip the first line, which lists the field names
    d = file.readline()
    while True:
        d = file.readline()
        if d == '': break    # end of file
        InsertData(c,d)
    file.close()
    L.commit()    # make the inserts permanent (needed for transactional tables)
inserting data
def InsertData(c,s):
    """
    Inserts the data in the comma separated string s
    using the cursor c.
    """
    L = s.split(',')    # assumes no commas inside the quoted stop name
    d = 'insert into stops values ('
    d = d + ('0,' if L[0] == '' else L[0] + ',')
    d = d + ('0,' if L[1] == '' else L[1] + ',')
    d = d + L[2] + ',' + L[3] + ',' + L[4] + ','    # L[2] keeps its quotes
    d = d + ('0,' if L[5] == '' else L[5] + ',')
    d = d + ('0,' if L[6] == '' else L[6] + ',')
    w = L[7].strip()    # remove the line ending \r\n
    d = d + ('0)' if w == '' else w + ')')
    # print d
    c.execute(d)
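Building the SQL string by hand, as InsertData does, needs care with quotes and line endings, and it costs one execute() per line. A hedged alternative sketch reads the file with the csv module and hands all rows to executemany() with placeholders. sqlite3 again stands in for MySQLdb (whose cursors also have executemany(), with %s placeholders); fill_stops and to_number are hypothetical names, and io.StringIO replaces the real stops.txt, which in Python 3 is best opened with newline='' for use with csv.

```python
import csv
import io
import sqlite3

def to_number(text, convert):
    """Empty fields become 0, as in InsertData."""
    text = text.strip()
    return 0 if text == "" else convert(text)

# Conversion per column of stops.txt; str leaves the stop name as is.
KINDS = (int, int, str, float, float, int, int, int)

def fill_stops(con, lines):
    """
    Reads stops.txt from 'lines' (an open file or any iterable of lines),
    skips the header line and inserts every data line into stops.
    """
    reader = csv.reader(lines)
    next(reader)                     # the first line lists the field names
    rows = (tuple(f if k is str else to_number(f, k)
                  for f, k in zip(fields, KINDS))
            for fields in reader if fields)      # skip blank lines
    con.executemany("insert into stops values (?,?,?,?,?,?,?,?)", rows)
    con.commit()

con = sqlite3.connect(":memory:")
con.execute("create table stops (id INT, code INT, name CHAR(80), lat FLOAT,"
            " lon FLOAT, tp INT, ps INT, wb SMALLINT)")
sample = io.StringIO(
    "stop_id,stop_code,stop_name,stop_lat,stop_lon,location_type,"
    "parent_station,wheelchair_boarding\r\n"
    '1,1,"Jackson & Austin Terminal",41.87632184,-87.77410482,0,,1\r\n')
fill_stops(con, sample)
print(con.execute("select name, ps from stops where id = 1").fetchone())
```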
querying the table
The name California & Augusta occurs with two ids:
mysql> select id from stops
    -> where name = "California & Augusta";
+-------+
| id    |
+-------+
|  3021 |
| 17154 |
+-------+
2 rows in set (0.00 sec)
mysql> select name from stops where id = 17154;
+----------------------+
| name                 |
+----------------------+
| California & Augusta |
+----------------------+
1 row in set (0.01 sec)
querying with Python
$ sudo python dbctastopquery.py
give a stop id : 3021
3021 has name California & Augusta
$ sudo python dbctastopquery.py
give a stop id : 0
0 has name None
the main program
import MySQLdb

def main():
    """
    Connects to the database, prompts the user
    for a stop id and then queries the stops table.
    """
    L = MySQLdb.connect(db="CTA")
    c = L.cursor()
    # raw_input returns a string; int() converts it to the integer id
    id = int(raw_input('give a stop id : '))
    n = getstopname(c,id)
    print id, 'has name', n
executing the query
def getstopname(c,id):
    """
    Given a cursor c to the CTA database,
    queries the stops table for the stop id.
    Returns None if the stop id has not been found,
    otherwise returns the stop name.
    """
    s = 'select name from stops'
    w = ' where id = %d' % id    # note the space before where
    q = s + w
    r = c.execute(q)    # the number of rows found
    if r == 0:
        return None
    else:
        t = c.fetchone()
        return t[0]
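Formatting the id into the query string with %d works for integers; a common alternative is to pass the id as a query parameter and to test the result of fetchone(), which is None when no row matches. In this sketch sqlite3 stands in for MySQLdb, where the corresponding call would be c.execute('select name from stops where id = %s', (id,)); the two rows are the ones found for California & Augusta above.

```python
import sqlite3

def getstopname(c, stop_id):
    """
    Returns the name of the stop with the given id, or None if the
    stops table has no such id. The id is passed as a parameter
    instead of being formatted into the query string.
    """
    c.execute("select name from stops where id = ?", (stop_id,))
    row = c.fetchone()          # None when the select found no rows
    return None if row is None else row[0]

con = sqlite3.connect(":memory:")
c = con.cursor()
c.execute("create table stops (id INT, name CHAR(80))")
c.executemany("insert into stops values (?, ?)",
              [(3021, "California & Augusta"), (17154, "California & Augusta")])
print(3021, "has name", getstopname(c, 3021))
print(0, "has name", getstopname(c, 0))
```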
fields in stop_times.txt
The first line in stop_times.txt lists:
1 trip_id: type INT
2 arrival_time: type TIME
3 departure_time: type TIME
4 stop_id: type INT
5 stop_sequence: type INT
6 stop_headsign: type VARCHAR(80)
7 pickup_type: type INT
8 shape_dist_traveled: type INT
adding a new table
mysql> create table stop_times
    -> (id INT, arrival TIME, departure TIME,
    -> stop INT, seq INT, head VARCHAR(80),
    -> ptp INT, sdt INT);
Query OK, 0 rows affected (0.37 sec)
Note the types TIME and VARCHAR: TIME holds values such as 12:09:14, and VARCHAR(80) holds strings of variable length, up to 80 characters, whereas CHAR(80) has a fixed width.
describe
mysql> describe stop_times;
+-----------+-------------+------+-----+---------+-------+
| Field     | Type        | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+-------+
| id        | int(11)     | YES  |     | NULL    |       |
| arrival   | time        | YES  |     | NULL    |       |
| departure | time        | YES  |     | NULL    |       |
| stop      | int(11)     | YES  |     | NULL    |       |
| seq       | int(11)     | YES  |     | NULL    |       |
| head      | varchar(80) | YES  |     | NULL    |       |
| ptp       | int(11)     | YES  |     | NULL    |       |
| sdt       | int(11)     | YES  |     | NULL    |       |
+-----------+-------------+------+-----+---------+-------+
8 rows in set (0.00 sec)
manual insertion
mysql> insert into stop_times values (
    -> 46035893,"12:09:14","12:09:14",6531,29,
    -> "Midway Orange Line",0,18625);
Query OK, 1 row affected (0.00 sec)
mysql> select departure, head from stop_times;
+-----------+--------------------+
| departure | head               |
+-----------+--------------------+
| 12:09:14  | Midway Orange Line |
+-----------+--------------------+
1 row in set (0.00 sec)
filling the table
On a Mac OS X laptop:
$ sudo python dbctafillstoptimes.py
opening ../lec22/cta/stop_times.txt ...
dbctafillstoptimes.py:26: Warning: Out of range value for column id at row 1
  c.execute(d)
Redo on a fast Linux workstation:
# time python dbctafillstoptimes.py
opening ../lec22/cta/stop_times.txt ...
dbctafillstoptimes.py:26: Warning: Out of range value for column id at row 1
  c.execute(d)
real 5m32.433s
user 1m11.921s
sys 0m17.735s
about the complexity
While dbctafillstoptimes.py was running, the memory consumption of Python and of mysql was of the same order of magnitude, about 300 MB. The table stop_times then holds almost 3 million rows:
mysql> select count(*) from stop_times;
+----------+
| count(*) |
+----------+
|  2921276 |
+----------+
1 row in set (0.00 sec)
inserting data
def InsertData(c,s):
    """
    Inserts the data in the comma separated string s
    using the cursor c.
    """
    L = s.split(',')
    d = 'insert into stop_times values ('
    d = d + ('0,' if L[0] == '' else L[0] + ',')
    d = d + '"' + L[1] + '",'    # arrival time as a quoted string
    d = d + '"' + L[2] + '",'    # departure time as a quoted string
    d = d + L[3] + ',' + L[4] + ','
    d = d + L[5] + ',' + L[6] + ','    # the headsign L[5] must be quoted in the file
    w = L[7].strip()    # remove the line ending \r\n
    d = d + ('0)' if w == '' else w + ')')
    # print d
    c.execute(d)
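The timings for dbctafillstoptimes.py amount to roughly 8,800 rows per second (2,921,276 rows in about 332 seconds), with one execute() call per line. A common remedy, sketched here under assumptions, is to send many rows per call with executemany() and to read the file in fixed-size batches with itertools.islice, so that memory stays bounded however long the file is. sqlite3 stands in for MySQLdb; fill_stop_times and the batch size of 10000 are hypothetical choices, and no speedup on the CTA data is claimed.

```python
import csv
import io
import itertools
import sqlite3

def fill_stop_times(con, lines, batch=10000):
    """
    Inserts the data lines of stop_times.txt into stop_times, 'batch' rows
    per executemany() call, committing after each batch.
    Returns the number of rows inserted.
    """
    reader = csv.reader(lines)
    next(reader)                        # skip the line with the field names
    total = 0
    while True:
        chunk = list(itertools.islice(reader, batch))
        if not chunk:                   # the file is exhausted
            break
        # Empty numeric fields become 0, as in InsertData above.
        rows = [(int(r[0] or 0), r[1].strip(), r[2].strip(), int(r[3] or 0),
                 int(r[4] or 0), r[5], int(r[6] or 0), int(r[7] or 0))
                for r in chunk if r]    # skip blank lines
        con.executemany("insert into stop_times values (?,?,?,?,?,?,?,?)",
                        rows)
        con.commit()
        total += len(rows)
    return total

con = sqlite3.connect(":memory:")
con.execute("create table stop_times (id INT, arrival TIME, departure TIME,"
            " stop INT, seq INT, head VARCHAR(80), ptp INT, sdt INT)")
header = ("trip_id,arrival_time,departure_time,stop_id,stop_sequence,"
          "stop_headsign,pickup_type,shape_dist_traveled\r\n")
row = '46035893,12:09:14,12:09:14,6531,29,"Midway Orange Line",0,18625\r\n'
print(fill_stop_times(con, io.StringIO(header + row * 5), batch=2))   # prints 5
```

The batch size trades memory against the number of round trips; committing per batch also keeps each transaction small.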
querying stop_times
mysql> select head from stop_times
    -> where stop = 3021 and
    -> arrival < "05:30:00";
+----------------+
| head           |
+----------------+
| 63rd Pl/Kedzie |
| 63rd Pl/Kedzie |
| 63rd Pl/Kedzie |
| 63rd Pl/Kedzie |
+----------------+
4 rows in set (0.94 sec)
an involved query
This query joins the tables stops and stop_times on the stop id:
mysql> select name, departure, head
    -> from stops, stop_times
    -> where stops.id = 3021
    -> and stops.id = stop_times.stop
    -> and stop_times.departure < "05:30:00";
+----------------------+-----------+----------------+
| name                 | departure | head           |
+----------------------+-----------+----------------+
| California & Augusta | 04:43:49  | 63rd Pl/Kedzie |
| California & Augusta | 05:03:49  | 63rd Pl/Kedzie |
| California & Augusta | 05:19:49  | 63rd Pl/Kedzie |
| California & Augusta | 05:12:49  | 63rd Pl/Kedzie |
+----------------------+-----------+----------------+
4 rows in set (0.57 sec)
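The rows of the join come back in no particular order; adding order by departure sorts them. Below is a sketch of the same join in sqlite3, with a hypothetical helper departures_before, only the columns the query needs, made-up trip ids, and one made-up 06:01:49 departure to show the filter at work. SQLite has no TIME type, so the times are stored as text; comparing 'HH:MM:SS' strings with < matches time order only because the hours are zero-padded.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table stops (id INT, name TEXT)")
con.execute("create table stop_times (id INT, departure TEXT, stop INT,"
            " head TEXT)")
con.execute("insert into stops values (3021, 'California & Augusta')")
# Trip ids 0..4 are made up; the first four times are those of the query
# output, the last one is a made-up later departure.
times = ["04:43:49", "05:03:49", "05:19:49", "05:12:49", "06:01:49"]
con.executemany("insert into stop_times values (?, ?, 3021, '63rd Pl/Kedzie')",
                list(enumerate(times)))

def departures_before(con, stop_id, limit):
    """Stop name, departure and headsign before 'limit', earliest first."""
    return con.execute(
        "select name, departure, head from stops, stop_times"
        " where stops.id = ? and stops.id = stop_times.stop"
        " and stop_times.departure < ? order by departure",
        (stop_id, limit)).fetchall()

for row in departures_before(con, 3021, "05:30:00"):
    print(row)
```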
Summary + Exercises
With Python scripts we read files into a MySQL database. Visit http://www.mysqltutorial.org.
1 Install MySQL and MySQLdb on your computer.
2 Write a Python script to return the name of a stop, given its id, using the table stops.
3 Design a GUI with Tkinter to query the stop name: one entry field for the stop id, another for the name of the stop, and one button in the middle to execute the query. Note that the GUI should allow queries given either the stop id or the stop name.
The fourth homework is due on Monday 22 October, 10AM: exercises 3 and 6 of Lecture 13; exercises 2 and 4 of Lecture 14; exercises 1 and 3 of Lecture 15; exercises 2 and 5 of Lecture 16; exercises 3 and 4 of Lecture 17.