processing data with a database

MCS 507 Lecture 23
Mathematical, Statistical and Scientific Software
Jan Verschelde, 19 October 2012

1 MySQL and MySQLdb
    MySQL: an open source database
    running MySQL for database creation
    MySQLdb: an interface to MySQL for Python
2 CTA Tables in MySQL
    files in GTFS feed are tables in database
    filling a table with a Python script
    storing the connections

MySQL & MySQLdb

MySQL is an open source database, developed by the company MySQL AB. In February 2008, Sun Microsystems acquired MySQL AB and the expertise of the GPL software for $1 billion. In January 2010, Oracle acquired Sun for $7.38 billion.

MySQL can be downloaded for free from http://www.mysql.com/downloads. Install MySQL first, following the instructions given there.

MySQLdb is an interface to connect Python to MySQL. It is an API (Application Programming Interface) that lets Python scripts work with MySQL databases.

client/server computing

MySQL is the M in LAMP: Linux, Apache, MySQL, Python.

[Figure: your computer (MySQL client), MySQL server, server's disk]

starting and stopping the daemon

We will run MySQL as root on Linux or Mac OS X. Otherwise, management of users is needed. On Linux, log in as root. On Mac OS X, use sudo.

    $ mysqld_safe

Shutting the MySQL server down:

    $ mysqladmin shutdown

On Mac OS X, put sudo in front of the commands.

creating and deleting databases

The command mysqladmin is used in MySQL for server administration. We first use it to create a database. On Mac OS X, at the prompt $:

    $ mysqladmin create Book

We have created a database with name Book. To delete the database Book:

    $ mysqladmin drop Book

On Mac OS X, put sudo in front of the commands.

MySQL to create the table Book

    $ mysqladmin create Library
    $ mysql
    ...
    mysql> use Library
    Database changed
    mysql> create table Book
        -> (id INT, title CHAR(80), available SMALLINT)
        -> ;
    Query OK, 0 rows affected (0.08 sec)

We created a table Book with attributes
(1) id of domain INT;
(2) title of domain CHAR, 80 wide; and
(3) available of domain SMALLINT.

With drop table Book; we remove the table.

show and describe

    mysql> show tables;
    +-------------------+
    | Tables_in_Library |
    +-------------------+
    | Book              |
    +-------------------+
    1 row in set (0.00 sec)

    mysql> describe Book;
    +-----------+-------------+------+-----+---------+-------+
    | Field     | Type        | Null | Key | Default | Extra |
    +-----------+-------------+------+-----+---------+-------+
    | id        | int(11)     | YES  |     | NULL    |       |
    | title     | char(80)    | YES  |     | NULL    |       |
    | available | smallint(6) | YES  |     | NULL    |       |
    +-----------+-------------+------+-----+---------+-------+
    3 rows in set (0.00 sec)

entering data

    mysql> insert into Book values
        -> (1,"primer on scientific programming",1);
    Query OK, 1 row affected (0.00 sec)

    mysql> select * from Book;
    +------+----------------------------------+-----------+
    | id   | title                            | available |
    +------+----------------------------------+-----------+
    |    1 | primer on scientific programming |         1 |
    +------+----------------------------------+-----------+
    1 row in set (0.00 sec)

    mysql> select title from Book;
    +----------------------------------+
    | title                            |
    +----------------------------------+
    | primer on scientific programming |
    +----------------------------------+
    1 row in set (0.00 sec)

using MySQLdb in Python

MySQLdb is an interface to use MySQL from within a Python session. As root or as sudo python:

    >>> import MySQLdb
    >>> L = MySQLdb.connect(db="Library")
    >>> c = L.cursor()

Observe:
    run Python as superuser, otherwise no access;
    with connect(), we identify the database Library;
    L.cursor() returns a new object to represent a database cursor used to manage all operations.

retrieving information

Any command typed in a session with mysql can be passed as a string to execute() on a cursor:

    >>> c.execute("show tables")
    1L
    >>> c.fetchone()
    ('Book',)

To show all records in the table Book:

    >>> c.execute("select * from Book")
    1L
    >>> c.fetchall()
    ((1L, 'primer on scientific programming', 1),)
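The execute/fetch pattern of a cursor can be tried without a running MySQL server. The sketch below is only an illustration: it assumes Python 3 and uses the standard sqlite3 module as a stand-in for MySQLdb. Both follow the Python DB-API, but details differ; for instance, execute() in sqlite3 returns the cursor rather than a row count, and fetchall() returns a list instead of a tuple.

```python
import sqlite3  # stand-in for MySQLdb so the sketch runs without a server

# An in-memory database plays the role of the Library database.
conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("create table Book (id INT, title CHAR(80), available SMALLINT)")
c.execute("insert into Book values (1, 'primer on scientific programming', 1)")

# execute() runs a statement; fetchone() and fetchall() return rows as tuples.
c.execute("select title from Book")
first = c.fetchone()

c.execute("select * from Book")
rows = c.fetchall()

print(first)
print(rows)
conn.close()
```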

GTFS of our CTA

We can download the schedules of the CTA: http://www.transitchicago.com/developers/gtfs.aspx

GTFS (General Transit Feed Specification) is an open format for packaging scheduled service data. A GTFS feed is a series of text files with data on lines separated by commas (csv format). Each file is a table in a relational database.

We call our database CTA and will add tables, starting with the information read from stops.txt.

fields in stops.txt

The first line in stops.txt lists:

1 stop_id: type INT
2 stop_code: type INT
3 stop_name: type CHAR(80)
4 stop_lat: type FLOAT
5 stop_lon: type FLOAT
6 location_type: type INT
7 parent_station: type INT
8 wheelchair_boarding: type SMALLINT
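As an illustration of how the header line names the fields, here is a minimal sketch that reads csv text with the standard csv module (Python 3 assumed). The sample text is assembled from the field names listed above and the first data line of stops.txt quoted in this lecture; the function name read_feed is hypothetical.

```python
import csv
import io

# Sample text: the eight field names, followed by the first data line
# of stops.txt as quoted in the lecture.
SAMPLE = (
    "stop_id,stop_code,stop_name,stop_lat,stop_lon,"
    "location_type,parent_station,wheelchair_boarding\n"
    '1,1,"Jackson & Austin Terminal",41.87632184,-87.77410482,0,,1\n'
)

def read_feed(text):
    """Returns the list of field names and the list of data rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)          # the first line holds the field names
    rows = [row for row in reader]
    return header, rows

header, rows = read_feed(SAMPLE)
print(header)
print(rows[0])
```

Note that the empty parent_station field comes back as an empty string, which has to be mapped to a value before it can be stored in an INT column.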

creating database and table

    $ mysqladmin create CTA

Then we start mysql:

    mysql> use CTA;
    Database changed
    mysql> create table stops
        -> (id INT, code INT, name CHAR(80),
        -> lat FLOAT, lon FLOAT, tp INT,
        -> ps INT, wb SMALLINT);
    Query OK, 0 rows affected (0.37 sec)

describe

    mysql> describe stops;
    +-------+-------------+------+-----+---------+-------+
    | Field | Type        | Null | Key | Default | Extra |
    +-------+-------------+------+-----+---------+-------+
    | id    | int(11)     | YES  |     | NULL    |       |
    | code  | int(11)     | YES  |     | NULL    |       |
    | name  | char(80)    | YES  |     | NULL    |       |
    | lat   | float       | YES  |     | NULL    |       |
    | lon   | float       | YES  |     | NULL    |       |
    | tp    | int(11)     | YES  |     | NULL    |       |
    | ps    | int(11)     | YES  |     | NULL    |       |
    | wb    | smallint(6) | YES  |     | NULL    |       |
    +-------+-------------+------+-----+---------+-------+
    8 rows in set (0.01 sec)

manual insertion

The first data line in stops.txt contains

    1,1,"Jackson & Austin Terminal",41.87632184,-87.77410482,0,,1

    mysql> insert into stops values
        -> (1,1,"Jackson & Austin Terminal",
        -> 41.87632184,-87.77410482,0,0,1);
    Query OK, 1 row affected (0.00 sec)

    mysql> select name from stops where id = 1;
    +---------------------------+
    | name                      |
    +---------------------------+
    | Jackson & Austin Terminal |
    +---------------------------+
    1 row in set (0.00 sec)

deleting rows

To delete a row, given its id:

    mysql> delete from stops where id = 1;
    Query OK, 1 row affected (0.65 sec)

    mysql> select * from stops;
    Empty set (0.01 sec)

If the where clause is omitted, then all rows in the table are deleted.

filling a table

Typing 12,165 entries by hand is rather tedious...

After filling the table stops of the database, we query the table for a name:

    mysql> select name from stops where id = 3021;
    +----------------------+
    | name                 |
    +----------------------+
    | California & Augusta |
    +----------------------+
    1 row in set (0.00 sec)

the main program

    def main():
        """
        Opens the file with name filename,
        reads every line and inserts the data
        into the table stops.
        """
        L = MySQLdb.connect(db="CTA")
        c = L.cursor()
        print 'opening', filename, '...'
        file = open(filename, 'r')
        # we skip the first line
        d = file.readline()
        while True:
            d = file.readline()
            if d == '':
                break
            InsertData(c, d)

inserting data

    def InsertData(c, s):
        """
        Inserts the data in the comma separated string
        using the cursor c.
        """
        L = s.split(',')
        d = 'insert into stops values ('
        d = d + ('0,' if L[0] == '' else L[0] + ',')
        d = d + ('0,' if L[1] == '' else L[1] + ',')
        d = d + L[2] + ',' + L[3] + ',' + L[4] + ','
        d = d + ('0,' if L[5] == '' else L[5] + ',')
        d = d + ('0,' if L[6] == '' else L[6] + ',')
        w = L[7]; L7 = w[0:len(w)-2] + ')'
        d = d + ('0)' if L[7] == '' else L7)
        # print d
        c.execute(d)
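An alternative to gluing the statement together by hand is to parse the line with the csv module and pass the values as parameters to execute(). This is only a sketch under stated assumptions, not the script of the lecture: it uses Python 3 and sqlite3 as a stand-in for MySQLdb, and the names to_int, parse_stop and insert_stop are hypothetical. With sqlite3 the placeholder is ?, whereas MySQLdb uses %s. Splitting on every comma would go wrong if a stop name itself contained a comma; the csv reader respects the quotes.

```python
import csv
import sqlite3  # stand-in for MySQLdb so the sketch runs without a server

def to_int(x):
    """Empty fields become 0, as in InsertData."""
    return int(x) if x != '' else 0

def parse_stop(line):
    """Splits one line of stops.txt into a tuple of eight values."""
    f = next(csv.reader([line]))  # handles quoted names and the line ending
    return (to_int(f[0]), to_int(f[1]), f[2], float(f[3]), float(f[4]),
            to_int(f[5]), to_int(f[6]), to_int(f[7]))

def insert_stop(c, line):
    """Inserts one line of stops.txt using the cursor c."""
    c.execute("insert into stops values (?,?,?,?,?,?,?,?)", parse_stop(line))

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("create table stops (id INT, code INT, name CHAR(80), "
          "lat FLOAT, lon FLOAT, tp INT, ps INT, wb SMALLINT)")
insert_stop(c, '1,1,"Jackson & Austin Terminal",41.87632184,-87.77410482,0,,1\r\n')
c.execute("select name, ps from stops where id = 1")
row = c.fetchone()
print(row)
```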

querying the table

    mysql> select id from stops
        -> where name = "California & Augusta";
    +-------+
    | id    |
    +-------+
    |  3021 |
    | 17154 |
    +-------+
    2 rows in set (0.00 sec)

    mysql> select name from stops where id = 17154;
    +----------------------+
    | name                 |
    +----------------------+
    | California & Augusta |
    +----------------------+
    1 row in set (0.01 sec)

querying with Python

    $ sudo python dbctastopquery.py
    give a stop id : 3021
    3021 has name California & Augusta
    $ sudo python dbctastopquery.py
    give a stop id : 0
    0 has name None

the main program

    def main():
        """
        Connects to the database, prompts the user
        for a stop id and then queries the stops table.
        """
        L = MySQLdb.connect(db="CTA")
        c = L.cursor()
        id = input('give a stop id : ')
        n = getstopname(c, id)
        print id, 'has name', n

executing the query

    def getstopname(c, id):
        """
        Given a cursor c to the CTA database,
        queries the stops table for the stop id.
        Returns None if the stop id has not been found,
        otherwise returns the stop name.
        """
        s = 'select name from stops'
        w = ' where id = %d' % id
        q = s + w
        r = c.execute(q)
        if r == 0:
            return None
        else:
            t = c.fetchone()
            return t[0]
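The same lookup can be sketched with a parameterized query. The assumptions: Python 3, sqlite3 as a stand-in for MySQLdb, and a reduced stops table with only two columns; the name get_stop_name is hypothetical. Because execute() in sqlite3 does not return a row count, the sketch tests the result of fetchone() against None instead.

```python
import sqlite3  # stand-in for MySQLdb so the sketch runs without a server

def get_stop_name(c, stop_id):
    """
    Queries the stops table for the stop id.
    Returns None if the id is not found, else the stop name.
    """
    c.execute("select name from stops where id = ?", (stop_id,))
    t = c.fetchone()              # None when no row matches
    return None if t is None else t[0]

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("create table stops (id INT, name CHAR(80))")  # reduced table
c.execute("insert into stops values (3021, 'California & Augusta')")

found = get_stop_name(c, 3021)
missing = get_stop_name(c, 0)
print(3021, 'has name', found)
print(0, 'has name', missing)
```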

fields in stop_times.txt

The first line in stop_times.txt lists:

1 trip_id: type INT
2 arrival_time: type TIME
3 departure_time: type TIME
4 stop_id: type INT
5 stop_sequence: type INT
6 stop_headsign: type VARCHAR(80)
7 pickup_type: type INT
8 shape_dist_traveled: type INT

adding a new table

    mysql> create table stop_times
        -> (id INT, arrival TIME, departure TIME,
        -> stop INT, seq INT, head VARCHAR(80),
        -> ptp INT, sdt INT);
    Query OK, 0 rows affected (0.37 sec)

Note the types TIME and VARCHAR.

describe

    mysql> describe stop_times;
    +-----------+-------------+------+-----+---------+-------+
    | Field     | Type        | Null | Key | Default | Extra |
    +-----------+-------------+------+-----+---------+-------+
    | id        | int(11)     | YES  |     | NULL    |       |
    | arrival   | time        | YES  |     | NULL    |       |
    | departure | time        | YES  |     | NULL    |       |
    | stop      | int(11)     | YES  |     | NULL    |       |
    | seq       | int(11)     | YES  |     | NULL    |       |
    | head      | varchar(80) | YES  |     | NULL    |       |
    | ptp       | int(11)     | YES  |     | NULL    |       |
    | sdt       | int(11)     | YES  |     | NULL    |       |
    +-----------+-------------+------+-----+---------+-------+
    8 rows in set (0.00 sec)

manual insertion

    mysql> insert into stop_times values (
        -> 46035893,"12:09:14","12:09:14",6531,29,
        -> "Midway Orange Line",0,18625);
    Query OK, 1 row affected (0.00 sec)

    mysql> select departure, head from stop_times;
    +-----------+--------------------+
    | departure | head               |
    +-----------+--------------------+
    | 12:09:14  | Midway Orange Line |
    +-----------+--------------------+
    1 row in set (0.00 sec)

filling the table

On a Mac OS X laptop:

    $ sudo python dbctafillstoptimes.py
    opening ../lec22/cta/stop_times.txt ...
    dbctafillstoptimes.py:26: Warning: Out of range value for column 'id' at row 1
      c.execute(d)

Redo on a fast Linux workstation:

    # time python dbctafillstoptimes.py
    opening ../lec22/cta/stop_times.txt ...
    dbctafillstoptimes.py:26: Warning: Out of range value for column 'id' at row 1
      c.execute(d)

    real    5m32.433s
    user    1m11.921s
    sys     0m17.735s

about the complexity

While running dbctafillstoptimes.py, the memory consumption of Python and of mysql was of the same magnitude, about 300 MB.

    mysql> select count(*) from stop_times;
    +----------+
    | count(*) |
    +----------+
    |  2921276 |
    +----------+
    1 row in set (0.00 sec)

inserting data

    def InsertData(c, s):
        """
        Inserts the data in the comma separated string
        using the cursor c.
        """
        L = s.split(',')
        d = 'insert into stop_times values ('
        d = d + ('0,' if L[0] == '' else L[0] + ',')
        d = d + '\"' + L[1] + '\"' + ','
        d = d + '\"' + L[2] + '\"' + ','
        d = d + L[3] + ',' + L[4] + ','
        d = d + L[5] + ',' + L[6] + ','
        w = L[7]; L7 = w[0:len(w)-2] + ')'
        d = d + ('0)' if L[7] == '' else L7)
        # print d
        c.execute(d)
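With almost three million rows, one execute() call per line means many separate statements. A possible variation, sketched here under assumptions and not timed against MySQL, is to collect rows in batches and hand each batch to executemany(), which is part of the Python DB-API. The sketch uses Python 3 and sqlite3 as a stand-in for MySQLdb; the function fill_stop_times and the batch size are hypothetical, and the times are stored as text because sqlite3 has no TIME type.

```python
import csv
import io
import sqlite3  # stand-in for MySQLdb so the sketch runs without a server

def to_int(x):
    """Empty fields become 0, as in InsertData."""
    return int(x) if x != '' else 0

def fill_stop_times(conn, lines, batch=1000):
    """Inserts the rows of csv lines in batches; returns the row count."""
    c = conn.cursor()
    reader = csv.reader(lines)
    next(reader)                       # skip the header line
    sql = "insert into stop_times values (?,?,?,?,?,?,?,?)"
    total, chunk = 0, []
    for f in reader:
        chunk.append((to_int(f[0]), f[1], f[2], to_int(f[3]), to_int(f[4]),
                      f[5], to_int(f[6]), to_int(f[7])))
        if len(chunk) == batch:        # a full batch goes to the database
            c.executemany(sql, chunk)
            total += len(chunk)
            chunk = []
    if chunk:                          # the last, partial batch
        c.executemany(sql, chunk)
        total += len(chunk)
    conn.commit()
    return total

HEADER = ("trip_id,arrival_time,departure_time,stop_id,stop_sequence,"
          "stop_headsign,pickup_type,shape_dist_traveled\n")
ROW = '46035893,12:09:14,12:09:14,6531,29,"Midway Orange Line",0,18625\n'

conn = sqlite3.connect(":memory:")
conn.execute("create table stop_times (id INT, arrival TEXT, departure TEXT, "
             "stop INT, seq INT, head VARCHAR(80), ptp INT, sdt INT)")
n = fill_stop_times(conn, io.StringIO(HEADER + ROW * 3), batch=2)
count = conn.execute("select count(*) from stop_times").fetchone()[0]
print(n, count)
```

Reading the file through an iterator keeps only one batch in memory at a time on the Python side.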

querying stop_times

    mysql> select head from stop_times
        -> where stop = 3021 and
        -> arrival < "05:30:00";
    +----------------+
    | head           |
    +----------------+
    | 63rd Pl/Kedzie |
    | 63rd Pl/Kedzie |
    | 63rd Pl/Kedzie |
    | 63rd Pl/Kedzie |
    +----------------+
    4 rows in set (0.94 sec)

an involved query

    mysql> select name, departure, head
        -> from stops, stop_times
        -> where stops.id = 3021
        -> and stops.id = stop_times.stop
        -> and stop_times.departure < "05:30:00";
    +----------------------+-----------+----------------+
    | name                 | departure | head           |
    +----------------------+-----------+----------------+
    | California & Augusta | 04:43:49  | 63rd Pl/Kedzie |
    | California & Augusta | 05:03:49  | 63rd Pl/Kedzie |
    | California & Augusta | 05:19:49  | 63rd Pl/Kedzie |
    | California & Augusta | 05:12:49  | 63rd Pl/Kedzie |
    +----------------------+-----------+----------------+
    4 rows in set (0.57 sec)
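The query above combines two tables on the condition stops.id = stop_times.stop. A small self-contained sketch of the same join, written with an explicit join clause, is given below. It assumes Python 3 and sqlite3 as a stand-in for MySQLdb, uses reduced tables, and stores times as zero-padded text so that string comparison orders them correctly. Two rows are taken from the output above; the row at 06:10:00 is made up to show the effect of the time condition.

```python
import sqlite3  # stand-in for MySQLdb so the sketch runs without a server

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("create table stops (id INT, name CHAR(80))")
c.execute("create table stop_times (stop INT, departure TEXT, head VARCHAR(80))")
c.execute("insert into stops values (3021, 'California & Augusta')")
c.executemany("insert into stop_times values (?,?,?)", [
    (3021, "04:43:49", "63rd Pl/Kedzie"),
    (3021, "05:03:49", "63rd Pl/Kedzie"),
    (3021, "06:10:00", "63rd Pl/Kedzie"),      # made-up row, too late
    (6531, "12:09:14", "Midway Orange Line"),  # belongs to another stop
])

# Join the two tables on the stop id, then filter on id and departure time.
query = """
    select name, departure, head
    from stops join stop_times on stops.id = stop_times.stop
    where stops.id = ? and stop_times.departure < ?
    order by departure
"""
c.execute(query, (3021, "05:30:00"))
rows = c.fetchall()
for row in rows:
    print(row)
```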

Summary + Exercises

With Python scripts we read files into a MySQL database. Visit http://www.mysqltutorial.org.

1 Install MySQL and MySQLdb on your computer.
2 Write a Python script to return the name of a stop, given its id, using the table stops.
3 Design a GUI with Tkinter to query the stop name: one entry field for the stop id, another for the name of the stop, and one button in the middle to execute the query. Note that the GUI allows querying either by stop id or by stop name.

The fourth homework is due on Monday 22 October, 10AM: exercises 3 and 6 of Lecture 13; exercises 2 and 4 of Lecture 14; exercises 1 and 3 of Lecture 15; exercises 2 and 5 of Lecture 16; exercises 3 and 4 of Lecture 17.