SESSION ELEVEN: More MySQL

This session introduces some more advanced features of MySQL, including loading your own data into a database. There are a few files you need for this session, which you can download from the wiki: http://www.physics.usyd.edu.au/astrop/ausvoss/uploads/main/mysql-examples.tgz

11.1 Walkthrough examples

11.1.1 Walkthrough 1: Bulk loading

As mentioned in the lecture, loading a large dataset with one INSERT statement per row is inconvenient. In this walkthrough we go through the process of bulk loading a file using the LOAD DATA statement.

For this exercise we will use some bird sightings data, birds.csv. The file is in CSV (comma-separated values) format and has 4 columns:

  Column 1: Name
  Column 2: Scientific name
  Column 3: Summer sightings
  Column 4: Winter sightings

For the purposes of this exercise we will load all the data into one table. In reality we would probably want to have the scientific name in a separate table, with the tables linked by a bird_id attribute (Why?).

1. First, create a new database:

     mysql> CREATE DATABASE birds;
     mysql> USE birds;

2. Now create a table to store this data. The first two attributes (name and scientific name) should be strings, and the numbers of sightings should be integers. We also want a bird_id attribute, auto-incremented to guarantee uniqueness.

     mysql> CREATE TABLE sightings (
         -> bird_id SMALLINT UNSIGNED AUTO_INCREMENT,
         -> name VARCHAR(20),
         -> sciname VARCHAR(40),
         -> s_sight SMALLINT UNSIGNED,
         -> w_sight SMALLINT UNSIGNED,
         -> PRIMARY KEY (bird_id)
         -> );

3. Now we want to load the data from the birds.csv file into the sightings table. First, recall how to INSERT a single row into the table:

     mysql> INSERT INTO sightings
         -> VALUES (NULL, 'Dodo', 'Raphus Cucullatus', 0, 0);

4. There are a couple of things to note about this query.
   First, we have not specified which columns to populate. This is acceptable only when you supply a value for every column, in the order the columns were defined in CREATE TABLE. Second, bird_id is given the value NULL: because the column was created with AUTO_INCREMENT, MySQL replaces the NULL with the next available ID.

Aus-VO Summer School 2008
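The same test row can also be inserted by naming the target columns, which lets bird_id be left out entirely. A minimal sketch, assuming the sightings table from step 2; if you try it after step 3 the table will hold a second Dodo row, which is fine because step 6 deletes every row anyway.

```sql
-- Same test row as step 3, but with the target columns named explicitly.
-- bird_id is left out, so MySQL assigns it from the AUTO_INCREMENT counter.
INSERT INTO sightings (name, sciname, s_sight, w_sight)
VALUES ('Dodo', 'Raphus Cucullatus', 0, 0);

-- The bird_id value generated by the most recent INSERT in this session.
SELECT LAST_INSERT_ID();
```

Naming the columns is usually the safer habit in scripts, because the statement no longer depends on the order in which the columns were defined.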
5. The table now looks like this:

     mysql> SELECT *
         -> FROM sightings;
     bird_id  name  sciname            s_sight  w_sight
     1        Dodo  Raphus Cucullatus  0        0

6. Before we load the real data, let's remove this test row from the table:

     mysql> DELETE FROM sightings;

7. Now we are ready to load the data from birds.csv:

     mysql> LOAD DATA LOCAL
         -> INFILE "birds.csv"
         -> INTO TABLE sightings
         -> FIELDS TERMINATED BY ","
         -> (name, sciname, s_sight, w_sight);
     Query OK, 345 rows affected, 84 warnings (0.04 sec)
     Records: 345  Deleted: 0  Skipped: 0  Warnings: 82

   The list of columns at the end says which column each field of a line goes into; bird_id is not listed, so it is generated automatically. The LOCAL keyword tells the mysql client to read the file from your own computer, so a relative path such as birds.csv is looked up in the directory you started mysql from. Without LOCAL, the MySQL server reads the file itself, so you must give the file's full path (on Windows, starting from C:).

8. The output shows that there were 84 warnings. To view them, type SHOW WARNINGS;. They are all data truncation warnings, because we did not make the name column long enough to store every bird name (we limited it to 20 characters).

9. We can change the type of the name column in the table:

     mysql> ALTER TABLE sightings
         -> MODIFY name VARCHAR(40);

10. Remove the existing (truncated) data from the table using DELETE FROM sightings;.

11. Now try reloading the data as before. This time it should load successfully. Check that it looks reasonable with a command like

     mysql> SELECT * FROM sightings LIMIT 10;

    or

     mysql> SELECT name FROM sightings LIMIT 5;
     name
     Arctic Jaeger
     Arctic Tern
     Australasian Bittern
     Australasian Gannet
     Australasian Grebe
     5 rows in set (0.00 sec)

Tara Murphy and James Curran
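MySQL versions released after these notes were written (8.0 onwards) disable LOAD DATA LOCAL by default, so step 7 may fail with an error saying that loading local data is disabled. A sketch of how to check and enable it, assuming you administer your own MySQL server; on a shared server, ask the administrator instead. your_username below is a placeholder, not a real account.

```sql
-- Does the server accept LOAD DATA LOCAL? (The value shown is ON or OFF.)
SHOW GLOBAL VARIABLES LIKE 'local_infile';

-- Allow it until the server next restarts (needs an administrative account).
SET GLOBAL local_infile = 1;

-- The client must allow it too. Quit, then restart the client with e.g.
--   mysql --local-infile=1 -u your_username -p
-- where your_username is a placeholder for your own MySQL account name.
```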
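Steps 8 to 11 can be checked a little more systematically. A sketch, assuming the birds database and sightings table from this walkthrough; the first two statements only report anything if they are run straight after the LOAD DATA in step 7, before any other statement replaces the warning list.

```sql
-- Straight after step 7: how many warnings were there, and what were they?
SHOW COUNT(*) WARNINGS;
SHOW WARNINGS LIMIT 5;

-- After step 9: confirm that name is now VARCHAR(40).
DESC sightings;

-- After step 11: the longest name actually stored. If this is exactly 40,
-- some names may still have been cut off and the column needs to be wider.
SELECT MAX(CHAR_LENGTH(name)) AS longest_name FROM sightings;
```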
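DELETE FROM empties the table but, for the usual InnoDB and MyISAM storage engines, does not reset the AUTO_INCREMENT counter, so after steps 6 and 10 the reloaded birds will not start at bird_id 1. A sketch of an alternative way to do steps 10 and 11 using TRUNCATE TABLE, which does reset the counter. The extra FIELDS and LINES clauses are assumptions about how birds.csv is formatted, not part of the original walkthrough.

```sql
-- Empty the table. Unlike DELETE FROM, this also resets the
-- AUTO_INCREMENT counter, so reloaded rows are numbered from 1 again.
TRUNCATE TABLE sightings;

-- Reload with the delimiters spelled out. OPTIONALLY ENCLOSED BY strips
-- double quotes from any quoted field (e.g. a name containing a comma).
-- Use LINES TERMINATED BY '\r\n' instead if the file has Windows line endings.
LOAD DATA LOCAL INFILE 'birds.csv'
INTO TABLE sightings
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(name, sciname, s_sight, w_sight);

-- If every line loaded, expect 345 rows numbered 1 to 345.
SELECT COUNT(*), MIN(bird_id), MAX(bird_id) FROM sightings;
```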
11.1.2 Walkthrough 2: .sql files

So far we have typed all commands directly at the MySQL prompt. This is fine for trying things out, but in general you will want to save your statements and queries in a script file so that you can reuse them. In this walkthrough we look at how to create and run .sql scripts.

1. First, let's create a database and a new table to store students' marks for a course:

     mysql> CREATE DATABASE marks;
     mysql> USE marks;
     mysql> CREATE TABLE info1903 (
         -> student_id INT UNSIGNED,
         -> mark VARCHAR(5),
         -> PRIMARY KEY (student_id)
         -> );
     mysql> SHOW TABLES;
     Tables_in_marks
     info1903

2. Now insert some data into the table:

     mysql> INSERT INTO info1903
         -> (student_id, mark)
         -> VALUES (123456, 'HD');

3. Let's say we are happy with this setup and decide to create a .sql script, so that we can reuse, modify and add to these commands in future.

4. Open a new file loadmarks.sql in any text editor and type the commands just as you would at the MySQL prompt, but without the mysql> and -> prompts.

5. Now remove the database you just created, using DROP DATABASE marks;.

6. We can now recreate the database by running the .sql script:

     mysql> SOURCE loadmarks.sql;
     Query OK, 1 row affected (0.00 sec)
     Database changed
     Query OK, 0 rows affected (0.01 sec)
     Query OK, 1 row affected (0.00 sec)

7. Finally, check that it has worked correctly:

     mysql> SELECT *
         -> FROM info1903;
     student_id  mark
     123456      HD
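For reference, a loadmarks.sql file for step 4 might look like the sketch below. It holds the four statements whose output appears in step 6 (SHOW TABLES is left out because it only displays information); the comment lines are an addition, and MySQL ignores any line starting with -- followed by a space.

```sql
-- loadmarks.sql: create the marks database and record one student's mark.
CREATE DATABASE marks;
USE marks;

CREATE TABLE info1903 (
    student_id INT UNSIGNED,
    mark VARCHAR(5),
    PRIMARY KEY (student_id)
);

INSERT INTO info1903 (student_id, mark)
VALUES (123456, 'HD');
```

The same script can also be run from the operating-system shell without opening the MySQL prompt first, with mysql -u your_username -p < loadmarks.sql (your_username being a placeholder for your MySQL account). Adding DROP DATABASE IF EXISTS marks; as the first line would let the script be rerun without step 5, at the cost of silently discarding any marks already stored.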
11.2 Exercises

These exercises are based on material from Learning SQL, Alan Beaulieu, O'Reilly, 2005, http://www.oreilly.com/catalog/learningsql/.

11.2.1 Question 1: Ordering results

1. Retrieve the employee ID, first name and last name for all bank employees. Sort the results by last name, then by first name.
2. Retrieve the customer ID, city, state and federal ID for all customers. Sort the results by the last 3 digits of the federal ID number. Hint: try the built-in function RIGHT(fed_id, 3).

11.2.2 Question 2: Grouping + aggregation

1. Construct a query to count the number of rows in the account table.
2. Count the number of accounts held by each customer. Show the customer IDs and the number of accounts they hold.
3. Count the number of accounts held by each customer. Show the customer IDs and the number of accounts for each customer who holds at least 2 accounts.
4. Count the number of accounts held by each customer. Show the customer IDs and, where possible, the full name of each customer.
5. Modify the previous query so that it only shows results for customers who are listed in the individual table.

11.2.3 Question 3: Harder queries

1. Find the number of employees who have opened an account.
2. Find the maximum balance, minimum balance, average balance, total balance and number of accounts for each type of account (SAV, CHK, etc.).
3. Find the total available balance by product and branch, where there is more than one account per product and branch. Order the results by total balance (highest to lowest). Hint: you can order by a column's position in the results. For example, ORDER BY 2 DESC; orders the results by the second column, from highest to lowest.

11.2.4 Question 4: Loading data

For this question you should use the file vanguard.txt, which you downloaded from the wiki. It contains the following information for the Vanguard Australian Share Index fund.[1]

  Column 1: Date
  Column 2: Share purchase price
  Column 3: Share withdrawal price

1. Create a new database shares.
2. Design and create a table vanguard for this data. Make sure you can justify your choice of primary key.
3. Load the data from the vanguard.txt file into the table. Check that you have set up the database correctly using DESC and SHOW.
4. Find the date on which the selling (withdrawal) price was at a minimum.
5. Find the date in 2005 on which the buying (purchase) price was at a maximum.

[1] http://www.vanguard.com.au/. Note that the use of this data as a lab exercise does not mean that I recommend this fund :)
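The hint in Question 1 relies on ORDER BY accepting an expression, not just a column name. A sketch using the sightings table from Walkthrough 1 rather than the bank tables, so it illustrates the technique without answering the question:

```sql
-- RIGHT(str, n) returns the last n characters of str.
SELECT RIGHT('Raphus Cucullatus', 3);   -- returns 'tus'

-- A function call can be used directly in ORDER BY. Here the birds are
-- sorted by the last three letters of their scientific name, with the
-- common name breaking ties.
SELECT name, sciname
FROM sightings
ORDER BY RIGHT(sciname, 3), name
LIMIT 10;
```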
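Questions 2 and 3 combine GROUP BY, aggregate functions, HAVING and ordering by column position. A sketch of all four on the sightings table from Walkthrough 1, assuming each sciname is written as the genus, a space, then the species (as in 'Raphus Cucullatus'); genus, summer_total and n_birds are illustrative aliases.

```sql
-- Group the birds by genus (the first word of the scientific name)
-- and summarise their summer sightings.
SELECT SUBSTRING_INDEX(sciname, ' ', 1) AS genus,   -- result column 1
       SUM(s_sight) AS summer_total,                -- result column 2
       COUNT(*) AS n_birds                          -- result column 3
FROM sightings
WHERE w_sight > 0          -- WHERE filters individual rows before grouping
GROUP BY genus
HAVING COUNT(*) >= 2       -- HAVING filters whole groups after aggregation
ORDER BY 2 DESC;           -- sort by result column 2, largest total first
```

An aggregate such as COUNT(*) cannot appear in WHERE, because WHERE is evaluated row by row before any groups exist; filtering on group totals is exactly the job HAVING does.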
11.3 Capability checklist

When you've finished this lab, check that you know how to...

1. Populate tables by bulk loading from files
2. Explore the data using commands like DESC and SHOW
3. Write SQL queries which use ORDER BY
4. Write SQL queries involving grouping
5. Write SQL queries using aggregate functions
6. Write SQL queries using HAVING
7. Understand the difference between HAVING and WHERE
8. Run SQL queries written as separate .sql scripts

11.4 Appendix: MySQL data types

These are the MySQL data types that you should choose from when designing and creating a new database table. The list is not comprehensive, but covers all of the main types.

  Type         Description
  CHAR()       A fixed-length string from 0 to 255 characters
  VARCHAR()    A variable-length string from 0 to 255 characters (MySQL 5.0.3 and later allow up to 65 535, subject to the row-size limit)
  TEXT         A string with a maximum length of 65 535 bytes
  MEDIUMTEXT   A string with a maximum length of 16 777 215 bytes
  LONGTEXT     A string with a maximum length of 4 294 967 295 bytes
  TINYINT()    -128 to 127 normal; 0 to 255 UNSIGNED
  SMALLINT()   -32768 to 32767 normal; 0 to 65535 UNSIGNED
  MEDIUMINT()  -8388608 to 8388607 normal; 0 to 16777215 UNSIGNED
  INT()        -2147483648 to 2147483647 normal; 0 to 4294967295 UNSIGNED
  FLOAT        A single-precision floating-point (real) number
  DOUBLE()     A double-precision floating-point number
  DATE         YYYY-MM-DD
  DATETIME     YYYY-MM-DD HH:MM:SS
  TIMESTAMP    YYYY-MM-DD HH:MM:SS (shown as YYYYMMDDHHMMSS before MySQL 4.1); covers 1970 to 2038
  TIME         HH:MM:SS
  YEAR         YYYY
  ENUM()       One value chosen from a list of allowed values
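The ranges in this table matter in practice: a value that does not fit its column's type is adjusted, which is what caused the truncation warnings in Walkthrough 1. A sketch using a hypothetical scratch table type_demo (created in whichever database is currently selected) and INSERT IGNORE, which asks MySQL to clip or truncate bad values with a warning instead of rejecting the row.

```sql
-- Hypothetical scratch table with deliberately small types.
CREATE TABLE type_demo (
    tiny  TINYINT UNSIGNED,   -- 0 to 255
    small SMALLINT,           -- -32768 to 32767
    label VARCHAR(5)          -- at most 5 characters
);

-- None of these values fit. With IGNORE, numbers are clipped to the
-- nearest allowed value and the string is truncated, each with a warning.
INSERT IGNORE INTO type_demo VALUES (300, -40000, 'abcdefgh');

SHOW WARNINGS;            -- one warning per adjusted column
SELECT * FROM type_demo;  -- tiny = 255, small = -32768, label = 'abcde'

DROP TABLE type_demo;     -- clean up the scratch table
```

Without IGNORE, a server running in strict SQL mode (the default in recent versions) rejects the whole INSERT with an error instead.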