Databases and SQL programming overview
Databases: Digital collections of data
A database system has:
- Data + supporting data structures
- The management system (DBMS)
Popular DBMS:
- Commercial: Oracle, IBM, Microsoft
- GUI: Microsoft Access, OpenOffice Base
- Terminal: MySQL, Postgres, SQLite
Why use a database system?
- Large amounts of diverse data (example: sequence identifiers and expression data)
- Many concurrent end-users or collaborators
Biological databases
- Complex data
- Relationships (1-1, 1-many, many-many)
- End users
SQL (Structured Query Language)
Language for building, accessing, and manipulating relational database management systems (RDBMS):
- Database
- Schema: tables, relationships, permissions
- The data
- Access the data with SQL queries
PostgreSQL
http://www.postgresql.org/
Install postgres (version 8.4 was the current one in Ubuntu when these notes were written)
$ psql --help
Log into your psql terminal:
$ psql -h localhost -U postgres -d my_database
Get help with SQL commands:
my_database=> \h
Get help with psql commands:
my_database=> \?
SQL basics
- Keywords are not case sensitive.
- Unquoted table and column names are folded to lower case by PostgreSQL; names in double quotes keep their case and must then always be quoted (use lower case as a naming convention).
- SQL statements are terminated with a semicolon.
- Elements (columns, values) are comma-separated.
- Comments are enclosed between /* and */, or preceded by --.
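A short psql session can confirm these rules. This is a minimal sketch; SampleTable and "MixedCase" are hypothetical names, and temporary tables are used so nothing is left behind after the session ends.

```sql
-- Keywords are not case sensitive: these two statements are equivalent
SELECT 1 AS one;
select 1 as one;

/* Unquoted identifiers are folded to lower case, so SampleTable,
   SAMPLETABLE and sampletable all name the same (hypothetical) table */
CREATE TEMP TABLE SampleTable (Sample_ID integer, note text);
INSERT INTO SAMPLETABLE (sample_id, note) VALUES (1, 'first row');  -- comma-separated elements
SELECT sample_id, note FROM sampletable;                             -- statement ends with ;

-- A double-quoted identifier keeps its case and must be quoted every time
CREATE TEMP TABLE "MixedCase" (id integer);
SELECT id FROM "MixedCase";  -- SELECT id FROM MixedCase; would look for mixedcase and fail
```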
SQL (Structured Query Language)
Language for building, accessing, and manipulating relational database management systems (RDBMS):
- Database
  - Create a new database from your psql terminal (CREATE DATABASE)
  - Create or drop a database from your Linux terminal (createdb and dropdb commands)
* Mind the owner of the database, and permissions!
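Creating the database from the psql terminal might look like the sketch below. It assumes you are connected as a role allowed to create databases (for example postgres) and reuses my_database, the example name from the psql login command; the createdb/dropdb shell equivalents are shown only as comments.

```sql
-- Run from psql while connected to an existing database (e.g. postgres).
-- CREATE DATABASE cannot be run inside a BEGIN ... COMMIT transaction block.
CREATE DATABASE my_database;

-- The new database is owned by the role that ran the command.
-- Shell equivalents (run from the Linux terminal, not from psql):
--   createdb my_database
--   dropdb my_database
-- To remove it from psql instead:  DROP DATABASE my_database;
```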
SQL (Structured Query Language)
Language for building, accessing, and manipulating relational database management systems (RDBMS):
- Database
- Schema: tables, relationships, permissions (DDL)
- The data
- Access the data with SQL queries
SQL (Structured Query Language)
- Data definition (DDL)
- Data manipulation
- SELECT statements
- Joins
- Row functions
- Aggregate functions
- Subqueries
- Views
* PostgreSQL and database connection with Perl
Data definition
Data types: http://www.postgresql.org/docs/8.4/static/datatype.html
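To see a few of the types from that page side by side, here is a sketch of a hypothetical sample table. The extra columns (description, replicate, rna_yield, is_control, collected_on, loaded_at) are illustrative assumptions, not part of the schema used elsewhere in these notes.

```sql
-- A hypothetical sample table using common PostgreSQL data types
CREATE TEMP TABLE sample (
    sample_id    serial PRIMARY KEY,              -- auto-incrementing integer
    sample_name  character varying(50) NOT NULL,  -- text of at most 50 characters
    description  text,                            -- text of any length
    replicate    integer,                         -- whole number
    rna_yield    numeric(6,2),                    -- exact decimal, 2 digits after the point
    is_control   boolean DEFAULT false,           -- true/false
    collected_on date,                            -- calendar date
    loaded_at    timestamp DEFAULT now()          -- date and time of insertion
);

-- numeric(6,2) rounds 12.345 to 12.35 on insert
INSERT INTO sample (sample_name, replicate, rna_yield, collected_on)
VALUES ('leaf_1', 1, 12.345, '2012-03-01');

SELECT sample_id, sample_name, rna_yield, is_control, collected_on FROM sample;
```

Declaring character varying without a length, as the rest of these notes do, leaves the text length unlimited.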
Data definition
Create table: defines the structure of the table.
CREATE TABLE sample (                         -- SQL command, table name
    sample_id serial NOT NULL PRIMARY KEY,    -- column name, data type, constraint(s)
    sample_name character varying NOT NULL
);
Data definition
Constraints enforce valid data in columns:
- NOT NULL
- CHECK
- PRIMARY KEY
- FOREIGN KEY
http://www.postgresql.org/docs/8.4/static/ddl-constraints.html
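The four constraint types can be tried together in a sketch like this one. The measurement table and its expression_value column are hypothetical, and the rejected statements are left as comments so the script runs to the end.

```sql
CREATE TEMP TABLE species (
    species_id   serial PRIMARY KEY,                            -- PRIMARY KEY: unique and not null
    species_name character varying NOT NULL                     -- NOT NULL: a value is required
);
CREATE TEMP TABLE measurement (
    measurement_id   serial PRIMARY KEY,
    species_id       integer REFERENCES species (species_id),   -- FOREIGN KEY
    expression_value numeric CHECK (expression_value >= 0)      -- CHECK: no negative values
);

INSERT INTO species (species_name) VALUES ('Solanum lycopersicum');      -- species_id 1
INSERT INTO measurement (species_id, expression_value) VALUES (1, 12.5); -- accepted

-- Each of these would be rejected with an error:
-- INSERT INTO species (species_name) VALUES (NULL);                      -- NOT NULL
-- INSERT INTO measurement (species_id, expression_value) VALUES (1, -3); -- CHECK
-- INSERT INTO measurement (species_id, expression_value) VALUES (99, 1); -- FOREIGN KEY
```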
Data definition
Create table:
CREATE TABLE sample (
    sample_id serial NOT NULL PRIMARY KEY,
    sample_name character varying NOT NULL,
    species character varying
);
Or create the table first and add the column later:
CREATE TABLE sample (
    sample_id serial NOT NULL PRIMARY KEY,
    sample_name character varying NOT NULL
);
ALTER TABLE sample ADD COLUMN species character varying;
Data definition
ALTER TABLE modifies the structure of an existing table:
ALTER TABLE sample ALTER COLUMN species SET NOT NULL;    -- fails if any existing row has a NULL species
ALTER TABLE sample DROP COLUMN species;
ALTER TABLE sample ADD COLUMN species text DEFAULT NULL;
Data definition
Drop tables:
DROP TABLE sample;
Oops!
ALWAYS USE TRANSACTIONS!!
=> BEGIN;
=> SQL statement 1; SQL statement 2; ...
-- I made some mistake...
=> ROLLBACK;
Data definition
ALWAYS USE TRANSACTIONS!!
=> BEGIN;
=> SQL statement 1; SQL statement 2; ...
-- Looks good!
=> COMMIT;
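A minimal sketch of the "Oops!" scenario, using a temporary copy of the sample table. It relies on PostgreSQL's transactional DDL; some other systems (MySQL, for example) commit DDL statements implicitly, so ROLLBACK would not bring a dropped table back there.

```sql
CREATE TEMP TABLE sample (
    sample_id   serial PRIMARY KEY,
    sample_name character varying NOT NULL
);
INSERT INTO sample (sample_name) VALUES ('leaf_1');

BEGIN;
DROP TABLE sample;     -- Oops!
ROLLBACK;              -- the drop is undone

SELECT count(*) FROM sample;   -- the table and its row are still there

BEGIN;
ALTER TABLE sample ADD COLUMN species character varying;
COMMIT;                -- looks good: make the change permanent
```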
Data definition
Foreign keys:
CREATE TABLE species (
    species_id serial PRIMARY KEY,
    species_name character varying NOT NULL
);
CREATE TABLE sample (
    sample_id serial NOT NULL PRIMARY KEY,
    sample_name character varying NOT NULL,
    species_id integer REFERENCES species (species_id)
);
Foreign key constraint (animal.species_id references species.species_id)
species            animal
---------------    ---------------
species_id (PK)    animal_id (PK)
species_name       animal_name
                   species_id (FK)
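The species/animal diagram translates into two tables like the ones below. This is a sketch with temporary tables; the species 'Bos taurus' and the animals 'Daisy' and 'Rex' are made-up example data.

```sql
CREATE TEMP TABLE species (
    species_id   serial PRIMARY KEY,
    species_name character varying NOT NULL
);
CREATE TEMP TABLE animal (
    animal_id   serial PRIMARY KEY,
    animal_name character varying NOT NULL,
    species_id  integer REFERENCES species (species_id)   -- the foreign key
);

INSERT INTO species (species_name) VALUES ('Bos taurus');          -- species_id 1
INSERT INTO animal (animal_name, species_id) VALUES ('Daisy', 1);  -- OK: species 1 exists

-- Both of these are rejected with a foreign key violation:
-- INSERT INTO animal (animal_name, species_id) VALUES ('Rex', 99); -- no species 99
-- DELETE FROM species WHERE species_id = 1;                        -- Daisy still refers to it
```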
SQL (Structured Query Language)
Language for building, accessing, and manipulating relational database management systems (RDBMS):
- Database
- Schema: tables, relationships, permissions (DDL)
- The data (INSERT, UPDATE, DELETE)
- Access the data with SQL queries
Data manipulation
Insert: add new rows to your table
INSERT INTO species (species_name) VALUES ('Solanum lycopersicum');
Update: modify column value(s) of existing rows
UPDATE species SET species_name = 'Solanum tuberosum' WHERE species_id = 1;
Delete: remove rows from a table
DELETE FROM species WHERE species_id = 1;
Without a WHERE clause, UPDATE and DELETE affect every row in the table.
Data manipulation
Transactions:
BEGIN;
UPDATE ...;
DELETE ...;
INSERT ...;
COMMIT; (or ROLLBACK;)
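A concrete version of the transaction pattern above, as a sketch on a temporary species table: the first batch is rolled back, the second committed.

```sql
CREATE TEMP TABLE species (
    species_id   serial PRIMARY KEY,
    species_name character varying NOT NULL
);
INSERT INTO species (species_name) VALUES ('Solanum lycopersicum');  -- species_id 1

-- A batch of changes we decide to throw away
BEGIN;
UPDATE species SET species_name = 'Solanum tuberosum' WHERE species_id = 1;
DELETE FROM species WHERE species_id = 1;
ROLLBACK;      -- both changes are undone

-- A batch we keep
BEGIN;
INSERT INTO species (species_name) VALUES ('Capsicum annuum');
COMMIT;

SELECT species_id, species_name FROM species ORDER BY species_id;
-- 1 Solanum lycopersicum, 2 Capsicum annuum
```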
Use transactions, and sanitize your input (guard against SQL injection): http://bobby-tables.com/
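Sanitizing input mostly means never pasting user-supplied text into SQL strings and using placeholders instead (in Perl DBI, for example, the ? placeholders passed to prepare and execute). The same idea can be shown inside psql with PREPARE and EXECUTE. This sketch uses a temporary table and a Bobby-Tables-style value; add_species is a hypothetical statement name.

```sql
CREATE TEMP TABLE species (
    species_id   serial PRIMARY KEY,
    species_name character varying NOT NULL
);

-- The statement text is fixed when it is prepared; $1 can only ever be a value
PREPARE add_species (character varying) AS
    INSERT INTO species (species_name) VALUES ($1);

-- A hostile "name" in the style of bobby-tables.com is stored as plain data
EXECUTE add_species('Robert''); DROP TABLE species; --');
EXECUTE add_species('Capsicum annuum');

SELECT species_id, species_name FROM species;   -- 2 rows, and the table still exists
DEALLOCATE add_species;
```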
Data manipulation
COPY command
Large dataset?
- Write an SQL file with INSERT commands, or
- Use COPY with a delimited text file
Data manipulation
COPY command (psql \copy): http://www.postgresql.org/docs/8.4/interactive/app-psql.html
Data manipulation
COPY command
Instead of many INSERT statements:
BEGIN;
INSERT INTO species (species_name) VALUES ('Solanum melongena');
INSERT INTO species (species_name) VALUES ('Solanum tuberosum');
INSERT INTO species (species_name) VALUES ('Capsicum annuum');
...
COMMIT;
load a text file with one species name per line:
=> \copy species (species_name) FROM 'species_list.txt'
SQL (Structured Query Language)
Language for building, accessing, and manipulating relational database management systems (RDBMS):
- Database
- Schema: tables, relationships, permissions (DDL)
- The data (INSERT, UPDATE, DELETE)
- Access the data with SQL queries (SELECT)
SQL (Structured Query Language)
- Data definition
- Data manipulation (DML)
- SELECT statements
- Joins
- Row functions
- Aggregate functions
- Subqueries
- Views
* PostgreSQL and database connection with Perl
SELECT statements
Select everything:
SELECT * FROM sample;
Select with a condition, sort results:
SELECT sample_name FROM sample WHERE species_id = 1 ORDER BY sample_name ASC;
Count rows, group by column, with a condition:
SELECT count(sample_id), species.species_name
FROM sample JOIN species USING (species_id)
WHERE species_name LIKE 'Solanum%'
GROUP BY species_name;
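The three queries can be run end to end against a small data set. Temporary tables keep the sketch self-contained; the species and sample rows are made up for illustration.

```sql
CREATE TEMP TABLE species (
    species_id   serial PRIMARY KEY,
    species_name character varying NOT NULL
);
CREATE TEMP TABLE sample (
    sample_id   serial PRIMARY KEY,
    sample_name character varying NOT NULL,
    species_id  integer REFERENCES species (species_id)
);
INSERT INTO species (species_name)
VALUES ('Solanum lycopersicum'), ('Solanum tuberosum'), ('Capsicum annuum');  -- ids 1, 2, 3
INSERT INTO sample (sample_name, species_id)
VALUES ('fruit_1', 1), ('leaf_1', 1), ('tuber_1', 2), ('fruit_2', 3);

SELECT * FROM sample;                      -- all 4 rows

SELECT sample_name FROM sample
WHERE species_id = 1
ORDER BY sample_name ASC;                  -- fruit_1, leaf_1

SELECT count(sample_id), species.species_name
FROM sample JOIN species USING (species_id)
WHERE species_name LIKE 'Solanum%'
GROUP BY species_name
ORDER BY species_name;                     -- 2 Solanum lycopersicum, 1 Solanum tuberosum
```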
SELECT statements
Conditional operators: SELECT ... FROM ... WHERE ...
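Common conditional operators in the WHERE clause are exercised below on a hypothetical expression_data table; the sequence IDs and values are invented for the example.

```sql
CREATE TEMP TABLE expression_data (
    sequence_id      character varying NOT NULL,
    sample_name      character varying NOT NULL,
    expression_value numeric
);
INSERT INTO expression_data VALUES
    ('seq_a', 'leaf',  12.5),
    ('seq_b', 'leaf',  0),
    ('seq_c', 'fruit', NULL),
    ('seq_d', 'fruit', 4.2);

SELECT sequence_id FROM expression_data WHERE expression_value > 1;               -- seq_a, seq_d
SELECT sequence_id FROM expression_data WHERE expression_value BETWEEN 1 AND 10;  -- seq_d
SELECT sequence_id FROM expression_data WHERE sample_name IN ('leaf', 'root');    -- seq_a, seq_b
SELECT sequence_id FROM expression_data WHERE sample_name LIKE 'fr%';             -- seq_c, seq_d
SELECT sequence_id FROM expression_data WHERE expression_value IS NULL;           -- seq_c (= NULL never matches)
SELECT sequence_id FROM expression_data
WHERE sample_name = 'leaf' AND expression_value <> 0;                             -- seq_a
```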
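Joins
Inner joins are the default, so the word INNER can be omitted.
When the foreign key column has the same name as the referenced column, USING (column) can replace the ON condition:
SELECT * FROM sample JOIN species USING (species_id);
(A NATURAL JOIN goes one step further and joins on every column name the two tables share.)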
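To compare join types, the sketch below includes a sample with no species: the inner join drops it, the LEFT JOIN keeps it. The temporary tables and example rows are assumptions for illustration.

```sql
CREATE TEMP TABLE species (
    species_id   serial PRIMARY KEY,
    species_name character varying NOT NULL
);
CREATE TEMP TABLE sample (
    sample_id   serial PRIMARY KEY,
    sample_name character varying NOT NULL,
    species_id  integer REFERENCES species (species_id)
);
INSERT INTO species (species_name) VALUES ('Solanum lycopersicum'), ('Capsicum annuum');  -- ids 1, 2
INSERT INTO sample (sample_name, species_id)
VALUES ('fruit_1', 1), ('leaf_1', 1), ('unassigned_1', NULL);

-- Inner join: only samples with a matching species (2 rows)
SELECT sample_name, species_name FROM sample JOIN species USING (species_id);

-- Same join written with ON instead of USING (2 rows)
SELECT sample.sample_name, species.species_name
FROM sample JOIN species ON sample.species_id = species.species_id;

-- LEFT JOIN keeps every sample; species_name is NULL for unassigned_1 (3 rows)
SELECT sample_name, species_name FROM sample LEFT JOIN species USING (species_id);
```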
Row functions
Math functions: http://www.postgresql.org/docs/8.4/interactive/functions-math.html
String functions: http://www.postgresql.org/docs/8.4/interactive/functions-string.html
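A few math and string functions from those pages, evaluated on literal values (no tables needed); the expected results are noted in the comments.

```sql
SELECT round(12.3456, 2)                              AS rounded,      -- 12.35
       abs(-4.2)                                      AS absolute,     -- 4.2
       upper('solanum')                               AS upper_name,   -- SOLANUM
       length('Solanum')                              AS name_length,  -- 7
       substring('Solanum lycopersicum' FROM 1 FOR 7) AS genus,        -- Solanum
       'Solanum' || ' ' || 'tuberosum'                AS full_name;    -- Solanum tuberosum (|| concatenates)
```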
Row functions
Date and time: http://www.postgresql.org/docs/8.4/interactive/functions-datetime.html
Data type conversion: http://www.postgresql.org/docs/8.4/static/sql-createcast.html
=> SELECT now();                 -- this is a timestamp!
=> SELECT cast(now() AS text);   -- the output is now a 'text' data type
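Date arithmetic and type conversion can likewise be tried on literals. A minimal sketch; :: is PostgreSQL's shorthand for CAST.

```sql
SELECT now()::date                           AS today,           -- timestamp cast to date
       date '2012-03-01' + 30                AS thirty_days_on,  -- 2012-03-31 (date + integer days)
       date '2012-03-01' - date '2012-01-01' AS days_between,    -- 60 (2012 is a leap year)
       extract(year FROM date '2012-03-01')  AS collected_year,  -- 2012
       cast('42' AS integer) + 1             AS converted;       -- 43 (text converted to integer)
```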
PostgreSQL: the basics
$ psql -h hostname -U username -d dbname
Users: postgres, your_user_name, other_user
* User permissions:
- postgres is the database superuser
- Grant permissions to other users as required
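Granting permissions might look like the sketch below. It assumes you are connected as postgres (or another role with CREATEROLE) and that other_user is a collaborator account being created for the example; both the role and this sample table are illustrative.

```sql
-- Requires a superuser (such as postgres) or a role with CREATEROLE
CREATE ROLE other_user LOGIN;          -- hypothetical collaborator account

CREATE TABLE sample (
    sample_id   serial PRIMARY KEY,
    sample_name character varying NOT NULL
);

GRANT SELECT ON sample TO other_user;  -- read-only access
-- Other table privileges include INSERT, UPDATE, DELETE, or ALL
-- REVOKE SELECT ON sample FROM other_user;  takes the permission away again
```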
Resources
http://en.wikipedia.org/wiki/sql
http://www.postgresql.org/docs/8.4/
In your psql terminal:
=> \h [SQL command name]
=> \?
SQL advanced querying
- Data definition
- Data manipulation (DML)
- SELECT statements
- Joins
- Row functions
- Aggregate functions
- Subqueries
- Views
* PostgreSQL and database connection with Perl
Row functions: CASE
The SQL CASE expression is a generic conditional expression, similar to if/else statements in other languages:
CASE WHEN condition THEN result
     [WHEN ...]
     [ELSE result]
END
SELECT * FROM test;
 a
---
 1
 2
 3
SELECT a,
       CASE WHEN a = 1 THEN 'one'
            WHEN a = 2 THEN 'two'
            ELSE 'other'
       END
FROM test;
 a | case
---+-------
 1 | one
 2 | two
 3 | other
Row functions: more conditional expressions
COALESCE returns the first of its arguments that is not null; null is returned only if all arguments are null. It is often used to substitute a default value for null values when data is retrieved for display, for example:
SELECT COALESCE(description, short_description, '(none)') ...
NULLIF(value1, value2) returns null if value1 and value2 are equal; otherwise it returns value1. This can be used to perform the inverse of the COALESCE example above:
SELECT NULLIF(value, '(none)') ...
If value is '(none)', a null is returned; otherwise value is returned.
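Both functions in action, as a sketch on a hypothetical gene table whose rows are invented for the example.

```sql
CREATE TEMP TABLE gene (
    gene_name         character varying NOT NULL,
    description       text,
    short_description text
);
INSERT INTO gene VALUES
    ('gene_1', 'full description', 'short'),
    ('gene_2', NULL, 'short only'),
    ('gene_3', NULL, NULL);

-- First non-null argument: 'full description', 'short only', '(none)'
SELECT gene_name, COALESCE(description, short_description, '(none)') AS shown
FROM gene;

-- The inverse: turn the '(none)' placeholder back into a real NULL
SELECT NULLIF('(none)', '(none)') AS a,   -- NULL
       NULLIF('leaf', '(none)')   AS b;   -- leaf
```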
Aggregate functions: grouping
GROUP BY condenses into a single row all selected rows that share the same values for the grouped expressions.
Aggregate functions: grouping
Use the HAVING clause, instead of WHERE, for conditions on aggregate functions: WHERE filters rows before grouping, HAVING filters the groups afterwards.
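GROUP BY, aggregate functions, WHERE and HAVING together, in a sketch on a hypothetical expression_data table with invented values.

```sql
CREATE TEMP TABLE expression_data (
    sequence_id      character varying NOT NULL,
    sample_name      character varying NOT NULL,
    expression_value numeric
);
INSERT INTO expression_data VALUES
    ('seq_a', 'leaf',  12.5),
    ('seq_b', 'leaf',  0),
    ('seq_c', 'leaf',  3.5),
    ('seq_a', 'fruit', 4.2);

-- One row per sample: leaf has 3 rows (sum 16.0), fruit has 1 row (sum 4.2)
SELECT sample_name, count(*), sum(expression_value),
       avg(expression_value), max(expression_value)
FROM expression_data
GROUP BY sample_name;

-- WHERE filters rows before grouping; HAVING filters the groups afterwards
SELECT sample_name, count(*)
FROM expression_data
WHERE expression_value > 1     -- drops seq_b (0) before counting
GROUP BY sample_name
HAVING count(*) > 1;           -- keeps only leaf (2 rows left)
```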
Subqueries
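Subqueries are SELECT statements nested inside another statement. The sketch below shows four common placements (scalar, IN, correlated NOT EXISTS, and in FROM) on small temporary species and sample tables; the data are made up.

```sql
CREATE TEMP TABLE species (
    species_id   serial PRIMARY KEY,
    species_name character varying NOT NULL
);
CREATE TEMP TABLE sample (
    sample_id   serial PRIMARY KEY,
    sample_name character varying NOT NULL,
    species_id  integer REFERENCES species (species_id)
);
INSERT INTO species (species_name)
VALUES ('Solanum lycopersicum'), ('Solanum tuberosum'), ('Capsicum annuum');  -- ids 1, 2, 3
INSERT INTO sample (sample_name, species_id)
VALUES ('fruit_1', 1), ('leaf_1', 1), ('tuber_1', 2);

-- Scalar subquery: look up an id, then use it in the outer WHERE
SELECT sample_name FROM sample
WHERE species_id = (SELECT species_id FROM species
                    WHERE species_name = 'Solanum tuberosum');    -- tuber_1

-- IN: species that have at least one sample
SELECT species_name FROM species
WHERE species_id IN (SELECT species_id FROM sample);             -- the two Solanum species

-- Correlated NOT EXISTS: species with no samples
SELECT species_name FROM species AS sp
WHERE NOT EXISTS (SELECT 1 FROM sample AS s
                  WHERE s.species_id = sp.species_id);           -- Capsicum annuum

-- Subquery in FROM (needs an alias): average samples per species that has samples
SELECT avg(n) FROM (
    SELECT count(*) AS n FROM sample GROUP BY species_id
) AS per_species;                                                -- (2 + 1) / 2 = 1.5
```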
Views
A view is a virtual table based on the results of an SQL statement.
http://www.postgresql.org/docs/8.4/interactive/tutorial-views.html
CREATE VIEW myview AS
    SELECT city, temp_lo, temp_hi, prcp, date, location
    FROM weather, cities
    WHERE city = name;
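A view over the sample and species tables used earlier in these notes might look like this. It is a sketch: sample_species is a hypothetical view name, created as a temporary view over temporary tables with made-up rows.

```sql
CREATE TEMP TABLE species (
    species_id   serial PRIMARY KEY,
    species_name character varying NOT NULL
);
CREATE TEMP TABLE sample (
    sample_id   serial PRIMARY KEY,
    sample_name character varying NOT NULL,
    species_id  integer REFERENCES species (species_id)
);
INSERT INTO species (species_name) VALUES ('Solanum lycopersicum'), ('Capsicum annuum');  -- ids 1, 2
INSERT INTO sample (sample_name, species_id) VALUES ('fruit_1', 1), ('fruit_2', 2);

-- The view stores the query, not a copy of the rows
CREATE TEMP VIEW sample_species AS
    SELECT sample.sample_id, sample.sample_name, species.species_name
    FROM sample JOIN species USING (species_id);

SELECT * FROM sample_species WHERE species_name LIKE 'Solanum%';   -- fruit_1

-- Rows added to the base tables later show up in the view automatically
INSERT INTO sample (sample_name, species_id) VALUES ('leaf_1', 1);
SELECT count(*) FROM sample_species;                                 -- 3
```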
PostgreSQL: programming with a procedural language
Write your own PostgreSQL functions!
- PL/pgSQL (similar to Oracle's PL/SQL: http://en.wikipedia.org/wiki/pl_sql)
- Many other languages: PL/Perl (http://www.postgresql.org/docs/8.1/static/plperl.html), PL/Java, PL/php, PL/Python, PL/R and more...
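A small PL/pgSQL function, as a sketch. expression_level and its threshold of 10 are hypothetical. PL/pgSQL is installed by default from PostgreSQL 9.0 on; on 8.4 you may first need to run CREATE LANGUAGE plpgsql; in your database.

```sql
-- Hypothetical helper: label an expression value as missing, low or high
CREATE OR REPLACE FUNCTION expression_level(expr_value numeric)
RETURNS text AS $$
BEGIN
    IF expr_value IS NULL THEN
        RETURN 'missing';
    ELSIF expr_value >= 10 THEN      -- the threshold of 10 is an arbitrary example
        RETURN 'high';
    ELSE
        RETURN 'low';
    END IF;
END;
$$ LANGUAGE plpgsql;

SELECT expression_level(12.5), expression_level(3), expression_level(NULL);  -- high, low, missing
```

Once created, the function can be used in any query like a built-in row function, for example SELECT sequence_id, expression_level(expression_value) FROM some_table (some_table being whatever table holds your values).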