CSED421 Database Systems Lab Index
Index of Index
- What is an index?
- When to create an index or not?
- Index syntax
  - UNIQUE index / indexing prefixes / multiple-column index
  - Confirming indexes
- Index types: B-Tree / Hash
- MySQL & index usage
- EXPLAIN
- Practice
What Is an Index?
- An index can be created on a table to find data more quickly and efficiently.
- It improves the speed of data retrieval operations on a database.
- The cost is slower writes and increased storage space.
- Users cannot see the indexes; they are used only to speed up searches/queries.
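The read/write trade-off above can be sketched in Python, assuming a sorted list of (key, row_position) pairs stands in for the index. Real engines use B-Tree pages rather than a Python list, and every name below (rows, index, insert_row, full_scan, index_lookup) is hypothetical.

```python
from bisect import bisect_left, insort

rows = []   # hypothetical table: (key, value) rows in insertion order
index = []  # hypothetical index: (key, row_position) pairs kept sorted by key


def insert_row(key, value):
    """Write path: store the row, then pay the extra cost of updating the index."""
    rows.append((key, value))
    insort(index, (key, len(rows) - 1))


def full_scan(key):
    """Read path without an index: every row is examined."""
    return [row for row in rows if row[0] == key]


def index_lookup(key):
    """Read path with an index: binary search, then read only the matching entries."""
    i = bisect_left(index, (key,))  # (key,) sorts before every (key, pos) pair
    matches = []
    while i < len(index) and index[i][0] == key:
        matches.append(rows[index[i][1]])
        i += 1
    return matches


for k, v in [(5, "Cow"), (1, "Art"), (9, "Wax"), (3, "Bean"), (7, "Pen"), (3, "Bam")]:
    insert_row(k, v)

print(index_lookup(3))  # [(3, 'Bean'), (3, 'Bam')]
```

Both read paths return the same rows; full_scan touches all rows, while index_lookup touches about log2(n) index entries plus the matches. insert_row shows why every additional index makes writes slower.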
When to Create an Index or Not
Create an index when:
- A column contains a wide range of values, or many NULL values
- Columns are frequently used in a WHERE clause (e.g., PRIMARY KEY, FOREIGN KEY columns)
- Queries retrieve less than about 2~4% of the total rows
Don't create an index when:
- The table is small
- The columns are rarely used in queries
- Queries retrieve more than about 2~4% of the total rows
- The table is frequently updated (an index makes writes slower)
Consider the trade-offs: read time vs. write time and storage space.
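The 2~4% rule of thumb is simple arithmetic on the fraction of rows a query returns. The sketch below is hypothetical: the 4% cut-off is the slide's heuristic, not a MySQL setting. It uses the 1,000,209-row ratings table and the 400-row estimate that EXPLAIN reports for userid = 10 in the movielens examples, plus an assumed 500,000-row match for contrast.

```python
def selectivity(matching_rows, total_rows):
    """Fraction of the table that a query is expected to return."""
    return matching_rows / total_rows


def index_likely_helps(matching_rows, total_rows, threshold=0.04):
    """Rule of thumb: an index pays off when a query returns under ~2-4% of rows.
    threshold is an assumed cut-off, not a MySQL parameter."""
    return selectivity(matching_rows, total_rows) < threshold


TOTAL_RATINGS = 1_000_209  # rows in the movielens ratings table

print(f"{selectivity(400, TOTAL_RATINGS):.3%}")    # share of ratings by one user
print(index_likely_helps(400, TOTAL_RATINGS))      # True
print(index_likely_helps(500_000, TOTAL_RATINGS))  # False (hypothetical count)
```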
Index Basic Syntax
CREATE INDEX syntax:

    CREATE [UNIQUE | FULLTEXT | SPATIAL] INDEX index_name
        [index_type]
        ON tbl_name (index_col_name, ...)
        [index_type]

    index_col_name:
        col_name [(length)] [ASC | DESC]

    index_type:
        USING {BTREE | HASH}

DROP INDEX syntax:

    DROP INDEX index_name ON tbl_name

ADD PRIMARY KEY syntax:

    ALTER TABLE tbl_name ADD PRIMARY KEY [index_type] (index_col_name, ...) [index_type]

DROP PRIMARY KEY syntax:

    ALTER TABLE tbl_name DROP PRIMARY KEY

Creating a primary key on an NDB table automatically creates both an ordered index and a hash index.
UNIQUE Index

    CREATE INDEX index_name [index_type] ON tbl_name (index_col_name, ...) [index_type]

- Duplicate values are allowed.

    CREATE UNIQUE INDEX index_name [index_type] ON tbl_name (index_col_name, ...) [index_type]

- Duplicate values are not allowed: an error occurs if you try to add a new row with a key value that matches an existing row.
- It permits multiple NULL values for columns that can contain NULL (except for the BDB storage engine).
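A minimal Python model of the UNIQUE rules above, assuming a single nullable column and using None for NULL. It mimics only the documented behaviour (not BDB's); the class and its error message are hypothetical, not MySQL's.

```python
class UniqueIndex:
    """Sketch of UNIQUE index semantics for one nullable column: duplicate
    non-NULL keys are rejected, while any number of NULLs (None) is accepted."""

    def __init__(self):
        self._keys = set()

    def insert(self, key):
        if key is None:
            return  # NULL is never equal to another NULL, so it never conflicts
        if key in self._keys:
            raise ValueError(f"duplicate key {key!r}")
        self._keys.add(key)


idx = UniqueIndex()
for email in ["a@x.org", None, None, "b@x.org"]:
    idx.insert(email)  # the two NULLs are accepted

try:
    idx.insert("a@x.org")  # same non-NULL key as an existing row
except ValueError as err:
    print("rejected:", err)  # rejected: duplicate key 'a@x.org'
```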
Indexing Prefixes
Indexes can be created that use only the leading part of column values, using col_name(length) syntax to specify an index prefix length. Prefixes can be specified for CHAR, VARCHAR, BINARY, VARBINARY, BLOB, and TEXT columns.

    mysql> CREATE INDEX part_of_name
        -> ON customer (name(4));

[Figure: B-Tree index on name(4). The internal node holds the 4-character prefixes Bean, Dog, and Pend, with pointers to the B-Tree nodes; the leaves hold Art, Bam, Bean, Cow, Cut, Dog, Patrick, Pendant, Run.]
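The sketch below models why a prefix index still needs the row: the index narrows the search by the first 4 characters, and the full value is then compared. It is a hypothetical Python model in which a sorted list stands in for the B-Tree; the names come from the figure.

```python
from bisect import bisect_left

PREFIX_LEN = 4  # mirrors name(4) in the CREATE INDEX statement above

# Hypothetical customer names (taken from the figure), stored by row position.
names = ["Art", "Bam", "Bean", "Cow", "Cut", "Dog", "Patrick", "Pendant", "Run"]

# The prefix index keeps only the first 4 characters plus a row pointer.
prefix_index = sorted((name[:PREFIX_LEN], pos) for pos, name in enumerate(names))


def lookup(name):
    """Return row positions whose full name equals `name`."""
    prefix = name[:PREFIX_LEN]
    i = bisect_left(prefix_index, (prefix,))
    hits = []
    while i < len(prefix_index) and prefix_index[i][0] == prefix:
        pos = prefix_index[i][1]
        # Different names can share a prefix ("Pendant", "Pendulum"), so the
        # full value is re-checked against the row itself.
        if names[pos] == name:
            hits.append(pos)
        i += 1
    return hits


print(lookup("Patrick"))   # [6]
print(lookup("Pendulum"))  # [] : the "Pend" entry matches, the row does not
```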
Multiple-column Index

    mysql> CREATE INDEX idx_name ON tbl_name (col1, col2);
    mysql> SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;

Any leftmost prefix of the index can be used by the optimizer to find rows:

    mysql> CREATE INDEX idx_name ON tbl_name (col1, col2, col3);
    mysql> SELECT * FROM tbl_name WHERE col1=val1;                -- uses the index (col1)
    mysql> SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;  -- uses the index (col1, col2)
    mysql> SELECT * FROM tbl_name WHERE col2=val2;                -- cannot use the index
    mysql> SELECT * FROM tbl_name WHERE col2=val2 AND col3=val3;  -- cannot use the index
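The leftmost-prefix rule can be expressed as a small Python function. It is a simplified, hypothetical model: it considers only ANDed equality conditions and ignores range conditions and the skip-scan access method found in later MySQL versions.

```python
def usable_prefix(index_cols, where_cols):
    """Return the leading index columns usable by a WHERE clause made of
    ANDed equality tests on `where_cols` (the leftmost-prefix rule)."""
    used = []
    for col in index_cols:
        if col not in where_cols:
            break  # the first missing column ends the usable prefix
        used.append(col)
    return used


IDX = ("col1", "col2", "col3")
for where in [{"col1"}, {"col1", "col2"}, {"col2"}, {"col2", "col3"}]:
    print(sorted(where), "->", usable_prefix(IDX, where) or "index not usable")
```

A condition on col1 and col3 only would use just col1, since the missing col2 breaks the prefix.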
Confirming Indexes

    mysql> SHOW INDEX FROM employees;
Index Types
Some storage engines permit you to specify an index type when creating an index.

    Storage Engine   Permissible Index Types
    MyISAM           BTREE
    InnoDB           BTREE
    MEMORY/HEAP      HASH, BTREE
    NDB              HASH, BTREE

    mysql> CREATE INDEX id_index
        -> ON lookup (id) USING BTREE;
B-Tree Index Characteristics
- It can be used for column comparisons in expressions that use the =, >, >=, <, <=, or BETWEEN operators.
- It can also be used for LIKE comparisons if the argument to LIKE is a constant string that does not start with a wildcard character.

    mysql> SELECT * FROM tbl_name WHERE key_col LIKE 'Patrick%';   -- uses the index
    mysql> SELECT * FROM tbl_name WHERE key_col LIKE 'Pat%_ck%';   -- uses the index
    mysql> SELECT * FROM tbl_name WHERE key_col LIKE '%Patrick%';  -- does not use the index (leading wildcard)
    mysql> SELECT * FROM tbl_name WHERE key_col LIKE other_col;    -- does not use the index (not a constant)

[Figure: B-Tree index (unclustered example). Internal nodes (Bean, Dog, Pen) point to B-Tree nodes; the leaf nodes hold the sorted keys Art, Bam, Bean, Cow, Cut, Dog, Patrick, Pen, Run, Wax, each with a pointer to its record in the unordered data file.]
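The LIKE rule follows from key order: the constant text before the first wildcard bounds a contiguous range of keys. The hypothetical Python sketch below seeks to that range with binary search, then checks the rest of the pattern per key. It is case-sensitive (MySQL's default collations are not), ignores escape characters, and does not model LIKE other_col.

```python
import re
from bisect import bisect_left

# Sorted leaf-level keys from the figure; a sorted list stands in for the B-Tree.
keys = ["Art", "Bam", "Bean", "Cow", "Cut", "Dog", "Patrick", "Pen", "Run", "Wax"]


def like_to_regex(pattern):
    """Translate a LIKE pattern (% and _ only, no escape characters) to a regex."""
    out = []
    for ch in pattern:
        if ch == "%":
            out.append(".*")
        elif ch == "_":
            out.append(".")
        else:
            out.append(re.escape(ch))
    return re.compile("".join(out) + r"\Z", re.DOTALL)


def like_search(pattern):
    """Return (matching keys, whether a range scan on the index was possible)."""
    prefix = re.split(r"[%_]", pattern, maxsplit=1)[0]  # constant part before any wildcard
    regex = like_to_regex(pattern)
    if not prefix:
        # Leading wildcard: there is no key range to seek to, so every key is tested.
        return [k for k in keys if regex.match(k)], False
    i = bisect_left(keys, prefix)  # seek to the first key >= prefix
    hits = []
    while i < len(keys) and keys[i].startswith(prefix):
        if regex.match(keys[i]):  # the rest of the pattern is checked per key
            hits.append(keys[i])
        i += 1
    return hits, True


for p in ["Patrick%", "Pat%_ck%", "%Patrick%"]:
    print(p, like_search(p))
```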
Hash Index Characteristics
- It is used only for equality comparisons that use the = or <=> operators (but these are very fast).
- It cannot speed up ORDER BY operations.
- It cannot determine approximately how many rows there are between two values (the range optimizer uses this estimate to decide which index to use).
- Only whole keys can be used to search for a row (unlike a B-Tree, a leftmost prefix of the key cannot be used).

[Figure: Hash index (unclustered example). A search key (e.g., Wax, Art, Run, Patrick) is the input to the hash function H(k), whose output is a pointer to one of four buckets; each bucket entry (Bam, Bean, Dog, Cut, Pen, Cow, Wax, ...) points to its record in the data file.]
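A toy Python model of the hash index in the figure, assuming four buckets and Python's built-in hash() as H(k). String hashes vary between Python runs, so bucket placement does too, but lookups do not. It shows why only whole-key equality works: the exact key must be hashed to find its bucket. The class name is hypothetical.

```python
class HashIndex:
    """Bucketed hash index: supports equality lookups only."""

    def __init__(self, n_buckets=4):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # H(k): Python's built-in hash() stands in for the engine's hash function.
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, row_pos):
        self._bucket(key).append((key, row_pos))

    def lookup(self, key):
        # Equality search: hash once, then compare keys inside that one bucket.
        return [pos for k, pos in self._bucket(key) if k == key]


# Data file order taken from the figure; positions act as record pointers.
data_file = ["Art", "Cow", "Dog", "Wax", "Bam", "Patrick", "Bean", "Run", "Pen", "Cut"]
idx = HashIndex()
for pos, name in enumerate(data_file):
    idx.insert(name, pos)

print(idx.lookup("Patrick"))  # [5]
# A range such as 'B' <= key < 'D' has no single bucket to go to: hashing
# scatters neighbouring keys, so every bucket would have to be read.
```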
When MySQL Uses Indexes
- To find the rows matching a WHERE clause quickly.
  Index on tbl (key_col):
      mysql> SELECT * FROM tbl WHERE key_col = 100;
- To eliminate rows from consideration: if there is a choice between multiple indexes, MySQL normally uses the one that finds the smallest number of rows.
  Indexes on tbl (key_part1) and tbl (key_part2):
      mysql> SELECT * FROM tbl WHERE key_part1 = 100 AND key_part2 = 200;
- To retrieve rows from other tables when performing joins.
  Index on tbl1 (key_col):
      mysql> SELECT * FROM tbl1 JOIN tbl2 ON tbl1.key_col = tbl2.ref_col;
- To find the MIN() or MAX() value for a specific indexed column key_col.
  Index on tbl (key_col):
      mysql> SELECT MIN(key_col), MAX(key_col) FROM tbl;
- To sort or group a table.
  Index on tbl (key_col):
      mysql> SELECT * FROM tbl ORDER BY key_col;
Source: http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
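The last two uses rely on the index being sorted. In the hypothetical Python sketch below, a sorted list stands in for a B-Tree index on tbl (key_col): MIN()/MAX() become reads of the two ends of the index, and ORDER BY becomes an in-order walk with no separate sort step.

```python
from bisect import insort

# Hypothetical tbl rows (key_col, payload) in storage order.
tbl = [(42, "x"), (7, "y"), (100, "z"), (15, "w")]

# Sorted (key_col, row_position) entries model an index on tbl (key_col).
key_index = []
for pos, (key, _) in enumerate(tbl):
    insort(key_index, (key, pos))


def min_max():
    """MIN(key_col), MAX(key_col): read the first and last index entries only."""
    if not key_index:
        return None, None  # as in SQL, MIN/MAX over an empty table is NULL
    return key_index[0][0], key_index[-1][0]


def order_by_key():
    """ORDER BY key_col: walk the index in order; no separate sort step."""
    return [tbl[pos] for _, pos in key_index]


print(min_max())       # (7, 100)
print(order_by_key())  # [(7, 'y'), (15, 'w'), (42, 'x'), (100, 'z')]
```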
EXPLAIN Syntax

    EXPLAIN [EXTENDED] SELECT select_options

- Obtains information about how MySQL executes a SELECT statement.
- Shows information from the optimizer about the query execution plan.

To show the overall estimated cost of the last query:

    mysql> SHOW SESSION STATUS LIKE 'Last_query_cost';
movielens Database

    users (6,040 rows)
    id  gender  age  occupation  zipcode
    1   F       1    10          48067
    2   M       56   16          70072
    3   M       25   15          55117

    ratings (1,000,209 rows)
    userid  movieid  rating  ts (timestamp)
    1       1        5       978824268
    2       21       1       978299839
    3       104      4       978298486

    movies (3,883 rows)
    id   title              genres
    1    Toy Story          Animation|Children's|Comedy
    21   Waiting to Exhale  Comedy|Drama
    104  Get Shorty         Action|Comedy|Drama
Table Access Full

    mysql> EXPLAIN SELECT * FROM users;
    +----+-------------+-------+------+---------------+------+---------+------+------+-------+
    | id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra |
    +----+-------------+-------+------+---------------+------+---------+------+------+-------+
    |  1 | SIMPLE      | users | ALL  | NULL          | NULL | NULL    | NULL | 5891 |       |
    +----+-------------+-------+------+---------------+------+---------+------+------+-------+
    1 row in set (0.00 sec)

    mysql> SHOW SESSION STATUS LIKE 'Last_query_cost';
    +-----------------+-------------+
    | Variable_name   | Value       |
    +-----------------+-------------+
    | Last_query_cost | 1196.199000 |
    +-----------------+-------------+
    1 row in set (0.02 sec)

Source: http://dev.mysql.com/doc/refman/5.0/en/explain-output.html
Index Full Scan

    mysql> EXPLAIN SELECT * FROM users ORDER BY id;
    +-------------+-------+-------+---------------+---------+---------+------+------+-------+
    | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra |
    +-------------+-------+-------+---------------+---------+---------+------+------+-------+
    | SIMPLE      | users | index | NULL          | PRIMARY | 4       | NULL | 5891 |       |
    +-------------+-------+-------+---------------+---------+---------+------+------+-------+
    1 row in set (0.00 sec)

(The id column of the EXPLAIN output is omitted.)

    mysql> SHOW SESSION STATUS LIKE 'Last_query_cost';
    +-----------------+-------------+
    | Variable_name   | Value       |
    +-----------------+-------------+
    | Last_query_cost | 7087.199000 |
    +-----------------+-------------+
    1 row in set (0.03 sec)
Index Unique Scan

    mysql> EXPLAIN SELECT * FROM users WHERE id = 300;
    +-------------+-------+-------+---------------+---------+---------+-------+------+-------+
    | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | Extra |
    +-------------+-------+-------+---------------+---------+---------+-------+------+-------+
    | SIMPLE      | users | const | PRIMARY       | PRIMARY | 4       | const |    1 |       |
    +-------------+-------+-------+---------------+---------+---------+-------+------+-------+
    1 row in set (0.00 sec)

(The id column of the EXPLAIN output is omitted.)

    mysql> SHOW SESSION STATUS LIKE 'Last_query_cost';
    +-----------------+----------+
    | Variable_name   | Value    |
    +-----------------+----------+
    | Last_query_cost | 0.000000 |
    +-----------------+----------+
    1 row in set (0.03 sec)
Index Range Scan

    mysql> EXPLAIN SELECT * FROM users WHERE id BETWEEN 100 AND 1000;
    +-------------+-------+-------+---------------+---------+---------+------+------+-------------+
    | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
    +-------------+-------+-------+---------------+---------+---------+------+------+-------------+
    | SIMPLE      | users | range | PRIMARY       | PRIMARY | 4       | NULL | 1728 | Using where |
    +-------------+-------+-------+---------------+---------+---------+------+------+-------------+
    1 row in set (0.00 sec)

(The id column of the EXPLAIN output is omitted.)

    mysql> SHOW SESSION STATUS LIKE 'Last_query_cost';
    +-----------------+------------+
    | Variable_name   | Value      |
    +-----------------+------------+
    | Last_query_cost | 694.051654 |
    +-----------------+------------+
    1 row in set (0.03 sec)
Index Fast Full Scan

    mysql> EXPLAIN SELECT id FROM users;
    +-------------+-------+-------+---------------+---------+---------+------+------+-------------+
    | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
    +-------------+-------+-------+---------------+---------+---------+------+------+-------------+
    | SIMPLE      | users | index | NULL          | PRIMARY | 4       | NULL | 5891 | Using index |
    +-------------+-------+-------+---------------+---------+---------+------+------+-------------+
    1 row in set (0.00 sec)

(The id column of the EXPLAIN output is omitted.)

    mysql> SHOW SESSION STATUS LIKE 'Last_query_cost';
    +-----------------+-------------+
    | Variable_name   | Value       |
    +-----------------+-------------+
    | Last_query_cost | 1196.199000 |
    +-----------------+-------------+
    1 row in set (0.02 sec)
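The Using index entry means the query can be answered from the index alone, without reading the rows. A minimal Python check of that condition, under the simplifying assumption that an index stores only its own columns (it ignores InnoDB details such as the clustered primary key); the function name is hypothetical.

```python
def is_covering(index_cols, needed_cols):
    """True when every column the query reads is stored in the index entries,
    which is the situation EXPLAIN reports as "Using index"."""
    return set(needed_cols) <= set(index_cols)


USERS_PK = ("id",)
USERS_COLS = ("id", "gender", "age", "occupation", "zipcode")
RATINGS_PK = ("userid", "movieid")

print(is_covering(USERS_PK, ["id"]))                   # True:  SELECT id FROM users
print(is_covering(USERS_PK, USERS_COLS))               # False: SELECT * FROM users
print(is_covering(RATINGS_PK, ["movieid", "userid"]))  # True:  SELECT movieid ... WHERE userid = 10
```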
Multi-column Index

    mysql> DESC ratings;
    +---------+---------+------+-----+---------+-------+
    | Field   | Type    | Null | Key | Default | Extra |
    +---------+---------+------+-----+---------+-------+
    | userid  | int(11) | NO   | PRI | 0       |       |
    | movieid | int(11) | NO   | PRI | 0       |       |
    | rating  | int(11) | YES  |     | NULL    |       |
    | ts      | int(11) | YES  |     | NULL    |       |
    +---------+---------+------+-----+---------+-------+
    4 rows in set (0.00 sec)

    mysql> EXPLAIN SELECT * FROM ratings WHERE userid = 10 AND movieid = 2000;
    +-------------+---------+-------+---------------+---------+---------+-------------+------+-------+
    | select_type | table   | type  | possible_keys | key     | key_len | ref         | rows | Extra |
    +-------------+---------+-------+---------------+---------+---------+-------------+------+-------+
    | SIMPLE      | ratings | const | PRIMARY       | PRIMARY | 8       | const,const |    1 |       |
    +-------------+---------+-------+---------------+---------+---------+-------------+------+-------+
    1 row in set (0.00 sec)

(The id column of the EXPLAIN output is omitted.)

    mysql> SHOW SESSION STATUS LIKE 'Last_query_cost';
    +-----------------+----------+
    | Variable_name   | Value    |
    +-----------------+----------+
    | Last_query_cost | 0.000000 |
    +-----------------+----------+
    1 row in set (0.02 sec)

Both columns of the composite PRIMARY KEY (userid, movieid) are matched by equality, so the whole key is used (key_len = 8, two 4-byte INT columns) and the access type is const.
Multi-column Index

    mysql> EXPLAIN SELECT * FROM ratings WHERE userid = 10;
    +-------------+---------+------+---------------+---------+---------+-------+------+-------+
    | select_type | table   | type | possible_keys | key     | key_len | ref   | rows | Extra |
    +-------------+---------+------+---------------+---------+---------+-------+------+-------+
    | SIMPLE      | ratings | ref  | PRIMARY       | PRIMARY | 4       | const |  400 |       |
    +-------------+---------+------+---------------+---------+---------+-------+------+-------+
    1 row in set (0.00 sec)

(The id column of the EXPLAIN output is omitted.)

    mysql> SHOW SESSION STATUS LIKE 'Last_query_cost';
    +-----------------+------------+
    | Variable_name   | Value      |
    +-----------------+------------+
    | Last_query_cost | 479.999000 |
    +-----------------+------------+
    1 row in set (0.02 sec)

Only the leftmost column, userid, is used (key_len = 4), so the access type is ref and about 400 rows are expected.
Multi-column Index

    mysql> EXPLAIN SELECT * FROM ratings WHERE movieid = 2000;
    +-------------+---------+------+---------------+------+---------+------+--------+-------------+
    | select_type | table   | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
    +-------------+---------+------+---------------+------+---------+------+--------+-------------+
    | SIMPLE      | ratings | ALL  | NULL          | NULL | NULL    | NULL | 884275 | Using where |
    +-------------+---------+------+---------------+------+---------+------+--------+-------------+
    1 row in set (0.00 sec)

(The id column of the EXPLAIN output is omitted.)

    mysql> SHOW SESSION STATUS LIKE 'Last_query_cost';
    +-----------------+---------------+
    | Variable_name   | Value         |
    +-----------------+---------------+
    | Last_query_cost | 182718.999000 |
    +-----------------+---------------+
    1 row in set (0.03 sec)

movieid is not a leftmost prefix of PRIMARY (userid, movieid), so the index cannot be used and MySQL scans the whole table (type = ALL).
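The three ratings plans follow from how a composite key is ordered. In this hypothetical Python sketch, a handful of made-up (userid, movieid) pairs are sorted the way PRIMARY KEY (userid, movieid) orders them: fixing userid yields one contiguous run found by binary search, while movieid alone forces a look at every entry, which corresponds to the type ALL plan above.

```python
from bisect import bisect_left

# Hypothetical (userid, movieid) entries, sorted the way the composite
# PRIMARY KEY (userid, movieid) orders them.
pk = sorted([(1, 1), (1, 48), (2, 21), (10, 5), (10, 2000), (10, 3100), (25, 2000)])


def by_userid(userid):
    """userid is the leftmost column: matches form one contiguous run."""
    i = bisect_left(pk, (userid,))  # binary search to the start of the run
    hits, scanned = [], 0
    while i < len(pk):
        scanned += 1
        if pk[i][0] != userid:
            break  # first entry past the run: stop
        hits.append(pk[i])
        i += 1
    return hits, scanned


def by_movieid(movieid):
    """movieid alone: matches are scattered, so every entry is examined."""
    return [e for e in pk if e[1] == movieid], len(pk)


print(by_userid(10))     # ([(10, 5), (10, 2000), (10, 3100)], 4)
print(by_movieid(2000))  # ([(10, 2000), (25, 2000)], 7)
```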
Practice
- The movielens database has already been created on brynn.postech.ac.kr.
- The README file contains a description of the movielens database.
- PRIMARY KEY and FOREIGN KEY constraints have already been declared.
Practice - Problem
Perform the steps below for each of the following queries.
1. Find all movies that have been rated by users under 18 (age code 1 in MovieLens).
       SELECT DISTINCT M.title FROM Users U, Ratings R, Movies M WHERE U.id = R.userid AND U.age = 1 AND M.id = R.movieid;
2. Find all movies that received a low rating (2 or lower) at least once from a male programmer (occupation code 12).
       SELECT DISTINCT M.title FROM Users U, Ratings R, Movies M WHERE U.id = R.userid AND U.gender = 'M' AND U.occupation = 12 AND R.rating <= 2 AND R.movieid = M.id;
3. Among movies with at least 1,000 ratings, find the top 5 with the highest average rating.
       SELECT M.title, AVG(R.rating) avg_rating FROM Ratings R, Movies M WHERE R.movieid = M.id GROUP BY M.id HAVING COUNT(*) >= 1000 ORDER BY avg_rating DESC LIMIT 5;
Steps:
1. Create indexes that optimize each query.
2. Using the command below, compare how each query's performance improved with the indexes:
       SHOW SESSION STATUS LIKE 'Last_query_cost';
Practice - Submission
Download the Lab7 skeleton, complete it, and submit it.
1. last_query_cost: enter the value obtained for Last_query_cost (format: copy the full number).
2. query_time: enter the query execution time (format: 0.11s).
3. Modify the CREATE INDEX section to add the indexes you created.
4. In the last_query_cost and query_time entries before CREATE INDEX, record the cost before adding the indexes; in the entries after it, record the cost after adding the indexes.
Performance is also graded, so think carefully about which indexes to add.
Note: SHOW SESSION STATUS LIKE 'Last_query_cost'; is not an exact measure, and it cannot analyze complex queries.
See More
- W3Schools: http://www.w3schools.com/sql/sql_create_index.asp
- MySQL Reference Manual:
  - http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
  - http://dev.mysql.com/doc/refman/5.0/en/create-index.html
  - http://dev.mysql.com/doc/refman/5.0/en/delete-index.html
  - http://dev.mysql.com/doc/refman/5.0/en/alter-table.html