Teradata. This was compiled in order to describe Teradata and provide a brief overview of common capabilities and queries.

What is it?

Teradata is a relational database system for working quickly and efficiently with extremely large data sets. It can store billions of rows and petabytes (1 petabyte = 1000 terabytes) of data. Its architecture provides the flexibility to access and process that data quickly, and it is in both architecture and processing speed that Teradata differs from conventional database systems. Big corporations such as global insurance companies use Teradata to store customer and client information because they have a lot of it to process. Demand for the system is high because it scales easily and is fault tolerant.

What are the main components of Teradata System?

Teradata has 3 main components:

1. PE (Parsing Engine): Acts as a gatekeeper to the Teradata system. It manages all sessions, checks the SQL statements for errors, manages the access rights of the user, has the optimizer choose the least expensive plan for the query, and sends the request to the AMPs via the Bynet.
2. Message Passing Layer (Bynet): Carries messages between the AMPs and PEs, provides point-to-point and broadcast communications, merges answer sets back to the PE, and makes Teradata parallelism possible.
3. AMP (Access Module Processor): The heart of Teradata, which does most of the work of data storage and retrieval. It finds the rows requested, manages locks on tables and rows, sorts rows, aggregates columns, processes joins, and so on.

What are Primary Index and Primary Key in Teradata?

Unlike other database systems, Teradata distributes the data based on the PI (Primary Index). The PI is defined at the time of table creation; if it is not specified explicitly, the database takes the first column as the PI by default. Since data distribution is based on the PI, it is wise to choose a PI that distributes the data evenly among the AMPs.

For example, suppose Table A has the two columns below and the system has 5 AMPs.

ID   Gender
1    Male
2    Male
3    Male
4    Male
5    Female

If we choose ID as the PI, the values are all distinct, so the 5 rows are distributed evenly across the 5 AMPs. But if GENDER is chosen as the PI, there are only 2 distinct values, so the data will be stored on at most 2 AMPs, leaving at least 3 AMPs empty and idle.

Note: Rows with the same PI value are always stored on the same AMP.

A Primary Key is a concept that uniquely identifies a particular row of a table.

What are the types of PI (Primary Index) in Teradata?

There are two types of Primary Index: Unique Primary Index (UPI) and Non-Unique Primary Index (NUPI). By default, a NUPI is created when the table is created; the UNIQUE keyword has to be given explicitly to create a UPI. A UPI can sometimes slow performance, because the uniqueness of the column value has to be checked for each and every row, which is additional overhead for the system, but the distribution of data will be even. Care should be taken when choosing a NUPI so that the distribution of data is almost even. The UPI/NUPI decision should be based on the data and its usage.

How to Choose Primary Index (PI) in Teradata?

Choosing a Primary Index is based on the data distribution and the join frequency of the column. If a column is used for joining most of the tables, it is a good PI candidate, provided it also distributes the data well.

For example, we have an Employee table with EMPID and DEPTID, and this table needs to be joined to the Department table on DEPTID. It is still not a wise decision to choose DEPTID as the PI of the Employee table, because the Employee table will have thousands of employees whereas the number of departments in a company will be less than 100. Choosing EMPID will give better performance in terms of distribution.

How the data is distributed among AMPs based on PI in Teradata?

Assume a row is to be inserted into a Teradata table.

- The Primary Index value for the row is put into the hash algorithm.
- The output is a 32-bit Row Hash.
- The Row Hash points to a bucket in the Hash Map. The first 16 bits of the Row Hash are used to locate the bucket.
- The bucket points to a specific AMP.
- The row, along with the Row Hash, is delivered to that AMP.

When the AMP receives a row, it places the row into the proper table and checks whether it already holds any other rows in the table with the same Row Hash. If this is the first row with this particular Row Hash, the AMP assigns a 32-bit uniqueness value of 1. If this is the second row with that Row Hash, the AMP assigns a uniqueness value of 2. The 32-bit Row Hash and the 32-bit uniqueness value make up the 64-bit Row ID. Tables are sorted on an AMP by Row ID. The uniqueness value is useful with NUPIs, to distinguish the rows that share a NUPI value. Access by a UPI or NUPI value is always a one-AMP operation, because rows with the same PI value are stored on the same AMP.

How Teradata retrieves a row?

For example, a user runs a query looking for information on Employee ID 100. The PE sees that the Primary Index column EMP is used in the SQL WHERE clause. Because this is a Primary Index access operation, the PE knows this is a one-AMP operation. The PE hashes 100, and the Row Hash points to a bucket in the Hash Map that represents AMP X. AMP X is sent a message to get the row with that Row Hash and make sure it's EMP 100.

What are Secondary Indexes (SI), types of SI and disadvantages of Secondary Indexes in Teradata?

Secondary Indexes provide another path to access data. Teradata allows up to 32 secondary indexes per table. Keep in mind that defining a secondary index does not redistribute the rows of the base table. Secondary indexes reside in subtables that are stored on all AMPs, which is very different from how primary indexes (part of the base table) are stored. Keep in mind that Secondary Indexes (when defined) do take up additional space.
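As a minimal sketch of how secondary indexes are defined and dropped, assuming a hypothetical EMPLOYEE table with SSN and DEPTID columns (these names are illustrative and do not come from the original document):

```sql
-- Hypothetical table and column names, for illustration only.

-- Unique Secondary Index (USI) on a column whose values must stay unique.
CREATE UNIQUE INDEX (ssn) ON employee;

-- Non-Unique Secondary Index (NUSI) on a column that is often filtered on.
CREATE INDEX (deptid) ON employee;

-- A query that can use the USI as an alternate access path.
SELECT * FROM employee WHERE ssn = '123456789';

-- A secondary index can be dropped again without touching the base table rows.
DROP INDEX (deptid) ON employee;
```

Because each index is kept in its own subtable, dropping one that is no longer used gives its space back.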

Secondary Indexes are frequently used in a WHERE clause. A Secondary Index can be changed or dropped at any time. However, because of the overhead of index maintenance, it is recommended that index values not be changed frequently.

There are two types of Secondary Index: Unique Secondary Index (USI) and Non-Unique Secondary Index (NUSI).

Unique Secondary Indexes are extremely efficient. A USI access is considered a two-AMP operation. One AMP is used to access the USI subtable row (in the Secondary Index subtable), which references the actual data row, which resides on the second AMP.

A Non-Unique Secondary Index access is an all-AMP operation and will usually require a spool file. Although a NUSI access is an all-AMP operation, it is faster than a full table scan.

Secondary indexes can be useful for:

- Satisfying complex conditions
- Processing aggregates
- Value comparisons
- Matching character combinations
- Joining tables

How are the data distributed in Secondary Index Subtables in Teradata?

When a user creates a Secondary Index, Teradata automatically creates a Secondary Index Subtable. The subtable contains the:

- Secondary Index Value
- Secondary Index Row ID
- Primary Index Row ID

When a user writes an SQL query that has a USI in the WHERE clause, the Parsing Engine hashes the Secondary Index value. The output is the Row Hash, which points to a bucket in the Hash Map. That bucket contains an AMP number, so the Parsing Engine knows which AMP holds the Secondary Index Subtable row for the requested USI value. The PE directs that AMP to look up the Row Hash in the subtable. The AMP checks whether the Row Hash exists in the subtable and double-checks the subtable row against the actual secondary index value. Then the AMP passes the Primary Index Row ID back up the BYNET. This request is directed to the AMP holding the base table row, which is then easily retrieved.

What are the types of JOINs available in Teradata?

Types of JOINs are: Inner Join, Outer Join (Left, Right, and Full), Self Join, Cross Join and Cartesian Joins.

The key things to know about Teradata and joins:

- Each AMP holds a portion of a table.
- Teradata uses the Primary Index to distribute the rows among the AMPs.
- Each AMP keeps its tables separated from other tables, like someone might keep clothes in a dresser drawer.
- Each AMP sorts its tables by Row ID.
- For a JOIN to take place, the two rows being joined must find a way to get to the same AMP.
- If the rows to be joined are not on the same AMP, Teradata will either redistribute the data or duplicate the data in spool to make that happen.

What are the types of Join Strategies available in Teradata?

Join Strategies are used by the optimizer to choose the best plan to join tables based on the given join condition:

- Merge (Exclusion)
- Nested
- Row Hash
- Product (including Cartesian Product joins)

There are different types of merge join strategies available. But in general, when two tables are joined, the data will be redistributed or duplicated across all AMPs to make sure the joining rows are on the same AMP.

If the two tables are joined on their PIs, no redistribution or duplication happens, because the rows are already on the same AMP, and performance is better. If the PI of one table is used and the PI of the other table is not, one of the tables is redistributed or duplicated, depending on table size. In these cases Secondary Indexes will be helpful.

Explain types of re-distribution of data happening for joining of columns from two tables in Teradata?

Case 1 - PI = PI joins
Case 2 - PI = non-index joins
Case 3 - non-index = non-index joins

Case 1 - There is no redistribution of data across AMPs. The joins are AMP-local, because the data is already present on the same AMP and need not be redistributed. These types of joins on a unique primary index are very fast.

Case 2 - Data from the second table is redistributed across all AMPs, since the join is on a PI column versus a non-index column. The ideal scenario is when the small table is redistributed to be joined with the large table's rows on the same AMP.

Case 3 - Data from both tables is redistributed across all AMPs. These are among the longest-running queries. Care should be taken to see that stats are collected on these columns.

What is Partitioned Primary Index (PPI) in Teradata?

A Partitioned Primary Index physically splits the table into a series of subtables, one for every partitioning value. When a single row is accessed, Teradata looks first at the partitioning value to determine the subtable, then at the primary index to calculate the row hash for the row(s).

For example, if we have a PPI on a MONTH column, the rows for a particular month are all stored together in the same partition, so whenever data is accessed for a particular month it is retrieved faster. This helps to avoid full table scans.

What are the advantages and disadvantages of PPI in Teradata?

Advantages:

- Range queries don't have to use a Full Table Scan.
- Deletions of entire partitions are lightning fast.
- PPI provides an excellent alternative to Secondary Indexes.
- Tables that hold yearly information don't have to be split into 12 smaller tables to avoid Full Table Scans (FTS). This can make modeling and querying easier.
- FastLoad and MultiLoad work with PPI tables, but not with all Secondary Indexes.

Disadvantages:

- A two-byte partition number is added to the Row ID, which is then called a Row Key. The two bytes per row add more Perm Space to a table.
- Joins to non-partitioned tables can take longer and become more complicated for Teradata to perform.
- Basic select queries using the Primary Index can take longer if the partitioning column is not also mentioned in the WHERE clause of the query.
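To make the PPI discussion above concrete, here is a minimal sketch of a table partitioned by month. The table SALES_HISTORY, its columns, and the date range are hypothetical names chosen for illustration, not taken from the original document.

```sql
-- Hypothetical table: one partition per month of 2023.
CREATE TABLE sales_history (
    sale_id   INTEGER NOT NULL,
    sale_date DATE NOT NULL,
    amount    DECIMAL(12,2)
)
PRIMARY INDEX (sale_id)
PARTITION BY RANGE_N (
    sale_date BETWEEN DATE '2023-01-01' AND DATE '2023-12-31'
              EACH INTERVAL '1' MONTH
);

-- Range query on the partitioning column: only the March partition
-- needs to be read, not the whole table.
SELECT SUM(amount)
FROM sales_history
WHERE sale_date BETWEEN DATE '2023-03-01' AND DATE '2023-03-31';

-- Primary Index access that also names the partitioning column, so the
-- AMP can go straight to one partition instead of probing all of them.
SELECT *
FROM sales_history
WHERE sale_id = 42
  AND sale_date = DATE '2023-03-15';
```

The primary index here is non-unique because the partitioning column SALE_DATE is not part of it.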

A further disadvantage of PPI: you can't have a Unique Primary Index (UPI) unless the partitioning column is part of the Primary Index. Otherwise you must create a Unique Secondary Index to maintain uniqueness.

Volatile and Global Temporary Tables in Teradata?

Volatile tables are temporary tables that are materialized in spool and are unknown to the Data Dictionary. A volatile table may be used multiple times and in more than one SQL statement throughout the life of a session. This allows additional queries to use the same rows in the temporary table without requiring the rows to be rebuilt. Volatile tables are local to the session, and they are dropped once the session is disconnected.

The ON COMMIT PRESERVE ROWS option should be specified at the time of table creation. It means that at the end of a transaction the rows in the volatile table will not be deleted. The information in the table remains for the entire session. Users can query the volatile table until they log off. Then the table and its data go away.

Global Temporary Tables are similar to volatile tables in that they are local to a user's session. However, when the table is created, the definition is stored in the Data Dictionary. In addition, these tables are materialized in a permanent area known as Temporary Space. For these reasons, global temporary tables can survive a system restart, and the table definition is not discarded at the end of the session. However, when a system restarts, the rows inside the Global Temporary Table will be removed. Lastly, global temporary tables require no spool space. They use Temp Space.

As of the TD13 version, statistics can be collected on both kinds of table. Previously, collecting stats on volatile tables was not allowed.

Sub Query and Correlated Sub query in Teradata?

Sub queries and correlated sub queries are two important concepts in Teradata and are used most of the time.

The basic concept behind a sub query is that it retrieves a list of values that are used for comparison against one or more columns in the main query. The sub query is executed first, and the main query is then executed based on its result set. For example:

SELECT empname, deptname FROM employee WHERE empid IN (SELECT empid FROM salarytable WHERE salary > 10000);

In the above query, the empid values are chosen first, based on the salary condition in the sub query, and the main query is executed against that result subset.

A correlated sub query is an excellent technique to use when there is a need to determine which rows to SELECT based on one or more values from another table. It combines sub query processing and join processing into a single request. It first reads a row from the main query and then goes into the sub query to find the rows that match the specified column value. Then it goes on to the next row from the main query. This process continues until all the qualifying rows from the main query have been processed. For example:

SELECT empname, deptno, salary FROM employeetable AS emp WHERE salary = (SELECT MAX(salary) FROM employeetable AS emt WHERE emt.deptno = emp.deptno);

The above query returns the highest paid employee from each department. This is also one of the scenario-based questions in Teradata.
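For comparison, the "highest paid employee per department" request can also be sketched with Teradata's QUALIFY clause, a Teradata extension to ANSI SQL that filters on the result of a window function. This is an alternative formulation, not the one used above; it reuses the EMPLOYEETABLE columns from the example and assumes DEPTNO is never NULL.

```sql
-- Rank employees within each department by salary, highest first,
-- and keep only the top-ranked rows.
-- Ties are kept: every employee who shares the top salary in a
-- department is returned, as with the correlated sub query.
SELECT empname, deptno, salary
FROM employeetable
QUALIFY RANK() OVER (PARTITION BY deptno ORDER BY salary DESC) = 1;
```

Running EXPLAIN on both forms is a reasonable way to see which plan the optimizer prefers for a given table.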

How to calculate the table size, database size and free space left in a database in Teradata?

DBC.TABLESIZE and DBC.DISKSPACE are the system views used to find the space occupied.

The query below gives the size (in GB) of a table in the database. It is useful for finding the big tables when space needs to be recovered.

SELECT DATABASENAME, TABLENAME, SUM(CURRENTPERM/(1024*1024*1024)) AS "TABLE SIZE"
FROM DBC.TABLESIZE
WHERE DATABASENAME = '<DATABASE_NAME>' AND TABLENAME = '<TABLE_NAME>'
GROUP BY 1,2;

The query below gives the total space and free space available in a database (in GB).

SELECT DATABASENAME DATABASE_NAME,
SUM(MAXPERM)/(1024*1024*1024) TOTAL_PERM_SPACE,
SUM(CURRENTPERM)/(1024*1024*1024) CURRENT_PERM_SPACE,
TOTAL_PERM_SPACE-CURRENT_PERM_SPACE AS FREE_SPACE
FROM DBC.DISKSPACE
WHERE DATABASENAME = '<DATABASE_NAME>'
GROUP BY 1;

What are the Performance improvement techniques available in Teradata?

- First of all, use the EXPLAIN plan to see how the query is performing. Keywords such as product joins and low confidence are indicators of poor performance.
- Make sure STATS are collected on the columns used in the WHERE clause and on the JOIN columns. If STATS are collected, the EXPLAIN plan will show HIGH CONFIDENCE. Statistics tell the optimizer the number of rows in the table, which helps it choose between redistribution and duplication of the smaller tables.
- Check whether the joining columns and the WHERE clause use a PI, SI or PPI.
- Check whether proper alias names are used in the joining conditions.
- Split the queries into smaller subsets in case of poor performance.

What does Pseudo Table Locks mean in EXPLAIN Plan in Teradata?

It is a false lock applied on the table to prevent two users from getting conflicting locks with all-AMP requests. The PE designates a particular AMP to manage all-AMP lock requests for a given table and places the pseudo lock on the table.

Can you compress a column which is already present in table using ALTER in Teradata?

No, we cannot use the ALTER command to compress existing columns in a table. A new table structure has to be created that includes the compression values, and the data should then be inserted into the new table with the compressed column.
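A minimal sketch of that rebuild approach, assuming a hypothetical EMPLOYEE table with EMPID, EMPNAME and GENDER columns (the names and the compressed values are illustrative only):

```sql
-- New table with the same columns, plus a compression list on GENDER.
CREATE TABLE employee_new (
    empid   INTEGER NOT NULL,
    empname VARCHAR(50),
    gender  CHAR(1) COMPRESS ('M', 'F')
)
PRIMARY INDEX (empid);

-- Copy the existing rows into the compressed structure.
INSERT INTO employee_new (empid, empname, gender)
SELECT empid, empname, gender
FROM employee;

-- Once the copy has been verified, swap the tables.
RENAME TABLE employee TO employee_old;
RENAME TABLE employee_new TO employee;
```

Comparing the CURRENTPERM of the old and new tables in DBC.TABLESIZE shows how much space the compression actually saved.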

Please note - ALTER can be used only to add new columns with compression values to a table.

How to create a table with an existing structure of another table with or without data and also with stats defined in Teradata?

CREATE TABLE new_table AS old_table WITH DATA;
CREATE TABLE new_table AS old_table WITH NO DATA;
CREATE TABLE new_table AS old_table WITH DATA AND STATS;

How to find the duplicate rows in the table in Teradata?

Group by the fields in question and add a HAVING condition that the count is greater than 1. For example:

SELECT name, COUNT(*) FROM EMPLOYEE GROUP BY name HAVING COUNT(*) > 1;

DISTINCT is also useful. If the count of DISTINCT values and COUNT(*) return the same number, there are no duplicates.

Which is more efficient GROUP BY or DISTINCT to find duplicates in Teradata?

With many duplicates GROUP BY is more efficient, while with fewer duplicates DISTINCT is more efficient.

What is the difference between TIMESTAMP (0) and TIMESTAMP (6) in Teradata?

Both hold date and time values. The major difference is that TIMESTAMP (6) also holds microseconds.

What is spool space and when running a job if it reached the maximum spool space how you solve the problem in Teradata?

Spool space is the space a query needs for processing or for holding the rows of the answer set. Spool space reaches its maximum when the query is not properly optimized. Use appropriate conditions in the WHERE clause and JOIN on the correct columns to optimize the query. Also make sure unnecessary volatile tables are dropped, as they occupy spool space.

Why does varchar occupy 2 extra bytes?

The two extra bytes hold the binary length of the field, that is, the exact number of characters stored in the VARCHAR.

What is the difference between User and database in Teradata?

- A user is a database with a password, but a database cannot have a password
- Both can contain tables, views and macros
- Both users and databases may or may not hold privileges
- Only users can log in, establish a session with the Teradata database and submit requests

What are the types of HASH functions used in Teradata?

The hash functions are HASHROW, HASHBUCKET, HASHAMP and HASHBAKAMP. Their SQL forms are:

- HASHROW (column(s))
- HASHBUCKET (hashrow)
- HASHAMP (hashbucket)
- HASHBAKAMP (hashbucket)

To find the data distribution of a table based on its PI, the query below is helpful. It gives the number of records on each AMP for that particular table.

SELECT HASHAMP(HASHBUCKET(HASHROW(PI_COLUMN))), COUNT(*) FROM TABLENAME GROUP BY 1;
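The same hash functions can be used to compare candidate PI columns before a table is built. The sketch below assumes a hypothetical EMPLOYEE table with EMPID and DEPTID columns, as in the Employee/Department example earlier; it hashes each candidate column and counts how many rows would land on each AMP.

```sql
-- Number of AMPs in the system: HASHAMP() with no argument returns
-- the highest AMP number, so add 1.
SELECT HASHAMP() + 1 AS amp_count;

-- Rows per AMP if EMPID were the PI (many distinct values, so the
-- counts should come out close to equal).
SELECT HASHAMP(HASHBUCKET(HASHROW(empid))) AS amp_no,
       COUNT(*) AS row_count
FROM employee
GROUP BY 1
ORDER BY 2 DESC;

-- Rows per AMP if DEPTID were the PI (few distinct values, so some
-- AMPs are expected to hold far more rows than others).
SELECT HASHAMP(HASHBUCKET(HASHROW(deptid))) AS amp_no,
       COUNT(*) AS row_count
FROM employee
GROUP BY 1
ORDER BY 2 DESC;
```

If the second result returns fewer rows than amp_count, some AMPs would hold no rows at all for that choice of PI.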


More information

REPORT ON SQL TUNING USING INDEXING

REPORT ON SQL TUNING USING INDEXING REPORT ON SQL TUNING USING INDEXING SUBMITTED BY SRUNOKSHI KANIYUR PREMA NEELAKANTAN CIS -798 INDEPENDENT STUDY COURSE PROFESSOR Dr.TORBEN AMTOFT Kansas State University Page 1 of 38 TABLE OF CONTENTS

More information

Database Architectures

Database Architectures Database Architectures CPS352: Database Systems Simon Miner Gordon College Last Revised: 4/15/15 Agenda Check-in Parallelism and Distributed Databases Technology Research Project Introduction to NoSQL

More information

Data about data is database Select correct option: True False Partially True None of the Above

Data about data is database Select correct option: True False Partially True None of the Above Within a table, each primary key value. is a minimal super key is always the first field in each table must be numeric must be unique Foreign Key is A field in a table that matches a key field in another

More information

Distributed KIDS Labs 1

Distributed KIDS Labs 1 Distributed Databases @ KIDS Labs 1 Distributed Database System A distributed database system consists of loosely coupled sites that share no physical component Appears to user as a single system Database

More information

Teradata Database Architecture Overview

Teradata Database Architecture Overview Teradata Database Architecture Overview Todd Walter Chief Technologist Teradata #TDPARTNERS16 Session #637 11:30 12:15 Sunday, September 11, in C101 1 GEORGIA WORLD CONGRESS CENTER 2500 BC Building Pyramids

More information

SQL Interview Questions

SQL Interview Questions SQL Interview Questions SQL stands for Structured Query Language. It is used as a programming language for querying Relational Database Management Systems. In this tutorial, we shall go through the basic

More information

Aster Data SQL and MapReduce Class Outline

Aster Data SQL and MapReduce Class Outline Aster Data SQL and MapReduce Class Outline CoffingDW education has been customized for every customer for the past 20 years. Our classes can be taught either on site or remotely via the internet. Education

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe CHAPTER 19 Query Optimization Introduction Query optimization Conducted by a query optimizer in a DBMS Goal: select best available strategy for executing query Based on information available Most RDBMSs

More information

AURA ACADEMY Training With Expertised Faculty Call Us On For Free Demo

AURA ACADEMY Training With Expertised Faculty Call Us On For Free Demo AURA ACADEMY Course Content: TERADATA Database 14 1. TERADATA Warehouse and Competitiveness Opportunities of TERADATA in the enterprise. What is TERADATA-RDBMS/DWH? TERADATA 14 & Other versions( 13.10,13,12,v2r7/r6/r5)

More information

normalization are being violated o Apply the rule of Third Normal Form to resolve a violation in the model

normalization are being violated o Apply the rule of Third Normal Form to resolve a violation in the model Database Design Section1 - Introduction 1-1 Introduction to the Oracle Academy o Give examples of jobs, salaries, and opportunities that are possible by participating in the Academy. o Explain how your

More information

TotalCost = 3 (1, , 000) = 6, 000

TotalCost = 3 (1, , 000) = 6, 000 156 Chapter 12 HASH JOIN: Now both relations are the same size, so we can treat either one as the smaller relation. With 15 buffer pages the first scan of S splits it into 14 buckets, each containing about

More information

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe Introduction to Query Processing and Query Optimization Techniques Outline Translating SQL Queries into Relational Algebra Algorithms for External Sorting Algorithms for SELECT and JOIN Operations Algorithms

More information

Relational Model History. COSC 416 NoSQL Databases. Relational Model (Review) Relation Example. Relational Model Definitions. Relational Integrity

Relational Model History. COSC 416 NoSQL Databases. Relational Model (Review) Relation Example. Relational Model Definitions. Relational Integrity COSC 416 NoSQL Databases Relational Model (Review) Dr. Ramon Lawrence University of British Columbia Okanagan ramon.lawrence@ubc.ca Relational Model History The relational model was proposed by E. F. Codd

More information

Teradata Certified Professional Program Teradata V2R5 Certification Guide

Teradata Certified Professional Program Teradata V2R5 Certification Guide Professional Program Teradata Certification Guide The Professional Program team welcomes you to the Teradata Certification Guide. The guide provides comprehensive information about Teradata certification

More information

Database System Concepts

Database System Concepts Chapter 13: Query Processing s Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2008/2009 Slides (fortemente) baseados nos slides oficiais do livro c Silberschatz, Korth

More information

Oracle Syllabus Course code-r10605 SQL

Oracle Syllabus Course code-r10605 SQL Oracle Syllabus Course code-r10605 SQL Writing Basic SQL SELECT Statements Basic SELECT Statement Selecting All Columns Selecting Specific Columns Writing SQL Statements Column Heading Defaults Arithmetic

More information

Database Architectures

Database Architectures Database Architectures CPS352: Database Systems Simon Miner Gordon College Last Revised: 11/15/12 Agenda Check-in Centralized and Client-Server Models Parallelism Distributed Databases Homework 6 Check-in

More information

Vendor: IBM. Exam Code: Exam Name: IBM Certified Specialist Netezza Performance Software v6.0. Version: Demo

Vendor: IBM. Exam Code: Exam Name: IBM Certified Specialist Netezza Performance Software v6.0. Version: Demo Vendor: IBM Exam Code: 000-553 Exam Name: IBM Certified Specialist Netezza Performance Software v6.0 Version: Demo QUESTION NO: 1 Which CREATE DATABASE attributes are required? A. The database name. B.

More information

CSIT5300: Advanced Database Systems

CSIT5300: Advanced Database Systems CSIT5300: Advanced Database Systems L11: Physical Database Design Dr. Kenneth LEUNG Department of Computer Science and Engineering The Hong Kong University of Science and Technology Hong Kong SAR, China

More information

1. Data Definition Language.

1. Data Definition Language. CSC 468 DBMS Organization Spring 2016 Project, Stage 2, Part 2 FLOPPY SQL This document specifies the version of SQL that FLOPPY must support. We provide the full description of the FLOPPY SQL syntax.

More information

Exam Name: Netezza Platform Software v6

Exam Name: Netezza Platform Software v6 Vendor: IBM Exam Code: 000-553 Exam Name: Netezza Platform Software v6 Version: DEMO 1.Which CREATE DATABASE attributes are required? A. The database name. B. The database name and the redo log file name.

More information

Questions lead to knowledge A Teradata Certified Master answers readers technical queries.

Questions lead to knowledge A Teradata Certified Master answers readers technical queries. Questions lead to knowledge A Teradata Certified Master answers readers technical queries. A Carrie Ballinger Senior database analyst, Special Projects Teradata Certified Master V2R5 sking questions as

More information

COSC 304 Introduction to Database Systems. Views and Security. Dr. Ramon Lawrence University of British Columbia Okanagan

COSC 304 Introduction to Database Systems. Views and Security. Dr. Ramon Lawrence University of British Columbia Okanagan COSC 304 Introduction to Database Systems Views and Security Dr. Ramon Lawrence University of British Columbia Okanagan ramon.lawrence@ubc.ca Views A view is a named query that is defined in the database.

More information

192 Chapter 14. TotalCost=3 (1, , 000) = 6, 000

192 Chapter 14. TotalCost=3 (1, , 000) = 6, 000 192 Chapter 14 5. SORT-MERGE: With 52 buffer pages we have B> M so we can use the mergeon-the-fly refinement which costs 3 (M + N). TotalCost=3 (1, 000 + 1, 000) = 6, 000 HASH JOIN: Now both relations

More information

Efficiency. Efficiency: Indexing. Indexing. Efficiency Techniques. Inverted Index. Inverted Index (COSC 488)

Efficiency. Efficiency: Indexing. Indexing. Efficiency Techniques. Inverted Index. Inverted Index (COSC 488) Efficiency Efficiency: Indexing (COSC 488) Nazli Goharian nazli@cs.georgetown.edu Difficult to analyze sequential IR algorithms: data and query dependency (query selectivity). O(q(cf max )) -- high estimate-

More information

Database System Concepts, 6 th Ed. Silberschatz, Korth and Sudarshan See for conditions on re-use

Database System Concepts, 6 th Ed. Silberschatz, Korth and Sudarshan See  for conditions on re-use Chapter 11: Indexing and Hashing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files Static

More information

Why Hash? Glen Becker, USAA

Why Hash? Glen Becker, USAA Why Hash? Glen Becker, USAA Abstract: What can I do with the new Hash object in SAS 9? Instead of focusing on How to use this new technology, this paper answers Why would I want to? It presents the Big

More information

Oracle Database 10g: Introduction to SQL

Oracle Database 10g: Introduction to SQL ORACLE UNIVERSITY CONTACT US: 00 9714 390 9000 Oracle Database 10g: Introduction to SQL Duration: 5 Days What you will learn This course offers students an introduction to Oracle Database 10g database

More information

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016 Query Processing Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016 Slides re-used with some modification from www.db-book.com Reference: Database System Concepts, 6 th Ed. By Silberschatz,

More information

Operator Implementation Wrap-Up Query Optimization

Operator Implementation Wrap-Up Query Optimization Operator Implementation Wrap-Up Query Optimization 1 Last time: Nested loop join algorithms: TNLJ PNLJ BNLJ INLJ Sort Merge Join Hash Join 2 General Join Conditions Equalities over several attributes (e.g.,

More information

Chapter 11: Indexing and Hashing

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL

More information

B.H.GARDI COLLEGE OF ENGINEERING & TECHNOLOGY (MCA Dept.) Parallel Database Database Management System - 2

B.H.GARDI COLLEGE OF ENGINEERING & TECHNOLOGY (MCA Dept.) Parallel Database Database Management System - 2 Introduction :- Today single CPU based architecture is not capable enough for the modern database that are required to handle more demanding and complex requirements of the users, for example, high performance,

More information

Chapter 17 Indexing Structures for Files and Physical Database Design

Chapter 17 Indexing Structures for Files and Physical Database Design Chapter 17 Indexing Structures for Files and Physical Database Design We assume that a file already exists with some primary organization unordered, ordered or hash. The index provides alternate ways to

More information

DATABASE PERFORMANCE AND INDEXES. CS121: Relational Databases Fall 2017 Lecture 11

DATABASE PERFORMANCE AND INDEXES. CS121: Relational Databases Fall 2017 Lecture 11 DATABASE PERFORMANCE AND INDEXES CS121: Relational Databases Fall 2017 Lecture 11 Database Performance 2 Many situations where query performance needs to be improved e.g. as data size grows, query performance

More information

Something to think about. Problems. Purpose. Vocabulary. Query Evaluation Techniques for large DB. Part 1. Fact:

Something to think about. Problems. Purpose. Vocabulary. Query Evaluation Techniques for large DB. Part 1. Fact: Query Evaluation Techniques for large DB Part 1 Fact: While data base management systems are standard tools in business data processing they are slowly being introduced to all the other emerging data base

More information

Query Execution [15]

Query Execution [15] CSC 661, Principles of Database Systems Query Execution [15] Dr. Kalpakis http://www.csee.umbc.edu/~kalpakis/courses/661 Query processing involves Query processing compilation parsing to construct parse

More information

Chapter 18 Strategies for Query Processing. We focus this discussion w.r.t RDBMS, however, they are applicable to OODBS.

Chapter 18 Strategies for Query Processing. We focus this discussion w.r.t RDBMS, however, they are applicable to OODBS. Chapter 18 Strategies for Query Processing We focus this discussion w.r.t RDBMS, however, they are applicable to OODBS. 1 1. Translating SQL Queries into Relational Algebra and Other Operators - SQL is

More information

Oracle Application Express Schema Design Guidelines Presenter: Flavio Casetta, Yocoya.com

Oracle Application Express Schema Design Guidelines Presenter: Flavio Casetta, Yocoya.com Oracle Application Express Schema Design Guidelines Presenter: Flavio Casetta, Yocoya.com about me Flavio Casetta Founder of Yocoya.com Editor of blog OracleQuirks.blogspot.com 25+ years in the IT 10+

More information

CMPS 181, Database Systems II, Final Exam, Spring 2016 Instructor: Shel Finkelstein. Student ID: UCSC

CMPS 181, Database Systems II, Final Exam, Spring 2016 Instructor: Shel Finkelstein. Student ID: UCSC CMPS 181, Database Systems II, Final Exam, Spring 2016 Instructor: Shel Finkelstein Student Name: Student ID: UCSC Email: Final Points: Part Max Points Points I 15 II 29 III 31 IV 19 V 16 Total 110 Closed

More information

Evaluation of relational operations

Evaluation of relational operations Evaluation of relational operations Iztok Savnik, FAMNIT Slides & Textbook Textbook: Raghu Ramakrishnan, Johannes Gehrke, Database Management Systems, McGraw-Hill, 3 rd ed., 2007. Slides: From Cow Book

More information

Chapter 18: Parallel Databases

Chapter 18: Parallel Databases Chapter 18: Parallel Databases Introduction Parallel machines are becoming quite common and affordable Prices of microprocessors, memory and disks have dropped sharply Recent desktop computers feature

More information

Query optimization. Elena Baralis, Silvia Chiusano Politecnico di Torino. DBMS Architecture D B M G. Database Management Systems. Pag.

Query optimization. Elena Baralis, Silvia Chiusano Politecnico di Torino. DBMS Architecture D B M G. Database Management Systems. Pag. Database Management Systems DBMS Architecture SQL INSTRUCTION OPTIMIZER MANAGEMENT OF ACCESS METHODS CONCURRENCY CONTROL BUFFER MANAGER RELIABILITY MANAGEMENT Index Files Data Files System Catalog DATABASE

More information

SQL. Rodrigo García Carmona Universidad San Pablo-CEU Escuela Politécnica Superior

SQL. Rodrigo García Carmona Universidad San Pablo-CEU Escuela Politécnica Superior SQL Rodrigo García Carmona Universidad San Pablo-CEU Escuela Politécnica Superior 1 DDL 2 DATA TYPES All columns must have a data type. The most common data types in SQL are: Alphanumeric: Fixed length:

More information

MTA Database Administrator Fundamentals Course

MTA Database Administrator Fundamentals Course MTA Database Administrator Fundamentals Course Session 1 Section A: Database Tables Tables Representing Data with Tables SQL Server Management Studio Section B: Database Relationships Flat File Databases

More information

Database Optimization

Database Optimization Database Optimization June 9 2009 A brief overview of database optimization techniques for the database developer. Database optimization techniques include RDBMS query execution strategies, cost estimation,

More information

Outline. Database Tuning. Join Strategies Running Example. Outline. Index Tuning. Nikolaus Augsten. Unit 6 WS 2014/2015

Outline. Database Tuning. Join Strategies Running Example. Outline. Index Tuning. Nikolaus Augsten. Unit 6 WS 2014/2015 Outline Database Tuning Nikolaus Augsten University of Salzburg Department of Computer Science Database Group 1 Examples Unit 6 WS 2014/2015 Adapted from Database Tuning by Dennis Shasha and Philippe Bonnet.

More information

1 Writing Basic SQL SELECT Statements 2 Restricting and Sorting Data

1 Writing Basic SQL SELECT Statements 2 Restricting and Sorting Data 1 Writing Basic SQL SELECT Statements Objectives 1-2 Capabilities of SQL SELECT Statements 1-3 Basic SELECT Statement 1-4 Selecting All Columns 1-5 Selecting Specific Columns 1-6 Writing SQL Statements

More information

Chapter 8: Working With Databases & Tables

Chapter 8: Working With Databases & Tables Chapter 8: Working With Databases & Tables o Working with Databases & Tables DDL Component of SQL Databases CREATE DATABASE class; o Represented as directories in MySQL s data storage area o Can t have

More information

A Unit of SequelGate Innovative Technologies Pvt. Ltd. All Training Sessions are Completely Practical & Real-time

A Unit of SequelGate Innovative Technologies Pvt. Ltd. All Training Sessions are Completely Practical & Real-time SQL Basics & PL-SQL Complete Practical & Real-time Training Sessions A Unit of SequelGate Innovative Technologies Pvt. Ltd. ISO Certified Training Institute Microsoft Certified Partner Training Highlights

More information

Oracle SQL & PL SQL Course

Oracle SQL & PL SQL Course Oracle SQL & PL SQL Course Complete Practical & Real-time Training Job Support Complete Practical Real-Time Scenarios Resume Preparation Lab Access Training Highlights Placement Support Support Certification

More information

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel Indexing Week 14, Spring 2005 Edited by M. Naci Akkøk, 5.3.2004, 3.3.2005 Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel Overview Conventional indexes B-trees Hashing schemes

More information

CMSC 461 Final Exam Study Guide

CMSC 461 Final Exam Study Guide CMSC 461 Final Exam Study Guide Study Guide Key Symbol Significance * High likelihood it will be on the final + Expected to have deep knowledge of can convey knowledge by working through an example problem

More information

File System Interface and Implementation

File System Interface and Implementation Unit 8 Structure 8.1 Introduction Objectives 8.2 Concept of a File Attributes of a File Operations on Files Types of Files Structure of File 8.3 File Access Methods Sequential Access Direct Access Indexed

More information

More SQL: Complex Queries, Triggers, Views, and Schema Modification

More SQL: Complex Queries, Triggers, Views, and Schema Modification Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 5 Outline More Complex SQL Retrieval Queries

More information

Database Management Systems

Database Management Systems Database Management Systems Distributed Databases Doug Shook What does it mean to be distributed? Multiple nodes connected by a network Data on the nodes is logically related The nodes do not need to be

More information

New Requirements. Advanced Query Processing. Top-N/Bottom-N queries Interactive queries. Skyline queries, Fast initial response time!

New Requirements. Advanced Query Processing. Top-N/Bottom-N queries Interactive queries. Skyline queries, Fast initial response time! Lecture 13 Advanced Query Processing CS5208 Advanced QP 1 New Requirements Top-N/Bottom-N queries Interactive queries Decision making queries Tolerant of errors approximate answers acceptable Control over

More information

Chapter 13: Query Processing

Chapter 13: Query Processing Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing

More information