Implementation Techniques


V Implementation Techniques
34 Efficient Evaluation of the Valid-Time Natural Join
35 Efficient Differential Timeslice Computation
36 R-Tree Based Indexing of Now-Relative Bitemporal Data
37 Light-Weight Indexing of General Bitemporal Data
38 Layered Temporal DBMSs: Concepts and Techniques
39 Stratum Approaches to Temporal DBMS Implementation
40 Effective Timestamping in Databases

Previous parts focus on enhancing our understanding of the temporal aspects of data and on the provision of additional built-in DBMS support for managing temporal data, covering database manipulation as well as database design. The present part takes the next logical step: providing new techniques, where existing techniques fall short, for implementing parts of the proposed temporal support. Common to the implementation techniques is that each aims to realize some database functionality, but it is instructive to make an analytical distinction among three more specific, possible goals of implementation techniques. A technique may simply aim to demonstrate constructively that some perceived functionality can be implemented. This is the typical goal of a proof-of-concept implementation. Another, more ambitious objective of a technique is to offer an efficient implementation. In this situation, other implementations may exist, and performance studies are typically conducted that explore the performance of the implementation and perhaps compare it to that of alternative techniques. The last goal of an implementation technique is that of being an effective technique: in this case, it is the ease with which the technique permits us to realize a certain functionality that is the point of focus. An easy means of realizing a certain functionality is clearly preferable to a more difficult one. A separate distinction useful for characterizing implementation techniques concerns the DBMS architecture that a technique assumes; we distinguish between two kinds. Techniques that depend on the ability to customize various components of the DBMS in which they are to be embedded are said to assume an integrated architecture. For example, an implementation technique may depend on a certain behavior of the data manager component of the DBMS and is thus dependent on the ability to customize the data manager. An integrated architecture may be the one that most readily lends itself to efficient implementations, but it may not be an effective architecture. Perhaps most importantly, an implementation technique's dependence on an integrated architecture makes the technique relevant only to those with access to the source code of pre-existing DBMSs. The layered and extensible-system architectures represent a natural opposite to the integrated architecture. Implementation techniques that assume these types of architectures for their enhancements to the functionality of a DBMS treat the DBMS as a black box and attempt to reuse its services for their benefit. In this case, the implementation technique may be seen as an advanced application that runs on top of the DBMS or inside the DBMS, depending on the extensibility features offered by the DBMS. This general type of architecture may lend itself to more effective implementation of functionality, but its inherently reduced flexibility may also hinder efficient implementation. The implementation techniques presented in this part provide good coverage of this three-by-two space, with objective and assumed architecture as the dimensions.
The conventional join operation is important because of its frequent use, because of its inherent O(n²) worst-case complexity, forced by the size of the result of the operation, and because the operation can frequently be computed more efficiently than with the brute-force, nested-loop approach. Temporal-join operations not only involve equality predicates, for which existing techniques are optimized, but typically also involve inequality predicates on the time attributes, e.g., to implement an overlap predicate. Inequality predicates typically invalidate existing, advanced non-temporal join techniques, creating a need for special temporal-join techniques. Chapter 34 introduces a temporal-join algorithm based on tuple partitioning. This algorithm avoids the quadratic cost of nested-loop evaluation methods; it also avoids sorting. Performance comparisons between the partition-based algorithm and other evaluation methods are provided. The join algorithm aims to achieve an efficient implementation of an operation for which other implementations exist, and an integrated architecture is assumed.
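The partitioning idea can be sketched as follows. This is a simplified, hypothetical illustration, not the chapter's actual algorithm: the tuple layout, the partition boundaries, and the duplicate-elimination rule are all assumptions made for the sketch.

```python
# Sketch of a partition-based valid-time join (illustrative, simplified).
# Tuples are (key, value, start, end); the join pairs tuples with equal
# keys and overlapping [start, end) intervals, without sorting and without
# a nested-loop scan over all tuple pairs.

def partition_join(r, s, boundaries):
    """Join r and s on equal key and overlapping valid-time intervals.

    `boundaries` splits the time line into consecutive [lo, hi) partitions;
    a tuple is placed in every partition its interval overlaps.
    """
    def partitions(t):
        _, _, start, end = t
        return [i for i, (lo, hi) in enumerate(boundaries)
                if start < hi and end > lo]

    buckets_r = {}
    for t in r:
        for i in partitions(t):
            buckets_r.setdefault(i, []).append(t)

    result = []
    for t in s:
        for i in partitions(t):
            lo, _ = boundaries[i]
            for u in buckets_r.get(i, []):
                overlap = t[2] < u[3] and u[2] < t[3]
                # Emit a matching pair only in the partition containing the
                # later of the two start times, so pairs spanning several
                # partitions are not reported twice.
                if t[0] == u[0] and overlap and max(t[2], u[2]) >= lo:
                    result.append((u, t))
    return result

r = [("a", 1, 0, 10), ("b", 2, 5, 15)]
s = [("a", 9, 5, 20), ("b", 8, 20, 30)]
pairs = partition_join(r, s, [(0, 10), (10, 20), (20, 30)])
# The "a" tuples overlap on [5, 10); the "b" tuples do not overlap.
```

Each tuple is inspected once per partition it overlaps, so for intervals that are short relative to the partition width, the work approaches that of an ordinary hash partitioning rather than a full cross product.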

Chapter 35 proceeds to offer an efficient technique for part of the computation of another fundamental temporal-database operation, namely the timeslice operation, which retrieves the state, or timeslice, that was current at a specific point in time from a relation with transaction-time support. Data is stored as a log of database changes, and timeslices are computed by traversing the log, using previously computed and cached timeslices as outsets. When computing a new timeslice, the cache will contain two candidate outsets: an earlier outset and a later outset. The new timeslice can be computed by either incrementally updating the earlier outset or decrementally down-dating the later outset using the log. The chapter provides an efficient algorithm that uses a new data structure, the Pointer-Less Insertion Tree, for determining which outset is the most efficient one to use. The new algorithm aims to offer a more efficient solution than has so far been available and, in doing so, assumes an integrated architecture. Specifically, the algorithm makes quite strong assumptions about the data storage layouts, which are difficult to satisfy without the control offered by an integrated architecture. The chapter reports on the performance characteristics of the algorithm and includes relevant comparisons. The two previous chapters presented techniques intended for valid-time and transaction-time databases, respectively. The following two chapters are set in the context of bitemporal databases, where both the valid time and the transaction time of the data are recorded. The two chapters offer different techniques for indexing the bitemporal extents of data, thus offering competing foundations for the efficient evaluation of predicates that involve the valid and transaction time of data. Like spatial data, bitemporal data thus has associated two-dimensional regions.
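The choice between the two outsets can be sketched as follows. This is a simplified, hypothetical model: the log format, the cache, and the cost measure (number of log entries traversed) are illustrative assumptions, and the sketch scans the log directly where the chapter's Pointer-Less Insertion Tree would make the choice without such a scan.

```python
# Sketch of differential timeslice computation (illustrative, simplified).
# A transaction-time relation is kept as a log of timestamped changes, and
# a timeslice at time t is derived from the cheaper of two cached outsets:
# an earlier timeslice rolled forward, or a later timeslice rolled backward.

def timeslice(t, log, cache):
    """Return the set of tuples current at time t.

    log:   list of (time, op, tuple) entries, op in {"ins", "del"},
           sorted by time.
    cache: dict mapping times to previously computed timeslices.
    """
    earlier = max((c for c in cache if c <= t), default=None)
    later = min((c for c in cache if c > t), default=None)

    fwd = [e for e in log if earlier is not None and earlier < e[0] <= t]
    bwd = [e for e in log if later is not None and t < e[0] <= later]

    if earlier is not None and (later is None or len(fwd) <= len(bwd)):
        state = set(cache[earlier])
        for _, op, tup in fwd:            # roll the earlier outset forward
            if op == "ins":
                state.add(tup)
            else:
                state.discard(tup)
    else:
        state = set(cache[later])
        for _, op, tup in reversed(bwd):  # undo changes on the later outset
            if op == "ins":
                state.discard(tup)
            else:
                state.add(tup)
    return state

log = [(1, "ins", "a"), (2, "ins", "b"), (3, "del", "a")]
cache = {0: set(), 4: {"b"}}
state = timeslice(2, log, cache)          # one undo step from the later outset
```

In the example, rolling backward from the cached timeslice at time 4 undoes a single deletion, which is cheaper than replaying two insertions from time 0.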
Such data is in part naturally now-relative: some data is currently true in the mini-world or is part of the current database state. So, unlike the regions of spatial data, the regions of now-relative bitemporal data grow continuously. Existing indices, including commercially available indices such as B+-trees and R-trees, do not contend well with even small amounts of now-relative bitemporal data. Chapter 36 proposes two extended R*-trees that are capable of indexing the possibly growing two-dimensional regions associated with bitemporal data, by also letting the internal bounding regions grow. Internal bounding regions may be triangular as well as rectangular. New heuristics for the algorithms that govern the index structure are provided. As a result, dead space and overlap, now also functions of time, are reduced. This chapter's focus is on efficiency, and performance studies indicate that the best extended index is typically significantly faster than the existing R-tree based indices. Although an integrated architecture is assumed, a continuation of this work has resulted in a prototype implementation applicable to an extensible DBMS architecture. However, further improvements of the existing extensible architectures are necessary before this research is practically applicable beyond integrated architectures.
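The growth of now-relative regions can be made concrete with a small sketch. The tuple format and the NOW marker are assumptions for illustration; a real index along the chapter's lines would maintain growing, possibly triangular, internal bounding regions rather than recompute boxes on demand.

```python
# Sketch of a now-relative bitemporal extent (illustrative, simplified).
# A fact whose valid-time end and/or transaction-time end is the variable
# "now" has a region that grows as time passes, so any fixed bounding
# rectangle computed for it eventually becomes stale.

NOW = None  # marker for the variable "now"

def bounding_box(fact, current_time):
    """Minimum bounding rectangle of a bitemporal fact at current_time.

    fact: (vt_start, vt_end, tt_start, tt_end), where an end of NOW means
    the fact is still valid / still part of the current database state.
    """
    vt_start, vt_end, tt_start, tt_end = fact
    vt_end = current_time if vt_end is NOW else vt_end
    tt_end = current_time if tt_end is NOW else tt_end
    return (vt_start, vt_end, tt_start, tt_end)

fact = (10, NOW, 10, NOW)                    # inserted at 10, valid until now
box_20 = bounding_box(fact, 20)
box_50 = bounding_box(fact, 50)              # the region has grown
```

Because the box depends on the evaluation time, a static index entry computed at insertion time understates the fact's extent ever after, which is why the chapter lets the index's internal bounding regions themselves be functions of time.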

Chapter 37 proposes a new indexing technique that eliminates the different kinds of growing data regions by means of transformations and then indexes the resulting stationary data regions with four R*-trees; queries on the original data are correspondingly transformed into queries on the transformed data. Extensive performance studies are reported that provide insight into the characteristics and behavior of the four trees storing differently shaped regions, and they indicate that the new technique yields performance that is competitive with the best existing index from the previous chapter; unlike that index, the new technique is readily applicable in a layered architecture. The next three chapters differ from the previous ones in several respects. They do not consider individual, isolated operations, but rather adopt a holistic view and consider the general challenges that occur when attempting to implement a temporal SQL. The traditional approach has been to assume an integrated architecture for implementing a temporal SQL: it has been assumed that a temporal DBMS must be built from scratch, utilizing new technologies for storage, indexing, query optimization, concurrency control, and recovery. (This was also assumed in Chapters 20 and 21.) In contrast, these three chapters assume a layered architecture, and their main emphasis is on effective rather than efficient implementation, although the latter remains a concern. In addition, the last chapter addresses the problem of simply ensuring a correct implementation under the restrictions imposed by a layered architecture. Chapter 38 introduces and explores the concepts and techniques involved in implementing a temporally enhanced SQL while maximally reusing the facilities of an existing SQL implementation.
The topics covered span the choice of an adequate timestamp domain that includes the variable now, query processing architectures, and transaction processing. Chapter 39 proceeds to identify three layered, or stratum, meta-architectures, each with several specific architectures. Based on a new set of evaluation criteria, the advantages and disadvantages of the specific architectures are identified. The chapter also classifies all existing temporal DBMS implementations according to the specific architectures they employ. It is concluded that a stratum architecture is the best short-, medium-, and perhaps even long-term approach to implementing a temporal DBMS. Chapter 40 delves into a challenge faced by all proposals for associating timestamp values, denoting valid and transaction time, with data. With few exceptions, the assignment of timestamp values has been considered only in the context of individual modification statements. This chapter goes further and considers timestamping in the context of transactions, and in a layered architecture. The chapter initially identifies and analyzes several problems with straightforward timestamping, then proceeds to propose techniques aimed at solving these problems. Timestamping the results of a transaction with the commit time of the transaction

is a promising approach, and a spectrum of techniques for doing so is explored. Next, although many database facts are valid until the current time, now, this value is absent from the existing time data types. Techniques using different substitute values are explored, and the performance of the different proposed techniques is studied. The result is a comprehensive approach that provides application programmers with simple, consistent, and efficient support for modifying bitemporal databases in the context of user transactions.
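The commit-time timestamping idea can be sketched as follows. The store, the placeholder scheme, and the clock are hypothetical simplifications for illustration, not the chapter's actual techniques.

```python
# Sketch of commit-time timestamping (illustrative, simplified). Tuples
# modified by a transaction are first stamped with a placeholder, which is
# replaced by the transaction's commit time at commit, so that all of a
# transaction's changes carry one consistent transaction-time value.

import itertools

class Store:
    def __init__(self):
        self.rows = []                     # [value, tt_start, tt_end]
        self._txn_ids = itertools.count(1)
        self._clock = itertools.count(100)

    def begin(self):
        # A negative id marks a tuple as belonging to an uncommitted
        # transaction; the commit time is not yet known.
        return -next(self._txn_ids)

    def insert(self, txn, value):
        self.rows.append([value, txn, None])   # tt_end None: still current

    def commit(self, txn):
        commit_time = next(self._clock)
        for row in self.rows:
            if row[1] == txn:              # replace placeholder with commit time
                row[1] = commit_time
        return commit_time

store = Store()
t1 = store.begin()
store.insert(t1, "a")
store.insert(t1, "b")
ct = store.commit(t1)                      # both rows stamped identically
```

Deferring the timestamp to commit avoids the anomaly where two statements in one transaction receive different transaction times; the cost is the extra pass (or bookkeeping) needed to revisit the transaction's tuples at commit, which is part of what the chapter's techniques address.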