Design and Implementation of Transactions in a Column-Oriented In-Memory Database System


Markus Olsson
March 16, 2010
Master's Thesis in Computing Science, 30 ECTS credits
Supervisor at CS-UmU: Stephen Hegner
Examiner: Fredrik Georgsson
Umeå University, Department of Computing Science, SE UMEÅ SWEDEN


Abstract

Coldbase is a column-oriented in-memory database implemented in Java that is used with a specific workload in mind. Coldbase is optimized to receive large streams of timestamped trading data arriving at a fast pace while allowing simple but frequent queries that analyse the data concurrently. By limiting the functionality, Coldbase is able to reach high performance while keeping memory consumption low. This thesis presents ColdbaseTX, which is an extension to Coldbase that adds support for transactions. It uses an optimistic approach by storing all writes of a transaction locally and applying them when the transaction commits. Readers are separated from writers by using two versions of the data, which makes it possible to guarantee that readers are never blocked. Benchmarks compare Coldbase to ColdbaseTX regarding both performance and memory efficiency. The results show that ColdbaseTX introduces a small overhead in both memory and performance, which is deemed acceptable since the gain is support for transactions.


Contents

1 Introduction: Report Outline; Problem Description; Goal; Restrictions; Methods
2 Coldbase
3 Approaches to Concurrency Control: Software Transactional Memory; STM of Fraser (FSTM); Dynamic STM (DSTM); Discussion; Traditional Concurrency Control Techniques; Timestamp Ordering; Multiversion Concurrency Control; Optimistic Concurrency Control
4 Design and Implementation: General Design; Vector; Accessor; Write Accessor; Column Manager; Versions; Class Hierarchy; Apply Data; Retrieve a Column; When to Reserve Versions; Switch Versions; Append Data; Array Switcher; Gather Information; Validate Read Set; Switch Versions; Force Commit; Transaction; Tables; Private Table; Public Table; Queries; Select; Update
5 Results: Performance; Memory; Concurrency; Isolation; Interfaces; Special Purpose
6 Conclusions: Limitations; Future Work
Acknowledgements
References
A Alternate Designs: A.1 STM; A.2 Reserved Value; A.3 Vector STM; A.4 Sectioned Array
B Users Guide: B.1 Update Query; B.2 Transaction
C Benchmarks: C.1 Quote Data; C.2 Arguments; C.3 Memory Benchmark; C.4 Accessor Benchmark; C.5 Append Benchmark; C.6 Query Benchmarks; C.6.1 Simple Query; C.6.2 Query with Condition; C.6.3 Complex Query


List of Figures

3.1 Example of a transaction opening a shared object for writing.
3.2 The basic structure of a TMObject pointing to a locator.
Example of a transaction opening an object for writing when the targeted object is owned by a transaction that is aborted. Since the transaction is aborted, the old object version is considered the current.
Illustration of how the read and write phases of two transactions can be interleaved. The conditions for the three scenarios are explained in the text.
Illustrates a table divided by a fill marker.
Illustrates an overview of the system where a transaction modifies a column.
Class diagram of the column managers.
Describes the timeline where T1 executes.
Illustrates how the transaction T1 and the column manager must wait for T2 to finish its writes before the versions can be switched.
Illustrates how T2 causes a deadlock in column C3 when failing to validate its write set. Two operations are used in the column: validating the writes of a transaction (v), and receiving a notification from a transaction (n).
Illustrates how a query retrieves a vector from a table.
Illustrates the problem when the reservation and the array switch operation are interleaved.
Illustrates the flow of the switch thread.
Graph illustrating the dependencies between the columns.
Illustrates how information is passed when column managers try to switch their versions and the action taken by the array switcher when receiving the information.
Illustrates how the array switcher manages to switch multiple columns in one atomic operation.
A.1 Structure of the design using transactional objects as the base of the column. To retrieve the current value of a row, the transactional object must be opened either for reading or writing.
A.2 The execution shows how a transaction might get an inconsistent view of the data when updating the values one at a time.
A.3 Structure of the design using reserved values and a separate list containing modified values. When a transaction reads a reserved value (indicated by -) it finds the current value by looking up the same index in the mod list.
A.4 Structure of the design allowing transactions to retrieve entire columns during writes.
A.5 Structure of the design dividing the array into sections. Every index of the array is a section, in this case containing four values.
C.1 Illustrates the access test where the accessor is the only implemented subclass in the hierarchy.
C.2 Illustrates the access test where the accessor is one of many implemented subclasses in the hierarchy.
C.3 Illustrates the access test where the accessor is a single class separating the different modes using simple logic.
C.4 Illustrates the append test comparing the performance of Coldbase to ColdbaseTX when values are appended to a non-keyed table.
C.5 Illustrates the append test comparing the performance of Coldbase to ColdbaseTX when values are appended to a keyed table.
C.6 Illustrates the performance of Coldbase compared to ColdbaseTX for a simple query.
C.7 Illustrates the performance of Coldbase compared to ColdbaseTX for a query with a condition.
C.8 Illustrates the performance of Coldbase compared to ColdbaseTX for a complex query.

List of Tables

3.1 Compatibility matrix for the different types of locks.
C.1 Shows the number of bytes allocated for different numbers of elements using a single column.
C.2 Shows the number of bytes allocated for different numbers of columns each holding 100,000 elements.


Chapter 1 Introduction

In the financial sector, it is important for companies to be able to analyze trading data in order to get a view of how the market is behaving. By using this knowledge it is possible for the companies to make calculated business decisions. Nomura is one of the largest investment banks in the world. Their office in Umeå is responsible for developing advanced systems that handle trading in the largest stock markets in the world. At Nomura, Johan Jonsson has implemented a conceptual column-oriented in-memory database called Coldbase as part of his thesis work [1]. Omas Jakobsson is currently working on a thesis that extends Coldbase with secondary storage abilities.

Coldbase is a fast database, designed for a specific workload. The workload contains large streams of timestamped financial data that is continuously appended to the bottom of the database tables. Coldbase allows users to work with the data concurrently while new data is added to the tables. By working in main memory and using a column-oriented design, Coldbase is able to provide fast analysis of the data, especially when only a subset of all columns is included in the query, e.g. summarizing the provision made by a specific company during a period of time. Typical queries in the database include large bulk selects or aggregations over multiple columns using simple logic.

In Jonsson's thesis the main constructs of Coldbase are implemented. The result is an efficient database that allows fast execution of queries and also keeps the memory consumption low. The solution, however, does not contain any support for transactions, and users are synchronized using table-level locks. The thesis of Jakobsson will add a secondary storage solution to Coldbase. Given the large amount of data that is received, this solution makes it possible to move less frequently used data to a secondary storage, e.g. disk-based storage.
This thesis aims to extend Coldbase with support for transactions.

1.1 Report Outline

The following sections in this chapter define the problem that is solved in this thesis. In Chapter 2, a general overview of Coldbase is presented, giving the reader an idea of how Coldbase is implemented and how it executes queries. In Chapter 3, a study of different approaches to concurrency control is presented. The chapter presents both new and traditional algorithms used for controlling concurrency. The aim of the chapter is to present ideas of how transactions can be integrated in Coldbase. Chapter 4 first presents the general design of the implementation, giving the reader an idea of how the system will work. Using a bottom-up approach it then describes the different parts of the system and how they interact with each other. Chapter 5 evaluates the design and implementation of ColdbaseTX and discusses the benchmarks performed on the system. Chapter 6 presents the final conclusions together with a discussion about limitations and future work on ColdbaseTX.

1.2 Problem Description

This thesis aims to add support for database transactions in Coldbase. The purpose of a transaction is to group a number of database operations into a single atomic operation. Consider a banking example where funds are moved from one account to another. The execution generally consists of two operations:

Debit(100$, account1)
Credit(100$, account2)

During execution it is possible that the system crashes before the second operation is executed. In this case the 100$ will be lost, leaving the database in an inconsistent state. If the operations were executed within a transaction, both operations would be grouped into a single atomic operation that would either succeed entirely, applying all modifications to the database, or fail, leaving the database unchanged. However, this example only illustrates the atomicity property of a transaction. A transaction must be reliable, and the widely used acronym ACID describes four vital properties of a reliable transaction [2, p.606].

Atomicity - A transaction is atomic in the sense that it either succeeds completely, applying all modifications to the database, or it fails, leaving the database unmodified.
Consistency - A transaction should bring the database from one consistent state to another, i.e. the resulting state of the database after the transaction has been executed should adhere to all rules set up for the database.
Isolation - A transaction should be executed in isolation from other transactions, i.e. the changes made during the transaction should not be visible outside the scope of the transaction before the transaction has successfully committed.
Durability - The changes made by a transaction should persist in the database, i.e. they should not be lost even if the system crashes.

The first three properties are considered when implementing transactions in Coldbase. The durability property, however, is related to using a secondary storage and is already considered in another thesis.

Goal

The goal of this thesis is to design and implement support for transactions in Coldbase. Software Transactional Memory (STM) is a popular topic in contemporary computer science research. It provides mechanisms similar to database transactions for controlling access to shared resources as an alternative to traditional lock-based synchronization.

By examining the algorithms used in STM as well as algorithms used in traditional databases, a design shall be developed that integrates transactions in Coldbase. The design shall be implemented as a proof-of-concept. It is an open question whether support for transactions can be added to Coldbase while keeping the database efficient. Because of this the requirements presented below are general.

Coldbase is designed to execute in main memory and is column oriented. The reason for this is that it must be fast. When support for transactions is added the solution must still focus on efficiency and avoid any performance bottlenecks.
Since Coldbase executes in main memory, the amount of memory available is limited. It is important that the memory overhead inflicted by transactions be kept small.
Coldbase is optimized for read-only queries that execute concurrently. When adding support for transactions the focus must be to keep the read-only queries as fast as possible. If possible, readers should never be blocked.
Transactions that write data should be able to see changes they have made during the execution of the transaction. These changes should however not be visible outside the scope of the transaction before it has successfully committed.
Coldbase has a query interface that is easy to extend. This interface should be kept unmodified.
The design should avoid using lock-based synchronization to the extent possible.
The report should reason about the different design decisions made.

Restrictions

To limit the scope of the thesis the following restrictions have been imposed:

The durability property of ACID has been considered in another thesis and is not included in this thesis.
The isolation property can be defined at different levels. The solution in this thesis shall aim at the read-committed isolation level, i.e. the transaction should only read data that has been committed.
The database does not allow long-running queries, e.g. queries that require input from the user during their execution.
The database does not support inter-relational constraints, e.g. foreign keys.

Methods

Since Coldbase is written in Java, the extended support for transactions will also be written in Java. The requirements of the transactions are general. The resulting design will use ideas taken from the in-depth study made on existing approaches to concurrency control. The evaluation will consist of comparisons against the original Coldbase implementation, which has no concept of transactions and uses table-level locks to synchronize processes.


Chapter 2 Coldbase

In this chapter a general overview of Coldbase is presented. It is important that the reader is familiar with how Coldbase is designed and how it executes queries, at least the general concepts that are described in this chapter. For more information about Coldbase, the reader is referred to the original thesis paper [1].

Coldbase is a column-oriented in-memory database that was implemented as part of a thesis work done in 2009 [1]. Coldbase is small, storing data in its primitive form, and is also fast, accessing data directly through an array reference. Coldbase is implemented entirely in Java.

When implementing Coldbase, a requirement of the design was to create an extendable query interface that is easy to use. The query class presents the user with an interface that makes it possible to execute SQL-like queries. Queries are built using a construct which can be referred to as query objects. Query objects are composable, making it possible for the user to create different types of queries by composing a number of query objects. To explain what a query object is in this context and how a query is executed, consider the SQL query SELECT id FROM table WHERE size < limit. Listing 2.1 illustrates how the query is implemented in Coldbase.

    new Select(
        new Column[] {                  // Columns to display
            new Column("id")
        },
        table,                          // Table containing data
        new Condition[] {               // Conditions
            new Comparison(
                new Column("size"),
                Comparison.Type.LT,
                new Column("limit"))
        },
        null, null);

Listing 2.1: Example of how a simple select query is written in Coldbase

In this query, Column and Comparison are query objects. The Comparison object is composed of two Column objects. When the query is executed it will start by retrieving the columns size and limit from a table. The table returns the columns represented as Vector objects. These vectors are implemented to represent a column in Coldbase and should not be confused with regular Java vectors. The vectors implement methods that allow a query object to access the underlying data. For example, the Comparison object will use the comparison method lt() to compare the two vectors, resulting in an index vector which points to the rows in the table where size < limit. The index vector is the result of the comparison, and if further conditions were defined this index vector would be used in their operations. Finally, the column id is retrieved as a vector from the table. The index vector is applied to extract the requested rows and the resulting id vector is returned in a private table.

The table class in Coldbase holds vector objects that represent columns. It implements general methods that allow a query object to retrieve a column. It also allows a single process to append values to the bottom of the table. The table is separated into two sections by a fill marker. The fill marker is basically an integer that indicates the last row of the table. The rows above the fill marker represent the current data in the table and are accessible to queries. Access to this section is controlled by the table using table-level locks. The appender applies data below the fill marker with no need for synchronization since there may only be one appending process at a time. When data has been appended the fill marker is incremented, making the changes visible to new queries.

The vector class is a large part of the system. Since vectors represent columns in tables they are responsible for storing the data. Data is stored in arrays containing primitive values in order to minimize the memory overhead. The functionality of the vectors differs from ordinary Java Vectors in that they do not implement methods for adding or retrieving values in the same way. The methods implemented in the vector class operate on the data using vector operations. As an example, take the execution vec1.add(vec2).
The add operation will execute a row-wise addition of the elements in vec1 and vec2 and return the result as a new vector. By presenting an interface of similar methods, the vector class makes it possible to create different query objects and by composing the query objects it is possible to build different types of queries. This is a brief overview of how Coldbase works. The system implemented in this thesis is called ColdbaseTX. At this level of detail Coldbase and ColdbaseTX are quite similar. Most of the differences lie in how ColdbaseTX stores data and how it handles synchronization and access to the data.
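The overview above can be made concrete with a minimal sketch. It is not Coldbase's actual code: the class name `IntColumn` and the methods `append`, `visibleRows` and `take` are hypothetical, and only `add` and `lt` mirror operations named in the text. The sketch assumes a single appending thread, as the chapter describes, and shows a fill marker, a row-wise `add`, and an `lt` comparison that yields an index vector.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a Coldbase vector; not the actual Coldbase class. */
final class IntColumn {
    private final int[] data;   // primitive array keeps the memory overhead low
    private volatile int fill;  // fill marker: rows [0, fill) are visible to queries

    IntColumn(int capacity) {
        data = new int[capacity];
    }

    /** Single appender: write below the fill marker, then advance the marker. */
    void append(int value) {
        int row = fill;
        data[row] = value;
        fill = row + 1;  // volatile write publishes the new row to later readers
    }

    int visibleRows() {
        return fill;
    }

    /** Row-wise addition over the visible rows, returned as a new column. */
    IntColumn add(IntColumn other) {
        int n = Math.min(fill, other.fill);
        IntColumn result = new IntColumn(n);
        for (int i = 0; i < n; i++) {
            result.append(data[i] + other.data[i]);
        }
        return result;
    }

    /** Index vector holding the rows where this[i] < other[i]. */
    int[] lt(IntColumn other) {
        int n = Math.min(fill, other.fill);
        int[] index = new int[n];
        int count = 0;
        for (int i = 0; i < n; i++) {
            if (data[i] < other.data[i]) {
                index[count++] = i;
            }
        }
        return Arrays.copyOf(index, count);
    }

    /** Applies an index vector, extracting the selected rows. */
    int[] take(int[] index) {
        int[] out = new int[index.length];
        for (int i = 0; i < index.length; i++) {
            out[i] = data[index[i]];
        }
        return out;
    }
}

final class ColumnDemo {
    public static void main(String[] args) {
        IntColumn id = new IntColumn(3);
        IntColumn size = new IntColumn(3);
        IntColumn limit = new IntColumn(3);
        int[][] rows = {{10, 5, 8}, {11, 9, 8}, {12, 1, 2}};  // id, size, limit
        for (int[] r : rows) {
            id.append(r[0]);
            size.append(r[1]);
            limit.append(r[2]);
        }
        int[] index = size.lt(limit);                          // rows where size < limit
        System.out.println(Arrays.toString(id.take(index)));   // prints [10, 12]
    }
}
```

The demo follows the same steps as the query SELECT id FROM table WHERE size < limit: compare two columns, obtain an index vector, and apply it to the id column.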

Chapter 3 Approaches to Concurrency Control

In this chapter the literature study of the thesis is presented and discussed. The aim of the study is to gather ideas of how transactions can be integrated into Coldbase. First, a new technology called Software Transactional Memory is presented. Next, approaches used in traditional databases are examined. The main focus lies on non-locking approaches but one of these approaches also describes how locks can be used for synchronization.

3.1 Software Transactional Memory

The traditional approach to allowing concurrent access to shared data is to use locks. Although lock-based synchronization seems straightforward in small systems, the task of designing a synchronization scheme for larger systems is complex. The first problem that arises is what granularity should be used for the locks. A coarse granularity means that few locks are held; instead they protect large parts of the data. This approach leads to a simple implementation but scales poorly. An example in databases is when entire tables are locked. As the number of users increases, locks become a significant bottleneck, slowing down concurrent users trying to access the table. Even if two users try to access disjoint parts of the table, they cannot do this concurrently. In a more fine-grained solution users may lock single rows, allowing concurrent users to access disjoint rows in the table concurrently. This leads to a higher concurrency and scales better than a coarse-grained approach. However, it introduces an overhead both in memory, when allocating more locks, and in performance, when managing the locks. When locks are used, the programmer must also be careful to avoid introducing deadlock, livelock, priority inversion and other problems inherent in locks. As the number of locks increases so does the complexity of avoiding these problems.

In computers today, multicore processors have become standard instead of the old single core processor.
A few years ago people relied on the fact that if they wanted their applications to run faster all they needed was a new, faster processor. Depending on the application, buying a new processor today does not necessarily result in a speed-up, since the individual cores in a multicore processor are likely to be slower than the old single core processors. Today the focus lies on developing concurrent applications in order to maximize the usage of all cores in the processor. As the number of cores increases it becomes more important to create concurrent applications that scale well.

Software Transactional Memory (STM) is based on the idea of allowing concurrent access to shared data objects without worrying about locks. Using STM, the inherent problems of concurrently accessing shared data are abstracted from the user, simplifying the way concurrent programs can be developed [3, p.1]. STM provides a construct that allows the user to develop concurrent applications while the actual implementation of the application seems to be sequential. The only hint of concurrency is that accessing shared data must be done in atomic sections. The developer must be aware of which parts of the code are subject to concurrent access and place those parts within an atomic section. The complexity of synchronizing access to the atomic sections is handled by STM [3, p.1]. In order to achieve this, STM uses the concept of transactions when accessing shared data. Each transaction consists of a set of operations on the shared data which appear to be executed atomically. Either all operations are executed or none of them. To demonstrate the simplicity of developing concurrent constructs (such as lists) with STM, consider the following code:

    public void replace(Object oldvalue, Object newvalue) {
        atomic {
            foreach (element e in list) {
                if (e.value equals oldvalue) {
                    e.value = newvalue
                }
            }
        }
    }

Listing 3.1: Pseudocode showing how STM could be used

This method could be used when implementing a general concurrent list. The method iterates through the list and replaces all matching values with a new value. Placing the code in an atomic block in this case means that all modifications to the concurrent elements will be executed within a transaction, applying either all values in the end or none at all.
The actual interface differs between different STM implementations and the interface presented above is very simplistic. The point however is that the developer is never concerned about how the concurrency is managed, e.g. acquiring and releasing locks. STM abstracts all the low-level synchronization, allowing the user to focus on the implementation instead. Many different types of STM have been implemented and some of them will be presented in this section.

STM of Fraser (FSTM)

FSTM is an object-based STM system designed by Keir Fraser in 2004 [4]. FSTM provides a dynamic programming interface where transactions can be started and committed at any point during the program execution. Within the transaction, shared objects are opened either for reading or writing. All writes to the objects are made to copies of the objects which are local to the transaction. When the transaction tries to commit, either all changes are made permanent atomically or all changes are discarded.
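The pattern of writing to a local copy and then publishing everything or nothing can be illustrated with a much simpler mechanism than FSTM. The following is a minimal sketch, not FSTM and not any STM library's interface: it swaps in plain copy-on-write with a compare-and-set retry loop, and the class name `SnapshotList` and its methods are hypothetical. It implements the `replace` operation of Listing 3.1 so that readers see either the old list or the fully updated one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical illustration only; copy-on-write with CAS, not the FSTM interface. */
final class SnapshotList {
    private final AtomicReference<List<Object>> current;

    SnapshotList(List<Object> initial) {
        current = new AtomicReference<>(
                Collections.unmodifiableList(new ArrayList<>(initial)));
    }

    /** Readers never block: they read whichever version is currently published. */
    List<Object> snapshot() {
        return current.get();
    }

    /** Replaces every occurrence of oldValue, publishing all changes or none. */
    void replace(Object oldValue, Object newValue) {
        while (true) {
            List<Object> seen = current.get();
            List<Object> shadow = new ArrayList<>(seen);  // private copy for writes
            for (int i = 0; i < shadow.size(); i++) {
                if (Objects.equals(shadow.get(i), oldValue)) {
                    shadow.set(i, newValue);
                }
            }
            // Commit: succeeds only if no other writer published in the meantime.
            if (current.compareAndSet(seen, Collections.unmodifiableList(shadow))) {
                return;
            }
            // Conflict: drop the private copy and retry against the new version.
        }
    }
}

final class SnapshotListDemo {
    public static void main(String[] args) {
        SnapshotList list = new SnapshotList(List.of("a", "b", "a"));
        List<Object> before = list.snapshot();
        list.replace("a", "z");
        System.out.println(before);           // prints [a, b, a]
        System.out.println(list.snapshot());  // prints [z, b, z]
    }
}
```

Copying the whole list on every write is only reasonable for small structures; an object-based STM avoids this by copying individual objects, which is what the FSTM design does.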

Since the writes are made to local objects, there is never any need to roll back changes, which is otherwise common in other implementations. Actually, FSTM is similar to optimistic concurrency control in that it carries out the transaction making local writes and then tries to commit all changes.

Lock Freedom

The algorithms of FSTM promise a strong progress guarantee called lock freedom. A non-blocking algorithm is lock-free if it guarantees that some thread always makes progress [5, p.1]. Regardless of the amount of concurrency present, system-wide progress is always guaranteed. A distinguishing feature of lock-free systems is that they must often introduce a help mechanism, allowing processes that are waiting to help the process which caused them to wait. By doing this, system-wide progress can be ensured. A drawback of lock-free implementations is that they are more complex than implementations of weaker progress guarantees since they must implement a helping function.

Design

In FSTM, concurrent objects such as nodes in a linked list are wrapped into a structure called an object header. The sole purpose of the object header is to point to the most recently committed version of the concurrent object. The object header points either directly at the concurrent object or at the transaction that currently owns the object, or more specifically to the descriptor owned by the transaction. Owning an object in this case means that the transaction has in some sense locked the object for writing during the acquire phase of its commit operation. Each transaction owns a transaction descriptor that keeps a history of the reads and writes made by the transaction and allows other transactions to monitor its status. The following values are included in the descriptor:

a list of object handles for objects that have been opened for reading;
a list of object handles for objects that have been opened for writing;
a status flag for the transaction.
When an object is opened for reading or writing by a transaction, an object handle is created, see Figure 3.1. The handle contains references to the object header, the concurrent object and a copy of the concurrent object called the shadow copy. All modifications carried out by the transaction are applied to the shadow copy, which is local to the transaction. These changes are not visible outside the scope of the transaction until it commits. When an object is opened for reading there is no need for a shadow copy and this field in the handle is left empty. Every object handle created when opening a concurrent object is stored in the transaction descriptor; write handles in the write list and read handles in the read list.

Figure 3.1: Example of a transaction opening a shared object for writing.

Flow of FSTM

A transaction is committed in several different phases. In the acquire phase, all objects that were opened in read-write mode are acquired. The header of each object is set to point at the transaction descriptor instead of the concurrent object. In this stage there is a possibility that two transactions will conflict. If the acquiring transaction finds the descriptor of another transaction, instead of the concurrent object, when trying to repoint the object header, the acquiring transaction must help the conflicting transaction to finish its execution.

When all objects have been successfully acquired, the transaction enters the read phase. In the read phase the transaction sets its status to read-checking, indicating that the transaction has atomically committed or aborted. In reality, the transaction has not yet reached the decision, but changing the status to read-checking prevents other transactions from reading the object headers owned by this transaction. This means that the transaction will appear to commit or abort atomically. The transaction verifies that none of the objects in its read list have been modified since they were last read. Based on this check, the transaction changes its status either to committed or aborted.

The read-checking status is troublesome when conflicts occur. Consider the case when a transaction T1 has written to an object and is currently in the read-checking phase of its commit operation. If another concurrent transaction T2 opens the same object, trying to read or write, it will fail since it does not know if T1 is aborted or committed, hence it does not know which version is the current valid version. In this case the readers will help the transaction to reach a decision before continuing their own execution. The read-checking state introduces a risk of deadlocks when verifying the objects in the read list. If two transactions, both in their read-checking state, try to read objects owned by the other transaction, they will create a cycle when trying to help each other.
A solution to this is that one of the transactions is aborted, based on a unique identifier calculated from the machine address.

In the last phase, called the release phase, the transaction iterates through all the modified objects, repointing the object header from the descriptor to the shadow copy. After this phase is completed, other transactions may acquire ownership of the object by repointing the object header to their transaction descriptor.

The operations of a transaction are completely isolated from other transactions before the transaction tries to commit. This means that multiple transactions may open the same objects concurrently without causing a conflict. The only point where transactions must be synchronized is when they try to commit, becoming visible outside the transaction. Fraser claims that FSTM has a very lightweight commit operation allowing fast commits [4, p.38]. This is very important since a long-running commit operation will increase the risk of conflicts against other transactions, leading to large overheads when transactions try to help each other.
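The three commit phases described above can be sketched in a few lines of Java. This is a heavily simplified illustration under stated assumptions, not Fraser's implementation: all class and method names (`ObjectHeader`, `Handle`, `Descriptor`, `openForRead`, `openForWrite`) are hypothetical, the caller supplies the shadow copy directly, each object is opened at most once per transaction, and the helping mechanism is left out, so a conflict simply aborts and the sketch is therefore not lock-free.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical names throughout; a simplified sketch, not Fraser's implementation. */
final class ObjectHeader {
    // Holds either the committed data object or the Descriptor that owns it.
    final AtomicReference<Object> ref;

    ObjectHeader(Object data) {
        ref = new AtomicReference<>(data);
    }
}

final class Handle {
    final ObjectHeader header;
    final Object original;  // version seen when the object was opened
    final Object shadow;    // transaction-local new version; null for read handles

    Handle(ObjectHeader header, Object original, Object shadow) {
        this.header = header;
        this.original = original;
        this.shadow = shadow;
    }
}

final class Descriptor {
    enum Status { UNDECIDED, READ_CHECKING, COMMITTED, ABORTED }

    private final List<Handle> reads = new ArrayList<>();
    private final List<Handle> writes = new ArrayList<>();
    volatile Status status = Status.UNDECIDED;

    // Assumption: each object is opened at most once per transaction.
    Object openForRead(ObjectHeader h) {
        Object seen = committedValue(h);
        reads.add(new Handle(h, seen, null));
        return seen;
    }

    void openForWrite(ObjectHeader h, Object newVersion) {
        writes.add(new Handle(h, committedValue(h), newVersion));
    }

    // FSTM would help the owning transaction here; this sketch just waits for it.
    private static Object committedValue(ObjectHeader h) {
        Object v = h.ref.get();
        while (v instanceof Descriptor) {
            Thread.yield();
            v = h.ref.get();
        }
        return v;
    }

    boolean commit() {
        boolean ok = true;
        int acquired = 0;
        // Acquire phase: point each written header at this descriptor.
        for (Handle w : writes) {
            if (!w.header.ref.compareAndSet(w.original, this)) {
                ok = false;  // conflict; FSTM would help instead of aborting
                break;
            }
            acquired++;
        }
        // Read phase: check that nothing in the read list has changed.
        if (ok) {
            status = Status.READ_CHECKING;
            for (Handle r : reads) {
                if (r.header.ref.get() != r.original) {
                    ok = false;
                    break;
                }
            }
        }
        status = ok ? Status.COMMITTED : Status.ABORTED;
        // Release phase: repoint acquired headers to the shadow copy or back.
        for (int i = 0; i < acquired; i++) {
            Handle w = writes.get(i);
            w.header.ref.set(ok ? w.shadow : w.original);
        }
        return ok;
    }
}

final class FstmSketchDemo {
    public static void main(String[] args) {
        ObjectHeader price = new ObjectHeader("price=10");
        ObjectHeader total = new ObjectHeader("total=0");
        Descriptor t1 = new Descriptor();
        t1.openForRead(price);
        t1.openForWrite(total, "total=10");
        Descriptor t2 = new Descriptor();
        t2.openForWrite(price, "price=20");
        System.out.println(t2.commit());       // prints true
        System.out.println(t1.commit());       // prints false: price changed after t1 read it
        System.out.println(total.ref.get());   // prints total=0
    }
}
```

In the demo, t1 fails its read phase because t2 committed a new version of an object in t1's read list, so t1's write to `total` is released back to the original version and never becomes visible.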

Dynamic STM (DSTM)

DSTM is another object-based STM that supports dynamically sized data structures, hence the name. DSTM was proposed by Herlihy, Moir, Luchangco and Scherer III in 2003 [6]. DSTM keeps a collection of transactional objects which point to different versions of the actual data objects. A transaction in DSTM consists of multiple operations which open transactional objects to read or modify the data object. When opening a transactional object, the transaction receives a copy of the data object, which means that all modifications are done locally. As with database transactions, the transactions of DSTM appear to be executed atomically, either applying all changes or none of them [6, p.1].

Obstruction Freedom

DSTM is based on a non-blocking progress guarantee called obstruction freedom, which is weaker than the lock freedom that is guaranteed by FSTM. A non-blocking algorithm is obstruction-free if it guarantees progress for any thread that eventually executes in isolation [5, p.1]. Basically, this means that if there is no synchronization conflict against any other thread, the thread will finish its execution in a finite number of steps. An advantage of obstruction-free algorithms, compared to lock-free algorithms, is that they are easier to implement. A large factor leading to reduced complexity is that in obstruction-free systems there is no need to implement complex helping functions. Instead, conflicting transactions may be aborted and left to retry later [4, p.15]. By guaranteeing obstruction freedom, the implementation of DSTM is less complex than that of FSTM. However, since concurrent threads might prevent each other from making progress, livelocks are possible in DSTM.
To solve this problem, DSTM has introduced the concept of a contention manager which decides how a transaction should act when detecting a conflict [6, p.6] Design of DSTM The contention manager has a modular implementation and works as a plugin which can easily be replaced. A transaction in DSTM is able to detect that it will conflict with another transaction before the actual conflict takes place. When detecting a future conflict, the contention manager decides how the transaction should act. DSTM allows transactions to abort other transactions, which means that the contention manager can decide if the conflicting transaction should be aborted or if it should be allowed to finish its execution. An aggressive contention manager would abort immediately whereas a polite manager would do an exponential backoff, giving the conflicting transaction a chance to complete. This makes DSTM flexible as it is possible to modify it to some extent based on the expected scenario of use. Furthermore, since transactions may abort other conflicting transactions, it is possible to have different priorities of the transactions where transactions of higher priority abort transactions of lower priority. In DSTM transactional objects encapsulate regular Java objects in a similar way as object headers in FSTM. In order to access the actual data object, the transactional object must first be accessed. Each object to be encapsulated must implement an interface called Cloneable and thus be able to create a logically disjoint copy from itself.

The copy made of an encapsulated object, called a version, is used whenever a transactional object is opened and, like the shadow copy in FSTM, it is private to the transaction. Transactions are linearizable if they appear to take effect in a one-at-a-time order [6, p.1]. In order to guarantee linearizability, the transaction is validated every time an object is opened. This validation checks whether any of the values read during the transaction have been modified by another transaction. If so, the open method throws an exception telling the transaction that it will not be able to commit because it conflicts with another transaction.

To reduce contention, DSTM allows transactions to open objects in read-only mode. This allows several transactions to read the same object without conflicting with each other. This is crucial when, for example, implementing a tree structure where the root node is frequently read but not modified. A conflict only arises if two transactions access the same object and at least one of them executes a write. To further increase concurrency, a transaction may release objects that it has opened in read-only mode and no longer needs, before it commits. This could be used, for example, when two transactions operate on a linked list. The first transaction iterates through the list, opening nodes for reading and releasing them as it moves on to the next node. The other transaction finds an object to modify and is able to do so without causing a conflict, because the first transaction has already read and released it.

Logically, a transactional object, called TMObject, consists of three fields: a pointer to the last transaction that opened the object for writing; a pointer to the old object version; and a pointer to the new object version. The new object version is a private copy of the old object version.
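The validation on every open and the early release of read-only objects described above can be sketched as follows. This is a simplified illustration, not the DSTM implementation: the classes `VersionedBox`, `ReadTx` and `ConflictException` are hypothetical, and a per-object stamp that is incremented on each committed write is assumed here as the way to detect that a read has become stale.

```java
import java.util.IdentityHashMap;
import java.util.Map;

/** Stand-in for a transactional object: a value plus a version stamp. */
final class VersionedBox {
    private int value;
    private long stamp;            // incremented on every committed write

    VersionedBox(int value) { this.value = value; }

    synchronized int value() { return value; }
    synchronized long stamp() { return stamp; }

    /** Simulates another transaction committing a new value. */
    synchronized void commitWrite(int newValue) {
        value = newValue;
        stamp++;
    }
}

/** Thrown by open when an earlier read is no longer valid. */
final class ConflictException extends RuntimeException {
    ConflictException(String message) { super(message); }
}

/** Read-only transaction that revalidates its read set on every open. */
final class ReadTx {
    private final Map<VersionedBox, Long> readSet = new IdentityHashMap<>();
    private long validationChecks;  // counts individual object checks

    /** Opens a box in read-only mode after validating all earlier reads. */
    int openRead(VersionedBox box) {
        validate();
        synchronized (box) {        // read stamp and value as one unit
            readSet.put(box, box.stamp());
            return box.value();
        }
    }

    /** Early release: the box no longer takes part in validation. */
    void release(VersionedBox box) {
        readSet.remove(box);
    }

    /** Fails if any box read so far has been written since it was opened. */
    void validate() {
        for (Map.Entry<VersionedBox, Long> e : readSet.entrySet()) {
            validationChecks++;
            if (e.getKey().stamp() != e.getValue()) {
                throw new ConflictException("read set modified");
            }
        }
    }

    long validationChecks() { return validationChecks; }
}
```

In the linked-list scenario, a reader that calls `release` on each node before moving on keeps its read set small, so a later write to an already released node does not make `openRead` throw. The `validationChecks` counter also shows why validation is expensive: the k-th open checks the k - 1 objects still held in the read set.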
As long as the transaction is active, the new version is invisible outside the transaction. When the transaction finally commits, the new version becomes the current version of the object and is therefore visible to other transactions.

In order to atomically acquire and commit a TMObject, its structure has to be modified. A TMObject actually only contains a pointer to a separate structure called a locator, which contains all three fields described above, see Figure 3.2.

Figure 3.2: The basic structure of a TMObject pointing to a locator.

A locator of a TMObject is created whenever the object is opened for writing. The current locator of a TMObject belongs to the transaction that most recently opened the object for writing. In a conflict-free execution, the opening transaction creates a new locator, initializes its values and repoints the TMObject to the new locator through an atomic compare-and-swap operation. When creating a locator, the transaction may set its fields in different ways based on the state of the transaction owning the current locator of the TMObject, see Figure 3.3.

Figure 3.3: Example of a transaction opening an object for writing when the targeted object is owned by a transaction that is aborted. Since the transaction is aborted, the old object version is considered the current.

Consider the case when a transaction A opens a transactional object that already has a locator owned by another transaction B. If B has committed, the current value of the object is the new object version. If B has aborted, the current value is the old object version. Using this information, A sets the old object version in its locator to point at the current object version of B's locator. A then sets the new object version to point at a copy of the current object version. However, suppose that B is still active, having neither committed nor aborted. It is then impossible for A to know which version should be treated as current. This is where the contention manager comes into the picture. A is able to foresee that it will conflict with B and therefore asks its contention manager which action is appropriate: abort B and continue the execution, or wait for B to finish and then continue. After the fields of the locator are set, the TMObject is repointed to the new locator in one atomic operation.

The next step is for the transaction to commit. When the transaction has acquired all objects and applied its changes to the private versions of the objects, it is ready to commit. In DSTM, committing a transaction only involves changing its status flag from active to committed. All the locators that the transaction created when acquiring objects have a field that is linked to the status flag of the transaction.
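The locator mechanism described above can be sketched in Java with `java.util.concurrent.atomic.AtomicReference`. This is a minimal illustration under stated assumptions, not the DSTM source: the class and method names are chosen for this example, a `UnaryOperator` stands in for `clone()`, and an active owner is always aborted (the aggressive policy) instead of consulting a contention manager.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

final class Transaction {
    enum Status { ACTIVE, COMMITTED, ABORTED }

    private final AtomicReference<Status> status =
            new AtomicReference<>(Status.ACTIVE);

    Status status() { return status.get(); }

    /** Committing is a single compare-and-swap on the status flag. */
    boolean commit() { return status.compareAndSet(Status.ACTIVE, Status.COMMITTED); }

    boolean abort() { return status.compareAndSet(Status.ACTIVE, Status.ABORTED); }
}

/** Immutable triple: owning transaction, old version, new version. */
final class Locator<T> {
    final Transaction owner;
    final T oldVersion;
    final T newVersion;

    Locator(Transaction owner, T oldVersion, T newVersion) {
        this.owner = owner;
        this.oldVersion = oldVersion;
        this.newVersion = newVersion;
    }

    /** New version if the owner committed; otherwise the old version. */
    T current() {
        return owner.status() == Transaction.Status.COMMITTED ? newVersion : oldVersion;
    }
}

final class TMObject<T> {
    private final AtomicReference<Locator<T>> start;

    TMObject(T initial) {
        Transaction init = new Transaction();
        init.commit();                       // initial value counts as committed
        start = new AtomicReference<Locator<T>>(new Locator<T>(init, null, initial));
    }

    /** Version currently visible to other transactions. */
    T read() { return start.get().current(); }

    /** Opens the object for writing and returns the private new version. */
    T openWrite(Transaction tx, UnaryOperator<T> copier) {
        while (true) {
            if (tx.status() != Transaction.Status.ACTIVE) {
                throw new IllegalStateException("transaction is not active");
            }
            Locator<T> old = start.get();
            if (old.owner == tx) {
                return old.newVersion;       // already opened by this transaction
            }
            if (old.owner.status() == Transaction.Status.ACTIVE) {
                old.owner.abort();           // assumption: aggressive policy
            }
            T current = old.current();       // owner is now committed or aborted
            Locator<T> fresh = new Locator<>(tx, current, copier.apply(current));
            if (start.compareAndSet(old, fresh)) {
                return fresh.newVersion;     // TMObject repointed atomically
            }
            // Another transaction installed a locator first; retry.
        }
    }
}
```

Since `current()` consults the owner's status on every call, a successful `commit()` makes all new versions written by that transaction visible at once, without touching the individual objects.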
When the status flag of the transaction is changed to committed, all the changes to the acquired objects become permanent and visible to other transactions. By using this design, DSTM allows transactional behaviour for accessing and modifying multiple objects in a concurrent environment [6, p.1].

Discussion

When a transaction has finished all writes to different objects, DSTM is able to apply these writes and make them permanent in a single atomic operation, by changing a flag. This design is simple and useful since it reduces the possibility of another transaction reading inconsistent data. Java has methods for atomically switching the value of a single object [7]. It would be possible to apply all writes of a transaction by atomically switching each value, one by one. The problem with this approach is that another transaction might read the data while the writes are being applied and see some of the new data and some of the old, resulting in an inconsistent view. Further efforts are needed to prevent such inconsistent reads.

Each time a transaction opens an object, DSTM validates the transaction by verifying that none of the objects it has read have been modified. This validation is costly and is the bottleneck when executing large queries. If a transaction consists of w writes and r reads, each of the r + w opens validates up to r previously read objects, so the sum of all validations results in a complexity of O(r(r + w)).

DSTM keeps two versions of each object, which means that readers are always presented with the current data and are never blocked. This feature should make DSTM suitable for a read-optimized system that demands fast queries over large amounts of data. A problem is that readers need to validate the data they have read every time they open a new object. If the queries are large, the overhead of validating each read also becomes large. Another problem is that if the validation fails, the entire transaction must be restarted. Even though reads are never blocked, they are aborted more frequently when write contention is high.

In many cases it is beneficial to allow transactions to have different priorities. For example, a transaction that spans multiple tables is more likely to be aborted, since other transactions that modify only a small part of a table are executed and committed at a much faster rate, altering the read set of the slow transaction. In this case, the expensive transaction could be given a higher priority based on its workload. High priority in this context means that the transaction cannot be aborted by a transaction of lower priority.
In DSTM, priority can easily be implemented since transactions are able to detect future conflicts and may resolve them based on their priorities. A read-only transaction in DSTM holds a private read set that contains versions of objects opened in read-only mode. Since these reads are not visible to other transactions, priority can only cover conflicts on objects opened in read-write mode. In FSTM, conflicts are detected late in the transaction, at the point of committing. When a conflict is detected during the acquire phase, the transaction that owns the object causing the conflict is recursively helped, regardless of whether it has a lower priority. Fraser does not discuss any concept of priority among the transactions in FSTM.

In DSTM, each object consists of a pointer to a locator, which contains a reference to the last transaction that modified the object and to two versions of the data. In FSTM, each object points either directly at the data or to the transaction that currently owns the object. When few or no writes are taking place, which is usually the case in a read-optimized system, FSTM imposes a smaller memory overhead than DSTM. A TMObject in DSTM always points to a locator, which in turn holds more data. If no writes are taking place, the locator could drop one of its two versions to save space. Even though the memory overhead is small for each object, it is too large to be acceptable in Coldbase. This is explained further in Appendix A.

The main difference between DSTM and FSTM is that they use different progress guarantees. This affects how the systems are implemented: FSTM implements a helping function and DSTM implements a contention manager. It also affects the degree of visibility between transactions and the point at which transactions may conflict or detect conflicts. In DSTM, transactions are visible to each other whenever they open objects in read-write mode.
Because of this, they can detect conflicting transactions at an early stage and thus avoid wasting effort on executions that are bound to be restarted. In contrast, transactions in FSTM become visible when they acquire the objects of their write-list during the execution of a commit. This means that conflicts between transactions are detected late in the transaction, during the commit. If two transactions conflict, one of them might be invalidated while helping the other. In this case, all the work done by the aborted transaction has been in vain.

3.2 Traditional Concurrency Control Techniques

Concurrency control ensures that transactions execute atomically and that the execution is correct even when transactions are interleaved [8, p.2]. In this section, three traditional approaches to concurrency control are presented.

Timestamp Ordering

In timestamp ordering (TO), each transaction $T_i$ owns a unique timestamp $ts(T_i)$. Each time a transaction operates on a data item, it leaves its timestamp as a footprint. To enable this, and to make the footprint visible to other transactions, two values are associated with each data item:

- rts, the timestamp of the last transaction that read it.
- wts, the timestamp of the transaction that wrote it.

An operation of a transaction is written $o_i[x]$, where $o$ may be a read or a write, $i$ is the transaction id and $x$ is the data item. Two operations conflict if they work on the same data item and at least one of them is a write. The basic idea of TO, as the name implies, is to order conflicting operations based on their timestamps. A general rule, called the TO-rule, states: if $p_i[x]$ and $q_j[x]$ are conflicting operations, then $p_i[x]$ is processed before $q_j[x]$ iff $ts(T_i) < ts(T_j)$ [8, p.114]. That is, if two operations conflict, the operation with the older timestamp is processed before the other. If this rule is followed, the execution is serializable, thus having the same effect as executing the transactions serially. There are different ways of enforcing the TO-rule.

Basic TO

The first approach, called basic TO, is aggressive in that it rejects operations that arrive too late.
If an operation $o_i[x]$ arrives after a conflicting operation $p_j[x]$ has been executed, and $ts(T_i) < ts(T_j)$, then $o_i[x]$ must be rejected in order to enforce the TO-rule. This means that the transaction $T_i$ must be aborted and restarted with a new timestamp. Based on the footprints that transactions leave on data items, it is possible to determine whether a conflicting operation should be rejected. For example, if the operation is a write and its timestamp is older than that of the transaction that most recently wrote the data, the operation must be rejected.

It is not always possible to execute an operation directly, even though it has been accepted as far as the TO-rule is concerned. Say that the operation $o_j[x]$ follows another conflicting operation $p_i[x]$ that is currently being processed. $o_j[x]$ has been accepted by the TO-rule, but before it can be executed the current operation $p_i[x]$ must finish its execution. To keep track of operations in transit, a scheduler keeps records for each data item, tracking the number of reads and writes that are currently being processed. The scheduler also has a queue for operations that are waiting for conflicting operations in transit to finish before they can be executed.

A problem with basic TO is that it is not recoverable. For example, take this schedule, where $c_i$ denotes the commit of $T_i$:

$w_1[x] \; r_2[x] \; w_2[y] \; c_2$

In this schedule, conflicting operations are executed in timestamp order, which means that it is a valid schedule produced by basic TO. The problem is that $T_2$ has read a value written by $T_1$ and then committed before $T_1$. If $T_1$ should abort, then $T_2$ has performed a dirty read ($r_2[x]$) and must also abort, even though it has already committed. This might trigger a chain of cascading aborts and is not an attractive solution, since handling these situations would add complexity to the system.

Strict TO

Another design, called strict TO, enforces strictness, meaning that the risk of dirty reads and cascading aborts is eliminated [8, p.11]. The design is almost the same as for basic TO, but strict TO introduces locks. When a write operation is executed, a scheduler notes that the operation is in transit. This means that no other transaction can operate on the item. In basic TO, the item would be released directly after a successful execution. In strict TO, however, the item is not released until the writing transaction has committed, blocking other transactions from operating on the item in the meantime.

Discussion

Timestamp ordering requires a lot of metadata to be kept. The following data need to be stored in some way for each data item:

- two timestamps used for concurrency control
- two values tracking the number of read and write operations that are in transit
- a queue of operations waiting to be processed.

The current implementation of Coldbase has a minimal overhead for each data item.
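To make the role of the two timestamps concrete, the acceptance test of basic TO can be sketched as below. This is a simplified illustration and not code from Coldbase: the class `TimestampedItem` and its nested `Rejected` exception are hypothetical, `rts` is kept as the largest read timestamp seen so far, and the in-transit counters and the wait queue listed above are replaced by `synchronized` methods.

```java
/** A data item carrying the two timestamps used by basic TO. */
final class TimestampedItem {
    private int value;
    private long rts;   // largest timestamp of a transaction that read the item
    private long wts;   // timestamp of the transaction that last wrote the item

    TimestampedItem(int value) { this.value = value; }

    /** Signals that the operation arrived too late; its transaction must restart. */
    static final class Rejected extends RuntimeException {
        Rejected(String message) { super(message); }
    }

    /** Read by a transaction with timestamp ts. */
    synchronized int read(long ts) {
        if (ts < wts) {
            // A younger transaction has already written the item.
            throw new Rejected("read with ts " + ts + " after write with ts " + wts);
        }
        rts = Math.max(rts, ts);
        return value;
    }

    /** Write by a transaction with timestamp ts. */
    synchronized void write(long ts, int newValue) {
        if (ts < rts || ts < wts) {
            // A younger transaction has already read or written the item.
            throw new Rejected("write with ts " + ts + " arrived too late");
        }
        value = newValue;
        wts = ts;
    }
}
```

Note that `read` can throw as well: a read-only query whose timestamp is older than the latest write to an item is rejected and has to be restarted with a new timestamp.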
When Coldbase was designed, storing the data items as objects was considered, since this would make it possible to use generics and the Java Collections API. Arrays of primitive values were chosen instead, since objects introduce an unnecessary memory overhead. By using primitive values, each data item occupies only 4 bytes [1, p.16]. If timestamp ordering were to be implemented, each data item would cause a memory overhead of at least $2 \cdot 4 + 2 \cdot 4 + n \cdot 4 = 4n + 16$ bytes, where $n$ is the number of operations that are held in the queue, waiting to be processed. This is not a feasible solution since the database system must be optimized for speed and memory. However, it is reasonable to expect that the size of memory will continue to increase in the future, as will the number of cores in processors. With looser constraints on memory, algorithms will focus even more on optimizing databases for concurrent execution.

Coldbase is a read-optimized database system executing a lot of read-only queries concurrently. This means that whichever concurrency control is used, it should aim to interrupt queries as little as possible. When examining the algorithm of timestamp ordering, one finds that in some cases a read can be not only blocked but even aborted.
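To put the per-item estimate of $4n + 16$ bytes given earlier in this discussion into perspective, consider a hypothetical column of $10^6$ values with empty wait queues ($n = 0$), which is the most favourable case. The column size is an assumption chosen only for this illustration.

\[
\text{data} = 10^6 \cdot 4\ \text{bytes} = 4\ \text{MB},
\qquad
\text{metadata} \ge 10^6 \cdot (4 \cdot 0 + 16)\ \text{bytes} = 16\ \text{MB}
\]

Even in this best case, the timestamp-ordering metadata would be at least four times the size of the data it protects.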

DATABASE TRANSACTIONS. CS121: Relational Databases Fall 2017 Lecture 25

DATABASE TRANSACTIONS. CS121: Relational Databases Fall 2017 Lecture 25 DATABASE TRANSACTIONS CS121: Relational Databases Fall 2017 Lecture 25 Database Transactions 2 Many situations where a sequence of database operations must be treated as a single unit A combination of

More information

Relaxing Concurrency Control in Transactional Memory. Utku Aydonat

Relaxing Concurrency Control in Transactional Memory. Utku Aydonat Relaxing Concurrency Control in Transactional Memory by Utku Aydonat A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy Graduate Department of The Edward S. Rogers

More information

Concurrency control CS 417. Distributed Systems CS 417

Concurrency control CS 417. Distributed Systems CS 417 Concurrency control CS 417 Distributed Systems CS 417 1 Schedules Transactions must have scheduled so that data is serially equivalent Use mutual exclusion to ensure that only one transaction executes

More information

Concurrency Control in Distributed Systems. ECE 677 University of Arizona

Concurrency Control in Distributed Systems. ECE 677 University of Arizona Concurrency Control in Distributed Systems ECE 677 University of Arizona Agenda What? Why? Main problems Techniques Two-phase locking Time stamping method Optimistic Concurrency Control 2 Why concurrency

More information

A can be implemented as a separate process to which transactions send lock and unlock requests The lock manager replies to a lock request by sending a lock grant messages (or a message asking the transaction

More information

Multi-User-Synchronization

Multi-User-Synchronization Chapter 10 Multi-User-Synchronization Database Systems p. 415/569 Why Run TAs Concurrently? We could run all TAs serially (one after the other) This would prevent all unwanted side effects However, this

More information

Foundation of Database Transaction Processing. Copyright 2012 Pearson Education, Inc.

Foundation of Database Transaction Processing. Copyright 2012 Pearson Education, Inc. Foundation of Database Transaction Processing Copyright 2012 Pearson Education, Inc. Chapter Outline - 17.1 Introduction to Transaction Processing - 17.2 Transaction and System Concepts - 17.3 Desirable

More information

Atomic Transac1ons. Atomic Transactions. Q1: What if network fails before deposit? Q2: What if sequence is interrupted by another sequence?

Atomic Transac1ons. Atomic Transactions. Q1: What if network fails before deposit? Q2: What if sequence is interrupted by another sequence? CPSC-4/6: Operang Systems Atomic Transactions The Transaction Model / Primitives Serializability Implementation Serialization Graphs 2-Phase Locking Optimistic Concurrency Control Transactional Memory

More information

Unit 10.5 Transaction Processing: Concurrency Zvi M. Kedem 1

Unit 10.5 Transaction Processing: Concurrency Zvi M. Kedem 1 Unit 10.5 Transaction Processing: Concurrency 2016 Zvi M. Kedem 1 Concurrency in Context User Level (View Level) Community Level (Base Level) Physical Level DBMS OS Level Centralized Or Distributed Derived

More information

6.852: Distributed Algorithms Fall, Class 20

6.852: Distributed Algorithms Fall, Class 20 6.852: Distributed Algorithms Fall, 2009 Class 20 Today s plan z z z Transactional Memory Reading: Herlihy-Shavit, Chapter 18 Guerraoui, Kapalka, Chapters 1-4 Next: z z z Asynchronous networks vs asynchronous

More information

DB2 Lecture 10 Concurrency Control

DB2 Lecture 10 Concurrency Control DB2 Lecture 10 Control Jacob Aae Mikkelsen November 28, 2012 1 / 71 Jacob Aae Mikkelsen DB2 Lecture 10 Control ACID Properties Properly implemented transactions are commonly said to meet the ACID test,

More information

Chapter 13 : Concurrency Control

Chapter 13 : Concurrency Control Chapter 13 : Concurrency Control Chapter 13: Concurrency Control Lock-Based Protocols Timestamp-Based Protocols Validation-Based Protocols Multiple Granularity Multiversion Schemes Insert and Delete Operations

More information

Optimistic Concurrency Control. April 18, 2018

Optimistic Concurrency Control. April 18, 2018 Optimistic Concurrency Control April 18, 2018 1 Serializability Executing transactions serially wastes resources Interleaving transactions creates correctness errors Give transactions the illusion of isolation

More information

Concurrent & Distributed Systems Supervision Exercises

Concurrent & Distributed Systems Supervision Exercises Concurrent & Distributed Systems Supervision Exercises Stephen Kell Stephen.Kell@cl.cam.ac.uk November 9, 2009 These exercises are intended to cover all the main points of understanding in the lecture

More information

Intro to Transactions

Intro to Transactions Reading Material CompSci 516 Database Systems Lecture 14 Intro to Transactions [RG] Chapter 16.1-16.3, 16.4.1 17.1-17.4 17.5.1, 17.5.3 Instructor: Sudeepa Roy Acknowledgement: The following slides have

More information

Optimistic Concurrency Control. April 13, 2017

Optimistic Concurrency Control. April 13, 2017 Optimistic Concurrency Control April 13, 2017 1 Serializability Executing transactions serially wastes resources Interleaving transactions creates correctness errors Give transactions the illusion of isolation

More information

CSE 530A ACID. Washington University Fall 2013

CSE 530A ACID. Washington University Fall 2013 CSE 530A ACID Washington University Fall 2013 Concurrency Enterprise-scale DBMSs are designed to host multiple databases and handle multiple concurrent connections Transactions are designed to enable Data

More information

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) ) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) Goal A Distributed Transaction We want a transaction that involves multiple nodes Review of transactions and their properties

More information

Concurrency Control. [R&G] Chapter 17 CS432 1

Concurrency Control. [R&G] Chapter 17 CS432 1 Concurrency Control [R&G] Chapter 17 CS432 1 Conflict Serializable Schedules Two schedules are conflict equivalent if: Involve the same actions of the same transactions Every pair of conflicting actions

More information

T ransaction Management 4/23/2018 1

T ransaction Management 4/23/2018 1 T ransaction Management 4/23/2018 1 Air-line Reservation 10 available seats vs 15 travel agents. How do you design a robust and fair reservation system? Do not enough resources Fair policy to every body

More information

Lock-free Serializable Transactions

Lock-free Serializable Transactions Lock-free Serializable Transactions Jeff Napper jmn@cs.utexas.edu Lorenzo Alvisi lorenzo@cs.utexas.edu Laboratory for Advanced Systems Research Department of Computer Science The University of Texas at

More information

TRANSACTIONS AND ABSTRACTIONS

TRANSACTIONS AND ABSTRACTIONS TRANSACTIONS AND ABSTRACTIONS OVER HBASE Andreas Neumann @anew68! Continuuity AGENDA Transactions over HBase: Why? What? Implementation: How? The approach Transaction Manager Abstractions Future WHO WE

More information

Database Tuning and Physical Design: Execution of Transactions

Database Tuning and Physical Design: Execution of Transactions Database Tuning and Physical Design: Execution of Transactions Spring 2018 School of Computer Science University of Waterloo Databases CS348 (University of Waterloo) Transaction Execution 1 / 20 Basics

More information

Database Systems. Announcement

Database Systems. Announcement Database Systems ( 料 ) December 27/28, 2006 Lecture 13 Merry Christmas & New Year 1 Announcement Assignment #5 is finally out on the course homepage. It is due next Thur. 2 1 Overview of Transaction Management

More information

Mobile and Heterogeneous databases Distributed Database System Transaction Management. A.R. Hurson Computer Science Missouri Science & Technology

Mobile and Heterogeneous databases Distributed Database System Transaction Management. A.R. Hurson Computer Science Missouri Science & Technology Mobile and Heterogeneous databases Distributed Database System Transaction Management A.R. Hurson Computer Science Missouri Science & Technology 1 Distributed Database System Note, this unit will be covered

More information

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) ) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) Goal A Distributed Transaction We want a transaction that involves multiple nodes Review of transactions and their properties

More information

Concurrency Control. Conflict Serializable Schedules. Example. Chapter 17

Concurrency Control. Conflict Serializable Schedules. Example. Chapter 17 Concurrency Control Chapter 17 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Conflict Serializable Schedules Two schedules are conflict equivalent if: Involve the same actions of the

More information

Data Modeling and Databases Ch 14: Data Replication. Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich

Data Modeling and Databases Ch 14: Data Replication. Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich Data Modeling and Databases Ch 14: Data Replication Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich Database Replication What is database replication The advantages of

More information

Introduction to Transaction Processing Concepts and Theory

Introduction to Transaction Processing Concepts and Theory Chapter 4 Introduction to Transaction Processing Concepts and Theory Adapted from the slides of Fundamentals of Database Systems (Elmasri et al., 2006) 1 Chapter Outline Introduction to Transaction Processing

More information

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 17-1

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 17-1 Slide 17-1 Chapter 17 Introduction to Transaction Processing Concepts and Theory Chapter Outline 1 Introduction to Transaction Processing 2 Transaction and System Concepts 3 Desirable Properties of Transactions

More information

! Part I: Introduction. ! Part II: Obstruction-free STMs. ! DSTM: an obstruction-free STM design. ! FSTM: a lock-free STM design

! Part I: Introduction. ! Part II: Obstruction-free STMs. ! DSTM: an obstruction-free STM design. ! FSTM: a lock-free STM design genda Designing Transactional Memory ystems Part II: Obstruction-free TMs Pascal Felber University of Neuchatel Pascal.Felber@unine.ch! Part I: Introduction! Part II: Obstruction-free TMs! DTM: an obstruction-free

More information

transaction - (another def) - the execution of a program that accesses or changes the contents of the database

transaction - (another def) - the execution of a program that accesses or changes the contents of the database Chapter 19-21 - Transaction Processing Concepts transaction - logical unit of database processing - becomes interesting only with multiprogramming - multiuser database - more than one transaction executing

More information

Transaction Management: Concurrency Control, part 2

Transaction Management: Concurrency Control, part 2 Transaction Management: Concurrency Control, part 2 CS634 Class 16 Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke Locking for B+ Trees Naïve solution Ignore tree structure,

More information

Locking for B+ Trees. Transaction Management: Concurrency Control, part 2. Locking for B+ Trees (contd.) Locking vs. Latching

Locking for B+ Trees. Transaction Management: Concurrency Control, part 2. Locking for B+ Trees (contd.) Locking vs. Latching Locking for B+ Trees Transaction Management: Concurrency Control, part 2 Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke CS634 Class 16 Naïve solution Ignore tree structure,

More information

Transaction Processing: Concurrency Control. Announcements (April 26) Transactions. CPS 216 Advanced Database Systems

Transaction Processing: Concurrency Control. Announcements (April 26) Transactions. CPS 216 Advanced Database Systems Transaction Processing: Concurrency Control CPS 216 Advanced Database Systems Announcements (April 26) 2 Homework #4 due this Thursday (April 28) Sample solution will be available on Thursday Project demo

More information

Chapter 9: Concurrency Control

Chapter 9: Concurrency Control Chapter 9: Concurrency Control Concurrency, Conflicts, and Schedules Locking Based Algorithms Timestamp Ordering Algorithms Deadlock Management Acknowledgements: I am indebted to Arturas Mazeika for providing

More information

Graph-based protocols are an alternative to two-phase locking Impose a partial ordering on the set D = {d 1, d 2,..., d h } of all data items.

Graph-based protocols are an alternative to two-phase locking Impose a partial ordering on the set D = {d 1, d 2,..., d h } of all data items. Graph-based protocols are an alternative to two-phase locking Impose a partial ordering on the set D = {d 1, d 2,..., d h } of all data items. If d i d j then any transaction accessing both d i and d j

More information

Consistency in Distributed Systems

Consistency in Distributed Systems Consistency in Distributed Systems Recall the fundamental DS properties DS may be large in scale and widely distributed 1. concurrent execution of components 2. independent failure modes 3. transmission

More information

Concurrency Control. Chapter 17. Comp 521 Files and Databases Spring

Concurrency Control. Chapter 17. Comp 521 Files and Databases Spring Concurrency Control Chapter 17 Comp 521 Files and Databases Spring 2010 1 Conflict Serializable Schedules Recall conflicts (WW, RW, WW) were the cause of sequential inconsistency Two schedules are conflict

More information

CS5412: TRANSACTIONS (I)

CS5412: TRANSACTIONS (I) 1 CS5412: TRANSACTIONS (I) Lecture XVII Ken Birman Transactions 2 A widely used reliability technology, despite the BASE methodology we use in the first tier Goal for this week: in-depth examination of

More information
