Buffer Manager: Project 1 Assignment

Similar documents
Overview of Project V Buffer Manager. Steps of Phase II

Buffer Manager: Project 1 Assignment

Assignment 1: PostgreSQL Buffer Manager

IMPORTANT: Circle the last two letters of your class account:

The print queue was too long. The print queue is always too long shortly before assignments are due. Print your documentation

Setting up PostgreSQL

COMP 3500 Introduction to Operating Systems Project 5 Virtual Memory Manager

Blocky: A Game of Falling Blocks

Hellerstein/Olston. Homework 6: Database Application. beartunes. 11:59:59 PM on Wednesday, December 6 th

Inside the PostgreSQL Shared Buffer Cache

CMSC 201 Fall 2016 Homework 6 Functions

Programming Standards: You must conform to good programming/documentation standards. Some specifics:

Table Of Contents. 1. Zoo Information a. Logging in b. Transferring files 2. Unix Basics 3. Homework Commands

On Line Bank Project: Phase II Assignment UC Berkeley Computer Science 186 Fall 2002 Introduction to Database Systems September 24, 2002

Data Management, Spring 2015: Project #1

Programming Assignment IV Due Monday, November 8 (with an automatic extension until Friday, November 12, noon)

Programming Project 4: COOL Code Generation

Programming Assignment III

Project C: B+Tree. This project may be done in groups of up to three people.

University of California, Berkeley. (2 points for each row; 1 point given if part of the change in the row was correct)

University of California, Berkeley. CS 186 Introduction to Databases, Spring 2014, Prof. Dan Olteanu MIDTERM

Managing Storage: Above the Hardware

Using the Zoo Workstations

6.170 Laboratory in Software Engineering Java Style Guide. Overview. Descriptive names. Consistent indentation and spacing. Page 1 of 5.

GEO 425: SPRING 2012 LAB 9: Introduction to Postgresql and SQL

Tips from the experts: How to waste a lot of time on this assignment

CS155: Computer Security Spring Project #1

Project 2: Buffer Manager

Distributed: Monday, Nov. 30, Due: Monday, Dec. 7 at midnight

COSC-4411(M) Midterm #1

CS355 Hw 4. Interface. Due by the end of day Tuesday, March 20.

Homework # 7 DUE: 11:59pm November 15, 2002 NO EXTENSIONS WILL BE GIVEN

CMPSC 311- Introduction to Systems Programming Module: Assignment #2 Cartridge HRAM Filesystem

CS 564, Spring 2017 BadgerDB Assignment #1: Buffer Manager

Carnegie Mellon University Database Applications Fall 2009, Faloutsos Assignment 5: Indexing (DB-internals) Due: 10/8, 1:30pm, only

CS 1110 SPRING 2016: GETTING STARTED (Jan 27-28) First Name: Last Name: NetID:

CS 1550 Project 3: File Systems Directories Due: Sunday, July 22, 2012, 11:59pm Completed Due: Sunday, July 29, 2012, 11:59pm

Tips from the experts: How to waste a lot of time on this assignment

CS2630: Computer Organization Project 2, part 1 Register file and ALU for MIPS processor Due July 25, 2017, 11:59pm

Programming Assignment IV

It is academic misconduct to share your work with others in any form including posting it on publicly accessible web sites, such as GitHub.

Fall 2004 CS 186 Discussion Section Exercises - Week 2 ending 9/10

University of California at Berkeley College of Engineering Department of Electrical Engineering and Computer Science. EECS 150 Spring 2000

CS2630: Computer Organization Project 2, part 1 Register file and ALU for MIPS processor

CS/IT 114 Introduction to Java, Part 1 FALL 2016 CLASS 3: SEP. 13TH INSTRUCTOR: JIAYIN WANG

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2017 Lecture 15

CMSC 201 Spring 2017 Lab 01 Hello World

Programming Assignment IV Due Thursday, June 1, 2017 at 11:59pm

Page 1 of 7. public class EmployeeAryAppletEx extends JApplet

Programming Tips for CS758/858

CS Exam 1 Review Suggestions - Spring 2017

Goals for Today. CS 133: Databases. Relational Model. Multi-Relation Queries. Reason about the conceptual evaluation of an SQL query

Basic Uses of JavaScript: Modifying Existing Scripts

CS159 - Assignment 2b

COSC 6374 Parallel Computation. Debugging MPI applications. Edgar Gabriel. Spring 2008

STORING DATA: DISK AND FILES

UNIVERSITY OF CALIFORNIA Department of Electrical Engineering and Computer Sciences Computer Science Division. P. N. Hilfinger

CSCI0330 Intro Computer Systems Doeppner. Lab 02 - Tools Lab. Due: Sunday, September 23, 2018 at 6:00 PM. 1 Introduction 0.

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2014 Lecture 14

Cache Lab Implementation and Blocking

Homework , Fall 2013 Software process Due Wednesday, September Automated location data on public transit vehicles (35%)

2 Steps Building the Coding and Testing Infrastructure

Important Project Dates

CSCI 201 Lab #11 Prof. Jeffrey Miller 1/23. Lab #11 CSCI 201. Title MySQL Installation. Lecture Topics Emphasized Databases

Project #1: Tracing, System Calls, and Processes

Computer Systems II. Memory Management" Subdividing memory to accommodate many processes. A program is loaded in main memory to be executed

Programming Assignment IV Due Thursday, November 18th, 2010 at 11:59 PM

Project #1 Exceptions and Simple System Calls

ΗΥ460 Συστήματα Διαχείρισης Βάσεων Δεδομένων Χειμερινό Εξάμηνο 2017 Διδάσκοντες: Βασίλης Χριστοφίδης, Δημήτρης Πλεξουσάκης, Χαρίδημος Κονδυλάκης

CS 051 Homework Laboratory #2

CS 186/286 Spring 2018 Midterm 1

Database Applications (15-415)

CS143 Handout 05 Summer 2011 June 22, 2011 Programming Project 1: Lexical Analysis

Project 4: Implementing Malloc Introduction & Problem statement

PART 1: Buffer Management

Important Project Dates

CSC 258 lab notes, Fall 2003

Assignment 3 ITCS-6010/8010: Cloud Computing for Data Analysis

HOT-Compilation: Garbage Collection

CMSC 201 Spring 2018 Lab 01 Hello World

Vnodes. Every open file has an associated vnode struct All file system types (FFS, NFS, etc.) have vnodes

EXPERIMENT 1. FAMILIARITY WITH DEBUG, x86 REGISTERS and MACHINE INSTRUCTIONS

Introduction. Overview and Getting Started. CS 161 Computer Security Lab 1 Buffer Overflows v.01 Due Date: September 17, 2012 by 11:59pm

Repetition Using the End of File Condition

7 Cmicro Targeting. Tutorial. Chapter

(Refer Slide Time: 01:25)

Operating Systems (234123) Spring 2017 (Homework Wet 1) Homework 1 Wet

CSCi 4061: Intro to Operating Systems Spring 2017 Instructor: Jon Weissman Assignment 1: Simple Make Due: Feb. 15, 11:55 pm

CSCE 548 Building Secure Software SQL Injection Attack

Assignment #1: /Survey and Karel the Robot Karel problems due: 1:30pm on Friday, October 7th

Mehran Sahami Handout #7 CS 106A September 24, 2014

Second Midterm Exam March 21, 2017 CS162 Operating Systems

Lab 03 - x86-64: atoi

CSE451 Winter 2012 Project #4 Out: February 8, 2012 Due: March 5, 2011 by 11:59 PM (late assignments will lose ½ grade point per day)

Unit 2 Buffer Pool Management

Debugging and profiling in R

CS 186/286 Spring 2018 Midterm 1

CS W Introduction to Databases Spring Computer Science Department Columbia University

In either case, remember to delete each array that you allocate.

CS220 Database Systems. File Organization

Transcription:

Buffer Manager: Project 1 Assignment Due: Feb 11, 2003 Introduction to Database Systems, UC Berkeley Computer Science 186 Spring 2003 1 Overview of Project 1 - Buffer Manager In this project you will add functionality to the backend server of PostgreSQL. You need to delve into the actual server code. In this first assignment we limit our investigation to one core module, the buffer manager. The actual coding for this project is minimal, but you will need to figure out what code to add and where, which is non-trivial! The major parts of this project are: Understanding different memory management strategies Examining, understanding and modifying existing code Measuring the effects of your code in a real system Due to disk space constraints, we are providing you with a leaner PostgreSQL source tree. Some features (contributed utilities, regression tests, manual, tutorials and other language support) have been removed. You will not need these for this project, but you are free to download them if you have the space. Even with these modifications, you will not have extra space to store other files, so be sure to remove them before beginning 1. Make sure your code runs smoothly on the x86 Solaris EECS instructional machines (rhombus, pentagon, and torus.cs.berkeley.edu) prior to submission since ALL grading will done on those. However, since all the software is freely available, you may find it easier to develop your code at home or on other machines you have access to. The instructions provided are designed to work with the EECS instructional machines and may not work as smoothly elsewhere. The TAs and Professor will only support EECS instructional machines. 2 Tasks and Steps (1) Learn different buffer management strategies. (30%) 1 You can allocate temp space via /share/b/bin/mkhometmpdir. See http://inst.eecs.berkeley.edu/cgi-bin/pub.cgi?file=/usr/pub/disk.quotas for details. Handout generated on 30 January 2003

(2) Compile and install PostgreSQL. (3) Implement CLOCK. (40%) (4) Compare performance of LRU, MRU and CLOCK. (30%) (5) Submit. Please spend some time to read this assignment completely before beginning. 3 Learn Different Buffer Management Strategies (30%) The textbook describes LRU, MRU and CLOCK strategies in section 9.4.1. However, you may need to read the whole section of 9.4 to get a full image of buffer manager in DBMS. As a first step, we present a small exercise to test your knowledge of the replacement strategies. For LRU, MRU, and CLOCK, you need to trace the buffer contents for the workload listed below. For your solutions, list each time that a memory access caused a miss in the buffer pool, and which (if any) page is evicted from the buffer pool to make room for the missing page. Assume time starts from T1 and moves forward by one on each page reference. Solution format for this step: You must record your answers in a plain text file called strategies. The format of the file is two columns of text, separated by spaces. The file should begin with your measurements from LRU, with each row listing the time of a miss, a space, and the page (if any) evicted from memory. Then do the same for MRU and CLOCK in that order, leaving a blank line in between. For example: Time T1 T2 T5 T2 T5 T4 T5 Page Removed W Z Z W LRU (a blank line here) MRU (a blank line here) CLOCK Note that not each time is listed for each strategy, since not each time has a buffer miss. Note also that the second column will be blank when the buffer miss does not result in an eviction. Assume there are 4 pages your buffer manager must manage. All four pages are empty to start. For each access the page is pinned, and then immediately unpinned 2. The 2 You cannot make this same assumption when writing the actual code. A page may be pinned for any length of time. 2

following table provides a list of page accesses at each time step. Based on this information, you should be able to generate the strategies file correctly. Time Page Read Time Page Read Time Page Read Time Page Read T1 A T7 E T13 A T19 A T2 B T8 A T14 B T20 F T3 C T9 B T15 B T21 C T4 A T10 F T16 D T22 A T5 D T11 C T17 G T23 B T6 E T12 G T18 F T24 G 4 Compile and Test PostgreSQL Everything needed for this project is available at ~cs186/sp03/hw1/hw1 pkg.tar.gz on all instructional machines. To begin with, copy this package to your own directory and untar it with: gtar -xzf hw1 pkg.tar.gz 3. You will see three subdirectories in your Hw1/ directory, which are: (1) postgresql-7.2.2/: Source code of PostgreSQL version 7.2.2. Little code was changed between this version and the actual version. The block size was changed to be smaller, in order to generate more I/Os. We added some scripts to help you compile your code. We also added some debugging/logging lines in the code. (2) src/: In this directory you will find freelist.c.mru, which is our implementation of MRU. PostgreSQL ships with LRU, so our implementation of MRU is intended to provide you with an example of how to change the buffer manager s replacement policy. You will want to compare this code to postgresql-7.2.2/backend/storage/buffer/freelist.c to study the changes we made. You will also find stub/ in this directory, which is the test harness code we will discuss later in Section 5.3. (3) exec/: Some scripts you need to run your code. Here are instructions for compiling PostgreSQL and updating your implementation to PostgreSQL. (1) To completely re-compile and install PostgreSQL, cd postgresql-7.2.2 and run./compile.sh. (2) To only update your implementation to PostgreSQL and re-compile it, edit your own copy of freelist.c and buf internals.h in the src/ directory, cd postgresql-7.2.2, and run./update.sh <your-freelist.c> <your-buf internals.h>. NOTE: You should edit your own versions of freelist.c and buf internals.h in the src/ directory, and use the./update.sh command to install them. 3 Remember to remove it if you are out of disk space. 3

We STRONGLY recommend that you do not modify the files in the postgresql-7.2.2 directory directly! update.sh will use your files to replace related files in src/backend/storage/buffer/ and src/include/storage/, and compile your code with PostgreSQL. The original LRU files have been copied for your reference as *.c.lru and *.h.lru in their original directories. Of course you need to pay attention to any errors produced by the compiler while running this script. 5 Implement CLOCK (40%) 5.1 How to implement? LRU (Least Recently Used) is the most common policy used by operating and database systems. However, direct implementation of LRU is pretty expensive. So some systems use a policy called CLOCK instead. While CLOCK approximates the behavior of the LRU scheme, it is much faster. The description of CLOCK algorithm is available in Section 9.4.1 of the textbook. We give a description here to assist your implementation. To implement the CLOCK replacement policy, you need to maintain the following information. You are free to maintain additional information if you want. But the following two pieces of information are necessary. (1) A referenced bit associated with each page. Each time a page in the buffer pool is done being accessed, the referenced bit associated with the corresponding frame should be set to 1 before the page is considered for replacement. In your implementation, you can simply set the referenced bit to 1 when you unpin a page and its pin count goes to 0 that is the first time that the the page is available to the replacement strategy. (2) The clock hand (current in the textbook): varies between 0 and NBuffers - 1, where NBuffers is the number of buffers in the PostgreSQL buffer pool. Each time a page needs to be found for replacement, the search begins from from the current page and advances to current = (current + 1) % NBuffers if not found (i.e. in a clockwise cycle). Each page in the buffer pool is in one of three states: pinned: pin count > 0 referenced: pin count = 0 and referenced = 1 available: pin count = 0 and referenced = 0 To find a page for replacement, you start from the current page, repeat the following steps until you find a page or all pages are pinned. If: page[current] is pinned: advance current and try again. 4

page[current] is referenced: set status to available, advance current and try again. page[current] is available: use this page and advance current. 5.2 Which files to modify? The actual implementation of CLOCK is rather straightforward once you understand the existing code. One hint we d like to give here is that you don t need to bother with the LRU freelist queue. You can add and manage any new data structure you need without doing anything with the old ones. The existing code is not extremely clear, but it is understandable. It may take you a few hours (or more) to understand it. Since understanding the existing code is a significant part of the assignment, the TAs and Professor will not assist you in understanding the existing code (beyond what we discuss here). The actual buffer manager code is neatly separated from the rest of the codebase. It is located in src/backend/storage/buffer and src/include/storage. For CLOCK, we are interested in modifying only two files. src/backend/storage/buffer/freelist.c, manages which pages are not pinned in memory, and which pages are eligible for replacement. src/include/storage/buf internals.h contains the definition for each buffer description (called BufferDesc). Most of the fields in BufferDesc are managed by other parts of the code, but for efficiency some fields are used to manage the buffer space while it is eligible for replacement (when it s on the existing LRU freelist). Some initialization of the queue occurs in src/backend/storage/buffer/buf init.c. However, you must do all your initialization in freelist.c. You will modify these two files to change the implementation from the existing LRU queue to the CLOCK discipline. You may find that you do not need to edit both files or change many of the functions. N.B.: You may need to declare global variables to make your code work. While this is generally poor programming style and you may have been taught to avoid it, it is the most natural solution given the existing structure of PostgreSQL. It s wise to use the C modifier static to limit the scope of your global variable to a single.c file. Your code will be tested with both our test harness stub (30%, see description below) and running in PostgreSQL on a real dataset (10%). Unfortunately there is no partial credit for each test. Be sure it is correct. A bad buffer manager makes the rest of the system useless! 5.3 How to debug? We suggest that you debug by running a debugger on the Postgres backend in standalone mode. You can following the instructions given during the C/GDB review 5

section. More information on using the backend in stand-alone mode is available at: http://www.postgresql.org/idocs/index.php?app-postgres.html We suggest you following steps for debugging 4 : (1) If you want to log which pages are loaded and which pages hit, add #define CS186 HW1 DEBUG in your buf interals.h. The debugging code we added for you is in src/backend/storage/buffer/bufmgr.c, and you are welcome to read or even change it as you see fit. You can also use following function to add your own entries to the Postgres debugging output: elog(debug,<format>,<arg>) see bufmgr.c for examples 5. (2) Update and compile your code using update.sh as described in section 4. You don t need to use gmake install to install it here! (3) Prepare your data directory: run./init.sh <DATADIR> and./pre.sh <DATADIR> in ~/Hw1/exec 6../pre.sh will create a database named test and two tables. NOTICE: the script will completely remove the <DATADIR>. So be careful. (4) Run src/backend/postgres (in a debugger if you like). Make sure you are running this version of postgres, not the one in the class account directory! To start the backend server in stand-alone mode, run src/backend/postgres -s -D <DATADIR> test. It will use the data directories (and data) you prepared in the last step. The interface is directly to the backend, so psql features will not work. If your binary does not work correctly, it may corrupt the data, so be warned. (5) Through the backend, you can directly type SQL statements, although the output is somewhat painful to read. To exit, type CTRL-D. In addition, the -s option will tell the server to print statistics at the end of each query, including the number of buffer hits! Finally, you can restrict the number of buffers PostgreSQL will use by adding the -B <nbuffers> 7 option so that even small queries generate buffer misses. Test harness stub: A small test harness stub program is available in ~Hw1/src/stub/*. Copy your versions of freelist.c and buf internals.h to override the original ones in src/stub, and run make to compile the test harness stub. You may want to modify the stub to add additional test cases. The actual test stub we use for grading will be different; the one provided is only for your convenience (and to make sure your code will compile with our test stub)! 4 This is not a requirement, of course, but we thought you d find this method useful. 5 Use (postgres >& <filename>) to redirect the debugging output to some file. 6 The data directory you created in Homework 0 may not be compatible with this project, because we changed block size of each buffer smaller than the original PostgreSQL. So, make sure you create a new directory 7 <nbuffers> should not be smaller than 16. 6

6 Compare performance of LRU, MRU and CLOCK (30%) Here you will compare CLOCK with LRU and MRU. There are three scripts we prepared for you, all in ~/Hw1/exec: (1)./init.sh <DATADIR>: the script for you to init the data. (2)./pre.sh <DATADIR> [index]: [index] is an optional flag that you will use in some experiments to create a table with an index on its primary key. (3)./run.sh <logfile> <DATADIR> <clock lru mru> <bufnum>: Make sure you have compiled 8 and installed 9 your CLOCK version before you run it with the clock flag. The statistics of the query will be written in <logfile>. <DATADIR> is the same directory as (1) and (2). Use clock, lru or mru to assign which replace policy you want to use. <bufnum> is the number of buffers you want to use. The <logfile> you get is like: DEBUG: EXECUTOR STATISTICS! system usage stats:!............ DEBUG: EXECUTOR STATISTICS! system usage stats:!............! postgres usage stats:! Shared blocks: 842 read, 0 written, buffer hit rate = 86.31%!............ This is interpreted as follows: 842 is the actual read requests that had to be passed onto the disk (= total page requests for reads - buffer hits for reads), and 86.31% is the hit rate of the query (= total buffer hits for reads / total page requests for reads). The SQL scripts of creating tables and running queries are available at ~cs186/sp03/hw1/*.sql. You can read them if you re interested. The query.sql script basically does: SELECT * from S,R where R.fkey=S.key. You are asked to do two experiments. (1) Prepare your data without index on S:./init.sh <DATADIR>./pre.sh <DATADIR> Run./run.sh with lru or mru or clock and different <bufnum>, fill table below: 8 Refer to Section 4 about how to compile 9 Use gmake install 7

<bufnum> 250 270 290 310 330 350 LRU hit rate MRU hit rate CLOCK hit rate (2) Prepare your data with index on primary key of S:./init.sh <DATADIR>./pre.sh <DATADIR> index Run./run.sh with lru or mru or clock and different <bufnum>, fill table below: <bufnum> 250 270 290 310 330 350 LRU hit rate MRU hit rate CLOCK hit rate Please answer some questions based on experiment results you get: (1) For experiment 1, did each strategy perform as we described in class? Answer yes or no, and explain your answer. (2) For experiment 2, what are the main differences compared with experiment 1? Explain the cause of these differences. (3) Tables R and S are the same size. Can you tell the approximate <bufnum> used by R (or S)? How do you know that? A Hint: In answering the questions above, you may be interested to know that PostgreSQL uses 278 buffers to initialize your query (e.g. load system tables), and unpins these buffers before the query begins running. 7 Submit You must submit at least 5 files (even if they are incomplete or unchanged): (1) README - the members of your group (names and class accounts), what works, and what doesn t. (2) strategies - answers to the first part, in the correct format! (3) freelist.c.clock - CLOCK implementation (4) buf internals.h.clock - CLOCK implementation (5) perf.txt - TEXT ONLY! Performance comparison, tables, answer to the question. Make sure all the files above are in a directory called ~/Project1 in one of your group member s class account. cd to that directory, and type submit Project1 to submit the files. Each group only needs to submit once. If there are multiple submissions, the last submission by any group member will be used for grading and the calculation of slip days. 8