A Private Heap for HDF5
Quincey Koziol
Jan. 15, 2007


Background

The HDF5 library currently stores variable-sized data in two different data structures in its files. Variable-sized metadata (currently only the names of links) is stored in a local heap for each group (http://hdf.ncsa.uiuc.edu/hdf5/doc/h5.format.html#localheap). Variable-sized raw data for datasets (such as variable-length and dataset region reference datatypes) is stored in the file's global heap (http://hdf.ncsa.uiuc.edu/hdf5/doc/h5.format.html#globalheap).

Limitations of Current Heap Implementations

Local heaps have several limitations that need to be overcome: the heap data is stored in a single block, and the heap ID used to locate an entry in a local heap is the offset of that entry within the data block. Keeping the heap data in a single block requires that the data block be relocated in the file when it expands, and it currently requires that the entire data block be read from disk when any entry in it is accessed. Using the offset of the entry within the data block as the heap ID allows for fast lookups of the entry's data, but means that entries cannot be relocated within the data block.

Global heaps have different limitations to overcome: the heap's data blocks are shared among all global heap users (currently only the datasets) in the file, the data blocks are not tracked by any data structure, and an entry in a data block is located with a linear search. Sharing the data blocks among all datasets in the file means that when a data block is brought into the cache in order to access a particular entry, it may contain only a single entry for the dataset being accessed. Additionally, sharing data blocks among all datasets in the file makes deleting the heap entries used by a dataset being deleted very difficult and time-consuming (each element of the dataset being deleted must be accessed and the individual heap entry must be released). The lack of a data structure tracking all data blocks for a file's global heap means that there is no way to locate existing data blocks when none are cached and new entries need to be stored in the heap (requiring a new heap data block to be allocated). Locating an entry in a data block with a linear search is time-consuming when a data block is large and stores many small entries.

Motivation for a Redesigned Heap

As the format for groups in HDF5 is currently being redesigned (in preparation for the 1.8.0 release), it's a good time to update the design of the local heaps. Moreover, it would be nice if there were only one type of heap in the file, so the redesigned heap should attempt to supplant the global heaps as well as the local heaps. Each object in the file (dataset or group) that needs to store variable-length data would have its own heap; hence the new heaps are called "private heaps" throughout the rest of this document.
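For context, the following minimal C sketch shows the kind of user-level operation whose raw data currently lands in the global heap: writing a dataset with a variable-length datatype. It uses the H5Dcreate2 signature from the HDF5 1.8 API that this document anticipates; the file name and dataset name are arbitrary illustration choices and error checking is omitted.

    #include "hdf5.h"

    int
    main(void)
    {
        /* Data for two variable-length elements */
        int     a[3] = {1, 2, 3};
        int     b[1] = {4};
        hvl_t   buf[2] = {{3, a}, {1, b}};   /* {len, pointer} pairs */
        hsize_t dims[1] = {2};

        /* Create a file and a 1-D dataset of variable-length ints */
        hid_t file  = H5Fcreate("vlen.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
        hid_t vtype = H5Tvlen_create(H5T_NATIVE_INT);
        hid_t space = H5Screate_simple(1, dims, NULL);
        hid_t dset  = H5Dcreate2(file, "vdata", vtype, space, H5P_DEFAULT,
                                 H5P_DEFAULT, H5P_DEFAULT);

        /* The variable-length element data written here is stored in the
         * file's global heap; the dataset itself stores only heap IDs. */
        H5Dwrite(dset, vtype, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);

        H5Dclose(dset);
        H5Sclose(space);
        H5Tclose(vtype);
        H5Fclose(file);
        return 0;
    }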

Use Cases for a Private Heap

Several use cases for the new heap data structure arise from replacing the local and global heaps in the file:

The new private heap should replace the use of a local heap for groups. Thus, it needs to satisfy the current group operation use cases, in rough order of priority:
o Looking up an entry by its heap ID should be very fast
o Heaps should store small entries efficiently (most links for a group are estimated to be ~64 bytes in size)
o Heaps should be able to store entries up to at least 64KB in size
o The heap should have very small metadata overhead per entry (since the size of entries will be small)
o Adding entries to the end of the heap should be fast
o The heap should allow storing 100,000+ entries
o Deleting entries from the heap should be efficient

The new private heap should replace the use of global heaps for storing variable-length data and region references in datasets. Thus, it needs to satisfy the current dataset operation use cases, in rough order of priority:
o Adding entries to the end of a heap should be fast
o Looking up an entry by its heap ID should be fast
o Heaps should store both large & small entries efficiently
o Heaps should not impose upper limits on an entry's size
o Heaps should not impose upper limits on the number of entries stored
o Entries in the heap should be able to be reference counted (so fill values in datasets can be implemented efficiently)
o Deleting entries from the heap should be possible
o Deleting an entire heap should be possible without iterating over each element in a dataset

Additionally, the existing local & global heaps are missing several capabilities that would be valuable in future development:
o It should be possible to compress the entries in the heap
o It should be possible to request storage of a collection of related entries, in a heap of a given size, to avoid fragmentation and improve I/O performance.

Design Goals for a Private Heap

To replace both the local and global heaps, the new private heap should achieve the following goals:
o The heap should be self-contained (not global to the entire file or required to be shared with other file objects)
o The heap should track all of its data blocks (so that all of the space in the file used by the object it belongs to can be accurately determined)
o IDs for entries in the heap should be persistent (adding or removing entries in the heap should not change the IDs of other entries)
o Existing entries in the heap should be very fast to locate and read

o Allocating space for and storing new entries in the heap should be very fast
o The size of an entry should not be limited
o The number of entries in a heap should not be limited
o Entries should be able to be removed from the heap, but removal speed should not be a top priority
o Each entry should have an optional reference count

Design of the Private Heap

The new heap will be based on a novel data structure, the doubling table, which is described in Appendix 1. An example of a doubling table with a width of 4 and a starting block size of 4096 is shown here:

Row   Blocks Added                          Array Size
0     4,096   4,096   4,096   4,096         16,384
1     4,096   4,096   4,096   4,096         32,768
2     8,192   8,192   8,192   8,192         65,536
3     16,384  16,384  16,384  16,384        131,072
4     32,768  32,768  32,768  32,768        262,144
5     65,536  65,536  65,536  65,536        524,288

IDs of objects in the heap are the linear offset of the entry in the heap, as if it were an array. So, an ID of 22384 would be located in the 2nd block in row 1, at offset 1904 (a sketch of this mapping follows the TODO list below). Each block in the doubling table represents a block in the heap, which the HDF5 library brings in from disk when it is accessed.

TODO:
! describe recursive/fractal nature of heap as larger blocks are used, especially as difference from standard doubling table, along with the resulting indirect/direct blocks.
! describe tiny object storage in heap ID itself
! describe huge object storage outside the heap
! describe optional direct block compression (huge objects too)
! follow up with use cases & design goals met or compromised on
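As an illustration of the ID-to-block mapping described above, the following C sketch computes the row, the block within the row, and the offset within the block for a heap ID. It is only a sketch under the layout assumptions of the example table (the first two rows use the starting block size, the width is a power of two); it is not the HDF5 library's actual routine, and the function and variable names are invented for illustration.

    #include <stdio.h>
    #include <stdint.h>

    /* Map a heap ID (linear offset) to (row, block within row, offset within
     * block) for a doubling table whose first two rows use the starting block
     * size.  Assumes 'width' is a power of two. */
    static void
    locate(uint64_t id, uint64_t width, uint64_t start_size,
           uint64_t *row, uint64_t *block, uint64_t *offset)
    {
        uint64_t row0_size = width * start_size;    /* bytes covered by row 0 */

        if (id < row0_size) {
            *row    = 0;
            *block  = id / start_size;
            *offset = id % start_size;
        } else {
            /* Row r (r >= 1) covers [row0_size * 2^(r-1), row0_size * 2^r) */
            uint64_t r = 1;
            while (id >= row0_size * ((uint64_t)1 << r))
                r++;
            uint64_t block_size = start_size * ((uint64_t)1 << (r - 1));
            uint64_t row_start  = row0_size * ((uint64_t)1 << (r - 1));

            *row    = r;
            *block  = (id - row_start) / block_size;
            *offset = (id - row_start) % block_size;
        }
    }

    int
    main(void)
    {
        uint64_t row, block, offset;

        locate(22384, 4, 4096, &row, &block, &offset);
        /* Prints: row 1, block 1 (i.e. the 2nd block in row 1), offset 1904 */
        printf("row %llu, block %llu, offset %llu\n",
               (unsigned long long)row, (unsigned long long)block,
               (unsigned long long)offset);
        return 0;
    }

The while loop takes O(log id) iterations; a real implementation could replace it with a single highest-set-bit calculation.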

Appendix 1: Doubling Tables

A doubling table provides a mechanism for quickly extending an array data structure in a way that minimizes the number of empty elements in the array, while retaining very fast lookup of any element within the array. Doubling tables are described in this appendix.

The classic way to increase the size of an array in memory is to double its size each time the array needs to be extended. This approach requires O(n) memory element copy operations, and the usage of the array varies between 50% and 100% full. For example, see the table below: (QAK: could use picture)

Current Size   New Size   # of Elements   Total # of         % Full Before   % Full After
                          Copied Now      Elements Copied    Extend Occurs   Extend Occurs
256            512        256             256                100%            50%
512            1024       512             768                100%            50%
1024           2048       1024            1792               100%            50%
2048           4096       2048            3840               100%            50%

This scheme may be reasonable for memory arrays, but it is difficult for arrays on disk, due to the time involved in repeated data copying, especially as the size of the array increases. Additionally, because each new copy of the array is twice as large as the previous one, the existing space in the file can't be re-used if it is not at the end of the file, leading to a file that could be twice as large as necessary.

However, if we think about an array more abstractly, as an ordered collection of homogeneous elements, we could store those abstract elements in a set of blocks of elements, to facilitate more interesting array algorithms. Then, instead of copying the array elements each time the array's size is increased, it would be possible to leave the old array block where it was when a new block is added to the array, and to build up an index for locating array elements in the blocks that are added. For example, see the tables below: (QAK: could use a picture, showing index & blocks)

Block #   Block Size   Array Size   % Full After Extend Occurs
0         256          256          50%
1         256          512          50%
2         512          1024         50%
3         1024         2048         50%
4         2048         4096         50%

Element #'s    Block #
0 - 255        0
256 - 511      1
512 - 1023     2
1024 - 2047    3
2048 - 4095    4

This approach reduces the amount of element copying, but when the array expands, the new array is still 50% empty. Now, instead of doubling the block size each time, if we keep the same block size as several blocks are added to the array, we can increase the percentage of the array that is full when each block is added. For example, if we start with 256 byte blocks and add four blocks of the same size before doubling the block size to 512 bytes, etc., the blocks added and the percent full of the array would look like this: (QAK: could use a picture, showing relative block sizes as they are added)

Block Size   % Full When Extend Occurs
256          0%
256          50%
256          66.6%
256          75%
512          66.6%
512          75%
512          80%
512          83.3%
1024         75%

(QAK: add another example, with a different width)

If this trend is continued, it can be seen that the percent full when a new size of block is added approaches 80% when adding four blocks of each size (a "width" of 4; the width is the number of blocks of the same size that are added before doubling to the next size). After some experimentation, the following table of percent full information was derived:

Doubling Table Width   Low % Full When New Block Added*   High % Full When New Block Added*   Low to High Range
1                      50.00000%                          50.00000%                           0.00000%
2                      66.66667%                          75.00000%                           8.33333%
4                      80.00000%                          87.50000%                           7.50000%
8                      88.88889%                          93.75000%                           4.86111%
16                     94.11765%                          96.87500%                           2.75735%
32                     96.96970%                          98.43750%                           1.46780%
64                     98.46154%                          99.21875%                           0.75721%
128                    99.22481%                          99.60938%                           0.38457%
256                    99.61089%                          99.80469%                           0.19379%
512                    99.80507%                          99.90234%                           0.09728%
1024                   99.90244%                          99.95117%                           0.04873%

* In the limit as infinite array size is reached.

(QAK: show graph of % Full for several widths, as blocks are added)

Clearly, making the table wider improves the percent full when adding new blocks to the array. A point of diminishing returns is probably reached in the 16-64 range of widths. Also, this progression is only shown for widths that are powers of two; other widths follow a similar progression, but are a poor match for implementing in software.

Additionally, in order to work out an easy algorithm for computing which block an array offset corresponds to, we found that it was better to make the first two rows of blocks in the table the same size, before starting to double the block sizes. An example for a table with width 4 is:

Row   Blocks Added               Array Size
0     256   256   256   256      1024
1     256   256   256   256      2048
2     512   512   512   512      4096
3     1024  1024  1024  1024     8192
4     2048  2048  2048  2048     16384
5     4096  4096  4096  4096     32768

This corresponds to rolling up all the blocks that are smaller than the initial block size, and it also makes the array size a nice power of two.

TODO:
! Show equation for looking up the block for an array element index
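The lookup equation is left as a TODO above. The following is an illustrative sketch of one possible closed form, derived from the example tables in this appendix rather than taken from the author's eventual design. Assume a table of width w (a power of two) and starting block size s, with the first two rows both using block size s, so the total array size once row r is complete is C(r) = w s 2^r. Then for an element index (linear offset) o:

\[
\mathrm{row}(o) =
  \begin{cases}
    0 & \text{if } o < w s \\
    \left\lfloor \log_2 \dfrac{o}{w s} \right\rfloor + 1 & \text{otherwise}
  \end{cases}
\]
\[
\mathrm{blocksize}(r) = s \cdot 2^{\max(0,\, r-1)}, \qquad
\mathrm{rowstart}(r) =
  \begin{cases}
    0 & r = 0 \\
    w s \cdot 2^{\, r-1} & r \ge 1
  \end{cases}
\]
\[
\mathrm{block} = \left\lfloor \frac{o - \mathrm{rowstart}(\mathrm{row}(o))}{\mathrm{blocksize}(\mathrm{row}(o))} \right\rfloor, \qquad
\mathrm{offset} = \bigl(o - \mathrm{rowstart}(\mathrm{row}(o))\bigr) \bmod \mathrm{blocksize}(\mathrm{row}(o))
\]

With w = 4 and s = 256 this reproduces the example table above (for instance, index 3000 falls in row 2, block 1, at offset 440), and with w = 4 and s = 4096 it reproduces the ID 22384 example in the design section. The same arithmetic also explains the percent-full table: in the limit, the low point when a new block is added is w/(w+1) and the high point is (2w-1)/(2w), which for w = 4 gives 80% and 87.5%.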