Scalable and Secure Internet Services and Architecture
PRESENTATION REPORT
Semester: Winter 2015
Course: ECE 7650
SUBMITTED BY: Yashwanth Boddu (fq9316@wayne.edu)
(1) "Our base design uses less than 1 byte in RAM per key-value pair and our enhanced design takes slightly more than 1 byte per key-value pair." In FAWN, even the pointer to a key-value pair costs 4 bytes of RAM. How can SkimpyStash possibly achieve such a low memory cost for metadata?
SkimpyStash moves most of the pointers that locate each key-value pair from RAM onto flash itself: each record on flash carries the next-pointer of its bucket chain, so RAM needs only one pointer per bucket (shared by all keys in that bucket) rather than one pointer per key.
(2) SkimpyStash uses a hash table directory in RAM to index key-value pairs stored in a log structure on flash. Why are key-value pairs on flash organized as a log?
The key benefit of the log-structured data organization on flash is the high write throughput that can be obtained, since all updates to the data and the metadata (here, the chain pointers) are written sequentially at the tail of the log, avoiding small random writes, which flash handles poorly.
(3) "The average bucket size is the critical design parameter that serves as a powerful knob for making a continuum of tradeoffs between low RAM usage and low lookup latencies." Please explain this statement.
If the average bucket size k is large, fewer HT directory slots are needed in RAM (lower RAM usage per key), but each bucket's chain on flash is longer, so a lookup must follow more flash pointers and incurs higher latency. If k is small, chains are short and lookups are fast, but more slots, and hence more RAM, are required. Bucket size is therefore a powerful knob for making a continuum of tradeoffs between low RAM usage and low lookup latency.
(4) "The client [write] call returns only after the write buffer is flushed to flash." Why can such a call not be acknowledged earlier?
It cannot be acknowledged earlier because the data still resides only in the volatile RAM write buffer; if the machine crashed before the buffer was flushed, an already-acknowledged write would be lost.
(5) Basic functions: Store, Lookup, Delete. Use the figure to explain how these basic functions are executed.
Store: A key insert (or update) operation (set) writes the key-value pair into the RAM write buffer.
When there are enough key-value pairs in the RAM write buffer to fill a flash page (or a configurable timeout, say 1 msec, has expired since the client call), these entries are written to flash and linked into their respective bucket chains in the RAM HT directory.
Lookup: It first checks the RAM write buffer; upon a miss there, it consults the HT directory in RAM and searches the chained key-value records on flash in the respective bucket.
Delete: A delete operation on a key is supported by inserting a null value for that key. Eventually the null entry and the earlier inserted values of the key on flash will be garbage collected.
(6) "The chain of records on flash pointed to by each slot comprises the bucket of records corresponding to this slot in the HT directory. This is illustrated in Figure 3." Please use the figure to describe SkimpyStash's data structure. Also explain how lookup, insert, and delete operations are executed.
SkimpyStash's data structure: A data structure is maintained in RAM that buffers key-value writes so that a write to flash happens only after there is enough data to fill a flash page. The directory structure for the key-value pairs stored on flash is maintained in RAM and is organized as a hash table, with each slot containing a pointer to a chain of records on flash. The chain of records on flash pointed to by each slot comprises the bucket of records corresponding to this slot in the HT directory. The flash store provides persistent storage for the key-value pairs and is organized as a circular log that is appended to at the tail. In the enhanced design, two-choice load balancing is used to reduce wide variations in bucket sizes, and a bloom filter is added to each hash table directory slot in RAM to summarize the records in that bucket, so that at most one bucket chain on flash needs to be searched during a lookup.
Lookup: A lookup operation on a key uses the hash function h to obtain the HT directory bucket that the key belongs to. It uses the pointer stored in that slot to follow the chain of records on flash, searching for the key; upon finding the first record in the chain whose key matches the search key, it returns the value. The number of flash reads for such a lookup is k/2 on average, and at most the length of the bucket chain in the worst case.
Insert: An insert (or update) operation uses the hash function h to obtain the HT directory bucket that the key belongs to. Let a1 be the address on flash of the first record in this chain (i.e., what the pointer in this slot points to). A record is created for the inserted (or updated) key-value pair with its next-pointer field equal to a1. This record is appended to the log on flash, and its flash address becomes the new value of the pointer in the respective slot in RAM. Effectively, the new record is inserted at the head of the chain for this bucket. Thus, if the insert corresponds to an update of an earlier inserted key, the most recent value of the key will be (correctly) read during a lookup, while the old value lies further down the chain and accumulates as garbage in the log.
Delete: A delete operation is the same as an insert (or update) with a null value for the key. Eventually the null entry on flash and the old values of the key will be garbage collected from the log.
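The lookup, insert, and delete paths described above can be sketched in Python. This is a minimal model, not the paper's implementation: the flash log is simulated as an in-memory append-only list, and the class name, bucket count, and record layout are illustrative assumptions.

```python
NUM_BUCKETS = 8        # size of the RAM HT directory (illustrative)
TOMBSTONE = None       # a delete is modeled as an insert with a null value

class SkimpyStashSketch:
    def __init__(self):
        self.flash = []                      # append-only log of (key, value, next_addr)
        self.directory = [-1] * NUM_BUCKETS  # RAM: one pointer per bucket, -1 = empty chain

    def _bucket(self, key):
        return hash(key) % NUM_BUCKETS

    def put(self, key, value):
        b = self._bucket(key)
        # The new record's next-pointer is the current chain head on flash...
        self.flash.append((key, value, self.directory[b]))
        # ...and the new record's flash address becomes the chain head in RAM.
        self.directory[b] = len(self.flash) - 1

    def get(self, key):
        # Follow the bucket chain on flash; the first match is the newest value.
        addr = self.directory[self._bucket(key)]
        while addr != -1:
            k, v, nxt = self.flash[addr]
            if k == key:
                return v            # returns TOMBSTONE (None) if the key was deleted
            addr = nxt
        return None                 # key never inserted into this bucket

    def delete(self, key):
        self.put(key, TOMBSTONE)    # older records for the key become log garbage
```

Note how an update simply prepends a fresh record to the chain: the superseded value is never touched in place, it just becomes unreachable garbage awaiting collection.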
(7) "Because we store the chain of key-value pairs in each bucket on flash, we incur multiple flash reads upon lookup of a key in the store." Please explain how this issue can be alleviated. [Hint: please refer to the section "Compaction to Reduce Flash Reads during …"]
This is done by periodically compacting the chain of a bucket on flash: the valid keys in the chain are placed contiguously on one or more flash pages that are appended to the tail of the log, so a lookup can fetch many chained records with a single page read instead of one flash read per record.
(8) "…two-choice based load balancing strategy is used to reduce variations in the number of keys assigned to each bucket." Explain how this is achieved.
In the load-balanced design for the HT directory, each key is hashed to two candidate HT directory buckets using two hash functions h1 and h2, and is actually inserted into whichever of the two currently has fewer elements. This keeps bucket chain lengths, and hence lookup costs, close to the average.
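The two-choice insertion rule can be sketched as follows. The per-bucket chain-length counters and the particular hash functions are illustrative assumptions, not the paper's exact bookkeeping.

```python
NUM_BUCKETS = 8

def h1(key):
    # Two independent hash functions; salting the stdlib hash() stands in
    # for the paper's h1 and h2 (an assumption for this sketch).
    return hash(("h1", key)) % NUM_BUCKETS

def h2(key):
    return hash(("h2", key)) % NUM_BUCKETS

chain_len = [0] * NUM_BUCKETS   # records per bucket, tracked in RAM

def choose_bucket(key):
    b1, b2 = h1(key), h2(key)
    # Insert into whichever candidate bucket currently holds fewer records;
    # this suppresses wide variations in chain length across buckets.
    return b1 if chain_len[b1] <= chain_len[b2] else b2

def insert(key):
    b = choose_bucket(key)
    chain_len[b] += 1
    return b
```

The cost of this scheme is that a plain lookup would have to search both candidate chains; this is why the enhanced design pairs it with a per-slot bloom filter, so that at most one chain on flash is actually searched.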
(9) "…when the last record in a bucket chain is encountered in the log during garbage collection, all valid records in that chain are compacted and relocated to the tail of the log." Please explain how garbage is collected.
When a configurable fraction of garbage accumulates in the log (in terms of space occupied), pages on flash are recycled from the head of the log: valid entries from the head are written back to the tail of the log, while invalid entries (deleted keys and superseded values) are skipped. Because a record in the middle of a chain cannot be relocated without rewriting the records that point to it, this effectively leads to the design decision of garbage collecting entire bucket chains on flash at a time.
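Chain-at-a-time compaction can be sketched against the same kind of simulated append-only log of (key, value, next-pointer) records. The function name `collect_bucket` and the record layout are illustrative assumptions, not the paper's API.

```python
def collect_bucket(flash, directory, b):
    """Compact bucket b: rewrite its latest non-deleted values at the log tail."""
    # Walk the chain from its head; the first occurrence of a key is its
    # newest version, so later duplicates in the chain are garbage.
    live = {}
    addr = directory[b]
    while addr != -1:
        key, value, nxt = flash[addr]
        if key not in live:
            live[key] = value       # value None = tombstone, the key is dead
        addr = nxt
    # Rebuild the chain contiguously at the tail, skipping tombstones.
    new_head = -1
    for key, value in live.items():
        if value is not None:
            flash.append((key, value, new_head))
            new_head = len(flash) - 1
    directory[b] = new_head         # old chain records are now unreferenced garbage
```

After compaction the bucket's chain holds exactly one record per live key, placed contiguously at the tail, so the old pages at the head of the log can be reclaimed and future lookups in this bucket touch fewer flash pages.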