Journal of Frontiers of Computer Science and Technology
ISSN 1673-9418 CODEN JKYTA8
1673-9418/2011/05(08)-0686-09
DOI: 10.3778/j.issn.1673-9418.2011.08.002
E-mail: fcst@vip.163.com
http://www.ceaj.org
Tel: +86-10-51616056

Virtual Memory Management for Main-Memory KV Database Using Solid State Disk*

HAN Xu (韩旭)+, CAO Wei (曹巍), MENG Xiaofeng (孟小峰)
School of Information, Renmin University of China, Beijing 100872, China
+ Corresponding author: E-mail: hanxumelody@ruc.edu.cn

HAN Xu, CAO Wei, MENG Xiaofeng. Virtual memory management for main-memory KV database using solid state disk. Journal of Frontiers of Computer Science and Technology, 2011, 5(8): 686-694.

Abstract: Main-memory key-value (KV) databases are efficient, easy to use and scalable. Because main-memory capacity is limited, applications that handle large amounts of data must swap data between main memory and disk. Solid state disks (SSDs), a new storage medium, offer fast random reads, so using an SSD as the virtual memory of a main-memory KV database speeds up reads of data that is not in main memory. Random writes on SSDs, however, perform poorly. To remedy this, this paper proposes a write-buffer optimization for SSDs that turns several random writes into one sequential write, and designs a garbage collection policy for the SSD-based virtual memory that turns several random writes into one sequential read and one sequential write, which improves the space utilization and performance of the main-memory KV database. Finally, the proposed virtual memory management is implemented by modifying the source code of Redis, and experiments confirm an improvement of up to 40% over the original performance.

Key words: key-value; solid state disk (SSD); virtual memory; buffer

Document code: A
CLC number: TP391

* The National Natural Science Foundation of China under Grant No. 60833005, 91024032, 61070055; the National Science and Technology Major Special Projects of China under Grant No. 2010ZX01042-002-003; the Research Funds of Renmin University of China under Grant No. 10XNI018.
Received 2011-04, Accepted 2011-06.

1 Introduction

For more than 30 years since Codd [1] proposed the relational model, relational database management systems (RDBMS) have been widely used. With the rise of Web 2.0 applications, key-value (KV) databases have emerged. A KV database stores data as pairs of a key and a value, and a value is read or written through its key.

Table 1 Comparison between RDBMS and KV databases

An RDBMS is queried through SQL, whereas a KV database is accessed through an application program interface (API) instead of SQL. Examples of KV databases include Cassandra and HBase.
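Main-memory KV databases (for example, Memcached) are often deployed alongside disk-based databases (for example, MySQL). This paper uses a solid state disk (SSD) as the virtual memory of a main-memory KV database. Its main contributions are: a write-buffer optimization that turns several random writes into one sequential write; a garbage collection policy for the SSD-based virtual memory; and an implementation in Redis with an experimental evaluation.

The rest of the paper is organized as follows: Section 2 reviews KV databases; Section 3 presents the SSD-based virtual memory and its write buffer; Section 4 describes garbage collection; Section 5 reports the experiments.

2 Related Work

With the growth of Web applications, many KV databases have appeared.

Memcached [2] is a main-memory KV cache with a simple API. Memcached keeps all data in main memory and does not persist it, so its capacity is bounded by main memory and its data is lost when the server stops.

Redis [3] is also a main-memory KV database. Compared with Memcached, Redis supports richer data types and can persist data to disk. Redis also provides a virtual memory mechanism that swaps values to disk when main memory runs short.

BerkeleyDB (BDB) [4] is an embedded KV database that applications access through an API. BerkeleyDB stores data on disk and supports transactions with the ACID properties (atomicity, consistency, isolation, durability).

FlashStore [5] is a high-throughput persistent KV store evaluated on Xbox workloads. It places flash between main memory and the hard disk, stores key-value pairs on flash, and indexes them by key with a hash table in main memory. It also uses a Bloom filter and a cache, and it is compared against BDB. The hash table of FlashStore uses a variant of cuckoo hashing.

SkimpyStash [6] follows FlashStore but shrinks the in-memory hash table: each hash bucket points to a chain of records stored on flash, so very little main memory is needed per key.

3 SSD-Based Virtual Memory Management

3.1 Characteristics of SSDs

An SSD is built from flash memory and has no mechanical parts [7]. Flash is read and written in pages and erased in blocks, where a block consists of 32 or 64 pages. A write can only change a bit from 1 to 0; resetting bits to 1 requires erasing the whole block [8]. The flash translation layer (FTL) [9-10] hides these properties from the host, so that the SSD can be used like a disk.

Table 2 [11] compares the performance of a hard disk and an SSD. Random reads on the SSD are 18.62 times as fast as on the disk, whereas random writes are no faster (ratio 0.99). Related work on IO for flash-based databases includes [12-13].

Table 2 Performance comparison between Disk and SSD

Item               Disk     SSD      Performance ratio
Random read        3.80     70.76    18.62
Sequential read    37.45    73.08    1.95
Random write       3.73     3.71     0.99
Sequential write   37.69    68.52    1.82

3.2 Architecture of the KV Database System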
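The figures in Table 2 motivate the design that follows, and they are easy to check. The sketch below recomputes the performance-ratio column as SSD divided by Disk, and derives one number the table does not list: how much faster a sequential write is than a random write on the SSD itself. The dictionary layout and the variable names are illustrative assumptions; the numbers are copied from Table 2 without units, as in the table.

```python
# Throughput figures copied from Table 2 (units as in the source table).
# Each entry is (disk, ssd); the dict layout is an illustrative assumption.
TABLE2 = {
    "random read": (3.80, 70.76),
    "sequential read": (37.45, 73.08),
    "random write": (3.73, 3.71),
    "sequential write": (37.69, 68.52),
}


def performance_ratio(item):
    """SSD throughput divided by disk throughput, rounded as in Table 2."""
    disk, ssd = TABLE2[item]
    return round(ssd / disk, 2)


ratios = {item: performance_ratio(item) for item in TABLE2}

# Not a column of Table 2: sequential write over random write on the same
# SSD. This gap is what a write buffer tries to exploit.
ssd_seq_over_random_write = round(
    TABLE2["sequential write"][1] / TABLE2["random write"][1], 2
)

for item, ratio in ratios.items():
    print(f"{item}: {ratio}")
print(f"SSD sequential/random write: {ssd_seq_over_random_write}")
```

On these numbers a sequential write on the SSD is roughly 18 times as fast as a random write, which is about the same factor by which random reads on the SSD beat random reads on the disk. Batching writes is therefore the natural complement to using the SSD for random reads.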
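[…]

Fig.1 Architecture of the system

3.3 Set and Get Operations

A KV database provides two basic operations, Set (write) and Get (read). As in the Redis virtual memory, keys stay in main memory and only values are swapped out. When a key is looked up, its value is in one of three places: in main memory; in the write buffer; or on the SSD.

3.4 Write Buffer Algorithms

Based on the analysis in Section 3.3, […] as shown in Algorithm 1:

Algorithm 1
Input: value
    // buffer: […]
    pointer ← NULL
    if buffer […] then
        pointer ← […]
        write value at pointer
        […] pointer […]
        return OK
    end if
    write value into buffer
    pointer ← position in buffer
    […] pointer […]
    if buffer […] then
        write buffer to SSD
        for each […] in buffer do
            […] pointer […]
            […] pointer […]
        end for
    end if
    return OK

Algorithm 2
Input: key
Output: […] key […]
    // buffer, value: […]
    pointer ← NULL
    pointer of value ← dictfind(key)
    if pointer […] then
        return pointer
    end if
    if pointer […] then
        object ← […]
        object ← […] pointer […]
        […] buffer […] pointer […]
        […] object […]
        return object
    end if
    if pointer […] then
        object ← […]
        object ← […] pointer […]
        […] object […]
        return object
    end if
    return NULL

4 Garbage Collection

Garbage collection for the SSD-based virtual memory is carried out within Get. After a value is read from the SSD, the block that stored it is examined; when the block qualifies for collection, its contents are read into a collection buffer (c_buffer) and merged into the write buffer, so that several random writes are replaced by one sequential read and one sequential write. Algorithm 3 extends Algorithm 2 accordingly.

Algorithm 3
Input: key
Output: […] key […]
    // buffer, value, block, c_buffer: […]
    pointer ← NULL
    pointer of value ← dictfind(key)
    if pointer […] then
        return pointer
    end if
    if pointer […] then
        object ← […]
        object ← […] pointer […]
        […] buffer […] pointer […]
        […] object […]
        return object
    end if
    if pointer […] then
        object ← […]
        object ← […] pointer […]
        […] object […]
        block ← […] pointer […] id
        […] block […]
        if block […] then
            […] block […] c_pointer ← addr(block)
            c_buffer ← […] c_pointer […] block
            buffer = merge(buffer, c_buffer)
            […] c_pointer […] buffer […] block
            […] block […]
            […]
        end if
        return object
    end if
    return NULL

5 Experiments

5.1 Experimental Setup

The experiments run on a PC with an Intel Core2 Quad CPU Q9650, 4 GB of main memory, a 500 GB hard disk and an 80 GB Intel SSD. The KV database under test is Redis, which is used by sites such as digg and stackoverflow. The experiments are based on Redis 2.2.

5.2 Benchmark

The tests use the benchmark tool shipped with Redis. […] 50 […] 10 […] 10 […] 20 […] 100 […] The Redis operations Mset, Set and Get are tested under four configurations:
(1) hard disk as virtual memory (Disk);
(2) SSD as virtual memory (SSD);
(3) SSD with the write buffer (SSD_BUF);
(4) SSD with garbage collection (SSD_GC).
The results are shown in Figs. 2 to 4.

Fig.2 Performance comparison on Mset

5.3 Results and Analysis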
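To make the SSD_BUF and SSD_GC configurations more concrete, the following is a minimal Python sketch of one possible reading of the write buffer and of garbage collection triggered on Get. It is not the modified Redis code. The class name `BufferedSwap`, the parameters `block_size` and `gc_threshold`, and the use of an in-memory dictionary in place of a real SSD are all hypothetical. The sketch also assumes that reading a value back from the SSD turns its slot into garbage, and that a block is collected once a fixed fraction of its slots is garbage; the actual trigger condition may differ.

```python
class BufferedSwap:
    """Toy model of SSD swap space with a write buffer and garbage collection.

    All names and parameters here are hypothetical, not taken from Redis.
    """

    def __init__(self, block_size=4, gc_threshold=0.5):
        self.block_size = block_size      # values per SSD block (assumed)
        self.gc_threshold = gc_threshold  # garbage fraction that triggers GC (assumed)
        self.buffer = {}                  # key -> value, waiting in main memory
        self.blocks = {}                  # block id -> list of (key, value) or None
        self.where = {}                   # key -> ("buf",) or ("ssd", block id, slot)
        self.next_block = 0
        self.sequential_writes = 0        # whole-block writes issued to the "SSD"
        self.sequential_reads = 0         # whole-block reads issued by GC

    def swap_out(self, key, value):
        """Swap a value out of main memory through the write buffer."""
        if key in self.where:
            self.get(key)                 # invalidate any stale copy first
        self._put(key, value)

    def _put(self, key, value):
        """Place a value in the buffer and flush once the buffer is full."""
        self.buffer[key] = value
        self.where[key] = ("buf",)
        if len(self.buffer) >= self.block_size:
            self._flush()

    def _flush(self):
        """Write the whole buffer as one block: many small writes become one."""
        block_id = self.next_block
        self.next_block += 1
        self.blocks[block_id] = list(self.buffer.items())
        self.sequential_writes += 1
        for slot, key in enumerate(self.buffer):
            self.where[key] = ("ssd", block_id, slot)
        self.buffer = {}

    def get(self, key):
        """Swap a value back in; return None if the key is not swapped out."""
        loc = self.where.pop(key, None)
        if loc is None:
            return None
        if loc[0] == "buf":
            return self.buffer.pop(key)
        _, block_id, slot = loc
        block = self.blocks[block_id]
        value = block[slot][1]
        block[slot] = None                # the slot is now garbage
        garbage = sum(entry is None for entry in block)
        if garbage / len(block) >= self.gc_threshold:
            self._collect(block_id)
        return value

    def _collect(self, block_id):
        """Read a block once, move its live values to the buffer, free it."""
        block = self.blocks.pop(block_id)
        live = [entry for entry in block if entry is not None]
        if live:
            self.sequential_reads += 1
        for key, value in live:
            self._put(key, value)
```

With `block_size=4`, four `swap_out` calls produce a single block write instead of four separate writes. If two of those values are then read back with `get`, half of the block is garbage, so the sketch reads the block once, moves the two remaining values into the buffer, and frees the block; the moved values reach the SSD again only as part of the next full-buffer write.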
Fig.3 Performance comparison on Set

Fig.4 Performance comparison on Get

[…] On Mset, the performance improves by up to 40% over the original Redis. […]

6 Conclusion

[…]

References:
[1] Codd E F. A relational model of data for large shared data banks[J]. Communications of the ACM, 1970, 13(6): 377-387.
[2] Danga Interactive. Memcached[EB/OL]. [2011-03-19]. http://memcached.org.
[3] Salvatore Sanfilippo. Redis[EB/OL]. [2011-03-19]. http://redis.io.
[4] Sleepycat Software. BerkeleyDB[EB/OL]. [2011-03-21]. http://www.oracle.com/technetwork/database/berkeleydb/overview/index.html.
[5] Debnath B, Sengupta S, Li Jin. FlashStore: high throughput persistent key-value store[J]. Proceedings of the VLDB Endowment, 2010, 3(1/2): 1414-1425.
[6] Debnath B, Sengupta S, Li Jin. SkimpyStash: RAM space skimpy key-value store on flash-based storage[C]//Proceedings of the 2011 International Conference on Management of Data (SIGMOD '11). New York, NY, USA: ACM, 2011: 25-36.
[7] Mtron. Solid state drive MSD-SATA 3035 product specification[EB/OL]. (2008)[2009-07-19]. http://mtron.net/Upload_Data/Spec/ASiC/MOBI/SATA/MSD-SATA3035_rev0.4.pfd.
[8] Samsung Electronics. 1G x 8Bit/2G x 8Bit/4G x 8Bit NAND flash memory, version 1.1[EB/OL]. (2007-06-18)[2009-06-15]. http://www.alldatasheet.com/datasheetpdf/pdf/139788/samsung/k9wag08uia.html.
[9] Intel Corporation. Understanding the flash translation layer (FTL) specifications[EB/OL]. (1998-12)[2009-06-15]. http://www.embeddedfreebsd.org/documents/intel-ftl.pdf.
[10] Kim J, Kim J M. A space-efficient flash translation layer for compact-flash systems[J]. IEEE Transactions on Consumer Electronics, 2002, 48(2): 366-375.
[11] Liang Zhichao, Zhou Da, Meng Xiaofeng. Sub-Join: query optimization algorithm for flash-based database[J]. Journal of Frontiers of Computer Science and Technology, 2010, 4(5): 401-409.
[12] Lee S, Moon B. Design of flash-based DBMS: an in-page logging approach[C]//Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data (SIGMOD '07). New York, NY, USA: ACM, 2007: 55-66.
[13] Lee S, Moon B, Park C, et al. A case for flash memory SSD in enterprise database applications[C]//Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD '08). New York, NY, USA: ACM, 2008: 1075-1086.

HAN Xu was born in 1989.
His research interests include flash-based databases and key-value stores.

CAO Wei was born in 1975. She received her Ph.D. degree from Renmin University of China in 2009. She is now a lecturer at the School of Information, Renmin University of China, and a member of CCF. Her research interests include high performance databases, database tuning and flash-based databases.

MENG Xiaofeng was born in 1964. He received his Ph.D. degree from the Chinese Academy of Sciences in 1999. He is now a professor and doctoral supervisor at Renmin University of China, and a senior member of CCF. His research interests include Web data management, cloud data management, mobile data management, XML data management, flash-aware DBMS and privacy protection.