Virtual Memory Management for Main-Memory KV Database Using Solid State Disk *


Journal of Frontiers of Computer Science and Technology, 2011, 5(8): 686-694
ISSN 1673-9418  CODEN JKYTA8
DOI: 10.3778/j.issn.1673-9418.2011.08.002
http://www.ceaj.org  E-mail: fcst@vip.163.com  Tel: +86-10-51616056

HAN Xu +, CAO Wei, MENG Xiaofeng
School of Information, Renmin University of China, Beijing 100872, China
+ Corresponding author: E-mail: hanxumelody@ruc.edu.cn

HAN Xu, CAO Wei, MENG Xiaofeng. Virtual memory management for main-memory KV database using solid state disk. Journal of Frontiers of Computer Science and Technology, 2011, 5(8): 686-694.

Abstract: Main-memory key-value (KV) databases are efficient, easy to use, and scalable. Because main-memory capacity is limited, applications that handle large amounts of data must swap data between main memory and disk. Solid state disks (SSDs), a newer storage medium, offer fast random reads, so using an SSD as the virtual memory of a main-memory KV database speeds up reads of data that is not resident in main memory. Random writes on SSDs, however, perform poorly. To compensate, this paper proposes a write-buffer optimization for SSDs that turns several random writes into one sequential write, and designs a garbage collection policy for SSD-based virtual memory that turns several random writes into one sequential read and one sequential write, improving space utilization and the performance of the main-memory KV database. The virtual memory management scheme is applied to Redis by modifying its source code, and experiments show a performance improvement of up to 40% over the original.

Key words: key-value; solid state disk (SSD); virtual memory; buffer

Document code: A  CLC number: TP391

* The National Natural Science Foundation of China under Grant Nos. 60833005, 91024032, 61070055; the National Science and Technology Major Special Projects of China under Grant No. 2010ZX01042-002-003; the Research Funds of Renmin University of China under Grant No. 10XNI018. Received 2011-04, Accepted 2011-06.

1 Introduction

[Body text not recoverable from the extracted transcript. Surviving terms: Codd [1]; relational database management system (RDBMS); Web 2.0; key-value (KV); key; value; SQL; application program interface (API); Cassandra; HBase.]

Table 1 Comparison between RDBMS and KV databases
[Table body not recoverable. Surviving cells: RDBMS; KV; SQL; application program interface (API).]

[Remainder of Section 1 not recoverable. Surviving terms: KV; Memcached; MySQL; solid state disk (SSD). The section ends with an outline of Sections 2 to 5.]

2
[Body text not recoverable. The section discusses existing KV stores; surviving terms for each system:]
Memcached [2]: API.
Redis [3]: Memcached.
BerkeleyDB (BDB) [4]: KV; API; ACID.
FlashStore [5]: Xbox; KV; flash; key; hash; Bloom filter; cache; BDB; cuckoo hashing.
SkimpyStash [6]: FlashStore; hash; key.

3
3.1
[Body text not recoverable. Surviving citations and terms: [7]; 32, 64; 0, 1 [8]; flash translation layer (FTL) [9-10]; Table 2 [11]; IO [12-13].] In Table 2, random reads on the SSD are 18.62 times as fast as on the disk, while random writes are no faster (ratio 0.99).

Table 2 Performance comparison between Disk and SSD

Item               Disk     SSD      Performance ratio
Random read        3.80     70.76    18.62
Sequential read    37.45    73.08    1.95
Random write       3.73     3.71     0.99
Sequential write   37.69    68.52    1.82

3.2 KV
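Table 2 reports random writes on the SSD at 3.71 against 68.52 for sequential writes, which is the gap the write buffer described in the abstract is meant to close. The following Python sketch illustrates the general idea of staging swapped-out values in memory and writing them with one sequential append. It is a minimal sketch under stated assumptions, not the paper's Redis implementation: the class `BufferedSwap`, its methods, the byte-based `buffer_limit`, and the ordinary file standing in for the SSD are all hypothetical.

```python
import os
import tempfile


class BufferedSwap:
    """Hypothetical swap area: values are staged in memory, then appended in one write."""

    def __init__(self, path, buffer_limit=4096):
        self.file = open(path, "w+b")     # ordinary file standing in for the SSD
        self.buffer_limit = buffer_limit  # flush threshold in bytes (assumed policy)
        self.pending = []                 # staged (key, value) pairs
        self.pending_bytes = 0
        self.index = {}                   # key -> ("buf", i) or ("ssd", offset, length)
        self.flushes = 0                  # number of sequential writes issued

    def put(self, key, value):
        """Stage a swapped-out value; write to the file only when the buffer is full."""
        self.index[key] = ("buf", len(self.pending))
        self.pending.append((key, value))
        self.pending_bytes += len(value)
        if self.pending_bytes >= self.buffer_limit:
            self.flush()

    def flush(self):
        """Write every staged value with a single append and repoint the index."""
        if not self.pending:
            return
        offset = self.file.seek(0, os.SEEK_END)
        chunks = []
        for i, (key, value) in enumerate(self.pending):
            # Repoint only keys whose newest version is this staged entry;
            # overwritten entries are still written and become garbage.
            if self.index.get(key) == ("buf", i):
                self.index[key] = ("ssd", offset, len(value))
            chunks.append(value)
            offset += len(value)
        self.file.write(b"".join(chunks))
        self.file.flush()
        self.pending.clear()
        self.pending_bytes = 0
        self.flushes += 1

    def get(self, key):
        """Return the value from the buffer or the file, or None if the key is unknown."""
        loc = self.index.get(key)
        if loc is None:
            return None
        if loc[0] == "buf":
            return self.pending[loc[1]][1]
        _, offset, length = loc
        self.file.seek(offset)
        return self.file.read(length)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        swap = BufferedSwap(os.path.join(d, "swap.bin"), buffer_limit=8)
        swap.put("a", b"1111")
        swap.put("b", b"2222")  # buffer reaches 8 bytes, so both values go out in one write
        swap.put("c", b"33")    # stays in the buffer
        print(swap.get("a"), swap.get("c"), swap.flushes)
        swap.file.close()
```

In this sketch, stale versions of a key that is overwritten before a flush still occupy space in the file. Reclaiming that space is the kind of job a garbage collection policy would handle.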

[Body of Section 3.2 not recoverable. Surviving term: KV.]

Fig.1 The architecture of system

3.3
[Body text not recoverable. Surviving terms: KV; the Set and Get operations; key; value.]

3.4
[Body text not recoverable. The section refers back to Section 3.3 and gives Algorithms 1 and 2. In the listings below, the Chinese operands and the assignment arrows were lost in extraction; only the surviving tokens are shown.]

Algorithm 1
1. value
2. // buffer
3. pointer NULL
4. if buffer then
5. pointer
6. value pointer
7. pointer
8. return OK
9. end if
10. value buffer
11. pointer buffer
12. pointer
13. if buffer
14. buffer SSD
15. for buffer
16. pointer
17. pointer
18. end for
19. end if
20. return OK

Algorithm 2: key
1. // buffer, value
2. pointer NULL
3. value pointer dictfind(key)
4. if pointer
5. return pointer
6. end if
7. if pointer
8. object
9. object pointer
10. buffer pointer
11. object
12. return object
13. end if
14. if pointer
15. object
16. object pointer
17. object
18. return object
19. end if
20. return NULL

4
[Body text not recoverable. Surviving terms: SSD; Get.]

Algorithm 3: key
1. // buffer, value, block, c_buffer
2. pointer NULL
3. value pointer dictfind(key)
4. if pointer
5. return pointer
6. end if
7. if pointer
8. object
9. object pointer
10. buffer pointer
11. object
12. return object
13. end if
14. if pointer
15. object
16. object pointer
17. object
18. block pointer id
19. block
20. if block
21. block c_pointer addr(block)
22. c_buffer c_pointer block
23. buffer=merge(buffer,c_buffer)
24. c_pointer buffer block
25. block
26.
27. end if
28. return object
29. end if
30. return NULL

5
5.1
The experiments run on a PC with an Intel Core2 Quad Q9650 CPU, 4 GB of main memory, a 500 GB hard disk, and an 80 GB Intel SSD. The KV database is Redis, which is used by digg and stackoverflow; the version tested is Redis 2.2. [Remaining text not recoverable.]

5.2
The tests use the Redis benchmark and cover the Mset, Set, and Get operations. [Parameter descriptions not recoverable. Surviving values: 50; 10; 10; 20; 100.] Four configurations are compared: (1) Disk; (2) SSD; (3) SSD_BUF; (4) SSD_GC. The results are shown in Figs. 2 to 4.

Fig.2 Performance comparison on Mset

5.3

Fig.3 Performance comparison on Set

Fig.4 Performance comparison on Get

[Body of Section 5.3 not recoverable. Surviving terms: Get; Mset; Set; Redis; key; 40%.]

6
[Body text not recoverable. Surviving term: KV.]

References:
[1] Codd E F. A relational model of data for large shared data banks[J]. Communications of the ACM, 1970, 13(6): 377-387.
[2] Danga Interactive. Memcached[EB/OL]. [2011-03-19]. http://memcached.org.
[3] Salvatore Sanfilippo. Redis[EB/OL]. [2011-03-19]. http://redis.io.
[4] Sleepycat Software. BerkeleyDB[EB/OL]. [2011-03-21]. http://www.oracle.com/technetwork/database/berkeleydb/overview/index.html.
[5] Debnath B, Sengupta S, Li Jin. FlashStore: high throughput persistent key-value store[J]. Proceedings of the VLDB Endowment, 2010, 3(1/2): 1414-1425.
[6] Debnath B, Sengupta S, Li Jin. SkimpyStash: RAM space skimpy key-value store on flash-based storage[C]//Proceedings of the 2011 International Conference on Management of Data (SIGMOD '11). New York, NY, USA: ACM, 2011: 25-36.
[7] Mtron. Solid state drive MSD-SATA 3035 product specification[EB/OL]. (2008)[2009-07-19]. http://mtron.net/Upload_Data/Spec/ASiC/MOBI/SATA/MSD-SATA3035_rev0.4.pfd.
[8] Samsung Electronics. 1G x 8Bit/2G x 8Bit/4G x 8Bit NAND flash memory, version 1.1[EB/OL]. (2007-06-18)[2009-06-15]. http://www.alldatasheet.com/datasheetpdf/pdf/139788/samsung/k9wag08uia.html.
[9] Intel Corporation. Understanding the flash translation layer (FTL) specifications[EB/OL]. (1998-12)[2009-06-15]. http://www.embeddedfreebsd.org/documents/intel-ftl.pdf.
[10] Kim J, Kim J M. A space-efficient flash translation layer for compact-flash systems[J]. IEEE Transactions on Consumer Electronics, 2002, 48(2): 366-375.
[11] Liang Zhichao, Zhou Da, Meng Xiaofeng. Sub-Join: query optimization algorithm for flash-based database[J]. Journal of Frontiers of Computer Science and Technology, 2010, 4(5): 401-409.
[12] Lee S, Moon B. Design of flash-based DBMS: an in-page logging approach[C]//Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data (SIGMOD '07). New York, NY, USA: ACM, 2007: 55-66.
[13] Lee S, Moon B, Park C, et al. A case for flash memory SSD in enterprise database applications[C]//Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD '08). New York, NY, USA: ACM, 2008: 1075-1086.

HAN Xu was born in 1989. His research interests include flash-based databases and key-value stores.

CAO Wei was born in 1975. She received her Ph.D. degree from Renmin University of China in 2009. She is now a lecturer at the School of Information, Renmin University of China, and a member of CCF. Her research interests include high performance databases, database tuning, and flash-based databases.

MENG Xiaofeng was born in 1964. He received his Ph.D. degree from the Chinese Academy of Sciences in 1999. He is now a professor and doctoral supervisor at Renmin University of China, and a senior member of CCF. His research interests include Web data management, cloud data management, mobile data management, XML data management, flash-aware DBMS, and privacy protection.