MQSim: A Framework for Enabling Realistic Studies of Modern Multi-Queue SSD Devices


MQSim: A Framework for Enabling Realistic Studies of Modern Multi-Queue SSD Devices. Arash Tavakkol, Juan Gómez-Luna, and Mohammad Sadrosadati, ETH Zürich; Saugata Ghose, Carnegie Mellon University; Onur Mutlu, ETH Zürich and Carnegie Mellon University. This paper is included in the Proceedings of the 16th USENIX Conference on File and Storage Technologies (FAST '18), February 12-15, 2018, Oakland, CA, USA. ISBN 978-1-931971-42-3. Open access to the Proceedings of the 16th USENIX Conference on File and Storage Technologies is sponsored by USENIX.

MQSim: A Framework for Enabling Realistic Studies of Modern Multi-Queue SSD Devices

Arash Tavakkol, Juan Gómez-Luna, Mohammad Sadrosadati, Saugata Ghose, Onur Mutlu
ETH Zürich    Carnegie Mellon University

Abstract

Solid-state drives (SSDs) are used in a wide array of computer systems today, including in datacenters and enterprise servers. As the I/O demands of these systems continue to increase, manufacturers are evolving SSD architectures to keep up with this demand. For example, manufacturers have introduced new high-bandwidth interfaces to replace the conventional SATA host interface protocol. These new interfaces, such as the NVMe protocol, are designed specifically to enable the high amounts of concurrent I/O bandwidth that SSDs are capable of delivering.

While modern SSDs with sophisticated features such as the NVMe protocol are already on the market, existing SSD simulation tools have fallen behind, as they do not capture these new features. We find that state-of-the-art SSD simulators have three shortcomings that prevent them from accurately modeling the performance of real off-the-shelf SSDs. First, these simulators do not model critical features of new protocols (e.g., NVMe), such as their use of multiple application-level queues for requests and the elimination of OS intervention for I/O request processing. Second, these simulators often do not accurately capture the impact of advanced SSD maintenance algorithms (e.g., garbage collection), as they do not properly or quickly emulate steady-state conditions that can significantly change the behavior of these algorithms in real SSDs. Third, these simulators do not capture the full end-to-end latency of I/O requests, which can incorrectly skew the results reported for SSDs that make use of emerging non-volatile memory technologies. By not accurately modeling these three features, existing simulators report results that deviate significantly from real SSD performance.
In this work, we introduce a new simulator, called MQSim, that accurately models the performance of both modern SSDs and conventional SATA-based SSDs. MQSim faithfully models new high-bandwidth protocol implementations, steady-state SSD conditions, and the full end-to-end latency of requests in modern SSDs. We validate MQSim, showing that it reports performance results that are only 6%-18% apart from the measured actual performance of four real state-of-the-art SSDs. We show that by modeling critical features of modern SSDs, MQSim uncovers several real and important issues that were not captured by existing simulators, such as the performance impact of inter-flow interference. We have released MQSim as an open-source tool, and we hope that it can enable researchers to explore directions in new and different areas.

1 Introduction

Solid-state drives (SSDs) are widely used in today's computer systems. Due to their high throughput, low response time, and decreasing cost, SSDs have replaced traditional magnetic hard disk drives (HDDs) in many datacenters and enterprise servers, as well as in consumer devices. As the I/O demand of both enterprise and consumer applications continues to grow, SSD architectures are rapidly evolving to deliver improved performance. For example, a major innovation has been the introduction of new host interfaces to the SSD. In the past, many SSDs made use of the Serial Advanced Technology Attachment (SATA) protocol [67], which was originally designed for HDDs. Over time, SATA has proven to be inefficient for SSDs, as it cannot enable the fast I/O accesses and millions of I/O operations per second (IOPS) that contemporary SSDs are capable of delivering. New protocols such as NVMe [63] overcome these barriers, as they are designed specifically for the high throughput available in SSDs. NVMe enables high throughput and low latency for I/O requests through its use of the multi-queue SSD (MQ-SSD) concept.
While SATA exposes only a single request port to the OS, MQ-SSD protocols provide multiple request queues to directly expose applications to the SSD device controller. This allows (1) an application to bypass OS intervention for I/O request processing, and (2) the SSD controller to schedule I/O requests based on how busy the SSD's resources are. As a result, the SSD can make higher-performance I/O request scheduling decisions.

As SSDs and their associated protocols evolve to keep pace with changing system demands, the research community needs simulation tools that reliably model these new features. Unfortunately, state-of-the-art SSD simulators do not model a number of key properties of modern SSDs that are already on the market. We evaluate several real modern SSDs, and find that state-of-the-art simulators do not capture three features that are critical to accurately model modern SSD behavior.

First, these simulators do not correctly model the multi-queue approach used in modern SSD protocols. Instead, they implement only the single-queue approach used in HDD-based protocols such as SATA. As a result, existing simulators do not capture (1) the high amount of request-level parallelism and (2) the lack of OS intervention in modern SSDs.

Second, many simulators do not adequately model steady-state behavior within a reasonable amount of simulation time. A number of fundamental SSD maintenance algorithms, such as garbage collection [ 3, 3], are not executed when an SSD is new (i.e., no data has been written to the drive). As a result, manufacturers design these maintenance algorithms to work best when an SSD reaches the steady-state operating point (i.e., after all of the pages within the SSD have been written to at least once) [7]. However, simulators that cannot capture steady-state behavior (within a reasonable

simulation time) perform these maintenance algorithms on a new SSD. As such, many existing simulators do not adequately capture algorithm behavior under realistic conditions, and often report unrealistic SSD performance results (as we discuss in Section 3.2).

Third, these simulators do not capture the full end-to-end latency of performing I/O requests. Existing simulators capture only the part of the request latency that takes place during intra-SSD operations. However, many emerging high-speed non-volatile memories greatly reduce the latency of intra-SSD operations, and, thus, the uncaptured parts of the latency now make up a significant portion of the overall request latency. For example, in Intel Optane SSDs, which make use of 3D XPoint memory [9, 5], the overhead of processing a request and transferring data over the system I/O bus (e.g., PCIe) is much higher than the memory access latency []. By not capturing the full end-to-end latency, existing simulators do not report the true performance of SSDs with new and emerging memory technologies.

Based on our evaluation of real modern SSDs, we find that these three features are essential for a simulator to capture. Because existing simulators do not model these features adequately, their results deviate significantly from the performance of real SSDs. Our goal in this work is to develop a new SSD simulator that can faithfully model the features and performance of both modern multi-queue SSDs and conventional SATA-based SSDs.

To this end, we introduce MQSim, a new simulator that provides an accurate and flexible framework for evaluating SSDs. MQSim addresses the three shortcomings we found in existing simulators, by (1) providing detailed models of both conventional (e.g., SATA) and modern (e.g., NVMe) host interfaces; (2) accurately and quickly modeling steady-state SSD behavior; and (3) measuring the full end-to-end latency of a request, from the time an application enqueues a request to the time the request response arrives at the host.
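To see why steady-state modeling matters, consider a toy model of when GC first becomes active (illustrative Python; the page counts and GC threshold below are invented, and MQSim's actual preconditioning is fast and automatic rather than the naive full-drive write replay sketched here):

```python
def precondition(ftl, num_logical_pages):
    """Naive preconditioning: write every logical page once, so that
    later writes become out-of-place updates and GC eventually triggers.
    (For illustration only; replaying a full-drive write like this is
    exactly the slow approach that fast preconditioning avoids.)"""
    for lpa in range(num_logical_pages):
        ftl.write(lpa)

class CountingFTL:
    """Tracks free physical pages; GC activates once the free-page
    count falls below the GC threshold."""
    def __init__(self, physical_pages, gc_threshold):
        self.free_pages = physical_pages
        self.gc_threshold = gc_threshold
        self.mapped = set()

    def write(self, lpa):
        self.free_pages -= 1   # every write consumes one free page
        self.mapped.add(lpa)

    def gc_active(self):
        return self.free_pages < self.gc_threshold

# Made-up capacity: 100 physical pages, GC threshold of 10 free pages.
ftl = CountingFTL(physical_pages=100, gc_threshold=10)
assert not ftl.gc_active()          # fresh out-of-the-box: GC never runs
precondition(ftl, num_logical_pages=93)
```

After preconditioning, GC is active for all subsequent writes, which is the regime in which sustained (steady-state) performance must be measured.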
To allow MQSim to adapt easily to future SSD developments, we employ a modular design for the simulator. Our modular approach allows users to easily modify the implementation of a single component (e.g., I/O scheduler, address mapping) without the need to change other parts of the simulator. We provide two execution modes for MQSim: (1) standalone execution, and (2) integrated execution with the gem5 full-system simulator [].

We validate the performance reported by MQSim using several real SSDs. We find that the response time results reported by MQSim are very close to the response times of the real SSDs, with low average and maximum error for real storage workload traces.

By faithfully modeling the major features found in modern SSDs, MQSim can uncover several issues that existing simulators are unable to demonstrate. One such issue is the performance impact of inter-flow interference in modern MQ-SSDs. For two or more concurrent flows (i.e., streams of I/O requests from multiple applications), there are three major sources of interference: (1) the write cache, (2) the mapping table, and (3) the I/O scheduler. Using MQSim, we find that inter-flow interference leads to significant unfairness (i.e., the interference slows down each flow unequally) in modern SSDs. This is a major concern, as fairness is a first-class design goal in modern computing platforms [, 7, 9, 3, 37, 56 6, 66, 73 76,,, ]. Unfairness reduces the predictability of the I/O latency and throughput for each flow, and can allow a malicious flow to deny or delay I/O service to other, benign flows.

We have made MQSim available as an open source tool to the research community []. We hope that MQSim enables researchers to explore directions in several new and different areas.
We make the following key contributions in this work:

- We use real off-the-shelf SSDs to show that state-of-the-art SSD simulators do not adequately capture three important properties of modern SSDs: (1) the multi-queue model used by modern host interface protocols such as NVMe, (2) steady-state SSD behavior, and (3) the end-to-end I/O request latency.
- We introduce MQSim, a simulator that accurately models both modern NVMe-based and conventional SATA-based SSDs. To our knowledge, MQSim is the first publicly-available SSD simulator to faithfully model the NVMe protocol.
- We validate the results reported by MQSim against several real state-of-the-art multi-queue SSDs.
- We demonstrate how MQSim can uncover important issues in modern SSDs that existing simulators cannot capture, such as the impact of inter-flow interference on fairness and system performance.

2 Background

In this section, we provide a brief background on multi-queue SSD (MQ-SSD) devices. First, we discuss the internal organization of an MQ-SSD (Section 2.1). Next, we discuss host interface protocols commonly used by SSDs (Section 2.2). Finally, we discuss how the SSD flash translation layer (FTL) handles requests and performs maintenance tasks (Section 2.3).

2.1 SSD Internals

Modern MQ-SSDs are typically built using NAND flash memory chips. NAND flash memory [, ] supports read and write operations at the granularity of a flash page (typically several kilobytes). Inside the NAND flash chips, multiple pages are grouped together into a flash block, which is the granularity at which erase operations take place. Flash writes can take place only to pages that are erased (i.e., free). To minimize the write latency, MQ-SSDs perform out-of-place updates (i.e., when a logical page is updated, its data is written to a different, free physical page, and the logical-to-physical mapping is updated). This avoids the need to erase the old physical page during a write operation. Instead, the old page is marked as invalid, and a garbage collection procedure [ 3, 3] reclaims invalid physical pages in the background.
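The out-of-place update scheme described above can be sketched in a few lines (a minimal Python illustration; the SimpleFTL class and its page counts are invented for this sketch and are not MQSim code):

```python
class SimpleFTL:
    """Toy flash translation layer illustrating out-of-place updates.

    A write to an already-mapped logical page never erases in place:
    it claims a new free physical page and marks the old one invalid,
    leaving reclamation to a background garbage collector."""

    def __init__(self, num_physical_pages):
        self.free = list(range(num_physical_pages))  # free physical pages
        self.l2p = {}         # logical-to-physical mapping table
        self.invalid = set()  # physical pages awaiting garbage collection

    def write(self, lpa):
        if lpa in self.l2p:
            # Out-of-place update: invalidate the old physical page.
            self.invalid.add(self.l2p[lpa])
        ppa = self.free.pop(0)   # allocate a free physical page
        self.l2p[lpa] = ppa
        return ppa

    def read(self, lpa):
        return self.l2p[lpa]     # address translation on the read path

ftl = SimpleFTL(num_physical_pages=8)
ftl.write(lpa=0)   # first write of logical page 0
ftl.write(lpa=0)   # update: data goes to a new physical page
```

Note that an update leaves the old physical page behind as garbage, which is precisely what GC later reclaims.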
Figure 1 shows the internal organization of an MQ-SSD. The components inside the MQ-SSD are divided into two groups: (1) the back end, which includes the memory devices; and (2) the front end, which includes the control and management units.

[Figure 1: Organization of an MQ-SSD. As highlighted in the figure, our MQSim simulator captures several aspects of MQ-SSDs not modeled by existing simulators: (1) a detailed host-to-device data transmission model and multi-queue request processing, (2) a detailed request processing delay model with support for multi-queue-aware caching and address mapping, and (3) fast and efficient preconditioning.]

The memory devices (e.g., NAND flash memory [, ], phase-change memory [], STT-MRAM [], 3D XPoint [9]) in the back end are organized in a highly-hierarchical manner to maximize I/O concurrency. The back end contains multiple independent bus channels, which connect the memory devices to the front end. Each channel connects to one or more memory chips. For a NAND flash memory based SSD, each NAND flash chip is typically divided into multiple dies, where each die can independently execute memory commands. All of the dies within a chip share a common communication interface. Each die is made up of one or more planes, which are arrays of flash cells. Each plane contains multiple blocks. Multiple planes within a single die can execute memory operations in parallel only if each plane is executing the same command on the same address offset within the plane.

In an MQ-SSD, the front end includes three major components [7]. (1) The host interface logic (HIL) implements the protocol used to communicate with the host (Section 2.2). (2) The flash translation layer (FTL) manages flash resources and processes I/O requests (Section 2.3). (3) The flash chip controllers (FCCs) send commands to and transfer data to/from the memory chips in the back end.
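The back-end hierarchy described above (channels connect to chips, chips contain dies, dies contain planes) and the multi-plane execution constraint can be captured in a small sketch (illustrative Python; the types and counts are hypothetical, not MQSim's actual model):

```python
from dataclasses import dataclass, field

@dataclass
class Plane:
    blocks: int = 2048   # each plane is an array of blocks (made-up count)

@dataclass
class Die:
    planes: list = field(default_factory=lambda: [Plane(), Plane()])

    def can_multi_plane(self, cmds):
        """Planes in one die run in parallel only when every plane
        executes the same command at the same in-plane address offset."""
        ops = {(c["op"], c["offset"]) for c in cmds}
        return len(ops) == 1

@dataclass
class Chip:
    # All dies in a chip share one communication interface.
    dies: list = field(default_factory=lambda: [Die(), Die()])

@dataclass
class Channel:
    chips: list = field(default_factory=lambda: [Chip()])

backend = [Channel(), Channel()]   # independent bus channels
die = backend[0].chips[0].dies[0]
ok = die.can_multi_plane([{"op": "read", "offset": 7},
                          {"op": "read", "offset": 7}])    # allowed
bad = die.can_multi_plane([{"op": "read", "offset": 7},
                           {"op": "write", "offset": 7}])  # not allowed
```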
The front end contains on-board DRAM, which is used by the three components to cache application data and store data structures for flash management.

2.2 Host Interface Logic

The HIL plays a critical role in leveraging the internal parallelism of the NAND flash memory to provide higher I/O performance to the host. The SATA protocol [67] is commonly used for conventional SSDs, due to widespread support for SATA on enterprise and client systems. SATA employs Native Command Queuing (NCQ), which allows the SSD to concurrently execute I/O requests. NCQ allows the SSD to schedule multiple I/O requests based on which back end resources are currently idle [9, 5].

The NVM Express (NVMe) protocol [63] was designed to alleviate the bottlenecks of SATA [9], and to enable scalable, high-bandwidth, and low-latency communication over the PCIe bus. When an application issues an I/O request in NVMe, it bypasses the I/O stack in the OS and the block layer queue, and instead directly inserts the request into a submission queue (SQ in Figure 1) dedicated to the application. The SSD then selects a request from the SQ, performs the request, and inserts the request's job completion information (e.g., ack, read data) into the request completion queue (CQ) for the corresponding application. NVMe has already been widely adopted in modern SSD products [3, 6, 79, 5, 6].

2.3 Flash Translation Layer

The FTL executes on a microprocessor within the SSD, performing I/O requests and flash management procedures [, ]. Handling an I/O request in the FTL requires four steps for an SSD using NVMe. First, when the HIL selects a request from the SQ, it inserts the request into a device-level queue. Second, the HIL breaks the request down into multiple flash transactions, where each transaction is at the granularity of a single page. Next, the FTL checks if the request is a write. If it is, and the MQ-SSD supports write caching, the write cache management unit stores the data for each transaction in the write cache space within DRAM, and asks the HIL to prepare a response.
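The NVMe submission/completion flow from Section 2.2 can be illustrated with a toy queue pair (Python; this is a simplification of the doorbell-based ring buffers in the NVMe specification, not a faithful implementation):

```python
from collections import deque

class QueuePair:
    """Toy per-application NVMe queue pair: one submission queue (SQ)
    and one completion queue (CQ), used directly by the application
    with no OS-level I/O scheduler in between."""
    def __init__(self):
        self.sq = deque()   # requests the application has enqueued
        self.cq = deque()   # completions posted by the device

class ToyDevice:
    def process_one(self, qp):
        """Fetch a request from the SQ, 'perform' it, and post its
        completion information (e.g., ack, read data) into the CQ."""
        if not qp.sq:
            return
        req = qp.sq.popleft()
        qp.cq.append(("ack", req["op"], req["lba"]))

qp = QueuePair()
qp.sq.append({"op": "read", "lba": 42})   # application enqueues directly
ToyDevice().process_one(qp)               # device consumes SQ, fills CQ
```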
Otherwise, the FTL translates the logical page address (LPA) of the transaction into a physical page address (PPA), and enqueues the transaction into the corresponding chip-level queue. There are separate queues for reads (RDQ) and for writes (WRQ). The transaction scheduling unit (TSU) resolves resource contention among the pending transactions in the chip-level queues, and sends transactions that can be performed to the corresponding FCC [, 7]. Finally, when all transactions for a request finish, the FTL asks the HIL to prepare a response, which is then delivered to the host.

The address translation module of the FTL plays a key role in implementing out-of-place updates. When a transaction writes to an LPA, a page allocation scheme assigns the LPA to a free PPA. The LPA-to-PPA mapping is recorded in a mapping table, which is stored within the non-volatile memory and cached in DRAM (to reduce the latency of mapping lookups) []. When a transaction reads from an LPA, the module searches for the LPA's mapping and retrieves the PPA.

The FTL is also responsible for memory wearout management (i.e., wear-leveling) and garbage collection (GC) [ 3, 3]. GC is triggered when the number of free pages drops below a threshold. The GC procedure reclaims invalidated pages by selecting a candidate block with a high number of invalid pages, moving any valid pages in the block into a free block, and then erasing the candidate block. Any read and write transactions

generated during GC are inserted into dedicated read (GC-RDQ) and write (GC-WRQ) queues. This allows the transaction scheduling unit to schedule GC-related requests during idle periods.

3 Simulation Challenges for Modern MQ-SSDs

In this section, we compare the capabilities of state-of-the-art SSD simulators to the common features of modern SSD devices. As shown in Figure 1, we identify three significant features of modern SSDs that are not supported by current simulation tools: (1) multi-queue support, (2) fast modeling of steady-state behavior, and (3) proper modeling of the end-to-end request latency. While some of these features are also present in some conventional SSDs, their lack of support in existing simulators is more critical when we evaluate modern and emerging MQ-SSDs, resulting in large deviations between simulation results and measured performance.

3.1 Multi-Queue Support

A fundamental difference of a modern MQ-SSD from a conventional SSD is its use of multiple queues that directly expose the device controller to applications [9]. For conventional SSDs, the OS I/O scheduler coordinates concurrent accesses to the storage devices and ensures fairness for co-running applications [66, 6]. MQ-SSDs eliminate the OS I/O scheduler, and are themselves responsible for fairly servicing I/O requests from concurrently-running applications and for guaranteeing high responsiveness. Exposing application-level queues to the storage device enables the use of many optimized management techniques in the MQ-SSD controller, which can provide high performance and a high level of both fairness and responsiveness. This is mainly because the device controller can make better scheduling decisions than the OS, as the device controller knows the current status of the SSD's internal resources. We investigate how the performance of a flow changes when the flow is concurrently executed with other flows on real MQ-SSDs.
We conduct a set of experiments where we control the intensity of synthetic workloads that run on four new off-the-shelf MQ-SSDs (see Table 1 and Appendix A). In each experiment, there are two flows, Flow-1 and Flow-2, where each flow always keeps its I/O queue full with only sequential read accesses of the same average request size. We control the intensity of a flow by adjusting its I/O queue depth: a deeper I/O queue results in a more intensive flow. We hold the I/O queue depth of Flow-1 constant in all experiments, and we sweep eight different values for the I/O queue depth of Flow-2. (We assume that each I/O flow uses a separate I/O queue.)

To quantify the I/O service fairness of each device, we measure the average slowdown of each executed flow, and then use the slowdown to calculate fairness using Equation 1. We define the slowdown of a flow f as S_f = RT_f^shared / RT_f^alone, where RT_f^shared is the response time of f when it is run concurrently with other flows, and RT_f^alone is the response time of f when it runs alone. Fairness (F) is calculated as [, 56, 5]:

    F = MIN{S_f} / MAX{S_f}    (1)

According to the above definition, 0 < F <= 1. Lower F values indicate higher differences between the minimum and maximum slowdowns of all concurrently-running flows, which we say is more unfair to the flow that is slowed down the most. Higher F values are desirable.

Figure 2 depicts the slowdown, normalized throughput (IOPS), and fairness results when we execute Flow-1 and Flow-2 concurrently on our four target MQ-SSDs (which we call SSD-A, SSD-B, SSD-C, and SSD-D). The x-axes in all of the plots in Figure 2 represent the queue depth (i.e., the flow intensity) of Flow-2 in the experiments. For each SSD, we show three plots from left to right: (1) the slowdown and normalized throughput of Flow-1, (2) the slowdown and normalized throughput of Flow-2, and (3) fairness.
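The slowdown and fairness metrics of Equation 1 are straightforward to compute (a small Python helper; the response-time values below are made-up numbers for illustration):

```python
def slowdown(rt_shared, rt_alone):
    """S_f = RT_f^shared / RT_f^alone for one flow f."""
    return rt_shared / rt_alone

def fairness(slowdowns):
    """F = MIN{S_f} / MAX{S_f}; F lies in (0, 1], and higher is fairer."""
    return min(slowdowns) / max(slowdowns)

# Hypothetical response times (in us): Flow-1 suffers most interference.
s1 = slowdown(rt_shared=400.0, rt_alone=100.0)  # S_1 = 4.0
s2 = slowdown(rt_shared=120.0, rt_alone=100.0)  # S_2 = 1.2
f = fairness([s1, s2])                          # 1.2 / 4.0 = 0.3
```

When both flows slow down equally, F is 1; the more unequally interference is distributed, the closer F gets to 0.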
[Figure 2: Performance of Flow-1 (left) and Flow-2 (center), and fairness (right), when the two flows are concurrently executed with different intensities on four real MQ-SSDs.]

We make four major observations from Figure 2. First, in SSD-A, SSD-B, and SSD-C, the throughput of Flow-2 increases substantially, proportionately with its queue depth. Aside from the maximum bandwidth available from the SSD, there is no limit on the throughput of each I/O flow. Second, Flow-1 is slowed down significantly due to interference from Flow-2 when the I/O queue depth of Flow-2 is much greater than that of Flow-1. Third, for SSD-A, SSD-B, and SSD-C, the slowdown of Flow-2 becomes almost negligible (i.e., its

value approaches 1) as the intensity of Flow-2 increases. Fourth, SSD-D limits the maximum throughput of each flow, and thus the negative impact of Flow-2 on the performance of Flow-1 is well controlled. Further experiments with a higher number of flows reveal that one flow cannot exploit more than a quarter of the full I/O bandwidth of SSD-D, indicating that SSD-D has some level of internal fairness control. In contrast, one flow can unfairly exploit the full I/O capabilities of the other three SSDs.

We conclude that (1) the relative intensity of each flow significantly impacts the throughput delivered to each flow; and (2) MQ-SSDs with fairness controls, such as SSD-D, perform differently from MQ-SSDs without fairness controls when the relative intensities of concurrently-running flows differ. Thus, to accurately model the performance of MQ-SSDs, an SSD simulator needs to model multiple queues and enable multiple concurrently-running flows.

3.2 Steady-State Behavior

SSD performance evaluation standards explicitly state that SSD performance should be reported in the steady state [7]. As a consequence, preconditioning (i.e., quickly reaching steady state) is an essential requirement for SSD device performance evaluation, in order to ensure that results are collected in the steady state. This policy is important for three reasons. First, garbage collection (GC) activities are invoked only when the device has performed a certain number of writes, which causes the number of free pages in the SSD to drop below the GC threshold. GC activities interfere with user I/O activity and can significantly affect sustained device performance. However, a fresh out-of-the-box (FOB) device is unlikely to execute GC. Hence, performance results on an FOB device are unrealistic, as they do not account for GC [7]. Second, the steady-state benefits of the write cache may be lower than its short-term benefits, particularly for write-heavy workloads.
More precisely, in the steady state, the write cache is filled with application data and warmed up, and it is highly likely that no free slot can be allocated to new write requests. This leads to cache evictions and increased flash write traffic in the back end [33]. Third, the physical data placement of currently-running applications is highly dependent on the device usage history and the data placement of previous processes. For example, which physical pages are currently free in the SSD depends on how previous I/O requests wrote to and invalidated physical pages. (Based on the SNIA definition [7], a device is in the steady state if its performance variation is limited to a deterministic range.) As a result, channel- and chip-level parallelism in SSDs is limited in the steady state.

Although a number of works do successfully precondition and simulate steady-state behavior, many previous studies have not explored the effect of steady-state behavior on their proposals. Instead, their simulations start with an FOB SSD, and never reach steady state (e.g., the point when each physical page of the SSD has been written to at least once). Most well-known storage traces are not large enough to fill the entire storage space of a modern SSD. Figure 3 shows the total write volume of popular storage workloads [6, 53 55, 6]. We observe that most of the workloads have a total write volume that is much smaller than the storage capacity of most SSDs, with an average write volume of 6 GB. Even for the few workloads that are large enough to fill the SSD, it is time consuming for many existing simulators to simulate each I/O request and reach steady state (see Section 5). Therefore, it is crucial to have a simulator that enables efficient and high-performance steady-state simulation of SSDs.

3.3 Real End-to-End Latency

Request latency is a critical factor of MQ-SSD performance, since it affects how long an application stalls on an I/O request.
The end-to-end latency of an I/O request, from the tme t s nserted nto the host submsson queue to the tme the response s sent back from the MQ-SSD devce to the completon queue, ncludes seven dfferent parts, as we show n Fgure. Exstng smulaton tools model only some parts of the end-to-end latency, whch are usually consdered to be the domnant parts of the end-to-end latency [3, 6, 7, 35, 3]. Fgure a llustrates the end-to-end latency dagram for a small kb read request n a typcal NAND flashbased MQ-SSD. It ncludes I/O job enqueung n the submsson queue (SQ), host-to-devce I/O job transfer over the PCIe bus, address translaton and transacton schedulng n the FTL 3, read command and address transfer to the flash chp, flash chp read 5, read data transfer over the Open NAND Flash Interface (ONFI) [65] bus 6, and devce-to-host read data transfer over the PCIe bus 7. Steps 5 and 6 are assumed to be the most tme-consumng parts n the end-to-end request processng. Consderng typcal latency values for an kb page read operaton, the I/O job nserton (< µs, as measured on our real SSDs), the FTL request processng on a multcore processor ( µs) [7] (assumng a mappng table cache ht), and the I/O job and data transfer over the PCIe bus ( µs) [, 6] make neglgble contrbutons compared to the flash read (5- µs) [9, 5, 5, 69] and the ONFI NV-DDR [65] flash transfer ( µs). However, the above assumpton s unrealstc due to two major reasons. Frst, for some I/O requests, FTL re- 7 9 src- src- src- src- src- rsrsch- rsrch- rsrch- prxy- prxy- Mean webdev-3 webdev- webdev- webdev- web-3 web- web- web- usr- ts- usr- usr- stg- stg- src- Fgure 3: Total amount of data wrtten by commonly-used storage workloads [6, 53 55, 6]. USENIX Assocaton th USENIX Conference on Fle and Storage Technologes 53

7 I/O job Xfer over PCIe Request 3 processng Read request Xfer to chp 5 Flash read (TFlash Read) Enqueue I/O job n the SQ NAND flash Chp Hghest contrbuton to end-to-end latency Response data Xfer over PCIe 6 ONFI data Xfer (TONFI Xfer) tme 7 In summary, a detaled, realstc model of end-to-end latency s key for accurate smulaton of modern SSD devces wth () multple I/O flows that can potentally lead to a sgnfcant ncrease n CMT (cached mappng table) msses, and () very-fast NVM technologes such as 3D XPont that greatly reduce raw memory read/wrte latences. Exstng smulaton tools do not provde accurate performance results for such devces. Modelng a Modern MQ-SSD wth MQSm (a) NAND flash memory User Applcaton Host Memory MQ-SSD HIL MQ-SSD Frmware 3D Xpont Chp I/O job Xfer over PCIe Hghest contrbuton to end-to-end latency Request Read request processng 3 Xfer to chp 5 6 tme Fast data Xfer (TFast Xfer) 7 Response data Xfer over PCIe 3D Xpont read (T3DXpont Read) Enqueue I/O job n the SQ (b) 3D XPont memory Fgure : Tmng dagram for a kb read request n (a) NAND-flash and (b) 3D XPont MQ-SSDs. quest processng may not always be neglgble, and can even become comparable to the flash read access tme. For example, pror work [6] shows that f the FTL uses page-level address mappng, then a workload wthout localty ncurs a large number of msses n the cached mappng table (CMT). In case of a mss n the CMT, the user read operaton stalls untl the mappng data s read from the SSD back end and transferred to the front end []. Ths can lead to a substantal ncrease n the latency of Step 3 n Fgure a, whch can become even longer than the combned latency of Steps 5 and 6. In an MQ-SSD, as a greater number of I/O flows execute concurrently, there s more contenton for the CMT, leadng to a larger number of CMT msses. 
Second, as shown n Fgure b, cuttng-edge nonvolatle memory technologes, such as 3D XPont [7, 9,, ], dramatcally reduce the access and data transfer tmes of the MQ-SSD back end, by as much as three orders of magntude compared to that of NAND flash [5,,, 3]. The total latency of the 3D XPont read and transfer (< µs) contrbutes less than % to the end-to-end I/O request processng latency (< µs) [7, ]. In ths case, a conventonal smulaton tool would be naccurate, as t does not model the major steps contrbutng to the end-to-end latency. To our knowledge, there s no SSD modelng tool that supports mult-queue I/O executon, fast and effcent modelng of the SSD s steady-state behavor, and a full end-to-end request latency estmaton. In ths work, we present MQSm, a new smulaton framework that s developed from scratch to support all of these three mportant features that are requred for accurate performance modelng and desgn space exploraton of modern MQ-SSDs. Although manly desgned for MQ-SSD smulaton, MQSm also supports smulaton of the conventonal SATA-based SSDs that mplement natve command queung (NCQ). Our new smulator models all of the components shown n Fgure, whch exst n modern SSDs. Table provdes a quck comparson between MQSm and prevous SSD smulators. MQSm s a dscrete-event smulator wrtten n C++ and s released under the permssve MIT Lcense []. Fgure 5 depcts a hgh-level vew of MQSm s man components and ther nteracton. In ths secton, we brefly descrbe these components and explan ther novel features wth respect to the prevous smulators. Front end Back end Host Interface Request Fetch Unt Input Stream Manager Data Cache Manager FTL Address Mappng Unt Cached Mappng Table Flash Block Manager GC and WL Unt NVM Chp MQ-SSD Frmware NVM Channel MQ-SSD HIL NVM PHY Host Memory Transacton Schedulng Unt (TSU) User Applcaton Fgure 5: Hgh-level vew of MQSm components.. SSD Back End Model MQSm provdes a smple yet detaled model of the flash memory chps. 
It considers three major latency components of the SSD back end: (1) address and command transfer to the memory chip; (2) flash memory read/write execution for different technologies that store 1, 2, or 3 bits per cell; and (3) data transfer to/from the memory chips.

Table: A quick comparison between MQSim and existing SSD modeling tools.
  Multi-queue support: MQSim: multi-queue scheduling and prioritization; existing tools: not supported.
  Preconditioning: MQSim: fast and automatic (enabled by default); existing tools: manual, optional, and long execution time.
  End-to-end latency: MQSim: detailed model of the end-to-end latency; existing tools: missing some constant- or variable-latency components.
  Built-in implementation of SSD components: MQSim: all major components that exist in modern SSDs; existing tools: implementation is missing for some major components.

MQSim's flash model considers the constraints of die- and plane-level parallelism, and advanced command execution [65]. One important new feature of MQSim is that it can be configured or easily modified to simulate new NVM chips (e.g., those that do not need erase-before-write). Because the NVM chip communication interface is decoupled from the chip's internal implementation of the memory operations, one can modify the NVM chip in MQSim without the need to change the implementation of the other MQSim components.

Another new feature of MQSim is that it decouples the sizes of read and write operations. This feature helps exploit the large page sizes of modern flash memory chips to enable better write performance, while preventing the negative effects of large page sizes on read performance. For flash chip writes, the operation is always page-sized. MQSim's data cache controller can delay writes to eliminate write-back of partially-updated logical pages (where the update size is smaller than the physical page size). When a partially-updated logical page must be written back to the flash storage, the unchanged sub-pages (sectors) of the logical page are first read from the physical page that stores the page's data. Then, the unchanged and updated pieces of the page are merged. In the last step, the entire page is written to a new free physical page. For flash chip reads, the operation can be smaller than the physical page size. When a read operation finishes, only the data pieces that are requested by the I/O request are transferred from the flash chips to the SSD controller, avoiding the data transfer overhead of large physical pages.

4.2 SSD Front End Model

The front end model of MQSim includes all of the basic components of a modern SSD controller and provides many new features that do not exist in previous SSD modeling tools.
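The read-merge-write handling of partially-updated logical pages described above can be illustrated with a minimal Python sketch (MQSim itself is written in C++; the names and the sector count here are illustrative assumptions, not MQSim's actual interfaces):

```python
# Sketch of write-back for a partially-updated logical page: unchanged
# sectors are read from the old physical page, merged with the updated
# sectors, and the full page is programmed to a new free physical page.
SECTORS_PER_PAGE = 8  # assumed sectors per physical page

def merge_partial_write(old_page, updated_sectors):
    """old_page: sector list read from the old physical page.
    updated_sectors: {sector index: new data} from the write request.
    Returns (full page to program, sectors that had to be read back)."""
    read_back = [i for i in range(SECTORS_PER_PAGE) if i not in updated_sectors]
    new_page = list(old_page)                # sectors read from flash
    for i, data in updated_sectors.items():  # overlay the updated sectors
        new_page[i] = data
    return new_page, read_back

old = ["old%d" % i for i in range(SECTORS_PER_PAGE)]
page, read_back = merge_partial_write(old, {0: "new0", 3: "new3"})
print(page)       # ['new0', 'old1', 'old2', 'new3', 'old4', 'old5', 'old6', 'old7']
print(read_back)  # [1, 2, 4, 5, 6, 7]
```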
4.2.1 Host Interface Model

The host interface component of MQSim provides both NVMe multi-queue (MQ) and SATA native command queue models for a modern SSD. To our knowledge, MQSim is the first modeling tool that supports MQ I/O request processing. A request fetch unit within the host interface of MQSim fetches and schedules application I/O requests from the different input queues. The NVMe host interface provides users with a parameter, called QueueFetchSize, that can be used to tune the behavior of the request fetch unit in order to accurately model the behavior of real MQ-SSDs. This parameter defines the maximum number of I/O requests from each submission queue (SQ) that can be serviced concurrently in the MQ-SSD. More precisely, at any given time, the number of I/O requests that are fetched from a host SQ into the device-level queue is always less than or equal to QueueFetchSize. This parameter has a large impact on the multi-flow request processing characteristics of an MQ-SSD discussed in Section 3 (i.e., on the maximum achievable throughput per I/O flow and the probability of inter-flow interference). Appendix A.3 analyzes the effect of this parameter on performance. MQSim also models the different priority classes for host-side request queues that are part of the NVMe standard specification [63].

4.2.2 Data Cache Manager

MQSim has a data cache manager component that implements a DRAM-based cache with the least-recently-used (LRU) replacement policy. The DRAM cache can be configured to cache (1) recently-written data (the default mode), (2) recently-read data, or (3) both recently-written and recently-read data. A new feature of MQSim's cache manager, compared to previous SSD modeling tools, is that it implements a DRAM access model that accounts for contention among concurrent accesses to the DRAM chips and for the latency of DRAM commands. The DRAM cache models in MQSim can be extended to make use of detailed and fast DRAM simulators, such as Ramulator [39], to perform detailed studies of the effect of DRAM cache performance on overall MQ-SSD performance.
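The default write-caching mode described above can be sketched with a minimal LRU structure (a Python sketch under assumed names; MQSim's real cache manager additionally models DRAM timing and contention):

```python
from collections import OrderedDict

class WriteCache:
    """Minimal LRU cache of recently-written logical pages. An evicted
    page is returned so the caller can flush it to the back end."""
    def __init__(self, capacity_pages):
        self.capacity = capacity_pages
        self.pages = OrderedDict()  # LPA -> data, ordered oldest-first

    def write(self, lpa, data):
        evicted = None
        if lpa in self.pages:
            self.pages.pop(lpa)  # rewrite: refresh LRU position
        elif len(self.pages) == self.capacity:
            evicted = self.pages.popitem(last=False)  # evict the LRU page
        self.pages[lpa] = data
        return evicted

cache = WriteCache(capacity_pages=2)
cache.write(0, "a")
cache.write(1, "b")
cache.write(0, "a2")               # LPA 0 becomes most recently used
evicted = cache.write(2, "c")      # capacity reached: LPA 1 is evicted
print(evicted)                     # (1, 'b')
```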
We leave this integration to future work.

4.2.3 FTL Components

MQSim implements all the main FTL components, including (1) the address translation unit, (2) the garbage collection (GC) and wear-leveling (WL) unit, and (3) the transaction scheduling unit. MQSim provides different options for each of these components, including state-of-the-art address translation strategies [7], GC candidate block selection algorithms [3, 5, 9], and transaction scheduling schemes [3, 7]. MQSim also implements several state-of-the-art GC and flash management mechanisms, including preemptible GC I/O scheduling, intra-plane data movement from one physical page to another using copyback read and write command pairs [7], and program/erase suspension [7] to reduce the interference of GC operations with application I/O requests. One novel feature of MQSim is that all of its FTL components support multi-flow (i.e., multi-input-queue) request processing. For example, the address mapping unit can partition the cached mapping table space among the concurrently running flows. This inherent support for multi-queue-aware request processing facilitates the design space exploration of performance isolation and QoS schemes for MQ-SSDs.

4.3 Modeling End-to-End Latency

In addition to the flash operation and internal data transfer latencies (steps 3, 4, 5, and 6 of I/O request processing), there is a mix of variable and constant latencies that MQSim models to determine the end-to-end request latency.

Variable Latencies. These are the variable request processing times in the FTL that result from contention for the cached mapping table and the DRAM write cache. Depending on the request type (read or write) and the request's logical address, the request processing time in the FTL includes some of the following items: (1) the time required to read/write from/to the data cache, and (2) the time to fetch mapping data from flash storage in case of a miss in the cached address mapping table.

Constant Latencies. These include the times required to transmit the I/O job information, the entire user data, and the I/O completion information over the PCIe bus, as well as the firmware (FTL) execution time on the controller's microprocessor. The PCIe transmission latencies are calculated based on a simple packet latency model provided by Xilinx that considers: (1) the PCIe communication bandwidth, (2) the payload and header sizes of the PCIe Transaction Layer Packets (TLPs), (3) the size of the NVMe management data structures, and (4) the size of the application data. The firmware execution time is estimated using both a CPU and a cache latency model.

4.4 Modeling Steady-State Behavior

The basic assumption of MQSim is that all simulations should be executed when the modeled device is in steady state. To model steady-state behavior, MQSim, by default, automatically executes a preconditioning function before starting the actual simulation process. This function performs preconditioning in a short time (e.g., within minutes when running tpcc [53] on a large MQ-SSD), without the need to execute additional I/O requests. During preconditioning, all available physical pages of the modeled SSD are transitioned to either a valid or an invalid state, based on a steady-state valid/invalid page distribution model (only very few flash blocks are assumed to remain free and are added to the free block pool). MQSim pre-processes the input trace to extract the LPA (logical page address) access characteristics of the application I/O requests in the trace, and then uses the extracted information as input to the valid/invalid page distribution model.
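A minimal Python sketch of this preconditioning step (with an arbitrary uniform stand-in for the trace-driven valid/invalid page distribution model; the geometry constants are illustrative assumptions):

```python
import random

PAGES_PER_BLOCK = 256   # illustrative geometry
NUM_BLOCKS = 64
FREE_BLOCKS = 2         # only very few blocks remain in the free pool

def precondition(valid_fraction, seed=0):
    """Transition every page of every non-free block to valid or invalid.
    valid_fraction stands in for the distribution model that MQSim
    derives from the trace's LPA access characteristics."""
    rng = random.Random(seed)
    blocks = []
    for _ in range(NUM_BLOCKS - FREE_BLOCKS):
        valid = sum(rng.random() < valid_fraction
                    for _ in range(PAGES_PER_BLOCK))
        blocks.append({"valid": valid, "invalid": PAGES_PER_BLOCK - valid})
    return blocks

blocks = precondition(valid_fraction=0.7)
# After preconditioning, non-free blocks hold a mix of valid and
# invalid pages, so GC behaves as it would on an aged device.
print(len(blocks), blocks[0])
```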
In addition, input trace characteristics, such as the average write arrival rate and the distribution of write addresses, are used to warm up the write cache.

4.5 Execution Modes

MQSim provides two modes of operation: (1) standalone mode, where it is fed a real disk trace or a synthetic workload, and (2) integrated mode, where it is fed disk requests from an execution-driven engine (e.g., gem5).

5 Comparison with Previous Simulators

The increasing use of SSDs in modern computing systems has boosted interest in SSD design space exploration. To this end, several simulators have been developed in recent years. The table below summarizes the features of MQSim and popular existing SSD modeling tools. The table also shows the average error rates for the performance of real storage workloads reported by each simulator, compared to the performance measured on four real MQ-SSDs (see Appendix A for our methodology). Existing tools either do not model some major components of modern SSDs or provide very simplistic component models that lead to unrealistic I/O request latency estimation. In contrast, MQSim provides detailed implementations of all of the major components of modern SSDs. MQSim is written in C++ and has 13K lines of code (LOC). Next, we discuss the main advantages of MQSim compared to the previous tools.

[Table: Comparison of MQSim with previous SSD modeling tools. For each simulator (MQSim, SSDModel, FlashSim, SSDSim, NANDFlashSim, VSSIM, WiscSim, and SimpleSSD [35]), the table lists: the HIL protocol supported (NVMe, SATA) and queueing model (MQ, NCQ); the execution mode (standalone, integrated with a full-system simulator, or SSD emulation for a real system); support for fast and accurate preconditioning of the modeled SSD to enable accurate steady-state results; the end-to-end latency components modeled (flash/NVM read and write timing, NVM data transfer, FTL request processing overhead, accurate write cache access latency, and host-to-device/device-to-host data transfer delay); the front-end components implemented (page-level and hybrid address mapping, GC, write cache, FTL transaction scheduling unit, FTL wear-leveling unit, and built-in support for multi-queue-aware request processing in the FTL); the lines of source code; and the simulation error (%) against four real devices (SSD-A through SSD-D).]

Host Interface Logic. As the table shows, most existing simulators assume a very simplistic HIL model with no explicit management mechanism for the I/O request queues. This leads to an unrealistic SSD model with respect to the requirements of both the NVMe and SATA protocols. As we mention in Section 3, the concurrent execution of I/O flows presents many challenges for performance predictability and fairness in MQ-SSDs. No existing simulator implements NVMe and multi-queue I/O request management, and hence none accurately models the behavior of MQ-SSDs. Also, except for WiscSim, we find that no existing simulator implements an accurate model of the SATA protocol and NCQ request processing. This leads to unrealistic SATA device simulation, as NCQ-based I/O scheduling plays a key role in the performance of real SSD devices [5, 6].

Steady-State Simulation. To our knowledge, accurate and fast steady-state behavior modeling is not provided by most existing SSD modeling tools. Among the tools listed in the table, only SSDSim provides a function, called make_aged, to change the status of a set of physical pages to valid before starting the actual execution of an input trace. This simple method cannot accurately replicate the steady-state behavior of an SSD for two reasons. First, after the execution of make_aged, each physical block contains only valid pages or only free pages. This is far from the steady-state status of blocks in real devices, where each non-free block has a mix of valid and invalid pages. Second, the steady-state status of the data cache is not modeled, i.e., the simulation starts with a completely empty write cache. In general, it is possible to bring these simulators to steady state. However, they have no fast preconditioning support, and preconditioning must be performed by executing traces. Preconditioning an existing simulator requires users to generate traces with a large enough number of I/O requests, and can significantly slow down the simulator, especially when a high-capacity SSD is modeled. For example, our studies with SSDSim show that such preconditioning can increase the simulation time substantially when a large SSD is modeled.

Detailed End-to-End Latency Model. As described in Section 3.3, the end-to-end latency of an application I/O request includes several different components. The table shows that latency modeling in existing simulators is mainly focused on the latency of the flash chip operation and the SSD-internal data transfer.
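The point can be made concrete with a small sketch that composes an end-to-end latency from the constant and variable components described earlier (all numbers below are illustrative assumptions, not measurements):

```python
def end_to_end_latency_us(host_xfer_us, ftl_proc_us, cache_us,
                          flash_op_us, internal_xfer_us):
    """Constant host-side/firmware costs + variable FTL/cache time
    + back-end flash operation and internal data transfer."""
    return host_xfer_us + ftl_proc_us + cache_us + flash_op_us + internal_xfer_us

# For a NAND read, the back end dominates, so a back-end-only model
# is a reasonable approximation...
nand = end_to_end_latency_us(4, 3, 0, 75, 10)
# ...but for a 3D XPoint-like read, the back end is a small fraction of
# the total, so ignoring the other components badly skews the result.
fast = end_to_end_latency_us(4, 3, 0, 0.1, 0.5)
print(nand, fast)  # 92 7.6
```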
As we explain in Section 3.3, focusing only on these components yields an unrealistic model of the end-to-end I/O request processing latency, even for a conventional SSD. To study the accuracy of the existing tools in modeling real devices, we create models of the four real SSDs in each simulator, and execute three real traces, i.e., tpcc, tpce, and exchange. We exclude the simulators that do not support trace-based execution. The four rightmost columns of the comparison table show the average error rate of each simulator in modeling the performance (i.e., read and write latency) of these four real devices. The error rates of the four evaluated simulators are almost one order of magnitude higher than that of MQSim. We believe that these high error rates are due to four major reasons: (1) the lack of a write cache or inaccurate modeling of the write cache access latency, (2) the lack of built-in support for steady-state modeling, (3) incomplete modeling of the request processing latency in the FTL, and (4) the lack of modeling of the host-to-device communication latency.

(The increase in simulation time noted above depends on the access pattern, intensity, and mix of I/O requests (read vs. write) of the workload.)

6 Research Directions Enabled by MQSim

MQSim is a flexible simulation tool that enables different studies on both modern and conventional SSD devices. In this section, we discuss two new research directions enabled by MQSim, which could not be explored easily using existing simulation tools. First, we use MQSim to perform a detailed analysis of inter-flow interference in a modern MQ-SSD (Section 6.1). We explain how sharing different internal resources of an MQ-SSD, such as the write cache, cached mapping table, and back end resources, can introduce fairness issues. Second, we explain how the full-system simulation mode of MQSim can enable detailed application-level studies (Section 6.2).

6.1 Design Space Exploration of Fairness and QoS Techniques for MQ-SSDs

As we describe earlier, fairness and QoS should be considered first-class design criteria for modern datacenter SSDs.
MQSim provides an accurate framework to study inter-flow interference, and thus enables the design of interference-aware MQ-SSD management algorithms for sharing the internal MQ-SSD resources. As we show in Section 3, concurrently running two I/O flows can lead to disproportionate slowdowns for each flow, greatly degrading fairness and proportional progress. This is particularly important in high-end SSD devices, which provide higher throughput per I/O flow, as we show in Appendix A.3. We find that this inter-flow interference is mainly the result of contention that takes place at three locations in an MQ-SSD: (1) the write cache in the front end, (2) the cached mapping table (CMT) in the front end, and (3) the storage resources in the back end. In this section, we use MQSim to explore the impact of these three points of contention on performance and fairness, which cannot be explored accurately using existing simulators.

6.1.1 Methodology

MQ-SSD Configuration. The table below lists the specification of the MQ-SSD that we model in MQSim for our contention studies.

[Table 3: Configuration of the simulated SSD. SSD organization: PCIe 3.0 (NVMe) host interface; user capacity in GB; write cache and CMT sizes in MB; number of channels and chips per channel; QueueFetchSize. Flash communication interface: ONFI 3.x (NV-DDR), 333 MT/s. Flash microarchitecture: page size in KB with per-page metadata, 256 pages per block, and multiple blocks per plane and planes per die. Flash access parameters: read latency 75 µs; program and erase latencies as listed.]

Metrics. To measure performance, we use the weighted speedup (WS) [7] of the average response time (RT), which represents the overall efficiency and system-level throughput provided by an MQ-SSD during the concurrent execution of multiple flows:

    WS = Σ_i ( RT_alone,i / RT_shared,i )    (1)

where RT_alone and RT_shared are defined in Section 3. To demonstrate the effect of inter-flow interference on fairness, we report slowdown and fairness (F) metrics, also as defined in Section 3.

6.1.2 Contention at the Write Cache

One point of contention among concurrently-running flows in an MQ-SSD is the write cache. For flows with low to moderate write intensity (where the average depth of the I/O queue is small), or with high spatial locality, the write cache decreases the response time of write requests by eliminating the need for a request to wait until its write completes in the underlying memory. For flows with high write intensity or with highly-random accesses, the write requests quickly fill up the limited capacity of the write cache, causing significant cache thrashing and limiting the decrease in write request response time. Such flows not only fail to benefit from the write cache themselves, but also prevent other, lower-write-intensity flows from benefiting from it, leading to a large performance loss for the lower-write-intensity flows.

To understand how contention at the write cache affects system performance and fairness, we perform a set of experiments in which we run two flows, Flow-1 and Flow-2, both of which perform only random-access write requests with the same average request size. We set Flow-1 to have a moderate write intensity by limiting its queue depth. We vary the queue depth of Flow-2 up to 256 requests, to control the write intensity of the flow. In order to isolate the effect of write cache interference in our experiments, we (1) assign each flow a dedicated subset of the back end resources (i.e., Flow-1 uses Channels 1-4, and Flow-2 uses Channels 5-8), to avoid introducing any interference in the back end; and (2) use a perfect CMT, in which all address translation requests hit, to avoid interference due to limited CMT capacity.
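The metrics used in this section can be computed as in the sketch below (weighted speedup and slowdown as defined above; for fairness we assume the common min-over-max slowdown definition):

```python
def weighted_speedup(rt_alone, rt_shared):
    """WS = sum over flows of RT_alone / RT_shared."""
    return sum(a / s for a, s in zip(rt_alone, rt_shared))

def fairness(rt_alone, rt_shared):
    """F = min slowdown / max slowdown, where slowdown = RT_shared / RT_alone.
    F == 1.0 means all flows slow down equally (perfect fairness)."""
    sd = [s / a for a, s in zip(rt_alone, rt_shared)]
    return min(sd) / max(sd)

rt_alone  = [100.0, 100.0]  # average response time (us), each flow alone
rt_shared = [400.0, 110.0]  # response times when running concurrently
ws = weighted_speedup(rt_alone, rt_shared)
f  = fairness(rt_alone, rt_shared)
print(round(ws, 3), f)  # 1.159 0.275
```

A disproportionate slowdown of one flow (4x vs. 1.1x here) drives F far below 1.0 even when the weighted speedup looks acceptable, which is why both metrics are reported.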
Figure 6a shows the slowdown of each flow when the two flows run concurrently, compared to when each flow runs alone. Figure 6b shows the fairness and performance of the system when the two flows run concurrently.

[Figure 6: Impact of write cache contention. (a) Slowdown of Flow-1 (left) and Flow-2 (right). (b) Fairness (left) and system performance, measured as weighted speedup (right).]

We make four key observations from the figures. First, Flow-1 is slowed down significantly when Flow-2 has a high write intensity (i.e., a large queue depth), indicating that at high write intensities, Flow-2 induces write cache thrashing. Second, the slowdown of Flow-2 is negligible, because of the low write intensity of Flow-1. Third, fairness degrades greatly, as a result of the write cache contention, when Flow-2 has a high write intensity. Fourth, write cache contention causes an MQ-SSD to be inefficient at concurrently running multiple I/O flows, as the weighted speedup is reduced by more than 5% when Flow-2 has a high write intensity compared to when it has a low write intensity.

We conclude that write cache contention leads to unfairness and overall performance degradation for concurrently-running flows when one flow has a high write intensity. In these cases, the high-write-intensity flow (1) does not benefit from the write cache, and (2) prevents other, lower-write-intensity flows from taking advantage of the write cache, even though the other flows would otherwise benefit from it. This motivates the need for fair write cache management algorithms for MQ-SSDs that take inter-flow interference and flow write intensity into account.

6.1.3 Contention at the Cached Mapping Table

As we discuss in Section 3.3, address translation can noticeably increase the end-to-end latency of an I/O request, especially for read requests. We find that for I/O flows with random access patterns, the cached mapping table (CMT) miss rate is high due to poor reuse of address translation mappings, which causes the I/O requests generated by the flow to stall for long periods of time during address translation.
This is not true for I/O flows with sequential accesses, for which the CMT miss rate remains low due to spatial locality. However, when two I/O flows run concurrently, where one flow has a random access pattern and the other has a sequential access pattern, the poor locality of the flow with the random access pattern may cause both flows to experience high CMT miss rates.

To understand how contention at the CMT affects system performance and fairness, we perform a set of experiments in which we concurrently run two flows that issue read requests with the same average request size. In these experiments, Flow-1 has a fully-sequential access pattern, while Flow-2 has a random access pattern for a fraction of the total execution time and a sequential access pattern for the remaining time. We vary the randomness (i.e., the fraction of the execution time with a random access pattern) of Flow-2. To isolate the effects of CMT contention, we assign Flow-1 to Channels 1-4 in the back end, and assign Flow-2 to Channels 5-8. Figure 7a shows the slowdown and change in CMT hit rate of each flow when Flow-1 and Flow-2 run concurrently.


Utility-Based Acceleration of Multithreaded Applications on Asymmetric CMPs

Utility-Based Acceleration of Multithreaded Applications on Asymmetric CMPs Utlty-Based Acceleraton of Multthreaded Applcatons on Asymmetrc CMPs José A. Joao M. Aater Suleman Onur Mutlu Yale N. Patt ECE Department The Unversty of Texas at Austn Austn, TX, USA {joao, patt}@ece.utexas.edu

More information

Verification by testing

Verification by testing Real-Tme Systems Specfcaton Implementaton System models Executon-tme analyss Verfcaton Verfcaton by testng Dad? How do they know how much weght a brdge can handle? They drve bgger and bgger trucks over

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

Outline. Digital Systems. C.2: Gates, Truth Tables and Logic Equations. Truth Tables. Logic Gates 9/8/2011

Outline. Digital Systems. C.2: Gates, Truth Tables and Logic Equations. Truth Tables. Logic Gates 9/8/2011 9/8/2 2 Outlne Appendx C: The Bascs of Logc Desgn TDT4255 Computer Desgn Case Study: TDT4255 Communcaton Module Lecture 2 Magnus Jahre 3 4 Dgtal Systems C.2: Gates, Truth Tables and Logc Equatons All sgnals

More information

Channel 0. Channel 1 Channel 2. Channel 3 Channel 4. Channel 5 Channel 6 Channel 7

Channel 0. Channel 1 Channel 2. Channel 3 Channel 4. Channel 5 Channel 6 Channel 7 Optmzed Regonal Cachng for On-Demand Data Delvery Derek L. Eager Mchael C. Ferrs Mary K. Vernon Unversty of Saskatchewan Unversty of Wsconsn Madson Saskatoon, SK Canada S7N 5A9 Madson, WI 5376 eager@cs.usask.ca

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

Goals and Approach Type of Resources Allocation Models Shared Non-shared Not in this Lecture In this Lecture

Goals and Approach Type of Resources Allocation Models Shared Non-shared Not in this Lecture In this Lecture Goals and Approach CS 194: Dstrbuted Systems Resource Allocaton Goal: acheve predcable performances Three steps: 1) Estmate applcaton s resource needs (not n ths lecture) 2) Admsson control 3) Resource

More information

Motivation. EE 457 Unit 4. Throughput vs. Latency. Performance Depends on View Point?! Computer System Performance. An individual user wants to:

Motivation. EE 457 Unit 4. Throughput vs. Latency. Performance Depends on View Point?! Computer System Performance. An individual user wants to: 4.1 4.2 Motvaton EE 457 Unt 4 Computer System Performance An ndvdual user wants to: Mnmze sngle program executon tme A datacenter owner wants to: Maxmze number of Mnmze ( ) http://e-tellgentnternetmarketng.com/webste/frustrated-computer-user-2/

More information

Virtual Machine Migration based on Trust Measurement of Computer Node

Virtual Machine Migration based on Trust Measurement of Computer Node Appled Mechancs and Materals Onlne: 2014-04-04 ISSN: 1662-7482, Vols. 536-537, pp 678-682 do:10.4028/www.scentfc.net/amm.536-537.678 2014 Trans Tech Publcatons, Swtzerland Vrtual Machne Mgraton based on

More information

A QoS-aware Scheduling Scheme for Software-Defined Storage Oriented iscsi Target

A QoS-aware Scheduling Scheme for Software-Defined Storage Oriented iscsi Target A QoS-aware Schedulng Scheme for Software-Defned Storage Orented SCSI Target Xanghu Meng 1,2, Xuewen Zeng 1, Xao Chen 1, Xaozhou Ye 1,* 1 Natonal Network New Meda Engneerng Research Center, Insttute of

More information

A fair buffer allocation scheme

A fair buffer allocation scheme A far buffer allocaton scheme Juha Henanen and Kalev Klkk Telecom Fnland P.O. Box 228, SF-330 Tampere, Fnland E-mal: juha.henanen@tele.f Abstract An approprate servce for data traffc n ATM networks requres

More information

A New Transaction Processing Model Based on Optimistic Concurrency Control

A New Transaction Processing Model Based on Optimistic Concurrency Control A New Transacton Processng Model Based on Optmstc Concurrency Control Wang Pedong,Duan Xpng,Jr. Abstract-- In ths paper, to support moblty and dsconnecton of moble clents effectvely n moble computng envronment,

More information

If you miss a key. Chapter 6: Demand Paging Source:

If you miss a key. Chapter 6: Demand Paging Source: ADRIAN PERRIG & TORSTEN HOEFLER ( -6- ) Networks and Operatng Systems Chapter 6: Demand Pagng Source: http://redmne.replcant.us/projects/replcant/wk/samsunggalaxybackdoor If you mss a key after yesterday

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

Real-Time Systems. Real-Time Systems. Verification by testing. Verification by testing

Real-Time Systems. Real-Time Systems. Verification by testing. Verification by testing EDA222/DIT161 Real-Tme Systems, Chalmers/GU, 2014/2015 Lecture #8 Real-Tme Systems Real-Tme Systems Lecture #8 Specfcaton Professor Jan Jonsson Implementaton System models Executon-tme analyss Department

More information

Parallel Inverse Halftoning by Look-Up Table (LUT) Partitioning

Parallel Inverse Halftoning by Look-Up Table (LUT) Partitioning Parallel Inverse Halftonng by Look-Up Table (LUT) Parttonng Umar F. Sddq and Sadq M. Sat umar@ccse.kfupm.edu.sa, sadq@kfupm.edu.sa KFUPM Box: Department of Computer Engneerng, Kng Fahd Unversty of Petroleum

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

Dynamic Voltage Scaling of Supply and Body Bias Exploiting Software Runtime Distribution

Dynamic Voltage Scaling of Supply and Body Bias Exploiting Software Runtime Distribution Dynamc Voltage Scalng of Supply and Body Bas Explotng Software Runtme Dstrbuton Sungpack Hong EE Department Stanford Unversty Sungjoo Yoo, Byeong Bn, Kyu-Myung Cho, Soo-Kwan Eo Samsung Electroncs Taehwan

More information

CPE 628 Chapter 2 Design for Testability. Dr. Rhonda Kay Gaede UAH. UAH Chapter Introduction

CPE 628 Chapter 2 Design for Testability. Dr. Rhonda Kay Gaede UAH. UAH Chapter Introduction Chapter 2 Desgn for Testablty Dr Rhonda Kay Gaede UAH 2 Introducton Dffcultes n and the states of sequental crcuts led to provdng drect access for storage elements, whereby selected storage elements are

More information

Design and Analysis of Algorithms

Design and Analysis of Algorithms Desgn and Analyss of Algorthms Heaps and Heapsort Reference: CLRS Chapter 6 Topcs: Heaps Heapsort Prorty queue Huo Hongwe Recap and overvew The story so far... Inserton sort runnng tme of Θ(n 2 ); sorts

More information

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z.

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z. TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS Muradalyev AZ Azerbajan Scentfc-Research and Desgn-Prospectng Insttute of Energetc AZ1012, Ave HZardab-94 E-mal:aydn_murad@yahoocom Importance of

More information

Space-Optimal, Wait-Free Real-Time Synchronization

Space-Optimal, Wait-Free Real-Time Synchronization 1 Space-Optmal, Wat-Free Real-Tme Synchronzaton Hyeonjoong Cho, Bnoy Ravndran ECE Dept., Vrgna Tech Blacksburg, VA 24061, USA {hjcho,bnoy}@vt.edu E. Douglas Jensen The MITRE Corporaton Bedford, MA 01730,

More information

Load-Balanced Anycast Routing

Load-Balanced Anycast Routing Load-Balanced Anycast Routng Chng-Yu Ln, Jung-Hua Lo, and Sy-Yen Kuo Department of Electrcal Engneerng atonal Tawan Unversty, Tape, Tawan sykuo@cc.ee.ntu.edu.tw Abstract For fault-tolerance and load-balance

More information

Conditional Speculative Decimal Addition*

Conditional Speculative Decimal Addition* Condtonal Speculatve Decmal Addton Alvaro Vazquez and Elsardo Antelo Dep. of Electronc and Computer Engneerng Unv. of Santago de Compostela, Span Ths work was supported n part by Xunta de Galca under grant

More information

Multiple Sub-Row Buffers in DRAM: Unlocking Performance and Energy Improvement Opportunities

Multiple Sub-Row Buffers in DRAM: Unlocking Performance and Energy Improvement Opportunities Multple Sub-Row Buffers n DRAM: Unlockng Performance and Energy Improvement Opportuntes ABSTRACT Nagendra Gulur Texas Instruments (Inda) nagendra@t.com Mahesh Mehendale Texas Instruments (Inda) m-mehendale@t.com

More information

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not

More information

arxiv: v3 [cs.ds] 7 Feb 2017

arxiv: v3 [cs.ds] 7 Feb 2017 : A Two-stage Sketch for Data Streams Tong Yang 1, Lngtong Lu 2, Ybo Yan 1, Muhammad Shahzad 3, Yulong Shen 2 Xaomng L 1, Bn Cu 1, Gaogang Xe 4 1 Pekng Unversty, Chna. 2 Xdan Unversty, Chna. 3 North Carolna

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

an assocated logc allows the proof of safety and lveness propertes. The Unty model nvolves on the one hand a programmng language and, on the other han

an assocated logc allows the proof of safety and lveness propertes. The Unty model nvolves on the one hand a programmng language and, on the other han UNITY as a Tool for Desgn and Valdaton of a Data Replcaton System Phlppe Quennec Gerard Padou CENA IRIT-ENSEEIHT y Nnth Internatonal Conference on Systems Engneerng Unversty of Nevada, Las Vegas { 14-16

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

Shared Running Buffer Based Proxy Caching of Streaming Sessions

Shared Running Buffer Based Proxy Caching of Streaming Sessions Shared Runnng Buffer Based Proxy Cachng of Streamng Sessons Songqng Chen, Bo Shen, Yong Yan, Sujoy Basu Moble and Meda Systems Laboratory HP Laboratores Palo Alto HPL-23-47 March th, 23* E-mal: sqchen@cs.wm.edu,

More information

CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION

CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION 24 CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION The present chapter proposes an IPSO approach for multprocessor task schedulng problem wth two classfcatons, namely, statc ndependent tasks and

More information

Routing in Degree-constrained FSO Mesh Networks

Routing in Degree-constrained FSO Mesh Networks Internatonal Journal of Hybrd Informaton Technology Vol., No., Aprl, 009 Routng n Degree-constraned FSO Mesh Networks Zpng Hu, Pramode Verma, and James Sluss Jr. School of Electrcal & Computer Engneerng

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

Improving High Level Synthesis Optimization Opportunity Through Polyhedral Transformations

Improving High Level Synthesis Optimization Opportunity Through Polyhedral Transformations Improvng Hgh Level Synthess Optmzaton Opportunty Through Polyhedral Transformatons We Zuo 2,5, Yun Lang 1, Peng L 1, Kyle Rupnow 3, Demng Chen 2,3 and Jason Cong 1,4 1 Center for Energy-Effcent Computng

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide Lobachevsky State Unversty of Nzhn Novgorod Polyhedron Quck Start Gude Nzhn Novgorod 2016 Contents Specfcaton of Polyhedron software... 3 Theoretcal background... 4 1. Interface of Polyhedron... 6 1.1.

More information

A New Token Allocation Algorithm for TCP Traffic in Diffserv Network

A New Token Allocation Algorithm for TCP Traffic in Diffserv Network A New Token Allocaton Algorthm for TCP Traffc n Dffserv Network A New Token Allocaton Algorthm for TCP Traffc n Dffserv Network S. Sudha and N. Ammasagounden Natonal Insttute of Technology, Truchrappall,

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

Self-Tuning, Bandwidth-Aware Monitoring for Dynamic Data Streams

Self-Tuning, Bandwidth-Aware Monitoring for Dynamic Data Streams Self-Tunng, Bandwdth-Aware Montorng for Dynamc Data Streams Navendu Jan, Praveen Yalagandula, Mke Dahln, Yn Zhang Mcrosoft Research HP Labs The Unversty of Texas at Austn Abstract We present, a self-tunng,

More information

VRT012 User s guide V0.1. Address: Žirmūnų g. 27, Vilnius LT-09105, Phone: (370-5) , Fax: (370-5) ,

VRT012 User s guide V0.1. Address: Žirmūnų g. 27, Vilnius LT-09105, Phone: (370-5) , Fax: (370-5) , VRT012 User s gude V0.1 Thank you for purchasng our product. We hope ths user-frendly devce wll be helpful n realsng your deas and brngng comfort to your lfe. Please take few mnutes to read ths manual

More information

ARTICLE IN PRESS. Signal Processing: Image Communication

ARTICLE IN PRESS. Signal Processing: Image Communication Sgnal Processng: Image Communcaton 23 (2008) 754 768 Contents lsts avalable at ScenceDrect Sgnal Processng: Image Communcaton journal homepage: www.elsever.com/locate/mage Dstrbuted meda rate allocaton

More information

Overview. Basic Setup [9] Motivation and Tasks. Modularization 2008/2/20 IMPROVED COVERAGE CONTROL USING ONLY LOCAL INFORMATION

Overview. Basic Setup [9] Motivation and Tasks. Modularization 2008/2/20 IMPROVED COVERAGE CONTROL USING ONLY LOCAL INFORMATION Overvew 2 IMPROVED COVERAGE CONTROL USING ONLY LOCAL INFORMATION Introducton Mult- Smulator MASIM Theoretcal Work and Smulaton Results Concluson Jay Wagenpfel, Adran Trachte Motvaton and Tasks Basc Setup

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

ADRIAN PERRIG & TORSTEN HOEFLER ( -6- ) Networks and Operatng Systems Chapter 6: Demand Pagng Page Table Structures Page table structures Page table structures Problem: smple lnear table s too bg Problem:

More information

High level vs Low Level. What is a Computer Program? What does gcc do for you? Program = Instructions + Data. Basic Computer Organization

High level vs Low Level. What is a Computer Program? What does gcc do for you? Program = Instructions + Data. Basic Computer Organization What s a Computer Program? Descrpton of algorthms and data structures to acheve a specfc ojectve Could e done n any language, even a natural language lke Englsh Programmng language: A Standard notaton

More information

Biostatistics 615/815

Biostatistics 615/815 The E-M Algorthm Bostatstcs 615/815 Lecture 17 Last Lecture: The Smplex Method General method for optmzaton Makes few assumptons about functon Crawls towards mnmum Some recommendatons Multple startng ponts

More information

DESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT

DESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT DESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT Bran J. Wolf, Joseph L. Hammond, and Harlan B. Russell Dept. of Electrcal and Computer Engneerng, Clemson Unversty,

More information

#4 Inverted page table. The need for more bookkeeping. Inverted page table architecture. Today. Our Small Quiz

#4 Inverted page table. The need for more bookkeeping. Inverted page table architecture. Today. Our Small Quiz ADRIAN PERRIG & TORSTEN HOEFLER Networks and Operatng Systems (-6-) Chapter 6: Demand Pagng http://redmne.replcant.us/projects/replcant/wk/samsunggalaxybackdoor () # Inverted table One system-wde table

More information

Sample Solution. Advanced Computer Networks P 1 P 2 P 3 P 4 P 5. Module: IN2097 Date: Examiner: Prof. Dr.-Ing. Georg Carle Exam: Final exam

Sample Solution. Advanced Computer Networks P 1 P 2 P 3 P 4 P 5. Module: IN2097 Date: Examiner: Prof. Dr.-Ing. Georg Carle Exam: Final exam Char of Network Archtectures and Servces Department of Informatcs Techncal Unversty of Munch Note: Durng the attendance check a stcker contanng a unque QR code wll be put on ths exam. Ths QR code contans

More information

WITH rapid improvements of wireless technologies,

WITH rapid improvements of wireless technologies, JOURNAL OF SYSTEMS ARCHITECTURE, SPECIAL ISSUE: HIGHLY-RELIABLE CPS, VOL. 00, NO. 0, MONTH 013 1 Adaptve GTS Allocaton n IEEE 80.15.4 for Real-Tme Wreless Sensor Networks Feng Xa, Ruonan Hao, Je L, Naxue

More information

Assembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface.

Assembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface. IDC Herzlya Shmon Schocken Assembler Shmon Schocken Sprng 2005 Elements of Computng Systems 1 Assembler (Ch. 6) Where we are at: Human Thought Abstract desgn Chapters 9, 12 abstract nterface H.L. Language

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

Technical Report. i-game: An Implicit GTS Allocation Mechanism in IEEE for Time- Sensitive Wireless Sensor Networks

Technical Report. i-game: An Implicit GTS Allocation Mechanism in IEEE for Time- Sensitive Wireless Sensor Networks www.hurray.sep.pp.pt Techncal Report -GAME: An Implct GTS Allocaton Mechansm n IEEE 802.15.4 for Tme- Senstve Wreless Sensor etworks Ans Koubaa Máro Alves Eduardo Tovar TR-060706 Verson: 1.0 Date: Jul

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

A HIERARCHICAL SIMULATION FRAMEWORK FOR APPLICATION DEVELOPMENT ON SYSTEM-ON-CHIP ARCHITECTURES. Vaibhav Mathur and Viktor K.

A HIERARCHICAL SIMULATION FRAMEWORK FOR APPLICATION DEVELOPMENT ON SYSTEM-ON-CHIP ARCHITECTURES. Vaibhav Mathur and Viktor K. A HIERARCHICAL SIMULATION FRAMEWORK FOR APPLICATION DEVELOPMENT ON SYSTEM-ON-CHIP ARCHITECTURES Vabhav Mathur and Vktor K. Prasanna Department of EE-Systems Unversty of Southern Calforna Los Angeles, CA

More information

Quantifying Performance Models

Quantifying Performance Models Quantfyng Performance Models Prof. Danel A. Menascé Department of Computer Scence George Mason Unversty www.cs.gmu.edu/faculty/menasce.html 1 Copyrght Notce Most of the fgures n ths set of sldes come from

More information