FIRM: Fair and High-Performance Memory Control for Persistent Memory Systems


Jishen Zhao, Onur Mutlu, Yuan Xie
Pennsylvania State University, Carnegie Mellon University, University of California, Santa Barbara, Hewlett-Packard Labs

Abstract

Byte-addressable nonvolatile memories promise a new technology, persistent memory, which incorporates desirable attributes from both traditional main memory (byte-addressability and fast interface) and traditional storage (data persistence). To support data persistence, a persistent memory system requires sophisticated data duplication and ordering control for write requests. As a result, applications that manipulate persistent memory (persistent applications) have very different memory access characteristics than traditional (non-persistent) applications, as shown in this paper. Persistent applications introduce heavy write traffic to contiguous memory regions at a memory channel, which cannot concurrently service read and write requests, leading to memory bandwidth underutilization due to low bank-level parallelism, frequent write queue drains, and frequent bus turnarounds between reads and writes. These characteristics undermine the high performance and fairness offered by conventional memory scheduling schemes designed for non-persistent applications. Our goal in this paper is to design a fair and high-performance memory control scheme for a persistent memory based system that runs both persistent and non-persistent applications. Our proposal, FIRM, consists of three key ideas. First, FIRM categorizes request sources as non-intensive, streaming, random and persistent, and forms batches of requests for each source. Second, FIRM strides persistent memory updates across multiple banks, thereby improving bank-level parallelism and hence memory bandwidth utilization of persistent memory accesses. Third, FIRM schedules read and write request batches from different sources in a manner that minimizes bus turnarounds and write queue drains.
Our detailed evaluations show that, compared to five previous memory scheduler designs, FIRM provides significantly higher system performance and fairness.

Index Terms: memory scheduling; persistent memory; fairness; memory interference; nonvolatile memory; data persistence

1. INTRODUCTION

For decades, computer systems have adopted a two-level storage model consisting of: 1) a fast, byte-addressable main memory that temporarily stores applications' working sets, which is lost on a system halt/reboot/crash, and 2) a slow, block-addressable storage device that permanently stores persistent data, which can survive across system boots/crashes. Recently, this traditional storage model has been enriched by the new persistent memory technology, a new tier between traditional main memory and storage with attributes from both [2, 9, 47, 54, 59]. Persistent memory allows applications to perform loads and stores to manipulate persistent data, as if they are accessing traditional main memory. Yet, persistent memory is the permanent home of persistent data, which is protected by versioning (e.g., logging and shadow updates) [17, 54, 88, 90] and write-order control [17, 54, 66], borrowed from databases and file systems to provide consistency of data, as if data is stored in traditional storage devices (i.e., hard disks or flash memory). By enabling data persistence in main memory, applications can directly access persistent data through a fast memory interface without paging data blocks in and out of slow storage devices or performing context switches for page faults. As such, persistent memory can dramatically boost the performance of applications that demand high reliability, such as databases and file systems, and enable the design of more robust systems at high performance. As a result, persistent memory has recently drawn significant interest from both academia and industry [1, 2, 16, 17, 37, 54, 60, 66, 70, 71, 88, 90]. Recent works [54, 92] even demonstrated a persistent memory system with performance close to that of a system without persistence support in memory.
Various types of physical devices can be used to build persistent memory, as long as they appear byte-addressable and nonvolatile to applications. Examples of such byte-addressable nonvolatile memories (BA-NVMs) include spin-transfer torque RAM (STT-MRAM) [31, 93], phase-change memory (PCM) [75, 81], resistive random-access memory (ReRAM) [14, 21], battery-backed DRAM [13, 18, 28], and nonvolatile dual in-line memory modules (NV-DIMMs) [89].¹

As it is in its early stages of development, persistent memory especially serves applications that can benefit from reducing storage (or, persistent data) access latency with relatively few or lightweight changes to application programs, system software, and hardware [10]. Such applications include databases [90], file systems [1, 17], key-value stores [16], and persistent file caches [8, 10]. Other types of applications may not directly benefit from persistent memory, but can still use BA-NVMs as their working memory (nonvolatile main memory without persistence) to leverage the benefits of large capacity and low stand-by power [45, 73]. For example, a large number of recent works aim to fit BA-NVMs as part of main memory in the traditional two-level storage model [22, 23, 33, 45, 46, 57, 72, 73, 74, 91, 94]. Several very recent works [38, 52, 59] envision that BA-NVMs can be simultaneously used as persistent memory and working memory. In this paper, we call applications that leverage BA-NVMs to manipulate persistent data persistent applications, and those that use BA-NVMs solely as working memory non-persistent applications.²

Most prior work focused on designing memory systems to accommodate either type of application, persistent or non-persistent. Strikingly little attention has been paid to studying the cases when these two types of applications concurrently run in a system. Persistent applications require the memory system to support crash consistency, or the persistence property, typically supported in traditional storage systems.
1 STT-MRAM, PCM, and ReRAM are collectively called nonvolatile random-access memories (NVRAMs) or storage-class memories (SCMs) in recent studies [1, 52, 90].
2 A system with BA-NVMs may also employ volatile DRAM, controlled by a separate memory controller [22, 57, 73, 74, 91]. As we show in this paper, significant resource contention exists at the BA-NVM memory interface of persistent memory systems between persistent and non-persistent applications. We do not focus on the DRAM interface.

This property guarantees that the system's data will be in a consistent state after a system or application crash, by ensuring that persistent memory updates are done carefully such that incomplete updates are recoverable. Doing so requires data duplication and careful control over the ordering of writes arriving at memory (Section 2.2). The sophisticated designs to support persistence lead to new memory access characteristics for persistent applications. In particular, we find that these applications have very high write intensity and very low memory bank parallelism due to frequent streaming writes to persistent data in memory (Section 3.1). These characteristics lead to substantial resource contention between reads and writes at the
shared memory interface for a system that concurrently runs persistent and non-persistent applications, unfairly slowing down either or both types of applications. Previous memory scheduling schemes, designed solely for non-persistent applications, become inefficient and low-performance under this new scenario (Section 3.2). We find that this is because the heavy write intensity and low bank parallelism of persistent applications lead to three key problems not handled well by past schemes: 1) frequent write queue drains in the memory controller, 2) frequent bus turnarounds between reads and writes, both of which lead to wasted cycles on the memory bus, and 3) low memory bandwidth utilization during writes to memory due to low memory bank parallelism, which leads to long periods during which memory reads are delayed (Section 3).

Our goal is to design a memory control scheme that achieves both fair memory access and high system throughput in a system concurrently running persistent and non-persistent applications. We propose FIRM, a fair and high-performance memory control scheme, which 1) improves the bandwidth utilization of persistent applications and 2) balances the bandwidth usage between persistent and non-persistent applications. FIRM achieves this using three components. First, it categorizes memory request sources as non-intensive, streaming, random and persistent, to ensure fair treatment across different sources, and forms batches of requests for each source in a manner that preserves row buffer locality. Second, FIRM strides persistent memory updates across multiple banks, thereby improving bank-level parallelism and hence memory bandwidth utilization of persistent memory accesses. Third, FIRM schedules read and write request batches from different sources in a manner that minimizes bus turnarounds and write queue drains. Compared to five previous memory scheduler designs, FIRM provides significantly higher system performance and fairness.
This paper makes the following contributions:

We identify new problems related to resource contention at the shared memory interface when persistent and non-persistent applications concurrently access memory. The key fundamental problems, caused by memory access characteristics of persistent applications, are: 1) frequent write queue drains, 2) frequent bus turnarounds, both due to high memory write intensity, and 3) memory bandwidth underutilization due to low memory write parallelism. We describe the ineffectiveness of prior memory scheduling designs in handling these problems. (Section 3)

We propose a new strided writing mechanism to improve the bank-level parallelism of persistent memory updates. This technique improves memory bandwidth utilization of memory writes and reduces the stall time of non-persistent applications' read requests. (Section 4.3)

We propose a new persistence-aware memory scheduling policy between read and write requests of persistent and non-persistent applications to minimize memory interference and reduce unfair application slowdowns. This technique reduces the overhead of switching the memory bus between reads and writes by reducing bus turnarounds and write queue drains. (Section 4.4)

We comprehensively compare the performance and fairness of our proposed persistent memory control mechanism, FIRM, to five prior memory schedulers across a variety of workloads and system configurations. Our results show that 1) FIRM provides the highest system performance and fairness on average and for all evaluated workloads, 2) FIRM's benefits are robust across system configurations, and 3) FIRM minimizes the bus turnaround overhead present in prior scheduler designs. (Section 7)

2.
BACKGROUND

In this section, we provide background on existing memory scheduling schemes, the principles and mechanics of persistent memory, and the memory requests generated by persistent applications.

2.1. Conventional Memory Scheduling Mechanisms

A memory controller employs memory request buffers, physically or logically separated into a read and a write queue, to store the memory requests waiting to be scheduled for service. It also utilizes a memory scheduler to decide which memory request should be scheduled next. A large body of previous work developed various memory scheduling policies [7, 26, 27, 34, 41, 42, 48, 49, 61, 62, 63, 64, 65, 67, 76, 77, 84, 85, 95]. Traditional commodity systems employ a variant of the first-ready first-come-first-serve (FR-FCFS) scheduling policy [76, 77, 95], which prioritizes memory requests that are row-buffer hits over others and, after that, older memory requests over others. Because of this, it can unfairly deprioritize applications that have low row-buffer hit rate and that are not memory intensive, hurting both fairness and overall system throughput [61, 64]. Several designs [41, 42, 63, 64, 65, 67, 84, 85] aim to improve either system performance or fairness, or both. PAR-BS [65] provides fairness and starvation freedom by batching requests from different applications based on their arrival times and prioritizing the oldest batch over others. It also improves system throughput by preserving the bank-level parallelism of each application via the use of rank-based scheduling of applications. ATLAS [41] improves system throughput by prioritizing applications that have received the least memory service. However, it may unfairly deprioritize and slow down memory-intensive applications due to the strict ranking it employs between memory-intensive applications [41, 42]. To address this issue, TCM [42] dynamically classifies applications into two clusters, low and high memory-intensity, and employs heterogeneous scheduling policies across the clusters to optimize for both system throughput and fairness.
TCM prioritizes the applications in the low-memory-intensity cluster over others, improving system throughput, and shuffles thread ranking between applications in the high-memory-intensity cluster, improving fairness and system throughput. While shown to be effective in a system that executes only non-persistent applications, unfortunately, none of these scheduling schemes address the memory request scheduling challenges posed by concurrently-running persistent and non-persistent applications, as we discuss in Section 3 and evaluate in detail in Section 7.³

3 The recently developed BLISS scheduler [84] was shown to be more effective than TCM while providing low cost. Even though we do not evaluate BLISS, it also does not take into account the nature of interference caused by persistent applications.

2.2. Persistent Memory

Most persistent applications stem from traditional storage system workloads (databases and file systems), which require persistent memory [1, 2, 16, 17, 88, 90, 92] to support crash consistency [6], i.e., the persistence property. The persistence property guarantees that the critical data (e.g., database records, files, and the corresponding metadata) stored in nonvolatile devices retains a consistent state in case of power loss or a program crash, even when all the data in volatile devices may be lost. Achieving persistence in BA-NVM is nontrivial, due to the presence of volatile processor caches and memory write reordering performed by the write-back caches and memory controllers. For instance, a power outage may occur while a persistent application is inserting a node into a linked list stored in BA-NVM. Processor caches and memory controllers may reorder
the write requests, writing the pointer into BA-NVM before writing the values of the new node. The linked list can lose consistency with dangling pointers, if values of the new node remaining in processor caches are lost due to a power outage, which may lead to unrecoverable data corruption. To avoid such inconsistency problems, most persistent memory designs borrow the ACID (atomicity, consistency, isolation, and durability) concepts from the database and file system communities [17, 54, 88, 90, 92]. Enforcing these concepts, as explained below, leads to additional memory requests, which affect the memory access behavior of persistent applications.

Versioning and Write Ordering. While durability can be guaranteed by BA-NVMs' non-volatile nature, atomicity and consistency are supported by storing multiple versions of the same piece of data and carefully controlling the order of writes into persistent memory (please refer to prior studies for details [17, 54, 88, 90, 92]). Figure 1 shows a persistent tree data structure as an example to illustrate the different methods to maintain versions and ordering. Assume nodes N3 and N4 are updated. We discuss two commonly-used methods to maintain multiple versions and ordering. The first one is redo logging [16, 90]. With this method, new values of the two nodes, along with their addresses, are written into a log (logN3' and logN4') before their original locations are updated in memory (Figure 1(a)). If a system loses power before logging is completed, persistent memory can always recover, using the intact original data in memory. A memory barrier is employed between the writes to the log and the writes to the original locations in memory. This ordering control, with enough information kept in the log, ensures that the system can recover to a consistent state even if it crashes before all original locations are updated. The second method, illustrated in Figure 1(b), is the notion of shadow updates (copy-on-write) [17, 88]. Instead of storing logs, a temporary data buffer is allocated to store new values (shadow copies) of the nodes.
Note that the parent node N1 is also shadow-copied, with the new pointer N1' pointing to the shadow copies N3' and N4'. Ordering control (shown as a memory barrier in Figure 1(b)) ensures that the root pointer is not updated until writes to the shadow copies are completed in persistent memory.

[Figure 1 not reproduced here.] Fig. 1. Example persistent writes with (a) redo logging and (b) shadow updates, when nodes N3 and N4 in a tree data structure are updated.

Relaxed Persistence. Strict persistence [53, 54, 70] requires maintaining the program order of every write request, even within a single log update. Pelley et al. recently introduced a relaxed persistence model that minimizes the ordering control by buffering and coalescing writes to the same data [70].⁴ Our design adopts their relaxed persistence model. For example, we only enforce the ordering between the writes to the shadow copies and the write to the root pointer, as shown in Figure 1(b). Another recent work, Kiln [92], relaxed versioning, eliminating the use of logging or shadow updates by implementing a nonvolatile last-level cache (NV cache). However, due to the limited capacity and associativity of the NV cache, the design cannot efficiently accommodate large-granularity persistent updates in database and file system applications. Consequently, we envision that logging, shadow updates, and Kiln-like designs will coexist in persistent memory designs in the near future.

4 More recently, Lu et al. [54] proposed the notion of loose-ordering consistency, which relaxes the ordering of persistent memory writes even more by performing them speculatively.

2.3. Memory Requests of Persistent Applications

Persistent Writes. We define the writes performed for critical data updates that need to be persistent (including updates to original data locations, log updates, and shadow-copy updates) as persistent writes.
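The log-first ordering described above (write the log, enforce a barrier, then update in place) can be sketched in a few lines. This is a minimal illustration with hypothetical names; `persist_barrier` merely marks where a real system would flush dirty cache lines and fence, and the dictionary stands in for BA-NVM locations.

```python
# Minimal sketch of redo logging with explicit ordering (hypothetical API).
class RedoLogStore:
    def __init__(self):
        self.data = {}            # "original locations" in persistent memory
        self.log = []             # redo log: (address, new_value) entries
        self.log_committed = False

    def persist_barrier(self):
        # In a real persistent memory system this would flush dirty cache
        # lines and fence; here it only marks the durability point.
        pass

    def atomic_update(self, updates):
        # Step 1: write new values and their addresses into the log.
        for addr, value in updates.items():
            self.log.append((addr, value))
        self.persist_barrier()      # log entries must be durable first
        self.log_committed = True   # commit record
        self.persist_barrier()
        # Step 2: only now update the original locations in place.
        for addr, value in updates.items():
            self.data[addr] = value
        self.log = []               # truncate the log after completion
        self.log_committed = False

    def recover(self):
        # After a crash: replay the log only if it was fully committed;
        # otherwise the intact original data is already consistent.
        if self.log_committed:
            for addr, value in self.log:
                self.data[addr] = value
        self.log = []
        self.log_committed = False
```

Note that every `atomic_update` issues both log writes and in-place writes, which is precisely the write amplification that makes persistent applications write-intensive.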
Each critical data update may generate an arbitrary number of persistent writes depending on the granularity of the update. For example, in a key-value store, an update may be the addition of a new value of several bytes, several kilobytes, several megabytes, or larger. Note that persistent memory architectures typically either flush persistent writes (i.e., dirty blocks) out of processor caches at the point of memory barriers, or implement persistent writes as uncacheable (UC) writes [54, 88, 90, 92].

Non-persistent Writes. Non-critical data, such as stacks and data buffers, are not required to survive system failures. Typically, persistent memory does not need to perform versioning or ordering control over these writes. As such, persistent applications perform not only persistent writes but also non-persistent writes.

Reads. Persistent applications also perform reads of in-flight persistent writes and other independent reads. Persistent memory can relax the ordering of independent reads without violating the persistence requirement. However, doing so can impose substantial performance penalties (Section 3.2). Reads of in-flight persistent updates need to wait until these persistent writes arrive at BA-NVMs. Conventional memory controller designs provide read-after-write ordering by servicing reads of in-flight writes from write buffers. With volatile memory, such behavior does not affect memory consistency. With nonvolatile memory, however, power outages or program crashes can destroy in-flight persistent writes before they are written to persistent memory. Speculative reads of in-flight persistent updates can lead to incorrect ordering and potential resulting inconsistency: if a read has already obtained the value of an in-flight write that would disappear on a crash, wrong data may eventually propagate to persistent memory as a result of the read.

3. MOTIVATION: HANDLING PERSISTENT MEMORY ACCESSES

Conventional memory scheduling schemes are designed based on the assumption that main memory is used as working memory, i.e., a file cache for storage systems.
This assumption no longer holds when main memory also supports data persistence, by accommodating persistent applications that access memory differently from traditional non-persistent applications. This is because persistent memory writes have different consistency requirements than working memory writes, as we described in Sections 2.2 and 2.3. In this section, we study the performance implications caused by this different memory access behavior of persistent applications (Section 3.1), discuss the problems of directly adopting existing memory scheduling methods to handle persistent memory accesses (Section 3.2), and describe why naïvely extending past memory schedulers does not solve the problem (Section 3.3).

3.1. Memory Access Characteristics of Persistent Applications

An application's memory access characteristics can be evaluated using four metrics: a) memory intensity, measured as the number of
last-level cache misses per thousand instructions (MPKI) [19, 42]; b) write intensity, measured as the portion of write misses (WR%) out of all cache misses; c) bank-level parallelism (BLP), measured as the average number of banks with outstanding memory requests, when at least one outstanding request exists [50, 65]; d) row-buffer locality (RBL), measured as the average hit rate of the row buffer across all banks [64, 77].

To illustrate the different memory access characteristics of persistent and non-persistent applications, we studied the memory accesses of three representative micro-benchmarks: streaming, random, and KVStore. Streaming and random [42, 61] are both memory-intensive, non-persistent applications, performing streaming and random accesses to a large array, respectively. They serve as the two extreme cases with dramatically different BLP and RBL. The persistent application KVStore performs inserts and deletes to key-value pairs (25-byte keys and 2K-byte values) of an in-memory B+ tree data structure. The sizes of keys and values were specifically chosen so that KVStore had the same memory intensity as the other two micro-benchmarks. We build this benchmark by implementing a redo logging (i.e., writing new updates to a log while keeping the original data intact) interface on top of STX B+ Tree [12] to provide persistence support. Redo logging behaves very similarly to shadow updates (Section 2.2), which perform the updates in a shadow version of the data structure instead of logging them in a log space. Our experiments (not shown here) show that the performance implications of KVStore with shadow updates are similar to those of KVStore with redo logging, which we present here.

Table 1 lists the memory access characteristics of the three micro-benchmarks running separately. The persistent application KVStore, especially in its persistence phase when it performs persistent writes, has three major discrepant memory access characteristics in comparison to the two non-persistent applications.
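The four metrics above can be made concrete with a small sketch. The function names and input formats below are ours, not the paper's; the numbers in the comments simply mirror the definitions in the text.

```python
# Illustrative computation of the four memory access metrics.

def mpki(misses, instructions):
    # memory intensity: last-level cache misses per thousand instructions
    return 1000.0 * misses / instructions

def write_intensity(write_misses, total_misses):
    # WR%: portion of write misses out of all cache misses, in percent
    return 100.0 * write_misses / total_misses

def blp(snapshots):
    # snapshots: per-cycle sets of banks with outstanding requests;
    # BLP averages the bank count over cycles with any request outstanding
    busy = [len(s) for s in snapshots if s]
    return sum(busy) / len(busy) if busy else 0.0

def rbl(per_bank_stats):
    # per_bank_stats: list of (row_hits, total_accesses) per bank;
    # RBL is the average row-buffer hit rate across banks
    rates = [hits / total for hits, total in per_bank_stats if total]
    return sum(rates) / len(rates) if rates else 0.0
```

For instance, an application with 200,000 misses over 2 million instructions has an MPKI of 100, matching the "High" memory intensity rows of Table 1.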
Table 1. Memory access characteristics of three applications running individually. The last row shows the memory access characteristics of KVStore when it performs persistent writes.

                               MPKI        WR%        BLP         RBL
  Streaming                    100/High    47%/Low    0.05/Low    96%/High
  Random                       100/High    46%/Low    6.3/High    0.4%/Low
  KVStore                      100/High    77%/High   0.05/Low    71%/High
  Persistence Phase (KVStore)  675/High    92%/High   0.01/Low    97%/High

1. High write intensity. While the three applications have the same memory intensity, KVStore has much higher write intensity than the other two. This is because each insert or delete operation triggers a redo log update, which appends a log entry containing the addresses and the data of the modified key-value pair. The log updates generate extra write traffic in addition to the original location updates.

2. Higher memory intensity with persistent writes. The last row of Table 1 shows that while the KVStore application is in its persistence phase (i.e., when it is performing persistent writes and flushing these writes out of processor caches), it causes much higher memory traffic (MPKI is 675). During this phase, writes make up almost all (92%) of the memory traffic.

3. Low BLP and high RBL with persistent writes. KVStore, especially while performing persistent writes, has low BLP and high RBL. KVStore's log is implemented as a circular buffer, similar to those used in prior persistent memory designs [90], by allocating (as much as possible) one or more contiguous regions in the physical address space. As a result, the log updates lead to consecutive writes to contiguous locations in the same bank, i.e., an access pattern that can be characterized as streaming writes. This makes KVStore's write behavior similar to that of streaming's reads: low BLP and high RBL. However, the memory bus can only service either reads or writes (to any bank) at any given time because the bus can be driven in only one direction [49], which causes a fundamental difference (and conflict) between handling streaming reads and streaming writes.
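The low-BLP effect of a contiguous circular log can be seen from a toy address-to-bank mapping. The row size, bank count, and mapping below are illustrative assumptions, not the paper's actual configuration; the contrast with a cache-line-interleaved ("strided") layout previews why striding persistent writes raises BLP.

```python
# Why contiguous log appends yield low BLP, and how striding changes it.
ROW_SIZE = 8192   # bytes per row buffer (assumed)
NUM_BANKS = 8     # banks per channel (assumed)
LINE = 64         # bytes per write request

def row_interleaved_bank(addr):
    # consecutive rows rotate across banks: a contiguous region fills an
    # entire 8 KB row in one bank before moving on to the next bank
    return (addr // ROW_SIZE) % NUM_BANKS

def strided_bank(addr):
    # cache-line interleaving: consecutive writes rotate across banks
    return (addr // LINE) % NUM_BANKS

appends = [i * LINE for i in range(256)]        # 16 KB of log appends
contiguous_banks = {row_interleaved_bank(a) for a in appends}
strided_banks = {strided_bank(a) for a in appends}
# 16 KB of contiguous appends touch only 2 of 8 banks (streaming writes,
# low BLP); the same appends strided across banks touch all 8.
```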
We conclude that persistent writes cause persistent applications to have widely different memory access characteristics than non-persistent applications. As we show next, the high write intensity and low bank-level parallelism of writes in persistent applications cause a fundamental challenge to existing memory scheduler designs for two reasons: 1) the high write intensity causes frequent switching of the memory bus between reads and writes, causing bus turnaround delays; 2) the low write BLP causes underutilization of memory bandwidth while writes are being serviced, which delays any reads in the memory request buffer. These two problems become exacerbated when persistent applications run together with non-persistent ones, a scenario where both reads and persistent writes are frequently present in the memory request buffer.

3.2. Inefficiency of Prior Memory Scheduling Schemes

As we mentioned above, the memory bus can service either reads or writes (to any bank) at any given time because the bus can be driven in only one direction [49]. Prior memory controllers (e.g., [26, 27, 41, 42, 49, 63, 64, 65, 76, 77, 95]) buffer writes in a write queue to allow read requests to aggressively utilize the memory bus. When the write queue is full or is filled to a predefined level, the memory scheduler switches to a write drain mode where it drains the write queue either fully or to a predetermined level [49, 78, 83], in order to prevent stalling the entire processor pipeline. During the write drain mode, the memory bus can service only writes. In addition, switching into and out of the write drain mode from the read mode induces additional penalty in the DRAM protocol (called read-to-write and write-to-read turnaround delays, tRTW and tWTR, approximately 7.5ns and 15ns, respectively [43]) during which no read or write commands can be scheduled on the bus, causing valuable memory bus cycles to be wasted.
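The drain behavior and its turnaround cost can be captured in a deliberately simplified model. The queue size, low watermark, and useful-time interval below are illustrative assumptions (not FIRM's or any real controller's parameters); only the turnaround delays come from the text above.

```python
# Simplified model of write drain mode and its bus turnaround cost.

def bus_turnarounds(requests, queue_size=64, drain_to=16):
    """Count bus turnarounds for a stream of 'R'/'W' requests: when the
    write queue fills, the bus switches to writes, drains to the low
    watermark, and switches back (two turnarounds per drain). Reads are
    assumed to be serviced immediately in read mode."""
    write_q = 0
    turnarounds = 0
    for req in requests:
        if req == "W":
            write_q += 1
            if write_q >= queue_size:
                turnarounds += 2      # read-to-write plus write-to-read
                write_q = drain_to
    return turnarounds

def turnaround_overhead(switches, t_rtw_ns=7.5, t_wtr_ns=15.0,
                        useful_ns=10000.0):
    """Fraction of bus time lost to turnarounds, using the tRTW/tWTR
    delays quoted above; useful_ns (assumed) is the bus time spent
    actually servicing requests."""
    wasted = (switches / 2) * (t_rtw_ns + t_wtr_ns)
    return wasted / (useful_ns + wasted)
```

Under these assumptions, a write-heavy stream like KVStore's persistence phase triggers a drain roughly every few dozen writes, while a read-heavy stream triggers none; frequent drains push the wasted-cycle fraction into the double-digit percent range, consistent in magnitude with the measurements discussed next.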
Therefore, frequent switches into the write drain mode and long time spent in the write drain mode can significantly slow down reads and can harm the performance of read-intensive applications and the entire system [49].

This design of conventional memory schedulers is based on two assumptions, which are generally sound for non-persistent applications. First, reads are on the critical path of application execution whereas writes are usually not. This is sound when most non-persistent applications abound with read-dependent arithmetic, logic, and control flow operations and writes can be serviced from write buffers in caches and in the memory controller. Therefore, most prior memory scheduling schemes prioritize reads over writes. Second, applications are usually read-intensive, and memory controllers can delay writes without frequently filling up the write queues. Therefore, optimizing the performance of writes is not as critical to performance in many workloads, as the write queues are large enough for such read-intensive applications.

Unfortunately, these assumptions no longer hold when persistent writes need to go through the same shared memory interface as non-persistent requests. First, the ordering control of persistent writes requires the serialization of the persistent write traffic to main memory (e.g., via the use of memory barriers, as described in Section 2.2). This causes the persistent writes, reads of in-flight persistent writes, and computations dependent on these writes (and
potentially all computations after the persistent writes, depending on the implementation) to be serialized. As such, persistent writes are also on the critical execution path. As a result, simply prioritizing read requests over persistent write requests can hurt system performance. Second, persistent applications are write-intensive as opposed to read-intensive. This is due to not only the persistent nature of data manipulation, which might lead to more frequent memory updates, but also the way persistence is guaranteed using multiple persistent updates (i.e., to the original location as well as the alternate version of the data in a redo log or a shadow copy, as explained in Section 2.2).

Because of these characteristics of persistent applications, existing memory controllers are inefficient in handling them concurrently with non-persistent applications. Figure 2 illustrates this inefficiency in a system that concurrently runs KVStore with either the streaming or the random application. This figure shows the fraction of memory access cycles that are spent due to delays related to bus turnaround between reads and writes as a function of the number of write queue entries.⁵ The figure shows that up to 17% of memory bus cycles are wasted due to frequent bus turnarounds, with a commonly-used 64-entry write queue. We found that this is mainly because persistent writes frequently overflow the write queue and force the memory controller to drain the writes. Typical schedulers in modern processors have only 32 to 64 write queue entries to buffer memory requests [30]. Simply increasing the number of write queue entries in the scheduler is not a scalable solution [7].

[Figure 2 not reproduced here; the bus turnaround overhead reaches 17% for KVStore+Streaming and 8% for KVStore+Random.] Fig. 2. Fraction of memory access cycles wasted due to delays related to bus turnaround between reads and writes.

In summary, conventional memory scheduling schemes, which prioritize reads over persistent writes, become inefficient when persistent and non-persistent applications share the memory interface.
This causes relatively low performance and fairness (as we show next).

3.3. Analysis of Prior and Naïve Scheduling Policies

We have observed, in Section 2.3, that persistent applications' (e.g., KVStore's) writes behave similarly to streaming reads. As such, a natural idea would be to assign these persistent writes the same priority as read requests, instead of deprioritizing them below read requests, to ensure that persistent applications are not unfairly penalized. This is a naïve (yet simple) method of extending past schedulers to potentially deal with persistent writes. In this section, we provide a case study analysis of fairness and performance of both prior schedulers (FR-FCFS [76, 77, 95] and TCM [42]) and naïve extensions of these prior schedulers (FRFCFS-modified and TCM-modified) that give equal priority to persistent writes and reads.⁶ Figure 3 illustrates fairness and system performance of these schedulers for two workloads where KVStore is run together with streaming or random. To evaluate fairness, we consider both the individual slowdown of each application [48] and the maximum slowdown [20, 41, 42, 87] across both applications in a workload. We make several major observations.

5 Section 6 explains our system setup and methodology.
6 Note that we preserve all the other ordering rules of FR-FCFS and TCM in FRFCFS-modified and TCM-modified. Within each prioritization level, reads and persistent writes are prioritized over non-persistent writes. For example, with FRFCFS-modified, the highest priority requests are row-buffer-hit read and persistent write requests, and the second highest priority requests are row-buffer-hit non-persistent write requests.

[Figure 3 not reproduced here; panels (a) and (b) show per-application and maximum slowdowns for workloads L1 (KVStore+Streaming) and L2 (KVStore+Random), and panel (c) shows weighted speedup.] Fig. 3. Performance and fairness of prior and naïve scheduling methods.

Case Study 1 (L1 in Figure 3(a) and (c)): When KVStore is run together with streaming, prior scheduling policies (FR-FCFS and TCM) unfairly slow down the persistent KVStore.
Because these policies delay writes behind reads, and streaming's reads with high row-buffer locality capture a memory bank for a long time, KVStore's writes need to wait for long time periods even though they also have high row buffer locality. When the naïve policies are employed, the effect is reversed: FRFCFS-modified and TCM-modified reduce the slowdown of KVStore but increase the slowdown of streaming compared to FR-FCFS and TCM. KVStore's performance improves because, as persistent writes have the same priority as reads, its frequent writes are not delayed too long behind streaming's reads. Streaming slows down greatly due to two major reasons. First, its read requests are interfered with much more frequently by the write requests of KVStore. Second, due to equal read and persistent write priorities, the memory bus has to be frequently switched between persistent writes and streaming reads, leading to high bus turnaround latencies where no request gets scheduled on the bus. These delays slow down both applications but affect streaming a lot more because almost all accesses of streaming are reads, are on the critical path, and are affected by both read-to-write and write-to-read turnaround delays, whereas KVStore's writes are less affected by write-to-read turnaround delays. Figure 3(c) shows that the naïve policies greatly degrade overall system performance on this workload, even though they improve KVStore's performance. We find this system performance degradation is mainly due to the frequent bus turnarounds.

Case Study 2 (L2 in Figure 3(b) and (c)): KVStore and random are two applications with almost exactly opposite BLP, RBL, and write intensity. When these two run together, random slows down the most with all four of the evaluated scheduling policies. This is because random is more vulnerable to interference than the mostly-streaming KVStore due to its high BLP, as also observed in previous studies [42]. FRFCFS-modified slightly improves KVStore's performance while largely degrading random's performance due to the same reason described for L1.
TCM-modified does not significantly affect either application's performance because three competing effects end up canceling any benefits. First, TCM-modified ends up prioritizing the random-access random over the streaming KVStore in some time intervals, as it is aware of the high vulnerability of random due to its high BLP and low RBL. Second, at other times, it prioritizes the frequent persistent write requests of KVStore over the read requests of random due to the equal priority of reads and persistent writes. Third, frequent bus turnarounds (as discussed above for L1) degrade both applications' performance. Figure 3(c) shows that the naïve policies slightly degrade or do not affect overall system performance on this workload.

3.4. Summary and Our Goal

In summary, neither conventional scheduling policies nor their naïve extensions that take into account persistent writes provide high fairness and high system performance. This is because they lead to 1) frequent entries into write drain mode due to the high intensity of persistent writes, 2) resulting frequent bus turnarounds between read and write requests that cause wasted bus cycles, and 3) memory bandwidth underutilization during write drain mode due to the low BLP of persistent writes. These three problems are pictorially illustrated in Figure 4(a) and (b), which depict the service timeline of memory requests with conventional scheduling and its naïve extension. This illustration shows that 1) persistent writes heavily access Bank-1, leading to high bandwidth underutilization with both schedulers, 2) both schedulers lead to frequent switching between reads and writes, and 3) the naïve scheduler delays read requests significantly because it prioritizes persistent writes, and it does not reduce the bus turnarounds. Our evaluation of 18 workload combinations (in Section 7) shows that various conventional and naïve scheduling schemes lead to low system performance and fairness due to these three reasons. Therefore, a new memory scheduler design is needed to overcome these challenges and provide high performance and fairness in a system where the memory interface is shared between persistent and non-persistent applications. Our goal in this work is to design such a scheduler (Section 4).

[Fig. 4. Example comparing conventional (a), naïve (b), and proposed (c) schemes: service timelines of streaming reads, random reads, and persistent writes on two banks, showing bus turnarounds, write queue drains, and the time saved by FIRM's two techniques: 1) persistent write striding, which increases the BLP of persistent writes, and 2) persistence-aware memory scheduling, which reduces write queue drains and bus turnarounds.]

4. FIRM DESIGN

Overview.
We propose FIRM, a memory control scheme that aims to serve requests from persistent and non-persistent applications in a fair and high-throughput manner at the shared memory interface. FIRM introduces two novel design principles to achieve this goal, which are illustrated conceptually in Figure 4(c). First, persistent write striding ensures that persistent writes to memory have high BLP such that memory bandwidth is well utilized during write drain mode. It does so by ensuring that consecutively-issued groups of writes to the log or shadow copies in persistent memory are mapped to different memory banks. This reduces not only the duration of write drain mode but also the frequency of entry into write drain mode compared to prior methods, as shown in Figure 4(c). Second, persistence-aware memory scheduling minimizes the frequency of write queue drains and bus turnarounds by scheduling the queued-up reads and writes in a fair manner. It does so by balancing the amount of time spent in write drain mode and read mode, while ensuring that the time spent in each mode is long enough that the wasted cycles due to bus turnaround delays are minimized. Persistence-aware memory scheduling therefore reduces: 1) the latency of servicing the persistent writes, 2) the amount of time persistent writes block outstanding reads, and 3) the frequency of entry into write queue drain mode. The realization of these two principles leads to higher performance and efficiency than conventional and naïve scheduler designs, as shown in Figure 4. The FIRM design consists of four components: 1) request batching, which forms separate batches of read and write requests that go to the same row, to maximize row-buffer locality; 2) source categorization, which categorizes the request sources for effective scheduling by distinguishing various access patterns of applications; 3) persistent write striding, which maximizes the BLP of persistent requests; and 4) persistence-aware memory scheduling, which maximizes performance and fairness by appropriately adjusting the number of read and write batches to be serviced at a time.
Figure 5(a) depicts an overview of these components, which we describe next.

4.1. Request Batching

The goal of request batching is to group together the set of requests to the same memory row from each source (i.e., process or hardware thread context, as described below in Section 4.2). Batches are formed per source, similarly to previous work [7, 65], separately for reads and writes. If scheduled consecutively, all requests in a read or write batch (except for the first one) will hit in the row buffer, minimizing latency and maximizing memory data throughput. A batch is considered to be formed when the next memory request in the request buffer of a source is to a different row [7].

4.2. Source Categorization

To apply appropriate memory control over requests with various characteristics, FIRM dynamically classifies the sources of memory requests into four categories: non-intensive, streaming, random, and persistent. A source is defined as a process or thread during a particular time period, when it is generating memory requests in a specific manner. For example, a persistent application is considered a persistent source when it is performing persistent writes. It may also be a non-intensive, a streaming, or a random source in other time periods. FIRM categorizes sources on an interval basis. At the end of an interval, each source is categorized based on its memory intensity, RBL, BLP, and persistence characteristics during the interval, predicting that it will exhibit similar behavior in the next interval.7 The main new feature of FIRM's source categorization is its detection of a persistent source (inspired by the discrepant characteristics of persistent applications described in Section 2.3). Table 2 depicts the rules FIRM employs to categorize a source as persistent. FIRM uses program hints (with the software interface described in Section 5) to determine whether a hardware context belongs to a persistent application. This ensures that a non-persistent application does not get classified as a persistent source.
If a hardware context belonging to such an application is generating write batches that are larger than a pre-defined threshold (i.e., has an average write batch size greater than 30 in the previous interval) and if it inserts memory barriers between memory requests (i.e., has inserted at least one memory barrier between write requests in the previous interval), FIRM categorizes it as a persistent source.

Footnote 7: We use an interval size of one million cycles, which we empirically find to provide a good tradeoff between prediction accuracy, adaptivity to workload behavior, and overhead of categorization.
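Concretely, the persistent-source rule above can be sketched as a small predicate applied to per-interval statistics. The 30-request batch-size threshold and the one-barrier minimum are the values given in the text; the function and argument names are illustrative, not FIRM's actual hardware implementation.

```python
WRITE_BATCH_SIZE_THRESHOLD = 30   # pre-defined threshold from the text
MIN_MEMORY_BARRIERS = 1           # at least one barrier between write requests

def is_persistent_source(hinted_persistent: bool,
                         avg_write_batch_size: float,
                         barriers_between_writes: int) -> bool:
    """Apply the three rules of Table 2 to one hardware context, using
    statistics collected over the previous interval."""
    return (hinted_persistent                                      # rule 1: program hint
            and avg_write_batch_size > WRITE_BATCH_SIZE_THRESHOLD  # rule 2: large write batches
            and barriers_between_writes >= MIN_MEMORY_BARRIERS)    # rule 3: memory barriers
```

Note that rule 1 acts as a gate: without the program hint, a write-heavy non-persistent context can never be misclassified as persistent, which is exactly the guarantee the text describes.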

[Fig. 5. Overview of the FIRM design and its two key techniques: (a) the FIRM components (request batching, source categorization into non-intensive, random, streaming, and persistent sources, persistent write striding, and persistence-aware memory scheduling); (b) persistent write striding, which maps persistent writes issued to a contiguous memory space (the data buffer holding the log or shadow copies, divided into row-buffer-sized buffer groups) to strided locations across banks; (c) the persistence-aware memory scheduling policy, which, given ready read batches R1, R2, R3 and ready persistent write batches W1, W2 across banks, chooses one read batch group and one persistent write batch group to schedule in each time interval.]

Table 2. Rules used to identify persistent sources. A thread is identified as a persistent source if it
1: belongs to a persistent application;
2: is generating write batches that are larger than a pre-defined threshold in the past interval;
3: inserts memory barriers between memory requests.

Sources that are not persistent are classified into non-intensive, streaming, and random based on three metrics: MPKI (memory intensity), BLP, and RBL. This categorization is inspired by previous studies [42, 63], which show the varying characteristics of such sources. A non-intensive source has low memory intensity. We identify these sources to prioritize their batches over batches of other sources; this maximizes system performance, as such sources are latency-sensitive [42]. Streaming and random sources are typically read-intensive, having opposite BLP and RBL characteristics (Table 1).
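As a minimal sketch, the end-of-interval categorization can be expressed as follows, using the thresholds reported in the experimental setup (MPKI < 1 for non-intensive; MPKI > 1, BLP < 4, and RBL > 70% for streaming); the function name and the convention of passing the Table 2 decision in as a flag are illustrative assumptions.

```python
def categorize_source(mpki: float, blp: float, rbl: float,
                      is_persistent: bool) -> str:
    """End-of-interval source categorization. `is_persistent` is the result
    of applying the Table 2 rules; the remaining thresholds follow the
    experimental settings in the text (RBL as a fraction, so 0.70 = 70%)."""
    if is_persistent:
        return "persistent"
    if mpki < 1:
        return "non-intensive"   # latency-sensitive; its batches are prioritized first
    if blp < 4 and rbl > 0.70:
        return "streaming"       # high row-buffer locality, low bank-level parallelism
    return "random"              # high bank-level parallelism, low row-buffer locality
```

The category then feeds the next interval's scheduling decisions, on the prediction that the source keeps behaving the same way.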
This streaming and random source classification is used later by the underlying scheduling policy FIRM borrows from past works to maximize system performance and fairness (e.g., TCM [42]).

Footnote 8: In our experiments, a hardware context is classified as non-intensive if its MPKI < 1. A hardware context is classified as streaming if its MPKI > 1, BLP < 4, and RBL > 70%. All other hardware contexts that are not persistent are classified as random.

4.3. Persistent Write Striding

The goal of this mechanism is to reduce the latency of servicing consecutively-issued persistent writes by ensuring they have high BLP and thus fully utilize memory bandwidth. We achieve this goal by striding the persistent writes across multiple memory banks via hardware or software support. The basic idea of persistent write striding is simple: instead of mapping consecutive groups of row-buffer-sized persistent writes to consecutive row-buffer-sized locations in a persistent data buffer (which is used for the redo log or shadow copies in a persistent application), which causes them to map to the same memory bank, change the mapping such that they are strided by an offset that ensures they map to different memory banks. Figure 5(b) illustrates this idea. A persistent application can still allocate a contiguous memory space for the persistent data buffer. Our method maps the accesses to the data buffer to different banks in a strided manner. Contiguous persistent writes of less than or equal to the row-buffer size are still mapped to a contiguous data buffer space of a row-buffer size (called a buffer group) to achieve high RBL. However, contiguous persistent writes beyond the size of the row buffer are strided by an offset.

[Fig. 6. Physical address to bank mapping example, showing the higher-order address bits and the position of the bank index bits.]

The value of the offset is determined by the position of the bank index bits used in the physical
For example, wth the address mappng scheme n Fgure 6, the offset should be 128K bytes f we want to fully utlze all eght banks wth persstent wrtes (because a contguous memory chunk of 16KB gets mapped to the same bank wth ths address mappng scheme,.e., the memory nterleavng granularty s 16KB across banks). Ths persstent wrte strdng mechansm can be mplemented n ether the memory controller hardware or a user-mode lbrary, as we descrbe n Secton 5. Note that the persstent wrte strdng mechansm provdes a determnstc (re)mappng of persstent data buffer physcal addresses to physcal memory addresses n a strded manner. The remapped physcal addresses wll not exceed the boundary of the orgnal data buffer. As a result, re-accessng or recoverng data at any tme from the persstent data buffer s not an ssue: all accesses to the buffer go through ths remappng. Alternatve Methods. Note that commodty memory controllers randomze hgher-order address bts to mnmze bank conflcts (Fgure 6). However, they can stll fal to map persstent wrtes to dfferent banks because as we showed n Secton 3.1, persstent wrtes are usually streamng and hence they are lkely to map to the same bank. It s mpractcal to mprove the BLP of persstent wrtes by aggressvely bufferng them due to two reasons: 1) The large bufferng capacty requred. For example, we mght need a wrte queue as large as 128KB to utlze all eght banks of a DDR3 channel wth the address mappng shown n Fgure 6. 2) The regon of concurrent contguous wrtes may not be large enough to cover multple banks (.e., there may not be enough wrtes present to dfferent banks). Alternatvely, kernel-level memory access randomzaton [69] may dstrbute wrtes to multple banks durng persstent applcaton executon. However, the address mappng nformaton can be lost when the system reboots, leavng the BA-NVM wth unrecoverable data. 
Finally, it is also prohibitively complex to randomize the bank mapping of only persistent writes by choosing a different set of address bits as their bank indexes, i.e., maintaining multiple address mapping schemes in a single memory system. Doing so requires complex bookkeeping mechanisms to ensure correct mapping of memory addresses. For these very reasons, we have developed the persistent write striding technique we have described.
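To make the remapping concrete, here is one possible bijective striding layout for the Figure 6 parameters (16KB interleaving granularity, eight banks). The 8KB buffer-group size and the exact permutation are illustrative assumptions rather than FIRM's specified layout, but they preserve the two properties the text requires: consecutive buffer groups land on different banks, and remapped addresses stay inside the original contiguous buffer.

```python
def stride_remap(buf_offset: int, buf_size: int,
                 row_buf: int = 8 * 1024,      # assumed buffer-group (row-buffer) size
                 interleave: int = 16 * 1024,  # bytes mapped to one bank (Fig. 6 example)
                 num_banks: int = 8) -> int:
    """Deterministically remap an offset inside a contiguous persistent data
    buffer so consecutive row-buffer-sized groups cycle through all banks.
    The map is a bijection on [0, buf_size), so recovery simply re-applies it."""
    assert buf_size % (num_banks * interleave) == 0
    groups_per_chunk = interleave // row_buf
    groups_per_region = num_banks * groups_per_chunk  # groups per full bank sweep

    group, within = divmod(buf_offset, row_buf)
    region, g = divmod(group, groups_per_region)
    chunk = region * num_banks + (g % num_banks)  # a new bank for every group
    slot = g // num_banks                         # then fill chunks group by group
    return chunk * interleave + slot * row_buf + within

def bank_of(phys_addr: int, interleave: int = 16 * 1024, num_banks: int = 8) -> int:
    """Bank index under the Fig. 6-style mapping: 16KB chunks interleaved across banks."""
    return (phys_addr // interleave) % num_banks
```

With these parameters, eight consecutive buffer groups map to banks 0 through 7 instead of sharing banks pairwise, so a write queue drain can proceed in all eight banks concurrently.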

4.4. Persistence-Aware Memory Scheduling

The goal of this component is to minimize write queue drains and bus turnarounds by intelligently partitioning memory service between reads and persistent writes, while maximizing system performance and fairness. To achieve this multi-objective goal, FIRM operates at the batch granularity and forms a schedule of read and write batches of different source types: non-intensive, streaming, random, and persistent. To maximize system performance, FIRM prioritizes non-intensive read batches over all other batches. For the remaining batches of requests, FIRM employs a new policy that determines 1) how to group read batches and write batches and 2) when to switch between servicing read batches and write batches. FIRM does this in a manner that balances the amount of time spent in write drain mode (servicing write batches) and read mode (servicing read batches) in a way that is proportional to the read and write demands, while ensuring that the time spent in each mode is long enough that the wasted cycles due to bus turnaround delays are minimized. When the memory scheduler is servicing read or persistent write batches, in read mode or write drain mode, the scheduling policy employed can be any of the previously-proposed memory request scheduling policies (e.g., [26, 41, 42, 49, 64, 65, 76, 77, 95]), and the ordering of persistent write batches is fixed by the ordering control of persistent applications. The key novelty of our proposal is not the particular prioritization policy between requests, but the mechanism that determines how to group batches and when to switch between write drain mode and read mode, which we describe in detail next.9 To minimize write queue drains, FIRM schedules reads and persistent writes within an interval in a round-robin manner, with the memory bandwidth (i.e., the time interval) partitioned between them based on their demands. To prevent frequent bus turnarounds, FIRM schedules a group of batches in one bus transfer direction before scheduling another group of batches in the other direction.
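The mode-partitioning policy above, together with the batch-group selection that this section formalizes in Equations 2 and 3, can be sketched as follows. Candidate batch-group service times are assumed to be precomputed per bank and ordered from smallest group to largest; the function names and argument conventions are illustrative, not FIRM's hardware interface.

```python
def batch_group_time(per_bank_hits_misses, t_hit, t_miss):
    """Equation 2: a batch group's service time is the maximum over banks of
    (row-buffer hits * t_hit + row-buffer misses * t_miss)."""
    return max(h * t_hit + m * t_miss for h, m in per_bank_hits_misses)

def pick_batch_groups(t_r, t_w, t_RT, t_TR, mu_turnaround):
    """Equation 3 / Algorithm 1 sketch: choose the smallest read and write
    batch groups whose service times keep the bus-turnaround overhead below
    mu_turnaround, splitting the interval in proportion to read/write demand.
    t_r and t_w hold candidate group service times in increasing order."""
    t_r_max, t_w_max = max(t_r), max(t_w)
    floor_r = (t_RT + t_TR) / mu_turnaround / (1 + t_w_max / t_r_max)
    floor_w = (t_RT + t_TR) / mu_turnaround / (1 + t_r_max / t_w_max)
    # smallest group long enough to amortize the turnaround; fall back to all batches
    t_r_next = min((t for t in t_r if t >= floor_r), default=t_r_max)
    t_w_next = min((t for t in t_w if t >= floor_w), default=t_w_max)
    return t_r_next, t_w_next
```

Per interval, the scheduler then serves the chosen read group in read mode, turns the bus around once, serves the chosen write group in write drain mode, and repeats round-robin, so each pair of turnarounds is amortized over at least (t_RT + t_TR) / mu_turnaround cycles of useful service.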
Figure 5(c) illustrates an example of this persistence-aware memory scheduling policy. Assume, without loss of generality, that we have the following batches ready to be scheduled at the beginning of a time interval: a random read batch R1, two streaming read batches R2 and R3, and two (already-strided) persistent write batches W1 and W2. We define a batch group as a group of batches that will be scheduled together. As illustrated in Figure 5(c), the memory controller has various options to compose the read and write batch groups. The figure shows three possible batch groups for reads ([R1], [R1, R2], and [R1, R2, R3]) and two possible batch groups for writes ([W1] and [W1, W2]). These possibilities assume that the underlying memory request scheduling policy dictates the order of batches within a batch group. Our proposed technique thus boils down to determining how many read or write batches should be grouped together to be scheduled in the next read mode or write drain mode. We design a new technique that aims to satisfy the following two goals: 1) servicing the two batch groups (read and write) consumes durations proportional to their demand; 2) the total time spent servicing the two batch groups is much longer than the bus turnaround time. The first goal is to prevent the starvation of either reads or persistent writes, by fairly partitioning the memory bandwidth between them. The second goal is to maximize performance by ensuring that a minimal amount of time is wasted on bus turnarounds.

Footnote 9: Note that most previous memory scheduling schemes focus on read requests and do not discuss how to handle switching between read and write modes in the memory controller, implicitly assuming that reads are prioritized over writes until the write queue becomes full or close to full [49].

Mathematically, we formulate these two goals as the following two inequalities:

    t_r / t_w = t_r^max / t_w^max,    (t_RT + t_TR) / (t_r + t_w) <= μ_turnaround    (1)

where t_r and t_w are the times to service a read and a persistent write batch group, respectively (Figure 5(c)).
They are the maximum service times for the batch group across all banks i:

    t_r_j = max_i { H_r_i * t_rhit + M_r_i * t_rmiss },
    t_w_j = max_i { H_w_i * t_whit + M_w_i * t_wmiss }    (2)

where t_rhit, t_whit, t_rmiss, and t_wmiss are the times to service a row-buffer hit/miss read/write request; H_r_i and H_w_i are the numbers of row-buffer read/write hits at bank i; and M_r_i and M_w_i are the numbers of row-buffer read/write misses at bank i. t_r^max and t_w^max are the maximum times to service all the in-flight read and write requests (illustrated in Figure 5(c)). μ_turnaround is a user-defined parameter representing the maximum tolerable fraction of bus turnaround time out of the total service time of memory requests. The goal of our mechanism is to group read and write batches (i.e., form read and write batch groups) to be scheduled in the next read mode and write drain mode in a manner that satisfies Equation 1. The technique thus boils down to selecting from the set of possible read/write batch groups such that they satisfy t_r^next (the duration of the next read mode) and t_w^next (the duration of the next write drain mode) as indicated by the constraints in Equation 3 (which is obtained by solving the inequality in Equation 1). Our technique, Algorithm 1, forms a batch group that has the minimum service duration satisfying the constraint on the right-hand side of Equation 3.10

    t_r^next = min_j { t_r_j : t_r_j >= (t_RT + t_TR) / μ_turnaround / (1 + t_w^max / t_r^max) },
    t_w^next = min_j { t_w_j : t_w_j >= (t_RT + t_TR) / μ_turnaround / (1 + t_r^max / t_w^max) }    (3)

Algorithm 1: Formation of read and write batch groups.
Input: t_RT, t_TR, and μ_turnaround.
Output: the read batch group to be scheduled, indicated by t_r^next; the persistent write batch group to be scheduled, indicated by t_w^next.
Initialization: k_r <- number of read batch groups; k_w <- number of persistent write batch groups.
for j <- 0 to k_r - 1 do: calculate t_r_j with Equation 2; end for
for j <- 0 to k_w - 1 do: calculate t_w_j with Equation 2; end for
t_r^max <- max over j = 0..k_r - 1 of t_r_j; t_w^max <- max over j = 0..k_w - 1 of t_w_j
Calculate t_r^next and t_w^next with Equation 3.

5. IMPLEMENTATION
5.1. Software-Hardware Interface

The FIRM software interface provides memory controllers with the required information to identify persistent sources during the source categorization stage. This includes 1) the identification of the persistent application, 2) the communication of the execution of memory

Footnote 10: This algorithm can be invoked only once at the beginning of each interval to determine the duration of consecutive read and write drain modes for the interval.


More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

Real-time Scheduling

Real-time Scheduling Real-tme Schedulng COE718: Embedded System Desgn http://www.ee.ryerson.ca/~courses/coe718/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrcal and Computer Engneerng Ryerson Unversty Overvew RTX

More information

Channel 0. Channel 1 Channel 2. Channel 3 Channel 4. Channel 5 Channel 6 Channel 7

Channel 0. Channel 1 Channel 2. Channel 3 Channel 4. Channel 5 Channel 6 Channel 7 Optmzed Regonal Cachng for On-Demand Data Delvery Derek L. Eager Mchael C. Ferrs Mary K. Vernon Unversty of Saskatchewan Unversty of Wsconsn Madson Saskatoon, SK Canada S7N 5A9 Madson, WI 5376 eager@cs.usask.ca

More information

y and the total sum of

y and the total sum of Lnear regresson Testng for non-lnearty In analytcal chemstry, lnear regresson s commonly used n the constructon of calbraton functons requred for analytcal technques such as gas chromatography, atomc absorpton

More information

Advanced Computer Networks

Advanced Computer Networks Char of Network Archtectures and Servces Department of Informatcs Techncal Unversty of Munch Note: Durng the attendance check a stcker contanng a unque QR code wll be put on ths exam. Ths QR code contans

More information

ADRIAN PERRIG & TORSTEN HOEFLER ( -6- ) Networks and Operatng Systems Chapter 6: Demand Pagng Page Table Structures Page table structures Page table structures Problem: smple lnear table s too bg Problem:

More information

Reliability and Energy-aware Cache Reconfiguration for Embedded Systems

Reliability and Energy-aware Cache Reconfiguration for Embedded Systems Relablty and Energy-aware Cache Reconfguraton for Embedded Systems Yuanwen Huang and Prabhat Mshra Department of Computer and Informaton Scence and Engneerng Unversty of Florda, Ganesvlle FL 326-62, USA

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

CS 268: Lecture 8 Router Support for Congestion Control

CS 268: Lecture 8 Router Support for Congestion Control CS 268: Lecture 8 Router Support for Congeston Control Ion Stoca Computer Scence Dvson Department of Electrcal Engneerng and Computer Scences Unversty of Calforna, Berkeley Berkeley, CA 9472-1776 Router

More information

3. CR parameters and Multi-Objective Fitness Function

3. CR parameters and Multi-Objective Fitness Function 3 CR parameters and Mult-objectve Ftness Functon 41 3. CR parameters and Mult-Objectve Ftness Functon 3.1. Introducton Cogntve rados dynamcally confgure the wreless communcaton system, whch takes beneft

More information

Utility-Based Acceleration of Multithreaded Applications on Asymmetric CMPs

Utility-Based Acceleration of Multithreaded Applications on Asymmetric CMPs Utlty-Based Acceleraton of Multthreaded Applcatons on Asymmetrc CMPs José A. Joao M. Aater Suleman Onur Mutlu Yale N. Patt ECE Department The Unversty of Texas at Austn Austn, TX, USA {joao, patt}@ece.utexas.edu

More information

Conditional Speculative Decimal Addition*

Conditional Speculative Decimal Addition* Condtonal Speculatve Decmal Addton Alvaro Vazquez and Elsardo Antelo Dep. of Electronc and Computer Engneerng Unv. of Santago de Compostela, Span Ths work was supported n part by Xunta de Galca under grant

More information

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes SPH3UW Unt 7.3 Sphercal Concave Mrrors Page 1 of 1 Notes Physcs Tool box Concave Mrror If the reflectng surface takes place on the nner surface of the sphercal shape so that the centre of the mrror bulges

More information

Space-Optimal, Wait-Free Real-Time Synchronization

Space-Optimal, Wait-Free Real-Time Synchronization 1 Space-Optmal, Wat-Free Real-Tme Synchronzaton Hyeonjoong Cho, Bnoy Ravndran ECE Dept., Vrgna Tech Blacksburg, VA 24061, USA {hjcho,bnoy}@vt.edu E. Douglas Jensen The MITRE Corporaton Bedford, MA 01730,

More information

A Frame Packing Mechanism Using PDO Communication Service within CANopen

A Frame Packing Mechanism Using PDO Communication Service within CANopen 28 A Frame Packng Mechansm Usng PDO Communcaton Servce wthn CANopen Mnkoo Kang and Kejn Park Dvson of Industral & Informaton Systems Engneerng, Ajou Unversty, Suwon, Gyeongg-do, South Korea Summary The

More information

arxiv: v3 [cs.ds] 7 Feb 2017

arxiv: v3 [cs.ds] 7 Feb 2017 : A Two-stage Sketch for Data Streams Tong Yang 1, Lngtong Lu 2, Ybo Yan 1, Muhammad Shahzad 3, Yulong Shen 2 Xaomng L 1, Bn Cu 1, Gaogang Xe 4 1 Pekng Unversty, Chna. 2 Xdan Unversty, Chna. 3 North Carolna

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

Goals and Approach Type of Resources Allocation Models Shared Non-shared Not in this Lecture In this Lecture

Goals and Approach Type of Resources Allocation Models Shared Non-shared Not in this Lecture In this Lecture Goals and Approach CS 194: Dstrbuted Systems Resource Allocaton Goal: acheve predcable performances Three steps: 1) Estmate applcaton s resource needs (not n ths lecture) 2) Admsson control 3) Resource

More information

Maintaining temporal validity of real-time data on non-continuously executing resources

Maintaining temporal validity of real-time data on non-continuously executing resources Mantanng temporal valdty of real-tme data on non-contnuously executng resources Tan Ba, Hong Lu and Juan Yang Hunan Insttute of Scence and Technology, College of Computer Scence, 44, Yueyang, Chna Wuhan

More information

A Hybrid Genetic Algorithm for Routing Optimization in IP Networks Utilizing Bandwidth and Delay Metrics

A Hybrid Genetic Algorithm for Routing Optimization in IP Networks Utilizing Bandwidth and Delay Metrics A Hybrd Genetc Algorthm for Routng Optmzaton n IP Networks Utlzng Bandwdth and Delay Metrcs Anton Redl Insttute of Communcaton Networks, Munch Unversty of Technology, Arcsstr. 21, 80290 Munch, Germany

More information

AP PHYSICS B 2008 SCORING GUIDELINES

AP PHYSICS B 2008 SCORING GUIDELINES AP PHYSICS B 2008 SCORING GUIDELINES General Notes About 2008 AP Physcs Scorng Gudelnes 1. The solutons contan the most common method of solvng the free-response questons and the allocaton of ponts for

More information

Sample Solution. Advanced Computer Networks P 1 P 2 P 3 P 4 P 5. Module: IN2097 Date: Examiner: Prof. Dr.-Ing. Georg Carle Exam: Final exam

Sample Solution. Advanced Computer Networks P 1 P 2 P 3 P 4 P 5. Module: IN2097 Date: Examiner: Prof. Dr.-Ing. Georg Carle Exam: Final exam Char of Network Archtectures and Servces Department of Informatcs Techncal Unversty of Munch Note: Durng the attendance check a stcker contanng a unque QR code wll be put on ths exam. Ths QR code contans

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

Routing in Degree-constrained FSO Mesh Networks

Routing in Degree-constrained FSO Mesh Networks Internatonal Journal of Hybrd Informaton Technology Vol., No., Aprl, 009 Routng n Degree-constraned FSO Mesh Networks Zpng Hu, Pramode Verma, and James Sluss Jr. School of Electrcal & Computer Engneerng

More information

RAP. Speed/RAP/CODA. Real-time Systems. Modeling the sensor networks. Real-time Systems. Modeling the sensor networks. Real-time systems:

RAP. Speed/RAP/CODA. Real-time Systems. Modeling the sensor networks. Real-time Systems. Modeling the sensor networks. Real-time systems: Speed/RAP/CODA Presented by Octav Chpara Real-tme Systems Many wreless sensor network applcatons requre real-tme support Survellance and trackng Border patrol Fre fghtng Real-tme systems: Hard real-tme:

More information

Analysis of Continuous Beams in General

Analysis of Continuous Beams in General Analyss of Contnuous Beams n General Contnuous beams consdered here are prsmatc, rgdly connected to each beam segment and supported at varous ponts along the beam. onts are selected at ponts of support,

More information

An Investigation into Server Parameter Selection for Hierarchical Fixed Priority Pre-emptive Systems

An Investigation into Server Parameter Selection for Hierarchical Fixed Priority Pre-emptive Systems An Investgaton nto Server Parameter Selecton for Herarchcal Fxed Prorty Pre-emptve Systems R.I. Davs and A. Burns Real-Tme Systems Research Group, Department of omputer Scence, Unversty of York, YO10 5DD,

More information

A QoS-aware Scheduling Scheme for Software-Defined Storage Oriented iscsi Target

A QoS-aware Scheduling Scheme for Software-Defined Storage Oriented iscsi Target A QoS-aware Schedulng Scheme for Software-Defned Storage Orented SCSI Target Xanghu Meng 1,2, Xuewen Zeng 1, Xao Chen 1, Xaozhou Ye 1,* 1 Natonal Network New Meda Engneerng Research Center, Insttute of

More information

Design and Analysis of Algorithms

Design and Analysis of Algorithms Desgn and Analyss of Algorthms Heaps and Heapsort Reference: CLRS Chapter 6 Topcs: Heaps Heapsort Prorty queue Huo Hongwe Recap and overvew The story so far... Inserton sort runnng tme of Θ(n 2 ); sorts

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

an assocated logc allows the proof of safety and lveness propertes. The Unty model nvolves on the one hand a programmng language and, on the other han

an assocated logc allows the proof of safety and lveness propertes. The Unty model nvolves on the one hand a programmng language and, on the other han UNITY as a Tool for Desgn and Valdaton of a Data Replcaton System Phlppe Quennec Gerard Padou CENA IRIT-ENSEEIHT y Nnth Internatonal Conference on Systems Engneerng Unversty of Nevada, Las Vegas { 14-16

More information

Assembler. Building a Modern Computer From First Principles.

Assembler. Building a Modern Computer From First Principles. Assembler Buldng a Modern Computer From Frst Prncples www.nand2tetrs.org Elements of Computng Systems, Nsan & Schocken, MIT Press, www.nand2tetrs.org, Chapter 6: Assembler slde Where we are at: Human Thought

More information

Comparison of Heuristics for Scheduling Independent Tasks on Heterogeneous Distributed Environments

Comparison of Heuristics for Scheduling Independent Tasks on Heterogeneous Distributed Environments Comparson of Heurstcs for Schedulng Independent Tasks on Heterogeneous Dstrbuted Envronments Hesam Izakan¹, Ath Abraham², Senor Member, IEEE, Václav Snášel³ ¹ Islamc Azad Unversty, Ramsar Branch, Ramsar,

More information

#4 Inverted page table. The need for more bookkeeping. Inverted page table architecture. Today. Our Small Quiz

#4 Inverted page table. The need for more bookkeeping. Inverted page table architecture. Today. Our Small Quiz ADRIAN PERRIG & TORSTEN HOEFLER Networks and Operatng Systems (-6-) Chapter 6: Demand Pagng http://redmne.replcant.us/projects/replcant/wk/samsunggalaxybackdoor () # Inverted table One system-wde table

More information

Load-Balanced Anycast Routing

Load-Balanced Anycast Routing Load-Balanced Anycast Routng Chng-Yu Ln, Jung-Hua Lo, and Sy-Yen Kuo Department of Electrcal Engneerng atonal Tawan Unversty, Tape, Tawan sykuo@cc.ee.ntu.edu.tw Abstract For fault-tolerance and load-balance

More information

WIRELESS communication technology has gained widespread

WIRELESS communication technology has gained widespread 616 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 4, NO. 6, NOVEMBER/DECEMBER 2005 Dstrbuted Far Schedulng n a Wreless LAN Ntn Vadya, Senor Member, IEEE, Anurag Dugar, Seema Gupta, and Paramvr Bahl, Senor

More information

Technical Report. i-game: An Implicit GTS Allocation Mechanism in IEEE for Time- Sensitive Wireless Sensor Networks

Technical Report. i-game: An Implicit GTS Allocation Mechanism in IEEE for Time- Sensitive Wireless Sensor Networks www.hurray.sep.pp.pt Techncal Report -GAME: An Implct GTS Allocaton Mechansm n IEEE 802.15.4 for Tme- Senstve Wreless Sensor etworks Ans Koubaa Máro Alves Eduardo Tovar TR-060706 Verson: 1.0 Date: Jul

More information

VRT012 User s guide V0.1. Address: Žirmūnų g. 27, Vilnius LT-09105, Phone: (370-5) , Fax: (370-5) ,

VRT012 User s guide V0.1. Address: Žirmūnų g. 27, Vilnius LT-09105, Phone: (370-5) , Fax: (370-5) , VRT012 User s gude V0.1 Thank you for purchasng our product. We hope ths user-frendly devce wll be helpful n realsng your deas and brngng comfort to your lfe. Please take few mnutes to read ths manual

More information

GSLM Operations Research II Fall 13/14

GSLM Operations Research II Fall 13/14 GSLM 58 Operatons Research II Fall /4 6. Separable Programmng Consder a general NLP mn f(x) s.t. g j (x) b j j =. m. Defnton 6.. The NLP s a separable program f ts objectve functon and all constrants are

More information

On the Fairness-Efficiency Tradeoff for Packet Processing with Multiple Resources

On the Fairness-Efficiency Tradeoff for Packet Processing with Multiple Resources On the Farness-Effcency Tradeoff for Packet Processng wth Multple Resources We Wang, Chen Feng, Baochun L, and Ben Lang Department of Electrcal and Computer Engneerng, Unversty of Toronto {wewang, cfeng,

More information

Fibre-Optic AWG-based Real-Time Networks

Fibre-Optic AWG-based Real-Time Networks Fbre-Optc AWG-based Real-Tme Networks Krstna Kunert, Annette Böhm, Magnus Jonsson, School of Informaton Scence, Computer and Electrcal Engneerng, Halmstad Unversty {Magnus.Jonsson, Krstna.Kunert}@de.hh.se

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Intro. Iterators. 1. Access

Intro. Iterators. 1. Access Intro Ths mornng I d lke to talk a lttle bt about s and s. We wll start out wth smlartes and dfferences, then we wll see how to draw them n envronment dagrams, and we wll fnsh wth some examples. Happy

More information

The Impact of Delayed Acknowledgement on E-TCP Performance In Wireless networks

The Impact of Delayed Acknowledgement on E-TCP Performance In Wireless networks The mpact of Delayed Acknoledgement on E-TCP Performance n Wreless netorks Deddy Chandra and Rchard J. Harrs School of Electrcal and Computer System Engneerng Royal Melbourne nsttute of Technology Melbourne,

More information

Analysis of Collaborative Distributed Admission Control in x Networks

Analysis of Collaborative Distributed Admission Control in x Networks 1 Analyss of Collaboratve Dstrbuted Admsson Control n 82.11x Networks Thnh Nguyen, Member, IEEE, Ken Nguyen, Member, IEEE, Lnha He, Member, IEEE, Abstract Wth the recent surge of wreless home networks,

More information

MQSim: A Framework for Enabling Realistic Studies of Modern Multi-Queue SSD Devices

MQSim: A Framework for Enabling Realistic Studies of Modern Multi-Queue SSD Devices MQSm: A Framework for Enablng Realstc Studes of Modern Mult-Queue SSD Devces Arash Tavakkol, Juan Gómez-Luna, and Mohammad Sadrosadat, ETH Zürch; Saugata Ghose, Carnege Mellon Unversty; Onur Mutlu, ETH

More information

Self-Tuning, Bandwidth-Aware Monitoring for Dynamic Data Streams

Self-Tuning, Bandwidth-Aware Monitoring for Dynamic Data Streams Self-Tunng, Bandwdth-Aware Montorng for Dynamc Data Streams Navendu Jan #, Praveen Yalagandula, Mke Dahln #, Yn Zhang # # Unversty of Texas at Austn HP Labs Abstract We present, a self-tunng, bandwdth-aware

More information

Outline. Digital Systems. C.2: Gates, Truth Tables and Logic Equations. Truth Tables. Logic Gates 9/8/2011

Outline. Digital Systems. C.2: Gates, Truth Tables and Logic Equations. Truth Tables. Logic Gates 9/8/2011 9/8/2 2 Outlne Appendx C: The Bascs of Logc Desgn TDT4255 Computer Desgn Case Study: TDT4255 Communcaton Module Lecture 2 Magnus Jahre 3 4 Dgtal Systems C.2: Gates, Truth Tables and Logc Equatons All sgnals

More information

IP Camera Configuration Software Instruction Manual

IP Camera Configuration Software Instruction Manual IP Camera 9483 - Confguraton Software Instructon Manual VBD 612-4 (10.14) Dear Customer, Wth your purchase of ths IP Camera, you have chosen a qualty product manufactured by RADEMACHER. Thank you for the

More information

Lecture 7 Real Time Task Scheduling. Forrest Brewer

Lecture 7 Real Time Task Scheduling. Forrest Brewer Lecture 7 Real Tme Task Schedulng Forrest Brewer Real Tme ANSI defnes real tme as A Real tme process s a process whch delvers the results of processng n a gven tme span A data may requre processng at a

More information

SRB: Shared Running Buffers in Proxy to Exploit Memory Locality of Multiple Streaming Media Sessions

SRB: Shared Running Buffers in Proxy to Exploit Memory Locality of Multiple Streaming Media Sessions SRB: Shared Runnng Buffers n Proxy to Explot Memory Localty of Multple Streamng Meda Sessons Songqng Chen,BoShen, Yong Yan, Sujoy Basu, and Xaodong Zhang Department of Computer Scence Moble and Meda System

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

Circuit Analysis I (ENGR 2405) Chapter 3 Method of Analysis Nodal(KCL) and Mesh(KVL)

Circuit Analysis I (ENGR 2405) Chapter 3 Method of Analysis Nodal(KCL) and Mesh(KVL) Crcut Analyss I (ENG 405) Chapter Method of Analyss Nodal(KCL) and Mesh(KVL) Nodal Analyss If nstead of focusng on the oltages of the crcut elements, one looks at the oltages at the nodes of the crcut,

More information

CS 534: Computer Vision Model Fitting

CS 534: Computer Vision Model Fitting CS 534: Computer Vson Model Fttng Sprng 004 Ahmed Elgammal Dept of Computer Scence CS 534 Model Fttng - 1 Outlnes Model fttng s mportant Least-squares fttng Maxmum lkelhood estmaton MAP estmaton Robust

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

CE 221 Data Structures and Algorithms

CE 221 Data Structures and Algorithms CE 1 ata Structures and Algorthms Chapter 4: Trees BST Text: Read Wess, 4.3 Izmr Unversty of Economcs 1 The Search Tree AT Bnary Search Trees An mportant applcaton of bnary trees s n searchng. Let us assume

More information

A fair buffer allocation scheme

A fair buffer allocation scheme A far buffer allocaton scheme Juha Henanen and Kalev Klkk Telecom Fnland P.O. Box 228, SF-330 Tampere, Fnland E-mal: juha.henanen@tele.f Abstract An approprate servce for data traffc n ATM networks requres

More information