Diploma Thesis. Instruction Timing Analysis for Linux/x86-based Embedded and Desktop Systems


Faculty of Electrical Engineering & Information Technology
Chair of Process Automation

Diploma Thesis

Instruction Timing Analysis for Linux/x86-based Embedded and Desktop Systems

Author: Tobias John
Supervisors: Prof. Peter Protze, Dr. Robert Baumgart
Date of Submission: 22nd September 2005

John, Tobias: Instruction Timing Analysis for Linux/x86-based Embedded and Desktop Systems. Diploma Thesis, Faculty of Electrical Engineering & Information Technology, Chemnitz University of Technology, 2005

Declaration of Authorship

I hereby declare that the whole of this diploma thesis is my own work, except where explicitly stated otherwise in the text or in the bibliography. This work is submitted to Chemnitz University of Technology as a requirement for being awarded a diploma in Electrical Engineering - Automation Engineering. I declare that it has not been submitted in whole, or in part, for any other degree.

Chemnitz, 22nd September 2005
Tobias John


Conceptual formulation (1)

Objective: Real-time aspects are increasingly relevant in standard PC environments. At the same time, x86-based processors are used more and more often in embedded systems. For the construction of real-time systems standard techniques are available, which unfortunately reduce the efficiency of the systems. Therefore specific system parameters often come without a guarantee, and overdimensioned hardware is used instead. The intention of this work is to obtain quantitative statements about the temporal behaviour of current x86-based processor architectures under the Linux operating system. Relevant aspects are as follows:

* comparison of the types Pentium 4 and AMD Elan SC 410
* outline and abstract of micro-benchmarks which address certain units (ALU, FPU, MMX, ISSE) specifically
* assessment of typical operating system services (often used system calls) on both architectures
* assessment of given real-time applications (for example current multimedia codes)

This work will be supervised by the Chair of Process Automation of the Faculty of Electrical Engineering & Information Technology (Prof. Protze) and by the junior professorship Real-time Systems of the computing faculty (Dr. Baumgart).

1: Appendix A holds a copy of the original (German) conceptual formulation.


Abstract

As the conceptual formulation expresses, this work should compare x86-based general purpose processors with embedded ones intended to be used in real-time systems. The focus was directed especially at the jitter of execution times. However, analysing the execution of instructions on a microprocessor in such a way that the completion is one time as fast as possible and the other time as slow as in the worst case requires that the architecture is known in detail. It is necessary to know which aspects, and in which order, influence the execution flow. This information is often not published, not detailed enough or even wrong. (2) As a consequence the first step has to be the analysis of the underlying hardware. That this step would take most of the time could not be known in advance, and therefore the results of this work are a collection of microbenchmarks to explore the caching and branch prediction architecture and obtain the missing information. Nevertheless, the gained results are interpreted in a way that stresses the best and worst case in execution timing.

2: E.g. [LH] and [Sea00] declare that the PII uses a strict LRU replacement strategy, although a pseudo technique is actually applied to L1!

Acknowledgements

Although this work has been created by myself, several friends have had great influence on it, which is why I want to mention them: I would like to thank Jette & Tim for proofreading, my family for the patience they had with me, Chris for reminding me of the duties of a student and for always being open to my questions. Special thanks goes to Brigitte for awakening the last energy resources and to Cari, who showed me that there is something worth fighting for.

Contents

1 Introduction
2 Background knowledge
  2.1 Hardware
  2.2 Performance monitoring
  2.3 Branch prediction
  2.4 CPU caches
    2.4.1 INTEL's Cache Architecture
3 Analysis concepts
  3.1 Performance Monitoring - IA-32 architecture
    3.1.1 P6 family
    3.1.2 PIV, Xeon
  3.2 Branch prediction
    3.2.1 Branch History (shift-)register (BHR)
    3.2.2 Branch Target Buffer (BTB)
  3.3 Caching
    3.3.1 Adjacent cache line prefetch - PIV
    3.3.2 Caching strategy of the L1 cache
    3.3.3 Replacement strategy
4 Worst case on caching
  4.1 Cache flooding I
    4.1.1 Double Purge Scenario
    4.1.2 Cache flooder as described in [LH]
  4.2 Cache flooding II
    4.2.1 The algorithm
    4.2.2 Conditions on the cache architecture
    4.2.3 Filling vs Flooding
5 Results
  5.1 Branch prediction
    5.1.1 BHR - Branch History Register
    5.1.2 BTB - Branch Target Buffer
  5.2 Caching
    5.2.1 Adjacent cache line prefetch - PIV
    5.2.2 L1 - caching strategy
    5.2.3 Replacement strategy
    5.2.4 Cache flooding
6 Conclusions
A Conceptual formulation


1 Introduction

Real-time aspects are becoming more important in standard desktop PC environments, and x86-based processors are being utilized in embedded systems more often. While these processors were not created for use in hard real-time systems, they are fast and inexpensive and can be used if it is possible to determine the worst case execution time. Information on CPU caches (L1, L2) and branch prediction architecture is necessary to simulate best and worst cases in execution timing, but is often not detailed enough and sometimes not published at all. This document describes how the underlying hardware can be analysed to obtain this information.

This document is structured as follows: The following section (sec. 2) gives background information on the covered topics: performance monitoring, branch prediction and caching. With this general knowledge it should be no problem to understand the exploration techniques presented in section 3. These pages cover the ideas behind the benchmarks and how they have to be implemented. Section 4 describes the worst case on caching. Two possibilities are explained on how to achieve it; the first one is based on [LH] and the second has been developed by myself. These theoretical concepts have been implemented and tested on Intel Pentium II, III and IV processors, belonging to the architectures P6 and Netburst. The results obtained are presented in section 5. Finally a summary and a list of open questions and remaining problems (sec. 6) follows.

2 Background knowledge

2.1 Hardware

The tests described in the following sections were executed on different Intel architectures: P6 (Pentium II, III) and Netburst (Pentium IV). The PII is a Klamath at 233 MHz with a 66 MHz system bus frequency and a 512 KB second level cache running at half the core clock. The PIII, from the same family of processors, has a CPU frequency of 500 MHz, a system bus rated at 100 MHz and an L2 cache of the same size as the PII that runs at half core speed, too. Its codename is Katmai. Both processors have the MMX instruction set but only the PIII utilizes SSE. The representative of the Netburst microarchitecture, the PIV, is a Northwood processor. It does not feature Hyper-Threading, has a clock frequency of 2.66 GHz and a 512 KB second level cache.

2.2 Performance monitoring

If the execution time of a program varies, this parameter is not usable for making comparisons or assertions anymore. Therefore counting the events corresponding to the analysed topic, such as cache or branch behaviour, is the only way to draw exact conclusions. Many processors therefore provide model specific registers (MSR) that serve this purpose, the so called performance monitoring registers. There exist several utilities to access these registers, such as VTune for the Intel processors, PAPI, OProfile and many others. VTune is limited to Intel processors and to some special Linux distributions only. PAPI, the Performance API, is an interface to the performance counter hardware for different platforms. It mainly consists of a library that needs to be linked to your software, wherefore it cannot be used in kernel modules. OProfile is a profiler under the GNU GPL that even supports analysis of kernel modules. As the name suggests, it is a profiler that is not called directly to read the counters. Instead it is called through an interrupt released by an overflow of a performance counter.

Under some circumstances it is unavoidable to work in kernel mode, for example if physical addresses are needed (cache flooder) or if you intend to analyse an RTAI module. That is why libraries such as PAPI cannot be used. Profiling software is either too inexact or produces too much overhead (if the counter is set up to overflow at a minimal count), wherefore I decided to implement my own functions to access the performance monitoring hardware - for the time being, limited to the Pentium II/III and IV. [Int04] describes how to configure the MSRs to count certain events. Some helpful functions such as setting up registers and starting, stopping and resetting counters were written as inline assembly macros in a header file. So it is possible to count for example cache misses and branch (mis)predictions within a kernel module without installing additional software or patching the kernel (as is necessary for PAPI). Direct manipulation of the MSRs is allowed in privilege level 0 only, so our performance monitoring macros are limited to kernel code, but because our intention was to analyse kernel modules (in conjunction with RTAI they are the simplest possibility to run code without interruption) this does not pose a problem.

2.3 Branch prediction

In order to get more instructions completed faster, modern microprocessors are deeply pipelined. That means that instructions do not wait for the previous ones to complete before their execution begins. A problem with this approach arises due to conditional branches. If a conditional branch is encountered and the result of the condition has not yet been calculated, the microprocessor does not know whether to take the branch or not. The applied solution is branch prediction - the processor decides whether to take the branch or not and starts executing the instructions at the predicted branch target. Finally, when the result of the branch condition is known, it is obvious whether the branch has been predicted correctly or not. In the latter case the already executed instructions of the wrong (mispredicted) path have to be thrown out (flushing the pipeline), which is particularly expensive with deeply pipelined processors. The delay for a mispredicted branch is usually equivalent to the pipeline depth.

There are two main types of branch prediction: static and dynamic. Static prediction assumes that the majority of backwards pointing branches occur in the context of repetitive loops, where the condition is used to determine whether the loop is to be repeated or not. Therefore backward branches are predicted to be taken, whereas forward pointing branches are predicted not to be taken.

Figure 1: bimodal counter - used for dynamic prediction (states st, wt, wnt, snt; transitions on actually taken / actually not taken)

The ability to dynamically predict the direction and the target of branches is based on the branch instruction's linear address, using the branch target buffer (BTB). If there is no valid entry in the BTB for the recent branch, then static prediction will be used to decide which path to take. A widely used scheme is the following: There is a branch history register (BHR) with a width of N_H bits that stores the outcomes of the last N_H conditional branches. Either there is one global BHR, based on the correlations between subsequent branches in the whole program flow, or several local ones that are based on the correlation between subsequent executions of the same branch. Some bits of the BHR together with the branch instruction's address index a table of n-bit saturating counters (usually n = 2: strongly taken (st), weakly taken (wt), weakly not taken (wnt), strongly not taken (snt)) that are updated when a jump condition is evaluated and predict the branch outcome.

Figure 1 shows such a bimodal counter and figure 2 is a scheme of the described architecture [Sto01], [MMK04].

Figure 2: usual branch prediction architecture (branch address, BHR with global and local components, index function f, BPT, BTB with target address / next sequential address)

Unfortunately, some processor manufacturers provide almost no information on the exact predictor implementation, although there are several pieces of advice on how to optimize your code ([And]).
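To make the behaviour of such an n = 2 bit saturating counter concrete, the following short C sketch simulates a single bimodal counter as described above. It is only an illustration; the numeric state encoding (0 = snt ... 3 = st), the example outcome sequence and all names are chosen by me and do not claim to match any particular hardware implementation.

#include <stdio.h>

/* 2-bit saturating counter: 0 = strongly not taken (snt), 1 = weakly not
 * taken (wnt), 2 = weakly taken (wt), 3 = strongly taken (st). */
static int predict(int state)           { return state >= 2; }   /* 1 = predict taken */
static int update(int state, int taken)
{
    return taken ? (state < 3 ? state + 1 : 3)
                 : (state > 0 ? state - 1 : 0);
}

int main(void)
{
    int state = 1;                        /* start weakly not taken   */
    int outcomes[] = {1, 1, 1, 0, 1, 1};  /* 1 = branch actually taken */
    int mispredictions = 0;

    for (unsigned i = 0; i < sizeof(outcomes) / sizeof(outcomes[0]); i++) {
        if (predict(state) != outcomes[i])
            mispredictions++;
        state = update(state, outcomes[i]);
    }
    printf("mispredictions: %d\n", mispredictions);
    return 0;
}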

2.4 CPU caches

Most common CPU cache architectures use several levels of set-associative caches to reduce the number of cycles the CPU has to wait for data from memory. A line is the smallest unit that can be transferred between a cache and main memory. If a line can be stored in any place in a cache, the cache is called fully associative, as opposed to direct mapped, where each line can only be stored in a specific place. Set associativity is the compromise where a line can be cached in W different locations. Those W lines form a set, and because the data of one line can be stored in any of those W places, the address only indexes the set and not the line within the set. The appropriate line within a set is found through comparison.

Figure 3: simplified structure of a cache (sets 0 ... S-1, ways 0 ... W-1, lines mapped from main memory)

If data that is accessed by the CPU is found in a cache, it is called a cache hit, otherwise a miss. Several strategies exist to decide what to do with an accessed line. The most common are:

Table 1: common caching strategies

  line in cache?   strategy           description
  hit              write through      the cache is updated and the next level of memory too
                                      (either another (slower) cache or main mem)
  hit              write back         only the cache is updated (the line is marked dirty and
                                      written back later)
  miss             write allocate     the next level of memory is updated and the line is
                                      fetched into the cache
  miss             write no-allocate  the next level of memory is updated, the line is not fetched

When data is going to be cached and no free space is available (that does not mean that the whole cache is filled! Because a line can be stored in only W places of one set, it is of no help if there are other free sets available), another line has to be purged out of the cache - which one, the replacement policy decides. Based on [Mi04] the most common replacement policies are Random, Round-Robin, LRU (Least Recently Used) and plru (pseudo LRU). Among the plru algorithms are plrut (pseudo LRU tree based) and plrum (pseudo LRU

based on MRU (Most Recently Used) bits). Strict LRU is quite costly and complex because the whole history of the W cache lines in a set has to be saved and updated on an access. Therefore the pseudo LRU mechanisms try to reduce the number of bits needed to store the plru information and the time to manage them, through approximation of the real algorithm.

The tree based plru policy (plrut) uses a binary tree to point to the assumed least recently used cache line. When accessing a line and thereby making it the most recent one, all the tree bits that lie on the path to that line are updated to point to the opposing direction (0 <-> 1). The left column of figure 4 shows an example of a 4-way cache that utilises the plrut strategy. To keep it simple, only one set of the cache is shown and the tree bits hold r / l for right / left instead of 0 / 1. The black arrows denote the path down the tree that points to the (pseudo) least recently used cache line. The contents of a cache line are marked through lowercase alphabetic characters and bold letters symbolize modified or updated entries. Each of the three pictures in a column of figure 4 shows the cache lines and the plru bits after the given instruction (e.g. read(b)) has been executed. The red boxed bits are those that have been updated. With respect to the tree bits in step 0 of figure 4 the third cache line is the least recently used. When data b is read, which is already in the cache, all tree bits that are on the path down to that line have to be updated to point to the opposite direction. In step 2 a new line is read and another entry has to be freed to store k in the cache. Because the tree bits point to the second way (data c), this line is replaced and after updating the tree, data d is the LRU line.

Figure 4: tree based LRU and MRU based LRU replacement policies (steps: initial state, read(b), read(k))
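The plrut update rule just described can also be written down as a small simulation. The following C sketch models a single 4-way set with three tree bits; the concrete bit encoding, the initial state and all names are assumptions made for this illustration only and are not taken from any processor documentation, so the replaced way may differ from the one shown in figure 4.

#include <stdio.h>

/* One 4-way set with a tree-based pseudo-LRU (plrut) policy.
 * b[0] selects the half that holds the pseudo-LRU line (0 = ways 0/1,
 * 1 = ways 2/3); b[1] and b[2] select the line within the chosen half.
 * Note: which way gets replaced depends on the (assumed) initial tree bits. */
struct set4 { int tag[4]; int b[3]; };

static void touch(struct set4 *s, int way)   /* point all bits on the path away from 'way' */
{
    if (way < 2) { s->b[0] = 1; s->b[1] = (way == 0); }
    else         { s->b[0] = 0; s->b[2] = (way == 2); }
}

static int victim(const struct set4 *s)      /* walk the tree down to the pseudo-LRU way */
{
    if (s->b[0] == 0) return s->b[1] ? 1 : 0;
    return s->b[2] ? 3 : 2;
}

static void access_line(struct set4 *s, int tag)
{
    for (int w = 0; w < 4; w++)
        if (s->tag[w] == tag) { touch(s, w); return; }   /* hit: only update the tree  */
    int w = victim(s);                                   /* miss: replace pseudo-LRU way */
    s->tag[w] = tag;
    touch(s, w);
}

int main(void)
{
    struct set4 s = { {'e', 'c', 'b', 'd'}, {0, 0, 0} };
    access_line(&s, 'b');                 /* hit on b  */
    access_line(&s, 'k');                 /* miss on k: one pseudo-LRU way is replaced */
    for (int w = 0; w < 4; w++) printf("%c ", s.tag[w]);
    printf("\n");
    return 0;
}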

Another approximation of the LRU policy is the MRU bits based (plrum) method. Every line in a set has an MRU bit that shows whether that line has recently been used ('n': new) or not ('o': old). An accessed line is marked new and only old ones are replaced. If the last old line of a set is replaced and marked as new, all W - 1 other MRU bits of the set are updated to old; otherwise all bits would mark their corresponding lines as new and none could be replaced. Step 2 in figure 4 shows that speciality. The other steps are self-explanatory.

Necessary information to describe a cache is:

  name                                 data     parameter
  associativity (the number of ways)   2^w_i    w_i
  cache size                           2^s_i    s_i
  line length                          2^l_i    l_i

i = {1, 2} and refers to L1, L2. It is common, although not necessary, that the line length is identical for both cache levels, therefore this document covers the case where l_1 = l_2 = l.

A set-associative cache has 2^(s-w-l) sets:

  size / (associativity * line length) = 2^s / (2^w * 2^l) = 2^(s-w-l)

That means it has an address width a_i = s_i - w_i - l. The l least significant bits of an address are used to determine the corresponding byte within a cache line, and the following a_1 bits are used to index the corresponding set of L1. When data is fetched from memory, always a whole cache line is read. That is why the rightmost l bits are of no importance in indexing a cache. Often reduced addresses are used to ease understanding, and so the l least significant bits are neglected. An example of a 2-way cache of 32 B size and 4 B line size is shown in figure 5:

Figure 5: example of a 2-way cache (2 ways, 4 sets, 4 B lines A-D, I-L, Q-T)
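Using the parameters just introduced, the split of an address into set index and byte offset can be expressed directly in code. The following sketch uses the values of the exemplary 2-way cache of figure 5 (its parameters are listed in table 2 below); the macro and function names are mine.

#include <stdio.h>
#include <stdint.h>

/* Parameters of the exemplary 2-way cache: 32 B size, 2 ways, 4 B lines. */
#define S 5                      /* cache size    = 2^S = 32 B            */
#define W 1                      /* associativity = 2^W = 2 ways          */
#define L 2                      /* line length   = 2^L = 4 B             */
#define A (S - W - L)            /* address width a = s - w - l = 2       */

static unsigned byte_in_line(unsigned addr) { return addr & ((1u << L) - 1); }
static unsigned set_index(unsigned addr)    { return (addr >> L) & ((1u << A) - 1); }

int main(void)
{
    unsigned addr = 0x2d;        /* arbitrary example address */
    printf("addr 0x%02x -> set %u, byte %u within the line\n",
           addr, set_index(addr), byte_in_line(addr));
    return 0;
}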

Table 2: parameters of the exemplary 2-way cache

  ways         2      w = 1
  line size    4 B    l = 2
  cache size   32 B   s = 5
  addr width          a = s - w - l = 2

Because only a_i bits of an address are used to index the corresponding cache entry, an address can be denoted as follows:

Figure 6: highlighting the bits used to index a cache (bits 31 ... 0, split into parts x, y, z; z is a_1 bits wide, yz is a_2 bits wide, the l rightmost bits are unused)

The z part of an address (which is a_1 bits wide) indexes L1 and the part yz (a_2 bits) indexes L2.

2.4.1 INTEL's Cache Architecture

This section describes the cache architecture of INTEL's PII/III and the Pentium IV. The PII/III belong to the same family, called the P6 family, and therefore share the same architecture. Some of the given parameters vary among different editions of a processor; the values of the examined processors are among those listed (both have a 512 KB second level cache).

Table 3: overview of Intel's caching architecture

                    PII/III                                PIV
  L1 Data           16 KB, 4 ways, 32 B line length        8 KB, 4 ways, 64 B line length
  L1 Instruction    8/16 KB, 4 ways, 32 B line length      Trace Cache, 12 Kµops, 8 ways
  L2 unified        128/256/512/1024/2048 KB, 4 ways,      256/512 KB, 8 ways,
                    32 B line length                       64 B line length, 128 B sector

The information about size, associativity and line length can be gathered by evaluating the bits returned by the cpuid instruction. This is what the tool cpuid from [sou] does. [Hay] states that the P6 family has an 8-way second level cache, but according to [Int04] the associativity is only 4, which is acknowledged by the cpuid information! A lot of detailed information about processors can be found on [san], however some points are

missing or even incorrect, e.g. the statement that the L1 cache of the PII/III uses an LRU replacement strategy (see section 3.3.3). A general description of the caching architecture and its configuration is given by Intel's System Programming Guide ([Int04]).

The Netburst microarchitecture ([H+01]) of the P4 features a 128 B sectored cache that fetches 2 adjacent cache lines on a miss from memory, and a hardware prefetcher that monitors access patterns and prefetches data automatically. Both features can be disabled through the IA32_MISC_ENABLE MSR (bit 9 and/or 19 on address 0x1a0). When enabled, one cache miss initiates two 64 B memory reads to fill two adjacent cache lines (sector based read); however, writes are always line based and only write the modified 64 B back into main memory. [Int] states that L1 uses a write through policy and that "All caches use a pseudo-LRU replacement algorithm". Yet which plru algorithm is used is not mentioned!
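As an illustration of how the two prefetch features mentioned above could be switched off from privilege level 0 (their disabling is exercised later in section 3.3.1), a macro in the style of the performance monitoring macros of section 3.1 might look as follows. The macro name is mine; only the MSR address (0x1a0) and the bit numbers (9 and 19) stated above are assumed, and the exact semantics should be verified against [Int04] before use.

/* Disable the hardware prefetcher and the adjacent cache line prefetch of the
 * Pentium 4 by setting bits 9 and 19 of the IA32_MISC_ENABLE MSR (0x1a0).
 * Must run in privilege level 0, e.g. inside a kernel module. */
#define disable_p4_prefetch()\
    asm volatile (\
        "mov $0x1a0, %%ecx\n\t"\
        "rdmsr\n\t"\
        "bts $9, %%eax\n\t"\
        "bts $19, %%eax\n\t"\
        "wrmsr"\
        :\
        :\
        : "eax", "ecx", "edx")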

3 Analysis concepts

3.1 Performance Monitoring - IA-32 architecture

Model Specific Registers (MSRs) can be read and written in privilege level 0 only, using the rdmsr and wrmsr instructions, where registers EDX:EAX hold the content that is either read from or written to the MSR addressed by ECX. The Time Stamp Counter, available since the Pentium processor, is incremented every CPU cycle and can be read using the rdtsc instruction. Here again, the 64 bit content is available in EDX:EAX. The RDTSC instruction is not serializing or ordered with other instructions. Thus, it does not necessarily wait until all previous instructions have been executed before reading the counter ([Int04, p. 15-26]). The instructions rdmsr and wrmsr, however, are serializing and can be executed before the TSC is read.

3.1.1 P6 family

The P6 family - to which the PII and PIII belong - utilizes two 40 bit counters with corresponding event select and controlling registers:

Table 4: performance monitoring MSRs - P6 family

  Name          Address   Meaning
  PerfEvtSel0   0x186     event selection MSR 0
  PerfEvtSel1   0x187     event selection MSR 1
  PerfCtr0      0xC1      counter 0
  PerfCtr1      0xC2      counter 1

Because the event selection registers are 32 bit wide, it is enough to modify only EAX when writing to these MSRs:

// set the low part of PerfEvtSel MSR to val
#define set_esr(msr, val)\
    asm volatile (\
        "xor %%edx, %%edx\n\t"\
        "wrmsr"\
        :\
        : "c" (msr), "a" (val)\
        : "edx")

The counters can be started and stopped by setting / clearing the ENABLE flag in the PerfEvtSel0 register:

// start counting
#define start_counting()\
    asm volatile (\
        "mov $0x186, %%ecx\n\t"\
        "rdmsr\n\t"\
        "bts $22, %%eax\n\t"\
        "wrmsr"\
        :\
        :\
        : "eax", "ecx", "edx")

// stop counting
#define stop_counting()\
    asm volatile (\
        "mov $0x186, %%ecx\n\t"\
        "rdmsr\n\t"\
        "btr $22, %%eax\n\t"\
        "wrmsr"\
        :\
        :\
        : "eax", "ecx", "edx")

Appendix A.3 of [Int04] lists the performance monitoring events of the P6 family. These are simple to configure and are not covered here.
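Taken together, the macros above could frame a measured code section inside a kernel module roughly as follows. This is only a usage sketch, not code from the thesis sources: the value written to PerfEvtSel0 is an explicit placeholder (the real event number, unit mask and flags have to be taken from appendix A.3 of [Int04]), and read_ctr0() is an additional helper assumed here that simply reads PerfCtr0 (0xC1) with rdmsr.

// placeholder: real event + unit mask + flags from [Int04], appendix A.3
#define EVENT_SELECT_PLACEHOLDER 0x00000000

// read the 40 bit counter PerfCtr0 (0xC1) into hi:lo
#define read_ctr0(lo, hi)\
    asm volatile ("rdmsr"\
        : "=a" (lo), "=d" (hi)\
        : "c" (0xC1))

static void measure_section(void)
{
    unsigned int lo, hi;

    set_esr(0x186, EVENT_SELECT_PLACEHOLDER);   /* program PerfEvtSel0           */
    start_counting();                           /* set the ENABLE flag (bit 22)  */

    /* ... code under test ... */

    stop_counting();
    read_ctr0(lo, hi);                          /* 40 bit counter value in hi:lo */
}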

3.1.2 PIV, Xeon

The Pentium IV features 18 performance counters and configuration registers. Table 15-2 of [Int04] shows the association of counters, CCCRs (Counter Configuration Control Registers) and ESCRs (Event Selection Control Registers), and appendix A.1 gives information on countable events. Table 5 lists only a few of them. The columns ESCR, CCCR, Counter show the addresses of the MSRs, whereas EvSel and CSel give the values that have to be in the EVENTSELECT field of the ESCR and in the ESCR-SELECT field of the CCCR register. If several CCCR and counter addresses are given, then one of them has to be chosen.

The configuration and control registers are 64 bit wide, however the upper 32 bit are reserved, so that only the lower part (EAX) is modified. Bits 16 and 17 shall always be set, so this is done when setting up a CCCR:

// set the low part of CounterConfigurationControl MSR to val
// set bit 16,17 (must be set)
#define set_cccr(cccr, val)\
    asm volatile (\
        "xor %%edx, %%edx\n\t"\
        "or $(0b11<<16), %%eax\n\t"\
        "wrmsr"\
        :\
        : "c" (cccr), "a" (val)\
        : "edx")

Counters are started and stopped through bit 12 of the corresponding CCCR.

Counting non-sleep CPU cycles

[Int04] describes how to count non-sleep clock ticks: "Non-Sleep Clockticks - Measures clock cycles in which the specified physical processor is not in a sleep mode or in a power-saving state."

1. select one of the 18 counters and its corresponding ESCR, CCCR (see table 15-2 of [Int04])
2. set EvSel to anything other than "no event": 0x01
3. enable threshold comparison: set the compare bit in the CCCR (bit 18)
4. set the threshold (bits 20-23) to 15 and set the complement flag in the CCCR (bit 19)

// cnt non-sleep cycles
#define CYCLE_ESCR 0x3a6
#define CYCLE_CCCR 0x368
#define CYCLE_CNTR 0x308

// ---- set up counting of non-sleep cycles ----
set_escr(CYCLE_ESCR, ESCR_OS | (0x01<<ESCR_EVS_SHIFT));
set_cccr(CYCLE_CCCR, CCCR_CMP | CCCR_CPL | 0xf<<CCCR_THRESH_SHIFT);
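The start/stop operation itself is not shown above; by analogy with the P6 macros it could be implemented as follows. The macro names and the use of bts/btr are mine, only the enable bit 12 of the CCCR mentioned in the text is assumed.

/* Start/stop a Pentium IV counter by setting/clearing bit 12 (enable) of the
 * corresponding CCCR, analogous to start_counting()/stop_counting() of the
 * P6 family; e.g. start_counting_p4(CYCLE_CCCR). */
#define start_counting_p4(cccr)\
    asm volatile (\
        "rdmsr\n\t"\
        "bts $12, %%eax\n\t"\
        "wrmsr"\
        :\
        : "c" (cccr)\
        : "eax", "edx")

#define stop_counting_p4(cccr)\
    asm volatile (\
        "rdmsr\n\t"\
        "btr $12, %%eax\n\t"\
        "wrmsr"\
        :\
        : "c" (cccr)\
        : "eax", "edx")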

Table 5: PIV - MSR configuration of selected performance monitoring events

predicted / mispredicted branches:
  ESCR 0x3cc, 0x3cd; CCCR 0x36c, 0x36d, 0x370 or 0x36e, 0x36f, 0x371;
  Counter 0x30c, 0x30d, 0x310 or 0x30e, 0x30f, 0x311;
  Event Select 0x06; ESCR Select 0x;
  Event Mask bits TM, TP, NM, NP (T: Taken, N: Not-Taken, P: Predicted, M: Mispredicted)

cache misses:
  ESCR 0x3cc, 0x3cd; CCCR 0x36c, 0x36d, 0x370 or 0x36e, 0x36f, 0x371;
  Counter 0x30c, 0x30d, 0x310 or 0x30e, 0x30f, 0x311;
  Event Select 0x09; ESCR Select 0x05;
  additionally MSR 0xef1: set bits 24, 0 (L1 miss), 1 (L2 miss); MSR 0x3f2: set bit 0

µops retired:
  ESCR 0x3b8, 0x3b9; CCCR 0x36c, 0x36d, 0x370 or 0x36e, 0x36f, 0x371;
  Counter 0x30c, 0x30d, 0x310 or 0x30e, 0x30f, 0x311;
  Event Select 0x01; ESCR Select 0x04; Event Mask bit 0: bogus, bit 1: non-bogus

FSB data activity:
  ESCR 0x3a2, 0x3a3; CCCR 0x360, 0x361, 0x362, 0x363; Counter 0x300, 0x301, 0x302, 0x303;
  Event Select 0x17; ESCR Select 0x06;
  Event Mask bit 0: drive data onto bus, 1: read data, 2: other processors; reset bits 3, 4, 5

IOQ allocation:
  ESCR 0x3a2, 0x3a3; CCCR 0x360, 0x361, 0x362, 0x363; Counter 0x300, 0x301, 0x302, 0x303;
  Event Select 0x03; ESCR Select 0x06;
  Event Mask bits 0-4: 0b: read, 6: write, 7: UC, 8: WC, 9: WT, 10: WP, 11: WB,
  13: own, 14: other proc, 14: prefetch

3.2 Branch prediction

[MMK04] describes possibilities to determine the organization of branch predictors, e.g. to explore the width N_H of the BHR and the number of bits that are used to index the branch target buffer (BTB). With this information it is easy to achieve both the best and the worst case in a branch prediction benchmark. The strategies presented in [MMK04] were adapted to run as RTAI kernel modules and were extended by a test whether a local or a global history component is used, which seemed to be simpler and easier to understand than the one given in the paper. As well, the algorithms were extended to take the cache behaviour into account (see sec. 5.1.2, p. 40).

3.2.1 Branch History (shift-)register (BHR)

If the BHR has a width of N_H bits, it can store the last N_H outcomes (T: Taken / N: Not taken). Any further branches will override previous ones, and the index function will select a wrong entry in the BTB - a misprediction is likely to occur. The outcomes of a branch that is taken only every mod-th iteration will fit in the BHR as long as mod < mod*, and almost no mispredictions will be counted. Yet for mod >= mod* the MPR (MisPrediction Ratio) will rise, because every mod-th branch is mispredicted.

        mov $ITER, %ecx      # number of iterations (outer loop)
        mov $MOD, %ebx       # modulo parameter
again:
        xor %edx, %edx       # clear edx
        mov %ecx, %eax
        div %ebx             # eax=(int)(eax/ebx), edx=modulo(eax,ebx)
        test %edx, %edx
        jz l0                # spy branch
        clc                  # do sth
l0:     dec %ecx
        jnz again

Figure 7: BHR benchmark

At this point it has to be distinguished between local and global BHRs. For a local component the number of history bits is N_H = mod* - 2, because the history refers to the spy branch only, but a global register saves the outcomes of the surrounding loop too, so N_H = 2 (mod* - 2). The reason for subtracting 2 is given in an example with a global BHR of width N_H = 6: N_H = 6 means that the last 6 outcomes can be saved, and because it is a global register only every second bit of the register is reserved for the spy branch (the other one is for the outer loop jnz again). Therefore a history pattern of length 3 should fit in the register and one of length 4 should not fit and cause mispredictions:

Figure 8: BHR pattern of length 4 (the register contents before the taken spy branches are unique and therefore map to their own BTB entries)

As can be seen in figure 8, a history pattern of length 4 can still be predicted correctly, because the address of the spy branch instruction together with the history register content is unique for every taken spy branch and therefore refers to the same BTB index. However, a pattern of length 1/2 N_H + 2 = mod* = 5 causes every 5th branch to be mispredicted, due to a non-unique BHR content for the taken spy branches. Through a variation of mod it is possible to find mod*, which is an indication of either a local (mod* - 2) bit BHR or a global 2 (mod* - 2) bit BHR. The modulo parameter mod can be given as mod=<num> option when loading the kernel module that executes the code given above (fig. 7).

Figure 9: BHR pattern of length 5 (the register contents before the taken spy branches are no longer unique; the corresponding counters toggle between wnt and snt)

To verify which kind of register - local or global - is used, I decided to execute a test with two of these modulo-branches, with the same modulo parameter. If local history registers are used, then both branches have their own and the MPR will rise at the same modulo parameter as in the test with just one spy branch. However, if a global register is used, then both branches have to share it and influence each other, wherefore the MPR rises at a lower modulo parameter (at a shorter history pattern).
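For readability, the kernel-module benchmark of figure 7 corresponds roughly to the following C loops; the second function sketches the two-spy-branch variant used for the local/global test. This is only a structural illustration with names chosen by me - the real measurement uses the assembly version so that the number and position of the conditional branches stay under control.

/* Structural C equivalent of the BHR benchmark (fig. 7): the condition of the
 * spy branch changes its outcome only every mod-th iteration. */
static volatile unsigned long sink;

static void bhr_benchmark(unsigned long iter, unsigned long mod)
{
    for (unsigned long i = iter; i > 0; i--) {
        if (i % mod == 0)        /* spy branch */
            sink++;              /* do sth     */
    }
}

/* Variant for the local/global test: two spy branches with the same modulo
 * parameter inside the same loop. */
static void bhr_benchmark_two(unsigned long iter, unsigned long mod)
{
    for (unsigned long i = iter; i > 0; i--) {
        if (i % mod == 0)        /* first spy branch  */
            sink++;
        if (i % mod == 0)        /* second spy branch */
            sink++;
    }
}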

3.2.2 Branch Target Buffer (BTB)

When describing how to analyse the branch prediction architecture, [MMK04] assumes the number of BTB entries is known. Yet it is not difficult to obtain that information through some tests based on the algorithm explained in the following section, therefore the hints on how to measure N_BTB are given afterwards (p. 19).

As described when introducing caches (sec. 2.4, p. 5), addresses are split up into different parts to indicate which bits are used for what purpose. Because only one buffer is available, only one part, namely the z part, is used to address it, and the y and x portions are combined to y. The following figure shows an address used to index the BTB:

Figure 10: address used to index the BTB (bits 31 ... 0, split into a y part and a z part of a_1 bits; the l least significant bits are unused)

Only the z part of the address is used to index the buffer. If 2^l consecutive addresses map to the same set, then the l least significant bits of an address are not used to index the entry.

Starting with an example of a BTB whose parameters are all known: It is a buffer with 512 entries, 4 ways (that means it has 128 sets) and an unused part that is l = 4 bits wide. Because 128 sets have to be indexed, the z part has to hold a_1 = 7 bits. Addresses y_i z_j 0 to y_i z_j 15 map to the same set, because the l least significant bits are not used for indexing.

When executing 512 branch instructions while varying the distance between consecutive branch instructions' addresses, several phenomena can be observed (512 branch addresses should fit in a buffer with 512 entries):

For a distance of 2 bytes, 8 consecutive branch addresses (y_i z_j 0, y_i z_j 2, ..., y_i z_j 12, y_i z_j 14) map to the same set. Yet there are just 4 ways to store them, so the last 4 branch addresses will override the previous 4. In the next iteration of the outer loop the addresses cannot be found in the BTB and dynamic prediction is unavailable, therefore every branch will be statically predicted. If the branches are coded to be conditional forward-jumps that are always taken, then static prediction will always fail.

For a distance of 4 bytes, 4 consecutive addresses (y_i z_j 0, y_i z_j 4, y_i z_j 8, y_i z_j 12) map to the same set, so that every way is filled and all 512 addresses are stored in the BTB. Therefore no mispredictions will occur (see fig. 11).

Figure 11: conditional branches at a distance of 4 bytes (the addresses y_i 0, y_i 4, y_i 8, y_i 12 fill exactly the 4 ways of each of the 128 sets)

A distance of 8 bytes will also fit in the buffer, because a group of addresses y_i z_j 0, y_i z_j 8 needs only 2 ways, and a 16 byte distance is no problem either (see fig. 12).

Figure 12: conditional branches at a distance of 16 bytes (every branch address occupies one way of its set; all 128 sets are used)

However, addresses with distances greater than 16 will not fit in the BTB because some sets remain unused (marked as free in figure 13):

Figure 13: conditional branches at a distance of 32 bytes (every second set remains free, so the branch addresses spill over into already occupied sets)

If one address is y_i z_j 0, the next following address for a distance of 32 bytes would be y_i z_j+2 0, and the set with the index z_j+1 will remain unused, so that the addresses cannot fit into the buffer. So for distances greater than 16 the misprediction ratio (MPR) will always be high.

The conclusion is that the unused l = 4 bits of an address lead to a maximum fitting distance of D = 16, and that the fact that there are 3 fitting distances results from the 4-way associativity. The 4 bits of l do not refer to an associativity of 4! If l = 3 then there would be 3 fitting distances, too, yet the highest possible distance would not be 16 but 8. From this example it can be generally derived that:

1. F fitting distances lead to an associativity of W = 2^(F-1), i.e. w = F - 1
2. the highest fitting distance D leads to l = log2(D)
3. N_BTB entries at an associativity of W form S = N_BTB / W sets, so that the z part must be able to index S sets, that means: a_1 = log2(S) = log2(N_BTB / W)

The benchmark allows varying the distance between subsequent conditional jump instructions (jcc) as well as their number. Also, every conditional branch is directed forward, so that static prediction will mispredict them. A problem is posed by the surrounding loop that allows repetition of the whole jump scenario, so that the influence of the inaccuracy of the performance counters can be reduced.

        xor %ecx, %ecx
        mov $10, %eax
again:  cmp $1000, %ecx
        jl l0
        jmp fin
        ...                  # distance D
l0:     cmp $15, %eax
        jl l1
        ...                  # distance D
l1:     jl l2
        ...
        (<number of branches> conditional jumps in total, <distance> D apart)
        inc %ecx
        jmp again
fin:

Figure 14: structure of the BTB benchmark

The ellipses symbolize arbitrary assembler instructions that are used to fill the distance D. The size of the unconditional jmp instructions depends on the jump distance: for distances less than 128 B no more than 2 B are needed, instead of 5 B for greater distances. However, the distance cannot easily be calculated, because the distance itself depends on the size of the unconditional jump. To circumvent the problem, avoid such combinations, or at least be aware that the first distance D might not be of correct size.

Figure 15: code snippet of the BTB microbenchmark with forward branches

asm volatile (
    "xor %%ecx, %%ecx\n\t"
    "mov $10, %%eax\n\t"
    "again: cmp $100000, %%ecx\n\t"
    "jl l0\n\t"
    "jmp fin\n\t"
    "mov $10, %%eax\n\t"
    "mov $10, %%eax\n\t"
    "mov $10, %%eax\n\t"
    "mov $10, %%eax\n\t"
    "mov $10, %%eax\n\t"
    "l0: cmp $15, %%eax\n\t"
    "jl l1\n\t"
    "mov $10, %%eax\n\t"
    "mov $10, %%eax\n\t"
    "mov $10, %%eax\n\t"
    "mov $10, %%eax\n\t"
    "mov $10, %%eax\n\t"
    "mov $10, %%eax\n\t"
    "l1: jl l2\n\t"
    "mov $10, %%eax\n\t"
    "mov $10, %%eax\n\t"
    "mov $10, %%eax\n\t"
    "mov $10, %%eax\n\t"
    "mov $10, %%eax\n\t"
    "mov $10, %%eax\n\t"
    "l2: inc %%ecx\n\t"
    "jmp again\n\t"
    "fin:"
    ::: "eax", "ecx"
);

Figure 16: code snippet of the BTB microbenchmark with backward branches

asm volatile (
    "xor %%ecx, %%ecx\n\t"
    "mov $10, %%eax\n\t"
    "jmp again\n\t"
    "l0: inc %%ecx\n\t"
    "jmp again\n\t"
    "l1: jl l0\n\t"
    "mov $10, %%eax\n\t"
    "mov $10, %%eax\n\t"
    "mov $10, %%eax\n\t"
    "mov $10, %%eax\n\t"
    "mov $10, %%eax\n\t"
    "clc\n\t"
    "clc\n\t"
    "l2: cmp $15, %%eax\n\t"
    "jl l1\n\t"
    "mov $10, %%eax\n\t"
    "mov $10, %%eax\n\t"
    "mov $10, %%eax\n\t"
    "mov $10, %%eax\n\t"
    "mov $10, %%eax\n\t"
    "clc\n\t"
    "clc\n\t"
    "again: cmp $100000, %%ecx\n\t"
    "jl l2"
    ::: "eax", "ecx"
);

Figures 15 and 16 show a code example for a microbenchmark with a distance D = 32 and a number of B = 3 conditional branches. The instructions really executed in both examples are the same, except that figure 15 uses forward and figure 16 backward branches.

Number of BTB entries

To use that benchmark it is necessary to know the number of BTB entries. Either this information can be found in some documentation, or it is gained with the benchmark itself: Start at the smallest possible distance D = 2 and a small number of branches, e.g. 32.

A: If the MPR is low, increase the number of branches until the MPR is high. The highest number of branches that did not cause a high MPR is the number of BTB entries N_BTB.

B: If the MPR is high, then increase the distance D until the MPR gets low (leave the number of branches unchanged!). When a distance with a low MPR is found, continue with step A.
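Once the number of BTB entries, the number of fitting distances and the highest fitting distance have been measured, the three rules given above can be evaluated mechanically. A small helper with names of my own choosing might look like this; it simply prints the derived organisation for the example values of the 512-entry BTB discussed before.

#include <stdio.h>
#include <math.h>

/* Derive the BTB organisation from the measured values, following the rules
 * W = 2^(F-1), l = log2(D) and a1 = log2(N_BTB / W). */
static void derive_btb(int fitting_distances, int highest_distance, int entries)
{
    int ways       = 1 << (fitting_distances - 1);   /* W  = 2^(F-1)   */
    int unused     = (int)log2(highest_distance);    /* l  = log2(D)   */
    int sets       = entries / ways;                 /* S  = N_BTB / W */
    int index_bits = (int)log2(sets);                /* a1 = log2(S)   */

    printf("W = %d ways, l = %d unused bits, S = %d sets, a1 = %d index bits\n",
           ways, unused, sets, index_bits);
}

int main(void)
{
    derive_btb(3, 16, 512);   /* the example from the text: F = 3, D = 16, N_BTB = 512 */
    return 0;
}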

3.3 Caching

Legend of used variables:

  {a, b, ...}        cache line
  {A_1, B_1, ...}    address range that fills one L1 cache way
  {A_2, B_2, ...}    address range that fills one L2 cache way
  {A^0, A^1, ...}    an address range that fills L1/L2 completely (here the number does not refer
                     to L1/L2, it just distinguishes different address ranges); A^0 = [A_i, ..., D_i]

Figure 17: cache variables (level i cache, i = {1, 2}, with ways A_i, B_i, C_i, D_i holding lines a, b, c, d)

3.3.1 Adjacent cache line prefetch - PIV

To test the sectored cache line fill and its disabling, I wrote a short program that allocates a continuous memory block as big as the second level cache (512 KB) and accessed (read) only every second line within this block. If the adjacent cache line prefetch is enabled (default), all addresses of the memory area should be cached and cause no L2 misses on a second read, but if this feature is disabled only half the addresses will be available in L2. Because it is not known at which address the memory block starts - at the first line of a sector or at the second - the test is executed twice. The first time the memory is accessed starting at an offset of 0 B, the second time at an offset of 1 B.

3.3.2 Caching strategy of the L1 cache

write-allocate/write-no-allocate

Simple to test is the usage of write-allocate/write-no-allocate, because if L1 holds data that has been written to, the cache utilizes a write-allocate policy. So the following steps have to be performed:

1. invalidate caches (at least L1), so that no data is cached: wbinvd
2. write to as many addresses as fit into L1: write(A^0)
3. read these addresses and count the L1 misses (n_miss): read(A^0) -> n_miss
   - if the misses are approximately as high as the number of read addresses, write-no-allocate is used

   - if there are just a few misses, the data has been stored in L1 => write-allocate

write-through/write-back

Write-through/write-back are cache hit policies, so the data the test is working on has to be in L1. If write-through is used, then any write is performed in L1 and in L2 too, as opposed to write-back, where only the data in L1 is updated. If L1 is filled with data that has been written to and new addresses are loaded, then the modified data has to be purged out of L1. In the case of a write-through L1 cache the old, modified data is already resident in L2 and can immediately be overwritten, whereas a write-back L1 cache first has to write the data back into L2 before the new data can be loaded. The time needed for loading the new data should be longer in the latter case. The structure of the benchmark is as follows:

1. fill L1 twice through reading the addresses A^0 for the first fill and A^1 for the second: read(A^0), read(A^1)
2. write to the addresses A^1 already in L1 (we are examining a write hit policy); if L1 uses write-back, then L2 is not updated: write(A^1)
3. read addresses A^0 (these have to be loaded from L2 and push out the modified data A^1), take the time (t_mod) and count the L2 accesses (n_mod): read(A^0) -> t_mod, n_mod
4. repeat step 1 (fill L1 twice); as a result addresses A^1 are cached in L1: read(A^0), read(A^1)
5. repeat step 3 (read addresses A^0): time t_unmod, L2 accesses n_unmod; because the data has not been modified this time, it can be purged regardless of the applied policy, and t_unmod, n_unmod correspond to t_w-through, n_w-through: read(A^0) -> t_unmod, n_unmod

If the time is longer and the count is bigger in the case of the modified data, then a write-back policy is used for L1:

  t_mod > t_unmod = t_w-through  and  n_mod > n_unmod = n_w-through  =>  write-back
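For illustration, the structure of this write-hit-policy benchmark could be coded as follows. A^0 and A^1 are represented by two buffers of L1 size, and the actual time and L2-access measurements of steps 3 and 5 would use rdtsc and the performance-monitoring macros of section 3.1, which are left out here. Sizes, names and the plain-array representation of the address ranges are my own simplifications, not the thesis implementation.

/* Sketch of the L1 write-hit-policy probe.  a0 and a1 are two buffers that
 * each fill the whole L1 cache; reading them with a stride of one cache line
 * touches every line once. */
#define L1_SIZE   (16 * 1024)   /* PII/III L1 data cache */
#define LINE_SIZE 32

static volatile unsigned char a0[L1_SIZE], a1[L1_SIZE];
static volatile unsigned char byte_sink;

static void read_range(volatile unsigned char *p)
{
    for (int i = 0; i < L1_SIZE; i += LINE_SIZE) byte_sink = p[i];
}

static void write_range(volatile unsigned char *p)
{
    for (int i = 0; i < L1_SIZE; i += LINE_SIZE) p[i] = 1;
}

static void write_policy_probe(void)
{
    read_range(a0); read_range(a1);   /* 1: fill L1 twice, A^1 ends up cached  */
    write_range(a1);                  /* 2: dirty A^1 (write hit)              */
    read_range(a0);                   /* 3: measure t_mod / n_mod here         */

    read_range(a0); read_range(a1);   /* 4: refill, A^1 cached but unmodified  */
    read_range(a0);                   /* 5: measure t_unmod / n_unmod here     */
}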

3.3.3 Replacement strategy

While there were some documents with information on which to base my findings about the caching strategy, there is almost no (and if so, not extensive) information on replacement policies. As far as I found out, the Intel manuals only cover the Netburst microarchitecture when stating "All caches use a pseudo-LRU (least recently used) replacement algorithm". However, it is never mentioned which plru strategy is applied. The statement of [Sea00] about the Pentium II is clearer because it is simpler [Int, p. 1-19]:

  "L1 uses a 4-way set associative mapping which divides the 512 lines into 128 sets of 4 cache lines. Each of these sets is really a least recently used (LRU) list."

Because it seems that Intel makes use of LRU based strategies, the benchmarks are aimed in that direction. As can be seen in figure 4, the plrum algorithm has some disadvantages compared to plrut:

1. it needs more plru bits, namely one per way in each set: N_bits,m = W * S, whereas plrut needs 1 bit less per set: N_bits,t = (W - 1) * S
2. the moment the last old entry of a set is replaced, marked as new, and all other entries are marked old, the history of these lines is lost (see figure 18)

Figure 18: plrum history loss (access sequence read(d), read(c), read(b), read(a) on a set initially holding a, b, c; shown are the MRU bits and the corresponding strict LRU order)

Figure 18 shows quite clearly that the MRU-based pseudo LRU policy is an approximation of the strict LRU algorithm. After lines a, b, c, d have been read, to be loaded into the L1 cache, they are read in the reverse order (c, b, a) to make a the most and d the least recently used entry (step 4: the LRU stack is d, c, b, a). That means, when replacing lines, d should be the first, c and b the next and a the last. However, we assumed that the cache ways are filled from left to right (steps 0, 1), therefore the old lines will be replaced in the order b, c, d!

Following these thoughts, my assumption was that if a pseudo LRU strategy was used, the tree based algorithm would be preferred. To analyse the underlying policy one could fill the cache, reload some ways to make these addresses the most recent ones and afterwards load any new way to purge an old one - which one is purged shall be the indication of the used algorithm. The detailed structure looks like:

1. read as many addresses as fit into L1; PII/III/IV have a 4-way L1 cache, so addresses A_1, B_1, C_1, D_1 have to be loaded: read(A_1) ... read(D_1)
2. reload (read) some old ways to make them the MRU ones: read(some of {A_1, ..., D_1})
3. load (read) a new way which overrides another one: read(E_1)
4. read the old addresses and count the L1 misses for each way: read(A_1) -> n_A1, ..., read(D_1) -> n_D1;
   the way with the most misses is the one that has been replaced

If the second level cache is so large that one of its ways can hold more lines than fit into L1, it is sure that the reloading of one L2 way really reads from and updates the lines in L2, not only in L1:

  a_2 >= a_1 + w_1

If this condition is met, the presented algorithm can be used to analyse the replacement strategy of L2, too.
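A structural sketch of this probe, again with the measurement code left out and with one 4 KB buffer standing in for each L1-way-sized address range, could look as follows; the sizes and names are my own simplifications for the PII/III L1 and the reloaded subset is just one of the combinations discussed on the next page.

/* Sketch of the replacement-strategy probe: fill the four L1 ways, re-read a
 * chosen subset of the old ways to make them most recently used, load one new
 * way and then determine which old way was evicted (in the real benchmark by
 * counting L1 misses per way with the performance counters). */
#define WAY_SIZE   (4 * 1024)   /* one way of the 16 KB, 4-way PII/III L1 */
#define L1_LINE    32

static volatile unsigned char way_a[WAY_SIZE], way_b[WAY_SIZE], way_c[WAY_SIZE],
                              way_d[WAY_SIZE], way_e[WAY_SIZE];
static volatile unsigned char line_sink;

static void read_way(volatile unsigned char *p)
{
    for (int i = 0; i < WAY_SIZE; i += L1_LINE) line_sink = p[i];
}

static void replacement_probe(void)
{
    read_way(way_a); read_way(way_b);     /* 1: fill all four ways A_1 .. D_1    */
    read_way(way_c); read_way(way_d);

    read_way(way_a); read_way(way_b);     /* 2: reload a subset, here A_1, B_1   */

    read_way(way_e);                      /* 3: new way E_1 evicts one old way   */

    read_way(way_a); read_way(way_b);     /* 4: re-read A_1 .. D_1; the way with */
    read_way(way_c); read_way(way_d);     /*    the most L1 misses was evicted   */
}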

Figure 19: sizing relations L1 - L2, based on the PII (one L2 way holds 32 L1-way-sized blocks of 128 lines each; L1 only holds the most recently loaded ones)

If ways have been loaded in the order A_2, B_2, C_2, D_2 and afterwards way C_2 shall be reloaded, it is sure that none of the addresses C0,0 - C31,127 still reside in L1, because they have been purged when loading D_2.

To obtain several hints on the applied replacement policy, I decided to vary the reloading of the 3 oldest ways A_i, B_i, C_i. The resulting possibilities are: % (none) / A_i / B_i / A_i, B_i / C_i / A_i, C_i / B_i, C_i / A_i, B_i, C_i.

After carrying out the first experiments, I realized that there are different possibilities how a cache with a plrut policy can be loaded: Either the tree bits are used and the empty cache is filled in tree order, or, more simply, the cache fills the first (e.g. from left to right) free entry and only uses the tree bits when all entries of a set are filled. Figure 20 shows a 4-way cache set that uses a plrut policy with tree based filling of empty entries, whereas fig. 21 presents a filling of the first free entry. Once again, updated tree bits are boxed red and black arrows symbolize the path through the tree. Every step shows the cache set after the given instruction has been executed. Step 0 shows the fresh set after the cache has been invalidated. Steps 1 to 4 present the filling, and steps 5, 6 explain how the pseudo-LRU algorithm can be tricked into replacing an entry other than the least recently used.

Figure 20: tree based fill    Figure 21: sequential fill
(both show one 4-way set for the access sequence read(a), read(b), read(c), read(d), then a reload - read(c) in figure 20, read(b) in figure 21 - and finally read(e); after read(e) the set holds a, c, e, d in figure 20 and a, b, e, d in figure 21)

In steps 0 to 3 of drawing 21 the tree bits do not have arrows indicating the path to the LRU line, to stress that this path is not evaluated as long as there are free entries.

Table 6 compares the results of varying the ways to be reloaded between tree based and sequential filling. The leftmost column gives the combinations of the 3 oldest ways to be reloaded. Columns 2 to 5 show the cache ways that should be replaced. Light blue backgrounded rows are those which have a different result depending on the filling method.

Table 6: cache ways expected to be replaced under a plrut strategy

                       4 ways                            8 ways
  reload        tree based fill  sequential fill  tree based fill  sequential fill
  none (%)      A                A                A                A
  A             B                C                B                E
  B             A                C                A                E
  A, B          C                C                C                E
  C             B                A                B                E
  A, C          B                B                B                E
  B, C          D                A                D                E
  A, B, C       D                A                D                E

How the pseudo LRU tree based replacement strategy is applied to an 8-way associative cache is illustrated in figure 22. The drawing is to be understood like the others of this kind before.

This benchmark analysed which ways of a cache are being replaced when reading a new way that is not yet cached. To obtain several hints on the underlying policy, the number and order of the reloaded ways A_i, ..., C_i could be changed to influence the plru history. Yet there is still another simple method to test the replacement strategy: First the cache is filled with addresses A^0. Afterwards one new way is read and it is checked which old way it replaces. Then the cache is filled again with A^0, but this time two new ways are loaded and it is noted which two old ways are replaced by them. This method is repeated until as many new ways are loaded as fit into the cache, that means until all old ways have been purged. With that incremental replacing of one to W ways of a cache it can be observed which ways the pseudo LRU algorithm selects as least recently used, and that helps to identify the used algorithm. The structure of this benchmark is as follows:

1. read as many addresses as fit into L1; PII/III/IV have a 4-way L1 cache, so addresses A_1, B_1, C_1, D_1 have to be loaded: read(A_1) ... read(D_1)
2. load (read) one new way which overrides another one: read(E_1)
3. read the old addresses and count the L1 misses for each way: read(A_1) -> n_A1, ..., read(D_1) -> n_D1

Figure 22: example of an 8-way cache with a plrut policy and tree based filling (steps: empty set, read(a), read(b), read(c), read(d), read(e), read(f, g, h); the final contents are a, e, c, g, b, f, d, h)

   the way with the most misses is the one that has been replaced: X_1

4. repeat step 1: read(A_1) ... read(D_1)
5. load (read) two new ways which override another two
6. repeat step 3: read(A_1) -> n_A1, ..., read(D_1) -> n_D1;
   the ways with the most misses are the ones being replaced: X_1, X_2. One of the two replaced ways is the one replaced in step 3, so the other one holds new information.
   ...
9. repeat step 3;
   the ways with the most misses are those being replaced: X_1, X_2, X_3
   ...
12. repeat step 3;
   the ways with the most misses are those being replaced: X_1, X_2, X_3, X_4

X_k in {A_1, B_1, C_1, D_1}, k = {1, 2, 3, 4}

As explained before (see fig. 19, p. 24), this test can be applied to L2 too. Only the address ranges have to be adapted: A_2, ..., D_2 / A_2, ..., H_2 (3) have to be read to fill L2, and E_2, ..., H_2 / I_2, ..., P_2 (3) are the new ways to purge the old ones.

3: 4-way/8-way L2

4 Worst case on caching

Because access to caches is much faster than access to main memory, caches may greatly improve performance. However, if data is not found in a cache it has to be retrieved from memory, and - even worse - if the cache is already filled with modified data, which has to be written back to memory before new data can be loaded into it, this needs much more time than a single load from main memory.

4.1 Cache flooding I

[LH] describes the worst case when working with a 2-level memory cache architecture and how to achieve that case. Conditions for that so called double purge configuration are:

* a 2-level cache architecture
* at least a 2-way L1
* a write-back strategy
* a strict LRU cache line substitution mechanism
* L2 is at least a_1 a_2 times bigger than L1 (a_1, a_2 are the address widths of L1/L2)

4.1.1 Double Purge Scenario

Figure 23: cache configuration; solid arrows indicate mapping relations, access to D results in the double purge case (memory lines x_a y_m z_i (A), x_b y_m z_i (B), x_c y_n z_i (C), x_d y_n z_i (D) map to the L2 entries y_m z_i, y_n z_i and to the L1 entry z_i)

The uppercase alphabetic characters (A, B, C, D) in figure 23 denote a whole cache line, and the solid arrows indicate the mapping relations from memory to L2 and from L2 to L1. The cache lines A, B, C contain modified data. Cache line D maps to the entry z_i of L1, which is already filled with A, so for caching D in L1, line A has to be purged to L2. However, L2 cannot hold A because the modified line B occupies the corresponding entry, so A is written through to memory.


CSE120 Principles of Operating Systems. Prof Yuanyuan (YY) Zhou Advanced Memory Management

CSE120 Principles of Operating Systems. Prof Yuanyuan (YY) Zhou Advanced Memory Management CSE120 Principes of Operating Systems Prof Yuanyuan (YY) Zhou Advanced Memory Management Advanced Functionaity Now we re going to ook at some advanced functionaity that the OS can provide appications using

More information

3.1 The cin Object. Expressions & I/O. Console Input. Example program using cin. Unit 2. Sections 2.14, , 5.1, CS 1428 Spring 2018

3.1 The cin Object. Expressions & I/O. Console Input. Example program using cin. Unit 2. Sections 2.14, , 5.1, CS 1428 Spring 2018 Expressions & I/O Unit 2 Sections 2.14, 3.1-10, 5.1, 5.11 CS 1428 Spring 2018 Ji Seaman 1 3.1 The cin Object cin: short for consoe input a stream object: represents the contents of the screen that are

More information

Arithmetic Coding. Prof. Ja-Ling Wu. Department of Computer Science and Information Engineering National Taiwan University

Arithmetic Coding. Prof. Ja-Ling Wu. Department of Computer Science and Information Engineering National Taiwan University Arithmetic Coding Prof. Ja-Ling Wu Department of Computer Science and Information Engineering Nationa Taiwan University F(X) Shannon-Fano-Eias Coding W..o.g. we can take X={,,,m}. Assume p()>0 for a. The

More information

Windows NT, Terminal Server and Citrix MetaFrame Terminal Server Architecture

Windows NT, Terminal Server and Citrix MetaFrame Terminal Server Architecture Windows NT, Termina Server and Citrix MetaFrame - CH 3 - Termina Server Architect.. Page 1 of 13 [Figures are not incuded in this sampe chapter] Windows NT, Termina Server and Citrix MetaFrame - 3 - Termina

More information

Concurrent programming: From theory to practice. Concurrent Algorithms 2016 Tudor David

Concurrent programming: From theory to practice. Concurrent Algorithms 2016 Tudor David oncurrent programming: From theory to practice oncurrent Agorithms 2016 Tudor David From theory to practice Theoretica (design) Practica (design) Practica (impementation) 2 From theory to practice Theoretica

More information

CSE120 Principles of Operating Systems. Architecture Support for OS

CSE120 Principles of Operating Systems. Architecture Support for OS CSE120 Principes of Operating Systems Architecture Support for OS Why are you sti here? You shoud run away from my CSE120! 2 CSE 120 Architectura Support Announcement Have you visited the web page? http://cseweb.ucsd.edu/casses/fa18/cse120-a/

More information

Functions. 6.1 Modular Programming. 6.2 Defining and Calling Functions. Gaddis: 6.1-5,7-10,13,15-16 and 7.7

Functions. 6.1 Modular Programming. 6.2 Defining and Calling Functions. Gaddis: 6.1-5,7-10,13,15-16 and 7.7 Functions Unit 6 Gaddis: 6.1-5,7-10,13,15-16 and 7.7 CS 1428 Spring 2018 Ji Seaman 6.1 Moduar Programming Moduar programming: breaking a program up into smaer, manageabe components (modues) Function: a

More information

Dynamic Symbolic Execution of Distributed Concurrent Objects

Dynamic Symbolic Execution of Distributed Concurrent Objects Dynamic Symboic Execution of Distributed Concurrent Objects Andreas Griesmayer 1, Bernhard Aichernig 1,2, Einar Broch Johnsen 3, and Rudof Schatte 1,2 1 Internationa Institute for Software Technoogy, United

More information

Searching, Sorting & Analysis

Searching, Sorting & Analysis Searching, Sorting & Anaysis Unit 2 Chapter 8 CS 2308 Fa 2018 Ji Seaman 1 Definitions of Search and Sort Search: find a given item in an array, return the index of the item, or -1 if not found. Sort: rearrange

More information

Outerjoins, Constraints, Triggers

Outerjoins, Constraints, Triggers Outerjoins, Constraints, Triggers Lecture #13 Autumn, 2001 Fa, 2001, LRX #13 Outerjoins, Constraints, Triggers HUST,Wuhan,China 358 Outerjoin R S = R S with danging tupes padded with nus and incuded in

More information

Navigating and searching theweb

Navigating and searching theweb Navigating and searching theweb Contents Introduction 3 1 The Word Wide Web 3 2 Navigating the web 4 3 Hyperinks 5 4 Searching the web 7 5 Improving your searches 8 6 Activities 9 6.1 Navigating the web

More information

Modelling and Performance Evaluation of Router Transparent Web cache Mode

Modelling and Performance Evaluation of Router Transparent Web cache Mode Emad Hassan A-Hemiary IJCSET Juy 2012 Vo 2, Issue 7,1316-1320 Modeing and Performance Evauation of Transparent cache Mode Emad Hassan A-Hemiary Network Engineering Department, Coege of Information Engineering,

More information

MCSE TestPrep SQL Server 6.5 Design & Implementation - 3- Data Definition

MCSE TestPrep SQL Server 6.5 Design & Implementation - 3- Data Definition MCSE TestPrep SQL Server 6.5 Design & Impementation - Data Definition Page 1 of 38 [Figures are not incuded in this sampe chapter] MCSE TestPrep SQL Server 6.5 Design & Impementation - 3- Data Definition

More information

The Big Picture WELCOME TO ESIGNAL

The Big Picture WELCOME TO ESIGNAL 2 The Big Picture HERE S SOME GOOD NEWS. You don t have to be a rocket scientist to harness the power of esigna. That s exciting because we re certain that most of you view your PC and esigna as toos for

More information

IBC DOCUMENT PROG007. SA/STA SERIES User's Guide V7.0

IBC DOCUMENT PROG007. SA/STA SERIES User's Guide V7.0 IBC DOCUMENT SA/STA SERIES User's Guide V7.0 Page 2 New Features for Version 7.0 Mutipe Schedues This version of the SA/STA firmware supports mutipe schedues for empoyees. The mutipe schedues are impemented

More information

Real-Time Image Generation with Simultaneous Video Memory Read/Write Access and Fast Physical Addressing

Real-Time Image Generation with Simultaneous Video Memory Read/Write Access and Fast Physical Addressing Rea-Time Image Generation with Simutaneous Video Memory Read/rite Access and Fast Physica Addressing Mountassar Maamoun 1, Bouaem Laichi 2, Abdehaim Benbekacem 3, Daoud Berkani 4 1 Department of Eectronic,

More information

An Introduction to Design Patterns

An Introduction to Design Patterns An Introduction to Design Patterns 1 Definitions A pattern is a recurring soution to a standard probem, in a context. Christopher Aexander, a professor of architecture Why woud what a prof of architecture

More information

Data Management Updates

Data Management Updates Data Management Updates Jenny Darcy Data Management Aiance CRP Meeting, Thursday, November 1st, 2018 Presentation Objectives New staff Update on Ingres (JCCS) conversion project Fina IRB cosure at study

More information

Language Identification for Texts Written in Transliteration

Language Identification for Texts Written in Transliteration Language Identification for Texts Written in Transiteration Andrey Chepovskiy, Sergey Gusev, Margarita Kurbatova Higher Schoo of Economics, Data Anaysis and Artificia Inteigence Department, Pokrovskiy

More information

CSE120 Principles of Operating Systems. Prof Yuanyuan (YY) Zhou Lecture 4: Threads

CSE120 Principles of Operating Systems. Prof Yuanyuan (YY) Zhou Lecture 4: Threads CSE120 Principes of Operating Systems Prof Yuanyuan (YY) Zhou Lecture 4: Threads Announcement Project 0 Due Project 1 out Homework 1 due on Thursday Submit it to Gradescope onine 2 Processes Reca that

More information

BEA WebLogic Server. Release Notes for WebLogic Tuxedo Connector 1.0

BEA WebLogic Server. Release Notes for WebLogic Tuxedo Connector 1.0 BEA WebLogic Server Reease Notes for WebLogic Tuxedo Connector 1.0 BEA WebLogic Tuxedo Connector Reease 1.0 Document Date: June 29, 2001 Copyright Copyright 2001 BEA Systems, Inc. A Rights Reserved. Restricted

More information

Directives & Memory Spaces. Dr. Farid Farahmand Updated: 2/18/2019

Directives & Memory Spaces. Dr. Farid Farahmand Updated: 2/18/2019 Directives & Memory Spaces Dr. Farid Farahmand Updated: 2/18/2019 Memory Types Program Memory Data Memory Stack Interna PIC18 Architecture Data Memory I/O Ports 8 wires 31 x 21 Stack Memory Timers 21 wires

More information

RDF Objects 1. Alex Barnell Information Infrastructure Laboratory HP Laboratories Bristol HPL November 27 th, 2002*

RDF Objects 1. Alex Barnell Information Infrastructure Laboratory HP Laboratories Bristol HPL November 27 th, 2002* RDF Objects 1 Aex Barne Information Infrastructure Laboratory HP Laboratories Bristo HPL-2002-315 November 27 th, 2002* E-mai: Andy_Seaborne@hp.hp.com RDF, semantic web, ontoogy, object-oriented datastructures

More information

Guardian 365 Pro App Guide. For more exciting new products please visit our website: Australia: OWNER S MANUAL

Guardian 365 Pro App Guide. For more exciting new products please visit our website: Australia:   OWNER S MANUAL Guardian 365 Pro App Guide For more exciting new products pease visit our website: Austraia: www.uniden.com.au OWNER S MANUAL Privacy Protection Notice As the device user or data controer, you might coect

More information

Insert the power cord into the AC input socket of your projector, as shown in Figure 1. Connect the other end of the power cord to an AC outlet.

Insert the power cord into the AC input socket of your projector, as shown in Figure 1. Connect the other end of the power cord to an AC outlet. Getting Started This chapter wi expain the set-up and connection procedures for your projector, incuding information pertaining to basic adjustments and interfacing with periphera equipment. Powering Up

More information

Operating Avaya Aura Conferencing

Operating Avaya Aura Conferencing Operating Avaya Aura Conferencing Reease 6.0 June 2011 04-603510 Issue 1 2010 Avaya Inc. A Rights Reserved. Notice Whie reasonabe efforts were made to ensure that the information in this document was compete

More information

May 13, Mark Lutz Boulder, Colorado (303) [work] (303) [home]

May 13, Mark Lutz Boulder, Colorado (303) [work] (303) [home] "Using Python": a Book Preview May 13, 1995 Mark Lutz Bouder, Coorado utz@kapre.com (303) 546-8848 [work] (303) 684-9565 [home] Introduction. This paper is a brief overview of the upcoming Python O'Reiy

More information

Community-Aware Opportunistic Routing in Mobile Social Networks

Community-Aware Opportunistic Routing in Mobile Social Networks IEEE TRANSACTIONS ON COMPUTERS VOL:PP NO:99 YEAR 213 Community-Aware Opportunistic Routing in Mobie Socia Networks Mingjun Xiao, Member, IEEE Jie Wu, Feow, IEEE, and Liusheng Huang, Member, IEEE Abstract

More information

lnput/output (I/O) AND INTERFACING

lnput/output (I/O) AND INTERFACING CHAPTER 7 NPUT/OUTPUT (I/O) AND INTERFACING INTRODUCTION The input/output section, under the contro of the CPU s contro section, aows the computer to communicate with and/or contro other computers, periphera

More information

User s Guide. Eaton Bypass Power Module (BPM) For use with the following: Eaton 9155 UPS (8 15 kva)

User s Guide. Eaton Bypass Power Module (BPM) For use with the following: Eaton 9155 UPS (8 15 kva) Eaton Bypass Power Modue (BPM) User s Guide For use with the foowing: Eaton 9155 UPS (8 15 kva) Eaton 9170+ UPS (3 18 kva) Eaton 9PX Spit-Phase UPS (6 10 kva) Specia Symbos The foowing are exampes of symbos

More information

Chapter 3: Introduction to the Flash Workspace

Chapter 3: Introduction to the Flash Workspace Chapter 3: Introduction to the Fash Workspace Page 1 of 10 Chapter 3: Introduction to the Fash Workspace In This Chapter Features and Functionaity of the Timeine Features and Functionaity of the Stage

More information

Chapter 5: Transactions in Federated Databases

Chapter 5: Transactions in Federated Databases Federated Databases Chapter 5: in Federated Databases Saes R&D Human Resources Kemens Böhm Distributed Data Management: in Federated Databases 1 Kemens Böhm Distributed Data Management: in Federated Databases

More information

Hiding secrete data in compressed images using histogram analysis

Hiding secrete data in compressed images using histogram analysis University of Woongong Research Onine University of Woongong in Dubai - Papers University of Woongong in Dubai 2 iding secrete data in compressed images using histogram anaysis Farhad Keissarian University

More information

Lecture outline Graphics and Interaction Scan Converting Polygons and Lines. Inside or outside a polygon? Scan conversion.

Lecture outline Graphics and Interaction Scan Converting Polygons and Lines. Inside or outside a polygon? Scan conversion. Lecture outine 433-324 Graphics and Interaction Scan Converting Poygons and Lines Department of Computer Science and Software Engineering The Introduction Scan conversion Scan-ine agorithm Edge coherence

More information

On-Chip CNN Accelerator for Image Super-Resolution

On-Chip CNN Accelerator for Image Super-Resolution On-Chip CNN Acceerator for Image Super-Resoution Jung-Woo Chang and Suk-Ju Kang Dept. of Eectronic Engineering, Sogang University, Seou, South Korea {zwzang91, sjkang}@sogang.ac.kr ABSTRACT To impement

More information

Application of Intelligence Based Genetic Algorithm for Job Sequencing Problem on Parallel Mixed-Model Assembly Line

Application of Intelligence Based Genetic Algorithm for Job Sequencing Problem on Parallel Mixed-Model Assembly Line American J. of Engineering and Appied Sciences 3 (): 5-24, 200 ISSN 94-7020 200 Science Pubications Appication of Inteigence Based Genetic Agorithm for Job Sequencing Probem on Parae Mixed-Mode Assemby

More information

A Fast Block Matching Algorithm Based on the Winner-Update Strategy

A Fast Block Matching Algorithm Based on the Winner-Update Strategy In Proceedings of the Fourth Asian Conference on Computer Vision, Taipei, Taiwan, Jan. 000, Voume, pages 977 98 A Fast Bock Matching Agorithm Based on the Winner-Update Strategy Yong-Sheng Chenyz Yi-Ping

More information

Lecture Notes for Chapter 4 Part III. Introduction to Data Mining

Lecture Notes for Chapter 4 Part III. Introduction to Data Mining Data Mining Cassification: Basic Concepts, Decision Trees, and Mode Evauation Lecture Notes for Chapter 4 Part III Introduction to Data Mining by Tan, Steinbach, Kumar Adapted by Qiang Yang (2010) Tan,Steinbach,

More information

Special Edition Using Microsoft Office Sharing Documents Within a Workgroup

Special Edition Using Microsoft Office Sharing Documents Within a Workgroup Specia Edition Using Microsoft Office 2000 - Chapter 7 - Sharing Documents Within a.. Page 1 of 8 [Figures are not incuded in this sampe chapter] Specia Edition Using Microsoft Office 2000-7 - Sharing

More information

DETERMINING INTUITIONISTIC FUZZY DEGREE OF OVERLAPPING OF COMPUTATION AND COMMUNICATION IN PARALLEL APPLICATIONS USING GENERALIZED NETS

DETERMINING INTUITIONISTIC FUZZY DEGREE OF OVERLAPPING OF COMPUTATION AND COMMUNICATION IN PARALLEL APPLICATIONS USING GENERALIZED NETS DETERMINING INTUITIONISTIC FUZZY DEGREE OF OVERLAPPING OF COMPUTATION AND COMMUNICATION IN PARALLEL APPLICATIONS USING GENERALIZED NETS Pave Tchesmedjiev, Peter Vassiev Centre for Biomedica Engineering,

More information

Authorization of a QoS Path based on Generic AAA. Leon Gommans, Cees de Laat, Bas van Oudenaarde, Arie Taal

Authorization of a QoS Path based on Generic AAA. Leon Gommans, Cees de Laat, Bas van Oudenaarde, Arie Taal Abstract Authorization of a QoS Path based on Generic Leon Gommans, Cees de Laat, Bas van Oudenaarde, Arie Taa Advanced Internet Research Group, Department of Computer Science, University of Amsterdam.

More information

For Review Only. CFP: Cooperative Fast Protection. Bin Wu, Pin-Han Ho, Kwan L. Yeung, János Tapolcai and Hussein T. Mouftah

For Review Only. CFP: Cooperative Fast Protection. Bin Wu, Pin-Han Ho, Kwan L. Yeung, János Tapolcai and Hussein T. Mouftah Journa of Lightwave Technoogy Page of CFP: Cooperative Fast Protection Bin Wu, Pin-Han Ho, Kwan L. Yeung, János Tapocai and Hussein T. Mouftah Abstract We introduce a nove protection scheme, caed Cooperative

More information

understood as processors that match AST patterns of the source language and translate them into patterns in the target language.

understood as processors that match AST patterns of the source language and translate them into patterns in the target language. A Basic Compier At a fundamenta eve compiers can be understood as processors that match AST patterns of the source anguage and transate them into patterns in the target anguage. Here we wi ook at a basic

More information

Space-Time Trade-offs.

Space-Time Trade-offs. Space-Time Trade-offs. Chethan Kamath 03.07.2017 1 Motivation An important question in the study of computation is how to best use the registers in a CPU. In most cases, the amount of registers avaiabe

More information

Portable Compiler Optimisation Across Embedded Programs and Microarchitectures using Machine Learning

Portable Compiler Optimisation Across Embedded Programs and Microarchitectures using Machine Learning Portabe Compier Optimisation Across Embedded Programs and Microarchitectures using Machine Learning Christophe Dubach, Timothy M. Jones, Edwin V. Bonia Members of HiPEAC Schoo of Informatics University

More information

Computer Networks. College of Computing. Copyleft 2003~2018

Computer Networks. College of Computing.   Copyleft 2003~2018 Computer Networks Computer Networks Prof. Lin Weiguo Coege of Computing Copyeft 2003~2018 inwei@cuc.edu.cn http://icourse.cuc.edu.cn/computernetworks/ http://tc.cuc.edu.cn Attention The materias beow are

More information

Topology-aware Key Management Schemes for Wireless Multicast

Topology-aware Key Management Schemes for Wireless Multicast Topoogy-aware Key Management Schemes for Wireess Muticast Yan Sun, Wade Trappe,andK.J.RayLiu Department of Eectrica and Computer Engineering, University of Maryand, Coege Park Emai: ysun, kjriu@gue.umd.edu

More information

Priority Queueing for Packets with Two Characteristics

Priority Queueing for Packets with Two Characteristics 1 Priority Queueing for Packets with Two Characteristics Pave Chuprikov, Sergey I. Nikoenko, Aex Davydow, Kiri Kogan Abstract Modern network eements are increasingy required to dea with heterogeneous traffic.

More information

Nearest Neighbor Learning

Nearest Neighbor Learning Nearest Neighbor Learning Cassify based on oca simiarity Ranges from simpe nearest neighbor to case-based and anaogica reasoning Use oca information near the current query instance to decide the cassification

More information

THE PERCENTAGE OCCUPANCY HIT OR MISS TRANSFORM

THE PERCENTAGE OCCUPANCY HIT OR MISS TRANSFORM 17th European Signa Processing Conference (EUSIPCO 2009) Gasgow, Scotand, August 24-28, 2009 THE PERCENTAGE OCCUPANCY HIT OR MISS TRANSFORM P. Murray 1, S. Marsha 1, and E.Buinger 2 1 Dept. of Eectronic

More information

A Novel Congestion Control Scheme for Elastic Flows in Network-on-Chip Based on Sum-Rate Optimization

A Novel Congestion Control Scheme for Elastic Flows in Network-on-Chip Based on Sum-Rate Optimization A Nove Congestion Contro Scheme for Eastic Fows in Network-on-Chip Based on Sum-Rate Optimization Mohammad S. Taebi 1, Fahimeh Jafari 1,3, Ahmad Khonsari 2,1, and Mohammad H. Yaghmae 3 1 IPM, Schoo of

More information

Introducing a Target-Based Approach to Rapid Prototyping of ECUs

Introducing a Target-Based Approach to Rapid Prototyping of ECUs Introducing a Target-Based Approach to Rapid Prototyping of ECUs FEBRUARY, 1997 Abstract This paper presents a target-based approach to Rapid Prototyping of Eectronic Contro Units (ECUs). With this approach,

More information

PL/SQL, Embedded SQL. Lecture #14 Autumn, Fall, 2001, LRX

PL/SQL, Embedded SQL. Lecture #14 Autumn, Fall, 2001, LRX PL/SQL, Embedded SQL Lecture #14 Autumn, 2001 Fa, 2001, LRX #14 PL/SQL,Embedded SQL HUST,Wuhan,China 402 PL/SQL Found ony in the Orace SQL processor (sqpus). A compromise between competey procedura programming

More information

Quality of Service Evaluations of Multicast Streaming Protocols *

Quality of Service Evaluations of Multicast Streaming Protocols * Quaity of Service Evauations of Muticast Streaming Protocos Haonan Tan Derek L. Eager Mary. Vernon Hongfei Guo omputer Sciences Department University of Wisconsin-Madison, USA {haonan, vernon, guo}@cs.wisc.edu

More information

Replication of Virtual Network Functions: Optimizing Link Utilization and Resource Costs

Replication of Virtual Network Functions: Optimizing Link Utilization and Resource Costs Repication of Virtua Network Functions: Optimizing Link Utiization and Resource Costs Francisco Carpio, Wogang Bziuk and Admea Jukan Technische Universität Braunschweig, Germany Emai:{f.carpio, w.bziuk,

More information

AgreeYa Solutions. Site Administrator for SharePoint User Guide

AgreeYa Solutions. Site Administrator for SharePoint User Guide AgreeYa Soutions Site Administrator for SharePoint 5.2.4 User Guide 2017 2017 AgreeYa Soutions Inc. A rights reserved. This product is protected by U.S. and internationa copyright and inteectua property

More information

Joint Optimization of Intra- and Inter-Autonomous System Traffic Engineering

Joint Optimization of Intra- and Inter-Autonomous System Traffic Engineering Joint Optimization of Intra- and Inter-Autonomous System Traffic Engineering Kin-Hon Ho, Michae Howarth, Ning Wang, George Pavou and Styianos Georgouas Centre for Communication Systems Research, University

More information

Backing-up Fuzzy Control of a Truck-trailer Equipped with a Kingpin Sliding Mechanism

Backing-up Fuzzy Control of a Truck-trailer Equipped with a Kingpin Sliding Mechanism Backing-up Fuzzy Contro of a Truck-traier Equipped with a Kingpin Siding Mechanism G. Siamantas and S. Manesis Eectrica & Computer Engineering Dept., University of Patras, Patras, Greece gsiama@upatras.gr;stam.manesis@ece.upatras.gr

More information

Quick Start Instructions

Quick Start Instructions Eaton Power Xpert Gateway Minisot (PXGMS) UPS Card Quick Start Instructions Ethernet 10/100 Status DHCP EMP + - CMN 100 Act Ident Power PXGMS UPS Restart TX Setup RX Package Contents Power Xpert Gateway

More information

NCH Software Spin 3D Mesh Converter

NCH Software Spin 3D Mesh Converter NCH Software Spin 3D Mesh Converter This user guide has been created for use with Spin 3D Mesh Converter Version 1.xx NCH Software Technica Support If you have difficuties using Spin 3D Mesh Converter

More information

Chapter 3: KDE Page 1 of 31. Put icons on the desktop to mount and unmount removable disks, such as floppies.

Chapter 3: KDE Page 1 of 31. Put icons on the desktop to mount and unmount removable disks, such as floppies. Chapter 3: KDE Page 1 of 31 Chapter 3: KDE In This Chapter What Is KDE? Instaing KDE Seecting KDE Basic Desktop Eements Running Programs Stopping KDE KDE Capabiities Configuring KDE with the Contro Center

More information

An Adaptive Two-Copy Delayed SR-ARQ for Satellite Channels with Shadowing

An Adaptive Two-Copy Delayed SR-ARQ for Satellite Channels with Shadowing An Adaptive Two-Copy Deayed SR-ARQ for Sateite Channes with Shadowing Jing Zhu, Sumit Roy zhuj@ee.washington.edu Department of Eectrica Engineering, University of Washington Abstract- The paper focuses

More information

Analysis and parallelization strategies for Ruge-Stüben AMG on many-core processors

Analysis and parallelization strategies for Ruge-Stüben AMG on many-core processors Anaysis and paraeization strategies for Ruge-Stüben AMG on many-core processors P. Zaspe Departement Mathematik und Informatik Preprint No. 217-6 Fachbereich Mathematik June 217 Universität Base CH-451

More information

Simba MongoDB ODBC Driver with SQL Connector. Installation and Configuration Guide. Simba Technologies Inc.

Simba MongoDB ODBC Driver with SQL Connector. Installation and Configuration Guide. Simba Technologies Inc. Simba MongoDB ODBC Driver with SQL Instaation and Configuration Guide Simba Technoogies Inc. Version 2.0.1 February 16, 2016 Instaation and Configuration Guide Copyright 2016 Simba Technoogies Inc. A Rights

More information

UnixWare 7 System Administration UnixWare 7 System Configuration

UnixWare 7 System Administration UnixWare 7 System Configuration UnixWare 7 System Administration - CH 3 - UnixWare 7 System Configuration Page 1 of 8 [Figures are not incuded in this sampe chapter] UnixWare 7 System Administration - 3 - UnixWare 7 System Configuration

More information

Proceedings of the International Conference on Systolic Arrays, San Diego, California, U.S.A., May 25-27, 1988 AN EFFICIENT ASYNCHRONOUS MULTIPLIER!

Proceedings of the International Conference on Systolic Arrays, San Diego, California, U.S.A., May 25-27, 1988 AN EFFICIENT ASYNCHRONOUS MULTIPLIER! [1,2] have, in theory, revoutionized cryptography. Unfortunatey, athough offer many advantages over conventiona and authentication), such cock synchronization in this appication due to the arge operand

More information

Resource Optimization to Provision a Virtual Private Network Using the Hose Model

Resource Optimization to Provision a Virtual Private Network Using the Hose Model Resource Optimization to Provision a Virtua Private Network Using the Hose Mode Monia Ghobadi, Sudhakar Ganti, Ghoamai C. Shoja University of Victoria, Victoria C, Canada V8W 3P6 e-mai: {monia, sganti,

More information

Testing Whether a Set of Code Words Satisfies a Given Set of Constraints *

Testing Whether a Set of Code Words Satisfies a Given Set of Constraints * JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 6, 333-346 (010) Testing Whether a Set of Code Words Satisfies a Given Set of Constraints * HSIN-WEN WEI, WAN-CHEN LU, PEI-CHI HUANG, WEI-KUAN SHIH AND MING-YANG

More information

Lecture 3. Jamshaid Yousaf Department of Computer Sciences Cristian college of Business, Arts and Technology Gujranwala.

Lecture 3. Jamshaid Yousaf Department of Computer Sciences Cristian college of Business, Arts and Technology Gujranwala. Lecture 3 Jamshaid Yousaf jamshaid.yousaf@ccbat.com.pk Department of Computer Sciences Cristian coege of Business, Arts and Technoogy Gujranwaa. Overview Importance of text in a mutimedia presentation.

More information

NCH Software Express Delegate

NCH Software Express Delegate NCH Software Express Deegate This user guide has been created for use with Express Deegate Version 4.xx NCH Software Technica Support If you have difficuties using Express Deegate pease read the appicabe

More information

Hour 3: Linux Basics Page 1 of 16

Hour 3: Linux Basics Page 1 of 16 Hour 3: Linux Basics Page 1 of 16 Hour 3: Linux Basics Now that you ve instaed Red Hat Linux, you might wonder what to do next. Whether you re the kind of person who earns by jumping right in and starting

More information

A New Supervised Clustering Algorithm Based on Min-Max Modular Network with Gaussian-Zero-Crossing Functions

A New Supervised Clustering Algorithm Based on Min-Max Modular Network with Gaussian-Zero-Crossing Functions 2006 Internationa Joint Conference on Neura Networks Sheraton Vancouver Wa Centre Hote, Vancouver, BC, Canada Juy 16-21, 2006 A New Supervised Custering Agorithm Based on Min-Max Moduar Network with Gaussian-Zero-Crossing

More information

Layout Conscious Approach and Bus Architecture Synthesis for Hardware-Software Co-Design of Systems on Chip Optimized for Speed

Layout Conscious Approach and Bus Architecture Synthesis for Hardware-Software Co-Design of Systems on Chip Optimized for Speed Layout Conscious Approach and Bus Architecture Synthesis for Hardware-Software Co-Design of Systems on Chip Optimized for Speed Nattawut Thepayasuwan, Member, IEEE and Aex Doboi, Member, IEEE Abstract

More information

ECE544: Communication Networks-II, Spring Transport Layer Protocols Sumathi Gopal March 31 st 2006

ECE544: Communication Networks-II, Spring Transport Layer Protocols Sumathi Gopal March 31 st 2006 ECE544: Communication Networks-II, Spring 2006 Transport Layer Protocos Sumathi Gopa March 31 st 2006 Lecture Outine Introduction to end-to-end protocos UDP RTP TCP Programming detais 2 End-To-End Protocos

More information