Allocating Rotating Registers by Scheduling


Hongbo Rong, Hyunchul Park, Cheng Wang, Youfeng Wu
Programming Systems Lab, Intel Labs

ABSTRACT

A rotating alias register file is a scalable hardware support to detect memory aliases at run-time. It has been shown that it can enable instruction-level parallelism to be effectively exploited from sequential code. Yet it is unknown how to apply it to loops. This paper presents an elegant and efficient solution that allocates rotating alias registers for a software-pipelined schedule of a loop. We show that, surprisingly, this specific register allocation problem can be reduced to another software pipelining problem, for which numerous efficient algorithms are available. This is interesting in both theory and practice. We propose an algorithmic framework to solve the problem. We also present a simple software pipelining algorithm that specially targets register allocation. Comparison with a few other algorithms shows that it usually achieves the best allocation at the least time cost. Finally, we generalize the approach to allocate general-purpose (integer/floating-point/predicate) rotating registers by showing that it is also a software pipelining problem.

Categories and Subject Descriptors
D.3.4 [PROGRAMMING LANGUAGES]: Processors - Compilers, Optimization

General Terms
Algorithms, Experimentation, Languages, Theory

Keywords
Register Allocation, Alias, Scheduling, Software Pipelining

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org. MICRO-46, December 7-11, 2013, Davis, CA, USA. Copyright 2013 ACM /13/12...$15.00.

1. INTRODUCTION

Memory disambiguation is a fundamental component in optimizing compilers. It discovers unaliased memory operations, i.e., loads or stores that visit different memory locations. These operations may be scheduled to run out of order for better instruction-level parallelism. Conversely, memory operations that visit the same memory locations are aliased, and cannot be scheduled out of order.

Memory disambiguation is usually done in a compiler by alias analysis. Accurate alias analysis, however, is expensive in terms of compile time. In this paper, we are interested only in memory operations between which the alias relationship is hard to determine by alias analysis. We call them may-alias operations for brevity.

A dynamic compiler optimizes code at the time the code runs. So the compile time is part of the execution time of the code, and has to be short. Under this tight compile-time constraint, a dynamic compiler can perform only simple alias analysis. Besides, a dynamic compiler often works on binary code without high-level source information, and has to be conservative in the analysis.

Therefore, for effective memory disambiguation, hardware support is usually needed by dynamic compilers. For may-alias operations, the compiler optimistically assumes they never alias with each other, so as to speculatively schedule them to run out of order; the compiler sets up the hardware to detect aliases, if any, when the optimized code runs. When an alias is detected, an exception is thrown and some recovery code will then be triggered to cancel any effects of the failed speculation. Such a speculation failure, of course, is expected to be rare.

There are several kinds of hardware support, including the Advanced Load Address Table (ALAT) in Itanium [1], the static alias register file in Transmeta processors [7], and more recently, the rotating alias register file (previously called alias register queue in [19]), and DeAliaser [2].
This paper targets the rotating alias register file. Compared with ALAT and the static alias register file, it has shown advantages in terms of smaller space requirement for instruction encoding, better scalability, and/or fewer false positives [19]. Compared with DeAliaser, it allows the checking of a subset of, instead of all, speculative stores, and thus it is more accurate in detecting aliases. A false positive is an unnecessary discovery of an alias by the hardware, as explained in Section 2. The less scalable static alias register file can be used as a secondary mechanism, and in some corner cases, a static alias register can be used in place of a rotating alias register. This is called a spill, whose purpose will become clear later in Section 4.3.

The rotating alias register file can be scaled to a large size to allow aggressive memory speculation in a large piece of code. It has enabled acyclic code to be optimized effectively [19]. How to apply it to loops, however, is a new and open problem.

This paper presents an efficient solution that applies rotating alias registers to a software-pipelined schedule of a loop. Software pipelining [1, 13, 18] exploits instruction-level parallelism from a loop by overlapping the execution of successive iterations. This optimization has been studied extensively in the past 3 decades, and is widely acknowledged as one of the most efficient optimizations for wide-issue architectures, benefiting both VLIW [1, 13] and superscalar [18] machines.

So far, software pipelining is seen only in static compilers. However, as dynamic compilers become increasingly important today, it is desirable to extend it to dynamic compilers. The benefit is that software pipelining broadens the optimization scope of a dynamic compiler. Dynamic compilation usually optimizes a small piece of code due to the time constraint at run-time [7]. For a loop, the scope is usually a loop iteration. Software pipelining enlarges the scope to an entire loop, including all its iterations. This potentially permits more aggressive speculation to expose more parallelism.

The problem to be solved can be stated as follows: Given a rotating alias register file and a software-pipelined schedule of a loop, how to allocate the rotating alias registers to the memory operations in the schedule, such that the allocation can detect all aliases between the memory operations when the schedule runs, without any false positive, with the minimal number of spills, and with the minimal number of registers allocated?

This paper makes the following contributions:

1. Problem formulation. It shows that, surprisingly, this specific register allocation problem can be reduced to another software pipelining problem, and therefore, any software pipelining algorithm can be used to solve it.
This is not only interesting in theory, but also useful in practice: it clearly exposes the nature of the problem, and enables the use of the numerous efficient scheduling algorithms.

Traditionally, scheduling and register allocation are solved from different perspectives. The techniques are fundamentally dissimilar. Scheduling techniques are based on the dependences between operations. Dependences are directional, e.g., operation a is dependent on operation b, but not vice versa. In contrast, register allocation techniques are based on the interferences between the lifetimes of variables. Interferences are not directional, e.g., lifetime a interferes with lifetime b, and vice versa. However, due to the specific way the rotating alias register file works for alias detection, rotating alias register allocation can be naturally formulated as a scheduling problem, as we will see later.

It is also interesting that in this software pipelining problem, the dependence and resource constraints can be unified under a single function.

2. Algorithms. Based on the problem formulation, the paper proposes an algorithmic framework. In this framework, the requirements of alias detection are transformed into dependences. Then, based on the dependences, software pipelining is performed. Afterwards, some lifetimes are spilled to static alias registers to address some rare corner cases.

Remember the problem has 4 requirements: (1) all aliases can be detected, (2) there is no false positive, (3) spilling is minimized, and (4) register usage is minimized. The framework ensures that the first two requirements are met. Software pipelining is the key to meeting the other two requirements. Any software pipelining approach can be applied to correctly solve the allocation problem, but they may differ in how much spilling and register usage can be minimized.
We propose a simple software pipelining algorithm, called LCP (Local Compaction followed by Pacing), that specially targets this allocation problem with two heuristics, the earliest-start and max R heuristic.

3. Evaluation. The algorithmic framework and the LCP algorithm, along with a few other software pipelining algorithms, have been implemented in the Transmeta Code Morphing Software [7]. We have compared the effectiveness of the algorithms in allocating registers, and shown that LCP usually achieves the best results and runs the fastest. With LCP, most loops are allocated the minimal number of rotating alias registers, and spills are minimized very well with the earliest-start heuristic.

We also show that rotating alias registers are important for performance. They enable the maximal instruction-level parallelism to be exploited from loops.

4. Generalization. Now that allocating rotating alias registers is formulated as a software pipelining problem, one cannot help wondering about the generality of this formulation. Is allocating rotating general (integer/floating-point/predicate) registers also a software pipelining problem? The answer is yes. Thus the problem may be solved in a similar way. We show that from this formulation, we can derive the bin-packing approach of Rau et al. [14].

Below we first introduce some background knowledge in Section 2. Then we motivate our rotating alias register allocation approach by an example in Section 3. Subsequently, we generalize the example to a formal solution in Section 4 and show experimental results in Section 5. Then we extend the solution to allocate rotating general registers in Section 6. Finally, we discuss related work and reach a conclusion.

2. BASIC CONCEPTS

In this section, we briefly introduce software pipelining and alias registers.

2.1 Software Pipelining

Software pipelining overlaps the execution of the iterations of a loop under dependence and resource constraints. Modulo scheduling might be the most common approach of software pipelining. In this paper, we focus only on modulo scheduling, and use the two terms, modulo scheduling and software pipelining, interchangeably. We assume a loop body is a hyperblock [11], where branches, if any, have been converted either to predicated code [3], or to asserts [12].

Formally, let a(i) be operation a in iteration i of the loop before software pipelining, and σ(a, i) be its schedule time after software pipelining. A modulo schedule must satisfy the following constraints:

Modulo property: Each iteration of the loop has the same schedule, and successive iterations are initiated at a constant period called the Initiation Interval (II). That is,

$$\sigma(a, i + 1) = \sigma(a, i) + \mathit{II}, \quad \forall i. \tag{1}$$

Dependence constraints: The schedule must respect every dependence (a → b, δ, d), where δ is the latency and d is the distance. The dependence is a local dependence if the distance d = 0, and a loop-carried dependence otherwise. By respecting the dependence, the schedule ensures that a(i) starts at least δ time steps earlier than b(i + d). That is,

$$\sigma(a, i) + \delta \le \sigma(b, i + d), \quad \forall i. \tag{2}$$

Resource constraints: No hardware resource is used at the same time by two operations.

[Figure 1: Illustrating alias registers. (a) An alias register is set by operation b, and then checked by operation a. (b) With a rotating alias register file, a set of alias registers are checked. Operations b and d execute, and they set registers r_1 and r_0, respectively (Steps 1 and 2). When operations a and c execute, they check both registers, starting from r_0 (Steps 3 and 4). (c) One way to avoid the false positive in Fig. 1b is using a static register sr. Note that a checks both the static register sr and the rotating register r_0 in Step 3. (d) Another way to avoid the false positive in Fig. 1b is to find a better allocation.]

2.2 Alias Registers

Software pipelining may reorder operations. For example, assume there is a pair of may-alias operations, a and b.
They may be from the same iteration or different iterations of the loop. Let us say their sequential execution order is a, b, but after software pipelining, their execution order becomes b, a. Alias registers are used to detect the alias between them, if any, when the code runs.

In Fig. 1a, first, operation b runs. It sets an alias register r_x, where x is the index of the register. By setting the register, it records the memory range it accesses into the register. Then operation a runs, and it checks r_x to see if its own memory range overlaps with b's. If so, an alias is detected. We say a checks b, or a is a checker of b. An operation that only checks other operations but is not checked by any other operation is a pure checker.

Usually, a memory operation sets only one alias register, which is enough to record its own memory range. The index of the alias register it sets is encoded in it directly. But it may need to check multiple alias registers, because it may alias with multiple memory operations. There are two cases: the static alias registers to check are encoded in a bit-mask, where every bit equal to 1 corresponds to a static register; the rotating alias registers to check are specified by one and only one index x, which means to check rotating registers r_x, r_{x+1}, r_{x+2}, ..., r_n, where r_n is the highest-indexed rotating register.

It is important to see that the checking of rotating alias registers is unidirectional, i.e., from a lower-indexed register up toward the highest-indexed register. This range of registers must include all the registers the operation intends to check, but it may also include other registers that the operation does not intend to check, which may trigger false positives.

A static alias register file cannot be too big, due to the limited encoding space to contain a bit-mask in an operation. A rotating alias register file avoids this limitation, but it can cause false positives. In Fig. 1b, assume 4 operations that may alias with each other, whose sequential execution order is a, b, c, d, but whose execution order after software pipelining becomes b, d, a, c. Suppose there are only 2 rotating alias registers. The figure shows step by step how aliases may be detected. For example, operation a has been reordered with both b and d, and thus when it executes, it needs to check the registers set by b and d, i.e., r_1 and r_0. It does so by specifying only r_0, and the hardware will automatically check both registers, starting from r_0, due to its unidirectional checking feature (Step 3). Similarly, operation c has been reordered with d, and thus when it executes, it needs to check the register set by d, i.e., r_0 (Step 4). However, due to the unidirectional checking feature, the hardware will also check r_1, which is set by b. This check is unnecessary, since c and b have not been reordered at all. This unnecessary check can lead to an unnecessary exception, i.e., a false positive.

There are two ways to avoid this false positive. One is spilling: let b set a static, instead of a rotating, register sr. Unless it is explicitly specified in the bit-mask of an operation, the static register will not be checked. See Fig. 1c. The other way is to have a better allocation like that in Fig. 1d. One can verify that this allocation does not introduce any false positive.

The rotating register file is organized as a circular buffer.
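The unidirectional checking rule and the false positive of Fig. 1b can be reproduced with a small simulation. The following Python sketch is illustrative only: the function name `simulate`, its dictionary-based encoding of an allocation, and the second allocation (b in r_0, d in r_1, with c checking from r_1) are assumptions made for this example. The second allocation is one that avoids the false positive; it is not necessarily the exact one drawn in Fig. 1d. Rotation is not modeled.

```python
def simulate(order, set_reg, check_from, intended, num_regs):
    """Run operations in `order` against a rotating alias register file.

    set_reg[op]    -> index of the register that op sets (if any)
    check_from[op] -> lowest index op checks; the hardware then checks every
                      register from that index up to num_regs - 1
    intended[op]   -> operations that op really needs to check
    Returns (missed, spurious): lists of (checker, checked) pairs.
    Assumption: the register file is never rotated in this sketch.
    """
    regs = [None] * num_regs          # regs[k] holds the op that last set r_k
    missed, spurious = [], []
    for op in order:
        if op in check_from:
            # Unidirectional check: every set register from the index upward.
            seen = {regs[k] for k in range(check_from[op], num_regs)
                    if regs[k] is not None}
            want = intended.get(op, set())
            missed += [(op, o) for o in sorted(want - seen)]
            spurious += [(op, o) for o in sorted(seen - want)]
        if op in set_reg:             # setting happens after checking
            regs[set_reg[op]] = op
    return missed, spurious


order = ["b", "d", "a", "c"]                 # execution order after pipelining
intended = {"a": {"b", "d"}, "c": {"d"}}     # only the reordered pairs

# Allocation of Fig. 1b: b sets r1, d sets r0; a and c both check from r0.
fig_1b = simulate(order, {"b": 1, "d": 0}, {"a": 0, "c": 0}, intended, 2)

# An assumed better allocation: b sets r0, d sets r1; c checks from r1 only.
better = simulate(order, {"b": 0, "d": 1}, {"a": 0, "c": 1}, intended, 2)

print(fig_1b)   # ([], [('c', 'b')]): c needlessly checks b
print(better)   # ([], [])
```

In both runs no intended check is missed; the difference is only whether c also sees the register set by b, which is exactly the false positive described above.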

Starting from one of the registers, called the base, the registers are indexed as 0, 1, ..., n. A rotation action, rotate x, cleans registers 0, 1, ..., x − 1, and sets register x as the new base. Starting from the new base, the registers are re-indexed as 0, 1, ..., n.

A memory operation can specify only one rotating alias register. It can specify to check it (which will also check the higher-indexed registers), or set it, or both check and set it. The operation leaves the remaining space to encode the rotation action. In executing the operation, a rotation action, if specified, is performed first; then checking of alias registers is done, if specified; and finally, setting of an alias register is done, if specified. The result of a setting is sticky: once a register is set, its content does not change until the register is set again by another operation, or cleaned by a rotation action.

2.3 Terminology

Before Section 6, the paper is about alias registers for memory operations. Other kinds of registers and operations are irrelevant to our problem. Thus, for brevity, from now on until Section 6, register and operation refer to alias register and memory operation by default, unless stated otherwise.

3. A MOTIVATING EXAMPLE

In this section, we motivate our register allocation approach with an example. It is extremely simplified, but still conveys the core idea.

Fig. 2a shows a loop containing a few operations that may alias with each other, a, b and c. A software-pipelined schedule for the loop is illustrated in Fig. 2b with the first few iterations. We ignore the irrelevant details of how the schedule was generated. Each iteration has the same schedule. The iterations are initiated at an interval of 3 time steps (II = 3).

Note the reordering of the operations after software pipelining: in Fig. 2b, b(i) is scheduled after both c(i) and c(i + 1) for any i; also, a(i) is scheduled after b(i), c(i) and c(i + 1). These are different from the sequential execution order of the original loop in Fig. 2a.
Such reordering happens because the compiler optimistically assumes the operations never alias. However, if they do alias (occasionally during execution of the software-pipelined schedule), the execution results of the schedule would be wrong, and some recovery code must be performed. Rotating alias registers are used to guard the operations for alias detection at execution time.

Each operation produces a lifetime. A rotating register is allocated to the lifetime. When the operation starts execution, it sets the register. At that time, the lifetime starts. The lifetime is live until the register has been checked by all the checkers of the operation. At that time, the lifetime ends. We refer to the lifetime produced by an operation o as lifetime o for short. A pure checker does not set any register, and we assume it produces a lifetime whose length is 0 in each iteration.

All the lifetimes need to be placed into registers in a certain order. For example, let x be the register index of lifetime b(i). In order to detect an alias between operations b(i) and c(i), and between operations b(i) and c(i + 1), x must not be higher than the register indices of lifetimes c(i) and c(i + 1). Then when operation b(i) starts execution, it checks registers, starting from r_x. The unidirectional checking feature of the hardware guarantees that the registers of lifetimes c(i) and c(i + 1) are checked, and thus any alias with them is detected.

Such ordering requirements can be expressed as dependences between the lifetimes, if we view each register index as a time step. For example, to detect an alias between operations b(i) and c(i), we can build a dependence (b → c, 0, 0), which requires that lifetime b(i) starts at least 0 time steps earlier than lifetime c(i) in scheduling terms, i.e., the register index of lifetime b(i) is not higher than the register index of lifetime c(i). Similarly, we can build a dependence (b → c, 0, 1) to detect an alias between operations b(i) and c(i + 1). All such ordering requirements compose a dependence graph, as shown in Fig. 2c.
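The way ordering requirements turn into dependences can be sketched in a few lines of Python. This is a hedged illustration, not the paper's implementation: the helper name `build_dependences` is hypothetical, all operations are assumed to may-alias with each other, and the iteration-0 schedule times are assumptions chosen to be consistent with II = 3 and with the lifetime start and end times quoted later in Example 1 (c at time 0, b at time 4; the time 6 for a is inferred from the lifetimes ending at 6). The rule applied is the one used above: if x(i) precedes y(i + d) in sequential order but runs after it in the schedule, add a dependence x → y with distance d and latency 0.

```python
def build_dependences(body_order, sigma, ii):
    """Return the set of dependences (x, y, d), meaning x -> y with distance d.

    body_order: operations in sequential order within one iteration
    sigma:      assumed schedule time of each operation in iteration 0
    ii:         initiation interval, so x(i) runs at sigma[x] + i * ii
    """
    pos = {op: k for k, op in enumerate(body_order)}
    # Beyond this distance no operation can run after a later iteration's op.
    max_d = (max(sigma.values()) - min(sigma.values())) // ii + 1
    deps = set()
    for x in body_order:
        for y in body_order:
            for d in range(max_d + 1):
                # x(i) precedes y(i + d) in sequential order ...
                seq_before = d > 0 or pos[x] < pos[y]
                # ... but runs strictly after it in the pipelined schedule.
                sched_after = sigma[x] > sigma[y] + d * ii
                if seq_before and sched_after:
                    deps.add((x, y, d))
    return deps


sigma = {"a": 6, "b": 4, "c": 0}          # assumed iteration-0 schedule times
deps = build_dependences(["a", "b", "c"], sigma, ii=3)
print(sorted(deps))
# [('a', 'b', 0), ('a', 'c', 0), ('a', 'c', 1), ('b', 'c', 0), ('b', 'c', 1)]
```

Under these assumptions the result contains three distance-0 edges and two distance-1 edges, and no edge points to a, so a is a pure checker.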
Here every node represents a lifetime. Each dependence edge is annotated with a dependence distance. The latency of every dependence is 0 and not shown. Note that there is no dependence to a, which means operation a is a pure checker and the length of lifetime a(i) is 0 for any i.

Based on the dependence graph, we can schedule the lifetimes to registers, and get the allocation in Fig. 2d. In this diagram, the horizontal axis is time, and the vertical axis is register index. We assume there is an unlimited number of registers, and do not consider rotation: that is a renaming issue that can be addressed afterwards. In the allocation, the bars represent the lifetimes. We have marked the lifetimes with the corresponding operations. Operation a is a pure checker, and its lifetime shares a register with lifetime b in the same loop iteration.

We can make the following observations from the allocation in Fig. 2d:

1. The lifetimes produced by the same operation from successive loop iterations appear along the axis of the registers at a constant period R. For example, lifetimes b in iterations 0, 1, and 2 appear at registers 0, 1, and 2, respectively. The period R is equal to 1. If we view each register index as a time step, then these lifetimes are initiated at a constant time interval R. This is analogous to the modulo property of modulo scheduling described in Section 2.1.

2. The allocation respects all the dependences, i.e., the ordering requirements between the lifetimes. This is analogous to the dependence constraints of modulo scheduling described in Section 2.1. Lifetimes a(i) and b(i) are allocated register i, and lifetime c(i) is allocated register i + 1. We can verify that all the ordering requirements are respected. For example, as required by the dependences (b → c, 0, 0) and (b → c, 0, 1), the register index of lifetime b(i), i, is not higher than the register indices of lifetimes c(i) and c(i + 1), which equal i + 1 and i + 2, respectively. To be clearer, we have illustrated the two dependences in Fig. 2d.

3. In the allocation, when two lifetimes are allocated the same register, they do not overlap in time. If every time step is viewed as a resource, then this means no resource is over-committed. This is analogous to the resource constraints of modulo scheduling described in Section 2.1.
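The three observations can be checked mechanically. The sketch below is an illustration under stated assumptions rather than the authors' code: the function `violations` and its argument encoding are hypothetical, the lifetime intervals for b and c are the start and end times quoted in Example 1 of Section 4.2, the dependences are the five edges of Fig. 2c, and a is treated as a pure checker with a zero-length lifetime. The modulo property holds by construction, because the register of o(i) is computed as a base index plus i times R. Only a finite number of iterations is examined, which is enough to illustrate the periodic pattern.

```python
from itertools import combinations

II = 3                                        # initiation interval of Fig. 2b
DEPS = [("a", "b", 0), ("a", "c", 0), ("a", "c", 1),
        ("b", "c", 0), ("b", "c", 1)]         # edges (x, y, d) of Fig. 2c
LIFETIMES = {"b": (4, 6), "c": (0, 6)}        # (start, end) in iteration 0


def violations(base, R, n_iter=6):
    """List ordering and overlap problems of an allocation.

    base[o] is the register of lifetime o(0); o(i) gets base[o] + i * R.
    """
    def reg(op, i):
        return base[op] + i * R

    problems = []
    # Dependence constraints: r(x, i) <= r(y, i + d).
    for x, y, d in DEPS:
        for i in range(n_iter):
            if reg(x, i) > reg(y, i + d):
                problems.append(("order", f"{x}({i})", f"{y}({i + d})"))
    # Resource constraints: lifetimes sharing a register must not overlap.
    instances = [(op, i) for op in LIFETIMES for i in range(n_iter)]
    for (p, i), (q, j) in combinations(instances, 2):
        if reg(p, i) != reg(q, j):
            continue
        p_start, p_end = (t + i * II for t in LIFETIMES[p])
        q_start, q_end = (t + j * II for t in LIFETIMES[q])
        if not (p_start >= q_end or p_end <= q_start):
            problems.append(("overlap", f"{p}({i})", f"{q}({j})"))
    return problems


fig_2d = violations({"a": 0, "b": 0, "c": 1}, R=1)   # allocation of Fig. 2d
shared = violations({"a": 0, "b": 0, "c": 0}, R=1)   # b(i), c(i) share a register
print(fig_2d)        # []
print(shared[0])     # ('overlap', 'b(0)', 'c(0)')
```

Placing c(i) one register above b(i) satisfies every constraint, whereas putting them in the same register fails because the lifetimes of b(i) and c(i) overlap in time.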

As an example of the resource constraints, in Fig. 2d register 1 is allocated to lifetimes c(0), b(1), and a(1). We can see that c(0) has ended before b(1) starts. Visually, b(1) seems to overlap with a(1) in time. This is not real, since a(1) has a length of 0, i.e., it does not consume time at all.

In short, the allocation respects 3 constraints analogous to those of modulo scheduling. It is a modulo schedule of the lifetimes.

[Figure 2: A Motivating Example. (a) A loop: for (i = 0; i < n; i++) { a b c }. (b) A software-pipelined schedule (II = 3). (c) The dependence graph of the lifetimes. (d) A register allocation (R = 1). (e) Register assignment. A * means a rotate R action, and o_i means lifetime o is assigned register r_i.]

4. SOLUTION

In this section, we generalize our solution from the motivating example and formally formulate the rotating register allocation problem as a modulo scheduling problem. Based on this formulation, we present an algorithmic framework to solve it, and also propose a software pipelining algorithm, called LCP, specifically for this allocation problem.

4.1 Problem Formulation

Let us recall modulo scheduling. As we introduced before, for every operation o, modulo scheduling schedules o from successive loop iterations to time at a constant period (II) and assigns o resources, respecting all dependence constraints and resource constraints.

In register allocation, if we view lifetimes as operations, registers as time, and time as resources, then we can repeat the above statement as follows: for every operation (lifetime) o, the allocation schedules o from successive loop iterations to time (registers) at a constant period (R) and assigns o resources (time steps), respecting all dependence constraints (the ordering requirements of the lifetimes) and resource constraints (two lifetimes in the same register never overlap in time). Therefore, the register allocation is a modulo schedule of the lifetimes.

Formally, let r(a, i) be the rotating register allocated to lifetime a(i). The register allocation respects the following constraints:

Modulo property:

$$r(a, i + 1) = r(a, i) + R, \quad \forall i. \tag{3}$$

Here R is a constant to be determined during the allocation process. That is, the lifetimes of an operation from successive loop iterations appear at a constant period in registers.

Dependence constraints: Suppose a(i) and b(i + d), where d ≥ 0, may alias with each other. Suppose a(i) is before b(i + d) in the sequential execution order of the loop, but is after it in the software-pipelined schedule. To make sure any alias between them is detected, we must let

$$r(a, i) \le r(b, i + d), \quad \forall i. \tag{4}$$

That is, lifetime a(i) needs to be placed in the same register as b(i + d) or in a lower-indexed register than b(i + d). This constraint can be modeled by a dependence (a → b, 0, d).

Resource constraints: If two lifetimes are allocated the same register, they cannot overlap in time.

4.2 A Unified Expression of Dependence and Resource Constraints

Interestingly, the dependence and resource constraints in the above formulation can be enforced in a unified way. Let DIST(a, b) be a function returning the set of legal values of r(b, i) − r(a, i). When r(b, i) − r(a, i) equals any value in this set, all the lifetimes of a and all the lifetimes of b are allocated registers without violating any dependence or resource constraints. This function can be used in Step 2 of the algorithmic framework, to be introduced in Section 4.3.

Formally,

$$\mathit{DIST}(a, b) = \mathit{DIST}_{dep}(a, b) \cap \mathit{DIST}_{res}(a, b), \tag{5}$$

where DIST_dep(a, b) and DIST_res(a, b) are the sets of legal values required by dependence and resource constraints, respectively.

To enforce any dependence (a → b, δ, d)¹, we need

$$r(a, i) + \delta \le r(b, i + d) \ \text{(Inequality 2)} \ = r(b, i) + d \cdot R \ \text{(Equation 3)}.$$

Thus

$$\delta - d \cdot R \le r(b, i) - r(a, i).$$

Therefore

$$\mathit{DIST}_{dep}(a, b) = \bigcap_{\forall\, \text{dependence}\, (a \to b,\, \delta,\, d)} [\delta - d \cdot R,\ +\infty) \ \cap \bigcap_{\forall\, \text{dependence}\, (b \to a,\, \delta,\, d)} (-\infty,\ -\delta + d \cdot R] \tag{6}$$

¹ In this paper, δ is always 0. Here we use δ to be general.

Now consider resource constraints. There are three cases. We denote the legal sets under them as DIST_res^i, i = 1, 2, 3, respectively.

First, if operation a or b is a pure checker, there are no resource constraints at all. A pure checker's lifetime from any loop iteration has a length of 0, i.e., it does not really consume time. Thus no lifetimes of the two operations can overlap in time. In this case, r(b, i) − r(a, i) can be arbitrary, so

$$\mathit{DIST}_{res}^{1}(a, b) = (-\infty, +\infty). \tag{7}$$

Second, if a(i) and b(i + d), ∀d, are not allocated the same register, there are no resource constraints, either. This is equivalent to

$$r(a, i) \ne r(b, i + d),\ \forall d \iff r(b, i) - r(a, i) \ne -d \cdot R,\ \forall d \ \text{(Equation 3)}.$$

In other words, r(b, i) − r(a, i) is not a multiple of R. So

$$\mathit{DIST}_{res}^{2}(a, b) = (-\infty, +\infty) \setminus R \cdot (-\infty, +\infty), \tag{8}$$

where \ is the set difference operation, and R · (−∞, +∞) is the set of R's multiples. The equation means r(b, i) − r(a, i) can be any number except a multiple of R.

Third, if a(i) and b(i + d), for some d, are allocated the same register, i.e.,

$$r(b, i) - r(a, i) = -d \cdot R, \ \text{for some } d, \tag{9}$$

then to avoid overlapping, either lifetime a(i) starts after lifetime b(i + d) ends, or lifetime a(i) ends before lifetime b(i + d) starts. That is,

$$start(a) + i \cdot \mathit{II} \ge end(b) + (i + d) \cdot \mathit{II} \quad \text{or} \quad end(a) + i \cdot \mathit{II} \le start(b) + (i + d) \cdot \mathit{II},$$

where start(o) and end(o) are the start and end time of lifetime o(0), for o = a, b. Therefore,

$$\frac{start(a) - end(b)}{\mathit{II}} \ge d \quad \text{or} \quad \frac{end(a) - start(b)}{\mathit{II}} \le d.$$

Since d is an integer, this gives

$$-\left\lfloor \frac{start(a) - end(b)}{\mathit{II}} \right\rfloor \cdot R \le -d \cdot R \tag{10}$$

or

$$-\left\lceil \frac{end(a) - start(b)}{\mathit{II}} \right\rceil \cdot R \ge -d \cdot R. \tag{11}$$

Summarizing Formulas (9), (10) and (11), we have

$$\mathit{DIST}_{res}^{3}(a, b) = R \cdot (-\infty, +\infty) \cap \left\{ \left[ -\left\lfloor \frac{start(a) - end(b)}{\mathit{II}} \right\rfloor \cdot R,\ +\infty \right) \cup \left( -\infty,\ -\left\lceil \frac{end(a) - start(b)}{\mathit{II}} \right\rceil \cdot R \right] \right\} \tag{12}$$

In short, by Equations (7), (8), and (12),

$$\mathit{DIST}_{res}(a, b) = \begin{cases} \mathit{DIST}_{res}^{1}(a, b) & \text{if } a \text{ or } b \text{ is a pure checker} \\ \mathit{DIST}_{res}^{2}(a, b) \cup \mathit{DIST}_{res}^{3}(a, b) & \text{otherwise} \end{cases} \tag{13}$$

Example 1. We briefly explain the formula with the example in Fig. 2. Let us show how to calculate DIST(b, c). There are two dependences between b and c, i.e., (b → c, 0, 0) and (b → c, 0, 1) (see Fig. 2c). They restrict DIST_dep(b, c) to be [0, +∞) ∩ [−R, +∞) = [0, +∞), according to Equation (6).

Now consider the resource constraints. Since neither b nor c is a pure checker, we have DIST_res(b, c) = DIST_res^2(b, c) ∪ DIST_res^3(b, c), according to Equation (13). DIST_res^2(b, c) includes all numbers except R's multiples, according to Equation (8). DIST_res^3(b, c) includes all R's multiples that are also within the set [R, +∞) ∪ (−∞, −2R], according to Equation (12), given II = 3 (see Fig. 2b), start(b) = 4, end(b) = 6, start(c) = 0, end(c) = 6 (see Fig. 2d).

Together, the dependence and resource constraints require that r(c, i) − r(b, i) must be either within DIST_dep(b, c) ∩ DIST_res^2(b, c), or within DIST_dep(b, c) ∩ DIST_res^3(b, c). In the allocation in Fig. 2d, we can see r(c, i) − r(b, i) = 1, which is a value from DIST_dep(b, c) ∩ DIST_res^3(b, c): it is a multiple of R = 1 here.

Example 2. For the example in Fig. 2, a less efficient but still valid allocation is shown in Fig. 3. Here R = 2. We can see r(c, i) − r(b, i) = 1 as well, but this time, it is a value from DIST_dep(b, c) ∩ DIST_res^2(b, c): it is not a multiple of R = 2 here.

[Figure 3: A less efficient allocation with a bigger R (R = 2) for the example in Fig. 2.]

4.3 Algorithmic Framework

Based on the problem formulation in Section 4.1, we can find a register allocation by the following steps:

1. Dependence building.

In this step, we build the dependence graph. The graph is ensured to never have a local circuit in it. By local circuit, we refer to a circuit in which the distance of every dependence edge equals 0.

For every pair of may-alias operations, we build a dependence according to the dependence constraints in Section 4.1. The dependence is added to the dependence graph. Adding such a dependence will never cause a local circuit to be formed. According to the dependence constraints (Section 4.1), a(i) is after b(i + d) in the pipelined schedule, and the dependence is from a to b. If d = 0, it means that in the pipelined schedule, the dependence follows the reverse execution order of a single loop iteration. All the dependences made out of the constraints follow this same direction. Thus it is not possible for them to form a local circuit. For example, in Fig. 2c, all the dependences whose distance equals 0 are in the downward direction and cannot form a circuit.

Besides the dependences made from our dependence constraints in Section 4.1, there are some other dependences: before a loop is software pipelined, some other optimizations might have been performed. Just like software pipelining, these optimizations can require a certain ordering between lifetimes in order to detect aliases. These requirements have been passed down from previous compiler phases to be handled here together.

Such a requirement asks the compiler to allocate lifetime a(i) to the same register as b(i + d), or to a lower-indexed register than b(i + d). Similar to our dependence constraints in Section 4.1, it is transformed into a dependence (a → b, 0, d). Usually, adding it to the dependence graph will not form a local circuit, with only one exception: when d = 0, and a(i) is before b(i + d) in the pipelined schedule². This direction is exactly the opposite of the direction of the dependences made out of our dependence constraints. This exceptional case may cause a local circuit to be formed.

When a local circuit exists in the dependence graph, no allocation can respect all the dependences with rotating registers alone. Since this case is very rare, we temporarily ignore such local dependences and do not add them to the dependence graph³. We will use static registers to help respect them later. We call those ignored local dependences missing local dependences.

² This kind of restriction is called an anti-constraint in [19]. It is caused by load/store elimination before software pipelining. Unlike a normal dependence, it does not really mean that a(i) should check b(i + d). Instead, it just wants to make sure that b(i + d) does not check a(i), to avoid a false positive. For example, to avoid the false positive in Fig. 1b, discussed in Section 2, a dependence b → c can be added, and that would lead to an allocation without any false positive, shown in Fig. 1d. As our solution misses the local dependences of this kind, we may have to use static registers to help avoid a false positive, like that shown in Fig. 1c.

³ An alternative solution is to allow such dependences to be added if they do not really cause any local circuit to form.

2. Modulo scheduling.

In this step, we can apply any modulo scheduling algorithm to schedule the lifetimes to the rotating registers, based on the dependence graph, and considering the modulo property and resource constraints described in Section 4.1.

Modulo scheduling commonly searches for a feasible initiation interval within a range. For each initiation interval R under consideration, it schedules the lifetimes. For each lifetime, it ensures that all dependence and resource constraints between this lifetime and all the already scheduled lifetimes are respected. To ensure that, one can use the legal distances calculated by Equations (5), (6), and (13).

3. Removing potential false positives.

For each missing local dependence a → b, check if it is respected by the allocation.
If not, spill lifetime b(i), ∀i, to a static alias register, i.e., deallocate the rotating register of b(i) and allocate b(i) a static register instead⁴.

Another kind of false positive is introduced by register reusing during modulo scheduling. When an operation a(i) starts, it may accidentally check another operation b(j) that it does not intend to check, where i and j may be arbitrary: lifetime b(j) may be dead, but its register might not be cleaned yet and thus a(i) will check b(j) in effect. In this case, we also spill lifetime b(j) to a static alias register.

There are no other known sources of false positives so far. Now that all the ordering requirements have been respected, the resulting schedule, when it executes, will detect all aliases without any false positive.

4. Register assignment. The register allocation we have found assumes an infinite number of rotating registers, and it does not rotate the register file. In reality, the number of rotating registers is limited, and we have to rotate the register file periodically in order to clean up some dead lifetimes in the registers, and free the registers for other lifetimes that newly start. We achieve this purpose by inserting a rotation action, rotate R, into the software-pipelined schedule of the loop. When the schedule runs, every II time steps, R dead lifetimes are cleaned up and their registers are freed.

Fig. 2e shows the register assignment for the allocation in Fig. 2d. Every II time steps, a rotate R action, where R = 1 in this case, is executed first before any operation. After a rotation action, the lifetimes are simply mapped to the registers according to their relative positions in the allocation. In Fig. 2e, we have annotated each lifetime with its corresponding register index.

⁴ At this moment, for all i, the same static register is allocated to lifetime b(i). Later in the code generation phase (see an example compile flow in Fig. 5), that register may be renamed to more than one register, so that if lifetimes b(i) and b(i+1) overlap in time, the register is renamed such that b(i)'s register is different from b(i+1)'s. This is called Modulo Variable Expansion (MVE) [1]. The register needs to be expanded to at least len registers, where len is the length of any lifetime b(i). The general-purpose (integer/floating-point/predicate) lifetimes are handled exactly the same way in MVE during code generation, if they do not have rotating register file support. Otherwise, they can also be handled as a software pipelining problem (Section 6).

Figure 4: Illustrating the earliest-start heuristic. (a) Without earliest-start heuristic. (b) With earliest-start heuristic.

4.4 A Modulo Scheduling Algorithm Targeting Alias Register Allocation

In Step 2 of the algorithmic framework (Section 4.3), any modulo scheduling algorithm can be applied to allocate the rotating registers. The results are always correct. The same algorithm can be used to address both the traditional software pipelining problem and the new rotating register allocation problem, because both cases have a similar modulo property, and similar dependence and resource constraints.

However, the two problems do have one important difference: the optimization objectives. For the allocation problem, the major optimization objective is to minimize spills: spilling needs static alias registers, which are limited due to the encoding space constraint, as discussed in Section 2.2. Besides, one static alias register used in the schedule may be expanded to multiple ones during the next compiler phase (code generation) via Modulo Variable Expansion [1]. Our experience is that in Transmeta processors, there are usually few static alias registers left after all other optimizations have been done and then software pipelining happens, and they can be quickly used up. Thus we should try using the more scalable rotating alias registers whenever possible.

The next optimization objective of allocation is to minimize register usage. However, we can sacrifice this objective for the major objective, as long as the number of registers allocated does not exceed the number of available registers. That means we do not have to minimize R (the initiation interval). For our motivating example in Fig.
2, if we do not minimize R, we can have a different but still valid allocation shown in Fig. 3.

For the traditional software pipelining problem, the optimization objective is exactly the opposite: to minimize the initiation interval. The smaller the II is, the faster the schedule runs. In fact, most of the existing studies on software pipelining, if not all, target how to minimize II.

With that difference in mind, here we propose a simple modulo scheduling algorithm specifically targeting rotating register allocation. It fits into Step 2 of the algorithmic framework in Section 4.3. It has two steps:

1. Local compaction. Schedule the lifetimes based on the resource constraints shown in Section 4.1, and the local dependences in the dependence graph built in Step 1 of the algorithmic framework. Loop-carried dependences in the graph are ignored in this step. This produces a schedule for the lifetimes in a single loop iteration.

In this step, any local scheduling method can be used. Say we use list scheduling, a well-adopted local scheduling approach. It prioritizes the ready lifetimes into a list. A lifetime becomes ready when all its predecessors, i.e., the lifetimes on which it depends locally, have been scheduled. For register i, where i starts from register 0 towards +∞, list scheduling picks up the lifetime with the top priority from the list, and allocates the register to the lifetime. It can continue to pick up the other less prioritized lifetimes for this register as well, as long as they do not overlap in time. When there is no lifetime that can be fit into this register, it proceeds to the next register.

How to prioritize the lifetimes is a key in this process. We propose an earliest-start heuristic, which prioritizes the one with the earliest start time. This heuristic minimizes spills. In Fig. 4, assume there are 2 operations whose execution order in the pipelined schedule is a, b, and there is a missing local dependence, which requires that b should not check a. The lifetimes are shown in bars in the figure.
Lifetime b starts later than a. If we prioritize b, b will be allocated register 0, and a register 1. Then when operation b starts, it checks registers, starting from its own register. In that way, it will check register 1, which is for lifetime a. This is shown by the arrow in Fig. 4a. This check violates our assumption that b should not check a. However, if we adopt the earliest-start heuristic, lifetime a will be allocated first to register 0, and lifetime b is forced to be allocated to register 1. Then b cannot check a. See Fig. 4b.

With this heuristic, although a missing local dependence is not added into the dependence graph, it is still potentially handled by the scheduling. This reduces the chance of spilling later during Step 3 of the algorithmic framework. The experiments later will show that this heuristic does reduce spills.

In the case of a tie when two lifetimes have the same start time, the one with the shorter length is given higher priority, in the hope of packing as many lifetimes as possible into the current register.

2. Calculate R. Imagine every loop iteration has the same schedule for its lifetimes. Overlap the schedules of two successive loop iterations at a constant pace R, considering the ignored loop-carried dependences, as well as the collective resource usage of the overlapping iterations. This produces a more compact schedule that can reuse registers between iterations. Register reusing, however, may cause false positives, as we discussed in Step 3 of the algorithmic framework. To avoid this situation, we propose a max R heuristic: as long as the number of allocated registers does not exceed the number of available registers, use the maximum possible R. This does not affect the performance of the loop, as the rotating register file has uniform access latency for any register.

We call this approach Local Compaction followed by Pacing, or LCP for short.

Figure 5: The compile flow. Loop → Scheduling (build dependences between operations; modulo scheduling of operations with JITSP) → Rotating alias register allocation (build dependences between lifetimes; modulo scheduling of lifetimes with LCP, RS2, DESP, or JITSP; remove potential false positives; register assignment) → Code generation.

4.5 Complexity

The algorithmic framework in Section 4.3 has 4 steps. Let us analyze the complexity step by step.

In step 1, let N be the total number of memory operations in the software-pipelined schedule of a loop. Between a pair of may-alias operations, we may build at most C dependence edges, where C is the total number of loop iterations overlapped in the schedule, a constant. Therefore, we have at most C·N² edges, which takes O(N²) time to build.

In step 2, there can be numerous modulo scheduling algorithms with various complexities. The LCP algorithm we propose performs list scheduling, and then calculates R. List scheduling is also a class of algorithms with variable complexities. Let E_l and E_c be the number of local and loop-carried dependence edges, respectively, and M the number of memory units in a processor. One list scheduling method proposed by Rădulescu and van Gemund [17] takes O(N log(M) + E_l) time in our case. We scan loop-carried dependence edges to compute R in O(E_c) time.

In step 3, we scan the missing local dependences. Experientially, it takes virtually no time, as there are usually no or few such dependences, as will be shown in Section 5. We also scan each pair of lifetimes to see if they inadvertently check each other. That takes O(N²) time.

The last step takes constant time.

In summary, the time complexity of the algorithm is about O(N² + E_c), plus that of list scheduling (if we use the LCP algorithm in Step 2), which may take an additional O(N log(M) + E_l) time.

5. EXPERIMENTS

The algorithmic framework (Section 4.3) for rotating register allocation, the LCP algorithm (Section 4.4), along with a few other general-purpose software pipelining algorithms including RS2 (a variant of Rotation Scheduling [6, 15]), DESP (Decomposed Software Pipelining [4, 9, 2]), and JITSP (Just-In-Time Software Pipelining, an in-house method), have been implemented in Transmeta Code Morphing Software (CMS) [7] as part of research on software pipelining within dynamic compilers.

Table 1: Characteristics of the dependence graphs of lifetimes. Columns: min, median, mean, max. Rows: # nodes, # dependences, # local dependences, # loop-carried dependences, # missing local dependences, maxlive.

Figure 6: Cumulative distribution of the number of rotating alias registers. Axes: # rotating alias registers versus % loops. Curves: Ideal, LCP, DESP, JITSP, RS2.

Table 2: Number of spilled lifetimes per loop iteration. Columns: min, median, mean, max. Rows: RS2, DESP, JITSP, LCP, LCP without max R heuristic, LCP without max R and earliest-start heuristics.

Table 3: Additional statistics on LCP. (a) The distance of LCP from an ideal allocator: # registers allocated − maxlive versus % of total loops (surviving values: 73.6%, and 8.5% for a distance > 5). (b) Spilling in LCP per loop iteration: # lifetimes spilled versus % of total loops (surviving values: 96.0%, and 0.6% for > 5 spilled lifetimes).

RS2 rotates operations around the loop back edge until a tight schedule is formed. DESP divides the scheduling process into two steps to lower the scheduling complexity: the first step finds a schedule respecting dependence constraints only, and the second step respects only local dependences and resource constraints. JITSP is an in-house method that improves both RS2 and DESP in many aspects to make the scheduling process converge to optimal solutions quickly, and thus makes it feasible for dynamic compilers.

The target architecture is a VLIW processor similar to Transmeta Efficeon [12], but besides the 14 static alias registers, it also has a variable-sized rotating alias register file. The rotating alias register file has not been implemented in any commercial processor yet, and thus a product-quality functional simulator is used in the experiments here. The processor translates X86 instructions by the CMS inside into its internal VLIW instructions, and then runs them. We perform experiments with the loops in SPEC2000 benchmarks.

The overall compile flow is shown in Fig. 5. Our software pipelining module has 3 phases: scheduling, rotating alias register allocation, and code generation. It first schedules the loop to expose parallelism. Then it allocates the rotating registers for the software-pipelined schedule. Finally, it generates code.

In the experiments, the first phase uses JITSP to generate optimal or near-optimal software-pipelined schedules rapidly. In the second phase, for every pipelined schedule, we apply to it each of the above algorithms, LCP, RS2, DESP and JITSP, in order to compare their effectiveness for allocating the rotating registers⁵. Note that JITSP is used in both phases, and the majority of its implementation is shared by them.
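To make the LCP allocator used in the second phase more concrete, the local compaction step of Section 4.4 with the earliest-start heuristic might be sketched as follows. This is a simplified illustration, not the CMS implementation. It makes three assumptions: a lifetime is a half-open interval of time steps, the only resource constraint is that lifetimes sharing a register must not overlap in time, and a local dependence (a, b) means that a must be allocated before b, to the same or a lower-indexed register. The names `Lifetime`, `local_compaction` and `earliest_start` are hypothetical, and the interval values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lifetime:
    name: str
    start: int  # first time step at which the lifetime is live
    end: int    # first time step at which it is no longer live


def overlaps(x, y):
    """True if the two lifetimes are live at a common time step."""
    return x.start < y.end and y.start < x.end


def earliest_start(lt):
    """Earliest start first; ties go to the shorter lifetime."""
    return (lt.start, lt.end - lt.start)


def local_compaction(lifetimes, local_deps, priority=earliest_start):
    """Assign a register index to every lifetime of one loop iteration.

    local_deps holds (a, b) name pairs: a must be allocated before b,
    to the same or a lower-indexed register (an assumption of this sketch).
    """
    preds = {lt.name: set() for lt in lifetimes}
    for a, b in local_deps:
        preds[b].add(a)
    pending = {lt.name: lt for lt in lifetimes}
    reg = {}
    r = 0
    while pending:
        in_this_reg = []
        while True:
            # Ready: all local predecessors placed, and no time overlap
            # with the lifetimes already packed into register r.
            ready = [
                lt for lt in pending.values()
                if all(p in reg for p in preds[lt.name])
                and not any(overlaps(lt, other) for other in in_this_reg)
            ]
            if not ready:
                break
            best = min(ready, key=priority)
            reg[best.name] = r
            in_this_reg.append(best)
            del pending[best.name]
        if not in_this_reg:
            raise ValueError("local dependences form a circuit")
        r += 1
    return reg


# The situation of Fig. 4: a starts before b and the two overlap in time.
a, b = Lifetime("a", 0, 4), Lifetime("b", 1, 5)
with_heuristic = local_compaction([a, b], local_deps=[])
without_heuristic = local_compaction([a, b], [], priority=lambda lt: -lt.start)
```

With the earliest-start priority, a lands in register 0 and b in register 1, as in Fig. 4b. Reversing the priority puts b in register 0 and a in register 1, which is the allocation of Fig. 4a where b would check a.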
There are only minor necessary differences between the two uses of JITSP: mainly, in the allocation phase, we need to encapsulate a lifetime as an operation, a register as a time step, and a time step as a resource, in order to reuse the scheduling algorithm.

⁵ The settings of RS2 are δ = 8 and ρ = 1, and at most 5 rotations are allowed.

It should be emphasized here that the scheduling and the alias register allocation phases are independent: the allocation phase processes any schedule generated by any software pipelining algorithm, without favoring any of them. Although our allocation solution is proposed and tested in a specific context, it is essentially independent of any scheduling algorithm, and is feasible for any compiler, static or dynamic.

The effectiveness of an algorithm is evaluated in the following aspects: given the same number of rotating registers, how many loops can be successfully allocated all the registers they need? How many lifetimes, in a single loop iteration, are spilled afterwards during Step 3 of the algorithmic framework (Section 4.3)? And how fast is the algorithm?

To evaluate how good a register allocation is, we assume there is an ideal allocator, which allocates exactly maxlive registers. MaxLive is the maximum number of lifetimes live simultaneously in a software-pipelined schedule. It is a lower bound of the number of registers to be allocated for the schedule. This is an ideal bound and may or may not be achievable. Any allocation, even if it is optimal, needs at least maxlive registers. The closer the number of registers allocated is to maxlive, the better the allocation is.

The experiments have two major results:

1. First, the experiments have validated the correctness of our problem formulation and solution. For a piece of X86 code, our simulator can run the X86 code and the VLIW instructions generated by CMS from it, and periodically compare the memory and register contents between them. This is known as co-simulation. Any discrepancy will be reported.
We have not seen any discrepancy so far.

2. Second, the experiments show that LCP is usually the most effective among all the algorithms. It generates ideal or near-ideal allocations for most loops. It minimizes spills very well, and it is faster than all the other algorithms.

In addition, we also show that rotating alias registers are important to enable instruction-level parallelism to be exploited from loops.

There are 11,825 software-pipelined loops for which we perform rotating register allocation. Table 1 characterizes the dependence graphs of the lifetimes. A graph has 2 to 8 nodes. The median and mean are 6 and 9.64, respectively. Since the median is less than the mean, the number of nodes skews towards small numbers: the graph tends to have a small number of nodes. This skewed distribution appears in the other characteristics as well. For example, a graph can have 1 to 3933 dependence edges, while the median and mean are 7 and 79.2, which indicates that the graph tends to have a small number of dependences. Among the dependences, most are loop-carried: on average, there are only 11.2 local dependences, but 68.0 loop-carried ones. Their maxima are even more strikingly apart: at most, there are 369 local dependences, but there can be 3917 loop-carried ones. So in general, there are far fewer local dependences than loop-carried dependences.

There are usually few missing local dependences. Although a graph can have up to 96 missing local dependences, the median is 0, and the mean is 0.42. Those indicate that it is extremely rare to have any missing local dependences.

Table 1 also characterizes the distribution of maxlive. It ranges from 1 to 849, with a median of 3. This suggests that usually, the number of lifetimes live simultaneously tends to be small.

Fig. 6 shows the cumulative distribution of the number of rotating registers. For a given number of rotating registers, it shows the percentage of the software-pipelined loops whose requirement of rotating registers can be met.
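A point on such a cumulative distribution is simple to compute. The sketch below assumes a hypothetical list holding, for each loop, the number of rotating registers an allocator needed. The function names are invented, and the sample values are made up for illustration; they are not the paper's data.

```python
def percent_loops_covered(registers_needed, available):
    """Percentage of loops whose register requirement fits in `available`."""
    if not registers_needed:
        raise ValueError("no loops to summarize")
    covered = sum(1 for need in registers_needed if need <= available)
    return 100.0 * covered / len(registers_needed)


def cumulative_distribution(registers_needed, register_counts):
    """One (register count, percentage covered) point per requested count."""
    return [(n, percent_loops_covered(registers_needed, n)) for n in register_counts]


# Invented per-loop requirements, for illustration only.
sample_needs = [3, 5, 8, 12, 70]
curve = cumulative_distribution(sample_needs, [4, 16, 64, 128])
```

Feeding the same function with maxlive per loop instead of the allocated register count would give the curve of the ideal allocator, so the gap between the two curves shows how far an algorithm is from the lower bound.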
All the algorithms, except RS2, have close-to-ideal results. For example, given 64 rotating registers, the percentages of the loops covered by an ideal allocator, LCP, DESP, and JITSP are 97.0%, 95.9%, 94.1%, and 94.0%, respectively. The differences are small, but LCP is noticeably better than the others, except the ideal allocator. In contrast, RS2 covers only 77.5% of the loops.

The reason why RS2 performs significantly worse is that it aggressively schedules lifetimes in order to minimize R, the initiation interval. That is the main target of general-purpose software pipelining. As discussed in Section 4.4, this is not necessarily good for register allocation. We observe from the experiments that it seems to be a general phenomenon: for this specific register allocation


A Novel Bit Level Time Series Representation with Implication of Similarity Search and Clustering A Novel Bit Level Time Series Representation with Impliation of Similarity Searh and lustering hotirat Ratanamahatana, Eamonn Keogh, Anthony J. Bagnall 2, and Stefano Lonardi Dept. of omputer Siene & Engineering,

More information

Chapter 2: Introduction to Maple V

Chapter 2: Introduction to Maple V Chapter 2: Introdution to Maple V 2-1 Working with Maple Worksheets Try It! (p. 15) Start a Maple session with an empty worksheet. The name of the worksheet should be Untitled (1). Use one of the standard

More information

A Dual-Hamiltonian-Path-Based Multicasting Strategy for Wormhole-Routed Star Graph Interconnection Networks

A Dual-Hamiltonian-Path-Based Multicasting Strategy for Wormhole-Routed Star Graph Interconnection Networks A Dual-Hamiltonian-Path-Based Multiasting Strategy for Wormhole-Routed Star Graph Interonnetion Networks Nen-Chung Wang Department of Information and Communiation Engineering Chaoyang University of Tehnology,

More information

Pipelined Multipliers for Reconfigurable Hardware

Pipelined Multipliers for Reconfigurable Hardware Pipelined Multipliers for Reonfigurable Hardware Mithell J. Myjak and José G. Delgado-Frias Shool of Eletrial Engineering and Computer Siene, Washington State University Pullman, WA 99164-2752 USA {mmyjak,

More information

Compilation Lecture 11a. Register Allocation Noam Rinetzky. Text book: Modern compiler implementation in C Andrew A.

Compilation Lecture 11a. Register Allocation Noam Rinetzky. Text book: Modern compiler implementation in C Andrew A. Compilation 0368-3133 Leture 11a Text book: Modern ompiler implementation in C Andrew A. Appel Register Alloation Noam Rinetzky 1 Registers Dediated memory loations that an be aessed quikly, an have omputations

More information

An Efficient and Scalable Approach to CNN Queries in a Road Network

An Efficient and Scalable Approach to CNN Queries in a Road Network An Effiient and Salable Approah to CNN Queries in a Road Network Hyung-Ju Cho Chin-Wan Chung Dept. of Eletrial Engineering & Computer Siene Korea Advaned Institute of Siene and Tehnology 373- Kusong-dong,

More information

Constructing Transaction Serialization Order for Incremental. Data Warehouse Refresh. Ming-Ling Lo and Hui-I Hsiao. IBM T. J. Watson Research Center

Constructing Transaction Serialization Order for Incremental. Data Warehouse Refresh. Ming-Ling Lo and Hui-I Hsiao. IBM T. J. Watson Research Center Construting Transation Serialization Order for Inremental Data Warehouse Refresh Ming-Ling Lo and Hui-I Hsiao IBM T. J. Watson Researh Center July 11, 1997 Abstrat In typial pratie of data warehouse, the

More information

The AMDREL Project in Retrospective

The AMDREL Project in Retrospective The AMDREL Projet in Retrospetive K. Siozios 1, G. Koutroumpezis 1, K. Tatas 1, N. Vassiliadis 2, V. Kalenteridis 2, H. Pournara 2, I. Pappas 2, D. Soudris 1, S. Nikolaidis 2, S. Siskos 2, and A. Thanailakis

More information

Semi-Supervised Affinity Propagation with Instance-Level Constraints

Semi-Supervised Affinity Propagation with Instance-Level Constraints Semi-Supervised Affinity Propagation with Instane-Level Constraints Inmar E. Givoni, Brendan J. Frey Probabilisti and Statistial Inferene Group University of Toronto 10 King s College Road, Toronto, Ontario,

More information

Definitions Homework. Quine McCluskey Optimal solutions are possible for some large functions Espresso heuristic. Definitions Homework

Definitions Homework. Quine McCluskey Optimal solutions are possible for some large functions Espresso heuristic. Definitions Homework EECS 33 There be Dragons here http://ziyang.ees.northwestern.edu/ees33/ Teaher: Offie: Email: Phone: L477 Teh dikrp@northwestern.edu 847 467 2298 Today s material might at first appear diffiult Perhaps

More information

Interconnection Styles

Interconnection Styles Interonnetion tyles oftware Design Following the Export (erver) tyle 2 M1 M4 M5 4 M3 M6 1 3 oftware Design Following the Export (Client) tyle e 2 e M1 M4 M5 4 M3 M6 1 e 3 oftware Design Following the Export

More information

Detection and Recognition of Non-Occluded Objects using Signature Map

Detection and Recognition of Non-Occluded Objects using Signature Map 6th WSEAS International Conferene on CIRCUITS, SYSTEMS, ELECTRONICS,CONTROL & SIGNAL PROCESSING, Cairo, Egypt, De 9-31, 007 65 Detetion and Reognition of Non-Oluded Objets using Signature Map Sangbum Park,

More information

An Evaluation of Automatic and Interactive Parallel Programming Tools

An Evaluation of Automatic and Interactive Parallel Programming Tools An Evaluation of Automati and Interative Parallel Programming Tools Doreen Y Cheng Computer Siene Co NASA Ames Researh Center MS 258-6 Moffett Field, CA 9435 Douglas M Pase Formerly at NASA (CSC) Cray

More information

The Minimum Redundancy Maximum Relevance Approach to Building Sparse Support Vector Machines

The Minimum Redundancy Maximum Relevance Approach to Building Sparse Support Vector Machines The Minimum Redundany Maximum Relevane Approah to Building Sparse Support Vetor Mahines Xiaoxing Yang, Ke Tang, and Xin Yao, Nature Inspired Computation and Appliations Laboratory (NICAL), Shool of Computer

More information

Multi-Channel Wireless Networks: Capacity and Protocols

Multi-Channel Wireless Networks: Capacity and Protocols Multi-Channel Wireless Networks: Capaity and Protools Tehnial Report April 2005 Pradeep Kyasanur Dept. of Computer Siene, and Coordinated Siene Laboratory, University of Illinois at Urbana-Champaign Email:

More information

Graph-Based vs Depth-Based Data Representation for Multiview Images

Graph-Based vs Depth-Based Data Representation for Multiview Images Graph-Based vs Depth-Based Data Representation for Multiview Images Thomas Maugey, Antonio Ortega, Pasal Frossard Signal Proessing Laboratory (LTS), Eole Polytehnique Fédérale de Lausanne (EPFL) Email:

More information

Detecting Outliers in High-Dimensional Datasets with Mixed Attributes

Detecting Outliers in High-Dimensional Datasets with Mixed Attributes Deteting Outliers in High-Dimensional Datasets with Mixed Attributes A. Koufakou, M. Georgiopoulos, and G.C. Anagnostopoulos 2 Shool of EECS, University of Central Florida, Orlando, FL, USA 2 Dept. of

More information

C 2 C 3 C 1 M S. f e. e f (3,0) (0,1) (2,0) (-1,1) (1,0) (-1,0) (1,-1) (0,-1) (-2,0) (-3,0) (0,-2)

C 2 C 3 C 1 M S. f e. e f (3,0) (0,1) (2,0) (-1,1) (1,0) (-1,0) (1,-1) (0,-1) (-2,0) (-3,0) (0,-2) SPECIAL ISSUE OF IEEE TRANSACTIONS ON ROBOTICS AND AUTOMATION: MULTI-ROBOT SSTEMS, 00 Distributed reonfiguration of hexagonal metamorphi robots Jennifer E. Walter, Jennifer L. Welh, and Nany M. Amato Abstrat

More information

Alleviating DFT cost using testability driven HLS

Alleviating DFT cost using testability driven HLS Alleviating DFT ost using testability driven HLS M.L.Flottes, R.Pires, B.Rouzeyre Laboratoire d Informatique, de Robotique et de Miroéletronique de Montpellier, U.M. CNRS 5506 6 rue Ada, 34392 Montpellier

More information

DECT Module Installation Manual

DECT Module Installation Manual DECT Module Installation Manual Rev. 2.0 This manual desribes the DECT module registration method to the HUB and fan airflow settings. In order for the HUB to ommuniate with a ompatible fan, the DECT module

More information

Self-Adaptive Parent to Mean-Centric Recombination for Real-Parameter Optimization

Self-Adaptive Parent to Mean-Centric Recombination for Real-Parameter Optimization Self-Adaptive Parent to Mean-Centri Reombination for Real-Parameter Optimization Kalyanmoy Deb and Himanshu Jain Department of Mehanial Engineering Indian Institute of Tehnology Kanpur Kanpur, PIN 86 {deb,hjain}@iitk.a.in

More information

Reducing Runtime Complexity of Long-Running Application Services via Dynamic Profiling and Dynamic Bytecode Adaptation for Improved Quality of Service

Reducing Runtime Complexity of Long-Running Application Services via Dynamic Profiling and Dynamic Bytecode Adaptation for Improved Quality of Service Reduing Runtime Complexity of Long-Running Appliation Servies via Dynami Profiling and Dynami Byteode Adaptation for Improved Quality of Servie ABSTRACT John Bergin Performane Engineering Laboratory University

More information

Folding. Hardware Mapped vs. Time multiplexed. Folding by N (N=folding factor) Node A. Unfolding by J A 1 A J-1. Time multiplexed/microcoded

Folding. Hardware Mapped vs. Time multiplexed. Folding by N (N=folding factor) Node A. Unfolding by J A 1 A J-1. Time multiplexed/microcoded Folding is verse of Unfolding Node A A Folding by N (N=folding fator) Folding A Unfolding by J A A J- Hardware Mapped vs. Time multiplexed l Hardware Mapped vs. Time multiplexed/mirooded FI : y x(n) h

More information

DETECTION METHOD FOR NETWORK PENETRATING BEHAVIOR BASED ON COMMUNICATION FINGERPRINT

DETECTION METHOD FOR NETWORK PENETRATING BEHAVIOR BASED ON COMMUNICATION FINGERPRINT DETECTION METHOD FOR NETWORK PENETRATING BEHAVIOR BASED ON COMMUNICATION FINGERPRINT 1 ZHANGGUO TANG, 2 HUANZHOU LI, 3 MINGQUAN ZHONG, 4 JIAN ZHANG 1 Institute of Computer Network and Communiation Tehnology,

More information

Unsupervised Stereoscopic Video Object Segmentation Based on Active Contours and Retrainable Neural Networks

Unsupervised Stereoscopic Video Object Segmentation Based on Active Contours and Retrainable Neural Networks Unsupervised Stereosopi Video Objet Segmentation Based on Ative Contours and Retrainable Neural Networks KLIMIS NTALIANIS, ANASTASIOS DOULAMIS, and NIKOLAOS DOULAMIS National Tehnial University of Athens

More information

Robust Dynamic Provable Data Possession

Robust Dynamic Provable Data Possession Robust Dynami Provable Data Possession Bo Chen Reza Curtmola Department of Computer Siene New Jersey Institute of Tehnology Newark, USA Email: b47@njit.edu, rix@njit.edu Abstrat Remote Data Cheking (RDC)

More information

Divide-and-conquer algorithms 1

Divide-and-conquer algorithms 1 * 1 Multipliation Divide-and-onquer algorithms 1 The mathematiian Gauss one notied that although the produt of two omplex numbers seems to! involve four real-number multipliations it an in fat be done

More information

Plot-to-track correlation in A-SMGCS using the target images from a Surface Movement Radar

Plot-to-track correlation in A-SMGCS using the target images from a Surface Movement Radar Plot-to-trak orrelation in A-SMGCS using the target images from a Surfae Movement Radar G. Golino Radar & ehnology Division AMS, Italy ggolino@amsjv.it Abstrat he main topi of this paper is the formulation

More information

OvidSP Quick Reference Card

OvidSP Quick Reference Card OvidSP Quik Referene Card Searh in any of several dynami modes, ombine results, apply limits, use improved researh tools, develop strategies, save searhes, set automati alerts and RSS feeds, share results...

More information

And, the (low-pass) Butterworth filter of order m is given in the frequency domain by

And, the (low-pass) Butterworth filter of order m is given in the frequency domain by Problem Set no.3.a) The ideal low-pass filter is given in the frequeny domain by B ideal ( f ), f f; =, f > f. () And, the (low-pass) Butterworth filter of order m is given in the frequeny domain by B

More information

the data. Structured Principal Component Analysis (SPCA)

the data. Structured Principal Component Analysis (SPCA) Strutured Prinipal Component Analysis Kristin M. Branson and Sameer Agarwal Department of Computer Siene and Engineering University of California, San Diego La Jolla, CA 9193-114 Abstrat Many tasks involving

More information

represent = as a finite deimal" either in base 0 or in base. We an imagine that the omputer first omputes the mathematial = then rounds the result to

represent = as a finite deimal either in base 0 or in base. We an imagine that the omputer first omputes the mathematial = then rounds the result to Sientifi Computing Chapter I Computer Arithmeti Jonathan Goodman Courant Institute of Mathemaial Sienes Last revised January, 00 Introdution One of the many soures of error in sientifi omputing is inexat

More information

Space- and Time-Efficient BDD Construction via Working Set Control

Space- and Time-Efficient BDD Construction via Working Set Control Spae- and Time-Effiient BDD Constrution via Working Set Control Bwolen Yang Yirng-An Chen Randal E. Bryant David R. O Hallaron Computer Siene Department Carnegie Mellon University Pittsburgh, PA 15213.

More information

Design Implications for Enterprise Storage Systems via Multi-Dimensional Trace Analysis

Design Implications for Enterprise Storage Systems via Multi-Dimensional Trace Analysis Design Impliations for Enterprise Storage Systems via Multi-Dimensional Trae Analysis Yanpei Chen, Kiran Srinivasan, Garth Goodson, Randy Katz University of California, Berkeley, NetApp In. {yhen2, randy}@ees.berkeley.edu,

More information

Recommendation Subgraphs for Web Discovery

Recommendation Subgraphs for Web Discovery Reommation Subgraphs for Web Disovery Arda Antikaioglu Department of Mathematis Carnegie Mellon University aantika@andrew.mu.edu R. Ravi Tepper Shool of Business Carnegie Mellon University ravi@mu.edu

More information

Dynamic Backlight Adaptation for Low Power Handheld Devices 1

Dynamic Backlight Adaptation for Low Power Handheld Devices 1 Dynami Baklight Adaptation for ow Power Handheld Devies 1 Sudeep Pasriha, Manev uthra, Shivajit Mohapatra, Nikil Dutt and Nalini Venkatasubramanian 444, Computer Siene Building, Shool of Information &

More information

SVC-DASH-M: Scalable Video Coding Dynamic Adaptive Streaming Over HTTP Using Multiple Connections

SVC-DASH-M: Scalable Video Coding Dynamic Adaptive Streaming Over HTTP Using Multiple Connections SVC-DASH-M: Salable Video Coding Dynami Adaptive Streaming Over HTTP Using Multiple Connetions Samar Ibrahim, Ahmed H. Zahran and Mahmoud H. Ismail Department of Eletronis and Eletrial Communiations, Faulty

More information

Volume 3, Issue 9, September 2013 International Journal of Advanced Research in Computer Science and Software Engineering

Volume 3, Issue 9, September 2013 International Journal of Advanced Research in Computer Science and Software Engineering Volume 3, Issue 9, September 2013 ISSN: 2277 128X International Journal of Advaned Researh in Computer Siene and Software Engineering Researh Paper Available online at: www.ijarsse.om A New-Fangled Algorithm

More information

Adapting K-Medians to Generate Normalized Cluster Centers

Adapting K-Medians to Generate Normalized Cluster Centers Adapting -Medians to Generate Normalized Cluster Centers Benamin J. Anderson, Deborah S. Gross, David R. Musiant Anna M. Ritz, Thomas G. Smith, Leah E. Steinberg Carleton College andersbe@gmail.om, {dgross,

More information

Tackling IPv6 Address Scalability from the Root

Tackling IPv6 Address Scalability from the Root Takling IPv6 Address Salability from the Root Mei Wang Ashish Goel Balaji Prabhakar Stanford University {wmei, ashishg, balaji}@stanford.edu ABSTRACT Internet address alloation shemes have a huge impat

More information

Parallelization and Performance of 3D Ultrasound Imaging Beamforming Algorithms on Modern Clusters

Parallelization and Performance of 3D Ultrasound Imaging Beamforming Algorithms on Modern Clusters Parallelization and Performane of 3D Ultrasound Imaging Beamforming Algorithms on Modern Clusters F. Zhang, A. Bilas, A. Dhanantwari, K.N. Plataniotis, R. Abiprojo, and S. Stergiopoulos Dept. of Eletrial

More information

Direct-Mapped Caches

Direct-Mapped Caches A Case for Diret-Mapped Cahes Mark D. Hill University of Wisonsin ahe is a small, fast buffer in whih a system keeps those parts, of the ontents of a larger, slower memory that are likely to be used soon.

More information

Video Data and Sonar Data: Real World Data Fusion Example

Video Data and Sonar Data: Real World Data Fusion Example 14th International Conferene on Information Fusion Chiago, Illinois, USA, July 5-8, 2011 Video Data and Sonar Data: Real World Data Fusion Example David W. Krout Applied Physis Lab dkrout@apl.washington.edu

More information

Distributed Resource Allocation Strategies for Achieving Quality of Service in Server Clusters

Distributed Resource Allocation Strategies for Achieving Quality of Service in Server Clusters Proeedings of the 45th IEEE Conferene on Deision & Control Manhester Grand Hyatt Hotel an Diego, CA, UA, Deember 13-15, 2006 Distributed Resoure Alloation trategies for Ahieving Quality of ervie in erver

More information

Cluster Centric Fuzzy Modeling

Cluster Centric Fuzzy Modeling 10.1109/TFUZZ.014.300134, IEEE Transations on Fuzzy Systems TFS-013-0379.R1 1 Cluster Centri Fuzzy Modeling Witold Pedryz, Fellow, IEEE, and Hesam Izakian, Student Member, IEEE Abstrat In this study, we

More information

We don t need no generation - a practical approach to sliding window RLNC

We don t need no generation - a practical approach to sliding window RLNC We don t need no generation - a pratial approah to sliding window RLNC Simon Wunderlih, Frank Gabriel, Sreekrishna Pandi, Frank H.P. Fitzek Deutshe Telekom Chair of Communiation Networks, TU Dresden, Dresden,

More information

Type of document: Usebility Checklist

Type of document: Usebility Checklist Projet: JEGraph Type of doument: Usebility Cheklist Author: Max Bryan Version: 1.30 2011 Envidate GmbH Type of Doumet Developer guidelines User guidelines Dutybook Speifiation Programming and testing Test

More information

with respect to the normal in each medium, respectively. The question is: How are θ

with respect to the normal in each medium, respectively. The question is: How are θ Prof. Raghuveer Parthasarathy University of Oregon Physis 35 Winter 8 3 R EFRACTION When light travels from one medium to another, it may hange diretion. This phenomenon familiar whenever we see the bent

More information

timestamp, if silhouette(x, y) 0 0 if silhouette(x, y) = 0, mhi(x, y) = and mhi(x, y) < timestamp - duration mhi(x, y), else

timestamp, if silhouette(x, y) 0 0 if silhouette(x, y) = 0, mhi(x, y) = and mhi(x, y) < timestamp - duration mhi(x, y), else 3rd International Conferene on Multimedia Tehnolog(ICMT 013) An Effiient Moving Target Traking Strateg Based on OpenCV and CAMShift Theor Dongu Li 1 Abstrat Image movement involved bakground movement and

More information

Improved Circuit-to-CNF Transformation for SAT-based ATPG

Improved Circuit-to-CNF Transformation for SAT-based ATPG Improved Ciruit-to-CNF Transformation for SAT-based ATPG Daniel Tille 1 René Krenz-Bååth 2 Juergen Shloeffel 2 Rolf Drehsler 1 1 Institute of Computer Siene, University of Bremen, 28359 Bremen, Germany

More information

UCSB Math TI-85 Tutorials: Basics

UCSB Math TI-85 Tutorials: Basics 3 UCSB Math TI-85 Tutorials: Basis If your alulator sreen doesn t show anything, try adjusting the ontrast aording to the instrutions on page 3, or page I-3, of the alulator manual You should read the

More information

Trajectory Tracking Control for A Wheeled Mobile Robot Using Fuzzy Logic Controller

Trajectory Tracking Control for A Wheeled Mobile Robot Using Fuzzy Logic Controller Trajetory Traking Control for A Wheeled Mobile Robot Using Fuzzy Logi Controller K N FARESS 1 M T EL HAGRY 1 A A EL KOSY 2 1 Eletronis researh institute, Cairo, Egypt 2 Faulty of Engineering, Cairo University,

More information

Triangles. Learning Objectives. Pre-Activity

Triangles. Learning Objectives. Pre-Activity Setion 3.2 Pre-tivity Preparation Triangles Geena needs to make sure that the dek she is building is perfetly square to the brae holding the dek in plae. How an she use geometry to ensure that the boards

More information

Smooth Trajectory Planning Along Bezier Curve for Mobile Robots with Velocity Constraints

Smooth Trajectory Planning Along Bezier Curve for Mobile Robots with Velocity Constraints Smooth Trajetory Planning Along Bezier Curve for Mobile Robots with Veloity Constraints Gil Jin Yang and Byoung Wook Choi Department of Eletrial and Information Engineering Seoul National University of

More information

FOREGROUND OBJECT EXTRACTION USING FUZZY C MEANS WITH BIT-PLANE SLICING AND OPTICAL FLOW

FOREGROUND OBJECT EXTRACTION USING FUZZY C MEANS WITH BIT-PLANE SLICING AND OPTICAL FLOW FOREGROUND OBJECT EXTRACTION USING FUZZY C EANS WITH BIT-PLANE SLICING AND OPTICAL FLOW SIVAGAI., REVATHI.T, JEGANATHAN.L 3 APSG, SCSE, VIT University, Chennai, India JRF, DST, Dehi, India. 3 Professor,

More information

The recursive decoupling method for solving tridiagonal linear systems

The recursive decoupling method for solving tridiagonal linear systems Loughborough University Institutional Repository The reursive deoupling method for solving tridiagonal linear systems This item was submitted to Loughborough University's Institutional Repository by the/an

More information