
Influence of Cross-Interferences on Blocked Loops: A Case Study with Matrix-Vector Multiply

CHRISTINE FRICKER
INRIA, France
and
OLIVIER TEMAM and WILLIAM JALBY
University of Versailles, France

State-of-the-art data locality optimizing algorithms are targeted for local memories rather than for cache memories. Recent work on cache interferences seems to indicate that these phenomena can severely affect the cache performance of blocked algorithms. Because of cache conflicts, it is not possible to know the precise gain brought by blocking. It is even difficult to determine for which problem sizes blocking is useful. Computing the actual optimal block size is difficult because cache conflicts are highly irregular. In this article, we illustrate the issue of precisely evaluating cross-interferences in blocked loops with blocked matrix-vector multiply. Most significant interference phenomena are captured because unusual parameters such as array base addresses are being considered. The techniques used allow us to compute the precise improvement due to blocking and the threshold value of problem parameters for which the blocked loop should be preferred. It is also possible to derive an expression of the optimal block size as a function of problem parameters. Finally, it is shown that a precise rather than an approximate evaluation of cache conflicts is sometimes necessary to obtain near-optimal performance.

Categories and Subject Descriptors: B.3.0 [Memory Structures]: General; C.4 [Computer Systems Organization]: Performance of Systems, modeling techniques; D.3.4 [Programming Languages]: Processors
General Terms: Measurement, Performance
Additional Key Words and Phrases: Blocking, cache conflicts (interferences), cache performance, data locality optimization, numerical codes

1. INTRODUCTION

To date, data locality optimizing algorithms [Eisenbeis et al. 1990; Ferrante et al. 1991; McKinley 1992; Porterfield 1989; Wolf and Lam 1991] have been concerned with decreasing capacity misses using blocking and have mostly ignored the occurrence of conflict misses.
However, previous studies [Ferrante et al. 1991; Lam et al. 1991] showed that conflict misses can significantly alter the behavior of blocked algorithms. More precisely, self-interferences in blocked loops [Lam et al. 1991] have been shown to be sensitive to the choice of the optimal block size. A data locality optimization technique which combines tile size optimization and copying has also been proposed [Esseghir 1993] as a way to reduce self-interferences in numerical loops.

This work was funded by the DGXIII ESPRIT BRA III Project APPARC.
Authors' addresses: C. Fricker, INRIA, 78153 Le Chesnay, France; Christine.Fricker@inria.fr; O. Temam, PRiSM, University of Versailles, Versailles, France; temam@prism.uvsq.fr; W. Jalby, PRiSM, University of Versailles, Versailles, France; jalby@prism.uvsq.fr.

    DO j1=0,N-1
      reg = Y(j1)
      DO j2=0,N-1
        reg += A(j2,j1) * X(j2)
      ENDDO
      Y(j1) = reg
    ENDDO

    DO jj2=0,N-1,B
      DO j1=0,N-1
        reg = Y(j1)
        DO j2=jj2,min(jj2+B-1,N-1)
          reg += A(j2,j1) * X(j2)
        ENDDO
        Y(j1) = reg
      ENDDO
    ENDDO

Fig. 1. Nonblocked (top) and blocked (bottom) matrix-vector multiply.

Recently, we have developed a model for evaluating conflict misses in numerical loops [Temam et al. 1994] with the purpose of understanding cache interference phenomena and predicting the cache performance of a numerical loop nest. Three different types of interference misses were distinguished: self-interferences, internal cross-interferences (cross-interferences between two references whose subscripts have identical linear expressions), and external cross-interferences (cross-interferences between any two other references). The most frequent and most difficult type of interferences to evaluate are external cross-interferences. We have mentioned in Temam et al. [1993] that two different types of evaluation can be performed, approximate or precise, but up to now we have mostly focused on the approximate evaluation. In this article, precise evaluation of external cross-interferences is shown to be sometimes necessary for computing the near-optimal block size of a numerical loop.

Most data locality optimizing algorithms barely deal with the issue of computing the optimal block size. One of the most elaborate treatments of this problem can be found in Eisenbeis et al. [1990], where the computation of the optimal block size sums up to evaluating the number of capacity misses as a function of the block size, and then finding the block size that minimizes this number.

The purpose of the article is twofold: to provide a detailed illustration of the technique used to derive the precise number of external cross-interference misses, and to show how the precision of the evaluation of conflict misses can affect the determination of the optimal block size and, further, the performance of the loop.

Position of the Problem.
The example used to illustrate the different points developed in this article is the classic numerical algebra primitive Matrix-Vector multiply and its blocked version (see Figure 1). The target architecture considered is an 8KB direct-mapped cache with a line size equal to 32 bytes, which are the parameters of several current processors [Kane and Heinrich 1992; Sites 1992]. All problem parameters are expressed in double-precision floating numbers, i.e., 8 bytes, so that a cache size $C_S$ of 8KB corresponds to $C_S = 1024$, and a line size of 32 bytes to $L_S = 4$.

Notations. $m$ denotes the total number of cache misses. $m_t$, $m_s$ denote the number of temporal and spatial misses. $m_i$ denotes the number of intrinsic misses. $m(T)$ denotes the total number of cache misses for array $T$. The notations $m_t(T)$, $m_s(T)$, $m_i(T)$ can also be deduced. Furthermore, $m(T_1, T_2)$ denotes the number of misses of $T_1$ due to interferences with $T_2$.
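To make the setting concrete, the following is a minimal sketch of a direct-mapped cache simulation for the two loops of Figure 1, written in Python. It is not the simulator used by the authors. It assumes addresses counted in 8-byte elements, a column-major array A with leading dimension M, writes to Y treated like reads, and a hypothetical default layout where X follows A and Y follows X; the function name `simulate_mxv` and its parameters are illustrative only.

```python
def simulate_mxv(N, B, M=None, a0=0, x0=None, y0=None, CS=1024, LS=4):
    """Count misses per array for blocked matrix-vector multiply.

    CS and LS are the cache size and line size in array elements.
    With B == N the jj2 loop runs once, which is the nonblocked loop.
    """
    M = N if M is None else M
    x0 = a0 + M * N if x0 is None else x0   # assumed layout: X right after A
    y0 = x0 + N if y0 is None else y0       # assumed layout: Y right after X
    n_lines = CS // LS
    tags = [None] * n_lines                 # one memory line per cache slot
    misses = {"A": 0, "X": 0, "Y": 0}

    def access(name, addr):
        line = addr // LS                   # memory line holding the element
        slot = line % n_lines               # direct-mapped placement
        if tags[slot] != line:
            tags[slot] = line
            misses[name] += 1

    for jj2 in range(0, N, B):
        for j1 in range(N):
            access("Y", y0 + j1)            # reg = Y(j1)
            for j2 in range(jj2, min(jj2 + B, N)):
                access("A", a0 + j2 + M * j1)
                access("X", x0 + j2)
            access("Y", y0 + j1)            # Y(j1) = reg
    return misses


if __name__ == "__main__":
    print(simulate_mxv(16, 16))             # small problem, everything fits in cache
```

When the three arrays fit in cache without overlapping, only the intrinsic (compulsory) misses remain, which gives a simple sanity check before exploring larger values of N, M, and B.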

Experiments. Throughout the article, the actual number of misses is obtained through simulations using a simulator developed for that purpose.

2. ESTIMATING THE NUMBER OF CACHE MISSES

Because of paper length constraints, this section is restricted to studying the external cross-interferences between array A and array X. A treatment of other external cross-interferences in the loop can be found in Fricker et al. [1993]. External cross-interferences basically correspond to the data reused by a reference being flushed from cache by another reference, where the two references have subscripts with distinct linear expressions. The set of data to be reused by the victim reference is called the reuse set, and the set of data interfering with this reuse set is called the interference set. These sets are defined on the loop level where the reuse occurs. So for arrays X and A in the blocked loop nest, the reuse loop is loop $j_1$ (for X), and the reuse set (of X) and the interference set (of A) both correspond to a set of $B$ array elements or $B/L_S$ cache lines. The problem sums up to studying the relative cache position of the two sets and to computing the size of their intersection when they overlap. When the intersection size is expressed in cache lines, it exactly corresponds to the number of conflict misses between the two references.

2.1 Interferences between X and A

Let us now study the relative cache position of the reuse set of X and the interference set of A. The positions of the beginning of these two sets are respectively

$$R_X = x_0 + j_2, \qquad R_A = a_0 + j_2 + M j_1.$$

Therefore, the relative position of the interference set with respect to the reuse set is the following:

$$R_{XA} = a_0 - x_0 + M j_1.$$

Possible Relative Cache Positions of A and X. The first problem is to find all the possible relative positions of X and A, i.e., all the possible values of $R_{XA}$. Since $R_{XA} = a_0 - x_0 + M j_1$, the possible locations are $(a_0 - x_0 + M j_1) \bmod C_S$. Let $\delta = \gcd(M, C_S)$ and $r = (a_0 - x_0) \bmod C_S$. Then, $(a_0 - x_0 + M j_1) \bmod C_S = (r + \delta (M/\delta) j_1) \bmod C_S$.
Therefore, the possible positions are all of the form $R_{XA} = (r + \lambda\delta) \bmod C_S$, $\lambda \in \mathbb{Z}$. The set of values of $\lambda$ corresponding to distinct cache positions is finite. The distance between two consecutive possible cache positions is $\delta$, and the number of distinct cache positions is equal to $C_S/\delta$.

Cache Positions where Interferences Occur. Let us consider the interval $I$ corresponding to $C_S/\delta$ consecutive values of $\lambda$ and defined by $-C_S/2 \le r + \lambda\delta \le C_S/2$. For $\lambda \in I$, interferences occur only if $-B \le r + \lambda\delta \le B$, i.e., if the distance in cache between the beginning of the intervals of A and X belongs to $[-B, B]$ (see Figure 2(a)). The previous inequation can be rewritten as $\lceil (-B - r)/\delta \rceil \le \lambda \le \lfloor (B - r)/\delta \rfloor$. Let $B = B'\delta + b$ with $b = B \bmod \delta$. It is certain interferences occur for $\lambda \in [-B', B' - 1]$, while for $\lambda = -(B' + 1)$ and $\lambda = B'$, interferences may occur depending on the relative values of $b$, $r$, and $\delta$ (this is due to the ceiling and floor functions of the above inequation).

Footnote: It is assumed here that $B \le C_S/2$.

Fig. 2. (a) Cross-interferences between A and X. (b) Miss ratio of X, with the predicted curve, as a function of the dimension N (M = N).

Computing the Number of Temporal Interferences. As mentioned in the previous paragraph, the interferences between A and X recur with a period of $C_S/\delta$. Therefore, the amount of interferences needs to be computed over one period and then multiplied by the number of periods. An approximate number of periods is $N/(C_S/\delta)$. So, in this paragraph, only a chunk of $C_S/\delta$ iterations is considered, e.g., the interval $I$.

For each value of $\lambda \in I$, the distance in cache between the beginning of the intervals of X and A is $|r + \lambda\delta|$. So, the overlapping (expressed in cache locations) is equal to $(B - |r + \lambda\delta|)^+$, where $(x)^+ = \max(x, 0)$. For $\lambda \in [-B', -1]$, the overlapping is equal to $(B + r + \lambda\delta)^+ = B + r + \lambda\delta$, and for $\lambda \in [0, B' - 1]$, it is equal to $(B - r - \lambda\delta)^+ = B - r - \lambda\delta$. For $\lambda = -(B' + 1)$, the overlapping is equal to $(B + r - (B' + 1)\delta)^+ = (b + r - \delta)^+$, and for $\lambda = B'$, the overlapping is equal to $(B - r - B'\delta)^+ = (b - r)^+$. For any other value of $\lambda$ such that $-C_S/2 \le r + \lambda\delta \le C_S/2$, the overlapping is equal to 0. Consequently, for one period of $C_S/\delta$ iterations the number of cache lines that overlap is equal to

$$\frac{1}{L_S}\left[(b + r - \delta)^+ + (b - r)^+ + \sum_{\lambda=0}^{B'-1} (B - r - \lambda\delta) + \sum_{\lambda=-B'}^{-1} (B + r + \lambda\delta)\right]$$

and since

$$\sum_{\lambda=0}^{B'-1} (B - r - \lambda\delta) + \sum_{\lambda=-B'}^{-1} (B + r + \lambda\delta) = -B'^2\delta + 2B'B = \frac{B^2 - b^2}{\delta},$$

the total number of temporal interferences of X due to A is given by

$$m_t(X, A) = \frac{N}{B}\,\frac{N\delta}{C_S L_S}\left[(b + r - \delta)^+ + (b - r)^+ + \frac{B^2 - b^2}{\delta}\right].$$

An intuitive representation of such interferences is indicated on Figure 2(a) (all intervals of A which do not interfere with X have not been represented).
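The closed form for one period can be checked numerically against the definition of the overlapping, $(B - |r + \lambda\delta|)^+$. The sketch below is only an illustration in Python: the function names are hypothetical, $r$ is reduced modulo $\delta$ (an assumption, so that $0 \le r < \delta$ as in the averaging over $r$), and $B \le C_S/2$ is assumed as in the text.

```python
from math import gcd


def pos(x):
    """(x)^+ = max(x, 0)."""
    return max(x, 0)


def overlap_closed_form(B, r, delta):
    """Overlap over one period, in cache locations, with 0 <= r < delta."""
    b = B % delta
    return pos(b + r - delta) + pos(b - r) + (B * B - b * b) // delta


def overlap_brute_force(B, r, delta, CS):
    """Same quantity, summing (B - |r + lam*delta|)^+ over one period."""
    n = CS // delta
    total = 0
    for lam in range(-n, n + 1):
        d = r + lam * delta
        if abs(d) <= CS // 2:               # positions within one period
            total += pos(B - abs(d))
    return total


def temporal_misses_x_due_to_a(N, B, M, a0=0, x0=0, CS=1024, LS=4):
    """m_t(X, A) following the expression above (approximate period count)."""
    delta = gcd(M, CS)
    r = ((a0 - x0) % CS) % delta            # assumption: r taken modulo delta
    periods = N / (CS / delta)
    return (N / B) * periods * overlap_closed_form(B, r, delta) / LS


if __name__ == "__main__":
    # delta = gcd(260, 1024) = 4; print m_t(X, A) for each relative offset r
    for a0 in range(4):
        print(a0, temporal_misses_x_due_to_a(256, 32, 260, a0=a0))
```

Summing the closed form over $r = 0, \ldots, \delta - 1$ gives $B^2$, which is the step used to average $m_t(X, A)$ over the base addresses.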

Average interferences. $m_t(X, A)$ can be averaged over all possible values of $r$, which may vary between 0 and $\delta - 1$. The expression of the average number of interferences is equal to

$$\frac{N}{B}\,\frac{N\delta}{C_S L_S}\,\frac{1}{\delta}\sum_{r=0}^{\delta-1}\left[(b + r - \delta)^+ + (b - r)^+ + \frac{B^2 - b^2}{\delta}\right] = \frac{N^2 B}{C_S L_S}.$$

2.2 Total Number of Cache Misses

In this section, the analytical expressions of the different sources of cache misses are presented. In theory, it is not possible, for one array, to add simply all the associated expressions because of possible redundancy between cross-interferences. However, these redundancies have been ignored because they proved to be negligible in most cases.

Array X. Because Y induces a negligible number of spatial interferences on array X, the term $m_s(X, Y)$ does not figure in the expression of $m(X)$. So, with

$$m(X) = m_i(X) + m_t(X, A) + m_s(X, A) + m_t(X, Y),$$

$m_i(X) = N/L_S$; $m_t(X, A) = N^2 B/(C_S L_S)$; m s (X; A) = N 2 C S (? ) 2 ; m t (X; Y ) = N 2 C S ;

we obtain m(x) = N B C S C S (? ) 2 C S

The variations of $m(X)$ can be very important, essentially because of the variations of $m(X, A)$. The precision of the above estimate is illustrated in Figure 2(b).

Array Y. The expression of the total number of misses for Y, $m(Y)$, is the following:

$$m(Y) = m_i(Y) + m_t(Y, Y) + \min\left(\frac{(2C_S - N)^+}{N}, 1\right)\bigl(m_t(Y, A) + m_t(Y, X)\bigr) + m_s(Y, A) + m_s(Y, X)$$

with $m_i(Y) = N/L_S$; m t (Y; Y ) = N?(N?2(N?C S )+ ) + ; $m_t(Y, A) = \frac{N^2}{B L_S}\min(1, 2B/\delta)$; m t (Y; X)) = N 2 ; m s (Y; A) = m s (Y; X) = N 2 ( C S C? );

we obtain m(y ) = N min (; 2B ) + N?(N?2(N?C S )+ ) + B

Array A. Because array A exhibits no temporal locality, the terms $m_t(A, X)$ and $m_t(A, Y)$ do not appear in the expression of $m(A)$. Besides, Y induces a negligible number of spatial misses on array A (the argument is the same as for array X), so the term $m_s(A, Y)$ has been removed as well. So, with

$$m(A) = m_i(A) + m_s(A, X),$$

$m_i(A) = N^2/L_S$; m s (A; X) = N 2 C S (? ) 2 ;

we obtain m(a) = N 2 (? ) 2 C S

Blocked Matrix-Vector Multiply.
Regarding the whole primitive, the misses of each array are clearly cumulative; therefore it is safe to assert that the expression of $m$, the total number of misses, is the following:

$$m = m(X) + m(Y) + m(A).$$

Because the term $m_t(X, A)$ has a dominant impact on the total miss ratio, the total miss ratio is closely related to the miss ratio of X, as the comparison of Figure 3(a) with Figure 2(b) shows.

Fig. 3. (a) Total miss ratio of blocked matrix-vector multiply (r = 4), with the predicted curve, as a function of the dimension N (M = N). (b) Influence of semiintrinsic misses on global performance: total miss ratio as a function of the block size B ($L_S = 4$), one curve per residue modulo $L_S$.

2.3 Spatial Interferences

Temporal vs. Spatial Interferences. The main source of cache misses are temporal interferences on X due to A: $m_t(X) \simeq (N^2 B)/(C_S L_S)$. Similarly, for spatial interferences m s (X) ' ((N 2 )=C S )(? = ) 2. An upper bound for m s (X) is (N 2 )=C S. So, if $B$ is large enough, $m_s(X) \ll m_t(X)$, i.e., spatial interferences are negligible with respect to temporal interferences. Note that, in opposition to temporal interferences, spatial interferences are independent of $B$, and therefore they do not influence the choice of the optimal block size. As a consequence, spatial interferences will be ignored in the computations of Section 3.

Semiintrinsic Misses. In the nonblocked loop, the reference to A is $R_A = a_0 + j_2 + M j_1$ with $0 \le j_1 < N$ and $0 \le j_2 < N$, i.e., $N$ elements are accessed consecutively; then a stride of $M$ is applied (if $M = N$ all elements are consecutive). In the blocked loop, $R_A = a_0 + j_2 + M j_1 + B\,jj_2$, i.e., the stride of $M$ is applied much more frequently, every $B$ elements. If $L_S$ does not divide $B$, or if the block of $B$ elements is not aligned on a cache line, some elements of A are loaded that do not belong to this block of $B$ elements, i.e., useless elements. Since such elements will only be used after $N$ iterations of loop $j_1$ (i.e., they are unlikely to be kept in cache) or have already been used, they breed additional cache misses that can be termed semiintrinsic misses.
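The notion of useless elements can be illustrated by counting, for a block of $B$ consecutive elements of A starting at a given address, how many loaded elements fall outside the block. The following Python sketch is an illustration under simple assumptions (addresses in elements, whole cache lines loaded on a miss, one block per column); the helper names are hypothetical.

```python
def lines_touched(start, B, LS=4):
    """Number of cache lines covered by elements start .. start + B - 1."""
    return (start + B - 1) // LS - start // LS + 1


def useless_elements(start, B, LS=4):
    """Elements brought into cache that do not belong to the block."""
    return lines_touched(start, B, LS) * LS - B


def useless_per_sweep(N, B, M, a0=0, LS=4):
    """Useless elements loaded over one sweep of loop j1 (first block of each column)."""
    return sum(useless_elements(a0 + M * j1, B, LS) for j1 in range(N))


if __name__ == "__main__":
    print(useless_per_sweep(4, 8, 16))   # B and M multiples of LS
    print(useless_per_sweep(4, 8, 18))   # M not a multiple of LS
```

With $B$ and $M$ both multiples of $L_S$ and an aligned base address, every loaded line is fully used by the block, so the count is zero; any misalignment makes each block drag in part of a neighboring line.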

Fig. 4. Influence of $L_S$ on the relative importance of cache interferences (total miss ratio as a function of the dimension N, M = N, for different line sizes).

Even assuming $a_0 \bmod L_S = 0$ (the first element of A is aligned on a cache line), semiintrinsic misses occur if $B \bmod L_S \ne 0$ ($L_S$ does not divide $B$) and/or $M \bmod L_S \ne 0$ (a block is not always aligned on a cache line). As can be seen in Figure 3(b), the optimal performance of the blocked loop can only be reached if both $B$ and $M$ are multiples of $L_S$. Also, the influence of $L_S$ on the number of interferences can be seen in Figure 4.

3. OPTIMAL BLOCK SIZE AND OPTIMAL GAIN

The benefit or gain of blocking for array T is defined by $G(T) = m_n(T) - m_b(T)$ (where $m_n(T)$ is $m(T)$ for the nonblocked, i.e., standard, loop, and $m_b(T)$ is $m(T)$ for the blocked loop). $G$ is the total gain, i.e., $G = m_n - m_b$. For all the graphs in this section, the expression of the gain $g = m_n/m_b$ is preferred because it provides the relative instead of the absolute improvement of miss rates due to blocking. Still, $G(T)$ has been used in the computations for the sake of simplicity. Also, in the next sections the optimal block size is denoted $B_{opt}$.

In Section 3.1, the values of the optimal block size and the gain, as computed by state-of-the-art data locality optimizing algorithms, are provided. In Section 3.2, the average gain (and the associated optimal block size) derived from the expressions of Section 2 is computed. The threshold value of $N$ for which blocking is useful is computed in Section 3.3. The differences between accurate and average evaluation of interferences are highlighted in Section 3.4. In Figure 5 the curves corresponding to the different expressions of the gain are plotted. Each curve is explained in one of the following sections.

3.1 Theoretical Optimal Block Size and Theoretical Gain

To date, the most elaborate method for computing the optimal block size in any loop can be found in Eisenbeis et al. [1990], so we will start from that point. In Eisenbeis et al. [1990], for each reference, the set of data to be reused is called the reference window. The principle is to find a block size so that all windows fit in cache, and which minimizes the number of cache misses. In Eisenbeis et al. [1990], only capacity misses are considered.

Fig. 5. Optimal gain, predicted precise optimal gain, predicted average optimal gain, theoretical optimal gain. Each panel plots the gain $g = m_n/m_b$ as a function of N (M = N), with curves labeled Precise, Average, and Theoretical, and the gain obtained with the theoretical block size (B = N, B = min(N, $C_S$), or B = $C_S$ depending on the panel).

Let us illustrate this process with blocked matrix-vector multiply. The reference window corresponding to array Y has a size of 1 cache line. For array X it is equal to $B/L_S$ cache lines. And there is no window for array A because it is not reused. No reuse is assumed to occur for arrays to which blocking is not applied, i.e., array Y. So the number of cache misses of array Y is equal to $(N/B)(N/L_S)$. The number of misses of array A is equal to $N^2/L_S$ (compulsory misses). Finally, since interferences are ignored, and the window of $B$ elements is assumed to fit in cache, the number of misses of array X is equal to $(N/B)(B/L_S)$. The optimization problem is then the following:

$$\text{Minimize } m_b = \frac{N}{B}\,\frac{N}{L_S} + \frac{N^2}{L_S} + \frac{N}{B}\,\frac{B}{L_S} = \frac{N^2}{B L_S} + \frac{N^2}{L_S} + \frac{N}{L_S} \quad \text{subject to } B \le N,\ B \le C_S.$$

So, in this case, the problem is equivalent to maximizing $B$ under the constraints. If $N < C_S$, then $B_{opt} = N$, and if $N \ge C_S$, $B_{opt} = C_S$, i.e., $B_{opt} = \min(N, C_S)$.

In order to compute the gain, the number of cache misses for the nonblocked loop nest must be evaluated. Shortly, the number of capacity misses of X in the nonblocked loop nest is equal to $m_t(X, X) = N\,(N - (N - 2(N - C_S)^+)^+)/L_S$. So,

$$G = m_n - m_b = \frac{N\,(N - (N - 2(N - C_S)^+)^+)}{L_S} - \frac{N^2}{\min(N, C_S)\,L_S},$$

the terms $N^2/L_S$ and $N/L_S$ being common to $m_n$ and $m_b$.

In the remainder of the article, these values of the optimal block size and the optimal gain are termed the theoretical optimal block size and the theoretical optimal gain. In Figure 5, it can be seen that the gain obtained with the theoretical optimal block size is very low (lower than 1.2). Besides, the theoretical gain appears to be a strong misprediction of both the actual gain and even the gain obtained with the theoretical block size. The theoretical gain actually corresponds to what "should happen" if blocking was behaving as predicted by the Window model, i.e., if capacity misses were removed and there were no interference miss. Incidentally, the theoretical gain indicates the maximum gain that can be theoretically expected, i.e., the ideal gain. Let us compute this maximum gain. When $N > 2C_S$,

$$g = \frac{\dfrac{N\,(N - (N - 2(N - C_S)^+)^+)}{L_S} + \dfrac{N^2}{L_S} + \dfrac{N}{L_S}}{\dfrac{N^2}{\min(N, C_S)\,L_S} + \dfrac{N^2}{L_S} + \dfrac{N}{L_S}} \simeq \frac{2N^2/L_S}{N^2/L_S + N^2/(C_S L_S)} = \frac{2}{1 + 1/C_S},$$

so $g \simeq 2$. The maximum gain that can be expected is 2 (i.e., blocking would divide by 2 the number of cache misses): in the nonblocked loop X exhibits at most $N^2/L_S$ cache misses, and A also exhibits $N^2/L_S$ compulsory misses, while in the blocked loop X ideally exhibits only $N/L_S$ compulsory misses in the best case, and A still exhibits $N^2/L_S$ cache misses.

3.2 Estimated Average Gain and Corresponding Optimal Block Size

3.2.1 Estimated Average Gain. For computing the average gain, the expressions of the average values of interferences are used. Such average expressions have been derived for both the blocked and the nonblocked loops. Because of paper length constraints, the details of computations have been omitted (see Fricker et al. [1993]).

N < C S. G = m n? m b = C S N < 2C S. G = m n? m b N = 2 + N? 2C S N. N 3?
C S C S N 2 B min (; 2B ) C S C S B C S N 2 B + 2C S?N N 2 min (; 2B ) + N (2C S?N ) + 2N (N?C S ) C S C S N B C S B G = m n? m b = N 2 + N? N 2 B C S C S B

Because cache interferences are taken into account in the above average estimates and not in the theoretical expressions of Section 3.1, new terms appear, or existing terms are modified. For instance, in the first case ($N < C_S$), the main new term is $N^2 B/(C_S L_S)$, which corresponds to temporal interferences between A and X. Because this term is a function of $B$, it is going to affect the determination of the optimal block size. Indeed, when $N < C_S$, the expression of the theoretical number of misses of the blocked algorithm (see $m_b$ in Section 3.1) only contains one term which depends on $B$: $N^2/(B L_S)$. Consequently, this term is minimal when the block size is the largest possible; hence $B_{opt} = \min(N, C_S)$. Now, in the above average expression two terms depend on $B$: $(N^2 B)/(C_S L_S)$ and $(N^2/(B L_S)) \min(1, 2B/\delta)$, which respectively increases and decreases (or is constant) with $B$. Therefore, the optimal block size is either equal to a tradeoff value or to $L_S$ (see the detailed computations in Section 3.2.2).

The curve Average in Figure 5 corresponds to the average optimal gain. It corresponds to the above expressions with $B = B_{opt}$ (except $g$ is used instead of $G$). It is shown in Section 3.2.2 how to derive the expression of $B_{opt}$ in the different cases. As can be seen in Figure 5, the average optimal gain is usually close to the actual optimal gain. Still, when $N > C_S$, the actual gain is periodically slightly higher than the average gain, while the precise estimate of the gain correctly predicts such phenomena (see Figure 5). The main difference between precise and average estimates is that array base addresses are considered in the precise estimate. In Figure 5, the base addresses of arrays X and A have been chosen large enough that no intense interference phenomena related to array placement can occur ($r = 512$).
But, in Section 3.4, it is shown that array base addresses can sometimes have a major influence on the number of interference misses, in which case the precision of the average estimate can be poor.

3.2.2 Estimated Optimal Block Size Based on the Average Gain. Let us first indicate the optimal block size expression obtained in each case and then provide the details of computations. When $N < C_S$, we obtain $B_{opt} = \sqrt{C_S}$ if $\delta < \sqrt{C_S}$ and $B_{opt} = L_S$ if $\delta \ge \sqrt{C_S}$ (recall that $\delta = \gcd(M, C_S)$). With the theoretical expression of Section 3.1, we obtain that $B_{opt} = N$. When $N \ge C_S$, $B_{opt}$ is either equal to $\sqrt{C_S}$ or $\sqrt{2C_S(1 - C_S/N)}$, while the theoretical optimal block size is equal to $C_S$ in this case. Therefore, the theoretical expression of the optimal block size is generally a strong overestimate of the optimal block size, which is confirmed by Figure 5.

The theoretical expression of Section 3.1 implies that once an element of X is loaded into the cache, it will not be flushed. Therefore, the only constraint on $B$ is that it must fit in cache. That is why the number of misses of X ($N/L_S$) does not depend on $B$. On the other hand, the expressions computed in Sections 2 and 3.2.1 take into account the fact that the elements of X can be flushed by elements of A. Consequently, with respect to X, the block size should be selected as small as possible so that the elements of X can be reused before they can be flushed. Intuitively, it means the reuse distance should be small enough that the probability that an element of X is flushed before it can be reused is negligible. That is why the number of misses of X, $(N^2 B)/(C_S L_S)$, increases with $B$.

In the following paragraphs, it is now shown how the expression of the optimal block size can be derived from the expression of the average gain. $G$ is now considered to be a function of $B$. It is differentiated along $B$ so that its variations can be analyzed. The optimal value of $B$, i.e., $B_{opt}$, is the value that maximizes the gain. The computations are mostly detailed for the first case.

$N < C_S$. Two subcases must be distinguished: $2B/\delta < 1$ and $2B/\delta \ge 1$.

$B < \delta/2$. $\partial G/\partial B = -N^2/(C_S L_S) < 0$ for this interval of values of $B$. Therefore the local maximum is reached when $B$ is minimum, i.e., $B_{opt1} = L_S$. The corresponding value of the gain is $G_{max1} = G(B_{opt1})$.

$B \ge \delta/2$. $\partial G/\partial B = -N^2/(C_S L_S) + N^2/(B^2 L_S)$, which is $> 0$ if $B < \sqrt{C_S}$ and $< 0$ otherwise. Thus, $G$ increases up to the value $B = \max(\sqrt{C_S}, \delta/2)$ and decreases afterward. So $B_{opt2} = \max(\sqrt{C_S}, \delta/2)$. The maximal value of the gain is $G_{max2} = G(B_{opt2})$.

The maximal gain for this interval of $N$ is the largest of the two gains, i.e., $G_{max} = \max(G_{max1}, G_{max2})$. These values must then be compared to find the global optimum $B_{opt}$ among $B_{opt1}$ and $B_{opt2}$.

If $\sqrt{C_S} < \delta/2$, then $G(L_S)$ and $G(\delta/2)$ should be compared. We obtain

$$G(L_S) - G(\delta/2) = \frac{N^2}{2 C_S L_S}\,(\delta - 2L_S).$$

Thus $G(L_S) > G(\delta/2)$ if $\delta > 2L_S$, which is assumed. Hence $B_{opt} = L_S$.

If $\sqrt{C_S} \ge \delta/2$, then $G(L_S)$ and $G(\sqrt{C_S})$ should be compared, which gives

$$G(L_S) - G(\sqrt{C_S}) = N^2\left(\frac{2}{\sqrt{C_S}\,L_S} - \frac{1}{C_S} - \frac{2}{\delta L_S}\right).$$

Thus $G(L_S) > G(\sqrt{C_S})$ if $2/(\sqrt{C_S}\,L_S) > 1/C_S + 2/(\delta L_S)$, which is equivalent to $\delta > \sqrt{C_S}\,\bigl(1 - L_S/(2\sqrt{C_S})\bigr)^{-1} \simeq \sqrt{C_S}$.

The optimal block size for this interval of $N$ is $B_{opt} = \sqrt{C_S}$ if $\delta < \sqrt{C_S}$ and $B_{opt} = L_S$ otherwise.

$C_S \le N < 2C_S$. The same subcases must be distinguished. We obtain the following:

$B < \delta/2$. $B_{opt1} = \min\bigl(\sqrt{2(N - C_S)C_S/N},\ \delta/2\bigr)$.

$B \ge \delta/2$. $B_{opt2} = \max(\sqrt{C_S}, \delta/2)$,

and these two local maxima are then compared. Note that $\sqrt{2(N - C_S)C_S/N} < \sqrt{C_S}$; thus three cases must be distinguished, according to the respective positions of $\delta$ and the interval $[\sqrt{2(N - C_S)C_S/N},\ \sqrt{C_S}]$. Computations show that the optimal block size is $B_{opt} = \sqrt{C_S}$ if $\delta < h(N)$ and $B_{opt} = \sqrt{2(N - C_S)C_S/N}$ otherwise, where

h(n) = 2(N?C S )N p CS + 2C S?N N 2C S?N N 2N2 p Np 2 + N2 p? N 2(N?C S )C S ( CS CS N C N +) S

Note that $h(N) = \sqrt{C_S}$ if $N = C_S$, and $h(N) = 0$ if $N = 2C_S$.

$2C_S \le N$. Here there are no subcases, and $B_{opt} = \sqrt{C_S}$.
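For the case $N < C_S$, the result can be cross-checked by scanning the two $B$-dependent terms of the average estimate, $N^2 B/(C_S L_S)$ and $(N^2/(B L_S))\min(1, 2B/\delta)$, over candidate block sizes. The Python sketch below is only an illustration: it restricts candidates to multiples of $L_S$ (an assumption), ignores every term that does not depend on $B$, and uses hypothetical function names.

```python
from math import gcd


def b_dependent_misses(B, N, delta, CS=1024, LS=4):
    """Terms of the average miss estimate that depend on B (case N < CS)."""
    temporal_x = N * N * B / (CS * LS)                 # X flushed by A
    reload_y = (N * N / (B * LS)) * min(1.0, 2 * B / delta)
    return temporal_x + reload_y


def best_block_size(N, M, CS=1024, LS=4):
    """Block size (multiple of LS, at most N) minimizing the B-dependent terms."""
    delta = gcd(M, CS)
    candidates = range(LS, N + 1, LS)
    return min(candidates, key=lambda B: b_dependent_misses(B, N, delta, CS, LS))


if __name__ == "__main__":
    for M in (516, 528, 576, 512):      # delta = 4, 16, 64, 512
        print(M, gcd(M, 1024), best_block_size(512, M))
```

With $C_S = 1024$ and $L_S = 4$, the scan returns $\sqrt{C_S} = 32$ for small $\delta$ and $L_S = 4$ for large $\delta$, in line with the rule stated for $N < C_S$.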

Fig. 6. (a) Determining when blocking is useful: gain $g = m_n/m_b$, predicted and theoretical, as a function of N (M = N). (b) Variations of interferences between X and A as a function of $r$ (cases $r < B$, and $B < r$ with $B + r < \delta$).

3.3 Threshold Value of N

In this section, the problem of determining the threshold value $N_{thr}$ of $N$ for which blocking is profitable is briefly addressed. Basically, blocking is useful if $G > 0$. According to the theoretical expression of Section 3.1, $G > 0$ if $N > C_S$, i.e., $N_{thr} = C_S$. Now, according to the expressions of Section 3.2.1, if $N < C_S$, $G > 0$ as soon as $N$ is approximately greater than $2\sqrt{C_S}$. In fact, the theoretical expression only considers capacity misses, so that blocking can only become profitable if X is larger than the cache, i.e., if $N > C_S$. However, cache interferences can occur even when capacity misses still do not occur. As explained in Section 3.2.2, blocking has the effect of reducing the reuse distance of X so that it can be useful for minimizing interferences only. That is why $N_{thr}$ is actually much smaller than $C_S$. This observation is confirmed by Figure 6(a) ($2\sqrt{C_S} \simeq 64$ for $C_S = 1024$).

3.4 Estimated Precise Gain and Corresponding Optimal Block Size

In Section 2 it has been shown how to obtain a precise evaluation of cross-interferences between two arrays. The precise gain is simply obtained by cumulating such expressions for all pairs of arrays, as was done for the average gain in Section 3.2. Because of paper length constraints, the full expressions are not provided here. On the other hand, the differences between average and precise gain are highlighted. These differences occur when $\delta = \gcd(M, C_S)$ is large. The number of possible cache positions of the blocks of size $B$ of A is equal to $C_S/\delta$. So, if $\delta$ is large, there are few such positions. Consequently, the corresponding cache locations are heavily referenced. With respect to X, A appears to be distributed into few cache intervals separated by holes (see Figures 2(a) and 6(b)).
Therefore, if array X overlaps with one such interval, interferences between A and X are very intense. Overlapping occurs if the relative cache distance $r$ between A and X is smaller than $B$ (see Figure 6(b), case $r < B$). Overlapping does not occur if $B < r$ and $B + r < \delta$. So array base addresses play a significant role when $\delta$ is large. This is illustrated in Figure 7 where $\delta = 512$ ($N = 512$), $r = 8$, and $B$ is varied. The actual optimal value of $B$ is equal to 8, which is correctly predicted by the precise estimate. On the other hand, if the average estimate correctly predicts the optimal gain that can be expected, it is independent of $r$, and therefore it fails to predict for which value of $B$ interferences will occur. For a similar value of $N$ ($N = 516$), $\delta = 4$, and performance variations are independent of $r$. Consequently, the average estimate remains precise for all values of $B$. Also, the average estimate is poor when $\delta$ is large because array base addresses are not considered. On the other hand, the accurate estimate successfully predicts performance variations.

Fig. 7. Influence of array base addresses on the optimal block size ($\delta = 512$ and $\delta = 4$): gain $g = m_n/m_b$, precise and average estimates, as a function of the block size B.

Note that this particularity of external cross-interferences can be exploited. If there are holes between two intervals of A, then $B$ should be selected small enough that an interval of X fits into one such hole. In this case, no cross-interference occurs between A and X. This cannot be achieved when $\delta$ is too small (smaller than $L_S$) because the optimal block size needs to be at least equal to $L_S$ in order to exploit spatial locality. However, if $\delta$ is relatively small, it is not obvious that selecting a very small $B$ will yield important benefits considering the tradeoffs imposed by array Y (see Section 3.2.1). Another solution is to adjust the relative base address $r$ (by adjusting the base address of A or X) so that $B < r$. If this is not possible because of potential negative influence on other loops in the code, another solution is simply to copy array X in another array with a suitable base address.

4. APPLICATIONS

The techniques presented at the beginning of this article consist in precisely evaluating the number of external cross-interferences between two arrays by first examining their relative cache positions and then computing the number of overlapping array elements. These techniques have been applied to the three different pairs of arrays that can be found in matrix-vector multiply and which exhibit different patterns of relative cache positions. As long as the loop nest depth is small, these techniques can be extended to any pair of arrays. If the relative position depends on many indices, the same techniques can be used for the innermost index or indices, and an average estimate can be used for the outer indices (this was done for the pair X, Y).

Accurate evaluation of cache interferences is important for checking whether restructuring techniques do not induce negative side-effects that degrade potential benefits. It clearly appears in this article that blocking is a delicate tradeoff which depends on loop and array parameters. Including such techniques in a compiler has not yet been achieved, but it is a possible follow-up to this study. A first implementation could be limited to average estimates, which can be derived relatively easily. Precise estimates are more difficult to implement, but a first solution is to only detect that precise estimates are needed by identifying high-risk cases. For instance, if the relative position of a pair of arrays only depends on one index with coefficient $M$, the test would be simply to check the value of parameter $\delta = \gcd(M, C_S)$. If $\delta$ is large, intense interferences could occur depending on the array base addresses, and a conservative (but nonoptimal) attitude would be to select a block size of $L_S$ in these cases.

A more immediate application of this model is the development of a linear algebra library finely tuned for caches. Though it is not possible to act on array base addresses, block size adjustment and copying provide sufficient flexibility to exploit fully the different cases detailed in this article.

5. CONCLUSIONS

Several conclusions can be drawn from this analysis of matrix-vector multiply. First, accurately evaluating external cross-interference misses and deriving an analytical expression of the number of such cache misses are tractable tasks. Second, the optimal block size, as computed by current data locality optimizing algorithms, is highly inaccurate, because only capacity misses are considered. If interference misses are taken into account in the optimization problem, the solution then becomes an accurate evaluation of the optimal block size.
Third, an average estimate of external cross-interferences is frequently but not always sufficient because, in some cases, array base addresses can strongly influence the occurrence and intensity of cache interferences.

REFERENCES

Eisenbeis, C., Jalby, W., Windheiser, D., and Bodin, F. 1990. A strategy for array management in local memory. In Proceedings of the 3rd Workshop on Programming Languages and Compilers for Parallel Computing. Irvine, California.
Esseghir, K. Improving data locality for caches. M.S. thesis, Univ. of Texas, Houston, Tex.
Ferrante, J., Sarkar, V., and Thrash, W. 1991. On estimating and enhancing cache effectiveness. In Proceedings of the 4th Workshop on Languages and Compilers for Parallel Computing. Santa Clara, California.
Fricker, C., Temam, O., and Jalby, W. Accurate evaluation of blocked algorithms cache interferences. Tech. rep., Leiden Univ., Leiden, The Netherlands. Mar.
Kane, G. and Heinrich, J. MIPS RISC Architecture. Prentice-Hall, Englewood Cliffs, N.J.
Lam, M., Rothberg, E. E., and Wolf, M. E. 1991. The cache performance of blocked algorithms. In 4th International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, New York.
McKinley, K. S. 1992. Automatic and interactive parallelization. Ph.D. thesis, Tech. Rep. CRPC-TR9224, Rice Univ., Houston, Tex.
Porterfield, A. K. 1989. Software Methods for Improvement of Cache Performance on Supercomputer Applications. Ph.D. thesis, Tech. Rep. CRPC-TR89-93, Rice Univ., Houston, Tex.
Sites, R. L. Alpha Architecture Reference Manual. Digital Press, Bedford, Mass.
Temam, O., Fricker, C., and Jalby, W. Impact of cache interferences on usual numerical dense loop nests. In Proc. IEEE, special issue on Computer Performance Evaluation.
Temam, O., Fricker, C., and Jalby, W. Cache interference phenomena. In Proceedings of the ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems (Nashville, Tenn.). ACM, New York.
Wolf, M. and Lam, M. 1991. A data locality optimizing algorithm. In Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation. SIGPLAN Not. 26, 6.

Received January 1994; revised June 1994; accepted February 1995
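The high-risk test described in the Applications section, checking gcd(M, C_S) when the relative position of two arrays depends on a single index with coefficient M, can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the `threshold` cutoff, and the choice of expressing M and C_S in array elements are hypothetical. The sketch relies on the standard arithmetic fact that the values k*M modulo C_S take exactly C_S / gcd(M, C_S) distinct values, so a large gcd means that only a few distinct cache positions are used.

```python
from math import gcd


def distinct_positions(coeff_m: int, cache_size: int) -> int:
    """Number of distinct values of (k * coeff_m) % cache_size over all integers k.

    Both arguments are assumed to be expressed in array elements.
    """
    return cache_size // gcd(coeff_m, cache_size)


def is_high_risk(coeff_m: int, cache_size: int, threshold: int) -> bool:
    """Flag a pair of arrays as high-risk when gcd(M, C_S) is large.

    `threshold` is a hypothetical cutoff; the article does not give a value.
    """
    return gcd(coeff_m, cache_size) >= threshold


if __name__ == "__main__":
    cache_size = 8192  # assumed cache size in elements
    for coeff_m in (1024, 1025):
        print(
            coeff_m,
            gcd(coeff_m, cache_size),
            distinct_positions(coeff_m, cache_size),
            is_high_risk(coeff_m, cache_size, threshold=64),
        )
```

With these assumed values, a coefficient of 1024 is flagged because it shares a large common divisor with the cache size, whereas 1025 is coprime with it and is not flagged. A compiler or library could use such a flag to decide when the precise, base-address-dependent estimate is worth computing instead of the average one.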


Multilevel Paging. Multilevel Paging Translation. Paging Hardware With TLB 11/13/2014. CS341: Operating System

Multilevel Paging. Multilevel Paging Translation. Paging Hardware With TLB 11/13/2014. CS341: Operating System CS341: Operating System Lect31: 21 st Oct 2014 Dr A Sahu Dept o Comp Sc & Engg Inian Institute o Technology Guwahati ain Contiguous Allocation, Segmentation, Paging Page Table an TLB Paging : Larger Page

More information

An Algorithm for Building an Enterprise Network Topology Using Widespread Data Sources

An Algorithm for Building an Enterprise Network Topology Using Widespread Data Sources An Algorithm for Builing an Enterprise Network Topology Using Wiesprea Data Sources Anton Anreev, Iurii Bogoiavlenskii Petrozavosk State University Petrozavosk, Russia {anreev, ybgv}@cs.petrsu.ru Abstract

More information

Threshold Based Data Aggregation Algorithm To Detect Rainfall Induced Landslides

Threshold Based Data Aggregation Algorithm To Detect Rainfall Induced Landslides Threshol Base Data Aggregation Algorithm To Detect Rainfall Inuce Lanslies Maneesha V. Ramesh P. V. Ushakumari Department of Computer Science Department of Mathematics Amrita School of Engineering Amrita

More information

Intensive Hypercube Communication: Prearranged Communication in Link-Bound Machines 1 2

Intensive Hypercube Communication: Prearranged Communication in Link-Bound Machines 1 2 This paper appears in J. of Parallel an Distribute Computing 10 (1990), pp. 167 181. Intensive Hypercube Communication: Prearrange Communication in Link-Boun Machines 1 2 Quentin F. Stout an Bruce Wagar

More information

Appearance Sensing distance Output configuration Operation mode Model. Appearance Sensing distance Output configuration Operation mode Model

Appearance Sensing distance Output configuration Operation mode Model. Appearance Sensing distance Output configuration Operation mode Model Spatter-resistant Proximity Sensor EEQ CSM_EEQ_DS_E Spatter-resistant Fluororesincoate Proximity Sensor Superior spatter resistance. Long Sensing-istance s ae for sensing istances up to mm. DC -Wire s.

More information

IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 31, NO. 4, APRIL

IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 31, NO. 4, APRIL IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 1, NO. 4, APRIL 01 74 Towar Efficient Distribute Algorithms for In-Network Binary Operator Tree Placement in Wireless Sensor Networks Zongqing Lu,

More information

filtering LETTER An Improved Neighbor Selection Algorithm in Collaborative Taek-Hun KIM a), Student Member and Sung-Bong YANG b), Nonmember

filtering LETTER An Improved Neighbor Selection Algorithm in Collaborative Taek-Hun KIM a), Student Member and Sung-Bong YANG b), Nonmember 107 IEICE TRANS INF & SYST, VOLE88 D, NO5 MAY 005 LETTER An Improve Neighbor Selection Algorithm in Collaborative Filtering Taek-Hun KIM a), Stuent Member an Sung-Bong YANG b), Nonmember SUMMARY Nowaays,

More information

Module13:Interference-I Lecture 13: Interference-I

Module13:Interference-I Lecture 13: Interference-I Moule3:Interference-I Lecture 3: Interference-I Consier a situation where we superpose two waves. Naively, we woul expect the intensity (energy ensity or flux) of the resultant to be the sum of the iniviual

More information

Politehnica University of Timisoara Mobile Computing, Sensors Network and Embedded Systems Laboratory. Testing Techniques

Politehnica University of Timisoara Mobile Computing, Sensors Network and Embedded Systems Laboratory. Testing Techniques Politehnica University of Timisoara Mobile Computing, Sensors Network an Embee Systems Laboratory ing Techniques What is testing? ing is the process of emonstrating that errors are not present. The purpose

More information

Q. No. 1 Newton postulated his corpuscular theory of light on the basis of

Q. No. 1 Newton postulated his corpuscular theory of light on the basis of Q. No. 1 Newton postulate his corpuscular theory of light on the basis of Newton s rings Option Rectilinear propagation of light Colour through thin films Dispersion of white light into colours. Correct

More information

An FFT-based Method for Attenuation Correction in Fluorescence Confocal Microscopy Roerdink, Johannes; Bakker, M.

An FFT-based Method for Attenuation Correction in Fluorescence Confocal Microscopy Roerdink, Johannes; Bakker, M. University of Groningen An FFT-base Metho for Attenuation Correction in Fluorescence Confocal Microscopy Roerink, Johannes; Bakker, M. Publishe in: Default journal IMPORTANT NOTE: You are avise to consult

More information

Design of Controller for Crawling to Sitting Behavior of Infants

Design of Controller for Crawling to Sitting Behavior of Infants Design of Controller for Crawling to Sitting Behavior of Infants A Report submitte for the Semester Project To be accepte on: 29 June 2007 by Neha Priyaarshini Garg Supervisors: Luovic Righetti Prof. Auke

More information

Feature Extraction and Rule Classification Algorithm of Digital Mammography based on Rough Set Theory

Feature Extraction and Rule Classification Algorithm of Digital Mammography based on Rough Set Theory Feature Extraction an Rule Classification Algorithm of Digital Mammography base on Rough Set Theory Aboul Ella Hassanien Jafar M. H. Ali. Kuwait University, Faculty of Aministrative Science, Quantitative

More information

Message Transport With The User Datagram Protocol

Message Transport With The User Datagram Protocol Message Transport With The User Datagram Protocol User Datagram Protocol (UDP) Use During startup For VoIP an some vieo applications Accounts for less than 10% of Internet traffic Blocke by some ISPs Computer

More information

Chalmers Publication Library

Chalmers Publication Library Chalmers Publication Library All-to-all Broacast for Vehicular Networks Base on Coe Slotte ALOHA This ocument has been ownloae from Chalmers Publication Library (CPL). It is the author s version of a work

More information