Influence of Cross-Interferences on Blocked Loops: A Case Study with Matrix-Vector Multiply

CHRISTINE FRICKER
INRIA, France
and
OLIVIER TEMAM and WILLIAM JALBY
University of Versailles, France

State-of-the-art data locality optimizing algorithms are targeted for local memories rather than for cache memories. Recent work on cache interferences seems to indicate that these phenomena can severely affect the cache performance of blocked algorithms. Because of cache conflicts, it is not possible to know the precise gain brought by blocking. It is even difficult to determine for which problem sizes blocking is useful. Computing the actual optimal block size is difficult because cache conflicts are highly irregular. In this article, we illustrate the issue of precisely evaluating cross-interferences in blocked loops with blocked matrix-vector multiply. Most significant interference phenomena are captured because unusual parameters such as array base addresses are being considered. The techniques used allow us to compute the precise improvement due to blocking and the threshold value of problem parameters for which the blocked loop should be preferred. It is also possible to derive an expression of the optimal block size as a function of problem parameters. Finally, it is shown that a precise rather than an approximate evaluation of cache conflicts is sometimes necessary to obtain near-optimal performance.

Categories and Subject Descriptors: B.3.0 [Memory Structures]: General; C.4 [Computer Systems Organization]: Performance of Systems, modeling techniques; D.3.4 [Programming Languages]: Processors

General Terms: Measurement, Performance

Additional Key Words and Phrases: Blocking, cache conflicts (interferences), cache performance, data locality optimization, numerical codes

1. INTRODUCTION

To date, data locality optimizing algorithms [Eisenbeis et al. 1990; Ferrante et al. 1991; McKinley 1992; Porterfield 1989; Wolf and Lam 1991] have been concerned with decreasing capacity misses using blocking and have mostly ignored the occurrence of conflict misses.
However, previous studies [Ferrante et al. 1991; Lam et al. 1991] showed that conflict misses can significantly alter the behavior of blocked algorithms. More precisely, self-interferences in blocked loops [Lam et al. 1991] have been shown to be sensitive to the choice of the optimal block size. A data locality optimization technique which combines tile size optimization and copying has also been proposed [Esseghir 1993] as a way to reduce self-interferences in numerical loops.

This work was funded by the DGXIII ESPRIT BRA III Project APPARC. Authors' addresses: C. Fricker, INRIA, 78153 Le Chesnay, France; Christine.Fricker@inria.fr; O. Temam, PRiSM, University of Versailles, Versailles, France; temam@prism.uvsq.fr; W. Jalby, PRiSM, University of Versailles, Versailles, France; jalby@prism.uvsq.fr.

    DO j1=0,n-1
      reg = Y(j1)
      DO j2=0,n-1
        reg += A(j2,j1) * X(j2)
      ENDDO
      Y(j1) = reg
    ENDDO

    DO jj2=0,n-1,b
      DO j1=0,n-1
        reg = Y(j1)
        DO j2=jj2,min(jj2+b-1,n-1)
          reg += A(j2,j1) * X(j2)
        ENDDO
        Y(j1) = reg
      ENDDO
    ENDDO

Fig. 1. Blocked and nonblocked matrix-vector multiply.

Recently, we have developed a model for evaluating conflict misses in numerical loops [Temam et al. 1994] with the purpose of understanding cache interference phenomena and predicting the cache performance of a numerical loop nest. Three different types of interference misses were distinguished: self-interferences, internal cross-interferences (cross-interferences between two references whose subscripts have identical linear expressions), and external cross-interferences (cross-interferences between any two other references). The most frequent and most difficult type of interferences to evaluate are external cross-interferences. We have mentioned in Temam et al. [1993] that two different types of evaluation can be performed, approximate or precise, but up to now we have mostly focused on the approximate evaluation. In this article, precise evaluation of external cross-interferences is shown to be sometimes necessary for computing the near-optimal block size of a numerical loop.

Most data locality optimizing algorithms barely deal with the issue of computing the optimal block size. One of the most elaborate treatments of this problem can be found in Eisenbeis et al. [1990], where the computation of the optimal block size amounts to evaluating the number of capacity misses as a function of the block size, and then finding the block size that minimizes this number. The purpose of the article is twofold: provide a detailed illustration of the technique used to derive the precise number of external cross-interference misses, and show how the precision of the evaluation of conflict misses can affect the determination of the optimal block size and, further, the performance of the loop.

Position of the Problem.
The example used to illustrate the different points developed in this article is the classic numerical algebra primitive Matrix-Vector multiply and its blocked version (see Figure 1). The target architecture considered is an 8KB direct-mapped cache with a line size equal to 32 bytes, which are the parameters of several current processors [Kane and Heinrich 1992; Sites 1992]. All problem parameters are expressed in double-precision floating-point numbers, i.e., 8 bytes, so that a cache size $C_S$ of 8KB corresponds to $C_S = 1024$, and a line size $L_S$ of 32 bytes to $L_S = 4$.

Notations. $m$ denotes the total number of cache misses. $m_t$ and $m_s$ denote the number of temporal and spatial misses. $m_i$ denotes the number of intrinsic misses. $m(T)$ denotes the total number of cache misses for array $T$. The notations $m_t(T)$, $m_s(T)$, $m_i(T)$ can also be deduced. Furthermore, $m(T_1, T_2)$ denotes the number of misses of $T_1$ due to interferences with $T_2$.
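The two loop nests of Figure 1 and the cache parameters above can be explored with a small trace-driven simulation. The following Python sketch is only an illustration, not the simulator used for the experiments: the function name `simulate`, the default array layout (X placed right after A, Y right after X) and the column-major addressing of A are assumptions made here. Sizes and addresses are counted in 8-byte elements, so the defaults correspond to $C_S = 1024$ and $L_S = 4$.

```python
from collections import Counter


def simulate(n, block=None, cache_size=1024, line_size=4,
             a0=0, x0=None, y0=None, m=None):
    """Count misses per array for (blocked) matrix-vector multiply
    on a direct-mapped cache.

    All sizes and addresses are in array elements (8-byte doubles).
    The default layout (a0, x0, y0) is a hypothetical choice.
    """
    m = n if m is None else m               # leading dimension M of A
    x0 = a0 + m * n if x0 is None else x0   # X placed right after A
    y0 = x0 + n if y0 is None else y0       # Y placed right after X
    block = n if block is None else block   # block == n: nonblocked loop
    n_lines = cache_size // line_size
    cache = [None] * n_lines                # one memory-line tag per cache line
    misses = Counter()

    def access(name, addr):
        line = addr // line_size            # memory line holding this element
        slot = line % n_lines               # direct-mapped placement
        if cache[slot] != line:
            cache[slot] = line
            misses[name] += 1

    for jj2 in range(0, n, block):
        for j1 in range(n):
            access("Y", y0 + j1)                      # reg = Y(j1)
            for j2 in range(jj2, min(jj2 + block, n)):
                access("A", a0 + j2 + m * j1)         # A(j2, j1), column-major
                access("X", x0 + j2)
            access("Y", y0 + j1)                      # Y(j1) = reg
    return misses
```

For instance, `simulate(512, block=32)` and `simulate(512)` can be compared to see how the per-array miss counts change with blocking; passing explicit `a0` and `x0` values lets the relative base address of A and X be varied.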
Experiments. Throughout the article, the actual number of misses is obtained through simulations using a simulator developed for that purpose.

2. ESTIMATING THE NUMBER OF CACHE MISSES

Because of paper length constraints, this section is restricted to studying the external cross-interferences between array A and array X. A treatment of other external cross-interferences in the loop can be found in Fricker et al. [1993]. External cross-interferences basically correspond to the data reused by a reference being flushed from cache by another reference, when the two references have subscripts with distinct linear expressions. The set of data to be reused by the victim reference is called the reuse set, and the set of data interfering with this reuse set is called the interference set. These sets are defined on the loop level where the reuse occurs. So for arrays X and A in the blocked loop nest, the reuse loop is loop $j_1$ (for X), and the reuse set (of X) and the interference set (of A) both correspond to a set of $B$ array elements or $B/L_S$ cache lines. The problem amounts to studying the relative cache position of the two sets and to computing the size of their intersection when they overlap. When the intersection size is expressed in cache lines, it exactly corresponds to the number of conflict misses between the two references.

2.1 Interferences between X and A

Let us now study the relative cache position of the reuse set of X and the interference set of A. The positions of the beginning of these two sets are respectively

$$R_X = x_0 + j_2, \qquad R_A = a_0 + j_2 + M j_1.$$

Therefore, the relative position of the interference set with respect to the reuse set is the following:

$$R_{XA} = a_0 - x_0 + M j_1.$$

Possible Relative Cache Positions of A and X. The first problem is to find all the possible relative positions of X and A, i.e., all the possible values of $R_{XA}$. Since $R_{XA} = a_0 - x_0 + M j_1$, the possible locations are $(a_0 - x_0 + M j_1) \bmod C_S$. Let $\delta = \gcd(M, C_S)$ and $r = (a_0 - x_0) \bmod C_S$. Then, $(a_0 - x_0 + M j_1) \bmod C_S = (r + \delta (M/\delta) j_1) \bmod C_S$.
Therefore, the possible positions are all of the form $R_{XA} = (r + \lambda\delta) \bmod C_S$, $\lambda \in \mathbb{Z}$. The set of values of $\lambda$ corresponding to distinct cache positions is finite. The distance between two consecutive possible cache positions is $\delta$, and the number of distinct cache positions is equal to $C_S/\delta$.

Cache Positions where Interferences Occur. Let us consider the interval $I$ corresponding to $C_S/\delta$ consecutive values of $\lambda$ and defined by $-C_S/2 \le r + \lambda\delta < C_S/2$. For $\lambda \in I$, interferences occur only if $-B \le r + \lambda\delta \le B$, i.e., if the distance in cache between the beginning of the intervals of A and X belongs to $[-B, B]$ (see Figure 2(a)); it is assumed here that $B \le C_S/2$. The previous inequation can be rewritten as $\lceil (-B - r)/\delta \rceil \le \lambda \le \lfloor (B - r)/\delta \rfloor$. Let $B = B'\delta + b$ with $b = B \bmod \delta$. It is certain interferences occur for $\lambda \in [-B', B' - 1]$, while for $\lambda = -(B' + 1)$ and $\lambda = B'$, interferences may occur depending on the relative values of $b$, $r$, and $\delta$ (this is due to the ceiling and floor functions of the above inequation).

Fig. 2. (a) Cross-interferences between A and X. (b) Miss ratio of X (measured and predicted) as a function of the dimension N (M = N).

Computing the Number of Temporal Interferences. As mentioned in the previous paragraph, the interferences between A and X recur with a period of $C_S/\delta$. Therefore, the amount of interferences needs to be computed over one period and then multiplied by the number of periods. An approximate number of periods is $N/(C_S/\delta)$. So, in this paragraph, only a chunk of $C_S/\delta$ iterations is considered, e.g., the interval $I$. For each value of $\lambda \in I$, the distance in cache between the beginning of the intervals of X and A is $|r + \lambda\delta|$. So, the overlapping (expressed in cache locations) is equal to $(B - |r + \lambda\delta|)^+$, where $(x)^+ = \max(x, 0)$. For $\lambda \in [-B', -1]$, the overlapping is equal to $(B + r + \lambda\delta)^+ = B + r + \lambda\delta$, and for $\lambda \in [0, B' - 1]$, it is equal to $(B - r - \lambda\delta)^+ = B - r - \lambda\delta$. For $\lambda = -(B' + 1)$, the overlapping is equal to $(B + r - (B' + 1)\delta)^+ = (b + r - \delta)^+$, and for $\lambda = B'$, the overlapping is equal to $(B - r - B'\delta)^+ = (b - r)^+$. For any other value of $\lambda$ such that $-C_S/2 \le r + \lambda\delta < C_S/2$, the overlapping is equal to 0. Consequently, for one period of $C_S/\delta$ iterations, the total overlapping (in cache locations) is equal to

$$(b + r - \delta)^+ + (b - r)^+ + \sum_{\lambda=0}^{B'-1} (B - r - \lambda\delta) + \sum_{\lambda=-B'}^{-1} (B + r + \lambda\delta),$$

and since

$$\sum_{\lambda=0}^{B'-1} (B - r - \lambda\delta) + \sum_{\lambda=-B'}^{-1} (B + r + \lambda\delta) = -\delta B'^2 + 2BB' = \frac{B^2 - b^2}{\delta},$$

the total number of temporal interferences of X due to A, expressed in cache lines and accumulated over the $N\delta/C_S$ periods and the $N/B$ blocks, is given by

$$m_t(X, A) = \frac{N}{B L_S} \cdot \frac{N\delta}{C_S} \left( (b + r - \delta)^+ + (b - r)^+ + \frac{B^2 - b^2}{\delta} \right).$$

An intuitive representation of such interferences is indicated on Figure 2(a) (all intervals of A which do not interfere with X have not been represented).
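The per-period overlap derived above can be checked numerically. The sketch below is an illustration under stated assumptions: the function names are hypothetical, $r$ is taken in $[0, \delta - 1]$ and $B$ is kept well below $C_S/2$. It enumerates the relative positions $R_{XA}$ over one period of $C_S/\delta$ iterations of $j_1$, sums the overlaps $(B - |r + \lambda\delta|)^+$, and compares the result with the closed form $(b + r - \delta)^+ + (b - r)^+ + (B^2 - b^2)/\delta$.

```python
from math import gcd


def overlap_per_period(B, r, M, C_S):
    """Brute-force sum of the overlaps (in cache locations) over one period."""
    delta = gcd(M, C_S)
    total = 0
    for j1 in range(C_S // delta):      # one period of C_S/delta iterations
        d = (r + M * j1) % C_S          # relative position R_XA mod C_S
        if d >= C_S // 2:               # fold into [-C_S/2, C_S/2)
            d -= C_S
        total += max(B - abs(d), 0)     # (B - |r + lambda*delta|)^+
    return total


def overlap_closed_form(B, r, M, C_S):
    """Closed form of the same sum, assuming 0 <= r < delta."""
    delta = gcd(M, C_S)
    b = B % delta

    def pos(x):
        return max(x, 0)

    # B*B - b*b is a multiple of delta, so the integer division is exact.
    return pos(b + r - delta) + pos(b - r) + (B * B - b * b) // delta


def distinct_positions(r, M, C_S):
    """Number of distinct relative cache positions of A with respect to X."""
    return len({(r + M * j1) % C_S for j1 in range(C_S)})
```

With $C_S = 1024$, $M = 516$ gives $\delta = 4$ and 256 distinct positions, whereas $M = 512$ gives $\delta = 512$ and only 2 positions, which is the situation where the base addresses matter most.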
Average interferences. $m_t(X, A)$ can be averaged over all possible values of $r$, which may vary between 0 and $\delta - 1$. The expression of the average number of interferences is equal to

$$\frac{N}{B L_S} \cdot \frac{N\delta}{C_S} \cdot \frac{1}{\delta} \sum_{r=0}^{\delta-1} \left( (b + r - \delta)^+ + (b - r)^+ + \frac{B^2 - b^2}{\delta} \right) = \frac{N^2 B}{C_S L_S}.$$

2.2 Total Number of Cache Misses

In this section, the analytical expressions of the different sources of cache misses are presented. In theory, it is not possible, for one array, to simply add all the associated expressions because of possible redundancy between cross-interferences. However, these redundancies have been ignored because they proved to be negligible in most cases.

Array X. Because Y induces a negligible number of spatial interferences on array X, the term $m_s(X, Y)$ does not figure in the expression of $m(X)$. So, with

$$m(X) = m_i(X) + m_t(X, A) + m_s(X, A) + m_t(X, Y),$$

$$m_i(X) = \frac{N}{L_S}, \quad m_t(X, A) = \frac{N^2 B}{C_S L_S}, \quad m_s(X, A) = \frac{N^2 L_S}{C_S}\left(1 - \frac{1}{L_S}\right)^2, \quad m_t(X, Y) = \frac{N^2}{C_S},$$

we obtain

$$m(X) = \frac{N}{L_S} + \frac{N^2 B}{C_S L_S} + \frac{N^2 L_S}{C_S}\left(1 - \frac{1}{L_S}\right)^2 + \frac{N^2}{C_S}.$$

The variations of $m(X)$ can be very important, essentially because of the variations of $m(X, A)$. The precision of the above estimate is illustrated in Figure 2(b).

Array Y. The expression of the total number of misses for Y, $m(Y)$, is the following:

$$m(Y) = m_i(Y) + m_t(Y, Y) + \min\left(\left(\frac{2C_S - N}{N}\right)^+, 1\right)\left(m_t(Y, A) + m_t(Y, X)\right) + m_s(Y, A) + m_s(Y, X),$$

with

$$m_i(Y) = \frac{N}{L_S}, \quad m_t(Y, Y) = \frac{N}{B} \cdot \frac{N - (N - 2(N - C_S)^+)^+}{L_S}, \quad m_t(Y, A) = \frac{N^2}{B L_S} \min\left(1, \frac{2B}{\delta}\right).$$

The expressions of $m_t(Y, X)$ and of $m_s(Y, A) = m_s(Y, X)$, and the resulting closed form of $m(Y)$, are [...] (see Fricker et al. [1993] for the treatment of these interferences).

Array A. Because array A exhibits no temporal locality, the terms $m_t(A, X)$ and $m_t(A, Y)$ do not appear in the expression of $m(A)$. Besides, Y induces a negligible number of spatial misses on array A (the argument is the same as for array X), so the term $m_s(A, Y)$ has been removed as well. So, with

$$m(A) = m_i(A) + m_s(A, X), \quad m_i(A) = \frac{N^2}{L_S}, \quad m_s(A, X) = \frac{N^2 L_S}{C_S}\left(1 - \frac{1}{L_S}\right)^2,$$

we obtain

$$m(A) = \frac{N^2}{L_S} + \frac{N^2 L_S}{C_S}\left(1 - \frac{1}{L_S}\right)^2.$$

Blocked Matrix-Vector Multiply.
Regarding the whole primitive, the misses of each array are clearly cumulative; therefore it is safe to assert that the expression of $m$, the total number of misses, is the following:

$$m = m(X) + m(Y) + m(A).$$

Because the term $m_t(X, A)$ has a dominant impact on the total miss ratio, the total miss ratio is closely related to the miss ratio of X, as the comparison of Figure 3(a) with Figure 2(b) shows.

Fig. 3. (a) Total miss ratio of blocked matrix-vector multiply (r = 4), measured and predicted, as a function of the dimension N (M = N). (b) Influence of semi-intrinsic misses on global performance: total miss ratio as a function of the block size B ($L_S = 4$).

2.3 Spatial Interferences

Temporal vs. Spatial Interferences. The main source of cache misses is temporal interferences on X due to A: $m_t(X) \simeq (N^2 B)/(C_S L_S)$. Similarly, for spatial interferences, $m_s(X) \simeq ((N^2 L_S)/C_S)(1 - 1/L_S)^2$. An upper bound for $m_s(X)$ is $(N^2 L_S)/C_S$. So, if $B$ is large enough, $m_s(X) \ll m_t(X)$, i.e., spatial interferences are negligible with respect to temporal interferences. Note that, in opposition to temporal interferences, spatial interferences are independent of $B$, and therefore they do not influence the choice of the optimal block size. As a consequence, spatial interferences will be ignored in the computations of Section 3.

Semi-intrinsic Misses. In the nonblocked loop, the reference to A is $R_A = a_0 + j_2 + M j_1$ with $0 \le j_1 < N$ and $0 \le j_2 < N$, i.e., $N$ elements are accessed consecutively; then a stride of $M$ is applied (if $M = N$ all elements are consecutive). In the blocked loop, $R_A = a_0 + j_2 + M j_1 + B\,jj_2$, i.e., the stride of $M$ is applied much more frequently, every $B$ elements. If $L_S$ does not divide $B$, or if the block of $B$ elements is not aligned on a cache line, some elements of A are loaded that do not belong to this block of $B$ elements, i.e., useless elements. Since such elements will only be used after $N$ iterations of loop $j_1$ (i.e., they are unlikely to be kept in cache) or have already been used, they breed additional cache misses that can be termed semi-intrinsic misses.
Fig. 4. Influence of $L_S$ on the relative importance of cache interferences (total miss ratio as a function of the dimension N, M = N).

Even assuming $a_0 \bmod L_S = 0$ (the first element of A is aligned on a cache line), semi-intrinsic misses occur if $B \bmod L_S \ne 0$ ($L_S$ does not divide $B$) and/or $M \bmod L_S \ne 0$ (a block is not always aligned on a cache line). As can be seen in Figure 3(b), the optimal performance of the blocked loop can only be reached when neither situation arises, i.e., when $L_S$ divides both $B$ and $M$. Also, the influence of $L_S$ on the number of interferences can be seen in Figure 4.

3. OPTIMAL BLOCK SIZE AND OPTIMAL GAIN

The benefit or gain of blocking for array $T$ is defined by $G(T) = m_n(T) - m_b(T)$ (where $m_n(T)$ is $m(T)$ for the nonblocked, i.e., standard, loop, and $m_b(T)$ is $m(T)$ for the blocked loop). $G$ is the total gain, i.e., $G = m_n - m_b$. For all the graphs in this section, the expression of the gain $g = m_n/m_b$ is preferred because it provides the relative instead of the absolute improvement of miss rates due to blocking. Still, $G(T)$ has been used in the computations for the sake of simplicity. Also, in the next sections the optimal block size is denoted $B_{opt}$.

In Section 3.1, the values of the optimal block size and the gain, as computed by state-of-the-art data locality optimizing algorithms, are provided. In Section 3.2, the average gain (and the associated optimal block size) derived from the expressions of Section 2 is computed. The threshold value of $N$ for which blocking is useful is computed in Section 3.3. The differences between accurate and average evaluation of interferences are highlighted in Section 3.4. In Figure 5 the curves corresponding to the different expressions of the gain are plotted. Each curve is explained in one of the following sections.

3.1 Theoretical Optimal Block Size and Theoretical Gain

To date, the most elaborate method for computing the optimal block size in any loop can be found in Eisenbeis et al. [1990], so we will start from that point. In Eisenbeis et al. [1990], for each reference, the set of data to be reused is called the reference window. The principle is to find a block size so that all windows fit in cache, and which minimizes the number of cache misses. In Eisenbeis et al. [1990], only capacity misses are considered.
Fig. 5. Optimal gain, predicted precise optimal gain, predicted average optimal gain, theoretical optimal gain (gain $g = m_n/m_b$ as a function of N, M = N; curves Precise, Average, Theoretical, and the gain obtained with the theoretical block size $B = \min(N, C_S)$).

Let us illustrate this process with blocked matrix-vector multiply. The reference window corresponding to array Y has a size of 1 cache line. For array X it is equal to $B/L_S$ cache lines. And there is no window for array A because it is not reused. No reuse is assumed to occur for arrays to which blocking is not applied, i.e., array Y. So the number of cache misses of array Y is equal to $N/B \times N/L_S$. The number of misses of array A is equal to $N^2/L_S$ (compulsory misses). Finally, since interferences are ignored, and the window of size $B$ is assumed to fit in cache, the number of misses of array X is equal to $N/B \times B/L_S$. The optimization problem is then the following: under the constraints $B \le N$ and $B \le C_S$, minimize the part of $m_b$ due to Y and X,

$$\frac{N}{B} \cdot \frac{N}{L_S} + \frac{N}{B} \cdot \frac{B}{L_S} = \frac{N^2}{B L_S} + \frac{N}{L_S},$$

the $N^2/L_S$ misses of A being independent of $B$. So, in this case, the problem is equivalent to maximizing $B$ under the constraints. If $N < C_S$, then $B_{opt} = N$, and if $N \ge C_S$, $B_{opt} = C_S$, i.e., $B_{opt} = \min(N, C_S)$.

In order to compute the gain, the number of cache misses for the nonblocked loop nest must be evaluated. In short, the number of capacity misses of X in the nonblocked loop nest is equal to $m_t(X, X) = N (N - (N - 2(N - C_S)^+)^+)/L_S$. So,

$$G = m_n - m_b = \frac{N (N - (N - 2(N - C_S)^+)^+)}{L_S} + \frac{N}{L_S} - \frac{N^2}{L_S \min(N, C_S)}.$$

In the remainder of the article, these values of the optimal block size and the optimal gain are termed the theoretical optimal block size and the theoretical optimal gain. In Figure 5, it can be seen that the gain obtained with the theoretical optimal block size is very low (lower than 1.2). Besides, the theoretical gain appears to be a strong misprediction of both the actual gain and even the gain obtained with the theoretical block size. The theoretical gain actually corresponds to what "should happen" if blocking was behaving as predicted by the Window model, i.e., if capacity misses were removed and there were no interference miss. Incidentally, the theoretical gain indicates the maximum gain that can be theoretically expected, i.e., the ideal gain. Let us compute this maximum gain. When $N > 2C_S$, neglecting the terms in $N/L_S$,

$$g = \frac{m_n}{m_b} \simeq \frac{2N^2/L_S}{N^2/L_S + N^2/(C_S L_S)} = \frac{2}{1 + 1/C_S},$$

so $g \simeq 2$. The maximum gain that can be expected is 2 (i.e., blocking would divide by 2 the number of cache misses): in the nonblocked loop X exhibits at most $N^2/L_S$ cache misses, and A also exhibits $N^2/L_S$ compulsory misses, while in the blocked loop X ideally exhibits only $N/L_S$ compulsory misses in the best case, and A still exhibits $N^2/L_S$ cache misses.

3.2 Estimated Average Gain and Corresponding Optimal Block Size

3.2.1 Estimated Average Gain. For computing the average gain, the expressions of the average values of interferences are used. Such average expressions have been derived for both the blocked and the nonblocked loops. Because of paper length constraints, the details of computations have been omitted (see Fricker et al. [1993]). Three cases are distinguished:

$N < C_S$: $G = m_n - m_b = $ [...]

$C_S \le N < 2C_S$: $G = m_n - m_b = $ [...]

$2C_S \le N$: $G = m_n - m_b = $ [...]
Because cache interferences are taken into account in the above average estimates and not in the theoretical expressions of Section 3.1, new terms appear, or existing terms are modified. For instance, in the first case ($N < C_S$), the main new term is $N^2 B/(C_S L_S)$, which corresponds to temporal interferences between A and X. Because this term is a function of $B$, it is going to affect the determination of the optimal block size. Indeed, when $N < C_S$, the expression of the theoretical number of misses of the blocked algorithm (see $m_b$ in Section 3.1) only contains one term which depends on $B$: $N^2/(B L_S)$. Consequently, this term is minimal when the block size is the largest possible; hence $B_{opt} = \min(N, C_S)$. Now, in the above average expression two terms depend on $B$: $(N^2 B)/(C_S L_S)$ and $(N^2/(B L_S)) \min(1, 2B/\delta)$, which respectively increases and decreases (or is constant) with $B$. Therefore, the optimal block size is either equal to a tradeoff value or to $L_S$ (see the detailed computations in Section 3.2.2).

The curve Average in Figure 5 corresponds to the average optimal gain. It corresponds to the above expressions with $B = B_{opt}$ (except $g$ is used instead of $G$). It is shown in Section 3.2.2 how to derive the expression of $B_{opt}$ in the different cases. As can be seen in Figure 5, the average optimal gain is usually close to the actual optimal gain. Still, when $N > C_S$, the actual gain is periodically slightly higher than the average gain, while the precise estimate of the gain correctly predicts such phenomena (see Figure 5). The main difference between precise and average estimates is that array base addresses are considered in the precise estimate. In Figure 5, the base addresses of arrays X and A have been chosen large enough that no intense interference phenomena related to array placement can occur ($r = 512$).
But, in Section 3.4, it is shown that array base addresses can sometimes have a major influence on the number of interference misses, in which case the precision of the average estimate can be poor.

3.2.2 Estimated Optimal Block Size Based on the Average Gain. Let us first indicate the optimal block size expression obtained in each case and then provide the details of computations. When $N < C_S$, we obtain $B_{opt} = \sqrt{C_S}$ if $\delta < \sqrt{C_S}$ and $B_{opt} = L_S$ if $\delta \ge \sqrt{C_S}$ (recall that $\delta = \gcd(M, C_S)$). With the theoretical expression of Section 3.1, we obtain that $B_{opt} = N$. When $N \ge C_S$, $B_{opt}$ is either equal to $\sqrt{C_S}$ or $\sqrt{2C_S(1 - C_S/N)}$, while the theoretical optimal block size is equal to $C_S$ in this case. Therefore, the theoretical expression of the optimal block size is generally a strong overestimate of the optimal block size, which is confirmed by Figure 5.

The theoretical expression of Section 3.1 implies that once an element of X is loaded into the cache, it will not be flushed. Therefore, the only constraint on $B$ is that it must fit in cache. That is why the number of misses of X ($N/L_S$) does not depend on $B$. On the other hand, the expressions computed in Sections 2 and 3.2.1 take into account the fact that the elements of X can be flushed by elements of A. Consequently, with respect to X, the block size should be selected as small as possible so that the elements of X can be reused before they can be flushed. Intuitively, it means the reuse distance should be small enough that the probability an element of X is flushed before it can be reused is negligible. That is why the number of misses of X, $(N^2 B)/(C_S L_S)$, increases with $B$.
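The block sizes stated above can be put side by side with a few lines of Python. This is a minimal sketch, assuming the reading $B_{opt} = L_S$ for $\delta \ge \sqrt{C_S}$; the function names are hypothetical, and for $N \ge C_S$ the function only returns the two candidate values, since selecting between them requires the detailed analysis.

```python
from math import gcd, sqrt


def theoretical_block_size(N, C_S):
    """Window-model optimum of Section 3.1 (capacity misses only)."""
    return min(N, C_S)


def average_block_size_candidates(N, M, C_S, L_S):
    """Candidate optimal block sizes according to the average estimate.

    For N < C_S a single value is returned; for N >= C_S both candidate
    values are returned, without choosing between them.
    """
    delta = gcd(M, C_S)
    root = sqrt(C_S)
    if N < C_S:
        return [root] if delta < root else [float(L_S)]
    return [root, sqrt(2 * C_S * (1 - C_S / N))]
```

With $C_S = 1024$ and $L_S = 4$, the theoretical block size for $N = 516$ is 516, whereas the average estimate gives 32 (since $\delta = 4$); for $N = M = 512$, $\delta = 512$ and the average estimate falls back to $L_S = 4$.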
In the following paragraphs, it is now shown how the expression of the optimal block size can be derived from the expression of the average gain. $G$ is now considered to be a function of $B$. It is differentiated along $B$ so that its variations can be analyzed. The optimal value of $B$, i.e., $B_{opt}$, is the value that maximizes the gain. The computations are mostly detailed for the first case.

$N < C_S$. Two subcases must be distinguished: $2B/\delta < 1$ and $2B/\delta \ge 1$.

$B < \delta/2$. $dG/dB = -N^2/(C_S L_S) < 0$ for this interval of values of $B$. Therefore the local maximum is reached when $B$ is minimum, i.e., $B_{opt1} = L_S$. The corresponding value of the gain is $G_{max1} = G(B_{opt1})$.

$B \ge \delta/2$. $dG/dB = -N^2/(C_S L_S) + N^2/(B^2 L_S)$, so $dG/dB > 0$ if $B < \sqrt{C_S}$ and $dG/dB < 0$ otherwise. Thus, $G$ increases up to the value $B = \max(\sqrt{C_S}, \delta/2)$ and decreases afterward. So $B_{opt2} = \max(\sqrt{C_S}, \delta/2)$. The maximal value of the gain is $G_{max2} = G(B_{opt2})$.

The maximal gain for this interval of $N$ is the largest of the two gains, i.e., $G_{max} = \max(G_{max1}, G_{max2})$. These values must then be compared to find the global optimum $B_{opt}$ among $B_{opt1}$ and $B_{opt2}$.

If $\sqrt{C_S} < \delta/2$, then $G(L_S)$ and $G(\delta/2)$ should be compared. We obtain

$$G(L_S) - G(\delta/2) = \frac{N^2 \delta}{2 C_S L_S} - \frac{N^2}{C_S}.$$

Thus $G(L_S) > G(\delta/2)$ if $\delta > 2L_S$, which is assumed. Hence $B_{opt} = L_S$.

If $\sqrt{C_S} \ge \delta/2$, then $G(L_S)$ and $G(\sqrt{C_S})$ should be compared, which gives

$$G(L_S) - G(\sqrt{C_S}) = N^2 \left( \frac{2}{L_S \sqrt{C_S}} - \frac{1}{C_S} - \frac{2}{L_S \delta} \right).$$

Thus $G(L_S) > G(\sqrt{C_S})$ if $2/(L_S \sqrt{C_S}) > 1/C_S + 2/(L_S \delta)$, which is equivalent to $\delta > \sqrt{C_S}\,(1 - L_S/(2\sqrt{C_S}))^{-1} \simeq \sqrt{C_S}$.

The optimal block size for this interval of $N$ is $B_{opt} = \sqrt{C_S}$ if $\delta < \sqrt{C_S}$ and $B_{opt} = L_S$ otherwise.

$C_S \le N < 2C_S$. The same subcases must be distinguished. We obtain the following:

$B < \delta/2$. $B_{opt1} = \min(\sqrt{2(N - C_S)C_S/N}, \delta/2)$.

$B \ge \delta/2$. $B_{opt2} = \max(\sqrt{C_S}, \delta/2)$,

and these two local maxima are then compared. Note that $\sqrt{2(N - C_S)C_S/N} < \sqrt{C_S}$; thus three cases must be distinguished, according to the respective positions of $\delta$ and the interval $[\sqrt{2(N - C_S)C_S/N}, \sqrt{C_S}]$. Computations show that the optimal block size is $B_{opt} = \sqrt{C_S}$ if $\delta < h(N)$ and $B_{opt} = \sqrt{2(N - C_S)C_S/N}$ otherwise, where $h(N) = $ [...]. Note that $h(N) = \sqrt{C_S}$ if $N = C_S$, and $h(N) = 0$ if $N = 2C_S$.

$2C_S \le N$. Here there are no subcases, and $B_{opt} = \sqrt{C_S}$.
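The case $N < C_S$ can be cross-checked by brute force. The sketch below keeps only the two terms of the average estimate that depend on $B$, namely $N^2 B/(C_S L_S)$ and $(N^2/(B L_S)) \min(1, 2B/\delta)$, divides them by the common factor $N^2/L_S$, and searches the integer block sizes between $L_S$ and $N$. The function names are hypothetical, and the restriction to these two terms is an assumption of this illustration.

```python
def b_dependent_gain(B, C_S, delta):
    """B-dependent part of the average gain for N < C_S, divided by N^2/L_S."""
    return -B / C_S - min(1.0, 2.0 * B / delta) / B


def best_block_size(N, C_S, L_S, delta):
    """Integer block size in [L_S, N] maximizing the B-dependent gain."""
    candidates = range(L_S, N + 1)
    return max(candidates, key=lambda B: b_dependent_gain(B, C_S, delta))
```

For $C_S = 1024$, $L_S = 4$ and $N = 512$, the search returns $\sqrt{C_S} = 32$ when $\delta$ is small (4 or 16) and $L_S = 4$ when $\delta$ is large (64 or 512), in line with the rule stated for this interval of $N$. Near $\delta = \sqrt{C_S}$ the rule is only approximate, as the threshold $\sqrt{C_S}\,(1 - L_S/(2\sqrt{C_S}))^{-1}$ indicates.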
Fig. 6. (a) Determining when blocking is useful (predicted and theoretical gain $g = m_n/m_b$ as a function of N, M = N). (b) Variations of interferences between X and A (cases $r < B$, and $B < r$ with $B + r < \delta$).

3.3 Threshold Value of N

In this section, the problem of determining the threshold value $N_{thr}$ of $N$ for which blocking is profitable is briefly addressed. Basically, blocking is useful if $G > 0$. According to the theoretical expression of Section 3.1, $G > 0$ if $N > C_S$, i.e., $N_{thr} = C_S$. Now, according to the expressions of Section 3.2.1, if $N < C_S$, $G > 0$ as soon as $N$ is approximately greater than $2\sqrt{C_S}$. In fact, the theoretical expression only considers capacity misses, so that blocking can only become profitable if X is larger than the cache, i.e., if $N > C_S$. However, cache interferences can occur even when capacity misses still do not occur. As explained in Section 3.2.2, blocking has the effect of reducing the reuse distance of X so that it can be useful for minimizing interferences only. That is why $N_{thr}$ is actually much smaller than $C_S$. This observation is confirmed by Figure 6(a) ($2\sqrt{C_S} \simeq 64$ for $C_S = 1024$).

3.4 Estimated Precise Gain and Corresponding Optimal Block Size

In Section 2 it has been shown how to obtain a precise evaluation of cross-interferences between two arrays. The precise gain is simply obtained by cumulating such expressions for all pairs of arrays, as was done for the average gain in Section 3.2. Because of paper length constraints, the full expressions are not provided here. On the other hand, the differences between average and precise gain are highlighted. These differences occur when $\delta = \gcd(M, C_S)$ is large. The number of possible cache positions of the blocks of size $B$ of A is equal to $C_S/\delta$. So, if $\delta$ is large, there are few such positions. Consequently, the corresponding cache locations are heavily referenced. With respect to X, A appears to be distributed into few cache intervals separated by holes (see Figures 2(a) and 6(b)).
Therefore, if array X overlaps with one such interval, interferences between A and X are very intense. Overlapping occurs if the relative cache distance $r$ between A and X is smaller than $B$ (see Figure 6(b), case $r < B$). Overlapping does not occur if $B < r$ and $B + r < \delta$. So array base addresses play a significant role when $\delta$ is large. This is illustrated in Figure 7 where $\delta = 512$ ($N = 512$), $r = 8$, and $B$ is varied. The actual optimal value of $B$ is equal to 8, which is correctly predicted by the precise estimate. On the other hand, although the average estimate correctly predicts the optimal gain that can be expected, it is independent of $r$, and therefore it fails to predict for which value of $B$ interferences will occur. For a similar value of $N$ ($N = 516$), $\delta = 4$, and performance variations are independent of $r$. Consequently, the average estimate remains precise for all values of $B$. Also, the average estimate is poor when $\delta$ is large because array base addresses are not considered. On the other hand, the accurate estimate successfully predicts performance variations.

Fig. 7. Influence of array base addresses on the optimal block size ($\delta = 512$ and $\delta = 4$): precise and average gain $g = m_n/m_b$ as a function of the block size B.

Note that this particularity of external cross-interferences can be exploited. If there are holes between two intervals of A, then $B$ should be selected small enough that an interval of X fits into one such hole. In this case, no cross-interference occurs between A and X. This cannot be achieved when $\delta$ is too small (smaller than $L_S$) because the optimal block size needs to be at least equal to $L_S$ in order to exploit spatial locality. However, if $\delta$ is relatively small, it is not obvious that selecting a very small $B$ will yield important benefits considering the tradeoffs imposed by array Y (see Section 3.2.1). Another solution is to adjust the relative base address $r$ (by adjusting the base address of A or X) so that $B < r$. If this is not possible because of potential negative influence on other loops in the code, another solution is simply to copy array X in another array with a suitable base address.

4. APPLICATIONS

The techniques presented at the beginning of this article consist in precisely evaluating the number of external cross-interferences between two arrays by first examining their relative cache positions and then computing the number of overlapping array elements. These techniques have been applied to the three different pairs of arrays that can be found in matrix-vector multiply and which exhibit different patterns of relative cache positions. As long as the loop nest depth is small, these techniques can be extended to any pair of arrays. If the relative position depends on many indices, the same techniques can be used for the innermost index or indices, and an average estimate can be used for the outer indices (this was done for the pair X, Y).

Accurate evaluation of cache interferences is important for checking that restructuring techniques do not induce negative side-effects that degrade potential benefits. It clearly appears in this article that blocking is a delicate tradeoff which depends on loop and array parameters. Including such techniques in a compiler has not yet been achieved, but it is a possible follow-up to this study. A first implementation could be limited to average estimates, which can be derived relatively easily. Precise estimates are more difficult to implement, but a first solution is to only detect that precise estimates are needed by identifying high-risk cases. For instance, if the relative position of a pair of arrays only depends on one index with coefficient $M$, the test would be simply to check the value of parameter $\delta = \gcd(M, C_S)$. If $\delta$ is large, intense interferences could occur depending on the array base addresses, and a conservative (but nonoptimal) attitude would be to select a block size of $L_S$ in these cases.

A more immediate application of this model is the development of a linear algebra library finely tuned for caches. Though it is not possible to act on array base addresses, block size adjustment and copying provide sufficient flexibility to exploit fully the different cases detailed in this article.

5. CONCLUSIONS

Several conclusions can be drawn from this analysis of matrix-vector multiply. First, accurately evaluating external cross-interference misses and deriving an analytical expression of the number of such cache misses are tractable tasks. Second, the optimal block size, as computed by current data locality optimizing algorithms, is highly inaccurate, because only capacity misses are considered. If interference misses are taken into account in the optimization problem, the solution then becomes an accurate evaluation of the optimal block size.
Third, an average estimate of external cross-interferences is frequently but not always sufficient because, in some cases, array base addresses can strongly influence the occurrence and intensity of cache interferences.

REFERENCES

Eisenbeis, C., Jalby, W., Windheiser, D., and Bodin, F. 1990. A strategy for array management in local memory. In Proceedings of the 3rd Workshop on Programming Languages and Compilers for Parallel Computing. Irvine, California.
Esseghir, K. Improving data locality for caches. M.S. thesis, Univ. of Texas, Houston, Tex.
Ferrante, J., Sarkar, V., and Thrash, W. 1991. On estimating and enhancing cache effectiveness. In Proceedings of the 4th Workshop on Languages and Compilers for Parallel Computing. Santa Clara, California.
Fricker, C., Temam, O., and Jalby, W. Accurate evaluation of blocked algorithms cache interferences. Tech. rep., Leiden Univ., Leiden, The Netherlands. Mar.
Kane, G. and Heinrich, J. MIPS RISC Architecture. Prentice-Hall, Englewood Cliffs, N.J.
Lam, M., Rothberg, E. E., and Wolf, M. E. 1991. The cache performance of blocked algorithms. In 4th International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, New York.
McKinley, K. S. 1992. Automatic and interactive parallelization. Ph.D. thesis, Tech. Rep. CRPC-TR9224, Rice Univ., Houston, Tex.
Porterfield, A. K. 1989. Software Methods for Improvement of Cache Performance on Supercomputer Applications. Ph.D. thesis, Tech. Rep. CRPC-TR89-93, Rice Univ., Houston, Tex.
Sites, R. L. Alpha Architecture Reference Manual. Digital Press, Bedford, Mass.
Temam, O., Fricker, C., and Jalby, W. Impact of cache interferences on usual numerical dense loop nests. In Proc. IEEE, special issue on Computer Performance Evaluation.
Temam, O., Fricker, C., and Jalby, W. Cache interference phenomena. In Proceedings of the ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems (Nashville, Tenn.). ACM, New York.
Wolf, M. and Lam, M. 1991. A data locality optimizing algorithm. In Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation. SIGPLAN Not. 26, 6.

Received January 1994; revised June 1994; accepted February 1995
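
As a rough illustration of the high-risk test outlined in the Applications section, the sketch below computes gcd(M, C_S) for a pair of arrays whose relative cache position depends on a single index with coefficient M, and falls back to a conservative block size when that value is large. The function names, the risk threshold, the fallback block size, and the choice of expressing the cache size in array elements are all assumptions made for this illustration; the article does not specify them.

```python
from math import gcd


def interference_risk(m: int, cache_size: int) -> int:
    """Return gcd(M, C_S), the quantity the article proposes to check.

    m          -- coefficient M of the single index that determines the
                  relative position of the two arrays
    cache_size -- cache size C_S (assumed here to be in array elements)
    """
    return gcd(m, cache_size)


def pick_block_size(m: int, cache_size: int, preferred_block: int,
                    conservative_block: int, risk_threshold: int) -> int:
    """Choose a block size using the gcd-based high-risk test.

    risk_threshold and conservative_block are hypothetical tuning
    parameters, not values taken from the article.
    """
    if interference_risk(m, cache_size) >= risk_threshold:
        # High-risk case: intense interferences may occur depending on
        # the array base addresses, so use the conservative choice.
        return conservative_block
    return preferred_block


if __name__ == "__main__":
    # A power-of-two coefficient shares a large factor with the cache size.
    print(interference_risk(1024, 8192))
    # An odd coefficient shares no factor with a power-of-two cache size.
    print(interference_risk(1001, 8192))
    print(pick_block_size(1024, 8192, 64, 8, 256))
    print(pick_block_size(1001, 8192, 64, 8, 256))
```

Such a test only flags cases where a precise estimate is worth computing; it does not by itself give the optimal block size.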