Table-Lookup Approach for Compiling Two-Level Data-Processor Mappings in HPF Kuei-Ping Shih y, Jang-Ping Sheu y, and Chua-Huang Huang z y Department o

Size: px
Start display at page:

Download "Table-Lookup Approach for Compiling Two-Level Data-Processor Mappings in HPF Kuei-Ping Shih y, Jang-Ping Sheu y, and Chua-Huang Huang z y Department o"

Transcription

1 Table-Lookup Approach for Compiling Two-Level Data-Processor Mappings in HPF Kuei-Ping Shih y, Jang-Ping Sheu y, and Chua-Huang Huang z y Department of Computer Science and Information Engineering National Central University Chung-Li 32054, Taiwan steven@axp1.csie.ncu.edu.tw sheujp@csie.ncu.edu.tw z Department of Computer and Information Science The Ohio State University Columbus, OH chh@cis.ohio-state.edu Abstract This paper presents compilation techniques to compress holes, which are memory locations mapped by useless template cells and are caused by the non-unit alignment stride in a two-level data-processor mapping. A two-level data-processor mapping provides user to specify data-processor mapping by aligning related array objects with a template, and then distributing the template onto the user-declared abstract processors. In a two-level data-processor mapping, there is a repeated pattern for array elements mapped onto processors. We classify blocks into classes and use a class table to record the attributes of classes for the data distribution. Similarly, data distribution on a processor also has a repeated pattern. We use compression table to record the attributes of the rst data distribution pattern on that processor. By using class table and compression table, hole compression can be easily and eciently achieved. Compressing holes can save memory usage, improve spatial locality and further increase system performance. The proposed method is ecient, stable and easy implement. The experimental results do conrm the advantages of our proposed method over existing methods. Keywords: Communication Set, Distributed-Memory Multicomputers, High Performance Fortran, Hole Compression, Two-Level Data-Processor Mapping. 1 Introduction Distributed-memory multicomputers are superior than shared-memory multiprocessors in cost and scalability. These advantages have made distributed-memory multicomputers be extensively used in scientic and engineering computation. However, the absence of a global shared address causes the diculty to program in distributed-memory multicomputers. Programmers have to not only distribute the computation and data onto processors but also manage communication among processors. Currently, parallelizing compilers try to ll up the gap between users and machines to relieve the burden of such a tedious and error-prone work from This work was supported in part by the NSC of the ROC under Grant #NSC E

2 !HPF$ PROCESSORS P ROC(P )!HPF$ ALIGN A(i) WITH T (s i + o)!hpf$ DISTRIBUTE T (cyclic(x)) ONTO P ROC Fig. 1: HPF-like loop model considered in the paper. users. Many researchers investigate the extension of existing languages to provide users more convenient way to manage data distribution, such as Fortran D [6, 25], HPF (High Performance Fortran) [8, 16], and Vienna Fortran [2, 3], etc. The major characteristic of these languages is to provide users directives to specify data distribution at language level. Generally speaking, data parallel languages support two-level data-processor mapping. A two-level dataprocessor mapping provides user to specify data-processor mapping by aligning related array objects with a template, an abstract index space, and then distributing the template onto the user-declared abstract processors. In distribution phase, three regular data distributions, block, cyclic, and block-cyclic data distributions, are provided by these languages. The distribution that contiguous blocks of array elements are evenly distributed onto processors is block distribution. The cyclic distribution is to distribute each array element onto processors one at a time and in a round-robin fashion. The distribution that blocks of size x are distributed onto processors in a round-robin fashion is block-cyclic distribution and is denoted as cyclic(x). Block-cyclic distribution is known to be the most general data distribution. A Block or a cyclic distribution can be represented by a block-cyclic distribution. A cyclic(d NA P e) distribution is, basically, a block distribution and the cyclic distribution can be represented by cyclic(1), where N A is the number of array elements and P is the number of processors. Consequently, the program model considered in this paper is demonstrated in Fig. 1, where P is the number of processors, s is the alignment stride, o is the alignment oset and x is the distribution block size. Fig. 2 illustrates an example of this model, where P = 4, s = 3, o = 1, and x = 5. The white squares represent the array elements of A and the number in the square is the global index of that array element. The gradations of gray squares represent dierent template cells mapped to dierent processors and the number in the square is the global index of that template cell. A template is an abstract index space and whose elements have no content and occupy no storage [8]. Therefore, in real situation, we concern actually about how array elements are distributed onto processors, not how the template cells are distributed onto processors. Furthermore, if the alignment stride is non-unit, the two-level data-processor mapping will create holes; that is, there are a lot of template cells with which no array element is aligned. These holes imply that the memory locations are never used. Fig. 3(a) illustrates the distribution of data elements onto processors without hole compression. On the contrary Fig. 3(b) illustrates that of with hole compression. Obviously, there are 30 memory spaces that should be allocated by each processor if hole compression is not performed. However, only a few template cells are aligned by array elements. The rest of template cells, which have no array elements aligned with, are never used and are holes. Therefore, after hole compression performed, only 10 memory spaces are needed by each processor. Memory holes caused by the non-unit alignment stride will result in a large amount of memory wastage, even for a small alignment stride. Therefore, only the template cells aligned by array elements need to be mapped to memory locations and all the holes should be removed. Suppose the number of template cells is N T and s is the alignment stride. Thus, only N T =s template cells are aligned by array elements. The NT =s percentage of memory usage is N T 100% and equals 1 s 100%. In other words, the percentage of memory wastage is s?1 s 100%. The larger the alignment stride is, the more the memory usage wastes. Even for the 2

3 Fig. 2: A Two-level data-processor mapping. Array A(i) is aligned with template T (3 i + 1) and the template is then distributed onto 4 processors with cyclic(5) distribution. 3

4 (a) (b) Fig. 3: The distributions of data elements onto processors without and with hole compression. (a) The distribution of data elements onto processors without hole compression. (b) The distribution of data elements onto processors with hole compression. 4

5 least positive non-unit alignment stride 2, it still has 50% waste of memory space. Therefore, compressing holes is quite necessary and important. The processes that map the useful template cells to processors and eliminate useless holes are called hole compression. In addition to increase memory usage, removing holes can also improve spatial locality and, furthermore, achieve higher performance. To eciently achieve the goal of removing holes, this paper presents compilation techniques for compiling two-level data-processor mappings. Observing the distribution of data onto processors, one can nd out that the distribution patterns for every three blocks are identical as shown in Fig. 2. By the observation, we classify all blocks into classes and design a table, named class table, to record the attributes of the rst repeated pattern. On the other hand, from processor viewpoint, the above observation is true as well. Therefore, another table, compression table, are established to record the attributes of the rst repeated data distribution pattern on that processor. Compression table is established according to the class table. We design systematic methods to construct these tables and the time complexity of each construction is O(s) in worst case, where s is the alignment stride. Hole compression can be easily achieved by using compression table accordingly. Table-lookup approach can save a lot of redundant computations since the computations of repetition of xed patterns can be obtained by table lookup instead of recomputations. Experimental results verify the advantages of the proposed approach. Moreover, the proposed approach has high stability against existing methods. The execution time changes little with the variations of the alignment stride and the distribution block size. In addition, from implementation viewpoint, the proposed approach can be easily implemented as well. This paper is organized as follows. On compiling two-level data-processor mapping, class table is useful for summarizing the two-level mapping. Therefore, the structure and characteristics of class table are presented in Section 2. Section 3 introduces how to utilize class table for compressing holes and generating global indices of array elements contained by each processor. Here a useful structure termed as compression table is also introduced. Experimental results to show the advantages of ours over existing methods are provided in Section 4. Section 5 describes the related work. Section 6 concludes the paper and points out the possible direction of future research as well. 2 Characteristics of Class Table As compiling array statements, we have designed a useful structure to summarize the characteristics of access patterns for an array statement, while the data distribution is block-cyclic distribution [28]. Based on the similar concept, a structure named class table is designed in this paper to record the attributes of a two-level data-processor mapping. In this section we briey describe the basic components of class table. Without loss of generality, we assume every numbering system is starting from zero, such as numbering array elements, template cells, and processors, etc. Suppose array element A(i) is aligned with template T at (s i + o), where s is the alignment stride and o is the alignment oset. Two dierent alignment osets o 1 and o 2 lead to isomorphic data-processor mapping if and only if o 1 o 2 (mod s). In order to reuse the same class table for isomorphic data-processor mappings, class table regards the alignment oset o as another reduced one r, where r = (o mod s). For example, suppose array A(i) is aligned with a template T at (3i + 10). The alignment phase is illustrated in Fig. 4. In this example, s = 3 and o = 10. Since 10 > 3, the alignment oset is reduced to 1, which is equal to (10 mod 3). For the reduced alignment, a template cell t has an array element aligned with if and only if t r (mod s), such as the template cells 1; 4; 7; 10; 13; 16;, in this example. For these template cells with which have array elements aligned, a template cell t is active if o t (s (N A? 1) + o); otherwise, it is 5

6 A p p p T Fig. 4: The illustration of active elements and pseudo active elements. The boldfaced numbers are active elements and the italic numbers are pseudo active elements. termed pseudo active, where N A is the number of array elements. For this example, the active template cells are starting from 10 and then each strides 3 and the template cells 1, 4, 7 are pseudo active elements. Fig. 4 also illustrates the active elements and pseudo active elements for this example. The boldfaced numbers are active elements and the italic numbers are pseudo active elements. Note that the pseudo active elements are viewed as the same with active elements when we are generating class table and compression table. However, the pseudo active elements will not be counted when we are generating the compressed local array. For a cyclic(x) distribution, every x template cells forms a block. A block is numbered according to the occurrence of the block in the data-processor mapping. Suppose the numbers of array elements and template cells are N A and N T, respectively. Let N b be the number of blocks. Thus N b = dn T =xe. According to the alignment stride and oset, blocks of the same format can be classied into the same class. In other words, those blocks within which have the same positions of active elements are classied into the same class. For example, consider the two-level data-processor mapping shown in Fig. 2. Blocks 0, 3, 6, 9, have the same positions of active elements. These blocks are classied into the same class. Similarly, blocks 1, 4, 7, 10, can be classied into a class and blocks 2, 5, 8, 11, is another class. The following theorem demonstrates that, for a two-level data-processor mapping, according to the positions of active elements within the blocks, all blocks can be classied into dierent classes and the number of classes is equal to s= gcd(s; x), where gcd(a; b) is the greatest common divisor of a and b. Theorem 1 For any two-level data-processor mapping, suppose array A is aligned with T at a stride s and an oset o and template T is distributed onto processors using cyclic(x) distribution. All template blocks can be classied into s= gcd(s; x) classes. Proof. Since each block is of the same size and the alignment stride is xed on s, we only have to prove that if the position of the rst active element for some block is the same as that of another block, the positions of the rest of active elements within the two blocks are the same. Suppose the position of the rst active element within block b is, 0 < s. We would like to show that the position of the rst active element is still for every period of s= gcd(s; x) blocks. The positions of active elements before or after can be formulated as (( + k s) mod x), where k 2 Z and Z is the set of integers. We claim that ( + k 0 s) mod x =, for some k 0. Therefore, the number of blocks between b and the block whose the rst active element position is the same with b is k0 s x. Since ( + k 0 s) mod x =, it implies that + k 0 s = x q +, where q 2 Z. Thus, k 0 s = x q. Because k 0 and q are integers, the unique solution to k 0 s = x q is k 0 = t lcm(s;x) s and lcm(a; b) is the least common multiple of a and b. Therefore, k0 s x and q = t lcm(s;x) x, where t 2 Z = q = t lcm(s;x) x = t s gcd(s;x). In other words, for every period of s= gcd(s; x) blocks, the rst active elements of these blocks are the same. Thus, for arbitrary two-level data-processor mapping, blocks can be classied into s= gcd(s; x) classes. The theorem is therefore obtained. 2 Let N c be the number of classes. By Theorem 1, N c = s= gcd(s; x). Since all blocks are classied into s= gcd(s; x) classes, we number the class number of each block in lexicographical order and every s= gcd(s; x) 6

7 c A f A l A n Fig. 5: Class table for the example shown in Fig. 2. N c = s= gcd(s; x) /* the number of classes */ r = o mod s /* reduced alignment oset */ DO c = 0 TO (N c? 1) A f (c) = dmax(c x? r; 0)=se A l (c) = b((c + 1) x? r? 1)=sc A n (c) = A l (c)? A f (c) + 1 ENDDO Fig. 6: Class Table Generation algorithm. blocks repeats again. In other words, block b belongs to class (b mod N c ). Blocks b 1 and b 2 belong to the same class if and only if b 1 b 2 (mod N c ). Blocks fb; (b + 1); (b + 2); : : : ; (b + N c? 1) j b 0 (mod N c )g are a period in terms of class and we call such a period a class cycle. For a class cycle, a class table is used for recording the rst, the last and the number of active elements on a class. For a class c, A f (c) represents the order of occurrence in a class cycle for the rst active element in c, A l (c) the order of occurrence in a class cycle for the last active element in c, and A n (c) the number of active elements within c. For the example shown in Fig. 2, the class table is shown in Fig. 5. In this example, blocks can be classied into 3 classes. The blocks 0, 3, 6, 9,, are class 0, the blocks 1, 4, 7, 10,, are class 1, and the blocks 2, 5, 8, 11,, are class 2. Blocks 0, 1, 2 form a class cycle and blocks 3, 4, 5 form another class cycle, and so on. For the rst class in a class cycle, the occurrences of the rst and the last active elements are 0 and 1, respectively, and the number of active elements in this class is 2. Thus, A f (0) = 0, A l (0) = 1, and A n (0) = 2. Likewise, A f (1) = 2, A l (1) = 2, and A n (1) = 1 for class 1 and A f (2) = 3, A l (2) = 4, and A n (2) = 2 for class 2. The algorithm to generate a class table is presented in Fig. 6 and is termed Class Table Generation algorithm. The major factors aecting the construction of a class table are the alignment stride s, alignment oset o, and the distribution block size x. The time complexity of Class Table Generation algorithm is O(s= gcd(s; x)). In addition to the original meanings described above, class table also contains a lot of additional information implicitly. For a class c, A f (c) implies total number of active elements contained from class 0 to class (c? 1) and (A l (c) + 1) implies total number of active elements contained from 0 up to class c. Therefore, the number of active elements contained in a cycle is equal to A l (N c? 1) + 1. Let A c be the number of active elements contained in a class cycle. Thus, A c = A l (N c? 1) + 1. Note that, if the block size x is larger than the alignment stride s, each block contains at least one active element. Thus, A l (c) is always larger than or equal to A f (c), where c is the class number of that block. However, if the block size is smaller than the alignment stride, each block contains at most one active element and there are blocks that contain no active element at all. For any block that contains no active element, A f (c) will be one larger than A l (c). Lemma 1 demonstrates the phenomenon. 7

8 Lemma 1 For any block b, A f (c) = A l (c) + 1 () A n (c) = 0; where c is the class number of b. Proof. ()): Since A n (c) = A l (c)? A f (c) + 1 and A f (c) = A l (c) + 1; obviously, A n (c) = 0. ((): Since A n (c) = A l (c)? A f (c) + 1 and A n (c) = 0; clearly, A f (c) = A l (c) The construction of a class table from a two-level data-processor mapping has been introduced. In the following section, how to use the class table to construct a compression table to extract information from a repeated data distribution pattern on a processor will be proposed. The generation of the compressed local array for a processor by using the compression table will be described next. The transformations between global and local indices are addressed as well. 3 Hole Compression HPF supports programmers directives to explicitly specify regular data distribution. It adopts two-level data-processor mapping. That is, related array objects are rst aligned with each other or aligned with a template; this group of arrays is then distributed onto the user-declared abstract processors. A two-level data-processor mapping causes a lot of useless holes if the alignment stride is non-unit. Memory holes result in memory wastage and degrade system performance. Therefore, this section proposes a compilation technique to totally eliminate these useless holes and systematically generate the sequence of array elements exactly obtained by each processor. 3.1 Construction of Compression Table Suppose a two-level data-processor mapping is of the form shown in Fig. 1. By Theorem 1, for arbitrary twolevel data-processor mapping, all blocks can be classied into N c classes, where N c = s= gcd(s; x). Therefore, for every lcm(n c ; P ) blocks, the same data distribution pattern for all processors will repeat again. In other words, blocks within a data distribution pattern can be viewed as dierent, but are identical for every data distribution pattern. Hence, we only have to consider the rst data distribution pattern and the rest of data distribution patterns can be done likewise. As a result, a compression table records the information to generate the compressed local array for each processor on the rst data distribution pattern. Similar to the class table, compression table also regards an alignment oset o as another reduced alignment oset r if o s, where r = (o mod s). Let the number of blocks in a data distribution pattern be N pb. Thus N pb = lcm(n c ; P ). The number of blocks on each processor within a data distribution pattern is, thus, equal to N pb =P, which can also be written as N c = gcd(n c ; P ). Let the number of blocks on a processor within a data distribution pattern be N p pb, then N p pb = N c= gcd(n c ; P ). That is, on each processor, each data distribution pattern contains N p pb blocks. We number a block on a processor within a data distribution pattern according to the order of occurrence of that block in the data distribution pattern. For the example shown in Fig. 2, a data distribution pattern contains 12(= N pb ) blocks. Blocks 0 to 11 form the rst data distribution pattern and blocks 12 to 23 are within the second data distribution pattern. Blocks 0 and 4 are the rst and the second occurred blocks on processors 0, blocks 1 and 5 are the rst and the second occurred blocks on processor 1 within the rst data distribution pattern, and blocks 12 and 13 are the rst occurred blocks respectively on processor 0 and 8

9 Fig. 7: The representations of the notations class, class cycle, data distribution pattern, and occurrence of a block within a data distribution pattern. 1 within the second data distribution pattern, and so on. The representations of the notations class, class cycle, data distribution pattern and occurrence of a block within a data distribution pattern are illustrated in Fig. 7. For a processor, data distributions among dierent data distribution patterns are identical. The only dierence between the rst data distribution pattern and other data distribution patterns is the indices of every pair of corresponding cells. However, for every corresponding cells, their indices are dierent in only a xed oset. Hence, for a processor, only how to compress holes for the rst data distribution pattern needs to be considered and, for the rest of data distribution patterns, only the xed oset needs to be evaluated. For every block allocated on processor p in the rst data distribution pattern, if the following three items can be evaluated, we can systematically generate the compressed local array. Thus, hole compression can be easily achieved. C p g, the global index of the rst array element in a block on processor p, C p n, the number of array elements in a block on processor p, C p l, the local index in the compressed local array for the rst array element in a block on processor p. Therefore, to facilitate hole compression, we design a structure, termed compression table, to characterize blocks in a data distribution pattern on each processor. For a block in processor p, the compression table records the following three items: Cg p, Cn, p and C p l. Take the compression table of processor p 0 as an example. Block b 0 is the rst occurred block within the rst data distribution pattern on processor p 0. The global index of the rst array element in block b 0 is 0, the number of array elements in the block is 2, and the local index of array element A(0) in the compressed local array is 0. Thus, Cg(0) 0 = 0, Cn(0) 0 = 2, and Cl 0 (0) = 0. The second occurrence of block within the rst data distribution pattern on processor p 0 is block b 4. The global index of the rst array element in block b 4 is 7, the number of array elements in the block is 1, and the local index of A(7) in the compressed local array is 2. Therefore, Cg(1) 0 = 7, Cn(1) 0 = 1, and Cl 0(1) = 2. Similarly, C0 g(2) = 13, Cn(2) 0 = 2, and Cl 0 (2) = 3. The 9

10 occurrence Cg 0 Cn 0 Cl occurrence Cg 1 Cn 1 Cl (a) The compression table of p 0. (b) The compression table of p 1. occurrence Cg 2 Cn 2 Cl occurrence Cg 3 Cn 3 Cl (c) The compression table of p 2. (d) The compression table of p 3. Fig. 8: Compression tables for processor p 0 and p 1 in the example shown in Fig. 2. (a) The compression table of processor p 0. (b) The compression table of processor p 1. (c) The compression table of processor p 2. (d) The compression table of processor p 3. compression tables of processor p 0, p 1, p 2, and p 3, for the example shown in Fig. 2 are illustrated in Fig. 8 (a), (b), (c), and (d), respectively. How to systematically construct a compression table for processor p is described as follows. Since all blocks are classied into N c classes, every N c blocks form a class cycle. For any block b, there are b b N c c class cycles appeared before b. Furthermore, each class cycle contains A c active elements. Hence, there are (b b N c ca c ) active elements in these class cycles. From Section 2, A f (c) implicitly implies the number of active elements from class 0 to class (c? 1) in a class cycle. As a result, for any block b, the global index of the rst array element in the block can be obtained by b b N c ca c +A f (c), where c is the class number of b. Therefore, Cg p = b b N c c A c + A f (c). On the other hand, Cn p can be obtained by A n (c) and C p l can be evaluated by the local index of the rst array element in the previous block plus the number of active elements contained by the previous block. Of course, the local index of the rst array element in the rst block on the rst data distribution pattern is 0. Note that C p l can also represent the number of active elements occurred before the block in a data distribution pattern. The algorithm to generate a compression table is demonstrated in Fig. 9. The time complexity of Compression Table Generation algorithm is O(N c = gcd(n c ; P )), where N c is the number of classes and P is the number of processors. By Section 2, N c = s= gcd(s; x). Hence, the worst case of Compression Table Generation algorithm is O(s), as much as that of Class Table Generation algorithm. Note that if the alignment stride is smaller than or equal to the distribution block size, (s x), each block contains at least one active element. In such a condition, the rst array element in a block obtained by the above calculation is exactly the rst array element of that block. On the contrary, if the alignment stride is larger than the distribution block size, (s > x), a block may contain no active element. The rst array element of the empty block, a block contains no active element, obtained by the above calculation is a pseudo array element. However, the pseudo array element does not aect the correctness of our proposed approach. We shall show the fact in the following section. 10

11 b = p; nelem = 0; addr = 0; /* initialization */ N p pb = N c gcd(n c;p ) DO occurrence = 0 TO (N p pb? 1) c = b mod N c /* the class number of block b */ Cg p (occurrence) = b b N c c A c + A f (c) Cn(occurrence) p = A n (c) C p l (occurrence) = addr + nelem b = b + P /*the next block on processor p */ nelem = Cn(occurrence) p /* keep the current value for the next C p l */ addr = C p l (occurrence) /* keep the current value for the next Cp l */ ENDDO Fig. 9: Compression Table Generation algorithm. 3.2 Generation of Compressed Local Arrays Using the compression table to generate the compressed local array on a processor is proposed in this subsection. Dierent alignment stride and oset cause dierent data-processor mapping. For simplifying discussion, we rst consider a two-level data-processor mapping without considering the pseudo active elements. That is, we assume o < s. However, a compressed local array should not include the pseudo active elements if pseudo active elements do exist. Therefore, it follows the discussions of the general two-level data-processor mapping containing pseudo active elements Special Case of Two-Level Data-Processor Mapping Generally speaking, the main factors aecting a two-level data-processor mapping are the alignment stride s, alignment oset o, distribution block size x and the number of processors P. However, two dierent alignment osets o 1 and o 2 lead to isomorphic data-processor mapping if and only if o 1 o 2 (mod s). Hence, the constructions of class table and compression table regard two alignment osets as the same if the two alignment osets lead to isomorphic data-processor mapping. That is, if o s, we shift the alignment oset from o to r as we are generating class table and compression table, where r equals (o mod s). In this subsection, to simplify the discussion of the generation of compressed local array, we do not take the pseudo active elements into consideration. Specically, we assume that the alignment oset o is smaller than the alignment stride s. On the other hand, since, in our proposed method, a basic unit to generate the compressed local array is a data distribution pattern, thus we also require that array elements be distributed onto multiple of data distribution patterns. To do so can avoid evaluating the pseudo active elements occurred in back of the last active element (s (N A? 1) + o). Fig. 2 is an example of such a two-level data-processor mapping. Again, here we consider that array A(i) is aligned with T (si+o) and T is distributed onto P processors using cyclic(x) distribution. As previously stated, the data distribution among dierent data distribution patterns are identical except the indices of the corresponding elements. Let the dierence of the global indices of two corresponding array elements on two contiguous data distribution patterns be global indexing oset. Therefore, if one array element is known, the corresponding array element on the previous/next data distribution pattern can be obtained by subtracting/adding the global indexing oset from/to the global index of that array element. Hence, the most ecient approach to generating the compressed local array for 11

12 Algorithm: Hole Compression algorithm for the special case of two-level data-processor mapping. Input: Compression table and global indexing oset. Output: A p, the compressed local array on processor p. Procedure: Step 1. Generate the compressed local array for the rst data distribution pattern on processor p by using the compression table. Step 2. Generate A p for the rest of data distribution patterns according to the global indexing oset. Fig. 10: A simple Hole Compression algorithm for processor p in the special case of two-level data-processor mapping. processor p is to rst generate the compressed local array for the rst data distribution pattern according to the compression table, and then, for the following data distribution patterns, the compressed local array can be generated according to previously generated compressed local array and the global indexing oset. Consider the two-level data-processor mapping shown in Fig. 2 and take the generation of the compressed local array for processor p 0 as an example. The compressed local array for the rst data distribution pattern is A(0; 1; 7; 13; 14). Accordingly, the compressed local array for the second data distribution pattern is A(20; 21; 27; 33; 34) since the global indexing oset is 20. Thus, the compressed local array of processor p 0 is A(0; 1; 7; 13; 14; 20; 21; 27; 33; 34). A simple Hole Compression algorithm for the special case of two-level data-processor mapping is shown in Fig. 10. The algorithm shown in Fig. 10 is straightforward but less ecient. Based on the similar idea, we propose another ecient algorithm to generate the compressed local array for the special case of two-level dataprocessor mapping. Let the dierence of the local indices in a compressed local array for two corresponding array elements on two contiguous data distribution patterns be local indexing oset. As the dierences of the global and the local indices of two corresponding array elements on two contiguous data distribution patterns are xed, which are equal to the global and the local indexing osets, respectively, the global indices of the compressed local array elements on the latter data distribution pattern can be obtained by adding the global indexing oset to the global indices of the corresponding compressed local array elements on the former data distribution pattern. Similarly, the local indices in the compressed local array for the compressed local array elements on the latter data distribution pattern can be obtained by adding the local indexing oset to the local indices of the corresponding compressed local array elements on the former data distribution pattern. Once the global and local indexing osets for two corresponding array elements on two contiguous data distribution patterns are gured out, the compressed local array for processor p can be eciently generated as follows. For the blocks of the same occurrence on every data distribution pattern, the global index of the rst array element can be obtained by compression table, and for the remaining array elements at the same position in each block on each data distribution pattern, its index can be obtained by adding the global indexing oset to the global index of the previous corresponding array element. Similarly, the local index of the rst array element in the rst occurred block on the rst data distribution pattern can also be obtained by the compression table. For the other corresponding array elements, the local indices can be obtained by adding the local indexing oset to the local index of the previous corresponding array element. Fig. 11 illustrates the ecient algorithm to generate the compressed local array for processor p in the special case of two-level data-processor mapping. 12

13 Algorithm: Hole Compression algorithm for the special case of two-level data-processor mapping. Input: Compression table, global and local indexing osets. Output: A p, the compressed local array on processor p. Procedure: Step 1. for the blocks of the same occurrence on every data distribution pattern do Fill in A p according to the compression table, global and local indexing osets. Fig. 11: Hole Compression algorithm for processor p in the special case of two-level data-processor mapping. Again, take the generation of the compressed local array for processor p 0 in Fig. 2 as an example. The global index of the rst array element in block b 0 on the rst data distribution pattern is 0, which can be obtained by looking up the compression table of processor p 0. The local index of A(0) is 0, which can also be obtained by the compression table. For the corresponding array elements on the second data distribution pattern in this example, the global index of that array element is 20, which is equal to 20+0, where 20 is the global indexing oset and 0 is the global index of the previous corresponding array element. The local index of A(20) is 5, which equals 5 + 0, where 5 is the local indexing oset and 0 is the local index of the previous corresponding array element. That is, A(20) is mapped to A p0 (5). By the same way, we can obtained the compressed local array A p0 = (0; 1; 7; 13; 14; 20; 21; 27; 33; 34). Implementation Issues The main concept to generate the compressed local array for the special case of two-level data-processor mapping has been discussed in the previous section. However, some implementation issues should be addressed. Since the data distributions among dierent data distribution patterns are identical except the indices of the corresponding elements, the key to generating the compressed local array is to evaluate the global indexing oset. Evidently the dierence of the global indices of two corresponding array elements on two contiguous data distribution patterns is the number of active elements in a data distribution pattern. Let A ptn be the number of active elements in a data distribution pattern. Thus A ptn is the global indexing oset. The number of active elements in a data distribution pattern can be evaluated by A ptn = d(n pb x)=se, where N pb is the number of blocks in a data distribution pattern. For instance, in the example shown in Fig. 2, the rst blocks in the rst and the second data distribution patterns are blocks 0 and 12, respectively. The rst array elements in blocks 0 and 12 are 0 and 20, respectively. The dierence between the global indices of the two corresponding array elements is 20, which is equal to A ptn. The global indices of two corresponding array elements on two contiguous data distribution patterns are xed dierence. Similarly, the local indices in a processor's compressed local array for two corresponding array elements on two contiguous data distribution patterns are also xed dierence. We term this xed dierence local indexing oset. Clearly, the dierence of the local indices in a processor's compressed local array for two corresponding array elements on two contiguous data distribution patterns equals the number of active elements on a data distribution pattern allocated onto that processor. Let A p ptn be the number of active elements allocated onto processor p in a data distribution pattern. Obviously, A p ptn is the local 13

14 indexing oset. From compression table, A p ptn = C p n(n p pb? 1) + Cp l (N p pb? 1): For processor p 0 in the example shown in Fig. 2, the number of active elements on a data distribution pattern is 5. The rst array elements on the rst and the second data distribution patterns allocated onto processor p 0 are 0 and 20, respectively. The local index of array element 0 in compressed local array is 0 and that for array element 20 is 5. The dierence of the local indices in the compressed local array of p 0 for these two corresponding array elements is 5, which is equal to A p ptn. Suppose there are N ptn data distribution patterns in the special case of two-level data-processor mapping. The number of data distribution patterns N ptn can be obtained by N ptn = N b =N pb, where N b is the number of blocks and N pb is the number of blocks in a data distribution pattern. Since the array elements are distributed onto multiple of data distribution patterns, the number of compressed local array elements on processor p can be gured out as follows. Let the number of compressed local array elements on processor p be N p A. Then N p A = N ptn A p ptn : The code to generate the compressed local array on processor p for the special case of two-level data-processor mapping is shown in Appendix A General Case of Two-Level Data-Processor Mapping The concept used by the special case of two-level data-processor mapping is the same for the general case of two-level data-processor mapping. However, the pseudo active elements are taken into consideration for the general case of two-level data-processor mapping. For example, Fig. 12 is a general two-level data-processor mapping. In this data-processor mapping, array A has 30 elements indexed from 0 to 29. Array element A(i) is aligned with a template T at a stride 3 and an oset 28. Template T is distributed onto 4 processors using cyclic(5) distribution. The two data-processor mappings shown in Figs. 2 and 12 are isomorphic since the alignment strides, distribution block sizes and the numbers of processors of two mappings are identical, and the alignment osets 1 28 (mod 3). The class table and compression table used by the data-processor mapping shown in Fig. 2 are the same for that in Fig. 12. The compressed local array of processor p 0 in Fig. 2 is A p0 = A(0; 1; 7; 13; 14; 20; 21; 27; 33; 34), which has been introduced in previous section. However, in Fig. 12, taking the pseudo active elements into consideration, the compressed local array is turned to A p0 = A(4; 5; 11; 12; 18; 24; 25). Hole Compression algorithm for the special case of two-level data-processor mapping ignores the eect caused by the pseudo active elements. In the general case of two-level data-processor mapping, the rst and the last data distribution patterns may have pseudo active elements. These two data distribution patterns should be considered separately. However, for the rest of data distribution patterns between the rst and the last data distribution patterns, there is no pseudo active element in these data distribution patterns. Hence the generation of compressed local array for these data distribution patterns can adopt the Hole Compression algorithm for the special case of data-processor mapping. The evaluations of the global and local indexing osets in the special case of two-level data-processor mapping are also important for general two-level data-processor mapping, where the global and local indexing osets are respectively the dierences of the global indices and the local indices in a compressed local array for two corresponding array elements on two contiguous data distribution patterns. In addition, for processor p, we still have to count 14

15 Fig. 12: A general two-level data-processor mapping. Array A(i) is aligned with template T (3 i + 28) and the template is then distributed onto 4 processors with cyclic(5) distribution, assuming N A =

16 Algorithm: Hole Compression algorithm for the general two-level data-processor mapping. Input: Compression table, N p A, global and local indexing osets. Output: A p, the compressed local array on processor p. Procedure: Step 1. Generate the rst A p ptn compressed local array elements according to the compression table and the number of pseudo active elements. Step 2. Generate the remainder (bn p A =Ap ptnc? 1) A p ptn compressed local array elements. Every A p ptn elements are obtained by adding the global indexing oset to the previous A p ptn elements. Step 3. Generate the last (N p A mod Ap ptn) compressed local array elements according to the previous A p ptn compressed local array elements and the global indexing oset. Fig. 13: Hole Compression algorithm for processor p on a general two-level data-processor mapping. the number of pseudo active elements on p before the rst active element o and that after the last active element (s (N A? 1) + o) within the block where the last active element is allocated. Consider the generation of compressed local array of processor p 0 for the general case of two-level data-processor mapping shown in Fig. 12. Since the data-processor mapping shown in Fig. 12 is isomorphic to that in Fig. 2, the compression tables used by the two data-processor mappings are identical and have been shown in Fig. 8. The global and local indexing osets are 20 and 5, respectively, which are the same as those in the special case of two-level data-processor mapping shown in Fig. 2. The compressed local array of processor p 0 without considering the impact caused by the pseudo active elements is A p0 = A(0; 1; 7; 13; 14; 20; 21; 27; 33; 34), as discussed in Section Take the pseudo active elements into consideration. There are totally 9 pseudo active elements appeared before o in the two-level dataprocessor mapping, which are indexed as 1, 4, 7, 10, 13, 16, 19, 22, and 25. It implies, in this example, all the global indices of array elements have to subtract 9 from their original indices in the special case of two-level data-processor mapping to get their real indices in the general case. Thus, A p0 is turned to A(?9;?8;?2; 4; 5; 11; 12; 18; 24; 25). In addition, there are three pseudo active elements on processor p 0 ( 1, 4, and 22). As a result, the previous three array elements should be kicked out. Thus, the compressed local array of processor p 0 for the general two-level data-processor mapping in Fig. 12 is A p0 = A(4; 5; 11; 12; 18; 24; 25). However, if the number of compressed local array on processor p is known, the compressed local array on processor p can be generated more eciently. Let the number of compressed local array on processor p be N p A. Suppose the local indexing oset is Ap ptn. It implies that the global indices of array elements in the compressed local array will dier in a xed oset, global indexing oset, for a period of A p ptn elements. For example, in the two-level data-processor mapping shown in Fig. 12, the global and local indexing osets are 20 and 5, respectively. If the global index of the rst compressed local array element is 4, the global index of the sixth(= 1+5) compressed local array element is 24(= 4+20). Thus we can generate the compressed local array for the rst A p ptn elements by using the compression table and the number of pseudo active elements. After that, we can generate the next A p ptn elements according to the previous A p ptn elements and the global indexing oset until the last (N p A mod Ap ptn) elements. Finally, we can generate the last (N p A mod Ap ptn) elements accordingly. The algorithm to generate the compressed local array for the general case of two-level data-processor mapping is demonstrated in Fig

17 Although the concept of Hole Compression algorithm for the general case of two-level data-processor mapping is easy, some implementation issues should be addressed. The details of the algorithm will also be explained in the following subsection. Implementation Issues For a processor p, only the blocks which contain active elements do we need to concern about. Hence the block which contains active elements is termed active block. Let b p f and bp l respectively denote the rst and the last active blocks that processor p contains. Thus b p f and bp l can be gured out as follows. The block numbers of blocks contained by processor p can be written in the form of (p + k P ), where k 2 Z + and Z + is the set of positive integers. Since the rst and the last active elements are o and s (N A? 1) + o, they are allocated to blocks bo=xc and b(s (N A? 1) + o)=xc, respectively. Therefore, b p f should be greater than or equal to bo=xc and b p l should be smaller than or equal to b(s(n A?1)+o)=xc. Suppose b p f = (p+k f P ) and b p l = (p + k l P ), where k f and k l 2 Z +. Thus, (p + k f P ) bo=xc and (p + k l P ) b(s (N A? 1) + o)=xc. We can obtain that k f = d(bo=xc? p)=p e and k l = b(b(s (N A? 1) + o)=xc? p)=p c. As a result, b p f = p + d(bo=xc? p)=p e P; b p l = p + b(b(s (N A? 1) + o)=xc? p)=p c P: For any block in processor p, the block will belong to a unique data distribution pattern and occurrence in that data distribution pattern. Therefore, for any block b in processor p, b can be decomposed into two factors and, where and are the data distribution pattern number and the occurrence in the distribution pattern where b belongs, respectively. and can be respectively obtained by bb=n pb c and b(b mod N pb )=P c, where N pb is the number of blocks in a data distribution pattern and N pb = lcm(n c ; P ), which has been introduced in Section 3.1. On the other hand, given (; ) p, it will correspond to a unique block number. That is, given (; ) p, a unique block number b can be obtained via b = p + N pb + P: Thus, we term (; ) p addressing pair. Accordingly, the addressing pairs for the rst and the last active blocks that contained by processor p, b p f and bp l, are represented by ( f ; f ) p and ( l ; l ) p, respectively. When we are generating the class table and the compression table, pseudo active elements are regarded as the same with active elements. However, as we are generating the compressed local array, these pseudo active elements have to be deducted out. Hence, in addition to the evaluations of the global and local indexing osets, we have to gure out the number of pseudo active elements allocated on processor p and occurred before the rst active element o. In order not to over generate the compressed local array elements, we still have to calculate the number of pseudo active elements occurred in the last active block b p l on processor p. Let PA p h denote the number of pseudo active elements allocated on processor p and occurred before the rst active element o and PA p t denote the number of pseudo active elements occurred in the last active block b p l. The values of PA p h and PAp t can be calculated as follows. To calculate PA p h, we have to decide whether the rst active element o is allocated on processor p or not. If the rst active element o is not allocated on p, pseudo active elements only occur in the blocks before the rst active block b p f. Therefore, PAp h = f A p ptn + C p l ( f ), where A p ptn is the number of active elements allocated on processor p in a data distribution pattern. Otherwise, which means that the rst active element o is allocated on processor p, thus, in addition to the blocks before the rst active block b p f, the pseudo active elements occurred in the rst active block also need to be considered. Hence, PA p h = f A p ptn +C p l ( f )+b(o 17

18 mod x)=sc. As a result, PA p h can be obtained by ( PA p h = f A p ptn + C p l ( f ) if o 62 p; f A p ptn + C p l ( f ) + b(o mod x)=sc if o 2 p: For the example shown in Fig. 12, PA p h for processor p 0 can be evaluated as follows. The rst active block on p 0 is b 8 and the addressing pair for the block is (0; 2) p0. Since the rst active element 28 is not allocated on processor p 0, the pseudo active elements on p 0 only occur in the blocks before the rst active block b 8, that is, blocks b 0 and b 4. Accordingly, PA p0 h = Cp0 l (2) = 3, where A p ptn = 5. As regards to PA p h for processor p 1, the rst active block on p 1 is b 5 and the addressing pair of this block is (0; 1) p1. Since the rst active element 28 is allocated on p 1, the pseudo active elements occur not only in the blocks before the rst active block but also in the rst active block. Thus, PA p1 h = Cp1 l (1) + 1 = 2. To evaluate PA p t, we also have to decide whether the last active element (s (N A? 1) + o) is allocated on processor p or not. If the last active element (s (N A? 1) + o) is not allocated on p, there is no pseudo active element in the last active block. Therefore, PA p t = 0. For the example shown in Fig. 12, the values of PA p t for processors p 0, p 1 and p 2 are 0 because the last active element 115 is not allocated on p 0, p 1 and p 2. On the other hand, if the last active element is allocated on p, the number of pseudo active elements in the last active block b p l of p 3 shown in Fig. 12, since the last active element 115 is allocated on p 3, thus PA p3 t where the addressing pair of b p3 l is (1; 2) p3. Consequently, PA p t can be obtained by PA p t = can be calculated by PA p t = Cn( p l )? b((s (N A? 1) + o) mod x)=sc? 1. For the case = Cn p3 (2)? 0? 1 = 1, ( 0 if (s (N A? 1) + o) 62 p; Cn( p l )? b((s (N A? 1) + o) mod x)=sc? 1 if (s (N A? 1) + o) 2 p: Since the rst and the last active blocks on processor p and the corresponding addressing pairs can be evaluated, the number of active elements on processor p can be obtained accordingly. For processor p, the number of active elements occurred from the rst block up to the last active block is l A p ptn+c p l ( l)+cn( p l ). However, the pseudo active elements are also counted. Therefore, the number of pseudo active elements should be deducted out. As a result, the number of active elements allocated on processor p is N p A and can be obtained by N p A = l A p ptn + C p l ( l) + C p n( l )? PA p h? PAp t : The code to generate the compressed local array on processor p for the general two-level data-processor mapping is shown in Appendix B. We explain the ideas of the algorithm shown in Fig. 13 by using the following example. Consider the general two-level data-processor mapping shown in Fig. 12. Take the generation of the compressed local array for processor p 0 as an example. We have to evaluate the number of active elements allocated on processor p 0 rst. The rst and the last active blocks on processor p 0 are b 8 and b 20, respectively. That is, b p0 f = b 8 and b p0 l = b 20. The addressing pairs ( f ; f ) p0 and ( l ; l ) p0 are (0; 2) p0 and (1; 2) p0, respectively. The values of PA p h and PAp t for processor p 0 have been described in previous paragraphs and PA p0 h = 3 and PA p0 t = 0. The number of active elements on processor p 0 is N p0 A = Cp0 l (2) + Cn p0 (2)? 3? 0 = 7, where A p0 ptn = 5. Since A p0 ptn = 5, according to Hole Compression algorithm shown in Fig. 13, the rst step is to generate 5 compressed local array elements. The compression table records the information to generate A p ptn compressed local array elements when pseudo active elements are not considered. However, there are 9 pseudo active elements occurred before the rst active element o in this example. Hence all the global indices 18

1 Elementary number theory

1 Elementary number theory Math 215 - Introduction to Advanced Mathematics Spring 2019 1 Elementary number theory We assume the existence of the natural numbers and the integers N = {1, 2, 3,...} Z = {..., 3, 2, 1, 0, 1, 2, 3,...},

More information

Statement-Level Communication-Free. Partitioning Techniques for. National Central University. Chung-Li 32054, Taiwan

Statement-Level Communication-Free. Partitioning Techniques for. National Central University. Chung-Li 32054, Taiwan Appeared in the Ninth Worshop on Languages and Compilers for Parallel Comping, San Jose, CA, Aug. 8-0, 996. Statement-Level Communication-Free Partitioning Techniques for Parallelizing Compilers Kuei-Ping

More information

1 Elementary number theory

1 Elementary number theory 1 Elementary number theory We assume the existence of the natural numbers and the integers N = {1, 2, 3,...} Z = {..., 3, 2, 1, 0, 1, 2, 3,...}, along with their most basic arithmetical and ordering properties.

More information

Line Graphs and Circulants

Line Graphs and Circulants Line Graphs and Circulants Jason Brown and Richard Hoshino Department of Mathematics and Statistics Dalhousie University Halifax, Nova Scotia, Canada B3H 3J5 Abstract The line graph of G, denoted L(G),

More information

The Encoding Complexity of Network Coding

The Encoding Complexity of Network Coding The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network

More information

Efficient Prefix Computation on Faulty Hypercubes

Efficient Prefix Computation on Faulty Hypercubes JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 17, 1-21 (21) Efficient Prefix Computation on Faulty Hypercubes YU-WEI CHEN AND KUO-LIANG CHUNG + Department of Computer and Information Science Aletheia

More information

Math 302 Introduction to Proofs via Number Theory. Robert Jewett (with small modifications by B. Ćurgus)

Math 302 Introduction to Proofs via Number Theory. Robert Jewett (with small modifications by B. Ćurgus) Math 30 Introduction to Proofs via Number Theory Robert Jewett (with small modifications by B. Ćurgus) March 30, 009 Contents 1 The Integers 3 1.1 Axioms of Z...................................... 3 1.

More information

Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1

Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1 CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanford.edu) January 11, 2018 Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1 In this lecture

More information

Weak Dynamic Coloring of Planar Graphs

Weak Dynamic Coloring of Planar Graphs Weak Dynamic Coloring of Planar Graphs Caroline Accurso 1,5, Vitaliy Chernyshov 2,5, Leaha Hand 3,5, Sogol Jahanbekam 2,4,5, and Paul Wenger 2 Abstract The k-weak-dynamic number of a graph G is the smallest

More information

classify all blocks into classes and use a class table to record the memory accesses of the ærst repetitive pattern. By using the class table, they de

classify all blocks into classes and use a class table to record the memory accesses of the ærst repetitive pattern. By using the class table, they de Eæcient Address Generation for Aæne Subscripts in Data-Parallel Programs Kuei-Ping Shih Department of Computer Science and Information Engineering National Central University Chung-Li 32054, Taiwan Email:

More information

AXIOMS FOR THE INTEGERS

AXIOMS FOR THE INTEGERS AXIOMS FOR THE INTEGERS BRIAN OSSERMAN We describe the set of axioms for the integers which we will use in the class. The axioms are almost the same as what is presented in Appendix A of the textbook,

More information

Discrete Mathematics Lecture 4. Harper Langston New York University

Discrete Mathematics Lecture 4. Harper Langston New York University Discrete Mathematics Lecture 4 Harper Langston New York University Sequences Sequence is a set of (usually infinite number of) ordered elements: a 1, a 2,, a n, Each individual element a k is called a

More information

IS BINARY ENCODING APPROPRIATE FOR THE PROBLEM-LANGUAGE RELATIONSHIP?

IS BINARY ENCODING APPROPRIATE FOR THE PROBLEM-LANGUAGE RELATIONSHIP? Theoretical Computer Science 19 (1982) 337-341 North-Holland Publishing Company NOTE IS BINARY ENCODING APPROPRIATE FOR THE PROBLEM-LANGUAGE RELATIONSHIP? Nimrod MEGIDDO Statistics Department, Tel Aviv

More information

III Data Structures. Dynamic sets

III Data Structures. Dynamic sets III Data Structures Elementary Data Structures Hash Tables Binary Search Trees Red-Black Trees Dynamic sets Sets are fundamental to computer science Algorithms may require several different types of operations

More information

Egemen Tanin, Tahsin M. Kurc, Cevdet Aykanat, Bulent Ozguc. Abstract. Direct Volume Rendering (DVR) is a powerful technique for

Egemen Tanin, Tahsin M. Kurc, Cevdet Aykanat, Bulent Ozguc. Abstract. Direct Volume Rendering (DVR) is a powerful technique for Comparison of Two Image-Space Subdivision Algorithms for Direct Volume Rendering on Distributed-Memory Multicomputers Egemen Tanin, Tahsin M. Kurc, Cevdet Aykanat, Bulent Ozguc Dept. of Computer Eng. and

More information

Process Allocation for Load Distribution in Fault-Tolerant. Jong Kim*, Heejo Lee*, and Sunggu Lee** *Dept. of Computer Science and Engineering

Process Allocation for Load Distribution in Fault-Tolerant. Jong Kim*, Heejo Lee*, and Sunggu Lee** *Dept. of Computer Science and Engineering Process Allocation for Load Distribution in Fault-Tolerant Multicomputers y Jong Kim*, Heejo Lee*, and Sunggu Lee** *Dept. of Computer Science and Engineering **Dept. of Electrical Engineering Pohang University

More information

Revised version, February 1991, appeared in Information Processing Letters 38 (1991), 123{127 COMPUTING THE MINIMUM HAUSDORFF DISTANCE BETWEEN

Revised version, February 1991, appeared in Information Processing Letters 38 (1991), 123{127 COMPUTING THE MINIMUM HAUSDORFF DISTANCE BETWEEN Revised version, February 1991, appeared in Information Processing Letters 38 (1991), 123{127 COMPUTING THE MINIMUM HAUSDORFF DISTANCE BETWEEN TWO POINT SETS ON A LINE UNDER TRANSLATION Gunter Rote Technische

More information

The Global Standard for Mobility (GSM) (see, e.g., [6], [4], [5]) yields a

The Global Standard for Mobility (GSM) (see, e.g., [6], [4], [5]) yields a Preprint 0 (2000)?{? 1 Approximation of a direction of N d in bounded coordinates Jean-Christophe Novelli a Gilles Schaeer b Florent Hivert a a Universite Paris 7 { LIAFA 2, place Jussieu - 75251 Paris

More information

Solutions to the Second Midterm Exam

Solutions to the Second Midterm Exam CS/Math 240: Intro to Discrete Math 3/27/2011 Instructor: Dieter van Melkebeek Solutions to the Second Midterm Exam Problem 1 This question deals with the following implementation of binary search. Function

More information

9.5 Equivalence Relations

9.5 Equivalence Relations 9.5 Equivalence Relations You know from your early study of fractions that each fraction has many equivalent forms. For example, 2, 2 4, 3 6, 2, 3 6, 5 30,... are all different ways to represent the same

More information

Linear Quadtree Construction in Real Time *

Linear Quadtree Construction in Real Time * JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 26, 1917-1930 (2010) Short Paper Linear Quadtree Construction in Real Time * CHI-YEN HUANG AND YU-WEI CHEN + Department of Information Management National

More information

ELEMENTARY NUMBER THEORY AND METHODS OF PROOF

ELEMENTARY NUMBER THEORY AND METHODS OF PROOF CHAPTER 4 ELEMENTARY NUMBER THEORY AND METHODS OF PROOF Copyright Cengage Learning. All rights reserved. SECTION 4.8 Application: Algorithms Copyright Cengage Learning. All rights reserved. Application:

More information

arxiv: v2 [cs.ds] 18 May 2015

arxiv: v2 [cs.ds] 18 May 2015 Optimal Shuffle Code with Permutation Instructions Sebastian Buchwald, Manuel Mohr, and Ignaz Rutter Karlsruhe Institute of Technology {sebastian.buchwald, manuel.mohr, rutter}@kit.edu arxiv:1504.07073v2

More information

Stability in ATM Networks. network.

Stability in ATM Networks. network. Stability in ATM Networks. Chengzhi Li, Amitava Raha y, and Wei Zhao Abstract In this paper, we address the issues of stability in ATM networks. A network is stable if and only if all the packets have

More information

3186 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 51, NO. 9, SEPTEMBER Zero/Positive Capacities of Two-Dimensional Runlength-Constrained Arrays

3186 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 51, NO. 9, SEPTEMBER Zero/Positive Capacities of Two-Dimensional Runlength-Constrained Arrays 3186 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 51, NO 9, SEPTEMBER 2005 Zero/Positive Capacities of Two-Dimensional Runlength-Constrained Arrays Tuvi Etzion, Fellow, IEEE, and Kenneth G Paterson, Member,

More information

MOST attention in the literature of network codes has

MOST attention in the literature of network codes has 3862 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 56, NO. 8, AUGUST 2010 Efficient Network Code Design for Cyclic Networks Elona Erez, Member, IEEE, and Meir Feder, Fellow, IEEE Abstract This paper introduces

More information

However, m pq is just an approximation of M pq. As it was pointed out by Lin [2], more precise approximation can be obtained by exact integration of t

However, m pq is just an approximation of M pq. As it was pointed out by Lin [2], more precise approximation can be obtained by exact integration of t FAST CALCULATION OF GEOMETRIC MOMENTS OF BINARY IMAGES Jan Flusser Institute of Information Theory and Automation Academy of Sciences of the Czech Republic Pod vodarenskou vez 4, 82 08 Prague 8, Czech

More information

FUTURE communication networks are expected to support

FUTURE communication networks are expected to support 1146 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL 13, NO 5, OCTOBER 2005 A Scalable Approach to the Partition of QoS Requirements in Unicast and Multicast Ariel Orda, Senior Member, IEEE, and Alexander Sprintson,

More information

Orthogonal art galleries with holes: a coloring proof of Aggarwal s Theorem

Orthogonal art galleries with holes: a coloring proof of Aggarwal s Theorem Orthogonal art galleries with holes: a coloring proof of Aggarwal s Theorem Pawe l Żyliński Institute of Mathematics University of Gdańsk, 8095 Gdańsk, Poland pz@math.univ.gda.pl Submitted: Sep 9, 005;

More information

PACKING DIGRAPHS WITH DIRECTED CLOSED TRAILS

PACKING DIGRAPHS WITH DIRECTED CLOSED TRAILS PACKING DIGRAPHS WITH DIRECTED CLOSED TRAILS PAUL BALISTER Abstract It has been shown [Balister, 2001] that if n is odd and m 1,, m t are integers with m i 3 and t i=1 m i = E(K n) then K n can be decomposed

More information

CHAPTER 8. Copyright Cengage Learning. All rights reserved.

CHAPTER 8. Copyright Cengage Learning. All rights reserved. CHAPTER 8 RELATIONS Copyright Cengage Learning. All rights reserved. SECTION 8.3 Equivalence Relations Copyright Cengage Learning. All rights reserved. The Relation Induced by a Partition 3 The Relation

More information

Let v be a vertex primed by v i (s). Then the number f(v) of neighbours of v which have

Let v be a vertex primed by v i (s). Then the number f(v) of neighbours of v which have Let v be a vertex primed by v i (s). Then the number f(v) of neighbours of v which have been red in the sequence up to and including v i (s) is deg(v)? s(v), and by the induction hypothesis this sequence

More information

Worst-case running time for RANDOMIZED-SELECT

Worst-case running time for RANDOMIZED-SELECT Worst-case running time for RANDOMIZED-SELECT is ), even to nd the minimum The algorithm has a linear expected running time, though, and because it is randomized, no particular input elicits the worst-case

More information

Matlab and Coordinate Systems

Matlab and Coordinate Systems Matlab and Coordinate Systems Math 45 Linear Algebra David Arnold David-Arnold@Eureka.redwoods.cc.ca.us Abstract In this exercise we will introduce the concept of a coordinate system for a vector space.

More information

Part 2: Balanced Trees

Part 2: Balanced Trees Part 2: Balanced Trees 1 AVL Trees We could dene a perfectly balanced binary search tree with N nodes to be a complete binary search tree, one in which every level except the last is completely full. A

More information

Math Introduction to Advanced Mathematics

Math Introduction to Advanced Mathematics Math 215 - Introduction to Advanced Mathematics Number Theory Fall 2017 The following introductory guide to number theory is borrowed from Drew Shulman and is used in a couple of other Math 215 classes.

More information

Properly even harmonious labelings of disconnected graphs

Properly even harmonious labelings of disconnected graphs Available online at www.sciencedirect.com ScienceDirect AKCE International Journal of Graphs and Combinatorics 12 (2015) 193 203 www.elsevier.com/locate/akcej Properly even harmonious labelings of disconnected

More information

11 1. Introductory part In order to follow the contents of this book with full understanding, certain prerequisites from high school mathematics will

11 1. Introductory part In order to follow the contents of this book with full understanding, certain prerequisites from high school mathematics will 11 1. Introductory part In order to follow the contents of this book with full understanding, certain prerequisites from high school mathematics will be necessary. This firstly pertains to the basic concepts

More information

CLOCK DRIVEN SCHEDULING

CLOCK DRIVEN SCHEDULING CHAPTER 4 By Radu Muresan University of Guelph Page 1 ENGG4420 CHAPTER 4 LECTURE 2 and 3 November 04 09 7:51 PM CLOCK DRIVEN SCHEDULING Clock driven schedulers make their scheduling decisions regarding

More information

Implementations of Dijkstra's Algorithm. Based on Multi-Level Buckets. November Abstract

Implementations of Dijkstra's Algorithm. Based on Multi-Level Buckets. November Abstract Implementations of Dijkstra's Algorithm Based on Multi-Level Buckets Andrew V. Goldberg NEC Research Institute 4 Independence Way Princeton, NJ 08540 avg@research.nj.nec.com Craig Silverstein Computer

More information

Introduction to Modular Arithmetic

Introduction to Modular Arithmetic Randolph High School Math League 2014-2015 Page 1 1 Introduction Introduction to Modular Arithmetic Modular arithmetic is a topic residing under Number Theory, which roughly speaking is the study of integers

More information

Recognizing Interval Bigraphs by Forbidden Patterns

Recognizing Interval Bigraphs by Forbidden Patterns Recognizing Interval Bigraphs by Forbidden Patterns Arash Rafiey Simon Fraser University, Vancouver, Canada, and Indiana State University, IN, USA arashr@sfu.ca, arash.rafiey@indstate.edu Abstract Let

More information

Matlab and Coordinate Systems

Matlab and Coordinate Systems Math 45 Linear Algebra David Arnold David-Arnold@Eureka.redwoods.cc.ca.us Abstract In this exercise we will introduce the concept of a coordinate system for a vector space. The map from the vector space

More information

The 4/5 Upper Bound on the Game Total Domination Number

The 4/5 Upper Bound on the Game Total Domination Number The 4/ Upper Bound on the Game Total Domination Number Michael A. Henning a Sandi Klavžar b,c,d Douglas F. Rall e a Department of Mathematics, University of Johannesburg, South Africa mahenning@uj.ac.za

More information

Excerpt from "Art of Problem Solving Volume 1: the Basics" 2014 AoPS Inc.

Excerpt from Art of Problem Solving Volume 1: the Basics 2014 AoPS Inc. Chapter 5 Using the Integers In spite of their being a rather restricted class of numbers, the integers have a lot of interesting properties and uses. Math which involves the properties of integers is

More information

On k-dimensional Balanced Binary Trees*

On k-dimensional Balanced Binary Trees* journal of computer and system sciences 52, 328348 (1996) article no. 0025 On k-dimensional Balanced Binary Trees* Vijay K. Vaishnavi Department of Computer Information Systems, Georgia State University,

More information

Discharging and reducible configurations

Discharging and reducible configurations Discharging and reducible configurations Zdeněk Dvořák March 24, 2018 Suppose we want to show that graphs from some hereditary class G are k- colorable. Clearly, we can restrict our attention to graphs

More information

An Ecient Approximation Algorithm for the. File Redistribution Scheduling Problem in. Fully Connected Networks. Abstract

An Ecient Approximation Algorithm for the. File Redistribution Scheduling Problem in. Fully Connected Networks. Abstract An Ecient Approximation Algorithm for the File Redistribution Scheduling Problem in Fully Connected Networks Ravi Varadarajan Pedro I. Rivera-Vega y Abstract We consider the problem of transferring a set

More information

Module 2: Classical Algorithm Design Techniques

Module 2: Classical Algorithm Design Techniques Module 2: Classical Algorithm Design Techniques Dr. Natarajan Meghanathan Associate Professor of Computer Science Jackson State University Jackson, MS 39217 E-mail: natarajan.meghanathan@jsums.edu Module

More information

time using O( n log n ) processors on the EREW PRAM. Thus, our algorithm improves on the previous results, either in time complexity or in the model o

time using O( n log n ) processors on the EREW PRAM. Thus, our algorithm improves on the previous results, either in time complexity or in the model o Reconstructing a Binary Tree from its Traversals in Doubly-Logarithmic CREW Time Stephan Olariu Michael Overstreet Department of Computer Science, Old Dominion University, Norfolk, VA 23529 Zhaofang Wen

More information

Scan Scheduling Specification and Analysis

Scan Scheduling Specification and Analysis Scan Scheduling Specification and Analysis Bruno Dutertre System Design Laboratory SRI International Menlo Park, CA 94025 May 24, 2000 This work was partially funded by DARPA/AFRL under BAE System subcontract

More information

11.1 Facility Location

11.1 Facility Location CS787: Advanced Algorithms Scribe: Amanda Burton, Leah Kluegel Lecturer: Shuchi Chawla Topic: Facility Location ctd., Linear Programming Date: October 8, 2007 Today we conclude the discussion of local

More information

II (Sorting and) Order Statistics

II (Sorting and) Order Statistics II (Sorting and) Order Statistics Heapsort Quicksort Sorting in Linear Time Medians and Order Statistics 8 Sorting in Linear Time The sorting algorithms introduced thus far are comparison sorts Any comparison

More information

Integers and Mathematical Induction

Integers and Mathematical Induction IT Program, NTUT, Fall 07 Integers and Mathematical Induction Chuan-Ming Liu Computer Science and Information Engineering National Taipei University of Technology TAIWAN 1 Learning Objectives Learn about

More information

Leveraging Transitive Relations for Crowdsourced Joins*

Leveraging Transitive Relations for Crowdsourced Joins* Leveraging Transitive Relations for Crowdsourced Joins* Jiannan Wang #, Guoliang Li #, Tim Kraska, Michael J. Franklin, Jianhua Feng # # Department of Computer Science, Tsinghua University, Brown University,

More information

Lemma (x, y, z) is a Pythagorean triple iff (y, x, z) is a Pythagorean triple.

Lemma (x, y, z) is a Pythagorean triple iff (y, x, z) is a Pythagorean triple. Chapter Pythagorean Triples.1 Introduction. The Pythagorean triples have been known since the time of Euclid and can be found in the third century work Arithmetica by Diophantus [9]. An ancient Babylonian

More information

Experiments on string matching in memory structures

Experiments on string matching in memory structures Experiments on string matching in memory structures Thierry Lecroq LIR (Laboratoire d'informatique de Rouen) and ABISS (Atelier de Biologie Informatique Statistique et Socio-Linguistique), Universite de

More information

A Simplied NP-complete MAXSAT Problem. Abstract. It is shown that the MAX2SAT problem is NP-complete even if every variable

A Simplied NP-complete MAXSAT Problem. Abstract. It is shown that the MAX2SAT problem is NP-complete even if every variable A Simplied NP-complete MAXSAT Problem Venkatesh Raman 1, B. Ravikumar 2 and S. Srinivasa Rao 1 1 The Institute of Mathematical Sciences, C. I. T. Campus, Chennai 600 113. India 2 Department of Computer

More information

Edge disjoint monochromatic triangles in 2-colored graphs

Edge disjoint monochromatic triangles in 2-colored graphs Discrete Mathematics 31 (001) 135 141 www.elsevier.com/locate/disc Edge disjoint monochromatic triangles in -colored graphs P. Erdős a, R.J. Faudree b; ;1, R.J. Gould c;, M.S. Jacobson d;3, J. Lehel d;

More information

Topology Homework 3. Section Section 3.3. Samuel Otten

Topology Homework 3. Section Section 3.3. Samuel Otten Topology Homework 3 Section 3.1 - Section 3.3 Samuel Otten 3.1 (1) Proposition. The intersection of finitely many open sets is open and the union of finitely many closed sets is closed. Proof. Note that

More information

MDP Routing in ATM Networks. Using the Virtual Path Concept 1. Department of Computer Science Department of Computer Science

MDP Routing in ATM Networks. Using the Virtual Path Concept 1. Department of Computer Science Department of Computer Science MDP Routing in ATM Networks Using the Virtual Path Concept 1 Ren-Hung Hwang, James F. Kurose, and Don Towsley Department of Computer Science Department of Computer Science & Information Engineering University

More information

reasonable to store in a software implementation, it is likely to be a signicant burden in a low-cost hardware implementation. We describe in this pap

reasonable to store in a software implementation, it is likely to be a signicant burden in a low-cost hardware implementation. We describe in this pap Storage-Ecient Finite Field Basis Conversion Burton S. Kaliski Jr. 1 and Yiqun Lisa Yin 2 RSA Laboratories 1 20 Crosby Drive, Bedford, MA 01730. burt@rsa.com 2 2955 Campus Drive, San Mateo, CA 94402. yiqun@rsa.com

More information

Valmir C. Barbosa. Programa de Engenharia de Sistemas e Computac~ao, COPPE. Caixa Postal Abstract

Valmir C. Barbosa. Programa de Engenharia de Sistemas e Computac~ao, COPPE. Caixa Postal Abstract The Interleaved Multichromatic Number of a Graph Valmir C. Barbosa Universidade Federal do Rio de Janeiro Programa de Engenharia de Sistemas e Computac~ao, COPPE Caixa Postal 685 2945-970 Rio de Janeiro

More information

Tilings of the Euclidean plane

Tilings of the Euclidean plane Tilings of the Euclidean plane Yan Der, Robin, Cécile January 9, 2017 Abstract This document gives a quick overview of a eld of mathematics which lies in the intersection of geometry and algebra : tilings.

More information

Parameterized Complexity of Independence and Domination on Geometric Graphs

Parameterized Complexity of Independence and Domination on Geometric Graphs Parameterized Complexity of Independence and Domination on Geometric Graphs Dániel Marx Institut für Informatik, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany. dmarx@informatik.hu-berlin.de

More information

MITOCW watch?v=kvtlwgctwn4

MITOCW watch?v=kvtlwgctwn4 MITOCW watch?v=kvtlwgctwn4 PROFESSOR: The idea of congruence was introduced to the world by Gauss in the early 18th century. You've heard of him before, I think. He's responsible for some work on magnetism

More information

Compiler and Runtime Support for Programming in Adaptive. Parallel Environments 1. Guy Edjlali, Gagan Agrawal, and Joel Saltz

Compiler and Runtime Support for Programming in Adaptive. Parallel Environments 1. Guy Edjlali, Gagan Agrawal, and Joel Saltz Compiler and Runtime Support for Programming in Adaptive Parallel Environments 1 Guy Edjlali, Gagan Agrawal, Alan Sussman, Jim Humphries, and Joel Saltz UMIACS and Dept. of Computer Science University

More information

Introduction to Programming in C Department of Computer Science and Engineering\ Lecture No. #02 Introduction: GCD

Introduction to Programming in C Department of Computer Science and Engineering\ Lecture No. #02 Introduction: GCD Introduction to Programming in C Department of Computer Science and Engineering\ Lecture No. #02 Introduction: GCD In this session, we will write another algorithm to solve a mathematical problem. If you

More information

31.6 Powers of an element

31.6 Powers of an element 31.6 Powers of an element Just as we often consider the multiples of a given element, modulo, we consider the sequence of powers of, modulo, where :,,,,. modulo Indexing from 0, the 0th value in this sequence

More information

Parallel Pipeline STAP System

Parallel Pipeline STAP System I/O Implementation and Evaluation of Parallel Pipelined STAP on High Performance Computers Wei-keng Liao, Alok Choudhary, Donald Weiner, and Pramod Varshney EECS Department, Syracuse University, Syracuse,

More information

Pick s Theorem and Lattice Point Geometry

Pick s Theorem and Lattice Point Geometry Pick s Theorem and Lattice Point Geometry 1 Lattice Polygon Area Calculations Lattice points are points with integer coordinates in the x, y-plane. A lattice line segment is a line segment that has 2 distinct

More information

An Eternal Domination Problem in Grids

An Eternal Domination Problem in Grids Theory and Applications of Graphs Volume Issue 1 Article 2 2017 An Eternal Domination Problem in Grids William Klostermeyer University of North Florida, klostermeyer@hotmail.com Margaret-Ellen Messinger

More information

Telecommunication and Informatics University of North Carolina, Technical University of Gdansk Charlotte, NC 28223, USA

Telecommunication and Informatics University of North Carolina, Technical University of Gdansk Charlotte, NC 28223, USA A Decoder-based Evolutionary Algorithm for Constrained Parameter Optimization Problems S lawomir Kozie l 1 and Zbigniew Michalewicz 2 1 Department of Electronics, 2 Department of Computer Science, Telecommunication

More information

Segmentation. Multiple Segments. Lecture Notes Week 6

Segmentation. Multiple Segments. Lecture Notes Week 6 At this point, we have the concept of virtual memory. The CPU emits virtual memory addresses instead of physical memory addresses. The MMU translates between virtual and physical addresses. Don't forget,

More information

Trees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph.

Trees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph. Trees 1 Introduction Trees are very special kind of (undirected) graphs. Formally speaking, a tree is a connected graph that is acyclic. 1 This definition has some drawbacks: given a graph it is not trivial

More information

General properties of staircase and convex dual feasible functions

General properties of staircase and convex dual feasible functions General properties of staircase and convex dual feasible functions JÜRGEN RIETZ, CLÁUDIO ALVES, J. M. VALÉRIO de CARVALHO Centro de Investigação Algoritmi da Universidade do Minho, Escola de Engenharia

More information

Fundamental Properties of Graphs

Fundamental Properties of Graphs Chapter three In many real-life situations we need to know how robust a graph that represents a certain network is, how edges or vertices can be removed without completely destroying the overall connectivity,

More information

SEQUENCES, MATHEMATICAL INDUCTION, AND RECURSION

SEQUENCES, MATHEMATICAL INDUCTION, AND RECURSION CHAPTER 5 SEQUENCES, MATHEMATICAL INDUCTION, AND RECURSION Copyright Cengage Learning. All rights reserved. SECTION 5.5 Application: Correctness of Algorithms Copyright Cengage Learning. All rights reserved.

More information

One of the most important areas where quantifier logic is used is formal specification of computer programs.

One of the most important areas where quantifier logic is used is formal specification of computer programs. Section 5.2 Formal specification of computer programs One of the most important areas where quantifier logic is used is formal specification of computer programs. Specification takes place on several levels

More information

2 The Fractional Chromatic Gap

2 The Fractional Chromatic Gap C 1 11 2 The Fractional Chromatic Gap As previously noted, for any finite graph. This result follows from the strong duality of linear programs. Since there is no such duality result for infinite linear

More information

International Journal of Foundations of Computer Science c World Scientic Publishing Company DFT TECHNIQUES FOR SIZE ESTIMATION OF DATABASE JOIN OPERA

International Journal of Foundations of Computer Science c World Scientic Publishing Company DFT TECHNIQUES FOR SIZE ESTIMATION OF DATABASE JOIN OPERA International Journal of Foundations of Computer Science c World Scientic Publishing Company DFT TECHNIQUES FOR SIZE ESTIMATION OF DATABASE JOIN OPERATIONS KAM_IL SARAC, OMER E GEC_IO GLU, AMR EL ABBADI

More information

A NOTE ON INCOMPLETE REGULAR TOURNAMENTS WITH HANDICAP TWO OF ORDER n 8 (mod 16) Dalibor Froncek

A NOTE ON INCOMPLETE REGULAR TOURNAMENTS WITH HANDICAP TWO OF ORDER n 8 (mod 16) Dalibor Froncek Opuscula Math. 37, no. 4 (2017), 557 566 http://dx.doi.org/10.7494/opmath.2017.37.4.557 Opuscula Mathematica A NOTE ON INCOMPLETE REGULAR TOURNAMENTS WITH HANDICAP TWO OF ORDER n 8 (mod 16) Dalibor Froncek

More information

Tiling Rectangles with Gaps by Ribbon Right Trominoes

Tiling Rectangles with Gaps by Ribbon Right Trominoes Open Journal of Discrete Mathematics, 2017, 7, 87-102 http://www.scirp.org/journal/ojdm ISSN Online: 2161-7643 ISSN Print: 2161-7635 Tiling Rectangles with Gaps by Ribbon Right Trominoes Premalatha Junius,

More information

Algebraic Properties of CSP Model Operators? Y.C. Law and J.H.M. Lee. The Chinese University of Hong Kong.

Algebraic Properties of CSP Model Operators? Y.C. Law and J.H.M. Lee. The Chinese University of Hong Kong. Algebraic Properties of CSP Model Operators? Y.C. Law and J.H.M. Lee Department of Computer Science and Engineering The Chinese University of Hong Kong Shatin, N.T., Hong Kong SAR, China fyclaw,jleeg@cse.cuhk.edu.hk

More information

EXTREME POINTS AND AFFINE EQUIVALENCE

EXTREME POINTS AND AFFINE EQUIVALENCE EXTREME POINTS AND AFFINE EQUIVALENCE The purpose of this note is to use the notions of extreme points and affine transformations which are studied in the file affine-convex.pdf to prove that certain standard

More information

6th Bay Area Mathematical Olympiad

6th Bay Area Mathematical Olympiad 6th Bay Area Mathematical Olympiad February 4, 004 Problems and Solutions 1 A tiling of the plane with polygons consists of placing the polygons in the plane so that interiors of polygons do not overlap,

More information

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 The Encoding Complexity of Network Coding Michael Langberg, Member, IEEE, Alexander Sprintson, Member, IEEE, and Jehoshua Bruck,

More information

Algorithms for an FPGA Switch Module Routing Problem with. Application to Global Routing. Abstract

Algorithms for an FPGA Switch Module Routing Problem with. Application to Global Routing. Abstract Algorithms for an FPGA Switch Module Routing Problem with Application to Global Routing Shashidhar Thakur y Yao-Wen Chang y D. F. Wong y S. Muthukrishnan z Abstract We consider a switch-module-routing

More information

Module 11. Directed Graphs. Contents

Module 11. Directed Graphs. Contents Module 11 Directed Graphs Contents 11.1 Basic concepts......................... 256 Underlying graph of a digraph................ 257 Out-degrees and in-degrees.................. 258 Isomorphism..........................

More information

CS 6402 DESIGN AND ANALYSIS OF ALGORITHMS QUESTION BANK

CS 6402 DESIGN AND ANALYSIS OF ALGORITHMS QUESTION BANK CS 6402 DESIGN AND ANALYSIS OF ALGORITHMS QUESTION BANK Page 1 UNIT I INTRODUCTION 2 marks 1. Why is the need of studying algorithms? From a practical standpoint, a standard set of algorithms from different

More information

Eulerian subgraphs containing given edges

Eulerian subgraphs containing given edges Discrete Mathematics 230 (2001) 63 69 www.elsevier.com/locate/disc Eulerian subgraphs containing given edges Hong-Jian Lai Department of Mathematics, West Virginia University, P.O. Box. 6310, Morgantown,

More information

Multiway Blockwise In-place Merging

Multiway Blockwise In-place Merging Multiway Blockwise In-place Merging Viliam Geffert and Jozef Gajdoš Institute of Computer Science, P.J.Šafárik University, Faculty of Science Jesenná 5, 041 54 Košice, Slovak Republic viliam.geffert@upjs.sk,

More information

The Structure of Bull-Free Perfect Graphs

The Structure of Bull-Free Perfect Graphs The Structure of Bull-Free Perfect Graphs Maria Chudnovsky and Irena Penev Columbia University, New York, NY 10027 USA May 18, 2012 Abstract The bull is a graph consisting of a triangle and two vertex-disjoint

More information

Dierential-Linear Cryptanalysis of Serpent? Haifa 32000, Israel. Haifa 32000, Israel

Dierential-Linear Cryptanalysis of Serpent? Haifa 32000, Israel. Haifa 32000, Israel Dierential-Linear Cryptanalysis of Serpent Eli Biham, 1 Orr Dunkelman, 1 Nathan Keller 2 1 Computer Science Department, Technion. Haifa 32000, Israel fbiham,orrdg@cs.technion.ac.il 2 Mathematics Department,

More information

Heap-on-Top Priority Queues. March Abstract. We introduce the heap-on-top (hot) priority queue data structure that combines the

Heap-on-Top Priority Queues. March Abstract. We introduce the heap-on-top (hot) priority queue data structure that combines the Heap-on-Top Priority Queues Boris V. Cherkassky Central Economics and Mathematics Institute Krasikova St. 32 117418, Moscow, Russia cher@cemi.msk.su Andrew V. Goldberg NEC Research Institute 4 Independence

More information

Data Dependences and Parallelization

Data Dependences and Parallelization Data Dependences and Parallelization 1 Agenda Introduction Single Loop Nested Loops Data Dependence Analysis 2 Motivation DOALL loops: loops whose iterations can execute in parallel for i = 11, 20 a[i]

More information

A GRAPH FROM THE VIEWPOINT OF ALGEBRAIC TOPOLOGY

A GRAPH FROM THE VIEWPOINT OF ALGEBRAIC TOPOLOGY A GRAPH FROM THE VIEWPOINT OF ALGEBRAIC TOPOLOGY KARL L. STRATOS Abstract. The conventional method of describing a graph as a pair (V, E), where V and E repectively denote the sets of vertices and edges,

More information

arxiv: v1 [math.co] 13 Aug 2017

arxiv: v1 [math.co] 13 Aug 2017 Strong geodetic problem in grid like architectures arxiv:170803869v1 [mathco] 13 Aug 017 Sandi Klavžar a,b,c April 11, 018 Paul Manuel d a Faculty of Mathematics and Physics, University of Ljubljana, Slovenia

More information

ON SWELL COLORED COMPLETE GRAPHS

ON SWELL COLORED COMPLETE GRAPHS Acta Math. Univ. Comenianae Vol. LXIII, (1994), pp. 303 308 303 ON SWELL COLORED COMPLETE GRAPHS C. WARD and S. SZABÓ Abstract. An edge-colored graph is said to be swell-colored if each triangle contains

More information

Reverse Polish notation in constructing the algorithm for polygon triangulation

Reverse Polish notation in constructing the algorithm for polygon triangulation Reverse Polish notation in constructing the algorithm for polygon triangulation Predrag V. Krtolica, Predrag S. Stanimirović and Rade Stanojević Abstract The reverse Polish notation properties are used

More information