
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 17, NO. 9, SEPTEMBER 2009

Signal Assignment to Hierarchical Memory Organizations for Embedded Multidimensional Signal Processing Systems

Florin Balasa, Hongwei Zhu, and Ilie I. Luican

Abstract: The storage requirements of the array-dominated and loop-organized algorithmic specifications running on embedded systems can be significant. Employing a data memory space much larger than needed has negative consequences on the energy consumption, latency, and chip area. Finding an optimized storage of the usually large arrays in these algorithmic specifications is therefore an essential task of memory allocation. This paper proposes an efficient algorithm for mapping multidimensional arrays to the data memory. Similarly to [1], it computes bounding windows for the live elements in the index space of the arrays, but this algorithm is several times faster. More importantly, since this algorithm works not only for entire arrays, but also for parts of arrays, such as array references or, more generally, sets of array elements represented by lattices [2], this signal-to-memory mapping technique can also be applied in hierarchical memory architectures.

Index Terms: Lattice, memory allocation, multidimensional signal processing, polytope, signal-to-memory mapping.

I. INTRODUCTION

MANY signal processing systems, particularly in the multimedia and telecom domains, are synthesized to execute mainly data-dominated applications. Their behavior is described in a high-level programming language, where the code is typically organized in sequences of loop nests having as boundaries (usually affine) functions of the loop iterators, and conditional instructions whose arguments may be data-dependent and/or data-independent (relational and/or logic expressions of affine functions of the loop iterators). The data structures are multidimensional arrays; the indices of the array references are affine functions of the loop iterators.
The specifications with these characteristics are often called affine specifications [3]. In these targeted VLSI systems, data transfer and storage have a significant impact on both the system performance and the major cost parameters, power consumption and chip area. During the system development process, the designer must often focus first on the exploration of the memory subsystem in order to achieve a cost-optimized end-product [3]. The assignment and mapping of the multidimensional signals (arrays) from these behavioral specifications to the data memory is an essential memory management task. The goals of this latter operation are the following: 1) to ascertain that any distinct scalar signals (array elements) that are simultaneously alive are mapped to distinct storage locations;(1) 2) to use mapping functions simple enough to ensure address generation hardware of reasonable complexity; and 3) to map the arrays from the behavioral specification into an amount of data storage as small as possible. Employing a data memory space much larger than needed has several negative consequences: the energy consumption per access increases with the memory size [3], as does the data access latency [4]; in addition, larger memories occupy more chip area.

Manuscript received June 09, 2007; revised January 19, First published March 16, 2009; current version published August 19, F. Balasa is with the Department of Computer Science and Information Systems, Southern Utah University, Cedar City, UT USA (e-mail: fbalasa@cs.uic.edu). H. Zhu is with the Physical IP Division, ARM, Inc., San Jose, CA USA (e-mail: davzhu01@arm.com). I. I. Luican is with the Department of Computer Science, University of Illinois at Chicago, Chicago, IL USA (e-mail: iluica2@uic.edu). Digital Object Identifier /TVLSI

A.
Previous Works

Several mapping models have been proposed in the past, trading off between the last two goals (that is, accepting a certain excess of storage to ensure a less complex address generation hardware) while ascertaining that the first goal is strictly satisfied. De Greef et al. choose one of the canonical linearizations of the array (a permutation of its dimensions), followed by a modulo operation that wraps the set of virtual memory locations into a smaller set of actual physical locations [5]. Since for an n-dimensional array there are n! permutations of the indices, and since, in addition, the elements in a given dimension can be mapped in the increasing or decreasing order of the respective index, the number of analyzed linearizations increases fast with the signal dimension. Lefebvre and Feautrier, addressing the parallelization of static control programs, developed in [6] an intraarray storage approach based on modular mapping as well. They first compute the lexicographically maximal time delay between the write and the last read operations, which is a super-approximation of the distance between conflicting index vectors (i.e., those whose corresponding array elements are simultaneously alive). Then, the modulo operands are computed successively as follows: the modulo operand, applied on the first array index, is set to 1

(1) The lifetime of a scalar signal is the time interval between the clock cycles when the scalar is produced (written) and when it is read for the last time (i.e., consumed) during the code execution. Two scalars are simultaneously alive if their lifetimes overlap. Obviously, in such a case, they must occupy different memory locations; otherwise, they can share the same location.

plus the maximal difference between the first indices over the conflicting index vectors; the modulo operand of the second index is set to 1 plus the maximal difference between the second indices over the conflicting index vectors whose first indices are equal; and so on. Quilleré and Rajopadhye studied the problem of memory reuse for systems of recurrence equations, a computation model used to represent algorithms to be compiled into circuits [7]. In their model, the loop iterators first undergo an affine mapping into a linear space of smallest dimension (what they call a "projection") before modulo operations are applied to the array indices. In order to avoid the inconvenience of analyzing different linearization schemes as in [5], Tronçon et al. proposed [1] to reduce the size of an n-dimensional array mapped to the data memory by computing an n-dimensional bounding window, whose elements can be used as operands in modulo operations that redirect all accesses to the array. Darte et al. proposed a lattice-based mathematical framework for intraarray mapping, establishing a correspondence between valid linear storage allocations and integer lattices called strictly admissible relative to the set of differences of the conflicting indices [8]. They proposed heuristic techniques for building strictly admissible integer lattices, thus building valid storage allocations.

B. Overview of the Proposed Framework and the Contributions of This Work

The assignment algorithm proposed in this paper has the same general goal as Tronçon's mapping model [1]: for each n-dimensional array in the specification code, it computes a maximal bounding window whose dimensions are used to redirect the accesses to the array.
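For illustration, the two flavors of modulo-based mapping discussed above can be sketched in Python. This is a minimal sketch with hypothetical array dimensions and window sizes, not the implementation of any of the cited works: a De Greef-style mapping linearizes the array canonically and wraps the result with a single modulo, while the window-based model reduces each index modulo a window dimension before linearizing.

```python
def de_greef_address(index, dims, window):
    # Canonical (row-major) linearization of the n-dimensional index,
    # wrapped into a smaller set of physical locations by one modulo.
    addr = 0
    for i, size in zip(index, dims):
        addr = addr * size + i
    return addr % window

def window_address(index, window):
    # Window-based mapping: each index is first reduced modulo the
    # corresponding window dimension; the windowed index is then
    # linearized row-major.
    addr = 0
    for i, w in zip(index, window):
        addr = addr * w + (i % w)
    return addr

# Hypothetical 2-D array A[10][12] and a hypothetical window (4, 6):
print(de_greef_address((9, 7), (10, 12), 24))   # (9*12 + 7) % 24 = 19
print(window_address((9, 7), (4, 6)))           # (9%4)*6 + (7%6) = 7
```

In the De Greef model, each permutation and traversal direction of the dimensions gives a different linearization, which is where the fast-growing number of candidate schemes comes from; the window model avoids that search but needs the window dimensions computed beforehand.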
Any array element is mapped into the window; in its turn, the window is mapped into the physical memory (relative to a base address) by a typical canonical linearization, like row or column concatenation for 2-D arrays. Each window dimension is computed as the maximum difference in absolute value between the i-th indices of any two array elements simultaneously alive, plus 1. This ensures that any two array elements simultaneously alive are mapped to distinct memory locations. The amount of data memory required for storing the array (after mapping) is the volume of the window, that is, the product of the window dimensions. In the illustrative example shown in Fig. 1, the window corresponding to the 2-D array A has the dimensions 4 × 6. Indeed, it can be easily verified that, independent of the iteration, the maximum distance between the first (respectively, second) indices of two alive elements cannot exceed 3 (respectively, 5). For instance, the simultaneously alive elements shown in Fig. 1 have both indices the farthest apart from each other. Therefore, a memory access to an element can be safely (that is, without any mapping conflict) redirected through the window.

Fig. 1. Array space of signal A. The points represent the A-elements A[index1][index2] which are produced (and consumed as well) in the loop nest. The points to the left of the dashed line represent the elements produced until the end of the iteration (i = 7, j = 9), the black points being the elements still alive (i.e., produced and still needed in the next iterations), while the circles represent elements already dead (i.e., not needed as operands any more). The light points to the right of the dashed line are A-elements still unborn (to be produced in the next iterations).

The computation algorithm employed in [1] operates in two stages: a liveness analysis establishes a set of program points
in the code where the live sets of array elements are computed; this step is followed by the windows computation proper, based on incremental increases of the window dimensions and checks for the existence of mapping conflicts.(2) In contrast, the approach described in this paper is based on a different computation methodology, which is significantly faster. This efficient methodology, based on projections of lattices [2], is the first contribution of the paper, and it is explained in Sections II and III. More important than the increase in algorithmic efficiency, our computation framework allows significant extensions of the mapping model which are not shared, to the best of our knowledge, by any other previous work in the field. These extensions are explained below. As already mentioned, in the illustrative example from Fig. 1, the window of signal A has the dimensions 4 × 6. It follows that the storage requirement after mapping for the 2-D signal A is 24 memory locations. It can be easily noticed that, actually, only 19 locations would really be needed, since no more than 19 A-elements can be simultaneously alive: see, again, Fig. 1, where the 19 black dots represent the live elements at the end of the iteration. While the previous works compute only the sizes of the array window, they are unable to provide any consistent information on the quality of their models, that is, how large the oversize of the computed mapping windows is in comparison to the minimal array windows and, also, how large the oversize of the storage requirement after mapping is relative to the minimum amount of data memory necessary for the code execution. While a minimum array window is difficult to achieve in practice (in many cases, requiring significantly more complex hardware), a signal-to-memory mapping model actually

(2) More implementation details on this technique will be given in Section VI.

trades off an excess of storage against a less complex address generation hardware. Our methodology allows computing this amount of extra storage after mapping, providing useful information for the system-level design exploration phase. This is the second major contribution of the paper; algorithmic details are provided in Section IV. None of the works addressing the problem of signal-to-memory assignment during the memory allocation design phase [9] discusses the applicability of their proposed mapping model in the context of a multilayer memory organization. On the other hand, the works in the area of memory management that consider memory hierarchy (see Section V for a brief overview) do not address the mapping problem. Since our algorithm works not only for entire arrays, but also for parts of arrays, such as array references or, more generally, sets of array elements represented by lattices [2], this signal-to-memory mapping technique can also be applied in a multilayer memory hierarchy. This is the third major contribution of the paper. The extension of our mapping model to a two-level memory hierarchy (off-chip memory and on-chip scratch-pad memory) is discussed in Section V. Finally, Section VI presents basic implementation aspects and discusses the experimental results, while Section VII summarizes the main conclusions of this work.

II. WINDOW OF AN ARRAY REFERENCE

In this section, a simpler problem will be addressed: given an array reference of an n-dimensional signal, in the scope of a nest of loops having affine functions of the iterators as indices, compute the dimensions of the bounding window for the live elements (as explained in Section I-B), assuming that all the elements of the array reference are simultaneously alive. The technique for solving this problem will be illustrated on the following.
Example 1: The iterators satisfy a set of linear inequalities derived from the loop boundaries and conditional expressions. These inequalities define what is usually called a Z-polytope.(3) The indices of the array reference are given by an affine vector mapping defined on this polytope. Such a set is usually called a (linearly bounded) lattice [2], and it constitutes the array space or index space of the array reference.

(3) In general, this iterator space can be a finite set of Z-polytopes; these can be considered disjoint (since, otherwise, a polytope decomposition can be applied).

More generally, the problem to be solved is the following: given the linearly bounded lattice relating the index vector to the iterator vector, compute the projection spans of the lattice on all the coordinate axes. The general idea for solving this problem is to find a transformation such that the extreme values of the first iterator correspond to the extreme values of, say, the k-th index. In this way, the problem reduces to computing the integer projection of a Z-polytope, this latter problem being well-studied [11], [12].

Algorithm 1: Computation of the bounding window of a lattice

Step 1: The k-th index is given by the k-th row of the mapping matrix in (1). Let a unimodular matrix(4) bring this row to the Hermite Normal Form [2] (if the row is null, the k-th side of the window reduces to one point).

Step 2: Applying the unimodular transformation to the iterators yields a new iterator Z-polytope.

Step 3: Compute the extreme values of the first new iterator by projecting the Z-polytope on the first axis [11]; from them, the extreme values of the k-th index follow, and the k-th window dimension is their difference plus 1.

Performing this 3-step algorithm for every index k, the bounding window of the lattice is obtained.

Example 1 (Continued): The algorithm will be illustrated by projecting the array reference from Example 1 on the first axis and finding the extreme values of the first index. The unimodular matrix (see, e.g., [2] for its construction) brings the first row of the mapping to the Hermite Normal Form.
After the transformation, the initial iterator space (see Example 1) becomes a new system of inequalities. Eliminating the remaining iterators from these inequalities with a Fourier-Motzkin technique [13], the extreme values of the exact shadow [11] of the polytope on the first axis are obtained, and those extreme points are valid projections (i.e., there exist integer values of the other iterators such that the inequalities defining the polytope are satisfied). From these, the extreme values of the first index, and hence the first window dimension, follow immediately. Projecting the lattice on the second axis, with the second row of the affine mapping and its corresponding unimodular matrix, a similar computation yields the second window dimension, and thus the bounding window of the array reference. This algorithm will be used in the signal-to-memory mapping approach, explained in the next section.

(4) A square matrix with integer elements whose determinant is ±1.
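On small instances, the result of Algorithm 1 can be cross-checked by brute force: enumerate the iterator points, apply the affine mapping, and measure the projection span on each index axis. The Python sketch below does exactly that for a hypothetical array reference (Algorithm 1 obtains the same spans analytically, via the Hermite Normal Form and Fourier-Motzkin projection, without any enumeration):

```python
from itertools import product

def bounding_window(T, u, ranges, conditions):
    # Lattice {x = T*i + u | i in the iterator polytope}: enumerate the
    # iterator points, map them through the affine function, and record
    # the per-axis extreme index values.
    lo = hi = None
    for i in product(*[range(a, b + 1) for a, b in ranges]):
        if not all(cond(i) for cond in conditions):
            continue
        x = tuple(sum(T[r][k] * i[k] for k in range(len(i))) + u[r]
                  for r in range(len(T)))
        lo = x if lo is None else tuple(map(min, lo, x))
        hi = x if hi is None else tuple(map(max, hi, x))
    # Each window dimension is the projection span plus 1.
    return tuple(h - l + 1 for l, h in zip(lo, hi))

# Hypothetical array reference A[i + j][2*i], for 0 <= i <= 3, 0 <= j <= 2,
# with the extra affine condition i >= j:
T, u = [[1, 1], [2, 0]], [0, 0]
print(bounding_window(T, u, [(0, 3), (0, 2)], [lambda i: i[0] >= i[1]]))
# (6, 7): the first index spans 0..5, the second 0..6
```

The enumeration grows with the volume of the iterator polytope, which is precisely why the analytical projection of Algorithm 1 is preferable in practice.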

Fig. 2. (a) Example 2: An affine specification derived from a motion detection kernel. (b) Decomposition of the index space of the signal A[x][y] into disjoint lattices; the arcs in the graph show inclusion relations.

III. GLOBAL FLOW OF THE SIGNAL-TO-MEMORY MAPPING ALGORITHM

In the previous section, we computed the bounding window for the live array elements covered by a given array reference. However, the assumption we made that all the elements in the array reference are alive does not always hold during the execution of the algorithmic specification. In the illustrative example from Fig. 2(a), the array reference from the assignment marked with *, covering 10,465 elements, is entirely alive at the beginning of the loop nest, but it is no longer entirely alive when the execution of the loop nest is over: since 161 of its elements were consumed for the last time during the loop nest execution, only 10,304 A-elements are still alive (hence, needing storage) at the end of the loop nest. Even more revealing, the array reference of a 3-D signal in the same starred assignment covers 409,825 elements. If all the elements were simultaneously alive, the necessary bounding window would be 1 × 97 × 4225, since there is only one possible value (32) for the first index, 97 possible values (between 32 and 128) for the second index, and 4225 possible values (between 1 and 4225) for the third index. However, a closer look shows that, actually, only one single element is alive at a time: each element produced in a certain iteration is immediately consumed in the next iteration, being also covered by the other array reference. So, a 1 × 1 × 1 window would suffice: each access to the two array references could be safely redirected to the same memory location. The global flow of the mapping algorithm, computing the bounding windows of the arrays while taking into account the life span of their elements, is described below.
The overall size of these windows yields the (one-layer) data memory requirement. Moreover, the window dimensions are used to map the array elements to the physical memory locations using modulo operations, as explained in Section I-B.

Algorithm 2: Computation of the bounding windows of all the arrays in the algorithmic specification

Step 1: Extract the array references from the given algorithmic specification and decompose the array references of every indexed signal into disjoint lattices.

The decomposition of the array references makes it possible to find out which parts of an array reference stay alive during the execution of a loop nest and which parts are consumed. This decomposition into disjoint lattices, used also in [10], can be performed analytically, by recursively intersecting the array references of every indexed (multidimensional) signal in the code. Consider an indexed signal in the algorithmic specification and the initial set of its lattices (these are the lattices of the array references of the signal). An inclusion graph is gradually built; this is a directed acyclic graph whose nodes are lattices of the signal and whose arcs denote inclusion relations between the respective sets. (For instance, an arc from one node to another shows that the lattice of the former is included in the lattice of the latter.) A high-level pseudo-code of the

decomposition of the signal's array references into disjoint lattices is given below:

initialize the inclusion graph, creating one node for each lattice;
repeat {
    for (each pair of lattices in the set between which no inclusion
         relation is known, i.e., no path exists in the graph) {
        compute the intersection of the two lattices;
        if (the intersection is nonempty) {
            (1) if (the intersection equals one of the two operands)
                    add an arc in the graph from the node of that operand
                    to the node of the other operand;
            (2) elsif (there exists already a lattice equivalent(5) to the
                    intersection in the set)
                    add arcs, if necessary, from the node of the
                    equivalent lattice to the nodes of the operands;
            (3) else {
                    add the new lattice to the set and add a node in the
                    graph, with arcs towards the nodes of the two
                    operands;
            }
            compute the difference between each operand and the
            intersection; // may yield a set of several lattices
            each new lattice in a difference is added to the set, and
            corresponding new nodes, with arcs towards the node of the
            respective operand, are added in the graph;
        }
    }
} until (no new lattice can be added to the set).

While the intersection of two bounded lattices is itself a lattice, the difference may consist of several bounded lattices. These operations with lattices are described in detail and illustrated in [10]. At the beginning, the inclusion graph of the indexed signal contains only the lattices of the corresponding array references as nodes. Every pair of lattices between which no inclusion relation is known so far (i.e., between which there is currently no path in the graph) is intersected.
If the intersection produces a nonempty lattice, there are basically three possibilities: 1) the resulting lattice is one of the intersection operands: in this case, inclusion arcs between the corresponding nodes must be added to the graph; 2) an equivalent(5) linearly bounded lattice already exists in the set: in this case, arcs must be added, only if necessary, between the node corresponding to the equivalent lattice and those of the operands; 3) the resulting lattice is a new element of the set: a new node is appended to the graph, along with arcs towards the two operands of the intersection. The inclusion graph is used, on one hand, to speed up the decomposition (for instance, if the intersection of two lattices turns out to be empty, there is no sense in trying to intersect the first lattice with the lattices included in the second one, since those intersections will be empty as well) and, on the other hand, to determine the structure of each array reference in terms of disjoint lattices: at the end, all the nodes in the graph without incident arcs represent disjoint lattices.

(5) Two linearly bounded lattices of the same indexed signal are equivalent if they represent the same set of indices; e.g., {x = i + j | 0 <= i <= 2, 0 <= j <= 2} and {x = i | 0 <= i <= 4} are equivalent.

Fig. 2(b) shows such an inclusion graph and the result of the decomposition of the six (bold) array references of the 2-D signal A in the illustrative example from Fig. 2(a). The graph displays the inclusion relations between the lattices of A. The six bold nodes represent the six array references of signal A in the code; for instance, one of them represents the lattice of the array reference of A from the first loop nest. The nodes are also labeled with the size of the corresponding lattice, that is, the number of array elements covered by it. In this example, the intersection of two of the array references is also a bounded lattice, denoted accordingly in Fig. 2(b). However, their difference is not a lattice, due to the nonconvexity of this set, so the resulting set had to be decomposed further.
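The effect of this fixpoint decomposition can be sketched in Python on explicit sets of index tuples (the references below are hypothetical; the actual framework keeps the lattices in closed symbolic form and intersects/differences them analytically [10], never enumerating them):

```python
def disjoint_parts(references):
    # Refine the references into disjoint parts by repeatedly replacing an
    # overlapping pair (a, b) with a - b, a & b, b - a, until a fixpoint:
    # afterwards, every original reference is a union of disjoint parts.
    parts = [set(r) for r in references if r]
    changed = True
    while changed:
        changed = False
        for i in range(len(parts)):
            for j in range(i + 1, len(parts)):
                a, b = parts[i], parts[j]
                inter = a & b
                if not inter or a == b:
                    continue
                if inter == a:            # a included in b: split b
                    parts[j] = b - a
                elif inter == b:          # b included in a: split a
                    parts[i] = a - b
                else:                     # proper overlap: three pieces
                    parts[i], parts[j] = a - b, inter
                    parts.append(b - a)
                changed = True
    # drop duplicates (equivalent parts) and empties
    uniq = []
    for p in parts:
        if p and p not in uniq:
            uniq.append(p)
    return uniq

# Two hypothetical overlapping references over a 1-D index space:
parts = disjoint_parts([range(0, 5), range(3, 8)])
print(sorted(sorted(p) for p in parts))  # [[0, 1, 2], [3, 4], [5, 6, 7]]
```

The sketch omits the inclusion graph, which in the paper's algorithm both prunes redundant intersections and records, at the end, which disjoint parts compose each reference.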
At the end of the decomposition, the set contains 17 lattices; each of the 11 nodes without incident arcs represents a disjoint lattice in the partition of the array space of A. Every array reference in the code is now either a disjoint lattice itself or a union of disjoint lattices.

Step 2: Compute preliminary windows (underestimations of the final windows) for each indexed signal, taking into account the live elements at the boundaries between the loop nests.

A high-level pseudo-code of the computation of a signal's preliminary window is given as follows:

for (each disjoint lattice)            // the set of disjoint lattices
                                       // was determined at Step 1
    for (each dimension of the signal)
        compute the minimum and maximum index values of the lattice in
        that dimension, using Algorithm 1;
for (each boundary between two successive loop nests) {
    let S be the set of disjoint lattices of the signal alive at the
    boundary (i.e., produced before the boundary and consumed after it);
    for (each dimension of the signal)
        the corresponding side of the signal's bounding window at the
        boundary is the difference between the maximum and the minimum
        index values over the lattices in S, plus 1;
}
for (each dimension of the signal)
    the corresponding side of the signal's preliminary window is the
    maximum of that side over all boundaries;

First, the extreme values of the projections of every disjoint lattice are computed using Algorithm 1, as exemplified for one of the lattices in Fig. 2(b). After the decomposition of the array references, one can easily find out the loop nests where each disjoint lattice was produced and consumed (i.e., used as an operand for the last time).(6)

(6) The algorithmic specifications are in single-assignment form: each array element is written at most once, but it can be read an arbitrary number of times.

For instance, the lattice

belongs to the input signal, so it is alive when the execution of the code is initiated. It is consumed in the first loop nest, since it is included only in the array reference in the assignment line marked with * in Fig. 2(a). Let us consider, for instance, the boundary between the first and the second loop nests (the start of the code being boundary zero; single instructions outside nested loops can be considered nests of depth zero). Several disjoint lattices are alive across this boundary, being consumed only in the third loop nest; the extreme index values over these lattices yield A's bounding window relative to the chosen boundary. But since all the disjoint lattices are alive before the start of the first loop nest, the bounding window for that boundary is larger. Obviously, the largest window must be selected. Similarly, a second signal will have a very small window, since only one of its elements is alive between the blocks of code. The window of the 3-D signal is not yet constrained at this point, since none of its elements is alive at any boundary between blocks.

Step 3: Update the bounding window (compute the exact window sides) for each indexed signal.

The window sides computed for each index at Step 2 are maximal when every loop nest in the code either produces or consumes (but not both!) the array's elements. This works, e.g., for one signal in Example 2, where every loop nest only consumes its elements [see Fig. 2(a)]. On the other hand, this does not work for the signals whose elements are both produced and consumed in the same loop nest. In this situation, the values computed at Step 2 may be underestimates and, hence, may need to be increased. Let a disjoint lattice of an indexed signal be consumed (read for the last time) in a certain (normalized) loop nest where elements of the same signal are produced as well.
Then:

for (each element covered by the lattice) {
    compute the iteration vector when the element is consumed;(7)
    determine the disjoint lattices of the signal partially produced
    until the element is consumed in that iteration;
    determine the disjoint lattices of the signal partially consumed
    until the element is consumed in that iteration;
    for (each dimension of the signal)
        for (each lattice in the two sets above) {
            compute the extreme index values of the lattice in that
            dimension, using Algorithm 1;
            update the corresponding window side; // it may increase
        }
}

(7) This requires the computation of the maximum iterator vector relative to the lexicographic order [10]. If the lattice is included in several array references, the overall maximum iterator vector is considered.

The guiding idea is that local or global maxima of the index distances between live elements are reached immediately before the consumption of an element, which may entail a shrinkage of some side of the bounding window encompassing the live elements. There are quite frequent situations when array elements are produced and consumed at every iteration. Such cases occur when array references have one-to-one correspondences between the iterators and the indices. For instance, in Example 2, each iterator vector corresponds to a unique (produced) array element and a unique (consumed) element (see the assignment marked with * in the first loop nest). Since each iteration will produce and consume one element, the corresponding bounding window stays very small; the same holds for the other signals with one-to-one references. The storage requirement for Example 2 is obtained by summing up the individual window sizes. It can be shown that this storage requirement is really the minimum one; therefore, the mapping model we adopted, based on the bounding windows of simultaneously alive array elements, was very effective for this code. However, this is not always the case: as already explained in Section I-B, signal A's window in the illustrative example from Fig. 1 is 4 × 6, therefore 24 locations, whereas the minimum storage window of A is only 19 locations.

IV.
MINIMUM STORAGE REQUIREMENT OF AN INDEXED SIGNAL

As emphasized in Section I-B, the other previous works addressing signal-to-memory mapping made only relative evaluations, comparing the performances of different mapping models on some set of benchmark applications. They were unable to provide information on how large the oversize of their resulting array windows is in comparison to the minimum array windows, or how large the oversize of their overall memory allocation solution is versus the minimum data storage requirement of the entire application. In contrast, our model provides such information, thus allowing a better performance evaluation of the mapping model. Our polyhedral framework can compute the optimal window for each multidimensional array in the specification and, therefore, the optimal memory sharing between the elements of the same array (intraarray memory sharing). For instance, one of the signals in the illustrative example in Fig. 3 needs a minimum window of 1,752 memory locations, since there are at most 1,752 of its elements simultaneously alive, as computed by our tool. Similarly, the minimum window (or the optimal memory sharing) of the other signal is 3,104 storage locations. However, since the elements of the two arrays can share the same locations if their lifetimes are disjoint, the minimum storage requirement is, actually, 4,536 locations,(8) less than the sum of the minimum windows of the two signals. The minimum data storage requirement of the entire (procedural) algorithmic specification (therefore, the optimal memory sharing between all the array elements in the code, i.e., interarray memory sharing) is determined as well [10]. While the previous works make only relative evaluations, comparing the performance of one mapping model versus other models or versus the total number of array elements in the specification code, our framework can provide an accurate evaluation of the absolute efficiency (in terms of data storage) of the mapping model employed and, more generally, of any other mapping model.

(8) A detail of the memory trace generated by our tool is displayed in Fig. 3.

Fig. 3. Code example and its memory trace illustrating the optimal intra- and interarray memory sharing.

When computing the minimum array window (the optimal memory sharing between the elements of a same array), an approach similar to Algorithm 2 is used, with one key difference: instead of the lattice windows, the exact lattice sizes are used. While the lattice size is a tight (reachable) lower bound on the data memory needed to store the lattice, the window of the lattice, although typically larger, offers, in addition, a mapping solution (based on modulo computations) for its elements, which is acceptable from the address generation point of view.

Algorithm 3: Computation of the exact size of a lattice

There are two distinct situations to be considered.

Case 1: The affine vector function of the lattice is a one-to-one mapping. A sufficient condition is that the rank of the mapping matrix in (1) be equal to its number of columns. Then, the computation of the number of distinct signal indices (i.e., the amount of memory necessary to store the array elements covered by the lattice) reduces to counting the number of iterator vectors in the iterator polytope in (1). In such a situation, a computation technique based on Barvinok's decomposition of a simplicial cone into unimodular cones [14] is used. Note that counting the points having integer coordinates in Z-polyhedra can be done in several ways: there are methods based on Ehrhart polynomials, like, for instance, [15], or even much simpler ones, adapting the Fourier-Motzkin technique [13]. We decided to employ the current approach because of its scalability, which will become apparent by the end of this section. An example illustrating both the concepts and the technique is given below. Although the theoretical background is not explained in detail (see [14], [16] for more theoretical insight), this example is intended to illustrate the main steps of the computation flow. Moreover, it will clearly show the scalability of the approach, which makes it appropriate for algorithmic specifications typical of multidimensional signal processing applications.

Fig. 4. 2-D polytope (quadrilateral) representing the iterator space in Example 3.

Example 3: Since the rank of the mapping matrix is 2, the vector function is a one-to-one mapping of the iterator space into the index space. It follows that the computation of the number of array elements covered by the lattice is equivalent to counting the number of points of integer coordinates in the iterator polytope shown in Fig. 4. This latter operation is done as explained below.

Step 1: Find the vertices of the iterator polytope (see Fig. 4) and their supporting cones.

The (rational polyhedral) cone generated by a set of linearly independent integer vectors (its rays) is the set of their nonnegative linear combinations. For instance, the set of points inside one of the angles of the quadrilateral (see Fig. 4) is a 2-D cone generated by two rays, with the corresponding vertex taken as origin. There is a supporting cone corresponding to

BALASA et al.: SIGNAL ASSIGNMENT TO HIERARCHICAL MEMORY ORGANIZATIONS 1311

each vertex of a polyhedron. For instance, the supporting cone of the vertex marked in Fig. 4 is the one generated by its two adjacent rays. A cone is called unimodular if the matrix of its rays is unimodular. Given the inequalities defining the iterator space, the vertices and the rays are computed using the reverse search algorithm [17]. The supporting cones corresponding to the vertices of the iterator polytope, as well as their generating rays (shown as column vectors), are listed in (2).

Step 2: Decompose the supporting cones into unimodular cones [14]. The last two cones in our example are not unimodular; their decompositions are given in (3) and (4).9

Step 3: Find the generating function of the iterator polytope.

Given a Z-polytope in a d-dimensional space, a monomial term can be introduced for each point of integer coordinates in it; the sum of all these terms is defined as the generating function of the polytope [14]. For example, the generating function of the triangle with the vertices (0, 0), (1, 2), (2, 0) is a sum of monomial terms, each corresponding to one of the lattice points inside or on the border of the triangle (e.g., one monomial corresponds to the point (1, 2), and so on). By evaluating the generating function at the point whose coordinates are all 1, each monomial contributes 1, so we get the number of points in the polytope [14]; for this triangle, the result is 5, the number of points of integer coordinates in the triangle. Writing the generating function as a sum of monomial terms would be impractical for large polytopes. Fortunately, it can be compactly written as an algebraic sum of rational functions, each term corresponding to one of the unimodular cones in the decomposition of the supporting cones of the polytope vertices (Steps 1 and 2). If the vertex of a cone is v = (v1, v2), then x^v denotes the monomial x1^v1 x2^v2; similarly, x^u for a ray u = (u1, u2). The generating function of any cone is obtained by summing the generating functions of all its unimodular component cones; from (2) and (4), the generating functions of the three cones of interest are obtained in this way.
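The counting identity behind Step 3 (evaluating the generating function at the all-ones point yields the number of lattice points) can be checked by brute force on the small triangle above. The sketch below is purely illustrative: it enumerates the points directly, instead of using Barvinok's rational-function representation.

```python
def lattice_points_in_triangle(vertices):
    """Brute-force enumeration of the integer points inside or on the
    border of a convex polygon given by its vertices in counter-clockwise
    order (a bounding-box scan with half-plane tests)."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    points = []
    for x in range(min(xs), max(xs) + 1):
        for y in range(min(ys), max(ys) + 1):
            inside = True
            for i in range(len(vertices)):
                x1, y1 = vertices[i]
                x2, y2 = vertices[(i + 1) % len(vertices)]
                # cross product < 0 means (x, y) lies strictly outside
                # the half-plane to the left of edge (x1, y1) -> (x2, y2)
                if (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1) < 0:
                    inside = False
                    break
            if inside:
                points.append((x, y))
    return points

# Triangle from the text, vertices (0, 0), (1, 2), (2, 0), in CCW order:
tri = [(0, 0), (2, 0), (1, 2)]
pts = lattice_points_in_triangle(tri)
print(len(pts))                                  # prints 5
# Evaluating the generating function at x = y = 1 turns every monomial
# x^a * y^b into 1, so it counts exactly the same points:
print(sum(1 ** a * 1 ** b for a, b in pts))      # prints 5
```

For the quadrilateral of Fig. 4, the same enumeration would confirm the count of 21 reported in the text; however, its cost grows with the volume of the polytope, which is precisely the limitation that Barvinok's decomposition avoids.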
Every unimodular cone whose vertex v has integer coordinates has an associated generating function [14] of the form (5): the monomial x^v of the vertex divided by the product of factors (1 - x^u) over all the generating rays u of the cone.

9 The computation is complex: the cones are polarized, then decomposed, then polarized back. This double dualization process eliminates the lower-dimension component cones [16].

When not all the coordinates of the vertex are integers (as is the case for one vertex here), but are still rational numbers (which is the case in practically all the multimedia algorithms), the generating function has the same form (5), but the monomial in the numerator corresponds to the lattice point in the fundamental parallelepiped of the unimodular cone [14]. In this case, this point can be determined as follows [16]. Let B be the matrix whose columns are the rays of the cone, and solve the linear system Bλ = v, where v is the coordinate vector of the vertex. If λ is the solution of the system, then the coordinates of the vertex used in the generating function are the elements of the vector B⌈λ⌉, where ⌈·⌉ denotes the componentwise ceiling. This technique, applied to the two unimodular cones in (3), components of one of the supporting cones, yields the corresponding lattice points, and the generating function of that cone follows. The sum of the rational functions is the generating function of the whole quadrilateral in Fig. 4.

Step 4: Compute the number of points having integer coordinates from the generating function of the whole polytope.

In order to obtain a one-variable generating function, we substitute for each variable a power of a single new variable, the integer exponents being chosen such that no factor in the denominators of the terms becomes zero. With this substitution, the generating function of the iterator polytope in Fig. 4 becomes a sum of one-variable rational terms. After eliminating the negative exponents in the denominators, and after factorizing, we substitute again, obtaining rational terms whose numerators and denominators are polynomials, the relevant exponent being the dimension of the iterator space of

the polytope.

The coefficients of the polynomial quotient in each term can be obtained recursively. It can be proven [16] that the algebraic sum of the coefficients of order equal to the space dimension (here 2), after the polynomial divisions in all the terms, is the number of points in the Z-polytope. In this example, there are six such coefficients (one for each term), and their sum is 21, which is indeed the number of lattice points inside or on the border of the quadrilateral in Fig. 4; it is also the number of memory locations needed to store the array reference since, as already explained, the vector function is a one-to-one mapping from the iterator space to the index space.

Assume now that one edge of the quadrilateral is shifted to the right, keeping its slope unchanged, until the coordinates of one of its vertices become (107, 1) instead of (7, 1); the other affected vertex then gets the coordinates (214/3, 217/3). The computation effort necessary to count the number of points in the new, much larger, Z-polytope is not affected by the significant increase in its size. Indeed, the four supporting cones are generated by the same rays, the decompositions are the same, and the generating functions are almost the same: the only difference appears in the numerators of the terms associated with the two modified vertices. The number of points in the new Z-polytope is 3,921.

Besides its excellent scalability, the technique sketched above, illustrated here for a 2-D polytope (a quadrilateral), works for an arbitrary number of dimensions. Therefore, it is well suited to the large array references typical of telecom and multimedia applications.

Case 2: The affine vector function of the lattice is not a one-to-one mapping.
When the rank of the matrix is smaller than its number of columns, the matrix is brought to the Hermite Normal Form [2] by postmultiplying it with a unimodular matrix. The size of the lattice is afterwards obtained by counting the number of points having integer coordinates in the projection of the Z-polytope along its first coordinates [11], [12].

Example 4: The rank of the matrix is smaller than its number of columns; in this case, the vector function may not be a one-to-one mapping. Indeed, two distinct iterator vectors can be mapped to the same index. The unimodular transformation bringing the matrix to the Hermite Normal Form modifies the initial iterator space into a new one (expressed in terms of the new iterator vector after the transformation), whose exact 1-D projection is obtained by eliminating one of the variables in the inequalities above [13]. The points in the dark shadow [11] correspond to lattice points in the modified iterator space. Checking the values outside the dark shadow, 1 and 2 turn out to be invalid projections: replacing these values in the modified iterator space, no integer solution can be found. Conversely, 0 and 3 are valid projections, since corresponding integer points, e.g., (3, 1), belong to the modified iterator space. Therefore, storing the array reference requires 13 locations: the index can take all the values between 0 and 14, except 1 and 2.

The global flow of the computation of the minimum storage requirement of an indexed signal is similar to the flow of Algorithm 2, which computes the bounding windows (see Section III). Actually, their implementations are merged in the same framework for the sake of efficiency. In this way, the designer simultaneously gets a mapping solution together with information on the solution quality in terms of data storage.

Algorithm 4: Computation of the minimum storage requirement for a given array in the algorithmic specification

Step 1 (similar to Step 1 of Algorithm 2): Extract the array references of the given array from the algorithmic specification and decompose them into disjoint lattices.
Step 2: Compute (an underestimate of) the minimum storage requirement of the signal, taking into account only the live elements at the boundaries between the loop nests.

for (each disjoint lattice)   // the set of disjoint lattices was determined at Step 1
    compute the size of the lattice, using Algorithm 3;
for (each boundary between two consecutive loop nests) {
    let S be the set of disjoint lattices alive at the boundary
        (i.e., produced before the boundary and consumed after it);
    sum the sizes of the lattices in S;   // the minimum storage requirement at this boundary
}
take the maximum of these sums over all loop boundaries;   // (underestimate of the) minimum storage requirement

Step 3: Compute the exact value of the minimum storage requirement. The value obtained after Step 2 is exact when all the elements of the array are either produced or consumed (but not both!) in the loop nests. If this is not the case, a local or global maximum of the storage requirement is reached immediately before the consumption of some element. Step 3 investigates all the local maxima within such a loop nest, adjusting the value upwards. Consider a disjoint lattice which is

consumed in a (normalized) loop nest where elements of the array are produced as well. Then:

for (each element covered by the lattice) {
    compute the iteration vector at which the element is read for the last time;
    determine the disjoint lattices partially produced until the element is consumed at that iteration;
    determine the disjoint lattices partially consumed until the element is consumed at that iteration;
    if the amount of live storage at that iteration exceeds the current value, adjust the value upwards;
}

V. MAPPING SIGNALS INTO A HIERARCHICAL MEMORY SUBSYSTEM

In embedded communication and multimedia processing systems, the on-chip memory is often complemented by an external (off-chip) memory storing the large amounts of data used during the execution of the application. The memory subsystem is typically a major contributor to the overall energy budget of the entire system. The energy cost can be reduced and the system performance enhanced by introducing an optimized custom memory hierarchy that exploits the temporal locality of the data accesses [3]. Savings of dynamic energy (which is expended only when memory accesses occur) at the level of the whole memory subsystem can be mainly obtained by accessing frequently used data from smaller on-chip memories rather than from large background (off-chip) memories. As on-chip storage, scratch-pad memories (SPMs), i.e., software-controlled static or dynamic random-access memories that are more energy-efficient than caches, are widely used in embedded systems, in which the flexibility of caches in terms of workload adaptability is often unnecessary, whereas power consumption and cost play a much more critical role. Different from caches, the SPM occupies one distinct part of the virtual address space, the rest of the address space being occupied by the main memory. The consequence is that there is no need to check for the availability of the data in the SPM; hence, the SPM does not need a comparator and the miss/hit acknowledging circuitry.
This contributes to a significant energy (as well as area) reduction. Another consequence is that in cache memory systems the mapping of data to the cache is done at run time, whereas in SPM-based systems it can be done at compilation time. The problem of energy-efficient assignment of signals to the on- and off-chip memories has been studied since the late nineties. These previous works focused on partitioning the arrays into copy candidates and on the optimal selection and mapping of these into the memory hierarchy [18]-[21]. The general idea was to identify the data (arrays or parts of arrays) that are most frequently accessed in each loop nest. Copying these heavily accessed data from the large off-chip memory to a smaller on-chip memory can potentially save energy (since most accesses will target the smaller copy and not the large, more energy-consuming, original array) and also improve performance. Many different possibilities exist for deciding which parts of the arrays should be copy candidates, for selecting among the candidates those which will be instantiated as copies, and for their assignment to the different memory layers. For instance, Kandemir et al. analyze and exploit the temporal locality by inserting local copies [19]. Their layer assignment builds a separate hierarchy per loop nest and then combines them into a single hierarchy. However, the approach lacks a global view of the lifetimes of (parts of) arrays in applications having imperfectly nested loops. Brockmeyer et al. use the steering heuristic of assigning the arrays having the highest access-number-over-size ratio to the highest memory layer first, followed by incremental reassignments [20]. They take into account the relative lifetime differences between arrays and between the scalars covered by each array.
However, it is not clear whether the copy candidates can also be parts of arrays instead of entire arrays (and if so, how these parts are identified), since the access patterns are, in general, not uniform. Hu et al. can use parts of arrays as copies, but they typically use cuts along the array dimensions [21] (like rows and columns of matrices), using the index space of the array references and also their intersections. Different from these previous works, our framework identifies with precision those parts of arrays that are more intensely accessed. These parts, represented as lattices, are assigned to the on-chip scratch-pad memory, achieving a reduction of the dynamic energy consumption due to memory accesses. Since the bounding window of any lattice can be computed using Algorithm 1, those lattices covering the most accessed elements of the arrays are mapped to the SPM, and Algorithm 2 can find their bounding windows. The same Algorithm 2 computes the bounding windows for the off-chip memory as well.

Algorithm 5: Hierarchical (two-layer) energy-aware memory allocation for an indexed signal A

for (each disjoint lattice)
    compute (with Algorithm 3) the number of memory accesses to the lattice;
assign to the SPM a subset of disjoint lattices having the highest average access values,
    such that the total size of the lattices assigned to the SPM does not exceed its maximum size;
compute with Algorithm 2 the bounding window of the lattices in the subset;   // mapping solution for the SPM
compute with Algorithm 2 the bounding window of the remaining lattices;   // mapping for the off-chip memory

For instance, consider the lattice A15 derived from the illustrative example in Fig. 2(a). As A15 is included in the array references from all three loop nests and also coincides with an operand in the first loop nest (see Fig. 5), the contributions of memory accesses of all these four array references must be computed. Since the lattice of the first array reference is known,

Fig. 5. Computation of memory accesses for the disjoint lattices. A15 is included in four array references: A1, A2, A3, and itself. The numbers of accesses from each array reference are 5,249, 5,249, 5,249, and 409,825. The total numbers of accesses are listed together with the average numbers of accesses per A-element.

the expressions of the iterators mapping the array elements into it are derived. The size of this lattice is 5,249 (computed with Algorithm 3, as explained in Section IV), representing the number of memory accesses to the lattice due to the first array reference. Performing similar computations, the distribution of accesses over the whole array space is obtained. The three lattices covering roughly the center of the array space of the signal are the most accessed array parts (see Fig. 5): 425,572 memory accesses (the total number of accesses being 2,458,950) for each of the three lattices, hence an average of over 4,000 accesses for each A-element covered by them.10 Assuming these three lattices are copied into the scratch-pad memory (the subset assigned to the SPM), their bounding windows computed with Algorithm 1 (since all their elements are simultaneously alive) are identical. Moreover, since all three lattices are simultaneously alive, an overall bounding window covering them is obtained. Obviously, this window is much smaller than the window of the whole array. The A-elements stored in the on-chip SPM should use the corresponding modulo mapping function, whereas the A-elements stored off-chip do not need any mapping function, since their window is as large as the entire array space of A. This allocation solution with two memory layers implies, besides the background memory of 10,787 locations, another memory closer to the processor, in the SPM. The increase of data memory space is compensated here by savings of dynamic energy of about 51.5%, according to the CACTI power model [22].
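The selection step of Algorithm 5 can be sketched as a simple greedy procedure: rank the disjoint lattices by their average number of accesses per element and copy them to the SPM while its capacity allows. The lattice names, sizes, and access counts below are illustrative placeholders, not the figures of the running example.

```python
def assign_to_spm(lattices, spm_capacity):
    """Greedy sketch in the spirit of Algorithm 5: copy to the scratch-pad
    memory (SPM) the disjoint lattices with the highest average number of
    accesses per element, as long as the SPM capacity is not exceeded.
    `lattices` maps a lattice name to (size_in_elements, total_accesses)."""
    ranked = sorted(lattices.items(),
                    key=lambda kv: kv[1][1] / kv[1][0],  # avg accesses/element
                    reverse=True)
    spm, off_chip, used = [], [], 0
    for name, (size, _) in ranked:
        if used + size <= spm_capacity:
            spm.append(name)
            used += size
        else:
            off_chip.append(name)
    return spm, off_chip

# Hypothetical lattices: two small, heavily accessed ones and two large,
# sparsely accessed ones (names and figures are illustrative only).
stats = {"L1": (100, 425_572), "L2": (100, 425_572),
         "L3": (4_830, 500_000), "L4": (4_830, 500_000)}
spm, off = assign_to_spm(stats, spm_capacity=250)
print(spm, off)   # L1 and L2 fit in the SPM; L3 and L4 stay off-chip
# The lattices copied on-chip would then be addressed through their
# bounding window, e.g., address = (i % w1) * w2 + (j % w2).
```

Note that ranking by the access-over-size ratio follows the same intuition as the steering heuristic of [20]; a capacity-constrained optimal selection would instead require solving a knapsack problem.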
Indeed, assuming typical SPM and off-chip memory read-access energies (the off-chip value being 3.57 nJ, for memory sizes as in this illustrative example), the dynamic energy consumption caused by the accesses to the array would decrease from 8.75 mJ to 4.25 mJ, according to the power model [22].

10 Although the number of memory accesses is higher for A11 and A12, these lattices are significantly larger (4,830 A-elements each) and, hence, the average number of accesses per element is much lower.

VI. EXPERIMENTAL RESULTS

A software tool performing the mapping of the multidimensional arrays of a given algorithmic specification (expressed in a subset of the C language) into the data memory has been implemented in C++, incorporating the algorithms described in this paper. Table I summarizes part of the results of our experiments. Columns 2 and 3 display the numbers of array references and array elements in the specification code. Column 4 shows the data memory sizes (i.e., the total window sizes of the arrays11) after the signal-to-memory mapping, computed as explained in Section III. Column 5 displays the sum of the minimum array windows (optimized intra-array memory sharing), computed as explained in Section IV. Column 6 shows the minimum storage requirements of the application codes when the memory sharing between elements of different arrays is optimized as well [10]. For a better understanding of the meaning of the data shown in this table, the first tests are the illustrative examples from this paper. The rest of the benchmark tests are signal processing applications and typical algebraic kernels used in this domain.
The benchmarks used are the following: 1) a motion detection algorithm used in the transmission of real-time video signals on data networks [3]; 2) a real-time regularity detection algorithm used in robot vision; 3) a 2-D Gaussian blur filter from a medical image processing application which extracts contours from tomography images in order to detect brain tumors; 4) a singular value decomposition (SVD) updating algorithm [23] used in spatial division multiplex access (SDMA) modulation in mobile communication receivers, in beamforming, and in Kalman filtering; 5) an autocorrelation algorithm used in GSM; 6) the kernel of a voice coding application, an essential component of a mobile radio terminal.

11 If two entire arrays have disjoint lifetimes, their bounding windows can be mapped starting at the same base address. Then, the maximum of the two window sizes is added to the other window sizes. Such situations, or even more general ones, do not occur in the current tests; so, the sum of the window sizes is a reasonable measure of the total storage requirement after mapping.
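The accounting rule of footnote 11 can be made concrete with a small sketch: window sizes are summed, except that two entire arrays with disjoint lifetimes share a base address and contribute only the larger of their two windows. Array names and sizes below are hypothetical.

```python
def total_window_size(window_sizes, disjoint_pairs=()):
    """Storage accounting sketch (cf. footnote 11): bounding-window sizes
    are summed, except that two entire arrays with disjoint lifetimes can
    start at the same base address, contributing only the larger window.
    Handles non-overlapping pairs only; names and sizes are hypothetical."""
    sizes = dict(window_sizes)
    for a, b in disjoint_pairs:
        shared = max(sizes.pop(a), sizes.pop(b))
        sizes[a + "|" + b] = shared   # the two windows overlap in memory
    return sum(sizes.values())

print(total_window_size({"A": 64, "B": 100, "C": 24}))                # prints 188
print(total_window_size({"A": 64, "B": 100, "C": 24}, [("A", "B")]))  # prints 124
```

As the footnote notes, such lifetime-disjoint situations do not occur in the current tests, so the plain sum is the measure actually reported in column 4.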

TABLE I EXPERIMENTAL RESULTS. COLUMN 4 DISPLAYS THE SIZE OF THE DATA MEMORY AFTER MAPPING (THE TOTAL SIZE OF THE BOUNDING WINDOWS OF THE INDEXED SIGNALS). COLUMNS 5 AND 6 DISPLAY THE TOTAL SIZE OF THE MINIMUM ARRAY WINDOWS (AS EXPLAINED IN SECTION IV) AND, RESPECTIVELY, THE MINIMUM SIZE OF THE DATA MEMORY [10]

Different from all the other signal-to-memory mapping techniques, this computation methodology allows determining the additional storage implied by the mapping model, by computing the minimum window (storage requirement) of each individual array and the minimum data storage for the execution of the whole code (the last two columns in the result table), thus providing useful information for the initial design phase of system-level exploration. The importance of the data in the last two columns is mainly theoretical, serving for the evaluation of the allocation results (column 4). A minimum data storage can indeed be obtained, but there is a price to be paid, since the typical consequence is a very complex control and address generation hardware. As already explained in Section I-B, by means of the signal-to-memory mapping model, an excess of storage in memory allocation is traded off against less complex address generation hardware (practically all the mapping models use modulo computations). This tool has the unique feature of yielding a measure of this compromise: the amount of extra storage.12 Table I shows that there are applications, like the motion detection, where this mapping model gives very good solutions, the bounding windows being equal to the optimal array windows and the amount of memory after mapping being almost equal to the minimum storage. On the other hand, there are also applications where the bounding window sizes turn out to be significantly larger than the optimal size of the data memory (e.g., about two times larger for the SVD updating).
In such a situation, the result only indicates a more difficult case which needs further study and improvement. For instance, performing code transformations (like loop merging) on the regularity detection code, we succeeded in decreasing the memory size after mapping from 4,353 to only 657 locations (see Table I). Functionally equivalent codes can yield very different data storage lower bounds [10] and, also, different mapping solutions. Our tool can help the designer explore (from the storage point of view) different equivalent implementations of the application. Another possibility of improving the mapping results is to try other models as well, when several tools are available (since there is no perfect mapping anyway!). Applying De Greef's mapping model [5] to the SVD updating application, a reduction of the memory size after mapping of about 28% was obtained, in spite of a significant increase of the computation time. There are situations when the linearization-based model [5] yields bounding windows identical to those of this model (e.g., motion and regularity detection). This is caused by the linear or rectangular lattices in the decomposition of the array space (like, for instance, in Example 2 from Fig. 2), whose windows are identical in both models. This approach yields better results than [5] in situations where the size of the arrays can be reduced to windows in any dimension (because any canonical linearization contains a number of unused elements) and, also, in situations where there are signals with holes in their array space. On the other hand, there are cases, like the SVD updating, where the linearization model can yield better results (see columns 2 and 3 in Table II).

12 De Greef et al. display the minimum storage for those benchmarks having a smaller number of array elements; they obtained those data by symbolic simulation, but this method proved infeasible for larger examples [5].
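The notion of a mapping conflict that drives these window computations can be illustrated independently of either tool: for a set of simultaneously alive elements of a 1-D signal, the window side is the smallest modulus under which no two live indices collide. A minimal sketch, with hypothetical live-index sets:

```python
def min_window_side(live_indices):
    """Smallest modulus w such that index -> index % w is injective on a
    set of simultaneously alive indices of a 1-D signal, i.e., no two
    live elements collide in the bounding window."""
    distinct = set(live_indices)
    w = len(distinct)   # w can never be smaller than the number of live elements
    while len({i % w for i in distinct}) < len(distinct):
        w += 1          # a collision is a mapping conflict: grow the window
    return w

# Hypothetical snapshots of live elements:
print(min_window_side([0, 1, 2, 3]))   # prints 4 (contiguous block)
print(min_window_side([0, 3, 5, 9]))   # prints 7 (w = 4, 5, 6 all cause collisions)
```

The brute-force search above mirrors the incremental adjustments of [1]; the lattice-projection approach of this paper obtains the window sides without enumerating the live elements.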
Again, the positive aspect is that this tool is able to measure the quality of the mapping, helping the designer take a decision, whereas the other works in the field do not provide similar information. Although our approach employs a mapping strategy similar to [1], the computation methodology is entirely different. Tronçon et al. first perform a liveness analysis. During this phase, a set of program points is decided (for instance, at the beginning and at the end of each loop body). Every program point is annotated with a set of Z-polyhedra whose integer solutions specify sets of live array elements. In this way, all the live array elements at the chosen program points can be determined. Afterwards, an evaluation of the lower bounds of the window sides is performed.

TABLE II EXPERIMENTAL RESULTS. COLUMNS 2 AND 3 SHOW THE MEMORY SIZES APPLYING DIFFERENT MAPPING ALGORITHMS. COLUMNS 4 AND 5 DISPLAY THE CPU TIMES OBTAINED RUNNING IMPLEMENTATIONS OF DE GREEF'S [5] AND TRONÇON'S [1] ALGORITHMS. COLUMN 6 SHOWS THE CPU TIMES OBTAINED RUNNING OUR TOOL. THE TESTS HAVE BEEN CARRIED OUT ON A PC WITH A 1.85-GHz ATHLON XP PROCESSOR

Starting from initial lower bounds, all the integer solutions of the computed Z-polyhedra differing in one coordinate are tested for equality of their mapped addresses. An equality implies that a potential mapping conflict was found and, consequently, the corresponding window side must be incremented. Subsequent upward adjustments of the window sides are performed for all the chosen program points till no mapping conflict occurs. In contrast, our lattice-oriented methodology requires significantly fewer program points (only between loop nests). In addition, our computation of the window sides is mostly based on lattice projections rather than on incremental adjustments and enumerations of the integer points in Z-polyhedra. These are the reasons why our computation methodology is significantly faster and has a better scalability in terms of the number of array elements. The algorithm of [5] is even slower than [1], since the number of canonical linearizations to be analyzed increases quickly with the dimension of the signal. The experimental results in Table II (the last three columns) confirm our expectation, based on theoretical insight, that our mapping algorithm is regularly several times faster.

VII. CONCLUSION

This paper has addressed the problem of memory allocation for (embedded) multidimensional signal processing systems. It proposes a signal-to-memory mapping algorithm based on the computation of bounding windows for the simultaneously alive array elements. Based on a methodology operating with lattices, the technique can also be applied in multilayer memory hierarchies.
The software tool implementing this algorithm can additionally compute the minimum window for each multidimensional array in the specification, as well as the minimum storage requirements, providing an accurate evaluation of the efficiency of the mapping model. This is a unique feature which is not encountered in any other work addressing similar topics.

ACKNOWLEDGMENT

The authors would like to thank their colleagues from IMEC and K.U. Leuven (Prof. F. Catthoor, Prof. M. Bruynooghe, Dr. E. De Greef, Dr. Q. Fu, and Dr. K. C. Shashidhar) for helpful discussions and for sharing with them some of the benchmark tests used in this paper.

REFERENCES

[1] R. Tronçon, M. Bruynooghe, G. Janssens, and F. Catthoor, Storage size reduction by in-place mapping of arrays, in Verification, Model Checking and Abstract Interpretation, ser. Lecture Notes Comput. Sci., A. Cortesi, Ed. New York: Springer, 2002, pp [2] A. Schrijver, Theory of Linear and Integer Programming. New York: Wiley, [3] F. Catthoor, S. Wuytack, E. De Greef, F. Balasa, L. Nachtergaele, and A. Vandecappelle, Custom Memory Management Methodology: Exploration of Memory Organization for Embedded Multimedia System Design. Boston, MA: Kluwer, [4] J. Ramanujam, J. Hong, M. Kandemir, and A. Narayan, Reducing memory requirements of nested loops for embedded systems, in Proc. 38th ACM/IEEE Design Automation Conf., Jun. 2001, pp [5] E. De Greef, F. Catthoor, and H. De Man, A. Krikelis, Ed., Memory size reduction through storage order optimization for embedded parallel multimedia applications, Special Issue on Parallel Processing and Multimedia, Parallel Comput., vol. 23, no. 12, pp , Dec [6] V. Lefebvre and P. Feautrier, Automatic storage management for parallel programs, Parallel Comput., vol. 24, pp , [7] F. Quilleré and S. Rajopadhye, Optimizing memory usage in the polyhedral model, ACM Trans. Programm. Lang. Syst., vol. 22, no. 5, pp , [8] A. Darte, R. Schreiber, and G. Villard, Lattice-based memory allocation, IEEE Trans.
Comput., vol. 54, no. 10, pp , Oct [9] S. Meftali, F. Garsalli, F. Rousseau, and A. A. Jerraya, An optimal memory allocation for application-specific multiprocessor system-onchip, in Proc. Int. Symp. System Synthesis, Montreal, QC, Canada, Oct. 2001, pp [10] F. Balasa, H. Zhu, and I. I. Luican, Computation of storage requirements for multidimensional signal processing applications, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 15, no. 4, pp , Apr [11] W. Pugh, A practical algorithm for exact array dependence analysis, Commun. ACM, vol. 35, no. 8, pp , Aug [12] S. Verdoolaege, K. Beyls, M. Bruynooghe, and F. Catthoor, Experiences with enumeration of integer projections of parametric polytopes, in Proc. Compiler Construction: 14th Int. Conf., R. Bodik, Ed., 2005, vol. 3443, pp , Springer. [13] G. B. Dantzig and B. C. Eaves, Fourier Motzkin elimination and its dual, J. Combin. Theory A, vol. 14, pp , 1973.

[14] A. I. Barvinok and J. Pommersheim, An algorithmic theory of lattice points in polyhedra, in New Perspectives in Algebraic Combinatorics. Cambridge, U.K.: Cambridge Univ. Press, 1999, pp [15] P. Clauss, Counting solutions to linear and nonlinear constraints through Ehrhart polynomials: Applications to analyze and transform scientific programs, in Proc. ACM Int. Conf. Supercomputing, 1996, pp [16] J. A. De Loera, R. Hemmecke, J. Tauzer, and R. Yoshida, Effective lattice point counting in rational convex polytopes, J. Symbol. Comput., vol. 38, no. 4, pp , [17] D. Avis, LRS: A revised implementation of the reverse search vertex enumeration algorithm, in Polytopes Combinatorics and Computation, G. Kalai and G. Ziegler, Eds. Berlin, Germany: Birkhäuser-Verlag, 2000, pp [18] S. Wuytack, J.-P. Diguet, F. Catthoor, and H. De Man, Formalized methodology for data reuse exploration for low-power hierarchical memory mappings, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 6, no. 4, pp , Dec [19] M. Kandemir and A. Choudhary, Compiler-directed scratch-pad memory hierarchy design and management, in Proc. 39th ACM/IEEE Design Automation Conf., Las Vegas, NV, Jun. 2002, pp [20] E. Brockmeyer, M. Miranda, H. Corporaal, and F. Catthoor, Layer assignment techniques for low energy in multilayered memory organisations, in Proc. 6th ACM/IEEE Design and Test in Europe Conf., Munich, Germany, Mar. 2003, pp [21] Q. Hu, A. Vandecapelle, M. Palkovic, P. G. Kjeldsberg, E. Brockmeyer, and F. Catthoor, Hierarchical memory size estimation for loop fusion and loop shifting in data-dominated applications, in Proc. Asia-South Pacific Design Automation Conf., Yokohama, Japan, Jan. 2006, pp [22] G. Reinman and N. P. Jouppi, CACTI: An Integrated Cache Timing and Power Model, COMPAQ Western Research Lab, [23] M. Moonen, P. Van Dooren, and J. Vandewalle, An SVD updating algorithm for subspace tracking, SIAM J.
Matrix Anal. Appl., vol. 13, no. 4, pp , Florin Balasa received the Ph.D. degree in computer science from the Polytechnical University of Bucharest, Bucharest, Romania, in 1994, and the Ph.D. degree in electrical engineering from the Katholieke Universiteit Leuven, Leuven, Belgium, in He is currently an Associate Professor of Computer Science at Southern Utah University, Cedar City. His research interests are mainly focused on memory management algorithms for digital signal processing, high-level synthesis and physical design automation, optimization techniques, and applications in CAD VLSI. Dr. Balasa was a recipient of the National Science Foundation CAREER Award. Hongwei Zhu received the B.S. degree in electrical engineering from Xi an Jiaotong University, Xi an, China, in 1996, the M.S. degree in electrical and electronic engineering from Nanyang Technological University, Singapore, in 2001, and the Ph.D. degree in computer science from the University of Illinois at Chicago, in He is currently a software engineer in the Design Automation Group of the Physical IP Department at ARM, Inc., San Jose, CA. His research interests are mainly focused on memory management for real-time multidimensional signal processing and combinatorial optimization in CAD VLSI. Ilie I. Luican received the B.S. and M.S. degrees in computer science from the Polytechnical University of Bucharest (PUB), Bucharest, Romania, in 2002 and 2003, respectively. He is currently pursuing the Ph.D. degree in the Department of Computer Science, University of Illinois at Chicago. His research interests are mainly focused on high-level synthesis and memory management for real-time multidimensional signal processing.


More information