Optimizing Sparse Matrix Operations on GPUs using Merge Path

Size: px
Start display at page:

Download "Optimizing Sparse Matrix Operations on GPUs using Merge Path"

Transcription

1 21 IEEE 29th International Parallel and Distributed Proessing Symposium Optimizing Sparse Matrix Operations on GPUs using Merge Path Steven Dalton, Luke Olson Department of Computer Siene University of Illinois at Urbana-Champaign Urbana, IL, {dalton6, Sean Baxter, Duane Merrill, Mihael Garland Nvidia Researh Santa Clara, CA, 9 {sbaxter, dmerrill, mgarland}@nvidia.om Abstrat Irregular omputations on large workloads are a neessity in many areas of omputational siene. Mapping these omputations to modern parallel arhitetures, suh as GPUs, is partiularly hallenging beause the performane often depends ritially on the hoie of data-struture and algorithm. In this paper, we develop a parallel proessing sheme, based on Merge Path partitioning, to ompute segmented row-wise operations on sparse matries that exposes parallelism at the granularity of individual nonzeros entries. Our deomposition ahieves ompetitive performane aross many diverse problems while maintaining preditable behavior dependent only on the omputational work and ameliorates the impat of irregularity. We evaluate the performane of three sparse kernels: SpMV, SpAdd and SpGEMM. We show that our proessing sheme for eah kernel yields omparable performane to other shemes in many ases and our performane is highly orrelated, nearly 1, to the omputational work irrespetive of the underlying struture of the matries. Keywords-parallel; gpu; sparse matrix-matrix; sorting; I. INTRODUCTION Operations that are naturally deomposed into a fixed number of equally sized omponents are inherently amenable to parallel proessing. However, proessing operations in whih the workload is highly irregular and data-dependent is substantially more hallenging. For finegrained parallel proessing environments, suh as GPUs, the issues of effetively deomposing and mapping irregular work to omputational units is amplified due to the hierarhial struture of the threads in the arhiteture and the relatively small working set size of eah individual thread. In this work we study the performane of three operations on sparse matries, sparse matrix-vetor multipliation (SpMV), sparse matrix-sparse matrix addition (SpAdd), and sparse matrix-sparse matrix multipliation (SpGEMM) and propose several strategies to ahieve preditable performane, meaning the proessing time is highly orrelated with the total work and independent of the underlying struture of the input matries. The primary hallenge of omputations involving sparse matries originates from workload imbalanes indued by data segmentation. To address irregularity many methods have been introdued that struture memory aesses to improve the performane of sparse matrix operations. Arguably the most studied and optimized, in terms of datastrutures and algorithms, is SpMV. Peak SpMV performane is ahieved by oupling sparsity pattern analysis with speialized, and in some ases exoti, storage shemes tuned for a partiular lass of matries. The drawbaks of this approah are the need to perform analysis of the input matrix and the use of storage formats that are not amenable for use in other operations. For a partiular set of matries a speialized algorithm-storage pair may yield optimal performane on a given arhiteture but a highly tuned pair may be invalid or exhibit unpreditable behavior when applied to general matries. We onsider these approahes as sparsity or segmentation aware sine they inorporate matrix struture into the implementation. In this work we avoid the use of segmentation aware algorithms or speialized data strutures. Our approah plaes load-balaning first and performs segmented rowwise operations progressively by introduing intermediate work. To ahieve this we study balaned deompositions of eah sparse matrix workload and perform proessing at the granularity of nonzero entries during all phases. In the ase of SpAdd and SpGEMM our proessing sheme builds on highly regular merge-based sorting routines to partition work into smaller units of roughly equal size for parallel proessing. In ontrast with traditional metris of interest, suh as giga-floating point operations per seond (GFLOPs/s), our method is not designed to ahieve the best absolute performane ompared to other segmentation aware algorithms. Indeed we show the performane of our approah may be notably lower than other methods in some ases. However, we also show that our method is easily preditable for a wide diversity of inputs. The ontributions of this work are as follows: Balaned deompositions of three sparse matrix kernels: SpMV, SpAdd, and SpGEMM. Extension of merge-path partitioning to perform set unions /1 $ IEEE DOI 1.119/IPDPS

2 II. BACKGROUND Modern GPUs are representative of a lass of proessors that seek to maximize the performane of massively data parallel workloads by employing hundreds of hardwaresheduled threads organized into tens of proessor ores. Suh arhitetures are onsidered throughput-oriented in ontrast to traditional, lateny-oriented, CPU ores whih seek to minimize the time to omplete individual tasks [1]. GPUs organize proessors into a hierarhy of omputational bloks starting from a fixed size group of threads, known as warp, exeuting in a SIMD fashion. Warps are aggregated into large bloks sharing a fast loal memory buffer, shared memory, exeuting within a thread blok or onurrent thread array (CTA). CTAs are further grouped into a larger set, known as a grid, exeuting a single program. This hierarhial exeution model exhibits the highest utilization when omputations are regularly-strutured and evenly distributed among CTAs within a grid. Sorting is a important lass of omputational algorithms and effiient GPU implementations of radix and merge sort have been presented [2], [3]. Though radix sorting an array of N elements, with an average key-length k, may ahieve lower omplexity bounds, O(kN), than omparison based methods, O(N logn), it has limited appliability and fails to exploit approximate sorted-ness of the input sequene. Reently merge path has been introdued as a effiient means of improving the performane of merge based sorting algorithms [4], []. Merge path, illustrated in Figure 1a, operates by partitioning the inputs, A and B, among parallel threads in a way that provides: 1) Equal amounts of work for all threads. 2) Minimal ommuniation and synhronization between parallel threads. SpMV operations are at the ore of many sparse iterative solvers and as suh have been the fous of a large body of researh to optimize performane [6], [7]. Bell et al. give a omprehensive overview of SpMV onsiderations on the GPU and propose a novel hybrid format to balane the tradeoff of performane and generality [8]. Subsequent researh on GPU SpMV have onsidered sparsity aware adaptive proessing and various algorithmi enhanements to redue memory traffi [9] [11]. Sequential SpGEMM algorithms [12], rely on a large amount, O(n), of temporary storage to effiiently store and redue unique entries on any row of the output matrix C R m n. Although tehniques have been developed to expand the salability and generality of suh parallel deompositions [13], the level of parallelism remains too oarse-grained for GPU aeleration. One novel GPU SpGEMM algorithm deomposes the operation into expansion, sorting, and ompression (ESC) [14]. The ESC algorithm was improved by proessing C row-wise aording to the distribution of work, whih is a measure of the number of produts, omputed during analysis of the input matries [1]. Though highly effiient for sparse-matrix pairs generating intermediate produts readily proessed in shared memory their approah did not impat the performane of long intermediate rows proessed in global memory. To address long intermediate rows Liu et al. presented a mergebased proessing sheme that distributed entries on long intermediate rows evenly for parallel proessing in shared memory whih ahieved notable performane improvement for ertain matries [16]. In this work we explore segmentation oblivious methods to proess general redutions on sparse matries in order to redue the impat of irregularity while maintaining performane. For ompliated deompositions involving sparse matrix pairs, SpAdd and SpGEMM, we leverage the merge path strategy to exploit the partial ordering of the input and perfetly balane proessing of the intermediate work aross CTAs. III. SPARSE MATRIX OPERATIONS Given two matries A R m p, B R p n, and a vetor x R p 1 then SpMV, SpAdd, and SpGEMM ompute Ax, A + B, A B, respetively. In ontrast with general (dense) matrix-matrix operations, A and B are assumed to be sparse. Typial storage formats inlude oordinate (COO), where eah nonzero is represented as a (row, olumn, value) tuple, and ompressed sparse formats that sort and ompress the COO format along either the row (CSR) or olumn (CSC) axis and provides offsets to the first entry in eah segment. We onsider the following sparse matries in our algorithmi desriptions for Setions III-A, III-B and III-C, 1 1 A = and B = 2 3 4, with tuple form given by A. SpMV (,, 1) (1, 1, 2) A = (1, 2, 3) (1, 3, 4) (2, 3, ) (3, 1, 6) (,, 1) (1, 1, 2) (1, 3, 3) and B = (2,, 4). (2, 1, ) (3, 1, 6) (3, 3, 7) For SpMV we onsider Ax = y, where A is in CSR format and y R m 1, and the total work is two times the number of nonzeros in A, denoted A. The obvious parallelization, known as salar CSR, is to assign one thread 48

3 to eah row of A and proess eah row independently. Unfortunately this work deomposition may exhibit a large amount of imbalane between threads if the number of nonzeros per rows of A vary signifiantly. One solution is to vetorize the row-wise proessing sheme and enlist a fixed number of threads within a warp or CTA to proess eah row of the A. This sheme is advantageous when: the average number of entries per row does not vary signifiantly, eah row ontains a relatively large number of entries relative to the number of threads, and there are enough rows to fully saturate the devie. However, it may be impossible to globally selet a stati number of threads appropriate for every row of A. We avoid possible irregularities assoiated of row-wise proessing by assigning a fixed number of nonzeros, N prods, per CTA. To implement this strategy we deompose the SpMV kernel into three phases: partition, redution, and update. Although a flat nonzero based deomposition is perfetly balaned row-wise segmentation is required to enat the redution orretly. Proessing the matries in COO format is one alternative but requires the additional storage and movement of one row entry per nonzero. We address this issue, for sparse matries in CSR format, during the partitioning phase whih omputes the row-wise limits of the entries proessed per CTA. For A entries we proess the produts using m = A /N prods CTAs. During the partitioning phase we enlist m threads to ooperatively perform one binary searh per CTA segment to loate the last offset in the row offsets array proessed by eah CTA and storing these offsets in an auxiliary global memory buffer, S. After partitioning the rows the redution phase launhes m CTAs to redue produts within eah segment of A. A given CTA, i, proesses entries in the range [i N prods, min( A, (i +1) N prods )] and row offsets orresponding to S[i, i +1] are loaded into shared memory to generate the expanded row indies orresponding to eah nonzero. To ompute the redution we first load the olumn indies and values of A, in strided order to redue the penalty of global memory operations, into register. Then we dereferene entries from x aording to the olumn indies and ompute the saled produts. The produts are transposed from strided to bloked order using shared memory and a CTA wide segmented san is performed. On enountering disontinuities in the row indies partial sums are stored to y and the CTA remainder, orresponding to the last row, is stored to a global memory buffer, r. Lastly, during the update phase we aumulate the inter- CTA updates by performing a segmented san on r and augmenting the first redution for eah CTA with the arry out value from the previous CTA. This approah exposes parallelism without regard for segment geometry allowing it to sale over a wide range of sparsity patterns. We statially tune the number of entries per thread empirially Algorithm 1: Tuple Ordering parameters: T1=(row 1,ol 1 ), T2=(row 2,ol 2 ) return: MIN(T1,T2) if row 1 < row 2 return T1 if row 2 < row 1 return T2 if ol 1 <= ol 2 return T1 return T2 to maximize throughput. B. Addition For SpAdd we onsider the total work to form C as the sum of the number of nonzero entries in A and B, A + B. Two approahes for omputing SpAdd are rowwise segmented redutions and global sorting. In the roworiented sheme eah row of C is proessed by a thread group. Though the deomposition of work is simple and proessing any output row i requires O( A rowi + B rowi ) work, it is hallenging to hoose the number of ooperative threads per row. Conversely, with respet to the global sorting sheme the total work to generate C is onsidered olletively as a single monolithi unit by ombining nonzero entries from both A and B into a intermediate matrix ˆT. By lexiographially sorting ˆT by (row, olumn) tuples all dupliate entries are adjaent and performing a simple tuple-wise redution generates C. Though not suseptible to penalties assoiated with row-wise irregularities of A or B the omplexity of globally sorting ˆT is O(k( A + B ) whih is k times more expensive than the aumulated ost of the row-wise approah, O( A + B ). Ideally a method to onstrut C should mitigate the impat of load imbalane while attaining near optimal work omplexity. To ahieve this requires a deomposition of work omposed of non-overlapping ranges from A and B suh that eah thread proesses a unique set of entries in C, thus orresponding to O( A + B ) work omplexity. We observe that when both input matries are sorted by row and row-wise internally sorted by olumn then sparse matrix addition may be formulated as a more general operation requiring the union of sets. Considering the natural omparison of tuples outlined in Algorithm 1 and the ordering of the input matries then parallel merging using merge path appears to be a natural fit for SpAdd. We forgo a detailed desription of the merge path algorithm and provide only a short introdution but refer to Green et al. for additional information []. Merge path deomposes the work of merging two sorted list by performing a binary searh along the diagonals formed by orienting the lists on the x and y axes as shown in Figure 1a. The ordering 49

4 a b e A a b d f B t t 1 t 2 t 3 d e (a) Merge path a b 1 2 e A a b t 4 d d e f f 6 B (b) Balaned path Figure 1: Comparison of deision paths for merge path and balaned path given two sorted sets A and B having six elements eah. The paths, and thus the outputs, are evenly partitioned among four threads. For balaned path, the partition boundary between thread t and t1 is starred so that zipped pairs are never split aross threads. t 1 t 2 of entries along diagonals presribe a partition of both lists into non-overlapping regions of uniform size that an be proessed in parallel by individual threads. Though SpAdd appears merge-like, parallel merge path partitioning is inadequate. Eah onseutive range of mathing (row, olumn) tuples must be available to the same thread and therefore always appear on the same side of any diagonal. We therefore introdue balaned path partitioning, illustrated in Figure 1b, to extend the merge path deomposition to partition the inputs aording to this additional onstraint. We desribe the balaned path methodology by onsidering two sorted lists, A and B, onsisting of dupliate keys in plae of sparse matrix tuples. The normal merge path 1 t f Inputs proessed/s (1 6 ) Number of inputs keys-32 keys-64 pairs-32 pairs-64 Figure 2: Performane of union operation on sorted sets. Entries per input array are divided evenly. implementation onsumes all dupliate keys in A before mathing entries in B therefore onstruting a merged result with dupliate entries. In ontrast balaned path partitioning assigns a rank to dupliate keys within eah dupliation range of the input sequenes. This enables the assignment of keys with mathing ranks to the same partition for serial merging within individual threads therefore ensuring the key-rank pairs emitted by eah thread during onstrution of the result is a unique non-dupliated member of the output list. In Figure 1b the stair-step pattern of balaned path snaking its way through keys for two sorted lists highlights the unique differenes between the deompositions where we see a starred diagonal indiating a translation of thread t1 s partition to inlude a mathing key from B. As in merge path the balaned path diagonals begin at fixed intervals. However, the intervals may shift to onsume one additional or fewer elements than the presribed number proessed per thread to ensure both elements of a mathed key-rank pair are found on the same side of a diagonal partition. Diagonals, mapping to entries (i, diag i) in A and B, that need to be extended to inlude an additional element from B, orresponding to a mathed pair, are starred. The starred balaned path diagonal now maps to mathed entries (i, diag i + diag ) in A and B therefore stealing one entry from the right partition and depositing it into the left partition. Note that performing this key-rank deomposition allows the implementation of not only unions but many other set operations, suh as intersetion, differene and symmetri differene [4]. For SpAdd we fous on the union of sets therefore the output list is omprised of only zeroth ranked entries from A and B. In Figure 2 we illustrate the performane of our set oriented union operation on various array sizes and data types. Using our balaned path approah we perfetly deompose the work required to identify and 41

5 redue dupliates. With respet to SpAdd we redue the global memory overhead by omputing the union of the matries in two phases. During the first phase the unique tuples in the union of A and B are ounted and spae for C is alloated. In the seond balaned path invoation the entries from eah matrix are loaded into shared memory and dupliate values are redued within the CTA. Note that although for wellformed sparse matries A and B there are at most two dupliates that must be redued to form eah unique entry of C our implementation is general enough to proess a muh wider range of input sets onsisting of an arbitrary number of dupliates. C. Multipliation SpGEMM is substantially more hallenging than SpMV and SpAdd beause of the added irregularity of the workload to dynamially expand, during formation of the produts, and ontrat, during redution of dupliates. If we onsider the row-wise produts expanded for our simple example matries we see the following intermediate representation: (,, 1) (,, 1) (1, 3, 6) (1,, 12) (1, 1, 4) (1, 1, 4) (,, 1) (1, 1, 1) (1, 1, 1) (1,, 12) (1,, 12) (1, 1, 24) (1, 1, 43) Ĉ = (1, 3, 28) = (1, 3, 6) = (1, 3, 34) (1, 1, 24) (1, 3, 28) (2, 1, 3), (2, 3, 3) (2, 3, 3) (2, 1, 3) (3, 3, 18) (3, 1, 12) (2, 1, 3) (2, 3, 3) (3, 1, 12) (3, 3, 18) (3, 1, 12) (3, 3, 18) whih redues to the sparse matrix 1 C = A B = This depition of SpGEMM deomposes the FLOPs into two phases, expansion and ontration. We onsider the total work as a funtion of the number of produts, N prods,inthe expansion phase sine it is a measure of the total number of global memory operations required to feth data from the input matries and it also desribes the number of insertions/redutions required to form C. By far the identifiation and redution of dupliates during the ontration phase is the most hallenging phase of the omputation and forms the primary performane limiting routine in many SpGEMM implementations. To ombat the irregularity of ontrating the intermediate matrix we propose a balaned approah based on merge path partitioning of the total number produts and split Clok yles (1 4 ) P-Pairs 1P-Pairs 1P-Keys 1P(28-bits) 1P(24-bits) Sorting method 1P(2-bits) 1P(16-bits) 1P(12-bits) Figure 4: Clok yles per CTA radix sorting operations for two-pass (2P) key-value pairs (red), one-pass (1P) keyvalue pairs, and one-pass keys-only routines as the number of sorting bits are dereased (blue). the proessing into separate but performane stable sorting phases, the first phase performed in shared memory and a seond phase performed in global memory. To partition the produts we first speify the number of produts that may be proessed per CTA, N CTA, then we ompute the segmented prefix-sum, S, of the size of eah row of B, B rowi, referened by the olumn indies of A. Following Nprods this preproessing phase N CTA CTAs are launhed to independently expand and proess N CTA produts. Eah CTA loates the range of produts it is assigned by performing a binary searh on S to loate the first orresponding entry nonzero entry from A greater than or equal to its CTA number times N CTA. By deomposing A at the granularity of produts our approah balanes the work perfetly aross CTAs irrespetive of the row-wise redutions ditated by the input matries. The first sorting phase redues dupliates within eah CTA using radix sort. Our key observation is that beause we expand entries in order aording to olumn entries of A then ontiguous entries within a CTA remain ordered by row and dupliates are in adjaent loations following a single radix sort on the olumn indies as illustrated in Figure 3, this is in ontrast to two phase sorting approahes [14]. Figure 4 illustrates the performane improvement ahieved by reduing the total number of radix-sort operations within a CTA. We benhmark the performane using the radixsort routine in the open-soure CUB programming framework [17] for 32-bit data types with 128 threads per CTA and 11 entries per thread. For key-value pairs we observe that performing a single radix-sort pass redues the number lok yles by approximately a fator of two yielding notable performane improvement. In Figure 4 we also ompare the performane of radix- 411

6 (,,χ) (,,χ) (,,χ) (1, 3,χ) (1, 3,χ) (1,,χ) (,,χ) (1,,χ) (,,χ) (,, 1) (1,,χ) (1,, 12) (,, 1) (1,,χ) (1,,χ) (1, 3,χ) (1, 3,χ) (1, 1, 19) (1,, 12) (1, 3,χ) (1, 1, 24) (1, 1, 43) (1, 3,χ) = (1, 3,χ) = (1, 3,χ) = = = (1, 3, 34) = (2, 1, 3) (2, 3,χ) (2, 1,χ) (2, 3,χ) (2, 1,χ) (2, 1,χ) (2, 1,χ) (2, 1, 3) (3, 1, 12) (3, 3,χ) (2, 1,χ) (3, 1,χ) (3, 1,χ) (3, 1,χ) (2, 3, 3) (1, 3, 34) (2, 3,χ) (3, 1, 12) (2, 3, 3) (3, 3,χ) (2, 3,χ) (2, 3,χ) (3, 3, 18) (3, 3,χ) (3, 3, 18) (3, 1,χ) (3, 3,χ) (3, 1,χ) (3, 3,χ) (a) (b) () (d) Figure 3: All of the intermediate indies are formed, (a), and partitioned into subsets of approximately equal size, (b). χ represents unformed produts orresponding to eah intermediate entry. Eah subset is sorted by olumn index independently, (), and dupliate entries are redued, (d). All the subsets are aggregated into a intermediate matrix that ontains dupliates but is not sorted by row or olumn, (e). The redued matrix is sorted, (f), the produts of the redued entries are omputed and the remaining dupliates are redued to form C, (g). (e) (f) (g) sort when the number of sorted bits are redued from 28 to 12 bits. Based on this observation we aggressively optimize our sorting implementation to minimize the proessing time by exploiting the limited range of the olumn indies and sorting the minimum number of bits required by omputing log 2 (n), the number of olumns in B. We further optimize our performane by embedding the permutation bits in the unused upper range of the olumn indies when possible. Using this approah we redue the number of shared memory operations and perform a keys-only CTA radix-sort. The permutation omputed by eah blok is then stored to global memory using 16-bit integers to minimize the memory traffi and storage overhead. The first phase onludes with eah CTA sanning the sorted entries to identify unique entries and asynhronously storing the redued set to global memory. Note that following the first phase no produts have been formed and the redued intermediate entries in global memory onsist of possible dupliates that are ompletely unordered. To ompute the unique dupliates in C we perform a two pass global radix-sort routine similar to the method outline by the ESC algorithm. The notable differene is that the global memory sorting routine omputes only the permutation that sorts the pairs and does not perform any reordering of the, urrently unformed, intermediate produts. In the third phase the redued intermediate produts are formed by performing the expansion phase a seond time. In this version eah CTA loads the permutation omputed during the first expansion and a segmented san is performed to redue permuted produts aording to the preomputed dupliate indiators. To redue the number of global memory load and store operations the redued values are not stored randomly bak to global memory. The output loation that orders the redued produts aording to global dupliation, omputed during the seond phase, is used to store entries in sorted order. The final step onsists of forming the values in C by performing a global memory redue-by-key operation on the ordered produts. A key feature of this two-level deomposition is that both the CTA-wide and global memory sorting operations are oblivious to the irregularity of the underlying input matries. By preproessing many of the dupliates in shared memory prior to the global memory pass the total work required to perform the expensive two-pass radix-sort operation is signifiantly redued in ases where eah blok yields only a small number of unique pairs, i.e., a large number of dupliate entries per CTA. IV. NUMERICAL RESULTS In this setion we evaluate the performane of our parallel proessing strategy and ompare our results with two pakages for proessing sparse matrix operations on GPUs, Cusp, open-soure, and Cusparse, losed soure. Our evaluation is performed on a set of SpMV matries taken from the University of Florida (UFL) sparse matrix olletion outlined in Table II and the onfiguration of our testing environment is desribed in Table I. All of the performane data was olleted using double preision arithmeti with the input matries resident in GPU memory. CPU i7-382 CPU 3.6GHz GPU GTX Titan.88GHz CUDA 6. GCC 4.8 -O3 Table I: System onfiguration settings. On the GPU error heking and orretion (ECC) is disabled. 412

7 Matrix rows olumns nonzeros avg/row std Dense Protein Spheres Cantilever Wind Tunnel Harbor QCD Ship Eonomis Epidemiology Aelerator Ciruit Webbase LP Table II: Unstrutured matries taken from the UFL sparse matrix olletion for performane testing. GFLOPs/s Time (ms) ρ Cusparse =.84 ρ Merge = Nonzeros (1 6 ) Merge Cusparse Figure 6: Comparison of SpMV performane to A. We use the orrelation oeffiient, ρ, as a measure of the performane preditability as a funtion of A. For our Merge sheme ρ is.97 indiating that prediting the SpMV performane of untested matries should be relatively aurate. 1 Dense Protein Spheres Cantilever Wind Harbor QCD Ship Eonomis Epidemiology Aelerator Ciruit Webbase LP Cusp Cusparse Merge Figure : Sparse matrix-vetor performane, measured in GFLOPs/s, omparison for CSR format in double preision for three implementation using Cusp, Cusparse, and Merge. A. SpMV In Figure we ompare the SpMV performane of three different implementations operating on an input matrix stored in CSR format. Although the performane for some matries is substantially higher using speialized storage formats our goal is to ompare the ahieved performane using a general and standard storage format. For eah matrix the performane reported is the average of 1 iterations. The Cusp SpMV implementation uses the vetorized CSR implementation disussed in Setion III-A. In the Merge implementation we detet the presene of empty rows in the input matries and adaptively swith between a faster method that assumes there are no empty rows and a slightly slower method that ompats the CSR row offsets prior to performing eah SpMV operation. From Figure we see that the relative performane between the three implementations varies with respet to eah matrix. We note that the performane of our Merge sheme remains ompetitive with the best ases in all tests exept the Dense matrix. For hallenging matries that exhibit a large degree of irregularity in the number of entries per row, Webbase and LP, we see that our flat deomposition ahieves markedly better performane ompared to the other implementations. To illustrate the onnetion between the required work, proportional to A for SpMV, and the performane we plot the proessing time versus A for eah matrix in Figure 6. From Figure 6 we see that the performane of the Merge approah is highly orrelated, with a orrelation oeffiient of.97, with number the number of nonzeros in eah matrix and exhibiting only small perturbations from the least-squares fit of the data points. B. SpAdd In Figure 7 we ompare SpAdd, A + A, performane for the previously mentioned pakages. The performane illustrated is the speedup with respet to the sequential implementation using CSR format on the CPU. We note that the storage format for these tests are different between the pakages. The Cusp and Merge approahes operate on the input matries stored in expanded COO format to failitate proessing at the granularity of individual nonzero entries while Cusparse operates on the matries in CSR format. The Cusp implementation uses the two-pass global sorting approah desribed in Setion III-B. In ontrast with SpMV we see that both Cusparse and Merge perform signifiantly better than Cusp on all matries in the test suite. Another notable differene is the signifiant performane improvement of Cusparse over the Merge implementation for the Dense, Protein, and Wind matries. However, for the remaining matries in the test suite the performane of the two methods are omparable with the exeption of the Webbase and LP matries. 413

8 Speedup Speedup Dense Protein Spheres Cantilever Wind Harbor QCD Ship Eonomis Epidemiology Aelerator Ciruit Webbase LP Cusp Cusparse Merge Figure 7: SpAdd performane, measured in speedup versus sequential CPU implementation, for Cusp(COO), Cusparse(CSR), and Merge(COO). 2 1 Dense Protein Spheres Cantilever Wind Harbor QCD Ship Eonomis Epidemiology Aelerator Ciruit Cusp Cusparse Merge Webbase LP Figure 9: SpGEMM performane, measured in speedup versus sequential CPU implementation, for double preision for Cusp(COO), Cusparse(CSR), and Merge(CSR). Time (ms) ρ Cusparse =.68 ρ Merge = Nonzeros (1 6 ) Merge Cusparse Figure 8: Comparison of SpAdd performane to the number of nonzeros. For the merge implementation our orrelation oeffiient of 1 implies a nearly perfet math between the work and proessing time irrespetive of the underlying struture of the input matries. In Figure 8 we again plot the performane versus the total work to understand the orrelation between the required work and the total proessing time. Not surprisingly the Merge approah is perfetly orrelated with the number of nonzeros in the input matries. This is expeted sine the approah is based on parallel deompositions that yield perfet balane irrespetive of the segmentation of the underlying data. Note that for two of the largest SpAdd instanes Cusparse ahieves speedup for one and dramati slowdown for the other ompared to the Merge implementation. C. SpGEMM In Figure 9 we ompare SpGEMM, A A, performane for all three implementations. In the speial ase of the nonsquare LP matrix we transpose the matrix and perform A A T. The performane is tabulated in terms of the average speedup for 1 iterations versus the sequential CPU implementation in CSR format. The Cusp implementation uses a global memory ESC algorithm and although the performane is inherently independent of the input matries the absolute time is high beause of global operations involving the total number of produts, speifially radix sort. For many of the matries the Cusparse and Merge approahes outperform Cusp by a signifiant margin. However, the performane of Cusparse degrades for the Eonomis, Ciruit, Webbase, and LP matries. In ontrast the Merge approah sustains performane improvement ompared to Cusp in all instanes. The weakness of sort based SpGEMM methods operating on intermediate produts is also illustrated in Figure 9 where both the Cusp and Merge approahes required more physial memory than the resoure onstrained GPU ould support. For the Dense test ase the dupliates per CTA is lose to zero therefore the global memory pass requires a signifiant amount of temporary storage to order the expanded entries. For matries with a relatively dense number of entries per row segmented proessing presents a notable advantage. Although both the Webbase and LP matries also have rows ontaining a relatively large number of entries the powerlaw distribution of the row lengths in the ase of Webbase and the small number of rows in LP, sine it is omputing A A T, ensures that on average many of the CTAs ontrat a substantially large number of intermediate entries during the first sorting pass. In Figure 1 we plot the performane of the Merge and Cusparse implementations versus the total number of produts required to form C. Note that this analysis ignores the FLOPs performed to redue dupliate entries to form C. From Figure 1 we see that the performane of the two- 414

9 Matrix Label LP Webbase Ciruit Aelerator Epidemiology Eonomis Ship QCD Harbor Wind Cantilever Spheres Protein Setup Perent Time Produt Compute Produt Redue Blok Sort Global Sort Other Figure 11: SpGEMM performane breakdown Total Time (ms) level sorting approah based on merge path deompositions is highly orrelated with the total number of produts. This regularity implies that the performane of the Merge approah is easily preditable based on simple analysis of the input matries prior to exeution of the operation. The observed regularity in the Merge sheme performane is in ontrast with Cusparse whih diverges onsiderably from the antiipated performane in two ases. We do not laim the Cusparse implementation is inherently unpreditable but that the performane appears to be unrelated to our interpretation of the required work, the number of produts. In Figure 11 we deompose the performane of the merge SpGEMM performane into several phases. During the Setup phase the number of entries on eah row of B referened by eah olumn of A is sanned to ompute the number of produts required for eah nonzero entry in A. During Blok Sort the produts are deomposed into a fixed number of entries and proessed using a single radixsort pass within eah CTA and the resulting permutation is stored to global memory. The Global Sort phase organizes the redued number of entries identified by eah CTA using two-radix passes and the Produt Compute phase ombines the loal CTA permutation with the global dupliate permutation to form the ordered redued produts in global memory. In the last phase the final produts are formed by performing a redue-by-key operation on dupliates in global memory. Additional overheads in terms of alloation and misellaneous memory operations are aggregated into Other. From Figure 11 it is lear that the two sorting passes and the omputation of the output produts onstitute the bulk of the proessing time for eah matrix. Although there are signifiant variations in the total relative time of eah operation based on the matrix it is important to onsider these variations with respet to the total proessing time, shown on the right axis. V. CONCLUSION Although the speifi tehniques employed to perform eah sparse matrix operation varied substantially the fous of our study was the performane impliations of using a flat deomposition of the work irrespetive of the underlying segmentation ditated by the input operands. Note that our method does not ahieve the best performane in all ases for any of our tests. Indeed, we expet when omparing any two algorithms proessing suh irregular omputations that subtle data-dependent performane harateristis may have a large impat on the overall proessing time. Our primary fous in this work is ahieving preditable performane aross a vast landsape of input instanes. In future work we plan to address the defiienies of sort based SpGEMM methods by adaptively introduing segmented approahes when neessary. Deteting speifi ases like the Dense matrix is relatively simple but would also require a more detailed model to aurately predit the trade-off between the number of entries proessed in the segmented and unsegmented regions of the algorithm. REFERENCES [1] M. Garland and D. B. Kirk, Understanding throughputoriented arhitetures, Commun. ACM, vol. 3, pp. 8 66, November 21. [Online]. Available: 114/ [2] A. Davidson, D. Tarjan, M. Garland, and J. D. Owens, Effiient parallel merge sort for fixed and variable length keys, in Innovative Parallel Computing, May 212, p

10 Time (ms) Time (ms) ρ Merge = Number of produts (1 6 ) (a) Merge ρ Cusparse = Number of produts (1 6 ) (b) Cusparse Figure 1: Comparison of SpGEMM performane to the number of produts. Flat two-level sorting results in performane that is highly orrelated, ρ =.98, with the work expressed as a funtion of number of produts in the intermediate matrix. [3] D. Merrill and A. Grimshaw, High performane and salable radix sorting: A ase study of implementing dynami parallelism for GPU omputing, Parallel Proessing Letters, vol. 21, no. 2, pp , 211. [Online]. Available: 21/212/S html [4] Mgpu : Design patterns for gpu omputing, github.io/moderngpu/, version 1.1. [] O. Green, R. MColl, and D. A. Bader, Gpu merge path: A gpu merging algorithm, in Proeedings of the 26th ACM International Conferene on Superomputing, ser. ICS 12. New York, NY, USA: ACM, 212, pp [Online]. Available: [6] R. W. Vudu, Automati performane tuning of sparse matrix kernels, Ph.D. dissertation, 23, aai [7] S. Williams, L. Oliker, R. Vudu, J. Shalf, K. Yelik, and J. Demmel, Optimization of sparse matrix-vetor multipliation on emerging multiore platforms, Parallel Computing, vol. 3, no. 3, pp , 29, revolutionary Tehnologies for Aeleration of Emerging Petasale Appliations. [Online]. Available: sienediret.om/siene/artile/pii/s [8] N. Bell and M. Garland, Implementing sparse matrix-vetor multipliation on throughput-oriented proessors, in SC 9: Proeedings of the Conferene on High Performane Computing Networking, Storage and Analysis. New York, NY, USA: ACM, 29, pp [9] J. W. Choi, A. Singh, and R. W. Vudu, Modeldriven autotuning of sparse matrix-vetor multiply on gpus, SIGPLAN Not., vol. 4, no., pp , Jan. 21. [Online]. Available: [1] J. L. Greathouse and M. Daga, Effiient sparse matrixvetor multipliation on gpus using the sr storage format, in Proeedings of the International Conferene for High Performane Computing, Networking, Storage and Analysis, ser. SC 14. Pisataway, NJ, USA: IEEE Press, 214, pp [Online]. Available: [11] A. Monakov, A. Lokhmotov, and A. Avetisyan, Automatially tuning sparse matrix-vetor multipliation for gpu arhitetures, in High Performane Embedded Arhitetures and Compilers, ser. Leture Notes in Computer Siene, Y. Patt, P. Foglia, E. Duesterwald, P. Faraboshi, and X. Martorell, Eds. Springer Berlin Heidelberg, 21, vol. 92, pp [Online]. Available: 1 [12] F. G. Gustavson, Two fast algorithms for sparse matries: Multipliation and permuted transposition, ACM Trans. Math. Softw., vol. 4, pp , September [Online]. Available: [13] A. Buluç and J. R. Gilbert, Parallel sparse matrixmatrix multipliation and indexing: Implementation and experiments, SIAM Journal of Sientifi Computing (SISC), vol. 34, no. 4, pp , 212. [Online]. Available: aydin/spgemm sis12.pdf [14] N. Bell, S. Dalton, and L. Olson, Exposing fine-grained parallelism in algebrai multigrid methods, SIAM Journal on Sientifi Computing, vol. 34, no. 4, pp. C123 C12, 212. [Online]. Available: / [1] S. Dalton, N. Bell, and L. Olson, Optimizing sparse matrixmatrix multipliation for the gpu, Department of Computer Siene, University of Illinois at Urbana-Champaign, Champaign, Illinois, Teh. Rep., February 213. [16] W. Liu and B. Vinter, An effiient gpu general sparse matrix-matrix multipliation for irregular data, in Parallel and Distributed Proessing Symposium, 214 IEEE 28th International, May 214, pp [17] Cub : Reusable software for uda programming, nvlabs.github.io/ub/, version

Pipelined Multipliers for Reconfigurable Hardware

Pipelined Multipliers for Reconfigurable Hardware Pipelined Multipliers for Reonfigurable Hardware Mithell J. Myjak and José G. Delgado-Frias Shool of Eletrial Engineering and Computer Siene, Washington State University Pullman, WA 99164-2752 USA {mmyjak,

More information

On - Line Path Delay Fault Testing of Omega MINs M. Bellos 1, E. Kalligeros 1, D. Nikolos 1,2 & H. T. Vergos 1,2

On - Line Path Delay Fault Testing of Omega MINs M. Bellos 1, E. Kalligeros 1, D. Nikolos 1,2 & H. T. Vergos 1,2 On - Line Path Delay Fault Testing of Omega MINs M. Bellos, E. Kalligeros, D. Nikolos,2 & H. T. Vergos,2 Dept. of Computer Engineering and Informatis 2 Computer Tehnology Institute University of Patras,

More information

Automatic Physical Design Tuning: Workload as a Sequence Sanjay Agrawal Microsoft Research One Microsoft Way Redmond, WA, USA +1-(425)

Automatic Physical Design Tuning: Workload as a Sequence Sanjay Agrawal Microsoft Research One Microsoft Way Redmond, WA, USA +1-(425) Automati Physial Design Tuning: Workload as a Sequene Sanjay Agrawal Mirosoft Researh One Mirosoft Way Redmond, WA, USA +1-(425) 75-357 sagrawal@mirosoft.om Eri Chu * Computer Sienes Department University

More information

Learning Convention Propagation in BeerAdvocate Reviews from a etwork Perspective. Abstract

Learning Convention Propagation in BeerAdvocate Reviews from a etwork Perspective. Abstract CS 9 Projet Final Report: Learning Convention Propagation in BeerAdvoate Reviews from a etwork Perspetive Abstrat We look at the way onventions propagate between reviews on the BeerAdvoate dataset, and

More information

What are Cycle-Stealing Systems Good For? A Detailed Performance Model Case Study

What are Cycle-Stealing Systems Good For? A Detailed Performance Model Case Study What are Cyle-Stealing Systems Good For? A Detailed Performane Model Case Study Wayne Kelly and Jiro Sumitomo Queensland University of Tehnology, Australia {w.kelly, j.sumitomo}@qut.edu.au Abstrat The

More information

A DYNAMIC ACCESS CONTROL WITH BINARY KEY-PAIR

A DYNAMIC ACCESS CONTROL WITH BINARY KEY-PAIR Malaysian Journal of Computer Siene, Vol 10 No 1, June 1997, pp 36-41 A DYNAMIC ACCESS CONTROL WITH BINARY KEY-PAIR Md Rafiqul Islam, Harihodin Selamat and Mohd Noor Md Sap Faulty of Computer Siene and

More information

Outline: Software Design

Outline: Software Design Outline: Software Design. Goals History of software design ideas Design priniples Design methods Life belt or leg iron? (Budgen) Copyright Nany Leveson, Sept. 1999 A Little History... At first, struggling

More information

The Minimum Redundancy Maximum Relevance Approach to Building Sparse Support Vector Machines

The Minimum Redundancy Maximum Relevance Approach to Building Sparse Support Vector Machines The Minimum Redundany Maximum Relevane Approah to Building Sparse Support Vetor Mahines Xiaoxing Yang, Ke Tang, and Xin Yao, Nature Inspired Computation and Appliations Laboratory (NICAL), Shool of Computer

More information

Extracting Partition Statistics from Semistructured Data

Extracting Partition Statistics from Semistructured Data Extrating Partition Statistis from Semistrutured Data John N. Wilson Rihard Gourlay Robert Japp Mathias Neumüller Department of Computer and Information Sienes University of Strathlyde, Glasgow, UK {jnw,rsg,rpj,mathias}@is.strath.a.uk

More information

HEXA: Compact Data Structures for Faster Packet Processing

HEXA: Compact Data Structures for Faster Packet Processing Washington University in St. Louis Washington University Open Sholarship All Computer Siene and Engineering Researh Computer Siene and Engineering Report Number: 27-26 27 HEXA: Compat Data Strutures for

More information

Improved Circuit-to-CNF Transformation for SAT-based ATPG

Improved Circuit-to-CNF Transformation for SAT-based ATPG Improved Ciruit-to-CNF Transformation for SAT-based ATPG Daniel Tille 1 René Krenz-Bååth 2 Juergen Shloeffel 2 Rolf Drehsler 1 1 Institute of Computer Siene, University of Bremen, 28359 Bremen, Germany

More information

COST PERFORMANCE ASPECTS OF CCD FAST AUXILIARY MEMORY

COST PERFORMANCE ASPECTS OF CCD FAST AUXILIARY MEMORY COST PERFORMANCE ASPECTS OF CCD FAST AUXILIARY MEMORY Dileep P, Bhondarkor Texas Instruments Inorporated Dallas, Texas ABSTRACT Charge oupled devies (CCD's) hove been mentioned as potential fast auxiliary

More information

A Dual-Hamiltonian-Path-Based Multicasting Strategy for Wormhole-Routed Star Graph Interconnection Networks

A Dual-Hamiltonian-Path-Based Multicasting Strategy for Wormhole-Routed Star Graph Interconnection Networks A Dual-Hamiltonian-Path-Based Multiasting Strategy for Wormhole-Routed Star Graph Interonnetion Networks Nen-Chung Wang Department of Information and Communiation Engineering Chaoyang University of Tehnology,

More information

Graph-Based vs Depth-Based Data Representation for Multiview Images

Graph-Based vs Depth-Based Data Representation for Multiview Images Graph-Based vs Depth-Based Data Representation for Multiview Images Thomas Maugey, Antonio Ortega, Pasal Frossard Signal Proessing Laboratory (LTS), Eole Polytehnique Fédérale de Lausanne (EPFL) Email:

More information

Analysis of input and output configurations for use in four-valued CCD programmable logic arrays

Analysis of input and output configurations for use in four-valued CCD programmable logic arrays nalysis of input and output onfigurations for use in four-valued D programmable logi arrays J.T. utler H.G. Kerkhoff ndexing terms: Logi, iruit theory and design, harge-oupled devies bstrat: s in binary,

More information

System-Level Parallelism and Throughput Optimization in Designing Reconfigurable Computing Applications

System-Level Parallelism and Throughput Optimization in Designing Reconfigurable Computing Applications System-Level Parallelism and hroughput Optimization in Designing Reonfigurable Computing Appliations Esam El-Araby 1, Mohamed aher 1, Kris Gaj 2, arek El-Ghazawi 1, David Caliga 3, and Nikitas Alexandridis

More information

A Novel Validity Index for Determination of the Optimal Number of Clusters

A Novel Validity Index for Determination of the Optimal Number of Clusters IEICE TRANS. INF. & SYST., VOL.E84 D, NO.2 FEBRUARY 2001 281 LETTER A Novel Validity Index for Determination of the Optimal Number of Clusters Do-Jong KIM, Yong-Woon PARK, and Dong-Jo PARK, Nonmembers

More information

Multi-Channel Wireless Networks: Capacity and Protocols

Multi-Channel Wireless Networks: Capacity and Protocols Multi-Channel Wireless Networks: Capaity and Protools Tehnial Report April 2005 Pradeep Kyasanur Dept. of Computer Siene, and Coordinated Siene Laboratory, University of Illinois at Urbana-Champaign Email:

More information

mahines. HBSP enhanes the appliability of the BSP model by inorporating parameters that reet the relative speeds of the heterogeneous omputing omponen

mahines. HBSP enhanes the appliability of the BSP model by inorporating parameters that reet the relative speeds of the heterogeneous omputing omponen The Heterogeneous Bulk Synhronous Parallel Model Tiani L. Williams and Rebea J. Parsons Shool of Computer Siene University of Central Florida Orlando, FL 32816-2362 fwilliams,rebeag@s.uf.edu Abstrat. Trends

More information

Reduced-Complexity Column-Layered Decoding and. Implementation for LDPC Codes

Reduced-Complexity Column-Layered Decoding and. Implementation for LDPC Codes Redued-Complexity Column-Layered Deoding and Implementation for LDPC Codes Zhiqiang Cui 1, Zhongfeng Wang 2, Senior Member, IEEE, and Xinmiao Zhang 3 1 Qualomm In., San Diego, CA 92121, USA 2 Broadom Corp.,

More information

Approximate logic synthesis for error tolerant applications

Approximate logic synthesis for error tolerant applications Approximate logi synthesis for error tolerant appliations Doohul Shin and Sandeep K. Gupta Eletrial Engineering Department, University of Southern California, Los Angeles, CA 989 {doohuls, sandeep}@us.edu

More information

A Load-Balanced Clustering Protocol for Hierarchical Wireless Sensor Networks

A Load-Balanced Clustering Protocol for Hierarchical Wireless Sensor Networks International Journal of Advanes in Computer Networks and Its Seurity IJCNS A Load-Balaned Clustering Protool for Hierarhial Wireless Sensor Networks Mehdi Tarhani, Yousef S. Kavian, Saman Siavoshi, Ali

More information

Algorithms, Mechanisms and Procedures for the Computer-aided Project Generation System

Algorithms, Mechanisms and Procedures for the Computer-aided Project Generation System Algorithms, Mehanisms and Proedures for the Computer-aided Projet Generation System Anton O. Butko 1*, Aleksandr P. Briukhovetskii 2, Dmitry E. Grigoriev 2# and Konstantin S. Kalashnikov 3 1 Department

More information

A {k, n}-secret Sharing Scheme for Color Images

A {k, n}-secret Sharing Scheme for Color Images A {k, n}-seret Sharing Sheme for Color Images Rastislav Luka, Konstantinos N. Plataniotis, and Anastasios N. Venetsanopoulos The Edward S. Rogers Sr. Dept. of Eletrial and Computer Engineering, University

More information

An Optimized Approach on Applying Genetic Algorithm to Adaptive Cluster Validity Index

An Optimized Approach on Applying Genetic Algorithm to Adaptive Cluster Validity Index IJCSES International Journal of Computer Sienes and Engineering Systems, ol., No.4, Otober 2007 CSES International 2007 ISSN 0973-4406 253 An Optimized Approah on Applying Geneti Algorithm to Adaptive

More information

Partial Character Decoding for Improved Regular Expression Matching in FPGAs

Partial Character Decoding for Improved Regular Expression Matching in FPGAs Partial Charater Deoding for Improved Regular Expression Mathing in FPGAs Peter Sutton Shool of Information Tehnology and Eletrial Engineering The University of Queensland Brisbane, Queensland, 4072, Australia

More information

Dynamic Algorithms Multiple Choice Test

Dynamic Algorithms Multiple Choice Test 3226 Dynami Algorithms Multiple Choie Test Sample test: only 8 questions 32 minutes (Real test has 30 questions 120 minutes) Årskort Name Eah of the following 8 questions has 4 possible answers of whih

More information

This fact makes it difficult to evaluate the cost function to be minimized

This fact makes it difficult to evaluate the cost function to be minimized RSOURC LLOCTION N SSINMNT In the resoure alloation step the amount of resoures required to exeute the different types of proesses is determined. We will refer to the time interval during whih a proess

More information

arxiv: v1 [cs.db] 13 Sep 2017

arxiv: v1 [cs.db] 13 Sep 2017 An effiient lustering algorithm from the measure of loal Gaussian distribution Yuan-Yen Tai (Dated: May 27, 2018) In this paper, I will introdue a fast and novel lustering algorithm based on Gaussian distribution

More information

The AMDREL Project in Retrospective

The AMDREL Project in Retrospective The AMDREL Projet in Retrospetive K. Siozios 1, G. Koutroumpezis 1, K. Tatas 1, N. Vassiliadis 2, V. Kalenteridis 2, H. Pournara 2, I. Pappas 2, D. Soudris 1, S. Nikolaidis 2, S. Siskos 2, and A. Thanailakis

More information

A Novel Bit Level Time Series Representation with Implication of Similarity Search and Clustering

A Novel Bit Level Time Series Representation with Implication of Similarity Search and Clustering A Novel Bit Level Time Series Representation with Impliation of Similarity Searh and lustering hotirat Ratanamahatana, Eamonn Keogh, Anthony J. Bagnall 2, and Stefano Lonardi Dept. of omputer Siene & Engineering,

More information

Query Evaluation Overview. Query Optimization: Chap. 15. Evaluation Example. Cost Estimation. Query Blocks. Query Blocks

Query Evaluation Overview. Query Optimization: Chap. 15. Evaluation Example. Cost Estimation. Query Blocks. Query Blocks Query Evaluation Overview Query Optimization: Chap. 15 CS634 Leture 12 SQL query first translated to relational algebra (RA) Atually, some additional operators needed for SQL Tree of RA operators, with

More information

A Partial Sorting Algorithm in Multi-Hop Wireless Sensor Networks

A Partial Sorting Algorithm in Multi-Hop Wireless Sensor Networks A Partial Sorting Algorithm in Multi-Hop Wireless Sensor Networks Abouberine Ould Cheikhna Department of Computer Siene University of Piardie Jules Verne 80039 Amiens Frane Ould.heikhna.abouberine @u-piardie.fr

More information

ICCGLU. A Fortran IV subroutine to solve large sparse general systems of linear equations. J.J. Dongarra, G.K. Leaf and M. Minkoff.

ICCGLU. A Fortran IV subroutine to solve large sparse general systems of linear equations. J.J. Dongarra, G.K. Leaf and M. Minkoff. http://www.netlib.org/linalg/ig-do 1 of 8 12/7/2009 11:14 AM ICCGLU A Fortran IV subroutine to solve large sparse general systems of linear equations. J.J. Dongarra, G.K. Leaf and M. Minkoff July, 1982

More information

Plot-to-track correlation in A-SMGCS using the target images from a Surface Movement Radar

Plot-to-track correlation in A-SMGCS using the target images from a Surface Movement Radar Plot-to-trak orrelation in A-SMGCS using the target images from a Surfae Movement Radar G. Golino Radar & ehnology Division AMS, Italy ggolino@amsjv.it Abstrat he main topi of this paper is the formulation

More information

Abstract. Key Words: Image Filters, Fuzzy Filters, Order Statistics Filters, Rank Ordered Mean Filters, Channel Noise. 1.

Abstract. Key Words: Image Filters, Fuzzy Filters, Order Statistics Filters, Rank Ordered Mean Filters, Channel Noise. 1. Fuzzy Weighted Rank Ordered Mean (FWROM) Filters for Mixed Noise Suppression from Images S. Meher, G. Panda, B. Majhi 3, M.R. Meher 4,,4 Department of Eletronis and I.E., National Institute of Tehnology,

More information

We don t need no generation - a practical approach to sliding window RLNC

We don t need no generation - a practical approach to sliding window RLNC We don t need no generation - a pratial approah to sliding window RLNC Simon Wunderlih, Frank Gabriel, Sreekrishna Pandi, Frank H.P. Fitzek Deutshe Telekom Chair of Communiation Networks, TU Dresden, Dresden,

More information

Gray Codes for Reflectable Languages

Gray Codes for Reflectable Languages Gray Codes for Refletable Languages Yue Li Joe Sawada Marh 8, 2008 Abstrat We lassify a type of language alled a refletable language. We then develop a generi algorithm that an be used to list all strings

More information

Space- and Time-Efficient BDD Construction via Working Set Control

Space- and Time-Efficient BDD Construction via Working Set Control Spae- and Time-Effiient BDD Constrution via Working Set Control Bwolen Yang Yirng-An Chen Randal E. Bryant David R. O Hallaron Computer Siene Department Carnegie Mellon University Pittsburgh, PA 15213.

More information

Performance Benchmarks for an Interactive Video-on-Demand System

Performance Benchmarks for an Interactive Video-on-Demand System Performane Benhmarks for an Interative Video-on-Demand System. Guo,P.G.Taylor,E.W.M.Wong,S.Chan,M.Zukerman andk.s.tang ARC Speial Researh Centre for Ultra-Broadband Information Networks (CUBIN) Department

More information

New Fuzzy Object Segmentation Algorithm for Video Sequences *

New Fuzzy Object Segmentation Algorithm for Video Sequences * JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 24, 521-537 (2008) New Fuzzy Obet Segmentation Algorithm for Video Sequenes * KUO-LIANG CHUNG, SHIH-WEI YU, HSUEH-JU YEH, YONG-HUAI HUANG AND TA-JEN YAO Department

More information

Parallelization and Performance of 3D Ultrasound Imaging Beamforming Algorithms on Modern Clusters

Parallelization and Performance of 3D Ultrasound Imaging Beamforming Algorithms on Modern Clusters Parallelization and Performane of 3D Ultrasound Imaging Beamforming Algorithms on Modern Clusters F. Zhang, A. Bilas, A. Dhanantwari, K.N. Plataniotis, R. Abiprojo, and S. Stergiopoulos Dept. of Eletrial

More information

Announcements. Lecture Caching Issues for Multi-core Processors. Shared Vs. Private Caches for Small-scale Multi-core

Announcements. Lecture Caching Issues for Multi-core Processors. Shared Vs. Private Caches for Small-scale Multi-core Announements Your fous should be on the lass projet now Leture 17: Cahing Issues for Multi-ore Proessors This week: status update and meeting A short presentation on: projet desription (problem, importane,

More information

Methods for Multi-Dimensional Robustness Optimization in Complex Embedded Systems

Methods for Multi-Dimensional Robustness Optimization in Complex Embedded Systems Methods for Multi-Dimensional Robustness Optimization in Complex Embedded Systems Arne Hamann, Razvan Rau, Rolf Ernst Institute of Computer and Communiation Network Engineering Tehnial University of Braunshweig,

More information

Unsupervised Stereoscopic Video Object Segmentation Based on Active Contours and Retrainable Neural Networks

Unsupervised Stereoscopic Video Object Segmentation Based on Active Contours and Retrainable Neural Networks Unsupervised Stereosopi Video Objet Segmentation Based on Ative Contours and Retrainable Neural Networks KLIMIS NTALIANIS, ANASTASIOS DOULAMIS, and NIKOLAOS DOULAMIS National Tehnial University of Athens

More information

Definitions Homework. Quine McCluskey Optimal solutions are possible for some large functions Espresso heuristic. Definitions Homework

Definitions Homework. Quine McCluskey Optimal solutions are possible for some large functions Espresso heuristic. Definitions Homework EECS 33 There be Dragons here http://ziyang.ees.northwestern.edu/ees33/ Teaher: Offie: Email: Phone: L477 Teh dikrp@northwestern.edu 847 467 2298 Today s material might at first appear diffiult Perhaps

More information

DECODING OF ARRAY LDPC CODES USING ON-THE FLY COMPUTATION Kiran Gunnam, Weihuang Wang, Euncheol Kim, Gwan Choi, Mark Yeary *

DECODING OF ARRAY LDPC CODES USING ON-THE FLY COMPUTATION Kiran Gunnam, Weihuang Wang, Euncheol Kim, Gwan Choi, Mark Yeary * DECODING OF ARRAY LDPC CODES USING ON-THE FLY COMPUTATION Kiran Gunnam, Weihuang Wang, Eunheol Kim, Gwan Choi, Mark Yeary * Dept. of Eletrial Engineering, Texas A&M University, College Station, TX-77840

More information

13.1 Numerical Evaluation of Integrals Over One Dimension

13.1 Numerical Evaluation of Integrals Over One Dimension 13.1 Numerial Evaluation of Integrals Over One Dimension A. Purpose This olletion of subprograms estimates the value of the integral b a f(x) dx where the integrand f(x) and the limits a and b are supplied

More information

Alleviating DFT cost using testability driven HLS

Alleviating DFT cost using testability driven HLS Alleviating DFT ost using testability driven HLS M.L.Flottes, R.Pires, B.Rouzeyre Laboratoire d Informatique, de Robotique et de Miroéletronique de Montpellier, U.M. CNRS 5506 6 rue Ada, 34392 Montpellier

More information

Parallel Block-Layered Nonbinary QC-LDPC Decoding on GPU

Parallel Block-Layered Nonbinary QC-LDPC Decoding on GPU Parallel Blok-Layered Nonbinary QC-LDPC Deoding on GPU Huyen Thi Pham, Sabooh Ajaz and Hanho Lee Department of Information and Communiation Engineering, Inha University, Inheon, 42-751, Korea Abstrat This

More information

Design of High Speed Mac Unit

Design of High Speed Mac Unit Design of High Speed Ma Unit 1 Harish Babu N, 2 Rajeev Pankaj N 1 PG Student, 2 Assistant professor Shools of Eletronis Engineering, VIT University, Vellore -632014, TamilNadu, India. 1 harishharsha72@gmail.om,

More information

Cluster-based Cooperative Communication with Network Coding in Wireless Networks

Cluster-based Cooperative Communication with Network Coding in Wireless Networks Cluster-based Cooperative Communiation with Network Coding in Wireless Networks Zygmunt J. Haas Shool of Eletrial and Computer Engineering Cornell University Ithaa, NY 4850, U.S.A. Email: haas@ee.ornell.edu

More information

the data. Structured Principal Component Analysis (SPCA)

the data. Structured Principal Component Analysis (SPCA) Strutured Prinipal Component Analysis Kristin M. Branson and Sameer Agarwal Department of Computer Siene and Engineering University of California, San Diego La Jolla, CA 9193-114 Abstrat Many tasks involving

More information

Self-Adaptive Parent to Mean-Centric Recombination for Real-Parameter Optimization

Self-Adaptive Parent to Mean-Centric Recombination for Real-Parameter Optimization Self-Adaptive Parent to Mean-Centri Reombination for Real-Parameter Optimization Kalyanmoy Deb and Himanshu Jain Department of Mehanial Engineering Indian Institute of Tehnology Kanpur Kanpur, PIN 86 {deb,hjain}@iitk.a.in

More information

Cluster-Based Cumulative Ensembles

Cluster-Based Cumulative Ensembles Cluster-Based Cumulative Ensembles Hanan G. Ayad and Mohamed S. Kamel Pattern Analysis and Mahine Intelligene Lab, Eletrial and Computer Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1,

More information

Contents Contents...I List of Tables...VIII List of Figures...IX 1. Introduction Information Retrieval... 8

Contents Contents...I List of Tables...VIII List of Figures...IX 1. Introduction Information Retrieval... 8 Contents Contents...I List of Tables...VIII List of Figures...IX 1. Introdution... 1 1.1. Internet Information...2 1.2. Internet Information Retrieval...3 1.2.1. Doument Indexing...4 1.2.2. Doument Retrieval...4

More information

Rotation Invariant Spherical Harmonic Representation of 3D Shape Descriptors

Rotation Invariant Spherical Harmonic Representation of 3D Shape Descriptors Eurographis Symposium on Geometry Proessing (003) L. Kobbelt, P. Shröder, H. Hoppe (Editors) Rotation Invariant Spherial Harmoni Representation of 3D Shape Desriptors Mihael Kazhdan, Thomas Funkhouser,

More information

Parallelizing Frequent Web Access Pattern Mining with Partial Enumeration for High Speedup

Parallelizing Frequent Web Access Pattern Mining with Partial Enumeration for High Speedup Parallelizing Frequent Web Aess Pattern Mining with Partial Enumeration for High Peiyi Tang Markus P. Turkia Department of Computer Siene Department of Computer Siene University of Arkansas at Little Rok

More information

An Efficient and Scalable Approach to CNN Queries in a Road Network

An Efficient and Scalable Approach to CNN Queries in a Road Network An Effiient and Salable Approah to CNN Queries in a Road Network Hyung-Ju Cho Chin-Wan Chung Dept. of Eletrial Engineering & Computer Siene Korea Advaned Institute of Siene and Tehnology 373- Kusong-dong,

More information

Accommodations of QoS DiffServ Over IP and MPLS Networks

Accommodations of QoS DiffServ Over IP and MPLS Networks Aommodations of QoS DiffServ Over IP and MPLS Networks Abdullah AlWehaibi, Anjali Agarwal, Mihael Kadoh and Ahmed ElHakeem Department of Eletrial and Computer Department de Genie Eletrique Engineering

More information

DETECTION METHOD FOR NETWORK PENETRATING BEHAVIOR BASED ON COMMUNICATION FINGERPRINT

DETECTION METHOD FOR NETWORK PENETRATING BEHAVIOR BASED ON COMMUNICATION FINGERPRINT DETECTION METHOD FOR NETWORK PENETRATING BEHAVIOR BASED ON COMMUNICATION FINGERPRINT 1 ZHANGGUO TANG, 2 HUANZHOU LI, 3 MINGQUAN ZHONG, 4 JIAN ZHANG 1 Institute of Computer Network and Communiation Tehnology,

More information

Capturing Large Intra-class Variations of Biometric Data by Template Co-updating

Capturing Large Intra-class Variations of Biometric Data by Template Co-updating Capturing Large Intra-lass Variations of Biometri Data by Template Co-updating Ajita Rattani University of Cagliari Piazza d'armi, Cagliari, Italy ajita.rattani@diee.unia.it Gian Lua Marialis University

More information

CleanUp: Improving Quadrilateral Finite Element Meshes

CleanUp: Improving Quadrilateral Finite Element Meshes CleanUp: Improving Quadrilateral Finite Element Meshes Paul Kinney MD-10 ECC P.O. Box 203 Ford Motor Company Dearborn, MI. 8121 (313) 28-1228 pkinney@ford.om Abstrat: Unless an all quadrilateral (quad)

More information

Performance of Histogram-Based Skin Colour Segmentation for Arms Detection in Human Motion Analysis Application

Performance of Histogram-Based Skin Colour Segmentation for Arms Detection in Human Motion Analysis Application World Aademy of Siene, Engineering and Tehnology 8 009 Performane of Histogram-Based Skin Colour Segmentation for Arms Detetion in Human Motion Analysis Appliation Rosalyn R. Porle, Ali Chekima, Farrah

More information

Exploring the Commonality in Feature Modeling Notations

Exploring the Commonality in Feature Modeling Notations Exploring the Commonality in Feature Modeling Notations Miloslav ŠÍPKA Slovak University of Tehnology Faulty of Informatis and Information Tehnologies Ilkovičova 3, 842 16 Bratislava, Slovakia miloslav.sipka@gmail.om

More information

Sparse Certificates for 2-Connectivity in Directed Graphs

Sparse Certificates for 2-Connectivity in Directed Graphs Sparse Certifiates for 2-Connetivity in Direted Graphs Loukas Georgiadis Giuseppe F. Italiano Aikaterini Karanasiou Charis Papadopoulos Nikos Parotsidis Abstrat Motivated by the emergene of large-sale

More information

Exploiting Enriched Contextual Information for Mobile App Classification

Exploiting Enriched Contextual Information for Mobile App Classification Exploiting Enrihed Contextual Information for Mobile App Classifiation Hengshu Zhu 1 Huanhuan Cao 2 Enhong Chen 1 Hui Xiong 3 Jilei Tian 2 1 University of Siene and Tehnology of China 2 Nokia Researh Center

More information

Algorithms for External Memory Lecture 6 Graph Algorithms - Weighted List Ranking

Algorithms for External Memory Lecture 6 Graph Algorithms - Weighted List Ranking Algorithms for External Memory Leture 6 Graph Algorithms - Weighted List Ranking Leturer: Nodari Sithinava Sribe: Andi Hellmund, Simon Ohsenreither 1 Introdution & Motivation After talking about I/O-effiient

More information

Multiple-Criteria Decision Analysis: A Novel Rank Aggregation Method

Multiple-Criteria Decision Analysis: A Novel Rank Aggregation Method 3537 Multiple-Criteria Deision Analysis: A Novel Rank Aggregation Method Derya Yiltas-Kaplan Department of Computer Engineering, Istanbul University, 34320, Avilar, Istanbul, Turkey Email: dyiltas@ istanbul.edu.tr

More information

Time delay estimation of reverberant meeting speech: on the use of multichannel linear prediction

Time delay estimation of reverberant meeting speech: on the use of multichannel linear prediction University of Wollongong Researh Online Faulty of Informatis - apers (Arhive) Faulty of Engineering and Information Sienes 7 Time delay estimation of reverberant meeting speeh: on the use of multihannel

More information

Cross-layer Resource Allocation on Broadband Power Line Based on Novel QoS-priority Scheduling Function in MAC Layer

Cross-layer Resource Allocation on Broadband Power Line Based on Novel QoS-priority Scheduling Function in MAC Layer Communiations and Networ, 2013, 5, 69-73 http://dx.doi.org/10.4236/n.2013.53b2014 Published Online September 2013 (http://www.sirp.org/journal/n) Cross-layer Resoure Alloation on Broadband Power Line Based

More information

C 2 C 3 C 1 M S. f e. e f (3,0) (0,1) (2,0) (-1,1) (1,0) (-1,0) (1,-1) (0,-1) (-2,0) (-3,0) (0,-2)

C 2 C 3 C 1 M S. f e. e f (3,0) (0,1) (2,0) (-1,1) (1,0) (-1,0) (1,-1) (0,-1) (-2,0) (-3,0) (0,-2) SPECIAL ISSUE OF IEEE TRANSACTIONS ON ROBOTICS AND AUTOMATION: MULTI-ROBOT SSTEMS, 00 Distributed reonfiguration of hexagonal metamorphi robots Jennifer E. Walter, Jennifer L. Welh, and Nany M. Amato Abstrat

More information

Calculation of typical running time of a branch-and-bound algorithm for the vertex-cover problem

Calculation of typical running time of a branch-and-bound algorithm for the vertex-cover problem Calulation of typial running time of a branh-and-bound algorithm for the vertex-over problem Joni Pajarinen, Joni.Pajarinen@iki.fi Otober 21, 2007 1 Introdution The vertex-over problem is one of a olletion

More information

SVC-DASH-M: Scalable Video Coding Dynamic Adaptive Streaming Over HTTP Using Multiple Connections

SVC-DASH-M: Scalable Video Coding Dynamic Adaptive Streaming Over HTTP Using Multiple Connections SVC-DASH-M: Salable Video Coding Dynami Adaptive Streaming Over HTTP Using Multiple Connetions Samar Ibrahim, Ahmed H. Zahran and Mahmoud H. Ismail Department of Eletronis and Eletrial Communiations, Faulty

More information

Detecting Outliers in High-Dimensional Datasets with Mixed Attributes

Detecting Outliers in High-Dimensional Datasets with Mixed Attributes Deteting Outliers in High-Dimensional Datasets with Mixed Attributes A. Koufakou, M. Georgiopoulos, and G.C. Anagnostopoulos 2 Shool of EECS, University of Central Florida, Orlando, FL, USA 2 Dept. of

More information

COSSIM An Integrated Solution to Address the Simulator Gap for Parallel Heterogeneous Systems

COSSIM An Integrated Solution to Address the Simulator Gap for Parallel Heterogeneous Systems COSSIM An Integrated Solution to Address the Simulator Gap for Parallel Heterogeneous Systems Andreas Brokalakis Synelixis Solutions Ltd, Greee brokalakis@synelixis.om Nikolaos Tampouratzis Teleommuniation

More information

Zippy - A coarse-grained reconfigurable array with support for hardware virtualization

Zippy - A coarse-grained reconfigurable array with support for hardware virtualization Zippy - A oarse-grained reonfigurable array with support for hardware virtualization Christian Plessl Computer Engineering and Networks Lab ETH Zürih, Switzerland plessl@tik.ee.ethz.h Maro Platzner Department

More information

3-D IMAGE MODELS AND COMPRESSION - SYNTHETIC HYBRID OR NATURAL FIT?

3-D IMAGE MODELS AND COMPRESSION - SYNTHETIC HYBRID OR NATURAL FIT? 3-D IMAGE MODELS AND COMPRESSION - SYNTHETIC HYBRID OR NATURAL FIT? Bernd Girod, Peter Eisert, Marus Magnor, Ekehard Steinbah, Thomas Wiegand Te {girod eommuniations Laboratory, University of Erlangen-Nuremberg

More information

Flow Demands Oriented Node Placement in Multi-Hop Wireless Networks

Flow Demands Oriented Node Placement in Multi-Hop Wireless Networks Flow Demands Oriented Node Plaement in Multi-Hop Wireless Networks Zimu Yuan Institute of Computing Tehnology, CAS, China {zimu.yuan}@gmail.om arxiv:153.8396v1 [s.ni] 29 Mar 215 Abstrat In multi-hop wireless

More information

1. Introduction. 2. The Probable Stope Algorithm

1. Introduction. 2. The Probable Stope Algorithm 1. Introdution Optimization in underground mine design has reeived less attention than that in open pit mines. This is mostly due to the diversity o underground mining methods and omplexity o underground

More information

Using Augmented Measurements to Improve the Convergence of ICP

Using Augmented Measurements to Improve the Convergence of ICP Using Augmented Measurements to Improve the onvergene of IP Jaopo Serafin, Giorgio Grisetti Dept. of omputer, ontrol and Management Engineering, Sapienza University of Rome, Via Ariosto 25, I-0085, Rome,

More information

Acoustic Links. Maximizing Channel Utilization for Underwater

Acoustic Links. Maximizing Channel Utilization for Underwater Maximizing Channel Utilization for Underwater Aousti Links Albert F Hairris III Davide G. B. Meneghetti Adihele Zorzi Department of Information Engineering University of Padova, Italy Email: {harris,davide.meneghetti,zorzi}@dei.unipd.it

More information

Coprocessors, multi-scale modeling, fluid models and global warming. Chris Hill, MIT

Coprocessors, multi-scale modeling, fluid models and global warming. Chris Hill, MIT Coproessors, multi-sale modeling, luid models and global warming. Chris Hill, MIT Outline Some motivation or high-resolution modeling o Earth oean system. the modeling hallenge An approah Sotware triks

More information

A Dictionary based Efficient Text Compression Technique using Replacement Strategy

A Dictionary based Efficient Text Compression Technique using Replacement Strategy A based Effiient Text Compression Tehnique using Replaement Strategy Debashis Chakraborty Assistant Professor, Department of CSE, St. Thomas College of Engineering and Tehnology, Kolkata, 700023, India

More information

Evaluation of Benchmark Performance Estimation for Parallel. Fortran Programs on Massively Parallel SIMD and MIMD. Computers.

Evaluation of Benchmark Performance Estimation for Parallel. Fortran Programs on Massively Parallel SIMD and MIMD. Computers. Evaluation of Benhmark Performane Estimation for Parallel Fortran Programs on Massively Parallel SIMD and MIMD Computers Thomas Fahringer Dept of Software Tehnology and Parallel Systems University of Vienna

More information

Multi-Piece Mold Design Based on Linear Mixed-Integer Program Toward Guaranteed Optimality

Multi-Piece Mold Design Based on Linear Mixed-Integer Program Toward Guaranteed Optimality INTERNATIONAL CONFERENCE ON MANUFACTURING AUTOMATION (ICMA200) Multi-Piee Mold Design Based on Linear Mixed-Integer Program Toward Guaranteed Optimality Stephen Stoyan, Yong Chen* Epstein Department of

More information

COMBINATION OF INTERSECTION- AND SWEPT-BASED METHODS FOR SINGLE-MATERIAL REMAP

COMBINATION OF INTERSECTION- AND SWEPT-BASED METHODS FOR SINGLE-MATERIAL REMAP Combination of intersetion- and swept-based methods for single-material remap 11th World Congress on Computational Mehanis WCCM XI) 5th European Conferene on Computational Mehanis ECCM V) 6th European

More information

Detection and Recognition of Non-Occluded Objects using Signature Map

Detection and Recognition of Non-Occluded Objects using Signature Map 6th WSEAS International Conferene on CIRCUITS, SYSTEMS, ELECTRONICS,CONTROL & SIGNAL PROCESSING, Cairo, Egypt, De 9-31, 007 65 Detetion and Reognition of Non-Oluded Objets using Signature Map Sangbum Park,

More information

Gradient based progressive probabilistic Hough transform

Gradient based progressive probabilistic Hough transform Gradient based progressive probabilisti Hough transform C.Galambos, J.Kittler and J.Matas Abstrat: The authors look at the benefits of exploiting gradient information to enhane the progressive probabilisti

More information

And, the (low-pass) Butterworth filter of order m is given in the frequency domain by

And, the (low-pass) Butterworth filter of order m is given in the frequency domain by Problem Set no.3.a) The ideal low-pass filter is given in the frequeny domain by B ideal ( f ), f f; =, f > f. () And, the (low-pass) Butterworth filter of order m is given in the frequeny domain by B

More information

Semi-Supervised Affinity Propagation with Instance-Level Constraints

Semi-Supervised Affinity Propagation with Instance-Level Constraints Semi-Supervised Affinity Propagation with Instane-Level Constraints Inmar E. Givoni, Brendan J. Frey Probabilisti and Statistial Inferene Group University of Toronto 10 King s College Road, Toronto, Ontario,

More information

Constructing Transaction Serialization Order for Incremental. Data Warehouse Refresh. Ming-Ling Lo and Hui-I Hsiao. IBM T. J. Watson Research Center

Constructing Transaction Serialization Order for Incremental. Data Warehouse Refresh. Ming-Ling Lo and Hui-I Hsiao. IBM T. J. Watson Research Center Construting Transation Serialization Order for Inremental Data Warehouse Refresh Ming-Ling Lo and Hui-I Hsiao IBM T. J. Watson Researh Center July 11, 1997 Abstrat In typial pratie of data warehouse, the

More information

Staircase Join: Teach a Relational DBMS to Watch its (Axis) Steps

Staircase Join: Teach a Relational DBMS to Watch its (Axis) Steps Stairase Join: Teah a Relational DBMS to Wath its (Axis) Steps Torsten Grust Maurie van Keulen Jens Teubner University of Konstanz Department of Computer and Information Siene P.O. Box D 88, 78457 Konstanz,

More information

8 Instruction Selection

8 Instruction Selection 8 Instrution Seletion The IR ode instrutions were designed to do exatly one operation: load/store, add, subtrat, jump, et. The mahine instrutions of a real CPU often perform several of these primitive

More information

High-level synthesis under I/O Timing and Memory constraints

High-level synthesis under I/O Timing and Memory constraints Highlevel synthesis under I/O Timing and Memory onstraints Philippe Coussy, Gwenolé Corre, Pierre Bomel, Eri Senn, Eri Martin To ite this version: Philippe Coussy, Gwenolé Corre, Pierre Bomel, Eri Senn,

More information

A New RBFNDDA-KNN Network and Its Application to Medical Pattern Classification

A New RBFNDDA-KNN Network and Its Application to Medical Pattern Classification A New RBFNDDA-KNN Network and Its Appliation to Medial Pattern Classifiation Shing Chiang Tan 1*, Chee Peng Lim 2, Robert F. Harrison 3, R. Lee Kennedy 4 1 Faulty of Information Siene and Tehnology, Multimedia

More information

Distributed Resource Allocation Strategies for Achieving Quality of Service in Server Clusters

Distributed Resource Allocation Strategies for Achieving Quality of Service in Server Clusters Proeedings of the 45th IEEE Conferene on Deision & Control Manhester Grand Hyatt Hotel an Diego, CA, UA, Deember 13-15, 2006 Distributed Resoure Alloation trategies for Ahieving Quality of ervie in erver

More information

Chemical, Biological and Radiological Hazard Assessment: A New Model of a Plume in a Complex Urban Environment

Chemical, Biological and Radiological Hazard Assessment: A New Model of a Plume in a Complex Urban Environment hemial, Biologial and Radiologial Haard Assessment: A New Model of a Plume in a omplex Urban Environment Skvortsov, A.T., P.D. Dawson, M.D. Roberts and R.M. Gailis HPP Division, Defene Siene and Tehnology

More information

FOREGROUND OBJECT EXTRACTION USING FUZZY C MEANS WITH BIT-PLANE SLICING AND OPTICAL FLOW

FOREGROUND OBJECT EXTRACTION USING FUZZY C MEANS WITH BIT-PLANE SLICING AND OPTICAL FLOW FOREGROUND OBJECT EXTRACTION USING FUZZY C EANS WITH BIT-PLANE SLICING AND OPTICAL FLOW SIVAGAI., REVATHI.T, JEGANATHAN.L 3 APSG, SCSE, VIT University, Chennai, India JRF, DST, Dehi, India. 3 Professor,

More information

Batch Auditing for Multiclient Data in Multicloud Storage

Batch Auditing for Multiclient Data in Multicloud Storage Advaned Siene and Tehnology Letters, pp.67-73 http://dx.doi.org/0.4257/astl.204.50. Bath Auditing for Multilient Data in Multiloud Storage Zhihua Xia, Xinhui Wang, Xingming Sun, Yafeng Zhu, Peng Ji and

More information