Solving Planted Motif Problem on GPU


Naga Shailaja Dasari, Old Dominion University, Norfolk, VA, USA
Ranjan Desh, Old Dominion University, Norfolk, VA, USA
Zubair M, Old Dominion University, Norfolk, VA, USA

ABSTRACT

The (l,d) planted motif problem is defined as follows: given a set of n DNA sequences, each of length L, find M, the set of sequences (or motifs) of length l which have at least one d-neighbor in each of the n sequences. The planted motif problem is an important and well-studied problem in computational biology. Motif finding is useful for developing methods to obtain transcription factor binding sites, for sequence classification, for building phylogenetic trees, etc. The planted motif problem is difficult to solve, especially for the challenging instance sizes (15,5), (17,6), (19,7), and (21,8). The challenging instances are computationally intensive and require a large amount of memory. Several serial implementations have been proposed for solving this problem. The time required by these methods for solving large challenge instances is prohibitive. In this paper, we propose a parallel implementation on GPU that solves the challenge instance (21,8) in 1.1 hours. We are not aware of any sequential or parallel method that solves this challenge instance in better time. Additionally, to the best of our knowledge, there is no previous implementation of a parallel method to solve the planted motif problem on GPU.

1. INTRODUCTION

Motif finding is an important and well-studied problem in computational biology [18] [6]. Motif finding is useful for developing methods to obtain transcription factor binding sites, for sequence classification, for building phylogenetic trees, etc. Finding motifs is a computationally expensive and challenging task. Many variants of the motif finding problem can be found in the literature. One set of variants concentrates on finding repeated patterns in a single sequence, and the other set concentrates on finding patterns that appear in multiple sequences. The planted motif problem (PMP) falls in the second category.
An (l,d) planted motif problem can be defined as follows: given a set of n DNA sequences, each of length L, find M, the set of sequences (or motifs) of length l which have at least one d-neighbor in each of the n sequences. A d-neighbor of an l-mer (a sequence of length l) p is defined as an l-mer that is at a Hamming distance of d or less from p. In the rest of the paper, we refer to l as the enumeration length and d as the enumeration distance.

A number of approaches have been proposed to solve the motif finding problem, including PMP. Some of these approaches find approximate motifs [12], [2], [14] and others find exact motifs [9], [16], [17], [5], [15], [11], [3], [13], [10], [8]. These approaches can be classified into two types: iterative approaches and combinatorial approaches. Iterative approaches like Gibbs sampling and expectation maximization are based on position weight matrices, while combinatorial approaches like MITRA and WINNOWER are based on Hamming distances. The planted motif problem defined in this paper is based on Hamming distances. Most approaches to solve PMP are serial in nature and are difficult to parallelize. We recently proposed a new parallel approach to solve PMP called the BitBased approach [7]. BitBased is a simple, easily parallelizable approach. It outperforms all the approaches proposed so far to solve the planted motif problem. In this paper, we show how to implement BitBased on the GPU architecture. Iterative approaches like Gibbs sampling [19] and MEME [4] have been implemented on GPU, while there are currently no combinatorial approaches implemented on GPU.

BitBased is an enumeration-based approach to solving the planted motif problem. It uses $n'$ bit arrays, $n' \le n$, of size $4^l$ bits each to find the planted motifs. Each bit in a bit array corresponds to an l-mer. The key idea of BitBased is to enumerate all the l-mers in the input sequences to find their d-neighbors and set the bits corresponding to the d-neighbors in the bit arrays. It then uses the bit arrays to find the planted motifs. BitBased therefore has a high memory requirement.
To reduce the memory requirement one can use the iterative BitBased approach, at the expense of increased execution time. The iterative approach works by virtually partitioning the bit arrays into chunks such that a chunk fits in the available memory. We then make multiple passes of the original algorithm to find motifs. The number of passes is determined by the number of virtual partitions. A small chunk size results in a larger number of virtual partitions, and thus increases the overall time to find motifs.
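As a point of reference for the definitions in this section, the following minimal Python sketch checks the (l,d) condition by brute force. The function names (`hamming`, `has_d_neighbor`, `is_planted_motif`) and the toy sequences are illustrative assumptions, not part of BitBased, and a direct scan like this is far slower than the bit-array approach the paper describes.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("l-mers must have equal length")
    return sum(x != y for x, y in zip(a, b))


def has_d_neighbor(p, seq, d):
    """True if some l-mer of seq is within Hamming distance d of p."""
    l = len(p)
    return any(hamming(p, seq[j:j + l]) <= d for j in range(len(seq) - l + 1))


def is_planted_motif(p, seqs, d):
    """True if p has at least one d-neighbor in every input sequence."""
    return all(has_d_neighbor(p, s, d) for s in seqs)


# Toy input (hypothetical): ACGT occurs with at most one mismatch in each sequence.
seqs = ["TTACGTTT", "GGACCTGG", "CAAGGTCA"]
print(is_planted_motif("ACGT", seqs, d=1))  # True
print(is_planted_motif("TTTT", seqs, d=1))  # False
```

Here `TTTT` fails because no 4-mer of the second sequence is within distance 1 of it, even though the first sequence contains `GTTT`.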

GPUs are becoming increasingly popular in the world of parallel computing. GPUs, which were once used only for graphics, are now being used for different types of applications to achieve high performance. With the advent of CUDA, the task of programming for GPU has become much simpler. A GPU is a massively parallel, multi-threaded, manycore processor with hundreds of cores and huge computation power. It can execute thousands of threads concurrently. The programmer must carefully design her application to map to the GPU and effectively utilize the hardware.

In this paper we parallelize the BitBased approach [7] for GPU. Though the BitBased approach is easily parallelizable, it is challenging to implement it effectively on GPU because of its high memory requirement. We have seen that BitBased uses bit arrays to find planted motifs and that the bit arrays are of size $4^l$ bits each. Moreover, the access to the bit arrays is very scattered. For example, to solve a (15,5) instance, BitBased needs bit arrays of size 128MB each. Such an amount of memory is only available in the GPU's global memory. But global memory has very high latencies, especially when the access pattern is scattered. In such cases it is highly recommended to use the GPU's shared memory. But the shared memory is too small (16KB for Tesla C1060 and S1070) to accommodate the bit arrays. So we use the iterative BitBased approach and partition the bit arrays into chunks that fit in shared memory. We then optimize the approach by decreasing the register usage, which increases the occupancy of the GPU. We also reorder the shared memory to avoid bank conflicts.

We have implemented BitBased on NVIDIA Tesla C1060, which has one GPU device, and NVIDIA Tesla S1070, which has four GPU devices. Tesla C1060 has 30 multiprocessors with 8 streaming processor cores each, while Tesla S1070 has 960 cores. We tested the (15,5), (17,6), (19,7), and (21,8) challenging instances. Tesla C1060 took 8 seconds, 1.52 minutes, 19.7 minutes and 4.5 hours respectively, and Tesla S1070 took 3 seconds, 23.9 seconds, 5 minutes and 69 minutes respectively.
These are the best timings obtained for the planted motif problem so far. We also compare with the results on a multicore architecture. We found that a single GPU achieves a 13- to 14-fold speed-up and four GPU devices achieve a 40- to 60-fold speed-up compared to a single CPU core.

2. THE BITBASED APPROACH

The BitBased approach is a simple, easily parallelizable approach to solving PMP. It is based on exhaustive enumeration of the l-mers in the input sequences. Let $S = \{S_i \mid 0 \le i \le n-1\}$ be the set of n input sequences. An l-mer in $S_i$ starting at location j, $0 \le j \le L-l$, is represented as $S_i^l\{j\}$. The set of d-neighbors of all the l-mers in $S_i$ is represented by $N^i_{l,d}$. It is easy to see that the set of planted motifs is $M = \bigcap_{i=0}^{n-1} N^i_{l,d}$. Therefore, to find the planted motifs we first need to generate the sets $N^i_{l,d}$, $0 \le i \le n-1$, and then find the motifs, i.e., the l-mers that are present in all $N^i_{l,d}$, $0 \le i \le n-1$.

The main issue here is the memory requirement. To see the issue, consider the (15,5) instance. For a 15-mer, there can be $\sum_{k=0}^{5}\binom{15}{k}3^k = 853{,}570$ 5-neighbors. For a sequence of length 600, which contains 586 15-mers, the size of $N^i_{l,d}$ is about $5 \times 10^8$ integers, which requires approximately 2GB of memory for a single sequence. To reduce the memory requirement we use bit arrays of size $4^l$ bits. Each bit in the bit array corresponds to an l-mer. For example, when l = 4, bit 0 represents AAAA, bit 1 represents AAAC, and bit 255 represents TTTT, assuming A=0, C=1, G=2, T=3. For the (15,5) instance we now require only $4^{15}$ bits, i.e., 128MB of memory, for each input sequence. The memory requirement can be reduced further using the approaches described in sections 2.1.1 and 2.1.2.

2.1 The basic BitBased approach

The basic BitBased approach consists of two phases, setting bits and finding motifs. In the setting bits phase, $N^i_{l,d}$, $0 \le i \le n-1$, is generated. $N^i_{l,d}$ is represented using bit arrays. A bit array $B_i$ is assigned to each input sequence $S_i$, $0 \le i \le n-1$. Each l-mer in sequence $S_i$ is enumerated to generate all its d-neighbors, and the bits are set in the bit array $B_i$ at the indexes corresponding to the d-neighbors. The index corresponding to an l-mer can be obtained by replacing A by 00, C by 01, G by 10 and T by 11.
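The index mapping and the memory figures quoted in this section can be checked with a short sketch. The helper names below (`lmer_index`, `index_to_lmer`, `neighborhood_size`) are assumptions introduced for illustration; only the 2-bit encoding A=00, C=01, G=10, T=11 comes from the text.

```python
from math import comb

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}   # A=00, C=01, G=10, T=11
BASES = "ACGT"


def lmer_index(lmer):
    """Pack an l-mer into its bit-array index, 2 bits per residue."""
    idx = 0
    for residue in lmer:
        idx = (idx << 2) | CODE[residue]
    return idx


def index_to_lmer(idx, l):
    """Inverse of lmer_index for a known length l."""
    residues = []
    for _ in range(l):
        residues.append(BASES[idx & 3])
        idx >>= 2
    return "".join(reversed(residues))


def neighborhood_size(l, d):
    """Number of distinct l-mers within Hamming distance d of a fixed l-mer."""
    return sum(comb(l, k) * 3 ** k for k in range(d + 1))


print(lmer_index("AAAC"), lmer_index("TTTT"))        # 1 255
print(neighborhood_size(15, 5))                      # 853570
print(4 ** 15 // 8 // 2 ** 20, "MB per bit array")   # 128 MB per bit array
```

Because the index is just the l-mer read as a base-4 number, consecutive indexes differ only in their trailing residues, which is what the chunking in the iterative approach relies on.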
For example, the index corresponding to the 4-mer GACT is 10000111 in binary, i.e., 135. After the setting bits phase, a bit array $B_i$ has a bit set only if the l-mer corresponding to its index is present in $N^i_{l,d}$. In the finding motifs phase, the equivalent of $M = \bigcap_{i=0}^{n-1} N^i_{l,d}$ is performed. We perform a logical AND operation on the bit arrays to generate a single bit array, which can be used to obtain the planted motifs. The final bit array B is obtained by $B = B_0 \wedge B_1 \wedge \dots \wedge B_{n-1}$. A bit is set at index j in B only if the bit is set at index j in all the bit arrays $B_i$, $0 \le i \le n-1$. In other words, the l-mer corresponding to the index j is present in all $N^i_{l,d}$, $0 \le i \le n-1$, making the l-mer a planted motif. Therefore the planted motifs are exactly the l-mers corresponding to the indexes in B at which a bit is set.

To reduce the memory requirement further, we use two modifications to the basic approach: increment motifs and filter motifs. These modifications, if applicable, not only reduce the memory requirement but also improve the performance.

2.1.1 Increment Motifs

This modification is based on the observation that, given the set of motifs for the (l-1,d) instance, their d-neighbors and the corresponding distances in all the n sequences, we can find the motifs for the (l,d) instance in O(n) time. Let p be a motif for the (l-1,d) instance. Let $(j_0, j_1, \dots, j_{n-1})$ and $(d_0, d_1, \dots, d_{n-1})$ be the locations of its d-neighbors in the n sequences and their distances respectively. We can say that $p \oplus R$, where $R \in \{A, C, G, T\}$ and $\oplus$ is the append operation, has a d-neighbor in sequence $S_i$ if it satisfies any of the following conditions:

1. The residue at location $j_i + l$ is R.
2. $d_i < d$.

For each motif p for the (l-1,d) instance, we check whether $p \oplus A$, $p \oplus C$, $p \oplus G$, $p \oplus T$ is a motif for the (l,d) instance using the above conditions. Therefore, to find (l,d) motifs, we can first find $(l',d)$ motifs for some $l' < l$ and then use the above logic incrementally to find (l,d) motifs. With decreasing values of $l'$, the number of $(l',d)$ motifs increases exponentially, and hence so does the time spent in increment motifs.
Therefore the value of $l'$ must be carefully chosen.

2.1.2 Filter Motifs

Instead of setting bits and finding motifs for all n sequences, this modification first finds the motifs for $n'$ sequences, where $n' \le n$. These motifs are called candidate motifs. The candidate motifs are then filtered to find the final planted motifs. This is done by checking, for each candidate motif, whether it is present in all the remaining $n - n'$ input sequences. This modification reduces the memory requirement because we now require only $n'$ buffers instead of n buffers. Decreasing the value of $n'$ reduces not only the space requirement but also the time, because the time taken by the BitBased approach is dominated by the setting bits phase: with a smaller $n'$ we need to set the bits for fewer sequences. But if the value of $n'$ is chosen too low, the time spent in filtering motifs increases, and so does the overall time. So it is important to choose an optimum value for $n'$.

2.2 The Iterative BitBased Approach

This is a crucial modification to the basic BitBased approach and is also the basis for implementing BitBased on GPU. As we have seen previously, BitBased has a high memory requirement, and it might not always be possible to satisfy it. In such cases, we can use the iterative BitBased approach. The iterative BitBased approach solves the planted motif problem with a much smaller memory requirement, but at the expense of an increase in time due to the increase in the number of operations. The iterative approach works by reusing the available memory to accomplish the required task, which is to find planted motifs.

Let $l_{max} = \max\{\, i : 4^i \text{ bits of memory can be allocated} \,\}$. We virtually partition the bit array of size $4^l$ into $4^{l-l_{max}}$ chunks, each chunk of size $4^{l_{max}}$ bits. In the i-th iteration, the l-mers of the input sequences are enumerated in such a way that the bits are only set in the i-th chunk. After finding motifs in the i-th chunk, the same memory is reused for the (i+1)-th iteration. Note that when a bit array of size $4^l$ bits is partitioned into $4^{l-l_{max}}$ chunks, the first $l - l_{max}$ residues corresponding to the indexes in a chunk are all the same.
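The observation about shared leading residues follows from the 2-bit encoding: the chunk number is simply the high-order bits of the index. The small sketch below illustrates this on a toy size; the helper names `split_index` and `decode` and the values l = 6, $l_{max}$ = 4 are assumptions chosen for readability.

```python
BASES = "ACGT"  # A=0, C=1, G=2, T=3, as in the text


def split_index(idx, l_max):
    """Split an l-mer index into (chunk number, offset within the chunk)."""
    low_bits = 2 * l_max                      # 2 bits per residue
    return idx >> low_bits, idx & ((1 << low_bits) - 1)


def decode(idx, length):
    """Turn a 2-bit-per-residue index back into a string of the given length."""
    return "".join(BASES[(idx >> (2 * (length - 1 - k))) & 3] for k in range(length))


l, l_max = 6, 4                               # toy sizes: 4**2 = 16 chunks
idx = 0b000110111111                          # the 6-mer ACGTTT
chunk, offset = split_index(idx, l_max)
print(decode(chunk, l - l_max), decode(offset, l_max))   # AC GTTT
```

Every index that falls in chunk 1 decodes to a 6-mer beginning with AC, so only the trailing $l_{max}$ residues need to be enumerated within a chunk.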
For example, when we partition $4^{17}$ bits into 16 partitions, all the 17-mers corresponding to the indexes in the first chunk start with AA, those in the second chunk start with AC, and so on. To effectively enumerate the l-mers, we reduce the enumeration length from l to $l_{max}$ as shown in Algorithm 1. Note that the more chunks the bit array is partitioned into, the smaller the enumeration length.

Algorithm 1 IterativeApproach
Input: n, l, $l_{max}$
Output: M, the set of (l,d) planted motifs
    Let $l_{diff} = l - l_{max}$
    $M = \emptyset$
    for idx = 0 to $4^{l_{diff}} - 1$ do
        get the sequence p of length $l_{diff}$ that corresponds to idx
        {setting the bits in the idx-th chunk}
        for i = 0 to n - 1 do
            for j = 0 to L - l do
                get the distance $d'$ between p and $S_i^{l_{diff}}\{j\}$
                generate $N_{l_{max},\,d-d'}\{j + l_{diff}\}$
                for each $l_{max}$-mer q in $N_{l_{max},\,d-d'}\{j + l_{diff}\}$ do
                    get the index idx' corresponding to q
                    set $B_i[idx'] = 1$
                end for
            end for
        end for
        {finding motifs in the idx-th chunk}
        $B = B_0 \wedge B_1 \wedge \dots \wedge B_{n-1}$
        for i = 0 to $4^{l_{max}} - 1$ do
            if B[i] = 1 then
                Let r be the $l_{max}$-mer corresponding to i
                Append r to p and add the appended sequence to M
            end if
        end for
        clear all the bit arrays $B_0$ to $B_{n-1}$
    end for

3. OVERVIEW OF GPU

A GPU is a massively parallel, multi-threaded, manycore processor. Each GPU device is an array of streaming multiprocessors, each of which in turn consists of a number of scalar processor cores. A GPU is capable of running thousands of threads concurrently. It is able to do so by employing the SIMT (single-instruction, multiple-thread) architecture. The threads are created, scheduled and executed in groups called warps. All the threads in a warp share a single instruction unit. The threads in a GPU are extremely lightweight, and they can be created and executed with zero scheduling overhead.

CUDA is a parallel programming model that enables programmers to develop scalable applications to be executed on GPU. It exposes a set of extensions to C and C++. A CUDA program is organized into sequential host code, which is executed on the CPU, and calls to functions called kernels, which are executed on the GPU. A kernel contains the device code that is executed by the GPU threads in parallel. CUDA threads can be grouped into thread blocks. Using CUDA one can define the number of blocks and the number of threads per block that execute a kernel.

3.1 Memory organization

The device RAM is virtually and physically divided into different types of memory: global, local, constant and texture memory. Apart from device RAM, the threads can also access on-chip shared memory and registers, as shown in Figure 1. Global memory and texture memory have the highest latency compared to the other types of memory. A thread has exclusive access to its local memory. All the threads in a block can access on-chip shared memory. All the threads across all thread blocks have access to global, texture and constant memory. Constant and texture memories are read-only, while global memory is both readable and writable.

Figure 1: GPU Memory

3.2 Performance considerations

A CUDA program should be designed to take advantage of the resources for better performance. Since the GPU uses a SIMT architecture in which all the threads in a warp use a single instruction unit, the best results are achieved when all the threads in a warp execute without diverging. When threads diverge they are executed serially, thus decreasing performance.

Global memory has very high latency. But by coalescing the global memory accesses, high throughput can be achieved. For example, if the threads in a warp access contiguous addresses, then only two transactions are issued. But if the threads access separate addresses, then 32 transactions are issued.

Shared memory is divided into equally sized blocks called banks. If two threads in a half warp access the same bank, this results in a bank conflict and the accesses are serialized, thus reducing the effective bandwidth. In order to avoid this, the programmer should try to make sure that the threads in a half warp access different banks.

The memory latencies can be hidden by executing other warps when a warp is paused. So to keep the hardware busy there should be enough active warps. Occupancy is the ratio of the number of active warps per multiprocessor to the maximum possible number of active warps. If the occupancy is too low, the memory latency cannot be hidden, resulting in performance degradation. So the programmer should try to increase the occupancy to use the hardware effectively.

4. PARALLELIZING BITBASED ON GPU

Though BitBased is an easily parallelizable approach, it is not straightforward to implement it on the GPU. The main issue is that BitBased has high memory requirements. As we have seen in section 2, it requires $4^l$ bits of memory for each bit array. Such a large amount of memory is only available in global memory. But global memory has the drawback of high latency. Furthermore, the access pattern of the bit arrays is very scattered, making it difficult to use the coalescing feature of the global memory. So to avoid using global memory, we partition the bit arrays into smaller chunks that fit in shared memory. This is similar to the iterative approach discussed in section 2.2. The only difference is that instead of iterating, we assign the task of each iteration to a GPU thread block.

Let t be the number of threads in each block. To solve an (l,d) instance we first find $l'$ and $n'$ as explained in [7]. Let $l_s = \max\{\, i : n' \cdot 4^i \text{ bits of memory can be allocated in shared memory} \,\}$. The bit arrays are partitioned into chunks of $4^{l_s}$ bits of memory. Each chunk is assigned to a single block. Thus the number of blocks is $4^{l-l_s}$. The threads in each block enumerate the l-mers in such a way that they generate the d-neighbors only in the chunk of bit arrays assigned to the block. We use the same logic as in the iterative approach. Note that the enumeration length here is $l_s$.

The t threads in a block are responsible for setting bits in the chunk of bit arrays assigned to the block. The l-mers are distributed among the t threads, with consecutive l-mers assigned to consecutive threads. After all the threads have finished enumerating the l-mers and setting bits, the threads enter the find motifs phase.

After finding the candidate motifs, they must be filtered by checking if they are present in the remaining $n - n'$ input sequences. We perform this step in a separate kernel called FilterMotifs to avoid divergence of threads. So a thread, after finding a candidate motif, does not perform the filtering phase itself; instead it writes the candidate motif to global memory so that it can be accessed in the FilterMotifs kernel. To write to global memory, we use a variable called gIndex. When a thread finds a candidate motif, it first atomically increments gIndex and then writes the candidate motif to global memory at the index returned by the atomic operation. This prevents different threads in different blocks from writing to the same index in global memory.

After finding the candidate motifs, filtering them is straightforward. Let c be the number of candidate motifs. For the FilterMotifs kernel, we need c/t blocks. The c candidate motifs are equally distributed among the blocks. Within a block, the candidate motifs are further distributed among the threads. Each thread is assigned a candidate motif, and it checks if the candidate motif has d-neighbors in the remaining $n - n'$ input sequences, which were not considered during the FindCandidateMotifs kernel. If a thread finds that the candidate motif is a planted motif, it writes it to global memory using the same logic explained previously.

We improve this implementation by using two modifications: bit representation, and repartitioning and reordering.
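Before turning to those modifications, the chunk-per-block decomposition described in this section can be illustrated with a sequential sketch of Algorithm 1, in which each iteration of the outer loop does the work that the GPU implementation assigns to one thread block. This is plain Python written for illustration only: the function names are assumptions, Python integers stand in for the bit arrays, the increment and filter modifications are omitted, and nothing here models shared memory, threads, or the atomic `gIndex` counter.

```python
from itertools import combinations, product

BASES = "ACGT"


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def neighbors(lmer, d):
    """Yield every string within Hamming distance d of lmer, each exactly once."""
    for k in range(d + 1):
        for positions in combinations(range(len(lmer)), k):
            alternatives = [[b for b in BASES if b != lmer[pos]] for pos in positions]
            for subs in product(*alternatives):
                chars = list(lmer)
                for pos, b in zip(positions, subs):
                    chars[pos] = b
                yield "".join(chars)


def index_of(lmer):
    idx = 0
    for residue in lmer:
        idx = (idx << 2) | BASES.index(residue)
    return idx


def lmer_of(idx, length):
    return "".join(BASES[(idx >> (2 * (length - 1 - k))) & 3] for k in range(length))


def iterative_bitbased(seqs, l, d, l_max):
    l_diff = l - l_max
    motifs = []
    for chunk in range(4 ** l_diff):          # one thread block per chunk on the GPU
        p = lmer_of(chunk, l_diff)            # prefix shared by the whole chunk
        bit_arrays = [0] * len(seqs)          # each int acts as a 4**l_max-bit array
        # Setting bits phase, restricted to this chunk.
        for i, s in enumerate(seqs):
            for j in range(len(s) - l + 1):
                d_used = hamming(p, s[j:j + l_diff])
                if d_used > d:
                    continue                  # this l-mer has no d-neighbor in the chunk
                for q in neighbors(s[j + l_diff:j + l], d - d_used):
                    bit_arrays[i] |= 1 << index_of(q)
        # Finding motifs phase: AND the bit arrays and read off the set bits.
        combined = bit_arrays[0]
        for b in bit_arrays[1:]:
            combined &= b
        for idx in range(4 ** l_max):
            if (combined >> idx) & 1:
                motifs.append(p + lmer_of(idx, l_max))
    return motifs


seqs = ["TTACGTTT", "GGACCTGG", "CAAGGTCA"]   # toy input, hypothetical
print(iterative_bitbased(seqs, l=4, d=1, l_max=2))
```

Setting `l_max` equal to `l` gives a single chunk and reduces the sketch to the basic approach; smaller values of `l_max` use less memory per pass at the cost of more passes over the input, which is the trade-off the text describes.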
4.1 Bit Representation

As we have seen in section 3, each multiprocessor has a limited number of registers, and this implementation is limited by that number. Since each thread consumes a large number of registers, the number of threads per block is small and hence the occupancy of the GPU is low. To improve the occupancy and performance, we need to reduce the register usage as much as possible.

Each input sequence of length L has L - l + 1 l-mers. If the input sequence is represented using a character array, then an l-mer requires l bytes of memory. Instead we can represent an l-mer using an integer, with 2 bits for each residue [1] [15]. For example, the 4-mer CGGA can be represented using an integer whose binary representation is 01101000. By doing so, an l-mer with $l \le 16$ needs only 4 bytes, and one with $l \le 32$ needs 8 bytes of memory. So we convert the input character array into an integer array in which the integer at index i represents the l-mer starting at location i in the input sequence. After this conversion, GPU threads only need to read one integer rather than l bytes. This not only reduces the register usage but also reduces the I/O time, as only one integer needs to be read. We use texture binding to read the input sequences.

4.2 Repartitioning and reordering

Figure 2: (a) The integer array is partitioned into 16 chunks so that the i-th thread in a half warp only accesses the i-th chunk. (b) The integer array is reordered such that the i-th thread in a half warp only accesses the i-th bank.

Table 1: Comparison with multicore. For each of the instances (15,5), (17,6), (19,7), and (21,8), the table lists, per number of GPU devices, the GPU time (in seconds for (15,5) and (17,6), minutes for (19,7), and hours for (21,8)) and the speed-up with respect to a 1-core CPU and a 16-core CPU. [Table values missing from the transcription.]

We have seen in section 3 that the shared memory is organized into banks. Successive 32-bit words are assigned to successive banks. We implement a bit array using a 32-bit integer array. Therefore successive integers are assigned to successive banks. Each thread executing the kernel enumerates $l_s$-mers in the input sequence and may set bits in any of the integers, and therefore in any bank, resulting in bank conflicts. In order to avoid bank conflicts we repartition the integer array and then reorder it.

The integer array, which was already partitioned to fit in the shared memory, is repartitioned into 16 chunks (as there are 16 banks in Tesla). The i-th thread in a half warp enumerates the $l_s$-mers to set the bits in the i-th chunk. We then reorder the integer array such that the i-th thread in a half warp only accesses the integers in the i-th bank. For example, when $l_s = 6$, each bit array has $4^6$ bits and is implemented using an integer array of size 128. We partition the integer array into 16 chunks, each of size 8 integers. Figure 2(a) shows the partitioned bit array. The first thread in a half warp (threads 0, 16, 32, ...) only accesses the first chunk, i.e., integers 0 to 7. Now we reorder the integers in the bit array such that the integers 0, 1, ..., 7 belong to the same bank. Figure 2(b) shows the reordered integer array. It can be seen from the figure that threads 0 and 16 only access the integers in bank 0, and threads 15 and 31 only access the integers in bank 15. Therefore there will be no bank conflicts after reordering the integer array.
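One index mapping consistent with the description above is sketched below. The function names are hypothetical and the exact layout in the paper's Figure 2(b) may differ in detail; the sketch only assumes what the text states, namely 16 banks, successive 32-bit words in successive banks, and an integer array of size 128 for $l_s = 6$.

```python
NUM_BANKS = 16  # successive 32-bit words map to successive banks


def reordered_position(logical, array_size, num_banks=NUM_BANKS):
    """Map a logical integer index to its position after reordering.

    The array is first split into num_banks chunks of consecutive integers;
    the k-th integer of chunk c is then stored at position k * num_banks + c,
    so every integer of chunk c falls in bank c.
    """
    chunk_size = array_size // num_banks
    chunk, offset = divmod(logical, chunk_size)
    return offset * num_banks + chunk


def bank_of(position, num_banks=NUM_BANKS):
    """Bank that holds the 32-bit word stored at the given position."""
    return position % num_banks


size = 128                                   # 4**6 bits / 32 bits per integer
print([reordered_position(i, size) for i in range(8)])
# [0, 16, 32, 48, 64, 80, 96, 112]  (integers 0..7 all land in bank 0)
```

With this mapping, a thread whose index within its half warp is c touches only words in bank c, so the 16 threads of a half warp never compete for the same bank.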
In addition to avoiding bank conflicts, repartitioning and reordering has another advantage. Partitioning a bit array into chunks reduces the enumeration length. Because we partition the integer array into 16 chunks, the enumeration length reduces from $l_s$ to $l_s - 2$. Note that the maximum enumeration distance is equal to the enumeration length. For example, when the enumeration length is 4, the maximum enumeration distance is 4. So the maximum enumeration distance also decreases by 2. Thus we only need to enumerate to generate $(l_s - 2)$-neighbors instead of $l_s$-neighbors. This reduces the register consumption of each thread, and hence we can increase the number of threads per block. Having more threads per block increases the occupancy, resulting in better performance.

5. EXPERIMENTAL RESULTS

We have implemented BitBased on NVIDIA Tesla C1060 and NVIDIA Tesla S1070, both running at 1.3GHz. C1060 has 30 multiprocessors with 8 scalar processor cores each. S1070 has four GPU devices with 240 cores each. We have tested our code with 20 input sequences of length 600 each. We tested it on random sequences with motifs planted at random positions in the 20 sequences. We have used $n' = 6$ for all our experiments. C1060 and S1070 both have a shared memory of 16KB per multiprocessor. As described in section 4, we need to find the value of $l_s$, where $l_s = \max\{\, i : n' \cdot 4^i \text{ bits of memory can be allocated in shared memory} \,\}$. We have found that 6 is the most suitable value for $l_s$. Table 1 shows the performance results obtained on 1 to 4 GPUs.

We have also run the approach using 1 to 120 multiprocessors on Tesla S1070, with only one active block for each multiprocessor and the load distributed equally among the multiprocessors. It can be seen from Figure 3 that the approach scales well with the number of multiprocessors. We have also collected results using different numbers of GPU devices. Figure 4 shows the speed-up of the approach with respect to the number of GPU devices. It can be seen clearly that the approach scales well with the increase in the number of GPU devices.

Figure 3: Plot showing the speed-up of the approach with respect to the number of multiprocessors (x-axis: number of multiprocessors; y-axis: speed-up).

Figure 4: Plot showing the speed-up of the approach with respect to the number of GPU devices, for the (15,5), (17,6), and (19,7) instances (x-axis: number of GPU devices; y-axis: speed-up).

5.1 Comparison with multicore

The BitBased approach was implemented on a machine with four quad-core 2.67 GHz Intel Xeon X5550 processors, with a total of 16 cores, using 1GB of memory. The basic BitBased approach was used for (15,5) and lower instances, and the iterative BitBased approach was used for (17,6) and higher instances. Table 1 shows the results obtained on the multicore machine. It shows the speed-up obtained on GPU with respect to a 1-core CPU and a 16-core CPU. The actual results for multicore are discussed in [7]. It can be seen that a single GPU device is 13 to 14 times faster than a single core of the Xeon X5550 machine, and that it performs better than the 16-core Xeon machine. Four GPU devices are 40 to 60 times faster than a single CPU core and 4 to 6 times faster than the 16-core CPU.

6. CONCLUSION

We presented an efficient parallel approach for solving the planted motif problem on GPU. This approach is a modification of the BitBased approach that was originally proposed for Intel-based multicore architectures. The BitBased approach had to be modified for the GPU architecture. The proposed implementation solves the challenge instance (21,8) of the planted motif problem in 1.1 hours. We are not aware of any sequential or parallel method that solves this challenge instance in better time. Additionally, to the best of our knowledge, there is no previous implementation of a parallel method to solve the planted motif problem on GPU.

7. REFERENCES

[1] S. Altschul, W. Gish, W. Miller, E. Myers, and D. Lipman. Basic local alignment search tool. Journal of Molecular Biology, 215(3).
[2] J. Buhler and M. Tompa. Finding motifs using random projections. Journal of Computational Biology, 9(2).
[3] A. M. Carvalho, A. T. Freitas, A. L. Oliveira, and M.-F. Sagot. A highly scalable algorithm for the extraction of cis-regulatory regions. In APBC.
[4] C. Chen, B. Schmidt, W. Liu, and W. Müller-Wittig. GPU-MEME: Using graphics hardware to accelerate motif finding in DNA sequences. In PRIB.
[5] F. Y. L. Chin and H. C. M. Leung. Voting algorithms for discovering long motifs. In APBC.
[6] M. K. Das and H.-K. Dai. A survey of DNA motif finding algorithms. BMC Bioinformatics, 8(S-7).
[7] N. S. Dasari, R. Desh, and Z. M. An efficient multicore implementation of planted motif problem. In Proceedings of the International Conference on High Performance Computing and Simulation, pages 9-15.
[8] J. Davila, S. Balla, and S. Rajasekaran. Space and time efficient algorithms for planted motif search. In International Conference on Computational Science (2).
[9] J. Davila, S. Balla, and S. Rajasekaran. Fast and practical algorithms for planted (l, d) motif search. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 4.
[10] E. Eskin and P. A. Pevzner. Finding composite regulatory patterns in DNA sequences. In ISMB.
[11] L. Marsan and M.-F. Sagot. Extracting structured motifs using a suffix tree - algorithms and application to promoter consensus identification. In RECOMB.
[12] P. A. Pevzner and S.-H. Sze. Combinatorial approaches to finding subtle signals in DNA sequences. In ISMB.
[13] N. Pisanti, A. M. Carvalho, L. Marsan, and M.-F. Sagot. Risotto: Fast extraction of motifs with mismatches. In LATIN.
[14] A. L. Price, S. Ramabhadran, and P. A. Pevzner. Finding subtle motifs by branching from sample strings. In ECCB.
[15] S. Rajasekaran, S. Balla, and C.-H. Huang. Exact algorithms for planted motif problems. Journal of Computational Biology, 12(8).
[16] M.-F. Sagot. Spelling approximate repeated or common motifs using a suffix tree. In LATIN.
[17] M. Tompa. An exact method for finding short motifs in sequences, with application to the ribosome binding site problem. In ISMB.
[18] M. Tompa, N. Li, T. Bailey, G. Church, B. De Moor, E. Eskin, A. Favorov, M. Frith, Y. Fu, W. Kent, et al. Assessing computational tools for the discovery of transcription factor binding sites. Nature Biotechnology, 23(1).
[19] L. Yu and Y. Xu. A parallel Gibbs sampling algorithm for motif finding on GPU. Parallel and Distributed Processing with Applications, International Symposium on, 0, 2009.


More information

Efficient Distributed File System (EDFS)

Efficient Distributed File System (EDFS) Effcent Dstrbuted Fle System (EDFS) (Sem-Centralzed) Debessay(Debsh) Fesehaye, Rahul Malk & Klara Naherstedt Unversty of Illnos-Urbana Champagn Contents Problem Statement, Related Work, EDFS Desgn Rate

More information

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Some materal adapted from Mohamed Youns, UMBC CMSC 611 Spr 2003 course sldes Some materal adapted from Hennessy & Patterson / 2003 Elsever Scence Performance = 1 Executon tme Speedup = Performance (B)

More information

Assembler. Building a Modern Computer From First Principles.

Assembler. Building a Modern Computer From First Principles. Assembler Buldng a Modern Computer From Frst Prncples www.nand2tetrs.org Elements of Computng Systems, Nsan & Schocken, MIT Press, www.nand2tetrs.org, Chapter 6: Assembler slde Where we are at: Human Thought

More information

A SYSTOLIC APPROACH TO LOOP PARTITIONING AND MAPPING INTO FIXED SIZE DISTRIBUTED MEMORY ARCHITECTURES

A SYSTOLIC APPROACH TO LOOP PARTITIONING AND MAPPING INTO FIXED SIZE DISTRIBUTED MEMORY ARCHITECTURES A SYSOLIC APPROACH O LOOP PARIIONING AND MAPPING INO FIXED SIZE DISRIBUED MEMORY ARCHIECURES Ioanns Drosts, Nektaros Kozrs, George Papakonstantnou and Panayots sanakas Natonal echncal Unversty of Athens

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

Related-Mode Attacks on CTR Encryption Mode

Related-Mode Attacks on CTR Encryption Mode Internatonal Journal of Network Securty, Vol.4, No.3, PP.282 287, May 2007 282 Related-Mode Attacks on CTR Encrypton Mode Dayn Wang, Dongda Ln, and Wenlng Wu (Correspondng author: Dayn Wang) Key Laboratory

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

Collaboratively Regularized Nearest Points for Set Based Recognition

Collaboratively Regularized Nearest Points for Set Based Recognition Academc Center for Computng and Meda Studes, Kyoto Unversty Collaboratvely Regularzed Nearest Ponts for Set Based Recognton Yang Wu, Mchhko Mnoh, Masayuk Mukunok Kyoto Unversty 9/1/013 BMVC 013 @ Brstol,

More information

Harvard University CS 101 Fall 2005, Shimon Schocken. Assembler. Elements of Computing Systems 1 Assembler (Ch. 6)

Harvard University CS 101 Fall 2005, Shimon Schocken. Assembler. Elements of Computing Systems 1 Assembler (Ch. 6) Harvard Unversty CS 101 Fall 2005, Shmon Schocken Assembler Elements of Computng Systems 1 Assembler (Ch. 6) Why care about assemblers? Because Assemblers employ some nfty trcks Assemblers are the frst

More information

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed

More information

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide Lobachevsky State Unversty of Nzhn Novgorod Polyhedron Quck Start Gude Nzhn Novgorod 2016 Contents Specfcaton of Polyhedron software... 3 Theoretcal background... 4 1. Interface of Polyhedron... 6 1.1.

More information

Wavefront Reconstructor

Wavefront Reconstructor A Dstrbuted Smplex B-Splne Based Wavefront Reconstructor Coen de Vsser and Mchel Verhaegen 14-12-201212 2012 Delft Unversty of Technology Contents Introducton Wavefront reconstructon usng Smplex B-Splnes

More information

Convolutional interleaver for unequal error protection of turbo codes

Convolutional interleaver for unequal error protection of turbo codes Convolutonal nterleaver for unequal error protecton of turbo codes Sna Vaf, Tadeusz Wysock, Ian Burnett Unversty of Wollongong, SW 2522, Australa E-mal:{sv39,wysock,an_burnett}@uow.edu.au Abstract: Ths

More information

Maintaining temporal validity of real-time data on non-continuously executing resources

Maintaining temporal validity of real-time data on non-continuously executing resources Mantanng temporal valdty of real-tme data on non-contnuously executng resources Tan Ba, Hong Lu and Juan Yang Hunan Insttute of Scence and Technology, College of Computer Scence, 44, Yueyang, Chna Wuhan

More information

High-Boost Mesh Filtering for 3-D Shape Enhancement

High-Boost Mesh Filtering for 3-D Shape Enhancement Hgh-Boost Mesh Flterng for 3-D Shape Enhancement Hrokazu Yagou Λ Alexander Belyaev y Damng We z Λ y z ; ; Shape Modelng Laboratory, Unversty of Azu, Azu-Wakamatsu 965-8580 Japan y Computer Graphcs Group,

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

Distributed Resource Scheduling in Grid Computing Using Fuzzy Approach

Distributed Resource Scheduling in Grid Computing Using Fuzzy Approach Dstrbuted Resource Schedulng n Grd Computng Usng Fuzzy Approach Shahram Amn, Mohammad Ahmad Computer Engneerng Department Islamc Azad Unversty branch Mahallat, Iran Islamc Azad Unversty branch khomen,

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introducton 1.1 Parallel Processng There s a contnual demand for greater computatonal speed from a computer system than s currently possble (.e. sequental systems). Areas need great computatonal

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices Internatonal Mathematcal Forum, Vol 7, 2012, no 52, 2549-2554 An Applcaton of the Dulmage-Mendelsohn Decomposton to Sparse Null Space Bases of Full Row Rank Matrces Mostafa Khorramzadeh Department of Mathematcal

More information

Parallel Numerics. 1 Preconditioning & Iterative Solvers (From 2016)

Parallel Numerics. 1 Preconditioning & Iterative Solvers (From 2016) Technsche Unverstät München WSe 6/7 Insttut für Informatk Prof. Dr. Thomas Huckle Dpl.-Math. Benjamn Uekermann Parallel Numercs Exercse : Prevous Exam Questons Precondtonng & Iteratve Solvers (From 6)

More information

Assembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface.

Assembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface. IDC Herzlya Shmon Schocken Assembler Shmon Schocken Sprng 2005 Elements of Computng Systems 1 Assembler (Ch. 6) Where we are at: Human Thought Abstract desgn Chapters 9, 12 abstract nterface H.L. Language

More information

Storage Binding in RTL synthesis

Storage Binding in RTL synthesis Storage Bndng n RTL synthess Pe Zhang Danel D. Gajsk Techncal Report ICS-0-37 August 0th, 200 Center for Embedded Computer Systems Department of Informaton and Computer Scence Unersty of Calforna, Irne

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

Predicting Transcription Factor Binding Sites with an Ensemble of Hidden Markov Models

Predicting Transcription Factor Binding Sites with an Ensemble of Hidden Markov Models Vol. 3, No. 1, Fall, 2016, pp. 1-10 ISSN 2158-835X (prnt), 2158-8368 (onlne), All Rghts Reserved Predctng Transcrpton Factor Bndng Stes wth an Ensemble of Hdden Markov Models Yngle Song 1 and Albert Y.

More information

Random Kernel Perceptron on ATTiny2313 Microcontroller

Random Kernel Perceptron on ATTiny2313 Microcontroller Random Kernel Perceptron on ATTny233 Mcrocontroller Nemanja Djurc Department of Computer and Informaton Scences, Temple Unversty Phladelpha, PA 922, USA nemanja.djurc@temple.edu Slobodan Vucetc Department

More information

Exercises (Part 4) Introduction to R UCLA/CCPR. John Fox, February 2005

Exercises (Part 4) Introduction to R UCLA/CCPR. John Fox, February 2005 Exercses (Part 4) Introducton to R UCLA/CCPR John Fox, February 2005 1. A challengng problem: Iterated weghted least squares (IWLS) s a standard method of fttng generalzed lnear models to data. As descrbed

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

On Some Entertaining Applications of the Concept of Set in Computer Science Course

On Some Entertaining Applications of the Concept of Set in Computer Science Course On Some Entertanng Applcatons of the Concept of Set n Computer Scence Course Krasmr Yordzhev *, Hrstna Kostadnova ** * Assocate Professor Krasmr Yordzhev, Ph.D., Faculty of Mathematcs and Natural Scences,

More information

BIN XIA et al: AN IMPROVED K-MEANS ALGORITHM BASED ON CLOUD PLATFORM FOR DATA MINING

BIN XIA et al: AN IMPROVED K-MEANS ALGORITHM BASED ON CLOUD PLATFORM FOR DATA MINING An Improved K-means Algorthm based on Cloud Platform for Data Mnng Bn Xa *, Yan Lu 2. School of nformaton and management scence, Henan Agrcultural Unversty, Zhengzhou, Henan 450002, P.R. Chna 2. College

More information

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following.

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following. Complex Numbers The last topc n ths secton s not really related to most of what we ve done n ths chapter, although t s somewhat related to the radcals secton as we wll see. We also won t need the materal

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

Parallel Inverse Halftoning by Look-Up Table (LUT) Partitioning

Parallel Inverse Halftoning by Look-Up Table (LUT) Partitioning Parallel Inverse Halftonng by Look-Up Table (LUT) Parttonng Umar F. Sddq and Sadq M. Sat umar@ccse.kfupm.edu.sa, sadq@kfupm.edu.sa KFUPM Box: Department of Computer Engneerng, Kng Fahd Unversty of Petroleum

More information

Scheduling Remote Access to Scientific Instruments in Cyberinfrastructure for Education and Research

Scheduling Remote Access to Scientific Instruments in Cyberinfrastructure for Education and Research Schedulng Remote Access to Scentfc Instruments n Cybernfrastructure for Educaton and Research Je Yn 1, Junwe Cao 2,3,*, Yuexuan Wang 4, Lanchen Lu 1,3 and Cheng Wu 1,3 1 Natonal CIMS Engneerng and Research

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

Performance Study of Parallel Programming on Cloud Computing Environments Using MapReduce

Performance Study of Parallel Programming on Cloud Computing Environments Using MapReduce Performance Study of Parallel Programmng on Cloud Computng Envronments Usng MapReduce Wen-Chung Shh, Shan-Shyong Tseng Department of Informaton Scence and Applcatons Asa Unversty Tachung, 41354, Tawan

More information

High Performance Implementation of Planted Motif Problem using Suffix trees

High Performance Implementation of Planted Motif Problem using Suffix trees High Performance Implementation of Planted Motif Problem using Suffix trees Naga Shailaja Dasari Desh Ranjan Zubair M Old Dominion University Old Dominion University Old Dominion University ndasari@cs.odu.edu

More information

AMath 483/583 Lecture 21 May 13, Notes: Notes: Jacobi iteration. Notes: Jacobi with OpenMP coarse grain

AMath 483/583 Lecture 21 May 13, Notes: Notes: Jacobi iteration. Notes: Jacobi with OpenMP coarse grain AMath 483/583 Lecture 21 May 13, 2011 Today: OpenMP and MPI versons of Jacob teraton Gauss-Sedel and SOR teratve methods Next week: More MPI Debuggng and totalvew GPU computng Read: Class notes and references

More information

Meta-heuristics for Multidimensional Knapsack Problems

Meta-heuristics for Multidimensional Knapsack Problems 2012 4th Internatonal Conference on Computer Research and Development IPCSIT vol.39 (2012) (2012) IACSIT Press, Sngapore Meta-heurstcs for Multdmensonal Knapsack Problems Zhbao Man + Computer Scence Department,

More information

ON SOME ENTERTAINING APPLICATIONS OF THE CONCEPT OF SET IN COMPUTER SCIENCE COURSE

ON SOME ENTERTAINING APPLICATIONS OF THE CONCEPT OF SET IN COMPUTER SCIENCE COURSE Yordzhev K., Kostadnova H. Інформаційні технології в освіті ON SOME ENTERTAINING APPLICATIONS OF THE CONCEPT OF SET IN COMPUTER SCIENCE COURSE Yordzhev K., Kostadnova H. Some aspects of programmng educaton

More information

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes SPH3UW Unt 7.3 Sphercal Concave Mrrors Page 1 of 1 Notes Physcs Tool box Concave Mrror If the reflectng surface takes place on the nner surface of the sphercal shape so that the centre of the mrror bulges

More information

A fair buffer allocation scheme

A fair buffer allocation scheme A far buffer allocaton scheme Juha Henanen and Kalev Klkk Telecom Fnland P.O. Box 228, SF-330 Tampere, Fnland E-mal: juha.henanen@tele.f Abstract An approprate servce for data traffc n ATM networks requres

More information

Repeater Insertion for Two-Terminal Nets in Three-Dimensional Integrated Circuits

Repeater Insertion for Two-Terminal Nets in Three-Dimensional Integrated Circuits Repeater Inserton for Two-Termnal Nets n Three-Dmensonal Integrated Crcuts Hu Xu, Vasls F. Pavlds, and Govann De Mchel LSI - EPFL, CH-5, Swtzerland, {hu.xu,vasleos.pavlds,govann.demchel}@epfl.ch Abstract.

More information

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search Sequental search Buldng Java Programs Chapter 13 Searchng and Sortng sequental search: Locates a target value n an array/lst by examnng each element from start to fnsh. How many elements wll t need to

More information

Transaction-Consistent Global Checkpoints in a Distributed Database System

Transaction-Consistent Global Checkpoints in a Distributed Database System Proceedngs of the World Congress on Engneerng 2008 Vol I Transacton-Consstent Global Checkponts n a Dstrbuted Database System Jang Wu, D. Manvannan and Bhavan Thurasngham Abstract Checkpontng and rollback

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information

Run-Time Operator State Spilling for Memory Intensive Long-Running Queries

Run-Time Operator State Spilling for Memory Intensive Long-Running Queries Run-Tme Operator State Spllng for Memory Intensve Long-Runnng Queres Bn Lu, Yal Zhu, and lke A. Rundenstener epartment of Computer Scence, Worcester Polytechnc Insttute Worcester, Massachusetts, USA {bnlu,

More information

Multi-stable Perception. Necker Cube

Multi-stable Perception. Necker Cube Mult-stable Percepton Necker Cube Spnnng dancer lluson, Nobuuk Kaahara Fttng and Algnment Computer Vson Szelsk 6.1 James Has Acknowledgment: Man sldes from Derek Hoem, Lana Lazebnk, and Grauman&Lebe 2008

More information

THE low-density parity-check (LDPC) code is getting

THE low-density parity-check (LDPC) code is getting Implementng the NASA Deep Space LDPC Codes for Defense Applcatons Wley H. Zhao, Jeffrey P. Long 1 Abstract Selected codes from, and extended from, the NASA s deep space low-densty party-check (LDPC) codes

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

Routing on Switch Matrix Multi-FPGA Systems

Routing on Switch Matrix Multi-FPGA Systems Routng on Swtch Matrx Mult-FPGA Systems Abdel Enou and N. Ranganathan Center for Mcroelectroncs Research Department of Computer Scence and Engneerng Unversty of South Florda Tampa, FL 33620 Abstract In

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

Area Efficient Self Timed Adders For Low Power Applications in VLSI

Area Efficient Self Timed Adders For Low Power Applications in VLSI ISSN(Onlne): 2319-8753 ISSN (Prnt) :2347-6710 Internatonal Journal of Innovatve Research n Scence, Engneerng and Technology (An ISO 3297: 2007 Certfed Organzaton) Area Effcent Self Tmed Adders For Low

More information

Optimal Workload-based Weighted Wavelet Synopses

Optimal Workload-based Weighted Wavelet Synopses Optmal Workload-based Weghted Wavelet Synopses Yoss Matas School of Computer Scence Tel Avv Unversty Tel Avv 69978, Israel matas@tau.ac.l Danel Urel School of Computer Scence Tel Avv Unversty Tel Avv 69978,

More information

Machine Learning 9. week

Machine Learning 9. week Machne Learnng 9. week Mappng Concept Radal Bass Functons (RBF) RBF Networks 1 Mappng It s probably the best scenaro for the classfcaton of two dataset s to separate them lnearly. As you see n the below

More information

CACHE MEMORY DESIGN FOR INTERNET PROCESSORS

CACHE MEMORY DESIGN FOR INTERNET PROCESSORS CACHE MEMORY DESIGN FOR INTERNET PROCESSORS WE EVALUATE A SERIES OF THREE PROGRESSIVELY MORE AGGRESSIVE ROUTING-TABLE CACHE DESIGNS AND DEMONSTRATE THAT THE INCORPORATION OF HARDWARE CACHES INTO INTERNET

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

Newton-Raphson division module via truncated multipliers

Newton-Raphson division module via truncated multipliers Newton-Raphson dvson module va truncated multplers Alexandar Tzakov Department of Electrcal and Computer Engneerng Illnos Insttute of Technology Chcago,IL 60616, USA Abstract Reducton n area and power

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

Configuration Management in Multi-Context Reconfigurable Systems for Simultaneous Performance and Power Optimizations*

Configuration Management in Multi-Context Reconfigurable Systems for Simultaneous Performance and Power Optimizations* Confguraton Management n Mult-Context Reconfgurable Systems for Smultaneous Performance and Power Optmzatons* Rafael Maestre, Mlagros Fernandez Departamento de Arqutectura de Computadores y Automátca Unversdad

More information

Article RGCA: a Reliable GPU Cluster Architecture for Large-Scale Internet of Things Computing Based on Effective Performance-Energy Optimization

Article RGCA: a Reliable GPU Cluster Architecture for Large-Scale Internet of Things Computing Based on Effective Performance-Energy Optimization Artcle RGCA: a Relable GPU Cluster Archtecture for Large-Scale Internet of Thngs Computng Based on Effectve Performance-Energy Optmzaton Yulng Fang, Qngku Chen *, Neal N. Xong, Deyu Zhao and Jngjuan Wang

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

A Load-balancing and Energy-aware Clustering Algorithm in Wireless Ad-hoc Networks

A Load-balancing and Energy-aware Clustering Algorithm in Wireless Ad-hoc Networks A Load-balancng and Energy-aware Clusterng Algorthm n Wreless Ad-hoc Networks Wang Jn, Shu Le, Jnsung Cho, Young-Koo Lee, Sungyoung Lee, Yonl Zhong Department of Computer Engneerng Kyung Hee Unversty,

More information

Load-Balanced Anycast Routing

Load-Balanced Anycast Routing Load-Balanced Anycast Routng Chng-Yu Ln, Jung-Hua Lo, and Sy-Yen Kuo Department of Electrcal Engneerng atonal Tawan Unversty, Tape, Tawan sykuo@cc.ee.ntu.edu.tw Abstract For fault-tolerance and load-balance

More information

Computer models of motion: Iterative calculations

Computer models of motion: Iterative calculations Computer models o moton: Iteratve calculatons OBJECTIVES In ths actvty you wll learn how to: Create 3D box objects Update the poston o an object teratvely (repeatedly) to anmate ts moton Update the momentum

More information

Fast Feature Value Searching for Face Detection

Fast Feature Value Searching for Face Detection Vol., No. 2 Computer and Informaton Scence Fast Feature Value Searchng for Face Detecton Yunyang Yan Department of Computer Engneerng Huayn Insttute of Technology Hua an 22300, Chna E-mal: areyyyke@63.com

More information

A New Approach For the Ranking of Fuzzy Sets With Different Heights

A New Approach For the Ranking of Fuzzy Sets With Different Heights New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays

More information

RADIX-10 PARALLEL DECIMAL MULTIPLIER

RADIX-10 PARALLEL DECIMAL MULTIPLIER RADIX-10 PARALLEL DECIMAL MULTIPLIER 1 MRUNALINI E. INGLE & 2 TEJASWINI PANSE 1&2 Electroncs Engneerng, Yeshwantrao Chavan College of Engneerng, Nagpur, Inda E-mal : mrunalngle@gmal.com, tejaswn.deshmukh@gmal.com

More information

Space-Optimal, Wait-Free Real-Time Synchronization

Space-Optimal, Wait-Free Real-Time Synchronization 1 Space-Optmal, Wat-Free Real-Tme Synchronzaton Hyeonjoong Cho, Bnoy Ravndran ECE Dept., Vrgna Tech Blacksburg, VA 24061, USA {hjcho,bnoy}@vt.edu E. Douglas Jensen The MITRE Corporaton Bedford, MA 01730,

More information