Microprocessors and Microsystems


Microprocessors and Microsystems 36 (2012)

Hardware accelerator architecture for simultaneous short-read DNA sequences alignment with enhanced traceback phase

Nuno Sebastião, Nuno Roma, Paulo Flores
INESC-ID/IST, Rua Alves Redol, 9, Lisboa, Portugal

Article history: Available online 30 May 2011
Keywords: Hardware accelerator; DNA; Local sequence alignment; Traceback

Abstract

Dynamic programming algorithms are widely used to find the optimal sequence alignment between any two DNA sequences. This manuscript presents a new, flexible and scalable hardware accelerator architecture to speed up the implementation of the frequently used Smith–Waterman algorithm. When integrated with a general purpose processor, the developed accelerator significantly reduces the computation time and memory space requirements of alignment tasks. Such efficiency mainly comes from two innovative techniques that are proposed. First, the usage of the maximum score cell coordinates, gathered during the computation of the alignment scores in the matrix-fill phase, in order to significantly reduce the time and memory requirements of the traceback phase. Second, the exploitation of an additional level of parallelism in order to simultaneously align several query sequences with the same reference sequence, targeting the processing of short-read DNA sequences. The results obtained from the implementation of a complete alignment system based on the new accelerator architecture in a Virtex-4 FPGA showed that the proposed techniques are feasible and the developed accelerator is able to provide speedups as high as 16 for the considered test sequences. Moreover, it was also shown that the proposed approach allows the processing of larger DNA sequences in memory restricted environments.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

The advent of the latest generations of sequencing technologies [1] has opened many new research opportunities in the fields of biology and medicine, including cell Deoxyribonucleic Acid (DNA) sequencing, gene discovery and evolutionary relationships.
These technologies have contributed to the exponential growth of biological data that is available for researchers. For instance, the GenBank [2] has doubled its data size approximately every 18 months and in its December 2010 release it included over [...] base pairs (bps) from several different species. To assist the biologists in the extraction of useful information and in the interpretation of the huge sized sequence databases, a set of alignment algorithms (e.g. the widely used Smith–Waterman (S–W) [3]) have been developed to solve many open problems in the field of bioinformatics, such as (i) DNA re-sequencing, where genome assembly is done against a reference genome; (ii) Multiple Sequence Alignment (MSA), where multiple genomes are aligned to perform genome annotation; and (iii) Gene finding, where Ribonucleic Acid (RNA) sequences (the transcriptome) are aligned against the organism genome to identify new genes.

Corresponding author. E-mail address: Nuno.Sebastao@inesc-id.pt (N. Sebastião).

Currently, a common sequencing approach is based on the application of High Throughput Short Read (HTSR) technologies [4], to reduce the cost of the sequencing process. This technique consists of cutting the DNA fragments under analysis into shorter fragments (reads), which are individually sequenced and aligned against a reference sequence. At present, the three most important HTSR sequencing platforms are: the GS FLX Genome Analyzer (454), the Solexa 1G Sequencer (Illumina) and the SOLiD Sequencer (Applied Biosystems). The biochemistry technology underlying each of these platforms leads to very different characteristics, in terms of reads length, throughput and raw errors. However, independently of the adopted platform, the length of the reads produced by these platforms is small when compared to previous generation sequencing technologies and much smaller than the original complete DNA sequence. Nevertheless, the sheer volume of data that is generated and the need to align these reads to large reference genomes limits a direct and naive application of standard Dynamic Programming (DP) techniques.
One simple example of a common challenge comes from the need to align up to 100 million reads against a reference genome that can be as large as 3 Gbp. For the SOLiD sequencer, with reads as short as 30 bps, this corresponds to the computation of 100 million matrices of dimension $30 \times 3 \times 10^9$, which results in a computational task that is unfeasible even for a standard high performance machine.

Hence, the computational demands for the analysis of the biological data produced by the various sequencing technologies have led to the development of several accelerating strategies that aim at parallelizing the execution of the alignment algorithms. Some of these strategies are software based, while others use dedicated hardware implementations. Among the former, an optimized implementation using Single-Instruction Multiple-Data (SIMD) instructions for current CPUs [5] is commonly adopted in sequence alignment programs, like SSEARCH35. Other software implementations make use of the highly parallel execution capabilities presented by Graphics Processing Units (GPUs) to achieve a high alignment throughput [6]. With regard to the hardware implementations, these include both Application Specific Integrated Circuit (ASIC) [7–10] and Field Programmable Gate Array (FPGA) [11–15] implementations. Regardless of the considered implementation, the most common and efficient hardware architectures map the alignment algorithm to a systolic array of Processing Elements (PEs). Furthermore, although some bidimensional arrays have been presented [16], the most common implementations adopt unidimensional (linear) arrays [7–13,15]. In fact, the main differences among the several implementations relate to the design of the individual PE. However, some of these designs oversimplify the implemented algorithm, by only calculating the edit distance between a sequence pair [12,14], therefore not being suited to accelerate the more generic S–W algorithm. A commercial solution [17], developed by CLC bio and implemented in FPGA, was also made available but little information is given about its architecture. Nevertheless, all the previously presented hardware solutions only focus on accelerating the first phase of the S–W algorithm (DP matrix fill), completely disregarding the second phase (traceback), which is typically performed using a General Purpose Processor (GPP) in a post processing step. Ref. [18] proposed a hardware architecture that also accelerates the traceback phase.
However, only the global alignment problem is addressed. Furthermore, the previously proposed hardware architectures are not easily optimized to deal with short reads sequences, obtained from current HTSR sequencing platforms (e.g. Illumina).

In another perspective, there has been a growing interest in the development of processing solutions that merge, in a single package, the reconfiguration capabilities offered by FPGAs with the advantages of a hardwired CPU. Such solutions, like the Intel Atom E645C processor [19], allow the implementation of highly specialized hardware accelerators tightly coupled with a general purpose CPU, in order to significantly improve the overall system performance. Furthermore, by making use of the offered reconfiguration capabilities, it is possible to implement a wide range of accelerators according to the specific task that is currently being executed. This task-multiplexing capability over time reduces the total cost of ownership of such a system due to its adaptability, low initial cost and high performance.

To overcome the limitations of previous accelerator architectures, to improve the overall sequencing performance and to make use of the advantages provided by current FPGAs, a new hardware accelerator architecture, together with a new technique to speed up the sequence alignment, is now proposed. Such accelerator, targeting an embedded platform, is based on the exploitation of the following two important contributions that are extensively described in the remaining sections of the manuscript:

An innovative and quite efficient technique that makes use of the information gathered during the computation of the alignment scores in the matrix fill phase (in hardware), in order to significantly reduce the time and memory requirements of the traceback phase (later implemented in software) [20]. To support such technique, the developed hardware accelerator architecture was tightly integrated with a GPP, to form a complete and quite efficient local alignment system implemented in an FPGA.
The obtained experimental results show that the proposed accelerating structure may provide speedups as high as 16 for the implementation of the whole alignment procedure when compared to an Intel Core2 Duo processor. It is also observed that a significant reduction of the memory resources required by the subsequent traceback phase is achieved.

An additional level of parallelism is also exploited in the proposed accelerating structure, to further increase its performance. With the presented structure, several query sequences may be simultaneously aligned with the same reference sequence, thus allowing a significant acceleration of the alignment task of the short reads against the reference genome, as used by HTSR techniques. This is achieved by configuring the developed accelerator in a multiple-stream structure, by including multiple linear arrays that work in parallel. Besides the speedup that is achieved with such improvement, which is proportional to the number of linear arrays that are implemented (defined through platform parameterization), the accelerator also takes advantage of the temporal locality in the manipulation of the larger reference genome, thus reducing the number of required memory and I/O accesses to perform the alignment.

This manuscript is organized as follows: Section 2 gives a brief overview of the widely adopted S–W algorithm to determine the optimal alignment. The proposed technique to speed up the traceback phase is presented in Section 3. Section 4 introduces the newly developed accelerator architecture that implements the proposed enhancements in the alignment procedure, including the optimizations for short reads sequences. A performance model of the entire alignment system is presented in Section 5. In Section 6, the prototyping platform that integrates the proposed accelerator and a GPP is presented, while in Section 7 the obtained results are discussed and the achieved speedups are presented. The conclusions are drawn in Section 8.

2. Pairwise local sequence alignment

Sequence alignment is the method by which useful information is extracted from the large amounts of sequenced DNA. The alignments can be classified either as local or global.
In global alignments, the complete sequences are aligned from one end to the other, whereas in local alignments only the subsequences that present the highest similarity are considered. In practice, the local alignment is generally preferred when searching for similarities between distantly related biological sequences, since this type of alignment more closely focuses on the subsequences that were conserved during evolution.

One of the most widely adopted algorithms to find the optimal local alignment between a pair of sequences is the S–W algorithm [3]. This algorithm is based on a DP method and is characterized by the smallest runtime among the optimal local alignment algorithms. With a runtime complexity of $O(nm)$, where $n$ and $m$ denote the sizes of the sequences being aligned, the S–W algorithm computes the alignment in two phases: a DP matrix fill phase and a traceback phase.

2.1. Smith–Waterman algorithm

Consider any two strings $S_1$ and $S_2$ of an alphabet $\Sigma$ with sizes $n$ and $m$, respectively. The local alignment of strings $S_1$ and $S_2$ reveals which pair of substrings of $S_1$ and $S_2$ optimally align, such that no other pairs of substrings have a higher alignment score. Let $G(i, j)$

represent the best alignment score between a suffix of string $S_1[1..i]$ and a suffix of string $S_2[1..j]$. The S–W algorithm allows the computation of $G(n, m)$ by recursively calculating $G(i, j)$, which will reveal the highest alignment score between the substrings of strings $S_1$ and $S_2$. The recursive relation to calculate the local alignment score $G(i, j)$ is given by Eq. (1), where $Sb(S_1(i), S_2(j))$ denotes the substitution score value obtained by aligning character $S_1(i)$ against character $S_2(j)$ and $\alpha$ represents the gap penalty cost (the cost of aligning a character to a space, also known as gap insertion). An example of a substitution function is shown in Table 1.

$$
G(i,j) = \max
\begin{cases}
G(i-1,j-1) + Sb(S_1(i), S_2(j)) \\
G(i-1,j) - \alpha \\
G(i,j-1) - \alpha \\
0
\end{cases}
\qquad G(i,0) = G(0,j) = 0 \tag{1}
$$

The alignment scores are usually positive for characters that match, thus denoting a similarity between them. Mismatching characters may have either positive or negative scores, according to the type of alignment that is being performed, denoting the biological proximity between them. Different substitution score matrices may be used to reveal different types of alignments. In fact, the particular score values are usually defined by biologists, according to evolutionary relations. The gap penalty cost $\alpha$ is always a positive value.

As soon as the entire score matrix $G$ is filled, the substrings of $S_1$ and $S_2$ with the best alignment can be found by locating the cell with the highest score in $G$. Then, all matrix cells that lead to this highest score cell are sequentially determined by performing a traceback phase. This last phase concludes when a cell with a zero score is reached, identifying the aligned substrings as well as the corresponding alignment. The path taken at each cell is chosen based on which of the three neighboring cells (left, top-left and top) was used to calculate the current cell value using the recurrence given by Eq. (1).

Table 1. Example of a substitution score matrix.

Fig. 1. Obtained local alignment for the considered example sequences.
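The two phases described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `sw_fill` and `sw_traceback` are hypothetical, a plain match/mismatch score stands in for a full substitution matrix, and the default scores (match 3, mismatch −1, gap penalty 4) as well as the preference for the diagonal move on ties are assumptions made for illustration.

```python
def sw_fill(s1, s2, match=3, mismatch=-1, gap=4):
    """Matrix fill phase: build G following the recurrence of Eq. (1)."""
    n, m = len(s1), len(s2)
    G = [[0] * (m + 1) for _ in range(n + 1)]  # G(i,0) = G(0,j) = 0
    best, best_cell = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sb = match if s1[i - 1] == s2[j - 1] else mismatch
            G[i][j] = max(G[i - 1][j - 1] + sb,  # align S1(i) with S2(j)
                          G[i - 1][j] - gap,     # gap in S2
                          G[i][j - 1] - gap,     # gap in S1
                          0)                     # restart the local alignment
            if G[i][j] > best:
                best, best_cell = G[i][j], (i, j)
    return G, best, best_cell


def sw_traceback(G, s1, s2, cell, match=3, mismatch=-1, gap=4):
    """Traceback phase: walk back from the best cell until a zero score."""
    i, j = cell
    a1, a2 = [], []
    while i > 0 and j > 0 and G[i][j] > 0:
        sb = match if s1[i - 1] == s2[j - 1] else mismatch
        if G[i][j] == G[i - 1][j - 1] + sb:      # came from the top-left cell
            a1.append(s1[i - 1]); a2.append(s2[j - 1]); i -= 1; j -= 1
        elif G[i][j] == G[i - 1][j] - gap:       # came from the top cell
            a1.append(s1[i - 1]); a2.append("-"); i -= 1
        else:                                    # came from the left cell
            a1.append("-"); a2.append(s2[j - 1]); j -= 1
    return "".join(reversed(a1)), "".join(reversed(a2))


G, best, cell = sw_fill("ACGT", "TACGA")
print(best, cell, sw_traceback(G, "ACGT", "TACGA", cell))
```

Note that the traceback needs the whole matrix `G` in memory, which is the cost that the technique of Section 3 aims to reduce.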
Table 2 shows an example of the calculated score matrix for aligning two sequences ($S_1$ = CAGCCTCGCT and $S_2$ = AATGCCATTGAC) using the substitution score matrix presented in Table 1 (a match has a score of 3 and a mismatch a score of −1). The gap penalty has a value of 4. The shadowed cells represent the traceback path (starting at the highest score cell (8, 10)) that was taken in order to determine the best alignment. The resulting alignment is illustrated in Fig. 1.

Table 2. Example of an alignment score matrix.

3. Tracking the alignment origin and end indexes

As previously mentioned, whenever a sequence pair alignment is required, it is necessary to implement the traceback phase of the S–W algorithm. Most sequence alignment hardware accelerators that have been proposed until now [11,15,16] only implement the score matrix computation (without performing the traceback phase). Therefore, they simply output the calculated alignment score (the highest value of matrix $G$). Afterwards, whenever the obtained score is greater than a given user-defined threshold, the whole $G$ matrix must be recalculated (usually by software, using a GPP). However, in contrast to what happened in the hardware accelerator, in this recalculation all the intermediate data that is required to perform the traceback and retrieve the corresponding alignment must be maintained in the GPP memory. Moreover, this re-computation does not re-use any data from the previous calculation performed by the hardware accelerator.

This situation is further aggravated by the fact that typical alignments consider sequences with quite dissimilar sizes, with $m \gg n$ (e.g. HTSR sequencing analysis). Therefore, the size of the subsequences that participate in the alignment is always in the order of $n$, meaning that a large part of matrix $G$ that must be completely recomputed in the GPP is not even required to obtain the alignment.

To overcome this inefficiency, an innovative technique is now proposed to significantly reduce the time and memory space that is required to find the local alignment in the traceback phase of this algorithm.
In fact, assuming that it is possible to know that the local alignment of a given sequence pair $S_1$ and $S_2$ starts at position $S_1(p)$ and $S_2(q)$, denoted as $(p, q)$, and ends at position $S_1(u)$ and $S_2(v)$, denoted as $(u, v)$, then the local alignment can be obtained in the traceback phase by just considering the score matrix corresponding to substrings $S_a = S_1[p..u]$ and $S_b = S_2[q..v]$.

To determine the character position where the alignment starts, an auxiliary matrix $C_b$ is proposed. Let $C_b(i, j)$ represent the coordinates of the score matrix cell where the alignment of strings $S_1[1..i]$ and $S_2[1..j]$ starts. Using the same DP method that is used to calculate matrix $G(i, j)$, it is possible to simultaneously build matrix $C_b$, with the same size as $G$, that tracks the cell that originated the score that reached cell $G(i, j)$ (i.e. the start of the alignment ending at cell $(i, j)$). The recursive relations to compute matrix $C_b$ are given by Eq. (2), with initial conditions of $C_b(i, 0) = C_b(0, j) = (0, 0)$.

$$
C_b(i,j) =
\begin{cases}
(i,j), & \text{if } G(i,j) = G(i-1,j-1) + Sb(S_1(i),S_2(j)) \text{ and } C_b(i-1,j-1) = (0,0) \\
C_b(i-1,j-1), & \text{if } G(i,j) = G(i-1,j-1) + Sb(S_1(i),S_2(j)) \text{ and } C_b(i-1,j-1) \neq (0,0) \\
C_b(i-1,j), & \text{if } G(i,j) = G(i-1,j) - \alpha \\
C_b(i,j-1), & \text{if } G(i,j) = G(i,j-1) - \alpha \\
(0,0), & \text{if } G(i,j) = 0
\end{cases} \tag{2}
$$
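As a rough software illustration of Eq. (2), the sketch below propagates the origin coordinates alongside the scores while keeping only two matrix rows in memory. The function name `sw_fill_aoei`, the match/mismatch scoring (3 and −1, gap penalty 4) and the order in which ties between the cases of Eq. (2) are resolved (zero score first, then diagonal, top, left) are assumptions for this example, not details taken from the proposed hardware.

```python
def sw_fill_aoei(s1, s2, match=3, mismatch=-1, gap=4):
    """Return (best score, origin (p, q), end (u, v)) using two rows only."""
    n, m = len(s1), len(s2)
    prev_g = [0] * (m + 1)            # row i-1 of G
    prev_c = [(0, 0)] * (m + 1)       # row i-1 of C_b
    best, origin, end = 0, (0, 0), (0, 0)
    for i in range(1, n + 1):
        cur_g = [0] * (m + 1)
        cur_c = [(0, 0)] * (m + 1)
        for j in range(1, m + 1):
            sb = match if s1[i - 1] == s2[j - 1] else mismatch
            diag = prev_g[j - 1] + sb
            up = prev_g[j] - gap
            left = cur_g[j - 1] - gap
            g = max(diag, up, left, 0)
            if g == 0:
                c = (0, 0)                        # no alignment ends here
            elif g == diag:
                # a new alignment starts at (i, j) if the diagonal has no origin
                c = prev_c[j - 1] if prev_c[j - 1] != (0, 0) else (i, j)
            elif g == up:
                c = prev_c[j]
            else:
                c = cur_c[j - 1]
            cur_g[j], cur_c[j] = g, c
            if g > best:
                best, origin, end = g, c, (i, j)
        prev_g, prev_c = cur_g, cur_c
    return best, origin, end


score, (p, q), (u, v) = sw_fill_aoei("ACGT", "TACGA")
# Only these substrings need to be re-aligned in the traceback phase.
print(score, "ACGT"[p - 1:u], "TACGA"[q - 1:v])
```

With the origin $(p, q)$ and end $(u, v)$ known, a conventional fill and traceback only has to be run over `s1[p-1:u]` and `s2[q-1:v]`.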

Hence, by applying the proposed technique, denoted as Alignment Origin and End Indexes (AOEI) tracking, and by knowing the cell where the maximum score ($G(u, v)$) occurred, it is possible to determine from $C_b(u, v) = (p, q)$ the coordinates of the cell where the alignment began. Consequently, to obtain the desired alignment, the traceback phase only has to rebuild the score matrix for the subsequences $S_1[p..u]$ and $S_2[q..v]$, which are usually considerably smaller than the entire $S_1$ and $S_2$ sequences.

The obtained matrix $C_b$ for the alignment example of sequences $S_1$ and $S_2$, whose $G$ matrix was presented in Table 2, is shown in Table 3. In this example, by knowing from the $G$ matrix that the maximum score occurs at cell (8, 10), it is possible to retrieve the coordinates of the beginning of the alignment in cell $C_b(8, 10) = (3, 4)$. With this information, the optimal local alignment between $S_1$ and $S_2$ can be found by processing only the substrings $S_a = S_1[3..8]$ = GCCTCG and $S_b = S_2[4..10]$ = GCCATTG. Such alignment (between $S_a$ and $S_b$) can now be determined by computing a much smaller $G'$ matrix in the traceback phase, as shown in Table 4.

The major advantage of this technique is the significant reduction of the time and memory space required to recompute matrix $G$ for the subsequences that actually participate in the alignment, when compared to the entire sequences. Therefore, it provides a great reduction of the computational effort (time and space) of the whole alignment algorithm.

Table 3. Example of an AOEI tracking matrix.

Table 4. Reduced alignment score matrix.

4. Alignment core architecture

The local alignment algorithm described in Section 2 is usually applied to process biological sequences with pronounced dissimilar sizes $m$ and $n$, where $m \gg n$ (e.g. $m \approx 10^6$ and $n \approx 10^2$). The matrix fill phase of the alignment algorithm is the most computationally intensive part, being, therefore, a good candidate for parallelization. However, the data dependencies that exist in the calculation of each matrix cell highly restrict the parallelization model.
In fact, only the computation of the values along the matrix anti-diagonal direction can be performed in parallel (to calculate the value for cell $G(i, j)$ it is necessary to know the values of $G(i-1, j-1)$, $G(i, j-1)$ and $G(i-1, j)$). Specialized parallel hardware that is capable of performing a great number of simultaneous arithmetic operations is especially suited for this task. In particular, linear systolic arrays with several identical Processing Elements (PEs), as shown in Fig. 2, have proved to be efficient structures to implement this type of computation, by simultaneously computing the values of matrix $G$ that are located in a given anti-diagonal [15].

Fig. 2. Systolic array structure for DNA algorithms.

4.1. Base processor element architecture

The PE architecture proposed in this paper is based on the PE structure described in Ref. [15] and illustrated in Fig. 3. This base PE only implements the basic score matrix calculation and it is composed of a two stage pipelined datapath that calculates each matrix cell value (output in $G(i, j)$). The throughput of each element is one score value per clock cycle. Since the S–W algorithm requires the evaluation of the maximum score value among the set of scores that compose the entire matrix, it is necessary to include an additional datapath that selects the maximum value that was

calculated in the whole PE array (output $Max(i, j)$). The widths of the score bus and of the substitution score bus ($Sb_w$) are constrained by the considered implementation conditions of the accelerator. In particular, the width of the score bus is directly constrained by the maximum size of the query sequence (the shortest among the two sequences) and the score matrix values. These substitution score values also determine the width of the corresponding bus ($Sb_w$). With such datapath, PE $i$ outputs the maximum score that was computed by PEs 1 through $i$.

Fig. 3. Base architecture of processor element PE.

The array evolves along the time, by shifting the reference sequence characters through the PEs. The query sequence character $S_1(i)$ is allocated to the $i$th PE and this PE performs, at every clock cycle, the computations required to determine the score value of a certain matrix cell. After all the reference sequence characters $S_2(j)$ have passed through all the PEs, the alignment score is available at output $Max(i, j)$ of the last PE.

The computation that is performed in each PE requires, among other operations, the selection of the substitution score corresponding to the two characters, i.e. the value of $Sb(S_1(i), S_2(j))$. Since each PE always operates with the same character of $S_1$, it only needs to store the column of the substitution score matrix ($Sb$) that represents the costs of aligning character $S_1(i)$ with the entire alphabet.

In the computation of each matrix cell value $G(i, j)$, the evaluation of the maximum value among the results of the three distinct possibilities presented in Eq. (1) is also required. In particular, the zero condition of the S–W algorithm is implemented by controlling the reset input of the registers that store the $G(i, j)$ value. Such reset makes use of the sign bit of the score value, i.e.
if the maximum value among the three partial scores is negative, then the registers that hold that score are cleared.

4.2. Enhanced processor element architecture

The PE architecture that is now proposed implements the AOEI accelerator technique that was described in Section 3. With this technique, the re-computation of the entire $G$ matrix when performing the traceback phase is avoided. It is implemented by propagating, through the PEs, not only the partial maximum scores (as in the base PE), but also the coordinates of their origin (the beginning of the alignment), together with the coordinates where the maximum score occurred. As shown in Section 3, this greatly simplifies the traceback phase by only focusing on the substrings that are actually involved in the alignment and avoiding the re-computation of the whole matrix $G$.

The architecture of the enhanced PEs is presented in Fig. 4. Each PE features a datapath that implements Eqs. (1) and (2). The additional hardware that is required to implement Eq. (2) (the AOEI technique) is mainly composed of multiplexers and registers. The signals that control these additional multiplexers are generated by the magnitude comparators integrated in the units that were already present in the base PE architecture. The widths of the coordinate buses, $Cq_w$ and $Cr_w$, are constrained by the maximum query sequence size and the maximum reference sequence size, respectively. The width of the $C_w$ bus is the sum of $Cq_w$ and $Cr_w$. The coordinates of the matrix cell under processing are obtained by using the hardwired PE index ($i$) and the symbol coordinate ($j$) that comes alongside the sequence character present at input $S_2(j)$.

Regarding the input data signals, the origin coordinates that correspond to the score at input $G(i-1, j)$ are present at input $C_b(i-1, j)$. Likewise, the origin coordinates corresponding to the score at output $G(i, j)$ are present at output $C_b(i, j)$.
Finally, the coordinates of the currently highest score (present at $Max(i, j)$) are output at $MaxC_b(i, j)$.

4.3. Short-read optimizations

When the query sequences under processing are acquired by short-read sequencing platforms, the sample sequences can be extremely short and in some cases they may even have fewer symbols than the number of available PEs in the array. For instance, the reads generated by the Illumina platform can be as short as 35 nucleotides long. In such a case, several of the PEs do not perform any useful calculations, due to the fact that no query sequence symbol is attributed to them. This situation would certainly result in a substantial decrease of the throughput of the array. Therefore, considering that in most practical setups there is a very significant number of short-read sequences that must be aligned with the same reference sequence, alternative arrangements of

the available PE resources may be considered in order to make it possible to simultaneously perform the alignment of more than one short query sequence to the same reference sequence.

Fig. 4. Enhanced architecture of processor element PE.

This optimization can be achieved with the proposed architecture by configuring the hardware accelerator in a multiple-stream processing scheme. In such configuration, the accelerator includes several coupled linear arrays of PEs that work in parallel and align to the same reference sequence. Hence, while the reference sequence is simultaneously shifted to the multiple arrays, the set of independent query sequences to be processed is distributed and assigned among the PEs of the multiple-stream array, as shown in Fig. 5.

Fig. 5. Example configuration of a multiple-stream PE array (several independent streams).

The exact number of parallel PE arrays is

configurable according to the size of the short-read sequences to be aligned and to the amount of available hardware resources.

It is worth noting that the implemented multiple-stream array also allows an improvement of the resource usage of the accelerator, since it is possible to share a set of resources that are common among the multiple parallel PEs that are processing the same reference sequence. This is accomplished by using a common set of registers that hold the reference symbol ($S_2(j)$) and the respective coordinate ($j$) for the several elements of the array that work in parallel, as shown in Fig. 6 for a dual-stream configuration.

This optimization can significantly increase the actual throughput of the array, since an increased number of PEs is performing useful computations and the alignment of more than one query sequence may be simultaneously performed, therefore leading to a greater speedup than would be achieved with just a single array. Furthermore, the use of such configuration also leads to a reduction of the amount of data that is transferred to the accelerator, since the reference sequence is simultaneously aligned with more than one query sequence. This is especially significant when a large number of short-read query sequences are aligned to a large reference genome sequence.

4.4. Array programming

Since each PE compares the reference sequence symbols with a single query sequence character, it will just access the values present at the corresponding column of the substitution matrix. Therefore, each PE will only receive the substitution score matrix column that corresponds to the query sequence character allocated to that PE. Such data is stored in dedicated registers within each PE, since this allows for a fast reprogramming of a new query sequence. In the event of a PE not being used (because the query sequence has a smaller size than the number of available PEs ($N$)), the substitution score data that is stored in such PE corresponds to a matrix column in which every value is zero.
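The behaviour of a single linear array can be approximated in software with the following cycle-level sketch. It is only a functional model under stated assumptions: the name `systolic_sw`, the DNA alphabet `ACGT`, the match/mismatch scores (3 and −1, gap penalty 4) and the register names are hypothetical, and the two-stage pipeline, the coordinate tracking and the bus widths of the real PEs are not modelled. Each simulated PE holds one substitution column, and unused PEs are loaded with an all-zero column as described above.

```python
def systolic_sw(query, ref, n_pes, match=3, mismatch=-1, gap=4):
    """Simulate a linear array of n_pes PEs; return (max score, cycles)."""
    assert len(query) <= n_pes, "query must fit in the array"
    alphabet = "ACGT"
    # One substitution-matrix column per PE, selected by its query character.
    cols = [{c: (match if c == q else mismatch) for c in alphabet}
            for q in query]
    # Unused PEs hold an all-zero column.
    cols += [{c: 0 for c in alphabet}] * (n_pes - len(query))

    m = len(ref)
    out = [0] * n_pes    # G(i, j) produced by PE i in the previous cycle
    diag = [0] * n_pes   # G(i-1, j-1), registered inside PE i
    best = 0
    cycles = n_pes + m - 1
    for t in range(cycles):
        new_out, new_diag = list(out), list(diag)
        for i in range(n_pes):
            j = t - i                    # reference symbol seen by PE i now
            if 0 <= j < m:               # PEs on one anti-diagonal are active
                up = out[i - 1] if i > 0 else 0   # G(i-1, j) from the left PE
                left = out[i]                     # G(i, j-1), own last result
                g = max(diag[i] + cols[i][ref[j]], up - gap, left - gap, 0)
                new_out[i] = g
                new_diag[i] = up         # becomes G(i-1, j-1) for symbol j+1
                best = max(best, g)
        out, diag = new_out, new_diag
    return best, cycles


print(systolic_sw("ACGT", "TACGA", n_pes=4))
print(systolic_sw("ACGT", "TACGA", n_pes=6))  # two PEs with zero columns
```

In this model the zero columns of the unused PEs can only propagate or decrease scores already produced by the used PEs, so the maximum score is unaffected while the cycle count grows with the array length. A multiple-stream configuration would correspond to several such arrays stepping through the same `ref` symbols in the same cycles.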
To program the score values corresponding to query sequence $S_1$, an auxiliary data load structure, composed of an $n$ bit-width shift register, was included in the array. This structure allows the preloading of the next query sequence data into this temporary storage shift register, by serially shifting the substitution matrix column, while the array is still processing the data corresponding to the current query sequence. As soon as the array has finished the processing of the current query sequence, the next query sequence data (already stored in the auxiliary shift register) is parallel loaded (in just one clock cycle) into the respective PEs. In case the proposed accelerator architecture is configured as a multiple-stream structure, each individual PE array has the corresponding auxiliary data load structure for the query sequence, which allows the simultaneous load of the query information to the several PEs. This masks the time that would be required to shift the next query sequence data into the array and therefore significantly reduces its programming time. Furthermore, the use of this shift register also provides a scalable method to program the processor array, as it avoids the use of a common data bus to program the several PEs.

Fig. 6. Example of a multiple-stream PE in dual-stream configuration.

Fig. 7. Accelerator interface and interconnection with the GPP in the prototyping platform.

4.5. Interface

To integrate the proposed hardware accelerator with the GPP that will implement the remaining alignment procedure (i.e. the

traceback), the systolic array includes an embedded controller that is responsible for decoding seven instructions (required to properly control the array), as well as for receiving the data to be processed. The developed interface, illustrated in Fig. 7, is composed of two input First-In First-Out (FIFO) queues (one for the reference sequence and the other for commands and the query sequence), one output FIFO queue (to return the processed values) and one status register. The two input FIFOs allow the next query sequence to be loaded into the array while the current alignment is being processed, without increasing the complexity of the control that would arise from having all of the data (query and reference sequences data) input through the same FIFO. In the case of a multiple-stream configuration, the several query sequences are input using the previously mentioned input FIFO and are then sent to a specific PE array, based on the information defined by the main program running on the GPP. Afterwards, as soon as the alignment scores and corresponding AOEI coordinates are calculated, they are serially stored in the output FIFO for later processing in the GPP.

Each FIFO has a depth of 64 words and is 32 bits wide, to match the bus width adopted by most current GPPs. The status register contains some information about the available positions in each of the input FIFOs, allowing the implementation of a flow control mechanism. Furthermore, this status register also contains some information regarding the availability of output data in the output FIFO, indicating when the accelerator has concluded the alignment.

The developed interface allows this accelerator to be interconnected to several types of interconnection buses, requiring only the design of the appropriate logic to decode the specific bus control signals. The input and output FIFOs can be mapped to the GPP memory address space and therefore be easily accessible using common load/store instructions.
This type of interface can be used either in PCI, PCIe, AMBA APB or other types of interconnections, therefore allowing this accelerator to be used in a wide range of platforms.

5. Performance model

The performance of a complete alignment system composed of several different modules depends on the performance of each individual module and how they interact. Among these are the CPU performance, the interconnection (bus) throughput and the accelerator performance (if present). To better understand and evaluate the advantages provided by the proposed alignment structure, this section presents a thorough model of the resulting global performance.

Typically, the set of operations that are required to perform an alignment in a system without an accelerator are: (i) database read, (ii) data transfer to the processing device and (iii) computation, which includes the matrix fill and the traceback phases. Assuming that these operations are completely sequential, the total alignment time ($T_s$) can be modeled as the sum of the database read time ($T_{db}$), the data transfer time ($T_{ds}$) and the CPU processing time for the matrix fill phase ($T^M_{cs}$) and the traceback phase ($T^T_c$):

$$
T_s = T_{db} + T_{ds} + T^M_{cs} + T^T_c \tag{3}
$$

The time corresponding to each individual component is given by:

$$
\begin{aligned}
T_{db} &= \frac{n+m}{f_d \, B^w_d \, C \, \eta_d}, && (0 < \eta_d \le 1) \\
T_{ds} &= \frac{n+m}{f_i \, B^w_i \, C \, \eta_i}, && (0 < \eta_i \le 1) \\
T_{cs} &= T^M_{cs} + T^T_c = \frac{(nm)\,\gamma}{f_c \, \eta_c} + \frac{k\,\gamma}{f_c \, \eta_c}, && (0 < \eta_c \le 1,\ \gamma \ge 1)
\end{aligned} \tag{4}
$$

where $n$ and $m$ denote the query and reference sequence sizes, $k$ represents the number of cells traversed during the traceback phase, and $f_d$, $f_i$ and $f_c$ denote the database read, interconnection bus and CPU processing frequencies, respectively. $C$ represents the compression factor (how many nucleotides are encoded in an 8-bit word), while $B^w_d$ and $B^w_i$ denote the width (in bytes) of the database read device and of the interconnection (bus), respectively. The $\eta_d$, $\eta_i$ and $\eta_c$ parameters denote efficiency factors, which take into account eventual contention on accessing the database, the interconnection and the CPU, as well as inherent wait states and protocol dependent control operations.
Finally, $c$ represents the average number of CPU clock cycles required to process a single cell of the DP matrix. In sequential single-core CPUs (without any accelerator), $T_s^c \gg T_{db} + T_{ds}$, due to the $O(nm)$ runtime of the matrix fill phase, which leads to the commonly observed total alignment time of $T_s \approx T_s^c$.

In contrast, when the proposed accelerator is present, the processing is split between the accelerator and the CPU. Considering (as an example) the architecture of the proposed accelerator in a single-stream configuration, the time it takes to compute the whole DP score matrix in the accelerator ($T_a$) is given by:

\[
T_a = \frac{N + m - 1}{f_a}, \qquad (n \le N) \tag{5}
\]

where $N$ represents the number of PEs in the array and $f_a$ denotes their operating frequency. In this parallel processing scheme, the accelerator computes the whole score matrix ($G$), of size $n \times m$, while the CPU performs a much simpler matrix fill (time $T_r^M$) and traceback over the smaller matrix $G'$, totaling a computation time of $T_r^c = T_r^M + T^T$. Typically, an alignment only includes part of the considered sequences. Hence, the number of cells traversed during the traceback ($k$) can be used as an upper bound for the size of the subsequences that are used to compute the smaller matrix $G'$, which will thus have a maximum size of $k \times k$.

By using the proposed accelerator, it is possible to parallelize some operations. In this case, the accelerator performs the DP matrix fill phase of the current sequence pair alignment, while the CPU implements the traceback of the previous sequence pair. Therefore, the accelerator and the CPU work in a pipelined way. Furthermore, it is also possible to read the next query sequence (as well as the reference sequence, if necessary) from the database in parallel with the processing of both the accelerator and the CPU.

This type of processing involves three distinct data transfers, with the respective durations: (i) from the database to the system's main memory ($T_{ds}$), (ii) from the system's main memory to the accelerator ($T_{sa}$), and (iii) from the accelerator to the system's main memory ($T_{as}$). The time to transfer the score and coordinates output from the accelerator to the CPU ($T_{as}$), which consists of no more than five 32-bit values, is quite small and can thus be neglected when compared to the other terms ($T_{as} \ll T_{sa}$). The data transfers between the several components can occur in parallel with the remaining processing (time $T_r^c \gg T_{ds} + T_{sa}$). Therefore, the total alignment time can be modeled as:

\[
T_p \approx \max\left\{ T_a;\ T_{ds} + T_{sa};\ T_r^M + T^T;\ T_{db} \right\}
\]

Assuming that a data parallel 32-bit wide bus is used to interconnect the accelerator, then $B_w = 4$. The same width is also typically used in the database device interface, making $B_w^d = 4$. Furthermore, the used 2-bit encoding per nucleotide leads to $C = 4$. Therefore, the alignment time for these particular conditions becomes:

\[
T_p \approx \max\left\{ \frac{N+m-1}{f_a};\ \frac{n+m}{4 \cdot 4\, f_i \eta_i} + \frac{n+m}{4 \cdot 4\, f_i \eta_i};\ \frac{(k \cdot k)\,c}{f_c \eta_c} + \frac{k\,c}{f_c \eta_c};\ \frac{n+m}{4 \cdot 4\, f_d \eta_d} \right\} \tag{6}
\]

Collecting the constant factors, this simplifies to:

\[
T_p \approx \max\left\{ \frac{N+m-1}{f_a};\ \frac{n+m}{16\, f_i \eta_i} + \frac{n+m}{16\, f_i \eta_i};\ \frac{(k^2+k)\,c}{f_c \eta_c};\ \frac{n+m}{16\, f_d \eta_d} \right\} \tag{7}
\]

Hence, in an alignment scenario where the same reference sequence is aligned to a large number of query sequences ($Q$), and assuming that the reference sequence can be permanently stored in the system's main memory while aligning all the respective query sequences, the database reading time and the corresponding data transfer time to the system's memory are reduced, leading to an average alignment time per query sequence ($\overline{T}_1$):

\[
\overline{T}_1 \approx \max\left\{ \frac{N+m-1}{f_a};\ \frac{n+m/Q}{16\, f_i \eta_i} + \frac{n+m}{16\, f_i \eta_i};\ \frac{(k^2+k)\,c}{f_c \eta_c};\ \frac{n+m/Q}{16\, f_d \eta_d} \right\} \tag{8}
\]

Moreover, considering that the accelerator may be able to perform $b$ simultaneous alignments by using the multiple-stream feature, the average alignment time for each sequence pair ($\overline{T}_b$) is given by:

\[
\overline{T}_b \approx \max\left\{ \frac{N+m-1}{f_a\, b};\ \frac{n+m/Q}{16\, f_i \eta_i} + \frac{n+m/b}{16\, f_i \eta_i};\ \frac{(k^2+k)\,c}{f_c \eta_c};\ \frac{n+m/Q}{16\, f_d \eta_d} \right\} \tag{9}
\]

Fig. 8. Variation of the alignment time ($T_p$) and speedup ($T_s/T_p$) according to the model described in Eq. (7). (a) Alignment time (ms) versus reference/query size relation, showing $T_r$, $T_{db}$, $T_a$ and $T_p$. (b) Alignment time (ms) and speedup versus reference/query size relation, showing $T_p$, $T_s$ and the speedup.

As an example, Fig. 8 depicts the total alignment time according to the model described by Eq. (7), in which the reference sequence is read from the database for each query sequence (worst case). The considered model parameters are: $n = k = 128$, $c = 50$, $N = 128$, $f_a = f_c = f_i = f_d = 100$ MHz, $\eta_i = 0.2$, $\eta_c = 0.8$ and $\eta_d = 0.5$.

One interesting observation that can be extracted from the presented model concerns the role of the accelerator in the resulting performance of the whole alignment system. As the ratio between the reference and the query sequence sizes increases, the accelerator becomes the most limiting performance factor, as it has the highest workload. Therefore, increasing the CPU performance above a given minimum value does not significantly influence the performance of the alignment system, leading to a quasi-stationary speedup value.
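The model can be evaluated numerically to see which term limits the pipeline at a given reference/query size ratio. The following is a minimal sketch, not the authors' tooling: the class and function names are hypothetical, the constants 16 correspond to a 4-byte bus and four nucleotides per byte, and the efficiency value 0.2 is assumed to belong to the interconnection and 0.8 to the CPU. With `Q = b = 1` the function `terms` evaluates Eq. (7); other values give Eqs. (8) and (9).

```python
from dataclasses import dataclass


@dataclass
class Model:
    n: int = 128            # query size
    k: int = 128            # cells traversed during traceback
    N: int = 128            # PEs in the array
    c: float = 50.0         # CPU cycles per DP cell
    f_a: float = 100e6      # accelerator frequency (Hz)
    f_c: float = 100e6      # CPU frequency (Hz)
    f_i: float = 100e6      # interconnection frequency (Hz)
    f_d: float = 100e6      # database read frequency (Hz)
    eta_c: float = 0.8      # assumed: CPU efficiency
    eta_i: float = 0.2      # assumed: interconnection efficiency
    eta_d: float = 0.5      # database efficiency


def terms(p, m, Q=1, b=1):
    """The four arguments of max{...} in Eq. (9); Q = b = 1 reduces to Eq. (7)."""
    bus = 16 * p.f_i * p.eta_i
    return {
        "accelerator": (p.N + m - 1) / (p.f_a * b),
        "bus": (p.n + m / Q) / bus + (p.n + m / b) / bus,
        "cpu": (p.k ** 2 + p.k) * p.c / (p.f_c * p.eta_c),
        "database": (p.n + m / Q) / (16 * p.f_d * p.eta_d),
    }


def t_parallel(p, m, Q=1, b=1):
    """Pipelined alignment time per sequence pair, in seconds."""
    return max(terms(p, m, Q, b).values())


def t_sequential(p, m):
    """Eqs. (3)-(4) with a 4-byte bus and four nucleotides per byte."""
    t_db = (p.n + m) / (16 * p.f_d * p.eta_d)
    t_ds = (p.n + m) / (16 * p.f_i * p.eta_i)
    t_cpu = (p.n * m + p.k) * p.c / (p.f_c * p.eta_c)
    return t_db + t_ds + t_cpu


def crossover_ratio(p):
    """Reference/query ratio at which the accelerator term equals the CPU term."""
    m_star = (p.k ** 2 + p.k) * p.c * p.f_a / (p.f_c * p.eta_c) - p.N + 1
    return m_star / p.n


p = Model()
for ratio in (100, 1000, 10000, 100000):
    m = ratio * p.n
    t = terms(p, m)
    limiting = max(t, key=t.get)
    speedup = t_sequential(p, m) / t_parallel(p, m)
    print(f"ratio={ratio:6d} limit={limiting:11s} T_p={t_parallel(p, m) * 1e3:8.2f} ms speedup={speedup:7.0f}")
print(f"crossover ratio: {crossover_ratio(p):.0f}")
```

Below the crossover the reduced matrix fill and traceback on the CPU set the cycle time; above it the accelerator term grows linearly with the reference size and the speedup flattens.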
In the presented example, the threshold value is about 8000, which corresponds to the relation between the reference and query sequence sizes frequently adopted in bioinformatics applications.

6. Prototyping platform

To validate the functionality and to assess the performance of the proposed hardware accelerator in a practical realization, a complete local alignment system based on the S-W algorithm was developed and implemented. The basic configuration of this system, used as a proof-of-concept, consists of a Leon3 GPP [21] that executes all operations of the S-W algorithm, except those concerning the score matrix computation phase. That phase is executed by the proposed hardware accelerator, acting as a specialized functional unit of the GPP.

The software implementation of the S-W algorithm includes some optimizations in order to achieve more efficient applications in embedded systems. In particular, all memory accesses were optimized by using a static memory allocation mechanism. Special attention was also devoted to the data transfers of both the reference and query sequences from the GPP to the proposed hardware accelerator, so that a high level of efficiency is achieved.

6.1. Leon3 processor

The Leon3 processor [21] is one of the most used processor cores that are freely available. It was specifically designed for embedded applications by the European Space Agency, although nowadays it is maintained by Gaisler Research. It consists of a highly configurable and fully synthesizable core, described in VHDL, implementing a RISC architecture conforming to the SPARC v8 definition. This freely available VHDL description allows the GPP to be implemented in several different platforms (e.g. ASIC), unlike other proprietary GPPs (e.g. Xilinx's MicroBlaze). Furthermore, the availability of reliable software development tools (e.g. compiler and debugger) for the Leon3 processor makes it an adequate choice for the proof-of-concept system.

The Leon3 32-bit core is based on a Harvard micro-architecture with a 7-stage instruction pipeline and 32-bit internal registers. The core functionality can be easily extended by means of the AMBA 2.0 AHB/APB on-chip buses.
The AMBA 2.0 AHB is used to connect the Leon3 processor with high-speed controllers, such as the cache and memory controllers. On the other hand, the AMBA 2.0 APB is used to access most on-chip peripherals and is connected to the Leon3 processor via the AHB/APB bridge. External memory access and memory mapped I/O operation are

provided by a programmable memory controller with interfaces to PROM, SRAM and SDRAM chips.

6.2. DNA alignment peripheral

A new peripheral, consisting of the proposed hardware accelerator for DNA alignment, was developed and embedded in the Leon3 processor (see Fig. 7). This alignment peripheral was connected to the AMBA 2.0 APB as a slave device. This bus was selected not only because it has enough bandwidth for all of the sequence data transfers, but also because it offers a simple interface and low power consumption. Some additional wrapper logic, responsible for adapting the accelerator to the AMBA 2.0 APB bus, was also included, consisting mostly of multiplexers, decoders and a simple control unit that implements the bus protocol. The I/O FIFOs and the status register of the alignment core are mapped in the Leon3 memory address space. Hence, the write and read operations over this peripheral can be implemented using simple load and store operations.

6.3. FPGA implementation

The proposed local alignment system was implemented in an FPGA device by using a GR-CPCI-XC4V development board from Pender Electronic Design. This development system includes a Virtex4 XC4VLX100 FPGA device from Xilinx, a 133 MHz 256 MB SRAM memory bank, and several peripherals for control, communication and storage purposes. The adopted Leon3 processor is based on version gpl b3403 of GRLIB. This soft-processor core was configured to incorporate a hardware divide and multiply unit, an interrupt controller, separate data and instruction cache controllers and an SRAM memory controller, all interconnected with the AMBA 2.0 AHB interface. Moreover, the core also encompasses two 32-bit timers and the proposed DNA alignment peripheral, which were all connected to the system AMBA 2.0 APB.

7. Experimental results

The previously presented accelerator architecture, described using parameterizable VHDL code, was synthesized using Xilinx ISE 10.1 (SP3) software tools and implemented in the previously described FPGA.
This reconfigurable embedded system, used fundamentally as a proof-of-concept prototyping platform, is composed of the Leon3 GPP and the alignment accelerator core with an array of at most 128 PEs. Although the maximum operating frequency of the accelerator core is 120 MHz, the actual operating frequency of the entire system is 60 MHz, as a consequence of a limitation imposed by the considered Leon3 processor implementation. However, as explained in Section 5, for the usual ranges of the relation between the reference and query sequence sizes this GPP limitation does not significantly constrain the overall performance of the system.

7.1. Single-stream configuration

The obtained resource allocation results of the entire alignment system, when considering the single-stream array configurations, are presented in Table 5. The resources occupied solely by the Leon3 processor are also presented as a reference. These results show that the Leon3 processor alone occupies 18% of the available logic resources of the used FPGA device. Concerning the resource allocation for the systolic array using the enhanced PEs, it is 77% larger than the corresponding base configuration, without the AOEI tracking functionality. However, the exact increase in the amount of used hardware depends on the considered operating environment, namely the size of the sequences to be aligned (which determines the bit-width of the coordinate representation) and the adopted scoring scheme (which influences the bit-width of the score calculations).

To validate and assess the performance of the proposed system, a set of real DNA sequences was used for the reference sequence. These sequences were obtained from the GenBank database [2] and their sizes range up to 2,623,402 nucleotides.

Table 5. FPGA resource allocation of a single-stream array.

PE type | # PEs | Score width | Max. reference size | Max. query size | Registers | LUTs
Leon3 | … | … | … | … | … (6%) | 17,788 (18%)
Base | … | … | … | … | … (8%) | 19,818 (20%)
Base | … | … | … | … | …,736 (12%) | 28,148 (29%)
Base | … | … | … | … | …,031 (16%) | 34,130 (35%)
Enh. | … | … | … | … | … (10%) | 22,168 (23%)
Enh. | … | … | … | … | …,625 (23%) | 36,114 (37%)
Enh. | … | … | … | … | …,024 (41%) | 56,541 (58%)

Table 6. Processing time results for the alignment system when using a single-stream array with 128 PEs and a query sequence of 128 nucleotides.

Reference size | Leon3 only (ms): Matrix fill $T_s^M$ | Traceback $T^T$ | Total $T_s$ | Leon3 and accelerator (ms): Score and coordinates (HW) $\max\{T_a, T_{sa}\}$ | Reduced matrix fill (Leon3) $T_r^M$ | Reduced traceback (Leon3) $T^T$ | Cycle period $T_p$ | Speedup
Reference sizes: 17,…; …; …; …; …; …; …,311,…; …,623,…

The maximum size of the query sequences is limited by the number of available PEs in the array. Consequently, for the implemented configuration, a query must not be longer than 128 nucleotides (a size entirely compatible with the latest Next-Generation Sequencing technologies [1]). In this particular instantiation, the size of the chosen query sequences is 128 nucleotides. For larger query sequences, the number of PEs in the array has to be increased and, if necessary, the array can be expanded by connecting another FPGA device.

The advantages provided by the proposed AOEI technique, as well as the performance of the developed hardware accelerator, were assessed using the previously selected sequences, which were aligned using two different methods: (i) pure software implementation, where the alignment between each sequence pair is obtained using a straightforward implementation of the S-W algorithm running exclusively on the GPP (keeping the entire score matrix in memory) and (ii) hardware accelerated implementation, where the alignment is obtained by using the developed accelerator (with the enhanced PEs) and the GPP. The obtained execution times for both methods are presented in Table 6.

While the total processing time for the pure software implementation ($T_s$) is the sum of the partial times, the total time of the hardware accelerated implementation ($T_p$) takes into account that the accelerator and the GPP work concurrently in a pipelined scheme: the accelerator determines the score and the alignment coordinates of a given sequence pair while the GPP performs the matrix recomputation and traceback for the previously processed pair. Therefore, in the concurrent configuration, the presented total time ($T_p$) is the maximum between the hardware accelerator time $\max\{T_a, T_{sa}\}$ and the GPP execution time $T_r^M + T^T$ (see Eqs. (6) and (7)).
It should be noted that the presented results for the accelerator processing time already include the communication between the GPP and the accelerator, $\max\{T_a, T_{sa}\}$, and that the database reading and corresponding data transfer times are not considered, since the queries and the reference sequence were pre-loaded into the system's main memory.

The obtained speedup was determined by comparing the time required to obtain each whole alignment using the pure software sequential implementation of the S-W algorithm with the time required to obtain the same alignment with the aid of the proposed AOEI technique and the corresponding hardware accelerator. According to the obtained results, the attained speedups may be as high as 6042. These speedups are in accordance with the trends predicted in Section 5 (see Fig. 8) and are the consequence of a twofold contribution: on the one hand, the parallelization of the whole matrix fill phase by the systolic array; on the other hand, the reduction of the processing time required to perform the traceback in the GPP, due to the significant reduction of the size of the score matrix that must be recomputed in this phase.

In this respect, it is worth noting that the time complexity of the $G$ matrix computation during the matrix fill phase implemented in the GPP is $O(nm)$, whereas in the accelerator this complexity is reduced to $O(m)$, due to the parallel processing in the $n$ PEs. These two factors justify the significant speedup that is attained in determining the local alignment score, as described in Section 5. Concerning the traceback phase, the time complexity is the same in both cases ($O(n + m)$). In fact, in order to perform the traceback in the GPP it is necessary to recompute the whole $G$ matrix. Nevertheless, this recomputation time is significantly reduced when the proposed AOEI technique is adopted. As an example, considering the alignment of the 128 nucleotide query sequence with the 1,311,701 nucleotide reference sequence, the obtained local alignment spans only a 124 nucleotide long subsequence of the reference sequence and a 123 nucleotide subsequence of the query sequence.
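The effect of knowing where the best alignment starts and ends can be illustrated in software. The sketch below is not the accelerator's implementation and makes several assumptions that are not in the text: a linear gap penalty, an arbitrary scoring scheme (match 2, mismatch -1, gap -1), and the reading of the AOEI coordinates as the origin and end cells of the best local alignment. All function names are hypothetical. The fill keeps only two rows of scores, and the traceback recomputes just the small submatrix between the origin and the end.

```python
def fill_with_origin(query, ref, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman fill that keeps two rows and tracks, for every cell,
    the (row, column) where the best path into that cell started.
    Returns (best score, origin, end) using 1-based coordinates."""
    m = len(ref)
    prev_s, prev_o = [0] * (m + 1), [None] * (m + 1)
    best = (0, None, None)
    for i in range(1, len(query) + 1):
        cur_s, cur_o = [0] * (m + 1), [None] * (m + 1)
        for j in range(1, m + 1):
            sub = match if query[i - 1] == ref[j - 1] else mismatch
            diag, up, left = prev_s[j - 1] + sub, prev_s[j] + gap, cur_s[j - 1] + gap
            score = max(0, diag, up, left)
            if score == 0:
                origin = None
            elif score == diag:
                origin = prev_o[j - 1] or (i, j)   # a path starts here if the diagonal was 0
            elif score == up:
                origin = prev_o[j]
            else:
                origin = cur_o[j - 1]
            cur_s[j], cur_o[j] = score, origin
            if score > best[0]:
                best = (score, origin, (i, j))
        prev_s, prev_o = cur_s, cur_o
    return best


def traceback_reduced(query, ref, origin, end, match=2, mismatch=-1, gap=-1):
    """Recompute only the submatrix between origin and end, then trace back
    from its bottom-right corner."""
    (oi, oj), (ei, ej) = origin, end
    q, r = query[oi - 1:ei], ref[oj - 1:ej]
    G = [[0] * (len(r) + 1) for _ in range(len(q) + 1)]
    for i in range(1, len(q) + 1):
        for j in range(1, len(r) + 1):
            sub = match if q[i - 1] == r[j - 1] else mismatch
            G[i][j] = max(0, G[i - 1][j - 1] + sub, G[i - 1][j] + gap, G[i][j - 1] + gap)
    i, j, out_q, out_r = len(q), len(r), [], []
    while i > 0 and j > 0 and G[i][j] > 0:
        sub = match if q[i - 1] == r[j - 1] else mismatch
        if G[i][j] == G[i - 1][j - 1] + sub:
            out_q.append(q[i - 1]); out_r.append(r[j - 1]); i -= 1; j -= 1
        elif G[i][j] == G[i - 1][j] + gap:
            out_q.append(q[i - 1]); out_r.append("-"); i -= 1
        else:
            out_q.append("-"); out_r.append(r[j - 1]); j -= 1
    return "".join(reversed(out_q)), "".join(reversed(out_r))


score, origin, end = fill_with_origin("ACGT", "TTTTACGTTTTT")
print(score, origin, end)
print(traceback_reduced("ACGT", "TTTTACGTTTTT", origin, end))
```

In this toy case the fill visits 4 × 12 cells, but the traceback only needs a 4 × 4 submatrix; the same idea applied to a megabase reference is what shrinks the memory footprint on the GPP.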
If the entire $G$ matrix had to be recomputed to obtain the alignment, it would have approximately $1.7 \times 10^8$ cells ($128 \times 1{,}311{,}701$), which contrasts sharply with the situation provided by the proposed AOEI technique, where the size of the $G'$ matrix that needs to be recomputed in the GPP is reduced to only about $1.5 \times 10^4$ cells ($124 \times 123$). This significant reduction (of about four orders of magnitude) is particularly important when the alignment procedure is implemented in embedded systems, with strict memory and power consumption restrictions. Hence, the proposed technique not only significantly reduces the time required to obtain the alignment, but also makes it possible to process larger sequences, as it significantly reduces the amount of memory used by the GPP (e.g. the 2,623,402 nucleotide long reference sequence, whose memory requirements prevent it from being aligned using the pure software approach on the GPP).

Finally, it is also important to note that the obtained throughput results of the proposed systolic PE array are in line with the results of similar architectures presented in the past [11,15,16]. However, those architectures only focused on accelerating the matrix-fill phase of the S-W algorithm. In contrast, besides accelerating the matrix-fill phase, the presented accelerator architecture also implements the new AOEI method and therefore returns additional information that is subsequently used to further reduce the computational requirements. This feature is not included in any other proposal, which makes it a differentiating characteristic of this work and hinders a direct and fair comparison.

7.2. Multiple-stream configuration

To evaluate the developed multiple-stream capability, several different configurations of the alignment system were implemented. The corresponding resource usage results are presented in Table 7. The maximum number of implemented streams was 3, since the resources of the considered FPGA device do not allow for additional streams. However, any number of processing streams is supported if there are enough resources available to implement them.
All of the considered configurations share the same maximum reference sequence size. As expected, the results in Table 7 show a slight reduction in the amount of used resources when compared to an identically sized single-stream array (i.e. when the number of PEs of the single-stream array is equal to the number of PEs of the multiple-stream array multiplied by the number of streams). This reduction is due to the resources shared among the multiple-stream PEs, as well as to the reduction in the bit-width required to represent the AOEI coordinates, since the query sequence being aligned is smaller. Therefore, in terms of used resources, a triple-stream configuration is more advantageous for aligning several short-read sequences than three completely independent arrays.

Table 7. FPGA resource usage of the multiple-stream arrays.

PE type | # PEs | n-stream | Score width | Registers | LUTs
Leon3 | … | … | … | … (6%) | 17,788 (18%)
Enh. | … | … | … | …,024 (41%) | 56,541 (58%)
Enh. | … | … | … | …,625 (23%) | 36,114 (37%)
Enh. | … | … | … | …,349 (39%) | 54,183 (55%)
Enh. | … | … | … | …,427 (16%) | 28,071 (29%)
Enh. | … | … | … | …,149 (25%) | 38,169 (39%)
Enh. | … | … | … | …,687 (33%) | 48,205 (49%)
Enh. | … | … | … | …,299 (35%) | 50,143 (51%)

To evaluate the performance of the alignment task, three streams of short-read query sequences, each with 35 nucleotides and obtained with the Illumina sequencing platform, were aligned to the same 2,623,402 nucleotides long reference sequence. Six different accelerator configurations, all with the proposed AOEI functionality, were used to obtain the alignments: (i) the single-stream configuration with 128 PEs that was used in the previous section, (ii) a single-stream and (iii) a dual-stream configuration with 64 PEs each, and (iv) a single-stream, (v) a dual-stream and (vi) a triple-stream configuration with 35 PEs each. The 35 PE arrays are fitted to the size of the short reads produced by this sequencing technology.

Table 8. Processing time results using multiple-stream array configurations, to align three query sequence streams, each with 35 nucleotides.

# PE | n-stream | Reference size | Processing time using the Leon3 processor and the proposed accelerator (ms): Score and coordinates (HW) | Reduced matrix fill (Leon3) | Reduced traceback (Leon3) | PE occupancy rate (%)

Table 9. Performance comparison with an Intel Core2 Duo CPU.

Query size | Device (Intel CPU; Accelerator (128 × 1); Intel CPU; Accelerator (35 × 1); Accelerator (35 × 3)) | Time (ms) | Speedup | Equivalent MCUPS

The processing times achieved when aligning the three query streams using the previously described accelerator configurations are presented in Table 8. The alignment task is considerably faster when the accelerator is configured as a multiple-stream array with the number of PEs in each array identical to the query sequence size, since in this configuration all the PEs perform useful calculations, leading to a PE occupancy ratio of 100%. If the single-stream array with 128 PEs is used to align the three streams of 35 nucleotides long query sequences, the PE occupancy ratio of the array decreases significantly (down to 27%). This means that a significant part of the PEs would be performing null operations, since only 35 of them would have a query sequence symbol assigned, therefore decreasing the actual throughput of the array.
Consequently, the time required to obtain the score and the index coordinates for the three streams of query sequences is roughly three times the time required to obtain the same information for a single stream (see Table 6). However, using a triple-stream array where the number of PEs is fitted to the query sequence size (35 nucleotides), it is possible to simultaneously align three different query sequences using the same hardware resources, as presented in Table 8. Therefore, not only is the overall efficiency of the system significantly increased, but an additional speedup is also achieved, proportional to the number of implemented linear arrays.

7.3. Comparison and discussion

To complete the evaluation of the presented architecture, the performance of the proposed accelerator was also compared to the performance of a pure software implementation running on a common CPU. The SSEARCH35 software program from the FASTA framework was used for this purpose, since it is one of the most used programs to determine the local alignment. This program implements the state-of-the-art SIMD optimizations proposed in Ref. [22] and was executed on a 2.4 GHz Intel Core2 Duo processor. The execution times were obtained by aligning the same query and reference sequences adopted in the previous evaluations.

The obtained execution times, presented in Table 9, show that the speedup attained with the conceived accelerator when compared with a pure software implementation running on the Core2 Duo may be as high as 16. In particular, the lower processing time obtained for the short sequences is due to the better usage of the available hardware resources provided by the accelerator, which enabled a triple-stream configuration using the same FPGA device. Table 9 also includes the equivalent million cell updates per second (MCUPS) metric, which is commonly used to compare the performance of alignment algorithms across different platforms. However, this metric only takes into account the throughput of the matrix fill phase of the S-W algorithm, without considering the traceback phase requirements.
Nevertheless, the performance obtained using the developed system still represents a significant speedup compared to the SSEARCH35 program. The decrease in performance of the software based solution for the smaller query sequences reveals its inability to maintain its performance levels with such short sequences. Furthermore, it is important to recall that the overall performance of the accelerator is proportional to the total number of PEs, which explains the apparently smaller equivalent performance of the triple-stream array with 35 PEs per stream (35 × 3 = 105 PEs) when compared to the single-stream array with 128 PEs.

Regarding the database read rate, the implemented accelerator requires one reference sequence nucleotide in each clock cycle. As previously mentioned, four nucleotides are encoded in each byte, so at the 60 MHz system clock the accelerator requires an input transfer rate of at least 15 MB/s. In the worst case scenario, in which the reference sequence is not stored in the main memory and needs to be read from the database at each alignment, the database reading rate (which also needs to account for the much smaller query sequence reads) has to be higher than 15 MB/s in order to sustain the operation of the accelerator at its maximum performance. Current mainstream storage devices (hard disk drives) have a sustained throughput above 100 MB/s. Even with the accelerator running at 120 MHz, the required database reading rate would only double to 30 MB/s, still well below the throughput of the storage devices.
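The figures quoted in this discussion follow from simple arithmetic, which the short sketch below reproduces. The function names are hypothetical, and `peak_mcups` assumes that every PE updates one cell per clock cycle, which is consistent with Eq. (5) but is a peak value rather than a measured one.

```python
def required_input_rate(f_clk_hz, nucleotides_per_byte=4):
    """Bytes per second needed to feed one reference nucleotide per clock cycle."""
    return f_clk_hz / nucleotides_per_byte


def pe_occupancy(query_len, n_pes):
    """Fraction of PEs in one array that hold a query symbol."""
    return query_len / n_pes


def mcups(query_len, ref_len, seconds):
    """Equivalent million cell updates per second for one alignment."""
    return query_len * ref_len / seconds / 1e6


def peak_mcups(total_pes, f_clk_hz):
    """Upper bound assuming one cell update per PE per clock cycle."""
    return total_pes * f_clk_hz / 1e6


print(required_input_rate(60e6) / 1e6, "MB/s at 60 MHz")
print(required_input_rate(120e6) / 1e6, "MB/s at 120 MHz")
print(round(100 * pe_occupancy(35, 128)), "% occupancy for a 35-nucleotide query on 128 PEs")
print(peak_mcups(128, 60e6), "peak MCUPS, 128 PEs x 1 stream")
print(peak_mcups(35 * 3, 60e6), "peak MCUPS, 35 PEs x 3 streams")
```

The last two lines show why the 35 × 3 configuration has a lower equivalent MCUPS ceiling than the 128 × 1 array even though every one of its PEs is busy.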

To further demonstrate the implementation alternatives offered by the proposed accelerator, the processing core was also synthesized for the FPGA device available in the Intel Atom E645C processor, an Altera Arria II GX device [19]. The synthesis was performed using the Quartus II v10.1 software from Altera. The obtained results demonstrated that the synthesized core is capable of operating at a clock frequency of 120 MHz. Synthesis results also revealed that the available hardware resources of this device allow the implementation of accelerators with 128 PEs in a dual-stream configuration and with 35 PEs in a 6-stream configuration. According to the model derived in Section 5, these configurations significantly improve the overall performance of the accelerator, allowing the concurrent alignment of two 128 nucleotides long query sequences (N = 128, b = 2) or six 35 nucleotides long query sequences (N = 35, b = 6), respectively. In this case, only the accelerator is implemented in the FPGA, while the Intel Atom processor performs the role of the GPP.

Finally, a last observation concerning the system cost is in order. The acquisition cost of a system based on a hybrid platform, like the Intel Atom E645C, is similar to the cost of current off-the-shelf computing systems, like those based on the Intel Core2 Duo processors. However, if the higher throughput provided by the accelerator implemented in the FPGA is taken into account, the alignment system based on this new platform will achieve a much smaller cost per alignment than current implementations. Moreover, the memory size reduction provided by the proposed accelerator also allows a further reduction of the total system cost.

8. Conclusions

A highly efficient hardware accelerator architecture that significantly speeds up the implementation of DNA local alignment algorithms is presented. The accelerator is based on an innovative and efficient technique that significantly reduces the computational time and memory requirements of the traceback phase that is executed as part of the widely used Smith-Waterman algorithm.
Furthermore, the developed structure also exploits an additional level of parallelism, in order to simultaneously align several query sequences with the same reference sequence, by adopting a multi-stream processing flow. This feature is particularly useful in the processing of short-read DNA sequences obtained from current HTSR sequencing technologies.

The developed accelerator was integrated with a Leon3 general purpose processor, in order to prototype a complete embedded alignment system for DNA processing. The conceived platform was implemented in a Virtex-4 FPGA. The obtained results demonstrate that the developed accelerator provides speedups as high as 6042 when compared with a pure software version of the Smith-Waterman algorithm running on the Leon3 processor. Speedups up to 16 were also achieved when compared to a highly optimized SIMD software implementation running on an Intel Core 2 Duo processor.

The obtained results also reveal that the proposed multiple-stream configurations favor the exploitation of the available FPGA resources and help to keep the array running at maximum performance in different alignment scenarios. Moreover, it was shown that the use of the proposed accelerator enables the alignment of larger DNA sequences, even in memory restricted environments.

Acknowledgments

The presented research was performed in the scope of project HELIX: Heterogeneous Multi-Core Architecture for Biological Sequence Analysis, funded by the Portuguese Foundation for Science and Technology (FCT) with reference PTDC/EEA-ELC/113999/2009, and partially supported by FCT (INESC-ID multiannual funding) through the PIDDAC Program funds and through the Ph.D. grant with reference SFRH/BD/43497/2008.

References

[1] J. Shendure, H. Ji, Next-generation DNA sequencing, Nat. Biotechnol. 26 (2008).
[2] D.A. Benson, I. Karsch-Mizrachi, D.J. Lipman, J. Ostell, E.W. Sayers, GenBank, Nucleic Acids Res. 38 (2010) D46-D51.
[3] T.F. Smith, M.S. Waterman, Identification of common molecular subsequences, J. Mol. Biol. 147 (1981).
[4] M.J. Chaisson, P.A. Pevzner, Short read fragment assembly of bacterial genomes, Genome Res. 18 (2008).
[5] M.S. Farrar, Striped Smith-Waterman speeds database searches six times over other SIMD implementations, Bioinformatics 23 (2007).
[6] L. Ligowski, W. Rudnicki, An efficient implementation of Smith-Waterman algorithm on GPU using CUDA, for massively parallel scanning of sequence databases, in: IEEE Int. Symp. Parallel & Distributed Processing, IPDPS 2009, IEEE, 2009.
[7] E.T. Chow, J.C. Peterson, M.S. Waterman, T. Hunkapiller, B.A. Zimmermann, A systolic array processor for biological information signal processing, in: Proc. of the 5th International Conf. on Supercomputing, ICS '91, ACM, New York, NY, USA, 1991.
[8] P. Guerdoux-Jamet, D. Lavenier, SAMBA: hardware accelerator for biological sequence comparison, Bioinformatics 13 (1997).
[9] T. Han, S. Parameswaran, Swasad: an ASIC design for high speed DNA sequence matching, in: Proc. of the 2002 Asia and South Pacific Design Automation Conf., ASP-DAC '02, IEEE Computer Society, Washington, DC, USA, 2002.
[10] C. White, R. Singh, P. Reintjes, J. Lampe, B. Erickson, W. Dettloff, V. Chi, S. Altschul, BioSCAN: a VLSI-based system for biosequence analysis, in: Proc. IEEE Int. Conf. on Computer Design: VLSI in Computers and Processors, ICCD '91.
[11] K. Benkrid, Y. Liu, A. Benkrid, A highly parameterized and efficient FPGA-based skeleton for pairwise biological sequence alignment, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 17 (2009).
[12] G. Caffarena, C. Pedreira, C. Carreras, S. Bojanic, O. Nieto-Taladriz, FPGA acceleration for DNA sequence alignment, J. Circuits Syst. Comput. 16 (2007).
[13] M. Gokhale, B. Holmes, A. Kopser, D. Kunze, D. Lopresti, S. Lucas, R. Minnich, P. Olsen, Splash: a reconfigurable linear logic array, in: Int. Conf. on Parallel Processing, 1990.
[14] S.A. Guccione, E. Keller, Gene matching using JBits, in: Proc. 12th Int. Conf. Field-Programmable Logic and Applications, FPL '02, Springer-Verlag, London, UK, 2002.
[15] T. Oliver, B. Schmidt, D. Maskell, Hyper customized processors for bio-sequence database scanning on FPGAs, in: Proc. 13th Int. Symp. Field-Programmable Gate Arrays, FPGA '05, ACM, 2005.
[16] L. Hasan, Z. Al-Ars, Z. Nawaz, K. Bertels, Hardware implementation of the Smith-Waterman algorithm using recursive variable expansion, in: 3rd Int. Design and Test Workshop, IDT 2008, IEEE, 2008.
[17] CLC Bio, White paper on CLC Bioinformatics Cube 1.03, Technical Report, CLC Bio, Finlandsgade, Aarhus N, Denmark.
[18] S. Lloyd, Q. Snell, Sequence alignment with traceback on reconfigurable hardware, in: Int. Conf. Reconfigurable Computing and FPGAs, ReConFig '08, IEEE, 2008.
[19] Intel® Atom Processor E6x5C Series Product Preview Datasheet, Intel Corporation.
[20] N. Sebastião, T. Dias, N. Roma, P. Flores, Integrated accelerator architecture for DNA sequences alignment with enhanced traceback phase, in: International Conference on High Performance Computing and Simulation, HPCS, 2010.
[21] Aeroflex Gaisler, SPARC V8 32-bit Processor LEON3/LEON3-FT CompanionCore Data Sheet, Version 1.0.3.
[22] M. Farrar, Striped Smith-Waterman speeds database searches six times over other SIMD implementations, Bioinformatics 23 (2007).

Nuno Sebastião was born in Lisbon, Portugal. He has held an M.S. degree in Electrical and Computer Engineering from Instituto Superior Técnico (IST), Technical University of Lisbon, Lisbon, Portugal, since 2007. In 2007 he joined the Instituto de Engenharia de Sistemas e Computadores R&D (INESC-ID) as a researcher of the Signal Processing Group (SPS), where he is currently working towards his PhD degree, also in Electrical and Computer Engineering. His main research interests are focused on Dedicated Multi-Core Computer Architectures and High-Performance Systems for Biological Sequence Alignment (DNA, RNA and proteins). He is a member of the IEEE Circuits and Systems Society.

Nuno Roma was born in Entroncamento, Portugal. He received the Ph.D. degree in electrical and computer engineering from Instituto Superior Técnico (IST), Universidade Técnica de Lisboa, Lisbon, Portugal. He is currently an Assistant Professor with the Department of Computer Science and Engineering at IST, and a Senior Researcher of the Signal Processing Systems Group (SPS) of Instituto de Engenharia de Sistemas e Computadores R&D (INESC-ID). His research interests include specialised computer architectures for digital signal processing (including biological sequences processing and image and video coding/transcoding), embedded systems design and compressed-domain video processing algorithms. He has contributed to more than 40 papers in journals and international conferences. He is a member of the IEEE Circuits and Systems Society and a member of ACM.

Paulo Flores received the five-year engineering degree, M.S. and Ph.D. degrees in electrical and computer engineering from the Instituto Superior Técnico, Technical University of Lisbon, Lisbon, Portugal, in 1989, 1993, and 2001, respectively. Since 1990, he has been teaching at Instituto Superior Técnico, Technical University of Lisbon, where he is currently an Assistant Professor in the Department of Electrical and Computer Engineering. He has also been with the Instituto de Engenharia de Sistemas e Computadores R&D (INESC-ID), Lisbon, since 1988, where he is currently a Senior Researcher in the Algorithms for Optimization and Simulation Group (ALGOS). His research interests are computer architecture and CAD for VLSI circuits in the area of embedded systems, test and verification of digital systems, and computer algorithms, with particular emphasis on optimization of hardware/software problems using satisfiability (SAT) models. He is a member of the IEEE Circuits and Systems Society.


ON THE USE OF THE SIFT TRANSFORM TO SELF-LOCATE AND POSITION EYE-IN-HAND MANIPULATORS USING VISUAL CONTROL XVIII Congresso Braslero de Automáta / a 6-setembro-00, Bonto-MS ON THE USE OF THE SIFT TRANSFORM TO SELF-LOCATE AND POSITION EYE-IN-HAND MANIPULATORS USING VISUAL CONTROL ILANA NIGRI, RAUL Q. FEITOSA

More information

Solving two-person zero-sum game by Matlab

Solving two-person zero-sum game by Matlab Appled Mechancs and Materals Onlne: 2011-02-02 ISSN: 1662-7482, Vols. 50-51, pp 262-265 do:10.4028/www.scentfc.net/amm.50-51.262 2011 Trans Tech Publcatons, Swtzerland Solvng two-person zero-sum game by

More information

5 The Primal-Dual Method

5 The Primal-Dual Method 5 The Prmal-Dual Method Orgnally desgned as a method for solvng lnear programs, where t reduces weghted optmzaton problems to smpler combnatoral ones, the prmal-dual method (PDM) has receved much attenton

More information

The stream cipher MICKEY-128 (version 1) Algorithm specification issue 1.0

The stream cipher MICKEY-128 (version 1) Algorithm specification issue 1.0 The stream cpher MICKEY-128 (verson 1 Algorthm specfcaton ssue 1. Steve Babbage Vodafone Group R&D, Newbury, UK steve.babbage@vodafone.com Matthew Dodd Independent consultant matthew@mdodd.net www.mdodd.net

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

Active Contours/Snakes

Active Contours/Snakes Actve Contours/Snakes Erkut Erdem Acknowledgement: The sldes are adapted from the sldes prepared by K. Grauman of Unversty of Texas at Austn Fttng: Edges vs. boundares Edges useful sgnal to ndcate occludng

More information

A Flexible Solution for Modeling and Tracking Generic Dynamic 3D Environments*

A Flexible Solution for Modeling and Tracking Generic Dynamic 3D Environments* A Flexble Soluton for Modelng and Trang Gener Dynam 3D Envronments* Radu Danesu, Member, IEEE, and Sergu Nedevsh, Member, IEEE Abstrat The traff envronment s a dynam and omplex 3D sene, whh needs aurate

More information

Multiscale Heterogeneous Modeling with Surfacelets

Multiscale Heterogeneous Modeling with Surfacelets 759 Multsale Heterogeneous Modelng wth Surfaelets Yan Wang 1 and Davd W. Rosen 2 1 Georga Insttute of Tehnology, yan.wang@me.gateh.edu 2 Georga Insttute of Tehnology, davd.rosen@me.gateh.edu ABSTRACT Computatonal

More information