BIOINFORMATICS ORIGINAL PAPER

Vol. 21, doi: /bioinformatics/bti402
Sequence analysis

A boosting approach for motif modeling using ChIP-chip data

Pengyu Hong 1, X. Shirley Liu 2, Qing Zhou 1, Xin Lu 2, Jun S. Liu 1,2 and Wing H. Wong 1,2,*
1 Department of Statistics, Harvard University, Cambridge, MA 02138, USA and 2 Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA

Received on July 30, 2004; revised on January 10, 2005; accepted on March 21, 2005
Advance Access publication April 7, 2005

ABSTRACT
Motivation: Building an accurate binding model for a transcription factor (TF) is essential to differentiate its true binding targets from spurious ones. This is an important step toward understanding gene regulation.
Results: This paper describes a boosting approach to modeling TF-DNA binding. Different from the widely used weight matrix model, which predicts TF-DNA binding based on a linear combination of position-specific contributions, our approach builds a TF binding classifier by combining a set of weight matrix based classifiers, thus yielding a non-linear binding decision rule. The proposed approach was applied to the ChIP-chip data of Saccharomyces cerevisiae. When compared with the weight matrix method, our new approach showed significant improvements on the specificity in a majority of cases.
Contact: wwong@hsph.harvard.edu
Supplementary information: The software and the Supplementary data are available at hong2004/MotifBooster/.

1 INTRODUCTION
With the continuing explosive growth of sequenced genomes and genome-wide mRNA expression data, scientists are increasingly interested in modeling regulatory motifs and predicting binding targets of transcription factors (TFs). In this paper, we propose a discriminant approach that builds models to distinguish positive sequences (i.e. binding targets of a TF) from negative sequences (i.e. non-targets of a TF). Several approaches for this discriminant task have been proposed previously.
DMotifs applies an enumerative search of the motif space and reports the best motif as a feature of the sequences that best differentiates positive from negative sequences (Sinha, 2002). Vilo et al. (2000) used a binomial formula for a significance test to evaluate the occurrences of a motif in positive sequences against those in negative sequences. Similar to the approach of Vilo et al. (2000), the random selection null hypothesis approach in Barash et al. (2001) tests the significance of a motif against negative sequences based on a hypergeometric distribution. Takusagawa and Gifford (2004) extended the works of Vilo et al. (2000) and Barash et al. (2001) to consider the effects of the lengths of sequences. The above approaches report motifs as consensus words, which are arguably less sensitive and precise than the corresponding weight matrix representations (Stormo et al., 1982).

*To whom correspondence should be addressed.

Since the pioneering work of Stormo et al. (1982), the weight matrix model has become one of the most widely used models for representing motifs. A popular approach to estimating the parameters of a weight matrix de novo is to find a statistically enriched motif in positive sequences with respect to a background model (Stormo and Hartzell, 1989; Lawrence and Reilly, 1990; Lawrence et al., 1993; Liu et al., 1995; Barash et al., 2001). The background model, which usually is defined as an n-th order Markov model (n = 0, 1, 2 or 3), tries to capture all information in the non-binding sites, which are much more heterogeneous than the binding sites. Such a background model is so general that the weight matrix model tends to have very low specificity. To better identify the non-binding sites that are very similar to the binding sites, Workman and Stormo (2000) proposed a discriminant method called ANN-Spec, which uses a Perceptron model and Gibbs sampling to train the weight matrix. They showed that the weight matrix models output by ANN-Spec have higher specificity than those built by non-discriminant approaches, such as MEME (Bailey and Elkan, 1994).
A motif reported as a weight matrix assumes that different positions of the motif are independent. Under this assumption, a weight matrix is essentially a linear classifier when used with a cutoff value to predict binding sites in sequences. Recent biological studies have demonstrated that individual positions of binding sites are not always independent (Bulyk et al., 2001, 2002; Man and Stormo, 2001), and suggested that some TFs recognize their targets in a non-linear fashion. Barash et al. (2003) adopted Bayesian networks to model dependencies in binding motifs as trees and mixtures of trees. The Bayesian tree model is similar to the one used in an early work by Agarwal and Bafna (1998) to model the dependency between bases. It was recently reported (Zhou and Liu, 2004) that a simpler pair-correlation model can largely account for all observed correlations among motif positions, and that using such a model in conjunction with the Gibbs sampling method suffers no overfitting problem. However, such a model still cannot accommodate some non-linear factors in discriminating positive and negative sequences.

It is widely accepted that a TF participates in controlling the mRNA levels of its target genes through its binding sites in the corresponding promoter regions. Hence, the REDUCE method (Bussemaker et al., 2001) and Motif Regressor (Conlon et al., 2003) were proposed to discover motifs by associating motif abundances with real-valued changes in genome-wide expression data. The REDUCE method enumerates all K-mers (DNA segments of length K) and checks whether the combinatorial effects of a set of K-mers can be used to explain changes of gene-expression data in a regression manner.

The Author. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oupjournals.org

Motif Regressor first uses MDSCAN (Liu et al., 2002) to generate a large set of matrix-based motif candidates that are enriched in the promoter regions of genes with the highest fold changes in gene expression data. Then it uses regression analyses to select motif candidates that are most relevant to the change of gene expressions. Nevertheless, neither approach exploits the potential of using negative sequences to change the parameters of a motif so as to increase the specificity of the model.

We propose a novel discriminant approach to enhance TF-DNA binding models using the boosting technique. First, we use the ChIP-chip data to select positive and negative sequences. In ChIP-chip experiments, DNA is crosslinked in vivo to proteins at sites of DNA-protein interaction and sheared into fragments of 500 bp to 2 kb. The DNA-protein complexes are precipitated by antibodies specific to the TF of interest. The precipitated protein-bound DNA fragments are PCR amplified, fluorescently labeled and hybridized to microarrays containing every promoter (sometimes also every ORF) in the genome. DNA fragments that are consistently enriched by ChIP-chip over repeated experiments are identified as positive sequences containing the protein-DNA interacting loci at 1 kb resolution. When compared with the gene-expression data, the ChIP-chip data provide much more accurate information about the genome-wide location of in vivo TF-DNA interactions, which enables us to assign definitive class labels to some promoter sequences with high confidence. Consequently, we can model the TF-DNA binding problem as a classification problem. We modify the confidence-rated boosting (CRB) algorithm (Schapire and Singer, 1999) to train a TF-DNA binding classifier as an ensemble model, which is a weighted combination of a set of base classifiers. The modified CRB algorithm automatically decides the number of base classifiers to be used so as to avoid overfitting.
A key aspect of the boosting technique is that it forces some of the base classifiers to focus on the boundary between positive and negative samples, thus effectively reducing classification errors. We demonstrate the power of this approach by its performance on the ChIP-chip data of Saccharomyces cerevisiae (Lee et al., 2002).

2 METHODS
2.1 The ensemble model
We define a TF-DNA binding model as a weighted combination of a set of base classifiers $\{q_m(\cdot)\}$:

$$Q(S_i) = \sum_m \alpha_m q_m(S_i), \tag{1}$$

where $\alpha_m$ is the weight of $q_m(\cdot)$. The model weights can be normalized so that they sum up to 1. The class label of a DNA sequence $S_i$ is decided by $\operatorname{sign}(Q(S_i))$, with +1 denoting that $S_i$ is a positive sequence.

The base classifier has its root in the weight matrix method (Stormo et al., 1982). Let $f_m(\cdot)$ be the weight matrix model on which $q_m(\cdot)$ is based, and let the set $\{s_{ij}\}$ represent all K-mers in a DNA sequence $S_i$. The score of a K-mer $s_{ij}$, given $f_m(\cdot)$, is:

$$f_m(s_{ij}) = \sum_{k=1}^{K} \sum_{b \in \{A,C,G,T\}} w^m_{bk} I_{bk}(s_{ij}) - t, \tag{2}$$

where (1) $w^m_{bk}$ is the parameter (in the logarithm scale) of the model $f_m(\cdot)$ for the nucleotide b at position k; (2) $I_{bk}(s_{ij}) = 1$ if the k-th base of $s_{ij}$ is b and $I_{bk}(s_{ij}) = 0$ otherwise; (3) t is a threshold decided by some criteria (e.g. P-value). The higher the score, the more likely a site will be bound by the TF. The weight matrix model decides $s_{ij}$ as a target of the TF if $f_m(s_{ij}) > 0$ and a non-target site otherwise. We will show later that the threshold can be embedded into the parameter matrix $[w^m_{bk}]$.

In many situations (e.g. ChIP-chip experiments), we only have information about whether a DNA sequence is bound by a TF, but do not know which sites in the sequence the TF binds to. Hence, given a weight matrix, we need to derive a scoring function to assess the likelihood of a DNA sequence as a target of a TF. This score should be affected by: (1) the number of matching sites in the sequence; and (2) the degree of the match for each matching site.
The following function takes into account the above factors and scores a sequence as:

$$h_m(S_i) = \log \sum_{(r)} e^{f_m(s_{ir})}, \tag{3}$$

where the sum is over the r best matching K-mers. This equation is similar to that proposed by Motif Regressor (Conlon et al., 2003). However, we limit it to the best r sites to avoid favoring very long sequences. Details for deciding the value of r are explained in Section 3.2. The base classifier $q_m(\cdot)$ transforms the score of a sequence with a hyperbolic tangent function to a soft class prediction:

$$q_m(S_i) = \frac{1 - e^{-h_m(S_i)}}{1 + e^{-h_m(S_i)}} = \frac{\sum_{(r)} e^{f_m(s_{ir})} - 1}{\sum_{(r)} e^{f_m(s_{ir})} + 1}. \tag{4}$$

The hyperbolic tangent function is a scaled and biased logistic function, which has been used for motif site predictions (Barash et al., 2001; Segal et al., 2002).

2.2 Learn the ensemble model via boosting
We adopt the CRB algorithm (Schapire and Singer, 1999) to perform the following tasks in building an ensemble model $Q(\cdot)$: (1) deciding the number of linear classifiers $q_m(\cdot)$ in $Q(\cdot)$ and (2) learning the parameters of each $q_m(\cdot)$ and its weight $\alpha_m$. Loosely speaking, in the first round, the CRB algorithm assigns equal weights to all samples and trains the first base classifier. In each of the rounds that follow, the boosting procedure gives higher weights to previously misclassified samples and learns a new base classifier with its weight using the reweighted samples. The final classifier is a linear assembly of weighted base classifiers from each round.

We made some modifications to the CRB algorithm to serve our purpose better. The modified CRB algorithm is outlined in Figure 1. Our first change tries to accommodate the unbalanced training set (the number of negative samples is much larger than that of positive ones) by assigning larger initial weights to the positive samples. Second, to prevent overfitting, we reserve some training sequences for internal test during training. The details of our implementations are explained in the next section.
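The scoring chain of Equations (2)-(4) can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' software: the function names `site_score`, `sequence_score` and `base_classifier`, and the representation of the weight matrix as a list of per-position dictionaries, are assumptions made for the example.

```python
import math

def site_score(kmer, w, t=0.0):
    """Equation (2): sum of position-specific weights minus the threshold t.

    w is a hypothetical representation of the weight matrix: a list of K
    dicts, one per motif position, mapping a base to its log-scale weight.
    """
    return sum(w[k][b] for k, b in enumerate(kmer)) - t

def sequence_score(seq, w, r, t=0.0):
    """Equation (3): log of the summed exp(site score) over the r best K-mers."""
    K = len(w)
    scores = sorted(
        (site_score(seq[j:j + K], w, t) for j in range(len(seq) - K + 1)),
        reverse=True,
    )[:r]
    if not scores:  # sequence shorter than the motif: no site can match
        return float("-inf")
    top = scores[0]
    # log-sum-exp, shifted by the best score for numerical stability
    return top + math.log(sum(math.exp(s - top) for s in scores))

def base_classifier(seq, w, r, t=0.0):
    """Equation (4): soft class prediction in [-1, 1].

    (e^h - 1) / (e^h + 1) is algebraically equal to tanh(h / 2).
    """
    return math.tanh(sequence_score(seq, w, r, t) / 2.0)

# Toy weight matrix (assumed values) that favors the 3-mer "ACG".
toy_w = [{b: (1.0 if b == c else -1.0) for b in "ACGT"} for c in "ACG"]
print(base_classifier("TTACGTT", toy_w, r=1, t=2.0))  # positive: one strong site
print(base_classifier("TTTTTTT", toy_w, r=1, t=2.0))  # negative: no matching site
```

Restricting the sum to the r best sites keeps a long sequence with many weak matches from outscoring a short sequence with one strong match, which is the motivation given above for limiting Equation (3).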
(a) Randomly reserve part of the training data for internal test. The remaining n training sequences and their class labels are denoted as $(S_1, y_1), \ldots, (S_n, y_n)$; $y_i \in \{-1, 1\}$.
(b) Initialize the weights of sequences $d_i^{(1)}$ $(i = 1, \ldots, n)$.
(c) For $m = 1, \ldots, M$
  (c.1) Train the parameters of $q_m(\cdot)$ and its weight $\alpha_m$ using the weighted sequences with the weights $\{d_i^{(m)}\}$.
  (c.2) Update sequence weights:
  $$d_i^{(m+1)} = \frac{d_i^{(m)} \exp(-\alpha_m y_i q_m(S_i))}{\sum_j d_j^{(m)} \exp(-\alpha_m y_j q_m(S_j))}$$
  (c.3) Use the reserved data to check if the overall model overfits the training data. Roll back ($m = m - 1$) and stop if it overfits.
(d) Output the final model $Q(\cdot) = \sum_m \alpha_m q_m(\cdot)$.
Fig. 1. The modified boosting algorithm.

3 IMPLEMENTATION
3.1 Initialize the weights of sequences
In our study, the number of negative sequences (usually in thousands) is often much larger than the number of positive ones (usually <100). Without proper adjustments, negative sequences would overwhelm a classifier and reduce its capability of recognizing positive sequences. As a remedy, we constrain the total weight of the positive sequences to be equal to that of the negative sequences (step b in Fig. 1). The sequences within each class have equal weights. This in effect imposes a higher penalty for misclassifying a positive sequence than for misclassifying a negative one. Note that this heuristic is not equivalent to increasing the number of positive observations.

3.2 Learn base classifiers
The CRB algorithm (Schapire and Singer, 1999) is a Newton-like algorithm that constructs an ensemble model to minimize the upper bound on misclassification error

$$\mathrm{Err} = \sum_i d_i^{(1)} \exp(-y_i Q(S_i)), \tag{5}$$

where $d_i^{(1)}$ is the initial weight of $S_i$ and $y_i$ is the class label of $S_i$. Friedman et al. (2000) have discussed in detail the rationale for choosing the above criterion. In the m-th round, the CRB algorithm trains $q_m(\cdot)$ and its weight $\alpha_m$ to minimize the weighted error:

$$\varepsilon_m = \sum_i d_i^{(m)} \exp(-\alpha_m y_i q_m(S_i)), \tag{6}$$

where $d_i^{(m)}$ is the weight of $S_i$ in the m-th round. In our case, the parameters to be estimated in each round include $\alpha_m$, r and $[w^m_{bk}]$. Basically, at step c.1 in Figure 1, we increase r from 1 to R (currently R = 5) by the step size 1. For each value of r, the parameters $\alpha_m$ and $[w^m_{bk}]$ are initialized and refined to minimize the weighted error. Finally, the m-th round reports the values of r, $\alpha_m$ and $[w^m_{bk}]$ that correspond to the minimum weighted error.

3.2.1 Initialization
Since the motif must be an enriched pattern in the positive sequences, we take advantage of Motif Regressor (Conlon et al., 2003) to generate a good seed weight matrix for initializing $[w^m_{bk}]$. The seed weight matrix, reported by Motif Regressor, has the best correlation between the logarithm of the ChIP-chip P-value and the motif-matching score of all training sequences. Let $[w^0_{bk}]$ be the seed weight matrix. Given a value of r, we initialize $\alpha_m$ and $w^m_{bk}$ as $\alpha_m(0) = 1$ and $w^m_{bk}(0) = w^0_{bk} + (\sigma - t/K)$, respectively, where $\sigma$ is randomly generated in the range [-0.2, 0.2] and t is the threshold as in Equation (2).
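The class-balanced initialization of step (b), the weighted error of Equation (6) and the reweighting of step (c.2) in Figure 1 can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, and normalizing each class to a total weight of 0.5 (so that all weights sum to 1) is an assumption, since the text only requires the two class totals to be equal.

```python
import math

def initial_weights(labels):
    """Step (b): equal total weight per class, equal weights within a class.

    Assumes labels are +1 / -1 and that both classes are present. Each class
    receives total weight 0.5 (an assumed normalization).
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return [0.5 / n_pos if y == 1 else 0.5 / n_neg for y in labels]

def weighted_error(d, labels, preds, alpha):
    """Equation (6): sum_i d_i * exp(-alpha * y_i * q_m(S_i))."""
    return sum(di * math.exp(-alpha * y * q) for di, y, q in zip(d, labels, preds))

def update_weights(d, labels, preds, alpha):
    """Step (c.2): up-weight badly predicted sequences, then renormalize."""
    raw = [di * math.exp(-alpha * y * q) for di, y, q in zip(d, labels, preds)]
    z = sum(raw)  # equals the weighted error of Equation (6)
    return [x / z for x in raw]

# One positive and three negative sequences; preds are soft outputs q_m(S_i).
labels = [1, -1, -1, -1]
preds = [-0.5, -0.8, -0.9, 0.4]  # the positive and the last negative are wrong
d1 = initial_weights(labels)
d2 = update_weights(d1, labels, preds, alpha=1.0)
print(d1)
print(d2)
```

In this toy run the two misclassified sequences gain weight and the two correctly classified negatives lose weight, so the next base classifier is pushed toward the samples near the decision boundary.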
The value of t is determined as follows. We first use the matrix $[w^0_{bk} + \sigma]$ to score all sites in the training sequences and obtain the minimum and maximum site scores as $t_{\min}$ and $t_{\max}$. Then, we increase t from $t_{\min}$ to $t_{\max}$ by the step size 0.1 and select the value that corresponds to the minimum weighted error under the current values of r and $\alpha_m$.

3.2.2 Refinement
The parameters $[w^m_{bk}]$ and $\alpha_m$ are iteratively refined by a gradient-like method. In the n-th iteration ($n \geq 1$), use $[w^m_{bk}(n-1)]$ to find the best r sites in each sequence as its representative sites, and update $[w^m_{bk}(n)]$ and $\alpha_m(n)$ based on the corresponding gradients of the weighted error, i.e.:

$$w^m_{bk}(n) = w^m_{bk}(n-1) - \frac{\eta_1}{1 + n/10} \frac{\partial \varepsilon_m(n-1)}{\partial w^m_{bk}(n-1)}, \qquad \alpha_m(n) = \alpha_m(n-1) - \frac{\eta_2}{1 + n/10} \frac{\partial \varepsilon_m(n-1)}{\partial \alpha_m(n-1)}, \tag{7}$$

where the update rates are set as $\eta_1 = 0.05$ and $\eta_2 = 0.1$ based on our experience. The iteration stops if (1) the weighted error increases, (2) the improvement of the error is smaller than a preset threshold or (3) the maximum number of iterations (currently 100) is reached. Note that a site $s_{ij}$ is now scored as $\sum_{k=1}^{K} \sum_{b \in \{A,C,G,T\}} w^m_{bk}(n) I_{bk}(s_{ij})$, which is slightly different from Equation (2). The threshold t in Equation (2) is absorbed by $[w^m_{bk}(n)]$ and is updated implicitly.

3.3 Prevent overfitting
A main challenge with the small number of positive samples is that one can easily overtrain the classifiers. Our strategy to alleviate this effect is to reserve a subset of the negative training sequences (5% in our current setting) and one positive training sequence for internal validation during training. The sequences are randomly selected. The weight of each reserved sequence is set as the initial weight of a training sequence with the same class label. Overfitting is checked using the reserved data at step c.3 in Figure 1. The boosting procedure will stop if adding one more base classifier increases the error [as defined in Equation (5)] for the reserved sequence set.

Sometimes, the ensemble model may have only one base classifier, say $q_1(\cdot)$.
We build a base classifier $q_\upsilon(\cdot)$ with its parameters as $r_\upsilon$ and $[w^0_{bk} - t_\upsilon/K]$, where $r_\upsilon$ and $t_\upsilon$ are decided by the initialization method (without $\sigma$) described in Section 3.2. The weight of $q_\upsilon(\cdot)$ is set as 1. We compare $q_\upsilon(\cdot)$ with $q_1(\cdot)$ and choose the one with a smaller weighted error as defined in Equation (5). The rationale for this step is that the current way of training base classifiers may not find the best one. This limitation can be amended by a weighted combination of multiple base classifiers. If the final model has only one base classifier, $q_\upsilon(\cdot)$ could be a better alternative.

4 RESULTS
4.1 Data
We used the ChIP-chip data reported in Lee et al. (2002). Positive sequences are selected using a ChIP-chip P-value cutoff. At this cutoff selection, the false positive rate is 6-10% and the false negative rate is 33% (Lee et al., 2002). Although the data are still noisy, they are the best genome-wide data of in vivo TF-DNA binding localization so far. To avoid having too few positive samples, we also required that each selected TF should have at least 25 positive sequences. Forty TFs (Lee et al., 2002) satisfy these criteria. Negative sequences were selected using a ChIP-chip ratio cutoff of 1 together with a ChIP-chip P-value cutoff. Each selected TF has 3000 negative sequences. For each gene, we take its upstream sequence, up to 800 bp, not overlapping with the previous gene.

Table 1. Data summary and cross-validation results for 31 ChIP-chip data sets
Columns: 1, TF names; 2, number of positive sequences; 3, number of negative sequences; 4, number of base classifiers in the boosted classifier; 5, number of false positives FP_w using the weight matrix reported by Motif Regressor as a classifier; 6, number of false positives FP_b of the boosting method; 7, percentage of improvement of the boosting method over the weight matrix method, measured as (FP_w - FP_b)/FP_w.
[Table body not recoverable from the extraction. Row labels as extracted, with numeric suffixes lost: ABF, ACE, BAS, CAD, CBF, CIN, DAL, FHL, FKH, FKH, GCN, HAP, HSF, MBP, MCM, NRG, PDR, PHD, RAP, REB, RLM, SKN, SMP, STE, SUM, SWI, SWI, SWI, YAP, YAP, YAP.]

4.2 Boosting improves the specificity of motif models
To evaluate our method, we used the following cross-validation procedure. In each run, we leave one positive sequence and 5% of randomly selected negative sequences as the test data and train a classifier on the remaining data. This procedure is repeated 10 times for each positive sequence. The cross-validation error of each run is calculated as the number of false positives if the number of the false negatives is zero. The results are then averaged for all runs and compared. The detailed data, which include the sequence data, the ensemble models of the TFs, the logos of the ensemble models and all the test results, are available as the Supplementary data at hong2004/motifbooster/.

We used Motif Regressor (Conlon et al., 2003) to find the seed weight matrix. For each TF, Motif Regressor called MDSCAN (Liu et al., 2002) to find candidate motifs of width 6-17 bases. At each width, MDSCAN reported the best 20 weight matrices enriched in the positive training sequences. Each weight matrix was used to score the training sequences.
Motif Regressor then performed simple linear regression between the logarithm of ChIP-chip P-values and sequence scores. We chose the motif corresponding to the best regression P-value as our seed motif. We observed that Motif Regressor did not find significant enough motifs for nine TFs (DIG1, GAL4, GAT3, GCR2, IME4, IXR1, NND1, PHO4 and ROX1). It is possible that under the asynchronized growth condition, these TFs were not activated, or the modified tagged TFs have changed their binding characteristics.

Table 1 summarizes the results for the remaining 31 TFs. Compared with the weight matrix reported by Motif Regressor, the ensemble models performed markedly better in 27 cases and evenly in 4 cases (FKH1, FKH2, RLM1 and YAP6). A closer examination of the four even cases reveals that each ensemble model only has one base classifier that is a direct conversion from the initial weight matrix. The boosting approach also reported final models with a single base classifier in 5 of the 27 cases that performed better. These five TFs are CIN5, MBP1, NRG1, SKN7 and STE12. Since the base classifier is equivalent to a weight matrix model, these results indicate that using negative information can help discover better weight matrices in many cases. This is consistent with the findings of Workman and Stormo (2000). However, the first base classifier does not always perform better than the initial weight matrix.

Table 2. Contributions of the base classifiers (BCs) in the leave-one-out cross-validation tests
Columns 1-7 are the TF names, number of BCs in the ensemble model, number of false positives of the weight matrix (WM) method and number of false positives of the ensemble model when its first 1, 2, 3 and 4 BCs are used, respectively. We order the base classifiers in each ensemble model so that their weights are in descending order.
[Table body not recoverable from the extraction. Row labels as extracted, with numeric suffixes lost: ABF, ACE, BAS, CAD, CBF, DAL, FHL, GCN, HAP, HSF, MCM, PDR, PHD, RAP, REB, SMP, SUM, SWI, SWI, SWI, YAP, YAP.]

Table 2 summarizes the contributions of the base classifiers for the cases where the boosting method selected more than one base classifier. The base classifiers in the final models are arranged in descending order of their weights. The performances of 13 first base classifiers, i.e. the ones with the largest weights, are worse than those of the weight matrices reported by Motif Regressor. This may suggest that when the binding sites of a TF are heterogeneous and may be grouped into clusters, our boosting method finds base classifiers corresponding to different cluster profiles, whereas Motif Regressor reports an average profile. Thus, a single base classifier may be too specific to a particular cluster and may not discriminate well globally.

5 DISCUSSION
For some cases, the ensemble model can reveal dependencies among motif positions. For example, Figure 2a displays the weight matrix found by Motif Regressor for RAP1, from which we can see that C and T dominate in position 5, and A and G dominate in position 8. But there is no further information on how these two positions might correlate with each other. In contrast, our boosting approach selected three base classifiers (Fig. 2b-d) to compose the final model. Two base classifiers favored C and A in positions 5 and 8, respectively, whereas the third one preferred T and G in those positions, respectively. This observation implies that positions 5 and 8 may cooperate in a certain way such that the change in one position correlates with the change in the other. As another example, we observe that positions 1, 10 and 13 of the REB1 motif (Fig. 3) can be decomposed in a similar way. In its first base classifier, position 13 strongly prefers G; positions 1 and 10 are ambivalent about G and C, respectively. In the second base classifier, however, position 13 strongly disfavors G, and positions 1 and 10 strongly favor G and C, respectively. This suggests that the three positions may cooperate to facilitate the protein-DNA binding.

Fig. 2. Logos of the binding models of RAP1. (a) Position specific probability matrix. Logo of the weight matrix reported by Motif Regressor, drawn using the method of Schneider and Stephens (1990). (b), (c) and (d): Logos of the base classifiers 1, 2 and 3, respectively, in the ensemble model reported by the boosting approach (weight of base classifier 1 = 0.31; weight of base classifier 2 = 0.30; weight of base classifier 3 = 0.39). Base classifiers have negative parameters and cannot be visualized in the same way. (b), (c) and (d) are drawn in the following way. The height of a letter corresponds to the absolute magnitude of its weight scaled by a factor k (for visualization purposes, k = 3 for positive weights and k = 1 for negative weights). Letters are ordered by their weights. The black horizontal line represents zero. Letters above the zero line have positive weights, and those below the zero line have negative weights.

Fig. 3. Logos of the ensemble model of REB1. (a) The logo of base classifier 1 (weight = 0.52). (b) The logo of base classifier 2 (weight = 0.47).

The boosting approach terminates with an ensemble of 2-3 base classifiers for most cases. This is atypical for applications of the boosting technique, which usually can boost for hundreds to thousands of base classifiers. The small number of base classifiers could be due to three reasons.

The first reason might be the unbalanced training data (on the order of 100 positive versus 3000 negative sequences). We examined the sensitivity and specificity of each base classifier alone using the training samples (Fig. 4a). The sensitivity of base classifiers spreads out in the range of 40-90%, while their specificity concentrates in the range of 75-95%. This suggests that it is easier to train base classifiers to recognize negative samples in our case, although the negative samples are more heterogeneous than the positive ones. We modify the boosting algorithm by adding more initial weight to the positive samples such that the initial total weights of the two classes are equal. We note that although this method helps to bring out a less biased classifier, it is not equivalent to increasing the number of positive observations. As shown in Figure 4b, base classifiers with higher sensitivity tend to have lower generalization errors. A similar trend can be observed for the specificity of base classifiers in Figure 4c. Figure 5a shows that it is more likely to train base classifiers with relatively low training sensitivity and specificity when the number of positive sequences is small. Moreover, base classifiers trained with fewer positive samples are more likely to have higher generalization errors (Fig. 5b). Based on the above analyses, we reason that (1) base classifiers hardly overfit the training data in most cases and (2) the small number of positive samples does not provide enough information to boost for more base classifiers.

Second, the binding mechanisms of some TFs may indeed depend almost linearly on the nucleotide types at the motif positions. For example, ABF1 has a much larger positive sample size (176) when compared with other TFs. Both the weight matrix and the ensemble model of ABF1 have low and comparable generalization errors (Table 1). The ensemble model has two base classifiers. The training sensitivity/specificity of the base classifiers are 93.18/94.66% and 90.34/95.58%.
These results suggest that the binding mechanism of ABF1 may have little non-linearity because its samples can be well classified by linear decision rules including the weight matrix and the base classifiers. The base classifier becomes a strong learner (i.e. it can explain most of the training data) in such a case. On the other hand, the mild performances of many other base classifiers suggest that the binding mechanisms of some other TFs could have relatively high non-linearity.

Finally, our approach initializes a base classifier using a seed matrix. The successive refining step may only explore a limited subspace around the seed matrix. The training of base classifiers can be improved by a sampling-based de novo motif finding algorithm that is capable of exploring a wider range of the solution space (e.g. by sampling at multiple temperature levels). Or we can replace the base learner with a simpler one, e.g. a simple decision tree that uses rules like whether a position should be C or not. With the above modifications, the ensemble model could have more base classifiers and capture more comprehensive features that lead to better classification performance. Nonetheless, the resultant base classifiers could be very diverse. Some base classifiers could represent highly degenerate motifs. One potential drawback of this alternative is the loss of biological interpretability of the ensemble model. Although it is still not perfectly understood why the number of base classifiers is small, our approach provides a good balance between the interpretability and the performance of the boosted models. Another choice for improving the boosted models is to train each base classifier only on a randomly selected subset of the full training set, as suggested by Friedman (2002). It was reported that this kind of randomness has advantages in situations with small samples and powerful weak learners.

Fig. 4. (a) The training sensitivity (horizontal axis) versus specificity (vertical axis) plot of the base classifiers. Star, circle, diamond and pentagram denote the sensitivity/specificity of the base classifiers 1, 2, 3 and 4, respectively. (b) The cross-validation false positive rate (FPR) versus training sensitivity (horizontal axis) plot of base classifier 1. (c) The cross-validation FPR versus training specificity (horizontal axis) plot of base classifier 1.

Fig. 5. The result plots of the first base classifiers. (a) Training sensitivity (star) and specificity (circle) versus number of positive sequences (horizontal axis). (b) Cross-validation FPR versus number of positive sequences (horizontal axis).

6 CONCLUSION
We introduce a boosting-based method for modeling TF-DNA binding. By repeatedly fitting weight matrix based classifiers to weighted samples that focus on erroneous classifications, the boosting approach can build a more accurate TF-DNA binding model as a weighted combination of the base classifiers. The proposed approach was applied to the ChIP-chip data of S.cerevisiae and showed significant improvements on specificity in many cases. Like many recent studies that use mRNA microarray data to help refine regulatory binding motifs and infer combinatorial rules of transcription regulation (W. Wang et al., submitted for publication; Beer and Tavazoie, 2004), we found that ChIP-chip data can be used to further refine motif models and reveal novel features of TF-DNA interactions. Currently, we use Motif Regressor to generate the seed motif for boosting. However, our algorithm is not limited to working with Motif Regressor and can be used to boost weight matrices reported by any motif finding algorithm.

ACKNOWLEDGEMENTS
The work of W.H.W. is supported by NIH-HG. The work of J.S.L. is supported by NIH-P20-CA96470 and NSF DMS. The work of P.H. is supported by NIH-GM. We thank the anonymous reviewers for constructive suggestions that helped us to unify the way to initialize and train base classifiers and inspired us to think hard on the overfitting issue of the ensemble models.

REFERENCES
Agarwal,P.K. and Bafna,V. (1998) Detecting non-adjoining correlations with signals in DNA. In Proceedings of the Second Annual International Conference on Research in Computational Molecular Biology, March 22-25, 1998, New York, USA. ACM Press.
Bailey,T.L. and Elkan,C. (1994) Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol. Biol., 2.
Barash,Y. et al. (2001) A simple hyper-geometric approach for discovering putative transcription factor binding sites. In Algorithms in Bioinformatics: Proceedings of the 1st International Workshop, LNCS 2149.
Barash,Y. et al. (2003) Modeling dependencies in protein-DNA binding sites. In Proceedings of the 7th Annual International Conference on Computational Molecular Biology (RECOMB 2003), Berlin, Germany, ACM Press, NY.
Beer,M.A. and Tavazoie,S. (2004) Predicting gene expression from sequence. Cell, 117.
Bulyk,M.L. et al. (2001) Exploring the DNA-binding specificities of zinc fingers with DNA microarrays. Proc. Natl Acad. Sci. USA, 98.
Bulyk,M.L. et al. (2002) Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors. Nucleic Acids Res., 30.
Bussemaker,H.J. et al. (2001) Regulatory element detection using correlation with expression. Nat. Genet., 27.
Conlon,E.M. et al. (2003) Integrating regulatory motif discovery and genome-wide expression analysis. Proc. Natl Acad. Sci. USA, 100.
Friedman,J.H. (2002) Stochastic gradient boosting. Comput. Stat. Data Anal., 38.
Friedman,J.H. et al. (2000) Additive logistic regression: a statistical view of boosting (With discussion and a rejoinder by the authors). Ann. Statist., 28.
Lawrence,C.E. et al. (1993) Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science, 262.
Lawrence,C.E. and Reilly,A.A. (1990) An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins, 7.
Lee,T.I. et al. (2002) Transcriptional regulatory networks in Saccharomyces cerevisiae. Science, 298.
Liu,J.S. et al. (1995) Bayesian models for multiple local sequence alignment and Gibbs sampling strategies. J. Am. Stat. Assoc., 90.
Liu,X.S. et al. (2002) An algorithm for finding protein-DNA binding sites with applications to chromatin immunoprecipitation microarray experiments. Nat. Biotechnol., 20.
Man,T.K. and Stormo,G.D. (2001) Non-independence of Mnt repressor-operator interaction determined by a new quantitative multiple fluorescence relative affinity (QuMFRA) assay. Nucleic Acids Res., 29.
Schapire,R. and Singer,Y. (1999) Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37.
Schneider,T.D. and Stephens,R.M. (1990) Sequence logos: a new way to display consensus sequences. Nucleic Acids Res., 18.
Segal,E. et al. (2002) From promoter sequence to expression: a probabilistic framework. In Proceedings of the 6th International Conference on Research in Computational Molecular Biology (RECOMB 02), Washington, DC, ACM Press.
Sinha,S. (2002) Discriminative motifs. In Proceedings of the 6th International Conference on Research in Computational Molecular Biology (RECOMB 02), Washington, DC, ACM Press.
Stormo,G.D. and Hartzell,G.W.III (1989) Identifying protein-binding sites from unaligned DNA fragments. Proc. Natl Acad. Sci. USA, 86.
Stormo,G.D. et al. (1982) Use of the Perceptron algorithm to distinguish translational initiation sites in E.coli. Nucleic Acids Res., 10.
Takusagawa,K. and Gifford,D. (2004) Negative information for motif discovery. Pac. Symp. Biocomput.
Vilo,J. et al. (2000) Mining for putative regulatory elements in the yeast genome using gene expression data. Proc. Int. Conf. Intell. Syst. Mol. Biol., 8.
Workman,C.T. and Stormo,G.D. (2000) ANN-Spec: a method for discovering transcription factor binding sites with improved specificity. Pac. Symp. Biocomput.
Zhou,Q. and Liu,J. (2004) Modeling within-motif dependence for transcription factor binding site predictions. Bioinformatics, 20.

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

CS 534: Computer Vision Model Fitting

CS 534: Computer Vision Model Fitting CS 534: Computer Vson Model Fttng Sprng 004 Ahmed Elgammal Dept of Computer Scence CS 534 Model Fttng - 1 Outlnes Model fttng s mportant Least-squares fttng Maxmum lkelhood estmaton MAP estmaton Robust

More information

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science EECS 730 Introducton to Bonformatcs Sequence Algnment Luke Huan Electrcal Engneerng and Computer Scence http://people.eecs.ku.edu/~huan/ HMM Π s a set of states Transton Probabltes a kl Pr( l 1 k Probablty

More information

y and the total sum of

y and the total sum of Lnear regresson Testng for non-lnearty In analytcal chemstry, lnear regresson s commonly used n the constructon of calbraton functons requred for analytcal technques such as gas chromatography, atomc absorpton

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

Data Mining: Model Evaluation

Data Mining: Model Evaluation Data Mnng: Model Evaluaton Aprl 16, 2013 1 Issues: Evaluatng Classfcaton Methods Accurac classfer accurac: predctng class label predctor accurac: guessng value of predcted attrbutes Speed tme to construct

More information

Feature Reduction and Selection

Feature Reduction and Selection Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components

More information

Predicting Transcription Factor Binding Sites with an Ensemble of Hidden Markov Models

Predicting Transcription Factor Binding Sites with an Ensemble of Hidden Markov Models Vol. 3, No. 1, Fall, 2016, pp. 1-10 ISSN 2158-835X (prnt), 2158-8368 (onlne), All Rghts Reserved Predctng Transcrpton Factor Bndng Stes wth an Ensemble of Hdden Markov Models Yngle Song 1 and Albert Y.

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

Online Detection and Classification of Moving Objects Using Progressively Improving Detectors

Online Detection and Classification of Moving Objects Using Progressively Improving Detectors Onlne Detecton and Classfcaton of Movng Objects Usng Progressvely Improvng Detectors Omar Javed Saad Al Mubarak Shah Computer Vson Lab School of Computer Scence Unversty of Central Florda Orlando, FL 32816

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Three supervised learning methods on pen digits character recognition dataset

Three supervised learning methods on pen digits character recognition dataset Three supervsed learnng methods on pen dgts character recognton dataset Chrs Flezach Department of Computer Scence and Engneerng Unversty of Calforna, San Dego San Dego, CA 92093 cflezac@cs.ucsd.edu Satoru

More information

Announcements. Supervised Learning

Announcements. Supervised Learning Announcements See Chapter 5 of Duda, Hart, and Stork. Tutoral by Burge lnked to on web page. Supervsed Learnng Classfcaton wth labeled eamples. Images vectors n hgh-d space. Supervsed Learnng Labeled eamples

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

Context-Specific Bayesian Clustering for Gene Expression Data

Context-Specific Bayesian Clustering for Gene Expression Data Context-Specfc Bayesan Clusterng for Gene Expresson Data Yoseph Barash School of Computer Scence & Engneerng Hebrew Unversty, Jerusalem, 91904, Israel hoan@cs.huj.ac.l Nr Fredman School of Computer Scence

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and

More information

CHAPTER 3 SEQUENTIAL MINIMAL OPTIMIZATION TRAINED SUPPORT VECTOR CLASSIFIER FOR CANCER PREDICTION

CHAPTER 3 SEQUENTIAL MINIMAL OPTIMIZATION TRAINED SUPPORT VECTOR CLASSIFIER FOR CANCER PREDICTION 48 CHAPTER 3 SEQUENTIAL MINIMAL OPTIMIZATION TRAINED SUPPORT VECTOR CLASSIFIER FOR CANCER PREDICTION 3.1 INTRODUCTION The raw mcroarray data s bascally an mage wth dfferent colors ndcatng hybrdzaton (Xue

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

USING GRAPHING SKILLS

USING GRAPHING SKILLS Name: BOLOGY: Date: _ Class: USNG GRAPHNG SKLLS NTRODUCTON: Recorded data can be plotted on a graph. A graph s a pctoral representaton of nformaton recorded n a data table. t s used to show a relatonshp

More information

Detection of an Object by using Principal Component Analysis

Detection of an Object by using Principal Component Analysis Detecton of an Object by usng Prncpal Component Analyss 1. G. Nagaven, 2. Dr. T. Sreenvasulu Reddy 1. M.Tech, Department of EEE, SVUCE, Trupath, Inda. 2. Assoc. Professor, Department of ECE, SVUCE, Trupath,

More information

Machine Learning 9. week

Machine Learning 9. week Machne Learnng 9. week Mappng Concept Radal Bass Functons (RBF) RBF Networks 1 Mappng It s probably the best scenaro for the classfcaton of two dataset s to separate them lnearly. As you see n the below

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

EYE CENTER LOCALIZATION ON A FACIAL IMAGE BASED ON MULTI-BLOCK LOCAL BINARY PATTERNS

EYE CENTER LOCALIZATION ON A FACIAL IMAGE BASED ON MULTI-BLOCK LOCAL BINARY PATTERNS P.G. Demdov Yaroslavl State Unversty Anatoly Ntn, Vladmr Khryashchev, Olga Stepanova, Igor Kostern EYE CENTER LOCALIZATION ON A FACIAL IMAGE BASED ON MULTI-BLOCK LOCAL BINARY PATTERNS Yaroslavl, 2015 Eye

More information

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET 1 BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET TZU-CHENG CHUANG School of Electrcal and Computer Engneerng, Purdue Unversty, West Lafayette, Indana 47907 SAUL B. GELFAND School

More information

Range images. Range image registration. Examples of sampling patterns. Range images and range surfaces

Range images. Range image registration. Examples of sampling patterns. Range images and range surfaces Range mages For many structured lght scanners, the range data forms a hghly regular pattern known as a range mage. he samplng pattern s determned by the specfc scanner. Range mage regstraton 1 Examples

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

An Image Fusion Approach Based on Segmentation Region

An Image Fusion Approach Based on Segmentation Region Rong Wang, L-Qun Gao, Shu Yang, Yu-Hua Cha, and Yan-Chun Lu An Image Fuson Approach Based On Segmentaton Regon An Image Fuson Approach Based on Segmentaton Regon Rong Wang, L-Qun Gao, Shu Yang 3, Yu-Hua

More information

A Robust Method for Estimating the Fundamental Matrix

A Robust Method for Estimating the Fundamental Matrix Proc. VIIth Dgtal Image Computng: Technques and Applcatons, Sun C., Talbot H., Ourseln S. and Adraansen T. (Eds.), 0- Dec. 003, Sydney A Robust Method for Estmatng the Fundamental Matrx C.L. Feng and Y.S.

More information

Fast Feature Value Searching for Face Detection

Fast Feature Value Searching for Face Detection Vol., No. 2 Computer and Informaton Scence Fast Feature Value Searchng for Face Detecton Yunyang Yan Department of Computer Engneerng Huayn Insttute of Technology Hua an 22300, Chna E-mal: areyyyke@63.com

More information

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics Introducton G10 NAG Fortran Lbrary Chapter Introducton G10 Smoothng n Statstcs Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Smoothng Methods... 2 2.2 Smoothng Splnes and Regresson

More information

Hierarchical clustering for gene expression data analysis

Hierarchical clustering for gene expression data analysis Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Decson surface s a hyperplane (lne n 2D) n feature space (smlar to the Perceptron) Arguably, the most mportant recent dscovery n machne learnng In a nutshell: map the data to a predetermned

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

Fitting & Matching. Lecture 4 Prof. Bregler. Slides from: S. Lazebnik, S. Seitz, M. Pollefeys, A. Effros.

Fitting & Matching. Lecture 4 Prof. Bregler. Slides from: S. Lazebnik, S. Seitz, M. Pollefeys, A. Effros. Fttng & Matchng Lecture 4 Prof. Bregler Sldes from: S. Lazebnk, S. Setz, M. Pollefeys, A. Effros. How do we buld panorama? We need to match (algn) mages Matchng wth Features Detect feature ponts n both

More information

Simulation: Solving Dynamic Models ABE 5646 Week 11 Chapter 2, Spring 2010

Simulation: Solving Dynamic Models ABE 5646 Week 11 Chapter 2, Spring 2010 Smulaton: Solvng Dynamc Models ABE 5646 Week Chapter 2, Sprng 200 Week Descrpton Readng Materal Mar 5- Mar 9 Evaluatng [Crop] Models Comparng a model wth data - Graphcal, errors - Measures of agreement

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

The Research of Support Vector Machine in Agricultural Data Classification

The Research of Support Vector Machine in Agricultural Data Classification The Research of Support Vector Machne n Agrcultural Data Classfcaton Le Sh, Qguo Duan, Xnmng Ma, Me Weng College of Informaton and Management Scence, HeNan Agrcultural Unversty, Zhengzhou 45000 Chna Zhengzhou

More information

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines A Modfed Medan Flter for the Removal of Impulse Nose Based on the Support Vector Machnes H. GOMEZ-MORENO, S. MALDONADO-BASCON, F. LOPEZ-FERRERAS, M. UTRILLA- MANSO AND P. GIL-JIMENEZ Departamento de Teoría

More information

Machine Learning. Topic 6: Clustering

Machine Learning. Topic 6: Clustering Machne Learnng Topc 6: lusterng lusterng Groupng data nto (hopefully useful) sets. Thngs on the left Thngs on the rght Applcatons of lusterng Hypothess Generaton lusters mght suggest natural groups. Hypothess

More information

Learning Ensemble of Local PDM-based Regressions. Yen Le Computational Biomedicine Lab Advisor: Prof. Ioannis A. Kakadiaris

Learning Ensemble of Local PDM-based Regressions. Yen Le Computational Biomedicine Lab Advisor: Prof. Ioannis A. Kakadiaris Learnng Ensemble of Local PDM-based Regressons Yen Le Computatonal Bomedcne Lab Advsor: Prof. Ioanns A. Kakadars 1 Problem statement Fttng a statstcal shape model (PDM) for mage segmentaton Callosum segmentaton

More information

Face Detection with Deep Learning

Face Detection with Deep Learning Face Detecton wth Deep Learnng Yu Shen Yus122@ucsd.edu A13227146 Kuan-We Chen kuc010@ucsd.edu A99045121 Yzhou Hao y3hao@ucsd.edu A98017773 Mn Hsuan Wu mhwu@ucsd.edu A92424998 Abstract The project here

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

Using Neural Networks and Support Vector Machines in Data Mining

Using Neural Networks and Support Vector Machines in Data Mining Usng eural etworks and Support Vector Machnes n Data Mnng RICHARD A. WASIOWSKI Computer Scence Department Calforna State Unversty Domnguez Hlls Carson, CA 90747 USA Abstract: - Multvarate data analyss

More information

Backpropagation: In Search of Performance Parameters

Backpropagation: In Search of Performance Parameters Bacpropagaton: In Search of Performance Parameters ANIL KUMAR ENUMULAPALLY, LINGGUO BU, and KHOSROW KAIKHAH, Ph.D. Computer Scence Department Texas State Unversty-San Marcos San Marcos, TX-78666 USA ae049@txstate.edu,

More information

Exercises (Part 4) Introduction to R UCLA/CCPR. John Fox, February 2005

Exercises (Part 4) Introduction to R UCLA/CCPR. John Fox, February 2005 Exercses (Part 4) Introducton to R UCLA/CCPR John Fox, February 2005 1. A challengng problem: Iterated weghted least squares (IWLS) s a standard method of fttng generalzed lnear models to data. As descrbed

More information

Biostatistics 615/815

Biostatistics 615/815 The E-M Algorthm Bostatstcs 615/815 Lecture 17 Last Lecture: The Smplex Method General method for optmzaton Makes few assumptons about functon Crawls towards mnmum Some recommendatons Multple startng ponts

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc.

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 0974-74 Volume 0 Issue BoTechnology 04 An Indan Journal FULL PAPER BTAIJ 0() 04 [684-689] Revew on Chna s sports ndustry fnancng market based on market -orented

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

Active Contours/Snakes

Active Contours/Snakes Actve Contours/Snakes Erkut Erdem Acknowledgement: The sldes are adapted from the sldes prepared by K. Grauman of Unversty of Texas at Austn Fttng: Edges vs. boundares Edges useful sgnal to ndcate occludng

More information

Intelligent Information Acquisition for Improved Clustering

Intelligent Information Acquisition for Improved Clustering Intellgent Informaton Acquston for Improved Clusterng Duy Vu Unversty of Texas at Austn duyvu@cs.utexas.edu Mkhal Blenko Mcrosoft Research mblenko@mcrosoft.com Prem Melvlle IBM T.J. Watson Research Center

More information

Journal of Process Control

Journal of Process Control Journal of Process Control (0) 738 750 Contents lsts avalable at ScVerse ScenceDrect Journal of Process Control j ourna l ho me pag e: wwwelsevercom/locate/jprocont Decentralzed fault detecton and dagnoss

More information

Learning-based License Plate Detection on Edge Features

Learning-based License Plate Detection on Edge Features Learnng-based Lcense Plate Detecton on Edge Features Wng Teng Ho, Woo Hen Yap, Yong Haur Tay Computer Vson and Intellgent Systems (CVIS) Group Unverst Tunku Abdul Rahman, Malaysa wngteng_h@yahoo.com, woohen@yahoo.com,

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS46: Mnng Massve Datasets Jure Leskovec, Stanford Unversty http://cs46.stanford.edu /19/013 Jure Leskovec, Stanford CS46: Mnng Massve Datasets, http://cs46.stanford.edu Perceptron: y = sgn( x Ho to fnd

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

Self-tuning Histograms: Building Histograms Without Looking at Data

Self-tuning Histograms: Building Histograms Without Looking at Data Self-tunng Hstograms: Buldng Hstograms Wthout Lookng at Data Ashraf Aboulnaga Computer Scences Department Unversty of Wsconsn - Madson ashraf@cs.wsc.edu Surajt Chaudhur Mcrosoft Research surajtc@mcrosoft.com

More information

Journal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article. A selective ensemble classification method on microarray data

Journal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article. A selective ensemble classification method on microarray data Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(6):2860-2866 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A selectve ensemble classfcaton method on mcroarray

More information

5 The Primal-Dual Method

5 The Primal-Dual Method 5 The Prmal-Dual Method Orgnally desgned as a method for solvng lnear programs, where t reduces weghted optmzaton problems to smpler combnatoral ones, the prmal-dual method (PDM) has receved much attenton

More information

A Semi-parametric Regression Model to Estimate Variability of NO 2

A Semi-parametric Regression Model to Estimate Variability of NO 2 Envronment and Polluton; Vol. 2, No. 1; 2013 ISSN 1927-0909 E-ISSN 1927-0917 Publshed by Canadan Center of Scence and Educaton A Sem-parametrc Regresson Model to Estmate Varablty of NO 2 Meczysław Szyszkowcz

More information

Detection of hand grasping an object from complex background based on machine learning co-occurrence of local image feature

Detection of hand grasping an object from complex background based on machine learning co-occurrence of local image feature Detecton of hand graspng an object from complex background based on machne learnng co-occurrence of local mage feature Shnya Moroka, Yasuhro Hramoto, Nobutaka Shmada, Tadash Matsuo, Yoshak Shra Rtsumekan

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

A Background Subtraction for a Vision-based User Interface *

A Background Subtraction for a Vision-based User Interface * A Background Subtracton for a Vson-based User Interface * Dongpyo Hong and Woontack Woo KJIST U-VR Lab. {dhon wwoo}@kjst.ac.kr Abstract In ths paper, we propose a robust and effcent background subtracton

More information

Adaptive Regression in SAS/IML

Adaptive Regression in SAS/IML Adaptve Regresson n SAS/IML Davd Katz, Davd Katz Consultng, Ashland, Oregon ABSTRACT Adaptve Regresson algorthms allow the data to select the form of a model n addton to estmatng the parameters. Fredman

More information

Application of Maximum Entropy Markov Models on the Protein Secondary Structure Predictions

Application of Maximum Entropy Markov Models on the Protein Secondary Structure Predictions Applcaton of Maxmum Entropy Markov Models on the Proten Secondary Structure Predctons Yohan Km Department of Chemstry and Bochemstry Unversty of Calforna, San Dego La Jolla, CA 92093 ykm@ucsd.edu Abstract

More information

Unsupervised Learning

Unsupervised Learning Pattern Recognton Lecture 8 Outlne Introducton Unsupervsed Learnng Parametrc VS Non-Parametrc Approach Mxture of Denstes Maxmum-Lkelhood Estmates Clusterng Prof. Danel Yeung School of Computer Scence and

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

Incremental Learning with Support Vector Machines and Fuzzy Set Theory

Incremental Learning with Support Vector Machines and Fuzzy Set Theory The 25th Workshop on Combnatoral Mathematcs and Computaton Theory Incremental Learnng wth Support Vector Machnes and Fuzzy Set Theory Yu-Mng Chuang 1 and Cha-Hwa Ln 2* 1 Department of Computer Scence and

More information

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed

More information
