BIOINFORMATICS ORIGINAL PAPER
Vol. 21, doi: /bioinformatics/bti402

Sequence analysis

A boosting approach for motif modeling using ChIP-chip data

Pengyu Hong 1, X. Shirley Liu 2, Qing Zhou 1, Xin Lu 2, Jun S. Liu 1,2 and Wing H. Wong 1,2,*

1 Department of Statistics, Harvard University, Cambridge, MA 02138, USA and 2 Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA

Received on July 30, 2004; revised on January 10, 2005; accepted on March 21, 2005
Advance Access publication April 7, 2005

ABSTRACT

Motivation: Building an accurate binding model for a transcription factor (TF) is essential to differentiate its true binding targets from spurious ones. This is an important step toward understanding gene regulation.

Results: This paper describes a boosting approach to modeling TF-DNA binding. Different from the widely used weight matrix model, which predicts TF-DNA binding based on a linear combination of position-specific contributions, our approach builds a TF binding classifier by combining a set of weight matrix based classifiers, thus yielding a non-linear binding decision rule. The proposed approach was applied to the ChIP-chip data of Saccharomyces cerevisiae. When compared with the weight matrix method, our new approach showed significant improvements on the specificity in a majority of cases.

Contact: wwong@hsph.harvard.edu

Supplementary information: The software and the Supplementary data are available at hong2004/MotifBooster/.

1 INTRODUCTION

With the continuing explosive growth of sequenced genomes and genome-wide mRNA expression data, scientists are increasingly interested in modeling regulatory motifs and predicting binding targets of transcription factors (TFs). In this paper, we propose a discriminant approach that builds models to distinguish positive sequences (i.e. binding targets of a TF) from negative sequences (i.e. non-targets of a TF). Several approaches for this discriminant task have been proposed previously.
DMotifs applies an enumerative search of the motif space and reports the best motif as a feature of the sequences that best differentiates positive from negative sequences (Sinha, 2002). Vilo et al. (2000) used a binomial formula for significance test to evaluate the occurrences of a motif in positive sequences against those in negative sequences. Similar to the approach of Vilo et al. (2000), the random selection null hypothesis approach in Barash et al. (2001) tests the significance of a motif against negative sequences based on a hypergeometric distribution. Takusagawa and Gifford (2004) extended the works of Vilo et al. (2000) and Barash et al. (2001) to consider the effects of the lengths of sequences. The above approaches report motifs as consensus words, which are arguably less sensitive and precise than the corresponding weight matrix representations (Stormo et al., 1982).

*To whom correspondence should be addressed.

Since the pioneering work of Stormo et al. (1982), the weight matrix model has become one of the most widely used models for representing motifs. A popular approach to estimating the parameters of a weight matrix de novo is to find a statistically enriched motif in positive sequences with respect to a background model (Stormo and Hartzell, 1989; Lawrence and Reilly, 1990; Lawrence et al., 1993; Liu et al., 1995; Barash et al., 2001). The background model, which usually is defined as an n-th order Markov model (n = 0, 1, 2 or 3), tries to capture all information in the non-binding sites, which are much more heterogeneous than the binding sites. Such a background model is so general that the weight matrix model tends to have very low specificity. To better identify the non-binding sites that are very similar to the binding sites, Workman and Stormo (2000) proposed a discriminant method called ANN-Spec, which uses a Perceptron model and Gibbs sampling to train the weight matrix. They showed that the weight matrix models output by ANN-Spec have higher specificity than those built by non-discriminant approaches, such as MEME (Bailey and Elkan, 1994).
A motif reported as a weight matrix assumes that different positions of the motif are independent. Under this assumption, a weight matrix is essentially a linear classifier when used with a cutoff value to predict binding sites in sequences. Recent biological studies have demonstrated that individual positions of binding sites are not always independent (Bulyk et al., 2001, 2002; Man and Stormo, 2001), and suggested that some TFs recognize their targets in a non-linear fashion. Barash et al. (2003) adopted Bayesian networks to model dependencies in binding motifs as trees and mixtures of trees. The Bayesian tree model is similar to the one used in an early work by Agarwal and Bafna (1998) to model the dependency between bases. It was recently reported (Zhou and Liu, 2004) that a simpler pair-correlation model can largely account for all observed correlations among motif positions, and that using such a model in conjunction with the Gibbs sampling method suffers no overfitting problem. However, such a model still cannot accommodate some non-linear factors in discriminating positive and negative sequences.

It is widely accepted that a TF participates in controlling the mRNA levels of its target genes through its binding sites in the corresponding promoter regions. Hence, the REDUCE method (Bussemaker et al., 2001) and Motif Regressor (Conlon et al., 2003) were proposed to discover motifs by associating motif abundances with real-valued changes in genome-wide expression data. The REDUCE method enumerates all K-mers (DNA segments of length K) and checks whether the combinatorial effects of a set of K-mers can be used to explain changes of gene-expression data in a regression manner.

The Author. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oupjournals.org
Motif Regressor first uses MDSCAN (Liu et al., 2002) to generate a large set of matrix-based motif candidates that are enriched in the promoter regions of genes with the highest fold changes in gene expression data. Then it uses regression analyses to select motif candidates that are most relevant to the change of gene expressions. Nevertheless, neither approach exploits the potential of using negative sequences to change the parameters of a motif so as to increase the specificity of the model.

We propose a novel discriminant approach to enhance TF-DNA binding models using the boosting technique. First, we use the ChIP-chip data to select positive and negative sequences. In ChIP-chip experiments, DNA is crosslinked in vivo to proteins at sites of DNA-protein interaction and sheared to 500 bp–2 kb fragments. The DNA-protein complexes are precipitated by antibodies specific to the TF of interest. The precipitated protein-bound DNA fragments are PCR amplified, fluorescently labeled and hybridized to microarrays containing every promoter (sometimes also every ORF) in the genome. DNA fragments that are consistently enriched by ChIP-chip over repeated experiments are identified as positive sequences containing the protein-DNA interacting loci at 1 kb resolution. When compared with the gene-expression data, the ChIP-chip data provide much more accurate information about the genome-wide location of in vivo TF-DNA interactions, which enables us to assign definitive class labels to some promoter sequences with high confidence. Consequently, we can model the TF-DNA binding problem as a classification problem.

We modify the confidence-rated boosting (CRB) algorithm (Schapire and Singer, 1999) to train a TF-DNA binding classifier as an ensemble model, which is a weighted combination of a set of base classifiers. The modified CRB algorithm automatically decides the number of base classifiers to be used so as to avoid overfitting.
A key aspect of the boosting technique is that it forces some of the base classifiers to focus on the boundary between positive and negative samples, thus effectively reducing classification errors. We demonstrate the power of this approach by its performance on the ChIP-chip data of Saccharomyces cerevisiae (Lee et al., 2002).

2 METHODS

2.1 The ensemble model

We define a TF-DNA binding model as a weighted combination of a set of base classifiers $\{q_m(\cdot)\}$:

$$Q(S) = \sum_m \alpha_m q_m(S), \qquad (1)$$

where $\alpha_m$ is the weight of $q_m(\cdot)$. The model weights can be normalized so that they sum up to 1. The class label of a DNA sequence $S$ is decided by $\mathrm{sign}(Q(S))$, with $+1$ denoting that $S$ is a positive sequence.

The base classifier has its root in the weight matrix method (Stormo et al., 1982). Let $f_m(\cdot)$ be the weight matrix model on which $q_m(\cdot)$ is based, and let the set $\{s_j\}$ represent all K-mers in a DNA sequence $S$. The score of a K-mer $s_j$, given $f_m(\cdot)$, is:

$$f_m(s_j) = \sum_{k=1}^{K} \sum_{b \in \{A,C,G,T\}} w^m_{kb} I_{kb}(s_j) - t, \qquad (2)$$

where (1) $w^m_{kb}$ is the parameter (in the logarithm scale) of the model $f_m(\cdot)$ for the nucleotide $b$ at position $k$; (2) $I_{kb}(s_j) = 1$ if the $k$-th base of $s_j$ is $b$ and $I_{kb}(s_j) = 0$ otherwise; (3) $t$ is a threshold decided by some criteria (e.g. P-value). The higher the score, the more likely a site will be bound by the TF. The weight matrix model decides $s_j$ as a target of the TF if $f_m(s_j) > 0$ and a non-target site otherwise. We will show later that the threshold can be embedded into the parameter matrix $[w^m_{kb}]$.

In many situations (e.g. ChIP-chip experiments), we only have information about whether a DNA sequence is bound by a TF, but do not know which sites in the sequence the TF binds to. Hence, given a weight matrix, we need to derive a scoring function to assess the likelihood of a DNA sequence as a target of a TF. This score should be affected by: (1) the number of matching sites in the sequence; and (2) the degree of the match for each matching site.
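As a minimal sketch (not the authors' implementation), the K-mer score of Equation (2) might be coded as follows. The names `kmer_score`, `all_kmers` and `toy_w`, the toy matrix values and the threshold `t = 2.0` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# One dict of log-scale weights per motif position; the values below are made up.
WeightMatrix = List[Dict[str, float]]


def kmer_score(kmer: str, w: WeightMatrix, t: float) -> float:
    """Equation (2): position-specific weights summed over the K-mer, minus t."""
    if len(kmer) != len(w):
        raise ValueError("K-mer length must equal the motif width K")
    return sum(w[k][base] for k, base in enumerate(kmer)) - t


def all_kmers(seq: str, K: int) -> List[str]:
    """Every overlapping K-mer s_j in the sequence S."""
    return [seq[j:j + K] for j in range(len(seq) - K + 1)]


# Toy 3-column matrix that favours the word "ACG" (hypothetical values).
toy_w: WeightMatrix = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
]
kmers = all_kmers("TACGA", 3)
scores = [kmer_score(s, toy_w, t=2.0) for s in kmers]
# A site is called a target when its score is positive.
predicted_sites = [s for s, f in zip(kmers, scores) if f > 0]
```

Because exactly one indicator is 1 at each position, the double sum reduces to one table lookup per base, which is what the generator expression does.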
The following function takes the above factors into account and scores a sequence as:

$$h_m(S) = \log \sum_{(r)} e^{f_m(s_r)}, \qquad (3)$$

where the sum is over the $r$ best matching K-mers. This equation is similar to that proposed by Motif Regressor (Conlon et al., 2003). However, we limit it to the best $r$ sites to avoid favoring very long sequences. Details for deciding the value of $r$ are explained in Section 3.2. The base classifier $q_m(\cdot)$ transforms the score of a sequence with a hyperbolic tangent function to a soft class prediction:

$$q_m(S) = \frac{1 - e^{-h_m(S)}}{1 + e^{-h_m(S)}} = \frac{\sum_{(r)} e^{f_m(s_r)} - 1}{\sum_{(r)} e^{f_m(s_r)} + 1}. \qquad (4)$$

The hyperbolic tangent function is a scaled and biased logistic function, which has been used for motif site predictions (Barash et al., 2001; Segal et al., 2002).

2.2 Learn the ensemble model via boosting

We adopt the CRB algorithm (Schapire and Singer, 1999) to perform the following tasks in building an ensemble model $Q(\cdot)$: (1) deciding the number of linear classifiers $q_m(\cdot)$ in $Q(\cdot)$ and (2) learning the parameters of each $q_m(\cdot)$ and its weight $\alpha_m$. Loosely speaking, in the first round, the CRB algorithm assigns equal weights to all samples and trains the first base classifier. In each of the rounds that follow, the boosting procedure gives higher weights to previously misclassified samples and learns a new base classifier with its weight using the reweighted samples. The final classifier is a linear combination of the weighted base classifiers from each round.

We made some modifications to the CRB algorithm to serve our purpose better. The modified CRB algorithm is outlined in Figure 1. Our first change tries to accommodate the unbalanced training set (the number of negative samples is much larger than that of positive ones) by assigning larger initial weights to the positive samples. Second, to prevent overfitting, we reserve some training sequences for internal test during training. The details of our implementations are explained in the next section.
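The sequence score of Equation (3), the base classifier of Equation (4) and the ensemble of Equation (1) can be sketched as below. This is an illustrative sketch only: the function names and the toy site scores are assumptions, and each base classifier is assumed to receive the already computed K-mer scores $f_m(s_j)$ of the sequence.

```python
import math
from typing import List, Sequence


def sequence_score(site_scores: Sequence[float], r: int) -> float:
    """Equation (3): log of the summed exp(f) over the r best-matching K-mers."""
    best = sorted(site_scores, reverse=True)[:r]
    return math.log(sum(math.exp(f) for f in best))


def base_classifier(site_scores: Sequence[float], r: int) -> float:
    """Equation (4): (e^h - 1) / (e^h + 1), which equals tanh(h / 2)."""
    return math.tanh(sequence_score(site_scores, r) / 2.0)


def ensemble(all_site_scores: List[Sequence[float]],
             alphas: Sequence[float], rs: Sequence[int]) -> float:
    """Equation (1): Q(S) = sum_m alpha_m * q_m(S); sign(Q) gives the class label."""
    return sum(a * base_classifier(f, r)
               for a, f, r in zip(alphas, all_site_scores, rs))


# One base classifier applied to three K-mer scores, using only the best site.
q = base_classifier([-5.0, 1.0, -5.0], r=1)

# Two hypothetical base classifiers voting on the same sequence.
Q = ensemble([[1.0, -2.0], [-1.0, -3.0]], alphas=[0.6, 0.4], rs=[1, 1])
label = 1 if Q > 0 else -1
```

With $r = 1$ the soft prediction is positive exactly when the best site has $f_m > 0$, so a single base classifier reproduces the weight matrix decision rule; the non-linearity arises only once several base classifiers are combined.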
3 IMPLEMENTATION

3.1 Initialize the weights of sequences

In our study, the number of negative sequences (usually in thousands) is often much larger than the number of positive ones (usually <100). Without proper adjustments, negative sequences would overwhelm a classifier and reduce its capability of recognizing positive sequences. As a remedy, we constrain the total weight of the positive sequences to be equal to that of the negative sequences (step b in Fig. 1). The sequences within each class have equal weights. This in effect imposes a higher penalty for misclassifying a positive sequence than for misclassifying a negative one. Note that this heuristic is not equivalent to increasing the number of positive observations.

3.2 Learn base classifiers

The CRB algorithm (Schapire and Singer, 1999) is a Newton-like algorithm that constructs an ensemble model to minimize the upper bound on misclassification error

$$\mathrm{Err} = \sum_i d_i^{(1)} \exp(-y_i Q(S_i)), \qquad (5)$$
where $d_i^{(1)}$ is the initial weight of $S_i$ and $y_i$ is the class label of $S_i$. Friedman et al. (2000) give a detailed discussion of the rationale for choosing the above criterion. In the $m$-th round, the CRB algorithm trains $q_m(\cdot)$ and its weight $\alpha_m$ to minimize the weighted error:

$$\varepsilon_m = \sum_i d_i^{(m)} \exp(-\alpha_m y_i q_m(S_i)), \qquad (6)$$

where $d_i^{(m)}$ is the weight of $S_i$ in the $m$-th round. In our case, the parameters to be estimated in each round include $\alpha_m$, $r$ and $[w^m_{kb}]$. Basically, at step c.1 in Figure 1, we increase $r$ from 1 to $R$ (currently $R = 5$) by the step size 1. For each value of $r$, the parameters $\alpha_m$ and $[w^m_{kb}]$ are initialized and refined to minimize the weighted error. Finally, the $m$-th round reports the values of $r$, $\alpha_m$ and $[w^m_{kb}]$ that correspond to the minimum weighted error.

(a) Randomly reserve part of the training data for internal test. The remaining $n$ training sequences and their class labels are denoted as $(S_1, y_1), \ldots, (S_n, y_n)$; $y_i \in \{-1, 1\}$.
(b) Initialize the weights of sequences $d_i^{(1)}$ $(i = 1, \ldots, n)$.
(c) For $m = 1, \ldots, M$
    (c.1) Train the parameters of $q_m(\cdot)$ and its weight $\alpha_m$ using the weighted sequences with the weights $\{d_i^{(m)}\}$.
    (c.2) Update sequence weights: $d_i^{(m+1)} = \dfrac{d_i^{(m)} \exp(-\alpha_m y_i q_m(S_i))}{\sum_j d_j^{(m)} \exp(-\alpha_m y_j q_m(S_j))}$
    (c.3) Use the reserved data to check if the overall model overfits the training data. Roll back ($m = m - 1$) and stop if it overfits.
(d) Output the final model $Q(\cdot) = \sum_m \alpha_m q_m(\cdot)$.

Fig. 1. The modified boosting algorithm.

Initialization

Since the motif must be an enriched pattern in the positive sequences, we take advantage of Motif Regressor (Conlon et al., 2003) to generate a good seed weight matrix for initializing $[w^m_{kb}]$. The seed weight matrix, reported by Motif Regressor, has the best correlation between the logarithm of ChIP-chip P-value and motif-matching score of all training sequences. Let $[w^0_{kb}]$ be the seed weight matrix. Given a value of $r$, we initialize $\alpha_m$ and $[w^m_{kb}]$ as $\alpha_m(0) = 1$ and $w^m_{kb}(0) = w^0_{kb} + (\sigma_{kb} - t/K)$, respectively, where $\sigma_{kb}$ is randomly generated in the range $[-0.2, 0.2]$ and $t$ is the threshold as in Equation (2).
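Steps (b) and (c.2) of Figure 1 and the weighted error of Equation (6) can be illustrated with a short sketch. It is not the authors' code: the function names are hypothetical, and giving each class a total weight of exactly 0.5 (so that all weights sum to 1) is an assumption, since the text only requires the two class totals to be equal.

```python
import math
from typing import List


def initial_weights(labels: List[int]) -> List[float]:
    """Step (b): equal total weight per class (assumed 0.5 each), equal within class."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return [0.5 / n_pos if y == 1 else 0.5 / n_neg for y in labels]


def weighted_error(d: List[float], labels: List[int],
                   preds: List[float], alpha: float) -> float:
    """Equation (6): sum_i d_i * exp(-alpha * y_i * q(S_i))."""
    return sum(di * math.exp(-alpha * y * q) for di, y, q in zip(d, labels, preds))


def update_weights(d: List[float], labels: List[int],
                   preds: List[float], alpha: float) -> List[float]:
    """Step (c.2): up-weight misclassified samples, then renormalise to sum to 1."""
    raw = [di * math.exp(-alpha * y * q) for di, y, q in zip(d, labels, preds)]
    z = sum(raw)
    return [v / z for v in raw]


# One positive and three negative sequences; preds are soft outputs q(S_i) in (-1, 1).
labels = [1, -1, -1, -1]
preds = [-0.5, -0.8, -0.9, 0.3]   # the positive and the last negative are misclassified
d1 = initial_weights(labels)
d2 = update_weights(d1, labels, preds, alpha=1.0)
```

In this toy case the single positive sequence starts with weight 0.5 while each negative starts with 1/6, and after one update the two misclassified sequences gain weight at the expense of the two correctly classified ones.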
The value of $t$ is determined as follows. We first use the matrix $[w^0_{kb} + \sigma_{kb}]$ to score all sites in the training sequences and obtain the minimum and maximum site scores as $t_{\min}$ and $t_{\max}$. Then, we increase $t$ from $t_{\min}$ to $t_{\max}$ by the step size 0.1 and select the value that corresponds to the minimum weighted error under the current values of $r$ and $\alpha_m$.

Refinement

The parameters $[w^m_{kb}]$ and $\alpha_m$ are iteratively refined by a gradient-like method. In the $n$-th iteration ($n \geq 1$), we use $[w^m_{kb}(n-1)]$ to find the best $r$ sites in each sequence as its representative sites, and update $[w^m_{kb}(n)]$ and $\alpha_m(n)$ based on the corresponding gradients of the weighted error, i.e.:

$$w^m_{kb}(n) = w^m_{kb}(n-1) - \frac{\eta_1}{1 + n/10} \, \frac{\partial \varepsilon_m(n-1)}{\partial w^m_{kb}(n-1)},$$
$$\alpha_m(n) = \alpha_m(n-1) - \frac{\eta_2}{1 + n/10} \, \frac{\partial \varepsilon_m(n-1)}{\partial \alpha_m(n-1)}, \qquad (7)$$

where the update rates are set as $\eta_1 = 0.05$ and $\eta_2 = 0.1$ based on our experience. The iteration stops if (1) the weighted error increases, (2) the improvement of the error falls below a small threshold or (3) the maximum number of iterations (currently 100) is reached. Note that a site $s_j$ is now scored as $\sum_{k=1}^{K} \sum_{b \in \{A,C,G,T\}} w^m_{kb}(n) I_{kb}(s_j)$, which is slightly different from Equation (2). The threshold $t$ in Equation (2) is absorbed by $[w^m_{kb}(n)]$ and is updated implicitly.

3.3 Prevent overfitting

A main challenge with the small number of positive samples is that one can easily overtrain the classifiers. Our strategy to alleviate this effect is to reserve a subset of the negative training sequences (5% in our current setting) and one positive training sequence for internal validation during training. The sequences are randomly selected. The weight of each reserved sequence is set as the initial weight of a training sequence with the same class label. Overfitting is checked using the reserved data at step c.3 in Figure 1. The boosting procedure will stop if adding one more base classifier increases the error [as defined in Equation (5)] for the reserved sequence set.

Sometimes, the ensemble model may have only one base classifier, say $q_1(\cdot)$.
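The stopping rule of step (c.3) can be sketched as a boosting loop that checks the reserved data after every round. This is a hedged illustration rather than the authors' implementation: `train_round` is a hypothetical stand-in for step (c.1), samples are reduced to a single toy feature, the training weights start uniform for brevity, and always keeping the first base classifier is an assumption.

```python
import math
from typing import Callable, List, Tuple

Sample = Tuple[float, int]        # (toy feature x, label y in {-1, +1})
Base = Callable[[float], float]   # soft prediction q(x) in (-1, 1)
Model = List[Tuple[float, Base]]  # [(alpha_m, q_m), ...]
Trainer = Callable[[List[Sample], List[float]], Tuple[float, Base]]


def predict(model: Model, x: float) -> float:
    """Q(x) = sum_m alpha_m * q_m(x)."""
    return sum(alpha * q(x) for alpha, q in model)


def exp_loss(model: Model, data: List[Sample], d: List[float]) -> float:
    """Equation (5) evaluated on `data` with fixed weights d."""
    return sum(di * math.exp(-y * predict(model, x)) for di, (x, y) in zip(d, data))


def boost(train_round: Trainer, train: List[Sample], held_out: List[Sample],
          held_w: List[float], max_rounds: int) -> Model:
    d = [1.0 / len(train)] * len(train)   # uniform here; the paper balances classes
    model: Model = []
    best = float("inf")                   # so the first classifier is always kept
    for _ in range(max_rounds):
        alpha, q = train_round(train, d)                 # step (c.1)
        candidate = model + [(alpha, q)]
        loss = exp_loss(candidate, held_out, held_w)     # step (c.3)
        if loss > best:
            break                                        # roll back: discard candidate
        model, best = candidate, loss
        raw = [di * math.exp(-alpha * y * q(x)) for di, (x, y) in zip(d, train)]
        z = sum(raw)
        d = [v / z for v in raw]                         # step (c.2)
    return model


# Scripted stand-in learner: its third classifier increases the held-out loss.
rounds = iter([(1.0, math.tanh), (0.5, math.tanh), (1.0, lambda x: -math.tanh(x))])
data = [(1.0, 1), (-1.0, -1)]   # reused as held-out data only to keep the toy small
model = boost(lambda tr, d: next(rounds), data, data, [0.5, 0.5], max_rounds=3)
```

The weight update is performed after the held-out check here, which gives the same result as the order in Figure 1 because a rejected round is rolled back entirely.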
We build a base classifier $q_\upsilon(\cdot)$ with its parameters as $r_\upsilon$ and $[w^0_{kb} - t_\upsilon/K]$, where $r_\upsilon$ and $t_\upsilon$ are decided by the initialization method (without $\sigma$) described in Section 3.2. The weight of $q_\upsilon(\cdot)$ is set as 1. We compare $q_\upsilon(\cdot)$ with $q_1(\cdot)$ and choose the one with a smaller weighted error as defined in Equation (5). The rationale for this step is that the current way of training base classifiers may not find the best one. This limitation can be amended by a weighted combination of multiple base classifiers. If the final model has only one base classifier, $q_\upsilon(\cdot)$ could be a better alternative.

4 RESULTS

4.1 Data

We used the ChIP-chip data reported in Lee et al. (2002). Positive sequences are selected using the ChIP-chip P-value as the cutoff. At this cutoff selection, the false positive rate is 6–10% and the false negative rate is 33% (Lee et al., 2002). Although the data are still noisy, they are the best genome-wide data of in vivo TF-DNA binding localization so far. To avoid having too few positive samples, we also required that each selected TF should have at least 25 positive sequences. Forty TFs (Lee et al., 2002) satisfy these criteria. Negative sequences were selected as those with ChIP-chip
ratio and P-value beyond the chosen cutoffs (ratio cutoff of 1). Each selected TF has 3000 negative sequences. For each gene, we take its upstream sequence, up to 800 bp, not overlapping with the previous gene.

Table 1. Data summary and cross-validation results for 31 ChIP-chip data
Columns: (1) TF name; (2) number of positive sequences; (3) number of negative sequences; (4) number of base classifiers in the boosted classifier; (5) number of false positives $FP_w$ using the weight matrix reported by Motif Regressor as a classifier; (6) number of false positives $FP_b$ of the boosting method; (7) percentage of improvement of the boosting method over the weight matrix method, measured as $(FP_w - FP_b)/FP_w$.

4.2 Boosting improves the specificity of motif models

To evaluate our method, we used the following cross-validation procedure. In each run, we leave one positive sequence and 5% of randomly selected negative sequences as the test data and train a classifier on the remaining data. This procedure is repeated 10 times for each positive sequence. The cross-validation error of each run is calculated as the number of false positives if the number of false negatives is zero. The results are then averaged over all runs and compared. The detailed data, which include the sequence data, the ensemble models of the TFs, the logos of the ensemble models and all the test results, are available as the Supplementary data at hong2004/motifbooster/.

We used Motif Regressor (Conlon et al., 2003) to find the seed weight matrix. For each TF, Motif Regressor called MDSCAN (Liu et al., 2002) to find candidate motifs of width 6–17 bases. At each width, MDSCAN reported the best 20 weight matrices enriched in the positive training sequences. Each weight matrix was used to score the training sequences.
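The cross-validation error described in Section 4.2 and the improvement measure in column 7 of Table 1 can be sketched as follows. This reflects one reading of the text, stated here as an assumption: the decision cutoff is placed at the score of the left-out positive sequence so that there are no false negatives, and the test negatives scoring at least that high are counted as false positives. The function names and toy scores are hypothetical.

```python
from typing import List


def false_positives_at_zero_fn(pos_score: float, neg_scores: List[float]) -> int:
    """Count test negatives that score at least as high as the left-out positive."""
    return sum(1 for s in neg_scores if s >= pos_score)


def average_fp(per_run_fp: List[int]) -> float:
    """Average the false-positive counts over all cross-validation runs."""
    return sum(per_run_fp) / len(per_run_fp)


def improvement_percent(fp_w: float, fp_b: float) -> float:
    """Column 7 of Table 1: (FP_w - FP_b) / FP_w, expressed as a percentage."""
    return 100.0 * (fp_w - fp_b) / fp_w


# Toy run: one left-out positive and four test negatives (made-up scores).
fp = false_positives_at_zero_fn(0.4, [0.1, 0.5, 0.4, -0.2])
gain = improvement_percent(fp_w=10.0, fp_b=4.0)
```

Under this reading a lower count means the model separates the left-out positive from the negatives more cleanly, which is why the comparison in Table 1 is expressed as a reduction in false positives.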
Motif Regressor then performed simple linear regression between the logarithm of ChIP-chip P-values and sequence scores. We chose the motif corresponding to the best regression P-value as our seed motif. We observed that Motif Regressor did not find significant enough motifs for nine TFs (DIG1, GAL4, GAT3, GCR2, IME4, IXR1, NND1, PHO4 and ROX1). It is possible that under the asynchronized growth condition, these TFs were not activated, or the modified tagged TFs have changed their binding characteristics.

Table 1 summarizes the results for the remaining 31 TFs. Compared with the weight matrix reported by Motif Regressor, the ensemble models performed markedly better in 27 cases and equally in 4 cases (FKH1, FKH2, RLM1 and YAP6). A closer examination of the four even cases reveals that each ensemble model only has one base classifier that is a direct conversion from the initial weight matrix. The boosting approach also reported final models with a single base classifier in 5 of the 27 cases that performed better. These five TFs are CIN5, MBP1, NRG1, SKN7 and STE12. Since the base classifier is equivalent to a weight matrix model, these results indicate that
using negative information can help discover better weight matrices in many cases. This is consistent with the findings of Workman and Stormo (2000). However, the first base classifier does not always perform better than the initial weight matrix. Table 2 summarizes the contributions of the base classifiers for the cases where the boosting method selected more than one base classifier. The base classifiers in the final models are arranged in the descending order of their weights. The performances of 13 first base classifiers, i.e. the ones with the largest weights, are worse than those of the weight matrices reported by Motif Regressor. This may suggest that when the binding sites of a TF are heterogeneous and may be grouped into clusters, our boosting method finds base classifiers corresponding to different cluster profiles, whereas Motif Regressor reports an average profile. Thus, a single base classifier may be too specific to a particular cluster and does not discriminate well globally.

Table 2. Contributions of the base classifiers (BCs) in the leave-one-out cross-validation tests
Columns: (1) TF name; (2) number of BCs in the ensemble model; (3) average number of false positives of the weight matrix (WM) method; (4)–(7) average number of false positives of the ensemble model when its first 1, 2, 3 and 4 BCs are used, respectively. We order the base classifiers in each ensemble model so that their weights are in the descending order.

5 DISCUSSION

For some cases, the ensemble model can reveal dependencies among motif positions. For example, Figure 2a displays the weight matrix found by Motif Regressor for RAP1, from which we can see that C and T dominate in position 5, and A and G dominate in position 8. But there is no further information on how these two positions might correlate with each other. In contrast, our boosting approach selected three base classifiers (Fig.
2b–d) to compose the final model. Two base classifiers favored C and A in positions 5 and 8, respectively, whereas the third one preferred T and G in those positions, respectively. This observation implies that positions 5 and 8 may cooperate in a certain way such that the change in one position correlates with the change in the other. As another example, we observe that positions 1, 10 and 13 of the REB1 motif (Fig. 3) can be decomposed in a similar way. In its first base classifier, position 13 strongly prefers G; positions 1 and 10 are ambivalent about G and C, respectively. In the second base classifier, however, position 13 strongly disfavors G, and positions 1 and 10 strongly favor G and C, respectively. This suggests that the three positions may cooperate to facilitate the protein-DNA binding.

The boosting approach terminates with an ensemble of 2–3 base classifiers for most cases. This is atypical for applications of the boosting technique, which usually can boost for hundreds to thousands of base classifiers. The small number of base classifiers could be due to three reasons.

The first reason might be the unbalanced training data (on the order of 100 positive versus 3000 negative sequences). We examined the sensitivity and specificity of each base classifier alone using the training samples (Fig. 4a). The sensitivity of base classifiers spreads out in the range of 40–90%, while their specificity concentrates in the range of 75–95%. This suggests that it is easier to train base classifiers to recognize negative samples in our case, although the negative samples are more heterogeneous than the positive ones. We modify the boosting algorithm by adding more initial weights to the positive samples such that the initial total weights of the two classes are equal. We note that although this method helps to bring out a less biased classifier, it is not equivalent to increasing the number of positive observations. As shown in Figure 4b, base classifiers with higher sensitivity tend to have lower generalization errors. A similar trend can be observed for the specificity of base classifiers in Figure 4c. Figure 5a shows that it is more
likely to train base classifiers with relatively low training sensitivity and specificity when the size of positive sequences is small. Moreover, base classifiers trained with fewer positive samples are more likely to have higher generalization errors (Fig. 5b). Based on the above analyses, we reason that (1) base classifiers hardly overfit the training data in most cases and (2) the small size of positive samples does not provide enough information to boost for more base classifiers.

Fig. 2. Logos of the binding models of RAP1. (a) Position specific probability matrix. Logo of the weight matrix reported by Motif Regressor, drawn using the method of Schneider and Stephens (1990). (b), (c) and (d): Logos of the base classifiers 1, 2 and 3, respectively, in the ensemble model reported by the boosting approach (weight of base classifier 1 = 0.31; weight of base classifier 2 = 0.30; weight of base classifier 3 = 0.39). Base classifiers have negative parameters and cannot be visualized in the same way. (b), (c) and (d) are drawn in the following way. The height of a letter corresponds to the absolute magnitude of its weight scaled by a factor k (for visualization purpose, k = 3 for positive weights and k = 1 for negative weights). Letters are ordered by their weights. The black horizontal line represents zero. Letters above the zero line have positive weights, and those below the zero line have negative weights.

Fig. 3. Logos of the ensemble model of REB1. (a) The logo of base classifier 1 (weight = 0.52). (b) The logo of base classifier 2 (weight = 0.47).

Second, the binding mechanisms of some TFs may indeed depend almost linearly on the nucleotide types of the motif positions. For example, ABF1 has a much larger positive sample size (176) when compared with other TFs. Both the weight matrix and the ensemble model of ABF1 have low and comparable generalization errors (Table 1). The ensemble model has two base classifiers. The training sensitivity/specificity of the base classifiers are 93.18/94.66% and 90.34/95.58%.
These results suggest that the binding mechanism of ABF1 may have little non-linearity because its samples can be well classified by linear decision rules including the weight matrix and the base classifiers. The base classifier becomes a strong learner (i.e. it can explain most of the training data) in such a case. On the other hand, the mild performances of many other base classifiers suggest that the binding mechanisms of some other TFs could have relatively high non-linearity.

Finally, our approach initializes a base classifier using a seed matrix. The successive refining step may only explore a limited subspace around the seed matrix. The training of base classifiers can be improved by a sampling-based de novo motif finding algorithm that is capable of exploring a wider range of the solution space (e.g. by sampling at multiple temperature levels). Or we can replace the base learner with a simpler one, e.g. a simple decision tree that uses rules like whether a position should be C or not. With the above modifications, the ensemble model could have more base classifiers and capture more comprehensive features that lead to better classification performance. Nonetheless, the resultant base classifiers could be very diverse. Some base classifiers could represent highly degenerated motifs. One potential drawback of this alternative is the loss of biological interpretability of the ensemble model. Although it is still not perfectly understood why the number of base classifiers is small, our approach provides a good balance between the interpretability and the performance of the boosted models. Another choice for improving the boosted models is to train each base classifier only on a randomly selected subset of the full training set, as suggested
by Friedman (2002). It was reported that this kind of randomness has advantages in situations with small samples and powerful weak learners.

Fig. 4. (a) The training sensitivity (horizontal axis) versus specificity (vertical axis) plot of the base classifiers. Star, circle, diamond and pentagram denote the sensitivity/specificity of base classifiers 1, 2, 3 and 4, respectively. (b) The cross-validation false positive rate (FPR) versus training sensitivity (horizontal axis) plot of base classifier 1. (c) The cross-validation FPR versus training specificity (horizontal axis) plot of base classifier 1.

Fig. 5. The result plots of the first base classifiers. (a) Training sensitivity (star) and specificity (circle) versus number of positive sequences (horizontal axis). (b) Cross-validation FPR versus number of positive sequences (horizontal axis).

6 CONCLUSION

We introduce a boosting-based method for modeling TF-DNA binding. By repeatedly fitting weight matrix based classifiers to weighted samples that focus on erroneous classifications, the boosting approach can build a more accurate TF-DNA binding model as a weighted combination of the base classifiers. The proposed approach was applied to the ChIP-chip data of S.cerevisiae and showed significant improvements on specificity in many cases. Like many recent studies that use mRNA microarray data to help refine regulatory binding motifs and infer combinatorial rules of transcription regulation (W. Wang et al., submitted for publication; Beer and
Tavazoie, 2004), we found that ChIP-chip data can be used to further refine motif models and reveal novel features of TF-DNA interactions. Currently, we use Motif Regressor to generate the seed motif for boosting. However, our algorithm is not limited to working with Motif Regressor and can be used to boost weight matrices reported by any motif finding algorithm.

ACKNOWLEDGEMENTS

The work of W.H.W. is supported by NIH-HG. The work of J.S.L. is supported by NIH-P20-CA96470 and NSF DMS. The work of P.H. is supported by NIH-GM. We thank the anonymous reviewers for constructive suggestions that helped us to unify the way to initialize and train base classifiers and inspired us to think hard on the overfitting issue of the ensemble models.

REFERENCES

Agarwal,P.K. and Bafna,V. (1998) Detecting non-adjoining correlations with signals in DNA. In Proceedings of the Second Annual International Conference on Research in Computational Molecular Biology, March 22–25, 1998, New York, USA. ACM Press.
Bailey,T.L. and Elkan,C. (1994) Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol. Biol., 2.
Barash,Y. et al. (2001) A simple hyper-geometric approach for discovering putative transcription factor binding sites. In Algorithms in Bioinformatics: Proceedings of the 1st International Workshop, LNCS 2149.
Barash,Y. et al. (2003) Modeling dependencies in protein-DNA binding sites. In Proceedings of the 7th Annual International Conference on Computational Molecular Biology (RECOMB 2003), Berlin, Germany, ACM Press, NY.
Beer,M.A. and Tavazoie,S. (2004) Predicting gene expression from sequence. Cell, 117.
Bulyk,M.L. et al. (2001) Exploring the DNA-binding specificities of zinc fingers with DNA microarrays. Proc. Natl Acad. Sci. USA, 98.
Bulyk,M.L. et al. (2002) Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors. Nucleic Acids Res., 30.
Bussemaker,H.J. et al. (2001) Regulatory element detection using correlation with expression. Nat. Genet., 27.
Conlon,E.M. et al. (2003) Integrating regulatory motif discovery and genome-wide expression analysis. Proc. Natl Acad. Sci. USA, 100.
Friedman,J.H. (2002) Stochastic gradient boosting. Comput. Stat. Data Anal., 38.
Friedman,J.H. et al. (2000) Additive logistic regression: a statistical view of boosting (With discussion and a rejoinder by the authors). Ann. Statist., 28.
Lawrence,C.E. et al. (1993) Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science, 262.
Lawrence,C.E. and Reilly,A.A. (1990) An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins, 7.
Lee,T.I. et al. (2002) Transcriptional regulatory networks in Saccharomyces cerevisiae. Science, 298.
Liu,J.S. et al. (1995) Bayesian models for multiple local sequence alignment and Gibbs sampling strategies. J. Am. Stat. Assoc., 90.
Liu,X.S. et al. (2002) An algorithm for finding protein-DNA binding sites with applications to chromatin immunoprecipitation microarray experiments. Nat. Biotechnol., 20.
Man,T.K. and Stormo,G.D. (2001) Non-independence of Mnt repressor-operator interaction determined by a new quantitative multiple fluorescence relative affinity (QuMFRA) assay. Nucleic Acids Res., 29.
Schapire,R. and Singer,Y. (1999) Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37.
Schneider,T.D. and Stephens,R.M. (1990) Sequence logos: a new way to display consensus sequences. Nucleic Acids Res., 18.
Segal,E. et al. (2002) From promoter sequence to expression: a probabilistic framework. In Proceedings of the 6th International Conference on Research in Computational Molecular Biology (RECOMB 02), Washington, DC, ACM Press.
Sinha,S. (2002) Discriminative motifs. In Proceedings of the 6th International Conference on Research in Computational Molecular Biology (RECOMB 02), Washington, DC, ACM Press.
Stormo,G.D. and Hartzell,G.W.III (1989) Identifying protein-binding sites from unaligned DNA fragments. Proc. Natl Acad. Sci. USA, 86.
Stormo,G.D. et al. (1982) Use of the Perceptron algorithm to distinguish translational initiation sites in E.coli. Nucleic Acids Res., 10.
Takusagawa,K. and Gifford,D. (2004) Negative information for motif discovery. Pac. Symp. Biocomput.
Vilo,J. et al. (2000) Mining for putative regulatory elements in the yeast genome using gene expression data. Proc. Int. Conf. Intell. Syst. Mol. Biol., 8.
Workman,C.T. and Stormo,G.D. (2000) ANN-Spec: a method for discovering transcription factor binding sites with improved specificity. Pac. Symp. Biocomput.
Zhou,Q. and Liu,J. (2004) Modeling within-motif dependence for transcription factor binding site predictions. Bioinformatics, 20.