A Hidden Markov Model Variant for Sequence Classification

Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence

Sam Blasiak and Huzefa Rangwala
Computer Science, George Mason University
sblasiak@gmu.edu, rangwala@cs.gmu.edu
Funding: NSF III

Abstract

Sequence classification is central to many practical problems within machine learning. Distance metrics between arbitrary pairs of sequences can be hard to define because sequences can vary in length and the information contained in the order of sequence elements is lost when standard metrics such as Euclidean distance are applied. We present a scheme that employs a Hidden Markov Model variant to produce a set of fixed-length description vectors from a set of sequences. We then define three inference algorithms, a Baum-Welch variant, a Gibbs Sampling algorithm, and a variational algorithm, to infer model parameters. Finally, we show experimentally that the fixed-length representation produced by these inference methods is useful for classifying sequences of amino acids into structural classes.

1 Introduction

The need to operate on sequence data is prevalent in a variety of real-world applications, ranging from protein/DNA classification, speech recognition, and intrusion detection to text classification. Sequence data can be distinguished from the more typical vector representation in that the length of sequences within a dataset can vary and that the order of symbols within a sequence carries meaning. For sequence classification, a variety of strategies, depending on the problem type, can be used to map sequences to a representation that can be handled by traditional classifiers. A simple technique involves selecting a fixed number of elements from the sequence and then using those elements as a fixed-length vector in the classification engine. In another technique, a small subsequence length, l, is selected, and a size M^l vector is constructed containing the counts of all length-l subsequences from the original sequence. This vector can then be used for classification [Leslie et al., 2002]. A third method for classifying sequence data requires only a positive definite mapping defined over pairs of sequences rather than any direct mapping of sequences to vectors. This strategy, known as the kernel trick, is often used in conjunction with support vector machines (SVMs) and allows for a wide variety of sequence similarity measurements to be employed.

Hidden Markov Models (HMMs) [Rabiner and Juang, 1986; Eddy, 1998] have a rich history in sequence data modeling (in speech recognition and bioinformatics applications) for the purposes of classification, segmentation, and clustering. HMMs' success is based on the convenience of their simplifying assumptions. The space of probable sequences is constrained by assuming only pairwise dependencies over hidden states. Pairwise dependencies also allow for a class of efficient inference algorithms whose critical steps build on the Forward-Backward algorithm [Rabiner and Juang, 1986].

We present an HMM variant over a set of sequences, with one transition matrix per sequence, as a novel alternative for handling sequence data. After training, the per-sequence transition matrices of the HMM variant are used as fixed-length vector representations for each associated sequence. The HMM variant is also similar to a number of topic models, and we describe it in the context of Latent Dirichlet Allocation [Blei et al., 2003]. We then describe three methods to infer the parameters of our HMM variant, explore connections between these methods, and provide rationale for the classification behavior of the parameters derived through each.
We perform a comprehensive set of experiments, evaluating the performance of our method in conjunction with support vector machines, to classify sequences of amino acids into structural classes (the fold recognition and remote homology detection problems [Rangwala and Karypis, 2006]). The combination of these methods, their interpretations, and their connections to prior work constitutes a new twist on classic ways of understanding sequence data that we believe is valuable to anyone approaching a sequence classification task.

2 Problem Statement

Given a set of N sequences, we would like to find a set of fixed-length vectors, A_{1...N}, that, when used as input to a function f(A), maximize the probability of reconstructing the original set of sequences.

Under our scheme, f(A) is a Hidden Markov Model variant with one transition matrix, A_n, assigned to each sequence, and a single emissions matrix, B, and start probability vector, a, for the entire set of sequences. By maximizing the likelihood of the set of sequences under the HMM variant model, we will also find the set of transition matrices that best represent our set of sequences. We further postulate that this maximum likelihood representation will achieve good classification results if each sequence is later associated with a meaningful label.

2.1 Model Description

We define a Hidden Markov Model variant that represents a set of sequences. Each sequence is associated with a separate transition matrix, while the emission matrix and initial state transition vector are shared across all sequences. We use the value of each transition matrix as a fixed-length representation of the sequence. We define the parameters and notation for the model in Table 1.

Parameter   Description
N           the number of sequences
T_n         the length of sequence n
K           the number of hidden symbols
M           the number of observed symbols
a_i         start state probabilities, where i indexes the value of the first hidden state
A_{nij}     transition probabilities, where n is an index of a training sequence, i the originating hidden state, and j the destination hidden state
B_{im}      emission probabilities, where i indicates the hidden state and m the observed symbol associated with the hidden state
z_{nt}      the hidden state at position t in sequence n
x_{nt}      the observed symbol at position t in sequence n

Table 1: HMM variant model parameters

The joint probability of the model is shown below:

(1)  p(x, z | a, A, B) = ∏_{n=1}^{N} a_{z_{n1}} B_{z_{n1},x_{n1}} ∏_{t=2}^{T_n} A_{n,z_{n,t-1},z_{nt}} B_{z_{nt},x_{nt}}

This differs from the standard hidden Markov model only in the addition of a transition matrix, A_n, for each sequence, where the index n indicates a sequence in the training set. Under the standard HMM, a single transition matrix, A, would be used for all sequences. To regularize the model, we further augment the basic HMM by placing Dirichlet priors on a, each row of A, and each row of B. The prior parameters are the uniform Dirichlet parameters γ, α, and β for a, A, and B respectively. The probability of the model with priors is shown below, where the prior probabilities are the first three terms in the product and take the form Dir(x; a, K) = Γ(Ka)/Γ(a)^K ∏_{i=1}^{K} x_i^{a-1}:

(2)  p(x, z, a, A, B | α, β, γ) = ( Γ(Kγ)/Γ(γ)^K ∏_i a_i^{γ-1} ) ( ∏_{n,i} Γ(Kα)/Γ(α)^K ∏_j A_{nij}^{α-1} ) ( ∏_i Γ(Mβ)/Γ(β)^M ∏_m B_{im}^{β-1} ) ∏_{n=1}^{N} a_{z_{n1}} B_{z_{n1},x_{n1}} ∏_{t=2}^{T_n} A_{n,z_{n,t-1},z_{nt}} B_{z_{nt},x_{nt}}

One potential difficulty that could be expected in classifying simple HMMs by transition matrix is that the probability of a sequence under an HMM does not change under a permutation of the hidden states. This problem is avoided when we force each sequence to share an emissions matrix, which locks the meaning of each transition matrix row to a particular emission distribution. If the emission matrix were not shared, then two HMMs with permuted hidden states could have transition matrices with a large Euclidean distance between them. For instance, two HMMs whose parameters are related by a permutation P of the hidden states, with A_2 = P A_1 P^T and B_2 = P B_1, have different transition matrices, but the probability of an observed sequence is the same under each. However, the Euclidean distance between their two transition matrices, A_1 and A_2, can be large.
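To make the generative process concrete, the following minimal sketch (our illustration, not code from the paper; Python with NumPy assumed) evaluates the log of Equation (1) for a single sequence, given its hidden state path:

import numpy as np

def sequence_log_prob(x, z, a, A_n, B):
    # x   : observed symbols, shape (T,), values in 0..M-1
    # z   : hidden states,    shape (T,), values in 0..K-1
    # a   : shared start state probabilities, shape (K,)
    # A_n : this sequence's own transition matrix, shape (K, K)
    # B   : shared emission matrix, shape (K, M)
    logp = np.log(a[z[0]]) + np.log(B[z[0], x[0]])
    for t in range(1, len(x)):
        logp += np.log(A_n[z[t - 1], z[t]]) + np.log(B[z[t], x[t]])
    return logp

The only departure from a standard HMM is that A_n is indexed by the sequence; a and B are common to the whole dataset.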
3 Background

3.1 Mixtures of HMMs

Smyth introduces a mixture of HMMs in [Smyth, 1997] and presents an initialization technique that is similar to our model in that an individual HMM is learned for each sequence, but differs from our model in that the emission matrices are not shared between HMMs. In [Smyth, 1997], these initial N models are used to compute the set of all pairwise distances between sequences, defined as the symmetrized log likelihood of each element of the pair under the other's respective model. Clusters are then computed from this distance matrix, which are used to initialize a set of K < N HMMs where each sequence is associated with one of K labels. Smyth notes that while the log probability of a sequence under an HMM is an intuitive distance measure between sequences, it is not intuitive how the parameters of the model are meaningful in terms of defining a distance between sequences. In this research, we demonstrate experimentally that the transition matrix of our model is useful for sequence classification when combined with standard distance metrics and tools.

3.2 Topic Models

Simpler precursors of LDA [Blei et al., 2003] and pLSI [Hofmann, 1999], which represent an entire corpus of documents with a single topic distribution vector, are very similar to the basic Hidden Markov Model, which assigns a single transition matrix to the entire set of sequences that are being modeled. To extend the HMM to a pLSI analogue, all that is needed is to split the single transition matrix into a per-sequence transition matrix. To extend this model to an LDA analogue, we must go a step further and attach Dirichlet priors to the transition matrices, as in our model. Inference of the LDA model (Figure 1a) on a corpus of documents learns a matrix of document-topic probabilities.

A row of this matrix, sometimes described as a mixed-membership vector, can be viewed as a measurement of how a given document is composed from the set of topics. In our HMM variant (Figure 1b), a single transition matrix, A_n, can be thought of as the analogue to a document-topic matrix row and can be viewed as a measurement of how a sequence is composed of pairs of adjacent symbols. The LDA model also includes a topic-word matrix, which indicates the probability of a word given a topic assignment. This matrix has the same meaning as the emissions matrix, B, in the HMM variant.

[Figure 1: Plate diagrams of the (a) LDA model, expanded to show each word separately, and the (b) HMM variant. The model parameters in the LDA model are defined as follows: K - number of topics, φ_k - a vector of word probabilities given topic k, β - parameters of the Dirichlet prior of φ_k, θ_n - a vector of topic probabilities in document n, α - parameters of the Dirichlet prior of θ_n. A row of the matrix B in the HMM variant has exactly the same meaning as a topic-word vector, φ_k, in the LDA model.]

The Fisher kernel [Jaakkola and Haussler, 1999] and the Probability Product Kernel (PPK) [Jebara et al., 2004] are principled methods that allow probabilistic models to be incorporated into SVM kernels. The HMM variant is similar to these methods in that it uses latent information from a generative model as input to a discriminative classifier. It differs from these methods, however, both in which portions of the generative model are incorporated into the discriminative classifier and in the assumptions about how differences in generating distributions affect comparisons between training examples.

4 Learning the model parameters

4.1 Baum-Welch

A well-known method for learning HMM model parameters is the Baum-Welch algorithm. The Baum-Welch algorithm is an expectation maximization algorithm for the standard HMM model, and the basic algorithm is easily modified to learn the multiple transition matrices of our variant. The parameter updates shown below converge to a maximum a posteriori (MAP) estimate of p(z, a, A, B | x, γ, α, β) [Rabiner and Juang, 1986]:

(3)  a_i^(new) ∝ ∑_n f_{ni}(1) b_{ni}(1) + γ - 1

(4)  A_{nij}^(new) ∝ ∑_{t=2}^{T_n} f_{ni}(t-1) A_{nij} B_{j,x_{nt}} b_{nj}(t) + α - 1

(5)  B_{im}^(new) ∝ ∑_n ∑_{t: x_{nt}=m} f_{ni}(t) b_{ni}(t) + β - 1

where f and b are the forward and backward recursions defined below:

(6)  f_{ni}(t) = ∑_j f_{nj}(t-1) A_{nji} B_{i,x_{nt}} for t > 1;  f_{ni}(1) = a_i B_{i,x_{n1}}

(7)  b_{ni}(t) = ∑_j A_{nij} B_{j,x_{n,t+1}} b_{nj}(t+1) for t < T_n;  b_{ni}(T_n) = 1

The complexity of the Baum-Welch-like algorithm for our variant is identical to the complexity of Baum-Welch for the standard HMM. The update for A_{ij} in the original HMM involves summing over ∑_n T_n terms, while the update for a single A_{nij} is a sum over T_n terms, making the total number of terms over all the A_n's in our variant, ∑_n T_n, the same number as in the original algorithm.
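To make the recursions concrete, here is a minimal NumPy sketch (ours; scaling is omitted for clarity, so a practical implementation would normalize each f_n(t) or work in log space to avoid underflow) of Equations (6) and (7) for a single sequence:

import numpy as np

def forward_backward(x, a, A_n, B):
    # Forward and backward recursions (Equations 6 and 7) for one sequence.
    # x: observed symbols, shape (T,); a: (K,); A_n: (K, K); B: (K, M).
    T, K = len(x), len(a)
    f = np.zeros((T, K))
    b = np.zeros((T, K))
    f[0] = a * B[:, x[0]]                         # f_ni(1) = a_i B_{i,x_1}
    for t in range(1, T):
        f[t] = (f[t - 1] @ A_n) * B[:, x[t]]      # sum_j f_nj(t-1) A_nji B_{i,x_t}
    b[T - 1] = 1.0                                # b_ni(T_n) = 1
    for t in range(T - 2, -1, -1):
        b[t] = A_n @ (B[:, x[t + 1]] * b[t + 1])  # sum_j A_nij B_{j,x_{t+1}} b_nj(t+1)
    return f, b

The update in Equation (4) then accumulates f_{ni}(t-1) A_{nij} B_{j,x_t} b_{nj}(t) over t, adds α - 1, and renormalizes each row of A_n.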
4.2 Gibbs Sampling

Two Gibbs sampling schemes are commonly used to infer Hidden Markov Model parameters [Scott, 2002]. Unlike the Baum-Welch algorithm, which returns a MAP estimate of the parameters, these sampling schemes allow the expectation of the parameters to be computed over the posterior distribution p(z, a, A, B | x, γ, α, β). In the Direct Gibbs sampler (DG), hidden states and parameters are initially chosen at random, then new hidden states are sampled using the current set of parameters:

(8)  p(z_t^(new) = i | z_{t-1}, z_{t+1}) ∝ A_{z_{t-1},i} B_{i,x_t} A_{i,z_{t+1}}

In the Forward Backward sampler (FB), the initial settings and parameter updates are the same as in the DG scheme, but the hidden states are sampled in order from T_n down to 1 using values from the forward recursion. Specifically, each hidden state z_{nt} is sampled given z_{n,t+1} = j from a multinomial with parameters

(9)   p(z_{n,T_n}^(new) = i | x_{n,1:T_n}) ∝ f_{ni}(T_n)

(10)  p(z_{nt}^(new) = i | x_{n,1:T_n}, z_{n,t+1}^(new) = j) = p(z_{nt}^(new) = i | x_{n,1:t}, z_{n,t+1}^(new) = j) ∝ f_{ni}(t) A_{nij},  t < T_n

In both algorithms, after the hidden states are sampled, parameters are sampled from Dirichlet conditional distributions, shown for A below, where I(ω) = 1 if ω is true and 0 otherwise:

(11)  p(A_{ni·} | z_n, α) = Dir( ∑_{t=2}^{T_n} I(z_{n,t-1} = i) I(z_{nt} = j) + α )

The FB sampler has been shown to mix more quickly than the DG sampler, especially in cases where adjacent hidden states are highly correlated [Scott, 2002]. We therefore use the FB sampler in our implementation.
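A minimal sketch of one FB sampling sweep for a single sequence follows (ours; it assumes the imports and the forward_backward routine from the previous sketch, plus a NumPy generator such as rng = np.random.default_rng()), sampling states backward from T_n per Equations (9) and (10):

def sample_states_fb(x, a, A_n, B, rng):
    # Draw one sample of the hidden state path z_n (Equations 9 and 10).
    f, _ = forward_backward(x, a, A_n, B)   # only the forward pass is needed
    T, K = f.shape
    z = np.zeros(T, dtype=int)
    w = f[T - 1] / f[T - 1].sum()           # p(z_T = i | x_{1:T}) is proportional to f_i(T)
    z[T - 1] = rng.choice(K, p=w)
    for t in range(T - 2, -1, -1):
        w = f[t] * A_n[:, z[t + 1]]         # p(z_t = i | ...) is proportional to f_i(t) A_{n,i,z_{t+1}}
        z[t] = rng.choice(K, p=w / w.sum())
    return z

After a sweep over all sequences, each row of A_n is resampled from its Dirichlet conditional (Equation 11), e.g. rng.dirichlet(counts + alpha).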

4.3 Variational Algorithm

Another approach for inference of the HMM variant parameters is through variational techniques. We employ a mean field variational algorithm that follows a pattern similar to EM. When the variational update steps are run until convergence, the Kullback-Leibler divergence between the variational distribution, q(z, a, A, B), and the model's conditional probability distribution, p(z, a, A, B | x, γ, α, β), is minimized. The transition matrices returned by the variational algorithm are the expectations of those matrices under the variational distribution. Thus, like the Gibbs sampling algorithm, the parameters returned by the variational algorithm approximate the expectations of the parameters under the conditional distribution. Our mean field variational approximation is shown below:

(12)  q(z, a, A, B) = q(a) ∏_{n=1}^{N} ∏_{i=1}^{K} q(A_{ni·}) ∏_{i=1}^{K} q(B_{i·}) ∏_{n,t} q(z_{nt})
      = ( Γ(∑_i γ_i)/∏_i Γ(γ_i) ∏_i a_i^{γ_i - 1} ) ( ∏_{n,i} Γ(∑_j α_{nij})/∏_j Γ(α_{nij}) ∏_j A_{nij}^{α_{nij} - 1} ) ( ∏_i Γ(∑_m β_{im})/∏_m Γ(β_{im}) ∏_m B_{im}^{β_{im} - 1} ) ∏_{n,t} h_{nt,z_{nt}}

with variational parameters h_{nti}, which approximate each z_{nt}, and α_{nij}, β_{im}, and γ_i, which can be thought of as Dirichlet parameters approximating α, β, and γ. When we maximize the variational free energy with respect to the variational parameters, we obtain the following update equations, where Ψ(x) = d log Γ(x)/dx:

(13)  α_{nij} = ∑_t h_{n,t-1,i} h_{n,t,j} + α

(14)  β_{im} = ∑_n ∑_{t: x_{nt}=m} h_{n,t,i} + β

(15)  γ_i = ∑_n h_{n,1,i} + γ

(16)  h_{n,t,i} ∝ exp( ∑_j h_{n,t-1,j} ( Ψ(α_{nji}) - Ψ(∑_{j'} α_{nj,j'}) ) + ∑_j h_{n,t+1,j} ( Ψ(α_{nij}) - Ψ(∑_{j'} α_{ni,j'}) ) + Ψ(β_{i,x_{nt}}) - Ψ(∑_m β_{im}) )

Notice that the update for h_{nt} depends only on the adjacent h's, h_{n,t-1} and h_{n,t+1}, as well as the expectations of the transition probabilities from the adjacent h's and the expectation of the emission probabilities from the current h_{nt}. This mean field algorithm can therefore be understood as an equivalent of the Direct Gibbs sampling method except that at subsequent time steps interactions occur between variational parameters rather than through the sampled values of z. A complete derivation of the variational algorithm is included on the authors' website.
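For concreteness, a sketch (ours) of the h update of Equation (16) at one position t, using scipy.special.digamma for Ψ; here alpha_n stands for the K x K matrix of variational transition parameters for sequence n and beta_v for the K x M emission counterpart:

import numpy as np
from scipy.special import digamma

def update_h(h_prev, h_next, alpha_n, beta_v, x_t):
    # One mean field update of h_nt (Equation 16).
    # h_prev, h_next: (K,); alpha_n: (K, K); beta_v: (K, M); x_t: observed symbol index.
    E_log_A = digamma(alpha_n) - digamma(alpha_n.sum(axis=1, keepdims=True))
    E_log_B = digamma(beta_v) - digamma(beta_v.sum(axis=1, keepdims=True))
    log_h = h_prev @ E_log_A + E_log_A @ h_next + E_log_B[:, x_t]
    h = np.exp(log_h - log_h.max())    # exponentiate stably
    return h / h.sum()                 # normalize to a distribution over the K states

The two matrix products correspond to the two transition terms in Equation (16): h_prev @ E_log_A sums the expected log transition probabilities into each state, while E_log_A @ h_next sums those leaving it.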
[Table 2 (values omitted): AUC results from all of the multi-class SVM experiments. Rows: Class categories (SCOP 1.67, 25%); Fold categories (SCOP 1.67, 25%); Fold categories (SCOP 1.67, 40%); Superfamily categories (SCOP 1.67, 40%). Columns: Baum-Welch, Gibbs Sampling, Variational. The best performing algorithm, the best performing setting of K, and the best combination of K and algorithm are marked in bold. The Gibbs-Sampling-derived representation most frequently returned the best AUC score on the majority of the datasets.]

5 Experimental Setup

5.1 Protocol

To evaluate our fixed-length representation scheme, for each dataset (described in Section 5.2), we created three sets of fixed-length representations per trial over ten trials by running each of the three inference algorithms, (i) Baum-Welch, (ii) Gibbs Sampling, and (iii) the mean field variational algorithm, on the entire set of input data. We varied the number of hidden states, K, from 5 to 20 in increments of 5. This procedure created a total of 120 (3 x 10 x 4) fixed-length representations for each dataset. The fixed-length vector data was then used as input to a support vector machine (SVM) classifier.² We used the SVM to perform either multiway classification on the dataset under the Crammer-Singer [Crammer and Singer, 2002] construction or the one-versus-rest approach, where a binary classifier was trained for each of the classes. We compare classification results from our model with results from the Spectrum(2) kernel for all experiments. The Spectrum(l) kernel is a string kernel whose vector representation is the set of counts of substrings of observed symbols of length l in a given string [Leslie et al., 2002]. For the one-versus-rest experiments, we compare our results to more biologically sensitive kernels for protein classification, described in [Rangwala and Karypis, 2005].

² We used SVM-light and SVM-struct for classification (svmlight.joachims.org) [Joachims, 1999].
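For reference, a small sketch (ours) of the Spectrum(l) representation used as a baseline: a sequence over an alphabet of size M is mapped to the M^l vector of substring counts.

from collections import Counter
from itertools import product

def spectrum_features(seq, alphabet, l=2):
    # Spectrum(l) feature vector: counts of every length-l substring [Leslie et al., 2002].
    counts = Counter(seq[i:i + l] for i in range(len(seq) - l + 1))
    kmers = ("".join(p) for p in product(sorted(alphabet), repeat=l))  # all M^l k-mers
    return [counts[k] for k in kmers]

# Toy example with a 3-letter alphabet standing in for the 20 amino acids:
print(spectrum_features("ABCAB", "ABC", l=2))  # [0, 2, 0, 0, 0, 1, 1, 0, 0]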

5.2 Protein Datasets

The Structural Classification of Proteins (SCOP) [Murzin et al., 1995] database categorizes proteins into a multi-level hierarchy that captures commonalities between protein structure at different levels of detail. To evaluate our representation, we ran sets of protein classification experiments on the three top levels of the SCOP taxonomy: class, fold, and superfamily. Our datasets, which were obtained from previous studies [Rangwala and Karypis, 2006; Kuang et al., 2004], were derived from either the SCOP 1.67 or the SCOP 1.53 versions and filtered at 25% and 40% pairwise sequence identities. A protein sequence dataset filtered at 25% identity will have no two sequences with more than 25% sequence identity. We partitioned the data into a single test and training set for each category. At the class level, the original dataset was split randomly into training and test sets. To eliminate high levels of similarity between sequences that could lead to trivially good classification results, we imposed constraints on the training/test set partitioning for classification in the fold and superfamily experiments. For the fold level classification problem, the training sets were partitioned so that no examples that shared the fold and superfamily labels were included in both the training and test sets. Similarly, for the superfamily level classification problem (referred to as the remote homology detection problem [Leslie et al., 2002; Rangwala and Karypis, 2005]), no examples that shared the superfamily and family levels were included in both the training and test sets.

5.3 Evaluation Metrics

We evaluated each classification experiment by computing the area under the ROC curve (AUC), a plot of the true positive rate against the false positive rate, constructed by adjusting the SVM's intercept parameter. We also computed the AUC50 value, which is a normalized computation of the area under the ROC curve until the first 50 false positives have been detected. We were worried about variance over different Baum-Welch runs due to convergence of the algorithm to different local optima. To mitigate this concern, we ran both the Baum-Welch algorithm and the other inference algorithms, for consistency, 10 separate times on each dataset. The results presented for each inference method are averages over individual results of the 10 trials across the different classes.
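As a point of reference, a sketch (ours) of the AUC50 computation described above: predictions are ranked by score, the ROC step curve is traced until the first 50 false positives, and the truncated area is normalized so that a perfect ranking scores 1.0 (this simple form assumes the test set contains at least 50 negatives):

def auc50(scores, labels, max_fp=50):
    # labels are 1 (positive) / 0 (negative); scores are classifier outputs.
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    tp = fp = area = 0
    for _, y in ranked:
        if y == 1:
            tp += 1
        else:
            fp += 1
            area += tp          # one unit-width column of height tp per false positive
            if fp == max_fp:
                break
    return area / (max_fp * n_pos) if n_pos else 0.0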
6 Results and Discussion

6.1 Protein Sequence Classification

Table 2 shows a comparison of results (average AUC scores) across the inference algorithms in three taxonomic categories (class, fold, and superfamily) using the multiclass SVM. Although the AUC scores are close for each algorithm, in most cases the Gibbs sampling algorithm outperforms the other algorithms. Table 3 shows a comparison of results over the inference algorithms but only for the one-versus-rest superfamily classification experiment on the SCOP 1.53 dataset. Similar to the multiclass experiments using the linear kernel, the Gibbs sampling algorithm outperforms the other inference methods in the one-versus-rest experiments.

[Table 3 (values omitted): AUC and AUC50 results for protein superfamily classification on the SCOP 1.53 dataset with 25% Astral filtering over a selected set of 23 superfamilies, using Gaussian and linear kernels in one-versus-rest SVM classification. Rows: Baum-Welch, Gibbs Sampling, Variational; columns: AUC and AUC50 under each kernel.]

Although the values of the best performing algorithm's AUC and AUC50 scores do not significantly change from the linear to the Gaussian kernel, the variational algorithm shows a large improvement, ranging from 6% to 30%.

6.2 Analysis of inference algorithms

The differences in AUC values resulting from the different training algorithms (Tables 2 and 3) can be explained, at least in part, by a high-level overview of how each algorithm operates. While the Baum-Welch algorithm returns MAP parameters of the model, both the Gibbs sampling method and the variational algorithm return expectations of the parameters under an approximation of the posterior distribution. The MAP solution from the Baum-Welch algorithm is likely to reach a local maximum of the posterior, while the other algorithms should tend to average over posterior parameters. The Gibbs sampling algorithm and the variational algorithm each compute expectations of the parameters under an approximate posterior distribution, but each uses a different method to construct this approximation. The variational algorithm will be less likely to converge to a good approximation of the marginal distribution because the mean field variational approximation necessarily does away with the direct coupling between adjacent hidden states characteristic of the HMM.

6.3 Comparative Performance

Tables 4 and 5 show a comparison between the HMM variant and common classification methods for the multiclass and one-versus-rest experiments respectively.

[Table 4 (values omitted): A comparison of results between the Spectrum kernel and the HMM variant under experiments using the multiclass SVM formulation. Rows: Class, Fold (25 categories), Fold (27 categories), Superfamily; columns: HMM Variant, Spectrum. The HMM variant scores are the best performing from Table 2.]

[Table 5 (values omitted): A selection of AUC and AUC50 scores for the remote homology detection problem using a variety of SVM kernels on the SCOP 1.53, 25% dataset with one-versus-rest classification. Rows: HMM Variant (best), Spectrum(2) [Leslie et al., 2002], Mismatch(5,1) [Leslie et al., 2003], Fisher [Jaakkola et al., 2000], SW-PSSM [Rangwala and Karypis, 2005]. The HMM variant scores are the best performing from Table 3.]

The AUC and AUC50 scores indicate that our scheme produces a representation that is roughly equivalent in power to the Spectrum kernel for protein classification. In defense of the HMM variant, the size of the vector representation produced by the Spectrum kernel is significantly larger than the typical representations produced by our HMM variant. The Mismatch(5,1) kernel, used for SCOP 1.53 superfamily classification (Table 5), is similar to the Spectrum(5) kernel but also counts substrings of length 5 that differ by one amino acid residue from those found in an observed sequence. The size of the vector representation associated with this kernel is large compared to the largest vector representation in our experiments, which is 400 for the HMM variant with 20 hidden states.

Nearly all of these high-performing kernel methods, unlike the HMM variant, employ domain-specific knowledge, such as carefully tuned position-specific scoring matrices, to aid classification. In contrast, the only parameter that needs to be adjusted in the HMM variant is the number of hidden states.

7 Conclusions and Future Work

Our HMM variant is an extension of the standard HMM that assigns individual transition matrices to each sequence in a dataset but keeps a single emissions matrix for the entire dataset. We describe three inference algorithms, two of which, a Baum-Welch-like algorithm and a Gibbs sampling algorithm, are similar to standard methods used to infer HMM parameters. A third, the variational inference algorithm, is related to algorithms used for inference on topic models and more complex HMM extensions. We demonstrate, by comparing results on protein sequence classification using our method in conjunction with SVMs, that each of these algorithms infers transition matrices that capture useful characteristics of individual sequences. Because our model fits within a large existing body of work on generative models, we are especially interested in related models that perform classification directly.

References

[Blei et al., 2003] D.M. Blei, A.Y. Ng, and M.I. Jordan. Latent Dirichlet allocation. The Journal of Machine Learning Research, 3, 2003.

[Crammer and Singer, 2002] K. Crammer and Y. Singer. On the algorithmic implementation of multiclass kernel-based vector machines. The Journal of Machine Learning Research, 2, 2002.

[Eddy, 1998] S. Eddy. Profile hidden Markov models. Bioinformatics, 14(9), 1998.

[Hofmann, 1999] T. Hofmann. Probabilistic latent semantic indexing. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 1999.

[Jaakkola and Haussler, 1999] T.S. Jaakkola and D. Haussler. Exploiting generative models in discriminative classifiers. In Advances in Neural Information Processing Systems, 1999.

[Jaakkola et al., 2000] T. Jaakkola, M. Diekhans, and D. Haussler. A discriminative framework for detecting remote protein homologies. Journal of Computational Biology, 7(1-2):95-114, 2000.

[Jebara et al., 2004] T. Jebara, R. Kondor, and A. Howard. Probability product kernels. The Journal of Machine Learning Research, 5, 2004.

[Joachims, 1999] T. Joachims. SVM-light: Support Vector Machine. svmlight.joachims.org, University of Dortmund, 1999.

[Kuang et al., 2004] R. Kuang, E. Ie, K. Wang, K. Wang, M. Siddiqi, Y. Freund, and C. Leslie. Profile-based string kernels for remote homology detection and motif extraction. In Computational Systems Bioinformatics, 2004.

[Leslie et al., 2002] C. Leslie, E. Eskin, and W. S. Noble. The spectrum kernel: A string kernel for SVM protein classification. In Proceedings of the Pacific Symposium on Biocomputing, 2002.

[Leslie et al., 2003] C. Leslie, E. Eskin, W. S. Noble, and J. Weston. Mismatch string kernels for SVM protein classification.
In Advances in Neural Information Processing Systems, 2003.

[Murzin et al., 1995] A.G. Murzin, S.E. Brenner, T. Hubbard, and C. Chothia. SCOP: a structural classification of proteins database for the investigation of sequences and structures. Journal of Molecular Biology, 247(4), 1995.

[Rabiner and Juang, 1986] L. Rabiner and B. Juang. An introduction to hidden Markov models. IEEE ASSP Magazine, 3(1):4-16, 1986.

[Rangwala and Karypis, 2005] H. Rangwala and G. Karypis. Profile-based direct kernels for remote homology detection and fold recognition. Bioinformatics, 21(23):4239, 2005.

[Rangwala and Karypis, 2006] Huzefa Rangwala and George Karypis. Building multiclass classifiers for remote homology detection and fold recognition. BMC Bioinformatics, 7:455, 2006.

[Scott, 2002] S.L. Scott. Bayesian methods for hidden Markov models: Recursive computing in the 21st century. Journal of the American Statistical Association, 97(457), 2002.

[Smyth, 1997] P. Smyth. Clustering sequences with hidden Markov models. In Advances in Neural Information Processing Systems, 1997.
