Clustering System and Clustering Support Vector Machine for Local Protein Structure Prediction


Georgia State University
Computer Science Dissertations, Department of Computer Science

Clustering System and Clustering Support Vector Machine for Local Protein Structure Prediction

Wei Zhong

Follow this and additional works at:

Recommended Citation
Zhong, Wei, "Clustering System and Clustering Support Vector Machine for Local Protein Structure Prediction." Dissertation, Georgia State University, 2006.

This Dissertation is brought to you for free and open access by the Department of Computer Science at Georgia State University. It has been accepted for inclusion in Computer Science Dissertations by an authorized administrator of Georgia State University. For more information, please contact scholarworks@gsu.edu.

CLUSTERING SYSTEM AND CLUSTERING SUPPORT VECTOR MACHINE FOR LOCAL PROTEIN STRUCTURE PREDICTION

by Wei Zhong

Under the Direction of Yi Pan

ABSTRACT

Protein tertiary structure plays a very important role in determining a protein's possible functional sites and chemical interactions with other related proteins. Experimental methods to determine protein structure are time consuming and expensive. As a result, the gap between protein sequence and structure has widened substantially due to high throughput sequencing techniques. The problems of experimental methods motivate us to develop computational algorithms for protein structure prediction. In this work, a clustering system is used to predict local protein structure. At first, recurring sequence clusters are explored with an improved K-means clustering algorithm. Carefully constructed sequence clusters are used to predict local protein structure. After obtaining the sequence clusters and motifs, we study how the sequence variation of a sequence cluster may influence its structural similarity.

Analysis of the relationship between sequence variation and structural similarity for sequence clusters shows that sequence clusters with tight sequence variation have high structural similarity, while sequence clusters with wide sequence variation have poor structural similarity. Based on this knowledge, the established clustering system is used to predict the tertiary structure of local sequence segments. Test results indicate that the highest quality clusters give highly reliable prediction results and high quality clusters give reliable prediction results. In order to improve the performance of the clustering system for local protein structure prediction, a novel computational model called Clustering Support Vector Machines (CSVMs) is proposed. In our previous work, the sequence-to-structure relationship was explored with the conventional K-means algorithm. The K-means clustering algorithm may not capture the nonlinear sequence-to-structure relationship effectively. As a result, we consider using Support Vector Machines (SVMs) to capture the nonlinear sequence-to-structure relationship. However, SVM training is not favorable for huge datasets including millions of samples. Therefore, we propose a novel computational model called CSVMs. Taking advantage of both the theory of granular computing and advanced statistical learning methodology, CSVMs are built specifically for each information granule partitioned intelligently by the clustering algorithm. Compared with the clustering system introduced previously, our experimental results show that accuracy for local structure prediction is improved noticeably when CSVMs are applied.

INDEX WORDS: K-means clustering algorithm, PISCES (Protein Sequence Culling Server), HSSP (Homology-Derived Secondary Structure of Proteins), sequence motif, hydrophobicity index, evolutionary distance, PDB (Protein Data Bank), SVM (Support Vector Machine), protein structure prediction, granular computing.

CLUSTERING SYSTEM AND CLUSTERING SUPPORT VECTOR MACHINE FOR LOCAL PROTEIN STRUCTURE PREDICTION

by

WEI ZHONG

A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in the College of Arts and Sciences, Georgia State University, 2006

Copyright by Wei Zhong 2006

CLUSTERING SYSTEM AND CLUSTERING SUPPORT VECTOR MACHINE FOR LOCAL PROTEIN STRUCTURE PREDICTION

by

WEI ZHONG

Major Professor: Yi Pan
Committee: Phang C. Tai, Robert Harrison, Martin D. Fraser

Electronic Version Approved:

Office of Graduate Studies
College of Arts and Sciences
Georgia State University
August 2006

ACKNOWLEDGEMENTS

The dissertation would not have been possible without the help of so many people. I would like to take this opportunity to express my deep appreciation to all those who helped me in this hard but extremely rewarding process. First, I would like to thank my advisor, Professor Yi Pan, for all his help, advice, support, guidance, and patience. Whenever I have difficulties and problems, Dr. Pan always encourages me to make harder efforts so that I can overcome those problems. Dr. Pan has given me much valuable advice about how to conduct research and choose my career. Without his help, I could not have made rapid progress in my research and dissertation writing. I am grateful to Dr. Robert Harrison, Dr. Phang C. Tai, and Dr. Martin D. Fraser for serving on my Ph.D. committee, and for their time and cooperation in reviewing this work. Dr. Harrison provided me with insights into how to cluster the sequence segments and how to express my ideas clearly in my papers. I would like to thank Dr. Fraser for giving me many valuable suggestions about statistical techniques for SVM learning. I would like to thank Dr. Tai for providing me with a great deal of biological knowledge so that I can combine computational methods with biological experiments smoothly. I would like to thank Professor Roland L. Dunbrack for providing the data set from PISCES. This research was supported in part by the U.S. National Institutes of Health under grants R01 GM S1 and P20 GM A1, and the U.S. National Science Foundation under grants ECS and ECS . I am also supported by a Georgia State University Molecular Basis of Disease Program Fellowship.

I would like to thank Dr. Raj Sunderraman for his great patience and support during my job search and Ph.D. study. The more important thanks are reserved for the last. Many thanks to my parents, Zhong Shunlong and Wei Shufang, for their constant support, concern, and motivation. My parents gave me much important advice about how to lead a successful life during my Ph.D. study.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ACRONYMS

Chapter 1 Introduction
  Research Motivations and Contributions
    Local Protein Structure Prediction
    Clustering System for Local Protein Structure Prediction
    Clustering Support Vector Machine for Local Protein Structure Prediction
  Dissertation Organization

Chapter 2 Protein Structure Prediction
  Protein Structure Representations and Protein Structure Determination
  Comparative Homology Modeling
  Threading or Fold Recognition
  Ab Initio Methods

Chapter 3 Discovery of Sequence Clusters and Sequence Motifs with Improved K-means Algorithms
  Several Major Motif Discovery Methods
  K-means Clustering Algorithms
    Traditional K-means Clustering Algorithm
    New Greedy Initialization Method for the K-means Algorithm
  Experiment Setup
    Experimental Parameters
    Data Set Generation and Representation of Sequence Segments
    Evolutionary Distance and Cluster Membership Calculation for Sequence Segments
    Secondary Structure Assignment
    Measure of Structural Similarity for a Given Cluster
    Evaluation of Performance for the Clustering Algorithm and Generation of Frequency Profiles for Sequence Motifs
  Experimental Results
    Comparison of Performance for the Traditional and Improved K-means Algorithm
    Sequence Motifs
    Result Comparison with Other Research

Chapter 4 Parallel K-means Algorithm using Pthread and OpenMP over Hyper-Threading Technology
  Parallelization
  Hyper-Threading Technology
  Pthread and OpenMP
  Programming Environment and Implementation Details
  Comparing Pthread and OpenMP Implementations

Chapter 5 Relationship between Sequence Variation and Corresponding Structural Similarity for Sequence Clusters and Sequence Motifs

  5.1 Previous Studies for Sequence and Structural Variation of Sequence Clusters
  Experimental Setup
    Recurrent Clustering Data Set
    Clustering of Sequence Segments in the Sequence Space
    Generation of Frequency Profile for Sequence Clusters
    Evaluation of Distribution of Amino Acid for Each Position of Frequency Profile
  Measure of Sequence Variation for a Given Sequence Cluster
    Measure of Sequence Variation by Average of Relative Entropy Values for All Positions of Sequence Profiles
    Measure of Sequence Variation by the Number of Important Positions for Sequence Profiles
  Measure of Secondary Structure Similarity for a Given Sequence Cluster
  Measure of Tertiary Structural Variation by dmrmsd_sc for a Given Sequence Cluster
    Average Distance Matrix (ADM) among Sequence Segments for a Given Sequence Cluster
    dmrmsd_sc for a Given Sequence Cluster
  Results Analysis

Chapter 6 Local Protein Structure Prediction by the Clustering System
  Data Set and Sequence Segment Generation
  Training Set and Independent Test Set
  Clustering of Sequence Segments Belonging to the Training Set
  Representative Structure for Each Cluster
    Representative Secondary Structure
    Average Distance Matrix (ADM)
    Representative Torsion Angle
  Local Structure Prediction by the Clustering System
    Distance Score of a Given Sequence Segment for Each Cluster
    Reliability Score of Each Cluster for a Given Sequence Segment
    Structure Prediction by Distance Score and Reliability Score for a Given Sequence Segment
  Prediction Accuracy Calculation
    Secondary Structure Accuracy
    Distance Matrix Root Mean Square Deviation (dmrmsd)
    Torsion Angle RMSD (tarmsd)
  Classification of Clusters into Different Groups
  Experimental Results

Chapter 7 Support Vector Machine
  Optimal Hyperplane for Separable Case
    Optimization Problem to Build Optimal Hyperplane
    Some Properties of Hyperplane and One Algorithm to Build Optimal Hyperplane
  Optimal Hyperplane for Nonseparable Sets
    Δ-margin Separating Hyperplanes
    Soft Margin Generalization
  Expected Risk Bounds for Optimal Hyperplane

  7.4 Mercer's Theorem to Deal with High Dimensionality
    Fundamental Concept of SVM
    Mercer's Theorem for High Dimensionality
  Construction of SVM
    Constructing SVM with Quadratic Optimization
    Constructing SVM using Linear Optimization Method
  SVM Kernels
    Selection of SV Machine Using Bounds
    Polynomial Functions
    Radial Basis Functions
    Two-layer Neural Networks
  Multiclass Classification

Chapter 8 Implementation of SVM for a Very Large Dataset
  First Class of Algorithms for Large Dataset
    Decomposition Algorithm
    Sequential Minimal Optimization (SMO)
    Boosting Algorithm to Scale up SVM
  Second Class of Algorithms for Large Dataset Training
    Random Selection
    Active Learning with SVM
    Classifying Large Datasets using SVM with Hierarchical Clusters

Chapter 9 Clustering Support Vector Machines for Protein Local Structure Prediction
  Review of Previous Work
  Method
    Granular Computing
    K-means Clustering Algorithm as the Granulation Method
    Generation of Sequence Segments by the Sliding Window Method
    Distance Score and Reliability Score of a Given Sequence Segment
    Cluster Membership Assignment for Each Sequence Segment
    Support Vector Machine
    Clustering Support Vector Machines (CSVMs)
    Advantages of CSVMs
    Training CSVMs for Each Cluster
    Local Protein Structure Prediction by CSVMs
  Experimental Setup
    Training Set and Independent Test Set
    Prediction Accuracy Calculation for Each Sequence Segment
    Classification of Clusters into Different Groups
  Experimental Results and Analysis
    Average Accuracy, Precision and Recall of CSVMs for Different Cluster Groups
    Comparison of Independent Prediction Accuracy for Different Cluster Groups in Terms of Three Metrics between the Clustering Algorithm and the CSVM Model
    Comparison of Accuracy Criteria One and Accuracy Criteria Two between the Clustering System and the CSVMs Model
  Summary

Chapter 10 Conclusions and Future Work

Bibliography

LIST OF FIGURES

Figure 1. Two Physical Processors and Four Logical Processors
Figure 2. Four Physical Processors Behaving Like Eight Logical Processors
Figure 3. Implementation Details of Five Pthreads
Figure 4. Pthread Code
Figure 5. OpenMP Code
Figure 6. Speedup Values for Pthread and OpenMP
Figure 7. Relationship between Variability and the Relative Entropy for Each Position of Sequence Profiles for Sequence Clusters
Figure 8. Percentages of Sequence Clusters with the Specified Number of Important Positions in the Specified Ranges of Secondary Structure Similarity
Figure 9. Comparison of the Important Positions between the Percentage of Clusters with dmrmsd_sc > 2.0 Å and the Percentage of Clusters with dmrmsd_sc < 1.5 Å
Figure 10. Secondary Structure Accuracy for the Clustering System
Figure 11. dmrmsd for the Clustering System
Figure 12. tarmsd for the Clustering System
Figure 13. Accuracy Criteria One for the Clustering System
Figure 14. Accuracy Criteria Two for the Clustering System
Figure 15. The CSVMs Model
Figure 16. Comparison of Accuracy, Precision and Recall of CSVMs
Figure 17. Comparison of Secondary Structure Accuracy between the Clustering System and CSVMs Model
Figure 18. Comparison of dmrmsd between the Clustering System and CSVMs Model
Figure 19. Comparison of tarmsd between the Clustering System and CSVMs Model
Figure 20. Comparison of Accuracy Criteria One between the Clustering System and the CSVMs Model for Different Cluster Groups
Figure 21. Comparison of Accuracy Criteria Two between the Clustering System and the CSVMs Model for Different Cluster Groups

LIST OF TABLES

Table 1. Comparison of the Percentage of Sequence Segments Belonging to Clusters with High Structural Similarity
Table 2. Comparison of the Number of Clusters with High Structural Similarity
Table 3. Standard to Classify Clusters into Different Groups
Table 4. The Threshold for Evaluating Accuracy Criteria One and Accuracy Criteria Two for Each Cluster

LIST OF ACRONYMS

CSVMs: Clustering Support Vector Machines
SVM: Support Vector Machine
NMR: Nuclear Magnetic Resonance
BLAST: The Basic Local Alignment Search Tool
PROSPECT: Protein Structure Prediction and Evaluation Computer Toolkit
CASP: Critical Assessment of Techniques for Protein Structure Prediction
PSI-BLAST: Position Specific Iterative BLAST
PSSM: Position Specific Scoring Matrix
SCOP: Structural Classification of Proteins
RMSD: Root Mean Squared Deviation
HMM: Hidden Markov Models
PISCES: Protein Sequence Culling Server
HSSP: Homology-derived Secondary Structure of Proteins
PDB: Protein Data Bank
SMT: Simultaneous Multithreading
ADM: Average Distance Matrix
SMO: Sequential Minimal Optimization
dmrmsd: Distance Matrix Root Mean Square Deviation
tarmsd: Torsion Angle Root Mean Square Deviation

Chapter 1 Introduction

1.1 Research Motivations and Contributions

Local Protein Structure Prediction

Proteins are polymers of amino acids connected by the formation of covalent peptide bonds. Proteins have four levels of structure: primary structure, secondary structure, tertiary structure and quaternary structure. Based on hydrogen bonding interactions between adjacent amino acid residues, the polypeptide chain can arrange itself into secondary structure. The polypeptide chains of protein molecules fold into the native structure. Multiple interacting polypeptide chains of characteristic tertiary structure develop into protein quaternary structure. Protein structure can be determined experimentally by X-ray crystallography, Nuclear Magnetic Resonance (NMR) and electron microscopy. When X-ray crystallography is applied, crystallisation of proteins is a very difficult task. Compared to X-ray crystallography, experiments related to NMR are carried out in solution rather than in a crystal lattice. However, NMR is only applicable to determining the structures of small and medium-sized molecules due to limitations of the principles that make NMR possible. Knowledge about protein functions can be used to infer how the protein interacts with other molecules. Protein functions are largely determined by their structures. As a result, understanding protein structures is a very important task. Determination of protein structure by experimental methods is a long and tedious process. The difficulties of determining protein

structures experimentally require us to predict protein structures using computational methods. Comparative homology modeling, threading, and ab initio methods are the three major methods for protein structure prediction. The classification of these three major methods is based on how each method utilizes the resources available in current databases. Comparative homology modeling produces the best prediction results so far. Tertiary structure and function are highly conserved during the evolutionary process. As a result, protein sequences with high sequence similarity usually share similar structures. The prediction accuracy of homology modeling depends on whether protein sequences in the Protein Data Bank with high sequence similarity to the target protein sequences can be found. Sequence alignment algorithms are used to find protein sequences sharing high similarity with target sequences whose structures are to be predicted. Based on sequence alignment algorithms, the aligned residues of the structure templates from protein sequences sharing high similarity with target sequences are used to construct the structural model. In this process, the quality of the sequence alignment algorithm is the key factor determining whether suitable structural templates can be selected and how well the target protein can be aligned with the structural templates. For comparative homology modeling, local sequence alignment is used to find segments of protein sequences with high similarity. Local sequence alignment includes pairwise alignment and profile-based alignment. Profile-based methods perform much better than pairwise comparison methods such as the Basic Local Alignment Search Tool (BLAST) when sequence similarity is less than 30%. If sequence alignment algorithms cannot find correct folds for the target sequence, threading or fold recognition can be utilized to provide correct folds for the target sequence. Based on the concept that only a small number of distinct protein folds exist for protein families, a library

of representative local structures is scanned in order to find structural analogs to protein sequences. After the library is set up, an energy function is used to select suitable library entries to serve as templates for target sequences. The Protein Structure Prediction and Evaluation Computer Toolkit (PROSPECT) is one of the best threading programs in the Critical Assessment of Techniques for Protein Structure Prediction (CASP) competition (Xu et al., 2001). Threading methods are computationally expensive because each entry of a library containing thousands of possible folds must be aligned in all possible ways. The energy functions used in threading methods are not sophisticated enough to always find the correct protein folds. Ab initio methods can be used to predict protein structures from sequence information alone when appropriate structure templates cannot be found. Most ab initio prediction methods restrict the conformation space to a reasonable size using a reduced protein representation and select energy functions related to the most important interactions responsible for protein folding into its native form.

Clustering System for Local Protein Structure Prediction

Recurring sequence motifs of proteins are explored with an improved K-means clustering algorithm. Information about local protein sequence motifs is very important to the analysis of biologically significant conserved regions of protein sequences. These conserved regions can potentially determine the diverse conformations and activities of proteins. Carefully constructed sequence motifs from sequence clusters are used to predict local protein structure. PROSITE, PRINTS and PFAM are popular methods to create sequence motifs. Since the sequence motifs and profiles of PROSITE, PRINTS and PFAM are developed from multiple sequence alignments, these sequence motifs and profiles only search conserved elements of

sequence alignments from the same protein family and carry little information about conserved sequence regions that transcend protein families. Furthermore, knowledge about the biologically important regions or residues is a precondition of finding these motifs. As a result, the discovery of sequence motifs and profiles requires intensive human intervention. While these methods for producing the popular sequence motifs require human intervention to explore the biologically significant regions of protein sequences, the clustering technique provides an automatic, unsupervised discovery process. All these advantages, in comparison with the methods used to create the popular sequence motifs, motivate us to develop an improved K-means clustering algorithm. Han and Baker have used the K-means clustering program to find recurring local sequence motifs for proteins (Han and Baker, 1995; Han and Baker, 1996). In their work, a set of initial points for cluster centers is chosen randomly (Han and Baker, 1995). Since the performance of K-means clustering is very sensitive to initial point selection (Jain, Murty and Flynn, 1999), their technique may not yield satisfactory results. To overcome the potential problems of random initialization, the new greedy initialization method tries to choose suitable initial points so that final partitions can represent the underlying distribution of the data samples more consistently and accurately (Zhong et al., 2004b). Each initial point is represented by one local sequence segment. In the new initialization method, the clustering algorithm is performed for only a few iterations during each run. After each run, initial points that can be used to form clusters with good structural similarity are chosen, and their evolutionary distance is checked against that of all points already selected in the initialization array. If the minimum evolutionary distance of a new point is greater than the specified distance, the point is added to the initialization array. Satisfaction of the minimum evolutionary distance guarantees that each

newly selected point is well separated from all the existing points in the initialization array and will potentially belong to a different natural cluster. This process is repeated several times until the specified number of points is chosen. After this procedure, these carefully selected points can be used as the initial centers for the K-means clustering algorithm. Analysis of the clustering process of the traditional clustering algorithm reveals that some of the initial points are very close to each other, creating strong interference with each other. Strong interference among initial points affects the final partitioning negatively. The results of our improved K-means algorithm show that the average percentage of sequence segments belonging to clusters with structural similarity greater than 60% steadily improves with increasing minimum evolutionary distances among initial points. This improved percentage results from decreased interference among initial points when the evolutionary distances among initial points are increased. Comparison between the sequence motifs obtained by both algorithms suggests that the improved K-means clustering algorithm may discover some relatively weak and subtle sequence motifs. These motifs are undetectable by the traditional K-means algorithm because random selection of points may choose two starting points that lie within one natural cluster. For example, some of the weak amphipathic helices and sheets discovered by the improved K-means algorithm have not been reported in the literature. In addition, the number of repeated substitution patterns of sequence motifs found by the traditional K-means algorithm is less than that of the improved K-means algorithm. Our results reveal much more detailed hydrophobicity patterns for helices, sheets and coils than the previous study (Han and Baker, 1995). These elaborate hydrophobicity patterns are supported by various biochemical experiments. Increased information about the hydrophobicity patterns associated with these sequence motifs can expand our knowledge of how proteins fold

and how proteins interact with each other. Furthermore, the analysis of discovered sequence motifs shows that some elaborate and subtle sequence patterns such as Patterns 1, 9 and 22 have never been reported in previous works. In particular, the increased number of repeated substitution patterns reported in this study may provide additional strong evidence for structurally conservative substitutions during the evolutionary process of protein families. The sequence motifs discovered in this study indicate conserved residues that are structurally and functionally important across protein families, because the protein sequences used in this study share less than 25% sequence identity. These important features of our sequence motifs may help to compensate for some of the weak points of those created by PROSITE, PRINTS, PFAM and BLOCKS (Attwood et al., 2002; Henikoff, Henikoff and Pietrokovski, 1999; Sonnhammer et al., 1998). Our sequence motifs may reflect general structural or functional characteristics shared by different protein families, while sequence motifs from PROSITE, PRINTS, PFAM and BLOCKS represent structural or functional constraints specific to a particular protein family. Due to high throughput sequencing techniques, the number of known protein sequences has increased rapidly in recent years. However, information about the functionally significant regions of these new proteins may not be available. As a result, the automatic discovery of biologically important sequence motifs in this study is a much more powerful tool for exploring the underlying correlations between protein sequences, structures and functions than other methods requiring guidance from existing scientific results. In our study, the cluster number of 800 is chosen empirically. However, 800 may not be the optimal cluster number. Therefore, the improved K-means algorithm will be run several times with different values of k in order to discover the most suitable number of clusters. With the information about the optimal cluster number, clustering results may be potentially closest to

the underlying distribution patterns of the sample space. However, the time spent searching for good initial points grows substantially when the minimum evolutionary distance and the structural similarity threshold are increased. For example, it takes 18 days to obtain appropriate initial points with a distance threshold of 1500 when the sample size is very large. Due to time and processing power constraints, the search for the optimal cluster number has not been completed. The long search time for initial points motivates us to implement a parallel K-means algorithm in order to reduce the search time for suitable initial points to one to two days. The parallelization of the improved K-means algorithm will make exploration of the optimal cluster number possible. We predict that the performance gains for the improved K-means algorithm will increase further after the optimal cluster number is found. As a result, Pthread and OpenMP are employed to parallelize the K-means clustering algorithm on the Hyper-Threading enabled Intel architecture. The speedup for 16 Pthreads is 4.3 and the speedup for 16 OpenMP threads is 4 on the 4-processor shared memory architecture. With the new parallel K-means algorithm, K-means clustering can be performed multiple times in a reasonable amount of time. Our research also shows that Hyper-Threading technology for the Intel architecture is efficient for this parallel biological algorithm. After proposing an improved K-means clustering algorithm to discover sequence clusters and sequence motifs automatically and implementing the parallel K-means clustering algorithm, we discuss how the sequence variation of a sequence cluster may influence its structural similarity. Analysis of the relationship between sequence variation and corresponding structural variation for sequence clusters is one of the open questions in protein structure and sequence analysis (Rahman and Zomaya, 2005). Some researchers have evaluated the structural variation of sequence clusters. Kasuya and Thornton (1999) and Jonassen et al.

(1999) have used crmsd to analyze structural variation for sequence motifs. Bystroff and Baker (1998) have used the K-means clustering algorithm to find sequence clusters and to assess structural variation for these sequence clusters. Bystroff and Baker incorporated structural information during the clustering process (1998). As a result, their final sequence clusters are contaminated by the usage of structural information during the clustering process. Our implementation of K-means clustering is significantly different from Bystroff's work (1998) because we only use recurrent clusters and do not include structural information in the clustering process. To the best of our knowledge, no researchers have conducted an in-depth analysis of the relationship between sequence variation and corresponding structural variation for sequence clusters (Zhong et al., 2005a). This work focuses on a systematic and detailed analysis of the relationship between sequence variation and corresponding structural variation for sequence clusters. Understanding this relationship is very important for improving the quality of local sequence alignment and low homology protein folding. Sequence clusters with tight sequence variation can be used to establish structural templates for low homology protein folding. The frequency profiles of sequence clusters with tight sequence variation can also be used to find sequence segments with similar local structure in local sequence alignment algorithms. Since the average of the relative entropy values over all positions of a frequency profile cannot determine the sequence variation for sequence clusters, we use the number of important positions to define the sequence variation for sequence clusters. If the relative entropy at a specified position of the frequency profile is greater than 0.2, this position is defined as an important position of the frequency profile. Our statistics indicate that an average of five amino acids occupy 60% of the frequency space if the relative entropy at that position of the frequency profile is

greater than 0.2. Statistically, each of the twenty amino acids may occur with a frequency of 5%. Therefore, five amino acids would be expected to occupy 25% of the frequency space. As a result, the distribution of amino acids is highly disproportionate at the important positions. The number of important positions is used to indicate the extent of sequence variation for sequence clusters. An increased number of important positions in the frequency profile reflects that more positions in the frequency profile have a highly disproportionate distribution of the 20 amino acids. As a result, the sequence variation for such sequence clusters is more compact. In contrast, a relatively small number of important positions indicates that the sequence variation for the sequence cluster is wide. Our results indicate that defining sequence variation for sequence clusters by the number of important positions is more effective in distinguishing sequence clusters with high structural variation from those with low structural variation. The sequence variation and structural variation of sequence clusters containing sequence segments of a specified length are analyzed separately. The length of sequence segments ranges from 5 to 15 in our study. Sequence clusters containing sequence segments of different lengths show a similar relationship between sequence variation and structural variation. Due to limitations of space, we focus on the sequence clusters containing sequence segments of length nine. All the results shown in the following are related to the sequence clusters containing sequence segments of length nine. Analysis of our results reveals that, on average, the number of important positions for clusters with low structural variation is greater than the number of important positions for clusters with high structural variation. Low structural variation for sequence clusters indicates that the structural variation is compact. A large number of important positions indicates that the sequence variation for sequence clusters is tight. In other words, our results indicate the important pattern that sequence

clusters with tight sequence variation tend to have tight structural variation and sequence clusters with wide sequence variation tend to have wide structural variation. After explaining the improved K-means algorithm for sequence motif discovery and how the sequence variation of a sequence cluster may influence its structural similarity, the clustering system is developed for local protein structure prediction. Our preliminary results show that sequence segments of length nine are long enough to have some structural features and short enough to have a statistically significant number of samples. It is clear that other segment lengths are important, and the analysis presented here can be applied to them as well. Due to the huge amount of computation, we plan to analyze sequence segments with lengths ranging from 5 to 15 in the next step. The average distance matrix, representative torsion angle and representative secondary structure form the representative structure of each cluster. The frequency profile of a given sequence segment is compared with the centroid of each cluster in order to calculate a distance score. A smaller distance score shows that the frequency profile of the given sequence segment is closer to the centroid of a given cluster. The reliability score of a given sequence segment for a cluster is determined by the sum of the frequencies of the matched amino acids at the corresponding positions of the average frequency profile of the cluster. The distance score of each cluster for a given sequence segment is calculated in order to filter out less significant clusters. If the difference between a cluster's distance score and the smallest distance score is within 100, the cluster is selected. Other clusters are discarded since they are less significant. The cluster with the highest reliability score among the selected clusters is finally used to predict the structure of the sequence segment. Our results indicate that clusters of high quality provide reliable prediction results and clusters of average

26 11 qualty produces hgh qualty results. Specal cauton need be taken aganst predcton results by the bad cluster group Clusterng Support Vector Machne for Local Proten Structure Predcton The central deas of support vector machnes are to map the nput space nto another hgher dmensonal feature space usng the kernels functon and to buld an optmal hyperplane n that feature space (Vapnk, 1998). One of mportant questons s that how we can buld the hyperplane that has strong generalzaton capablty n the hgh dmensonal feature space. The second queston s that how we can avod the curse of dmensonalty n ths hgh dmensonal feature space. The Mercer s Theorem helps us avod mappng the nput space nto another hgher dmensonal space explctly. Mercer s theorem ndcates that any kernel functon satsfyng Mercer s condton can calculate the nner product of two vectors n some hgh dmensonal Hlbert space. Based on Mercer theorem, the hgh-dmensonal feature space need not be consdered drectly durng the process of fndng the optmal hyperplane. Instead, the nner products between support vectors and the vectors n the feature space can be calculated. SVM has two layers. In the frst layer, nput vectors are mplctly transformed and each nner product between the nput vector and support vectors are calculated based on the kernel functon. In the second layer, the lnear decson functon s bult n the hgh dmensonal feature space. The best SV machne wth the smallest expected rsks has smallest VC dmenson. SVMs are based on the dea of mappng data ponts to a hgh dmensonal feature space where a separatng hyperplane can be found. SVMs are searchng the optmal separatng hyperplane by solvng a convex quadratc programmng (QP). The typcal runnng tme for the convex quadratc programmng s Ω (m 2 ) for the tranng set wth m samples. The convex quadratc programmng s NP-complete n the worst case (Vavass, 1991). Therefore, SVMs are not

27 12 favorable for a large dataset (Chang and Ln, 2001). Our dataset contans a half mllons samples. Expermental results show that tranng of SVM for a half mllons samples s not complete after one month on the poweredge6600 server wth four processors from Dell. Many algorthms and mplementaton technques have been developed to enhance SVMs n order to ncrease ther tranng performance wth large data sets. The most well-known technques nclude chunkng (Vapnk, 1998), Osuna s decomposton method (Osuna, Freund, and Gros, 1997), Sequental Mnmal Optmzaton (SMO) (Platt, 1999) and boostng algorthms (Pavlov, Mao and Dom, 2000). The success of these methods depends on dvdng the orgnal quadratc programmng (QP) problem nto a seres of smaller computatonal problems n order to reduce the sze of each QP problem. Although these algorthms accelerate the tranng process, these algorthms do not scale well wth the sze of the tranng data. The second class of algorthms tres to speed up the tranng process by reducng the number of tranng data. Snce some data ponts such as the support vectors are more mportant to determne the optmal soluton, these algorthms provde SVMs wth hgh qualty data ponts durng the tranng process. Random Selecton (Balcazar, Da and Watanabe, 2001) and clusterng analyss (Yu, Yang, and Han, 2000) are representatves of these algorthms. Ther algorthms are hghly scalable for the large data set whle the performance of tranng depends greatly on the selecton of tranng samples. In order to solve the problems related to large sample tranng, Clusterng Support Vector Machnes are proposed n ths work. Understandng proten sequence-to-structure relatonshp s one of the most mportant tasks of current bonformatcs research. The knowledge of correspondence between the proten sequence and ts structure can play very mportant role n proten structure predcton (Rahman and Zomaya, 2005). Han and Baker have used the K-means

28 13 clusterng algorthm to explore proten sequence-to-structure relatonshp. Proten sequences are represented wth frequency profles. Wth the K-means clusterng algorthm, hgh qualty sequence clusters have been produced (Han and Baker, 1996). They have used these hgh qualty sequence clusters to predct the backbone torson angles for local proten structure (Bystroff and Baker, 1998). In ther work and our prevous works, the K-means clusterng algorthm s essental to understand how proten sequences correspond to local 3D proten structures. However, the conventonal clusterng algorthms such as the K-means and K-nearest neghbor algorthm assume that the dstance between data ponts can be calculated wth exact precson. When ths dstance functon s not well characterzed, the clusterng algorthm may not reveal the sequence-to-structure relatonshp effectvely. As a result, some of clusters provde poor correspondence between proten sequences and ther structures. SVM can handle the nonlnear classfcaton by mplctly mappng nput samples from the nput feature space nto another hgh dmensonal feature space wth the nonlnear kernel functon. Therefore, SVM may be more effectve to reveal the nonlnear sequence-to-structure relatonshp than K-means clusterng does. The superor performance for non-lnear classfcaton nspres us to explore the relatonshp between the proten sequence and ts structure wth SVM. Tranng SVM over the whole feature space contanng almost half mllon data samples takes a long tme. Furthermore, each subspace of the whole feature space corresponds to dfferent local 3D structures n our applcaton. As a result, constructon of one SVM for the whole feature space cannot take advantage of the strong generalzaton power of SVM effcently. The dsadvantage of buldng one SVM over the whole feature space motvates us to consder the theory of granular computng.
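The advantage of a nonlinear kernel over a purely distance-based view can be seen on toy data. The following sketch (using scikit-learn and synthetic concentric circles, not the dissertation's protein dataset) shows an RBF-kernel SVM succeeding where a linear decision surface fails:

```python
# Toy illustration (not the dissertation's protein data): an RBF-kernel SVM
# separates concentric circles, where no linear hyperplane in the input
# space can; the kernel implicitly maps the points to a space where a
# separating hyperplane exists.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

linear_acc = SVC(kernel="linear").fit(X_tr, y_tr).score(X_te, y_te)
rbf_acc = SVC(kernel="rbf", gamma=2.0).fit(X_tr, y_tr).score(X_te, y_te)
print(f"linear: {linear_acc:.2f}  rbf: {rbf_acc:.2f}")
```

The gap between the two accuracies is the same effect the text appeals to: a distance computed in the raw input space cannot capture a relationship that only becomes linear after the kernel mapping.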

Granular computing decomposes information into aggregates such as subsets, classes, and clusters of a universe and then solves the targeted problem in each granule (Yao, 2004). Granule construction and computing are the two major tasks of granular computing (Yao, 2005). Granular computing conceptualizes the whole feature space at different granularities and switches among these granularities (Yao, 2004). With the principle of divide-and-conquer, granular computing breaks complex problems into smaller and computationally simpler problems and focuses on each small problem by omitting unnecessary and irrelevant information. As a result, granular computing can increase the intelligence and flexibility of data mining algorithms. To combine the theory of granular computing with the principles of statistical learning algorithms, we propose a new computational model called Clustering Support Vector Machines (CSVMs). In this new computational model, one SVM is built for each information granule defined by the sequence clusters created by the clustering algorithm. CSVMs are modeled to learn the nonlinear relationship between protein sequences and their structures in each cluster. Although an SVM is not favorable for large numbers of data samples, CSVMs can easily be parallelized to speed up the modeling process. After gaining knowledge about the sequence-to-structure relationship, CSVMs are used to predict distance matrices, torsion angles, and secondary structures for the backbone α-carbon atoms of protein sequence segments. Compared with the clustering system introduced previously, CSVMs can estimate how closely frequency profiles of protein sequences correspond with local 3D structures by using the nonlinear kernel. The introduction of CSVMs can potentially improve the accuracy of local protein structure prediction.
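The CSVM construction described above, one SVM per K-means granule, can be sketched as follows. This is a minimal illustration with synthetic data and scikit-learn; the data, labels, cluster count, and kernel settings are placeholder assumptions, not the dissertation's actual pipeline:

```python
# Minimal CSVM sketch: partition the feature space with K-means, then train
# one SVM per information granule (cluster). The synthetic data and labels
# stand in for frequency profiles and reliable/unreliable structure labels.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 10))            # stand-in for frequency profiles
y = (X[:, 0] * X[:, 1] > 0).astype(int)   # stand-in binary labels

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)

# One SVM per granule, trained only on that granule's samples.
csvms = {}
for c in range(kmeans.n_clusters):
    mask = kmeans.labels_ == c
    if len(set(y[mask])) < 2:
        # Degenerate granule with a single class: fall back to that label.
        csvms[c] = int(np.bincount(y[mask]).argmax())
    else:
        csvms[c] = SVC(kernel="rbf").fit(X[mask], y[mask])

def csvm_predict(x):
    """Route a sample to its nearest centroid, then ask that granule's SVM."""
    c = int(kmeans.predict(x.reshape(1, -1))[0])
    model = csvms[c]
    if isinstance(model, int):
        return model
    return int(model.predict(x.reshape(1, -1))[0])

preds = [csvm_predict(x) for x in X[:50]]
```

Because each SVM sees only its own granule's samples, the per-cluster training jobs are independent, which is what makes the modeling process trivially parallelizable.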

CSVMs are built from information granules, which are intelligently partitioned by clustering algorithms. Intelligent partitioning by clustering algorithms provides a true and natural representation of the inherent data distribution of the system. Because of data partitioning, a complex classification problem is converted into multiple smaller problems, so that the learning tasks for each CSVM are more specific and efficient (He et al., 2006). Each CSVM can concentrate on the highly related samples in its feature subspace without being distracted by noisy data from other clusters. As a result, CSVMs can potentially improve the generalization capability for classification problems. Since granulation by K-means clustering may introduce noise and irrelevant information into each granule, machine learning techniques are required to identify the strength of correspondence between frequency profiles and the 3D local structure for each sequence segment belonging to the same information granule. After learning the relationship between the frequency profile distribution and 3D local structures, CSVMs can filter out potentially unreliable predictions and select potentially reliable predictions for each granule. Because our unpublished results reveal that the distribution patterns of frequency profiles in each cluster are quite different, the functionality and training of the CSVMs are customized for each cluster belonging to the different cluster groups. The CSVMs for clusters belonging to the bad cluster group are designed to identify sequence segments whose structure can be reliably predicted, while the CSVMs for clusters belonging to the good cluster group are trained to filter out sequence segments whose structure cannot be reliably predicted.

Local protein structure prediction by CSVMs is based on the prediction method of the clustering algorithm. First, a sequence segment whose structure is to be predicted is assigned to a specific cluster in the cluster group by the clustering algorithm. Then the CSVM trained for this specific cluster is used to identify how closely the frequency profile of the sequence segment is nonlinearly correlated with the 3D local structure of the cluster. If the sequence segment is predicted as a positive sample by the CSVM, the frequency profile of the segment has the potential to be closely mapped to the 3D local structure of the cluster; consequently, the 3D local structure of the cluster can be safely assigned to the sequence segment. The method used to decide the 3D local structure of each cluster can be found in Chapter 12. If the sequence segment is predicted as a negative sample by the CSVM, the frequency profile of the segment does not closely correspond to the 3D local structure of the cluster, and the structure of the segment cannot be reliably predicted by that cluster. The cluster is then removed from the cluster group, and the cluster membership function based on distance scores and reliability scores is used to select the next cluster from the remaining clusters of the cluster group. This procedure is repeated until the SVM modeled for the selected cluster predicts the given sequence segment as positive. The knowledge about the correspondence between frequency profiles and the 3D local structure provided by CSVMs thus supplies an additional dependable metric for cluster membership assignment.

The average accuracy of the CSVMs is over 80%, which indicates that the generalization power of the CSVMs is strong enough to recognize the complicated patterns of sequence-to-structure relationships. The CSVMs modeled for the different cluster groups obtain a good capability to discriminate between positive and negative samples. The CSVMs for the bad cluster group are able to select frequency profiles of sequence segments whose structure can be reliably predicted. The recall value for the CSVMs belonging to the good cluster group reaches 96%; this high value reveals that the CSVMs did not misclassify frequency profiles of sequence segments whose structure can be accurately predicted.
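The iterative prediction procedure just described can be sketched as a short loop. The helper names (`select_next_cluster`, `csvms`, `structures`) are hypothetical stand-ins for the cluster membership function and the trained per-cluster models, not the dissertation's actual implementation:

```python
# Sketch of the CSVM prediction loop: try clusters in membership order
# (distance + reliability scores) until one cluster's CSVM accepts the
# segment as a positive sample. All helper objects are hypothetical.

def predict_structure(segment_profile, candidate_clusters, csvms, structures,
                      select_next_cluster):
    """Return the representative 3D local structure of the first cluster
    whose CSVM labels the segment positive, or None if every cluster
    rejects it."""
    remaining = list(candidate_clusters)
    while remaining:
        # Membership function picks the best remaining cluster.
        c = select_next_cluster(segment_profile, remaining)
        if csvms[c](segment_profile) == 1:
            return structures[c]   # positive: assign this cluster's structure
        remaining.remove(c)        # negative: discard cluster, try the next
    return None
```

A quick check with mock objects: if cluster 0's CSVM always rejects and cluster 1's always accepts, the loop skips cluster 0 and returns cluster 1's representative structure.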
The precision value for the CSVMs belonging to the good cluster group reaches 86%. This high precision value demonstrates that the CSVMs belonging to the good cluster group obtain the capability to filter out frequency profiles of sequence segments whose structure cannot be reliably predicted. Compared with the clustering system introduced previously, our experimental results show that the accuracy of local structure prediction is improved noticeably when CSVMs are applied.

1.2 Dissertation Organization

This dissertation is divided into four parts. In the first part of the dissertation, I discuss how protein structures are represented and why protein structure prediction is important; the first part covers Chapter 2. In the second part, I discuss the new improved K-means clustering algorithm for sequence cluster and motif discovery, and then explain how sequence variation for sequence clusters may influence their structural similarity. Based on this information, the clustering system is developed in order to carry out local protein structure prediction. The second part spans Chapter 3 to Chapter 8. The third part of the dissertation discusses the new clustering support vector machine for local protein structure prediction, since the clustering system used in the second part may not capture the nonlinear sequence-to-structure relationship effectively; the third part covers Chapter 9. The fourth part provides the conclusions and future work and covers Chapter 10.

In Chapter 2, the four levels of protein structure are explained first. Then, how protein structure can be experimentally determined is introduced. In the third part of the chapter, three major computational methods for predicting protein structure are discussed in detail.

In Chapter 3, an improved K-means clustering algorithm is introduced in order to explore recurring sequence motifs of proteins. Information about local protein sequence motifs is very important to the analysis of biologically significant conserved regions of protein sequences. The chapter is divided into five sections. First, the major motif discovery methods are discussed. Then, the major characteristics of the traditional and improved K-means algorithms are compared. In Section 3.3, the experimental setup is explained. In Section 3.4, experimental results are presented to show that the improved K-means algorithm is better than the traditional K-means algorithm and to give evidence that our research finds some previously undiscovered sequence motifs. In Section 3.5, our research is compared with other state-of-the-art approaches in order to emphasize the advantages of our research.

The long search time for initial points motivates us to implement a parallel K-means algorithm in order to reduce the search time for suitable initial points to one to two days. In Chapter 4, the parallel K-means algorithm is introduced. Parallelization of the improved K-means algorithm will make exploration of the optimal cluster number possible, and we predict that the performance gains of the improved K-means algorithm will increase further after the optimal cluster number is found. In this chapter, two important parallelization techniques for the K-means clustering algorithm are discussed. Then the programming environment and implementation details are explained. Finally, experimental results for speedup values are presented.

In Chapter 5, we discuss how sequence variation for sequence clusters may influence their structural similarity, one of the most important questions in current bioinformatics research. In this chapter, previous studies of the sequence and structural variation of sequence clusters are reviewed first. Then recurrent clustering, the data set, and the generation of sequence segments are introduced. The evaluation of sequence variation and structural similarity is discussed in detail. Finally, the results of the analysis of the relationship between sequence variation and structural variation are given.

In Chapters 3 and 5, we discuss the improved K-means algorithm for sequence motif discovery and how sequence variation for sequence clusters may influence their structural similarity. Based on this knowledge, the clustering system is developed for local protein structure prediction in Chapter 6. In this chapter, how to cluster sequence segments into clusters is explained first. Then the method for calculating the representative structure of each cluster is explained. The distance score and reliability score used to decide cluster membership are discussed. The performance evaluation and experimental results are explained in the last part of the chapter.

In Chapter 7, Support Vector Machines are explained in detail. Support Vector Machines are a new generation of learning machines which have been successfully applied to a wide variety of application domains (Cristianini and Shawe-Taylor, 2000), including bioinformatics (Schoelkopf, Tsuda and Vert, 2000). Construction of an optimal hyperplane that separates samples belonging to the first class from samples belonging to the second class with the maximal margin is the essential task of an SVM. In this chapter, the concept of the optimal hyperplane and the optimization problems used to construct it in the linearly separable case and in the linearly nonseparable case are discussed first. Then the expected risk bounds are evaluated to assess the effectiveness of support vector machines. In addition, the quadratic optimization and linear optimization methods used to build SVMs are discussed. Because SVM kernels play a key role in implicitly calculating the inner products between support vectors and other vectors in the high dimensional feature space, several important SVM kernels are introduced in this chapter. In real


More information

Application of Maximum Entropy Markov Models on the Protein Secondary Structure Predictions

Application of Maximum Entropy Markov Models on the Protein Secondary Structure Predictions Applcaton of Maxmum Entropy Markov Models on the Proten Secondary Structure Predctons Yohan Km Department of Chemstry and Bochemstry Unversty of Calforna, San Dego La Jolla, CA 92093 ykm@ucsd.edu Abstract

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Face Recognition University at Buffalo CSE666 Lecture Slides Resources:

Face Recognition University at Buffalo CSE666 Lecture Slides Resources: Face Recognton Unversty at Buffalo CSE666 Lecture Sldes Resources: http://www.face-rec.org/algorthms/ Overvew of face recognton algorthms Correlaton - Pxel based correspondence between two face mages Structural

More information

Solving two-person zero-sum game by Matlab

Solving two-person zero-sum game by Matlab Appled Mechancs and Materals Onlne: 2011-02-02 ISSN: 1662-7482, Vols. 50-51, pp 262-265 do:10.4028/www.scentfc.net/amm.50-51.262 2011 Trans Tech Publcatons, Swtzerland Solvng two-person zero-sum game by

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS46: Mnng Massve Datasets Jure Leskovec, Stanford Unversty http://cs46.stanford.edu /19/013 Jure Leskovec, Stanford CS46: Mnng Massve Datasets, http://cs46.stanford.edu Perceptron: y = sgn( x Ho to fnd

More information

SVM-based Learning for Multiple Model Estimation

SVM-based Learning for Multiple Model Estimation SVM-based Learnng for Multple Model Estmaton Vladmr Cherkassky and Yunqan Ma Department of Electrcal and Computer Engneerng Unversty of Mnnesota Mnneapols, MN 55455 {cherkass,myq}@ece.umn.edu Abstract:

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

Learning-Based Top-N Selection Query Evaluation over Relational Databases

Learning-Based Top-N Selection Query Evaluation over Relational Databases Learnng-Based Top-N Selecton Query Evaluaton over Relatonal Databases Lang Zhu *, Wey Meng ** * School of Mathematcs and Computer Scence, Hebe Unversty, Baodng, Hebe 071002, Chna, zhu@mal.hbu.edu.cn **

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

A New Approach For the Ranking of Fuzzy Sets With Different Heights

A New Approach For the Ranking of Fuzzy Sets With Different Heights New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

Backpropagation: In Search of Performance Parameters

Backpropagation: In Search of Performance Parameters Bacpropagaton: In Search of Performance Parameters ANIL KUMAR ENUMULAPALLY, LINGGUO BU, and KHOSROW KAIKHAH, Ph.D. Computer Scence Department Texas State Unversty-San Marcos San Marcos, TX-78666 USA ae049@txstate.edu,

More information

Query Clustering Using a Hybrid Query Similarity Measure

Query Clustering Using a Hybrid Query Similarity Measure Query clusterng usng a hybrd query smlarty measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transacton on Computers, 3(3), 700-705. Query Clusterng Usng a Hybrd Query Smlarty Measure Ln Fu, Don Hoe-Lan

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

Collaboratively Regularized Nearest Points for Set Based Recognition

Collaboratively Regularized Nearest Points for Set Based Recognition Academc Center for Computng and Meda Studes, Kyoto Unversty Collaboratvely Regularzed Nearest Ponts for Set Based Recognton Yang Wu, Mchhko Mnoh, Masayuk Mukunok Kyoto Unversty 9/1/013 BMVC 013 @ Brstol,

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15 CS434a/541a: Pattern Recognton Prof. Olga Veksler Lecture 15 Today New Topc: Unsupervsed Learnng Supervsed vs. unsupervsed learnng Unsupervsed learnng Net Tme: parametrc unsupervsed learnng Today: nonparametrc

More information

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z.

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z. TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS Muradalyev AZ Azerbajan Scentfc-Research and Desgn-Prospectng Insttute of Energetc AZ1012, Ave HZardab-94 E-mal:aydn_murad@yahoocom Importance of

More information

Announcements. Supervised Learning

Announcements. Supervised Learning Announcements See Chapter 5 of Duda, Hart, and Stork. Tutoral by Burge lnked to on web page. Supervsed Learnng Classfcaton wth labeled eamples. Images vectors n hgh-d space. Supervsed Learnng Labeled eamples

More information

Available online at Available online at Advanced in Control Engineering and Information Science

Available online at   Available online at   Advanced in Control Engineering and Information Science Avalable onlne at wwwscencedrectcom Avalable onlne at wwwscencedrectcom Proceda Proceda Engneerng Engneerng 00 (2011) 15000 000 (2011) 1642 1646 Proceda Engneerng wwwelsevercom/locate/proceda Advanced

More information

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET 1 BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET TZU-CHENG CHUANG School of Electrcal and Computer Engneerng, Purdue Unversty, West Lafayette, Indana 47907 SAUL B. GELFAND School

More information

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

Lecture 4: Principal components

Lecture 4: Principal components /3/6 Lecture 4: Prncpal components 3..6 Multvarate lnear regresson MLR s optmal for the estmaton data...but poor for handlng collnear data Covarance matrx s not nvertble (large condton number) Robustness

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

Support Vector Machines. CS534 - Machine Learning

Support Vector Machines. CS534 - Machine Learning Support Vector Machnes CS534 - Machne Learnng Perceptron Revsted: Lnear Separators Bnar classfcaton can be veed as the task of separatng classes n feature space: b > 0 b 0 b < 0 f() sgn( b) Lnear Separators

More information

Machine Learning. Topic 6: Clustering

Machine Learning. Topic 6: Clustering Machne Learnng Topc 6: lusterng lusterng Groupng data nto (hopefully useful) sets. Thngs on the left Thngs on the rght Applcatons of lusterng Hypothess Generaton lusters mght suggest natural groups. Hypothess

More information

CS 534: Computer Vision Model Fitting

CS 534: Computer Vision Model Fitting CS 534: Computer Vson Model Fttng Sprng 004 Ahmed Elgammal Dept of Computer Scence CS 534 Model Fttng - 1 Outlnes Model fttng s mportant Least-squares fttng Maxmum lkelhood estmaton MAP estmaton Robust

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

Review of approximation techniques

Review of approximation techniques CHAPTER 2 Revew of appromaton technques 2. Introducton Optmzaton problems n engneerng desgn are characterzed by the followng assocated features: the objectve functon and constrants are mplct functons evaluated

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research

More information

Classifying Acoustic Transient Signals Using Artificial Intelligence

Classifying Acoustic Transient Signals Using Artificial Intelligence Classfyng Acoustc Transent Sgnals Usng Artfcal Intellgence Steve Sutton, Unversty of North Carolna At Wlmngton (suttons@charter.net) Greg Huff, Unversty of North Carolna At Wlmngton (jgh7476@uncwl.edu)

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Supervsed vs. Unsupervsed Learnng Up to now we consdered supervsed learnng scenaro, where we are gven 1. samples 1,, n 2. class labels for all samples 1,, n Ths s also

More information

CLASSIFICATION OF ULTRASONIC SIGNALS

CLASSIFICATION OF ULTRASONIC SIGNALS The 8 th Internatonal Conference of the Slovenan Socety for Non-Destructve Testng»Applcaton of Contemporary Non-Destructve Testng n Engneerng«September -3, 5, Portorož, Slovena, pp. 7-33 CLASSIFICATION

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Why consder unlabeled samples?. Collectng and labelng large set of samples s costly Gettng recorded speech s free, labelng s tme consumng 2. Classfer could be desgned

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions

Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions Sortng Revew Introducton to Algorthms Qucksort CSE 680 Prof. Roger Crawfs Inserton Sort T(n) = Θ(n 2 ) In-place Merge Sort T(n) = Θ(n lg(n)) Not n-place Selecton Sort (from homework) T(n) = Θ(n 2 ) In-place

More information

Network Intrusion Detection Based on PSO-SVM

Network Intrusion Detection Based on PSO-SVM TELKOMNIKA Indonesan Journal of Electrcal Engneerng Vol.1, No., February 014, pp. 150 ~ 1508 DOI: http://dx.do.org/10.11591/telkomnka.v1.386 150 Network Intruson Detecton Based on PSO-SVM Changsheng Xang*

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

Human Face Recognition Using Generalized. Kernel Fisher Discriminant

Human Face Recognition Using Generalized. Kernel Fisher Discriminant Human Face Recognton Usng Generalzed Kernel Fsher Dscrmnant ng-yu Sun,2 De-Shuang Huang Ln Guo. Insttute of Intellgent Machnes, Chnese Academy of Scences, P.O.ox 30, Hefe, Anhu, Chna. 2. Department of

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

Discriminative classifiers for object classification. Last time

Discriminative classifiers for object classification. Last time Dscrmnatve classfers for object classfcaton Thursday, Nov 12 Krsten Grauman UT Austn Last tme Supervsed classfcaton Loss and rsk, kbayes rule Skn color detecton example Sldng ndo detecton Classfers, boostng

More information

Determining Fuzzy Sets for Quantitative Attributes in Data Mining Problems

Determining Fuzzy Sets for Quantitative Attributes in Data Mining Problems Determnng Fuzzy Sets for Quanttatve Attrbutes n Data Mnng Problems ATTILA GYENESEI Turku Centre for Computer Scence (TUCS) Unversty of Turku, Department of Computer Scence Lemmnkäsenkatu 4A, FIN-5 Turku

More information

Disulfide Bonding Pattern Prediction Using Support Vector Machine with Parameters Tuned by Multiple Trajectory Search

Disulfide Bonding Pattern Prediction Using Support Vector Machine with Parameters Tuned by Multiple Trajectory Search Proceedngs of the 9th WSEAS Internatonal Conference on APPLIED IFORMAICS AD COMMUICAIOS (AIC '9) Dsulfde Bondng Pattern Predcton Usng Support Vector Machne wth Parameters uned by Multple rajectory Search

More information

BIN XIA et al: AN IMPROVED K-MEANS ALGORITHM BASED ON CLOUD PLATFORM FOR DATA MINING

BIN XIA et al: AN IMPROVED K-MEANS ALGORITHM BASED ON CLOUD PLATFORM FOR DATA MINING An Improved K-means Algorthm based on Cloud Platform for Data Mnng Bn Xa *, Yan Lu 2. School of nformaton and management scence, Henan Agrcultural Unversty, Zhengzhou, Henan 450002, P.R. Chna 2. College

More information

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University CAN COMPUTERS LEARN FASTER? Seyda Ertekn Computer Scence & Engneerng The Pennsylvana State Unversty sertekn@cse.psu.edu ABSTRACT Ever snce computers were nvented, manknd wondered whether they mght be made

More information

Meta-heuristics for Multidimensional Knapsack Problems

Meta-heuristics for Multidimensional Knapsack Problems 2012 4th Internatonal Conference on Computer Research and Development IPCSIT vol.39 (2012) (2012) IACSIT Press, Sngapore Meta-heurstcs for Multdmensonal Knapsack Problems Zhbao Man + Computer Scence Department,

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

Associative Based Classification Algorithm For Diabetes Disease Prediction

Associative Based Classification Algorithm For Diabetes Disease Prediction Internatonal Journal of Engneerng Trends and Technology (IJETT) Volume-41 Number-3 - November 016 Assocatve Based Classfcaton Algorthm For Dabetes Dsease Predcton 1 N. Gnana Deepka, Y.surekha, 3 G.Laltha

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

Journal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article. A selective ensemble classification method on microarray data

Journal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article. A selective ensemble classification method on microarray data Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(6):2860-2866 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A selectve ensemble classfcaton method on mcroarray

More information