Intelligent Information Acquisition for Improved Clustering


Duy Vu, University of Texas at Austin
Mikhail Bilenko, Microsoft Research
Prem Melville, IBM T.J. Watson Research Center
Maytal Saar-Tsechansky, University of Texas at Austin

1. Introduction and motivation

In many data mining and machine learning tasks, datasets include instances with missing feature values that can be acquired at a cost. However, both the acquisition cost and the usefulness with respect to the learning task may vary dramatically across feature values. While this observation has inspired a number of approaches to active and cost-sensitive learning, most work in these areas has focused on classification settings. Yet the problem of obtaining the most useful missing data cost-effectively is equally important in unsupervised settings such as clustering, since the amount by which acquired information improves performance varies significantly across instances and features. For example, clustering algorithms are commonly used to identify users with similar preferences so as to produce personalized product recommendations. With instances corresponding to individual consumers and features describing consumers' ratings of a given product or service, individual feature values may be missing because customers have not provided feedback on all the items they purchased. Furthermore, because consumers are often reluctant to provide feedback, acquiring feedback on unrated items may entail costly incentives, such as free or discounted products or services. Different feature values, however, can have widely varying effects on the accuracy of the resulting clustering of consumers. Choosing which ratings to acquire via incentives so as to benefit the clustering task most cost-effectively is therefore an important decision, as acquiring feedback for all missing ratings is prohibitively expensive.
In this paper, we address the problem of active feature-value acquisition (AFA) for clustering: given a clustering of incomplete data, the task is to select feature values which, when acquired, are likely to provide the highest improvement in clustering quality with respect to acquisition cost. To the best of our knowledge, this general problem has not been considered previously, as prior research focused either on acquiring pairwise distances ([3], [4]) or cluster labels for complete instances [1]. Prior work addressed the AFA task for supervised learning, where missing feature values are acquired in a cost-effective manner for training classification models [6]. However, that approach exploits supervised information to estimate the expected improvement in model accuracy for prospective acquisitions. The primary challenge addressed in this paper lies in a priori estimation of the value of a potential acquisition in the absence of any supervision (i.e., it is not known to which cluster each instance actually belongs). We employ an expected-utility acquisition framework and present an instantiation of the framework for K-means, where the value of prospective acquisitions is derived from their expected impact on the clustering configuration (see [8] for an instantiation of our framework for hierarchical agglomerative clustering). Empirical results demonstrate that the proposed utility function identifies acquisitions that improve clustering quality per unit cost significantly better than acquisitions selected uniformly at random. In addition, we show that our policy performs well under different feature cost structures.

2. Task definition and algorithm

The clustering task is traditionally defined as the problem of partitioning a set of instances into disjoint subsets, or clusters, where each cluster contains similar instances. We focus our attention on clustering in domains where instances include missing feature values that can be acquired at a cost. A dataset consisting of m n-dimensional instances is represented by an m-by-n data matrix X, where x_ij corresponds to the value of the j-th feature of the i-th instance. Initially, the data matrix X is incomplete, i.e., its elements corresponding to missing values are undefined. For each missing feature value x_ij, there is a corresponding cost C_ij at which it can be acquired. Let q_ij refer to the query for the value of x_ij. Then, the general task of active feature-value acquisition is the problem of selecting the instance-feature query that will result in the highest increase in clustering quality per unit cost.

The overall framework for the generalized AFA problem is presented in Algorithm 1. Information is acquired iteratively: at each step, all possible queries are ranked by their expected contribution to clustering quality normalized by cost. The highest-ranking query is then selected, and the feature value corresponding to this query is acquired. The dataset is updated accordingly, and this process is repeated until some stopping criterion is met, e.g., a desirable clustering quality has been achieved. To reduce computational costs, multiple queries can be selected at each iteration. While this framework is intuitive, the crux of the problem lies in devising effective measures of the utility of acquisitions. In subsequent sections, we address the challenges of performing this task accurately and efficiently.

Algorithm 1: Active Feature-value Acquisition for Clustering
Given: X - initial (incomplete) instance-feature matrix; L - clustering algorithm; b - size of query batch; C - cost matrix for all instance-feature pairs.
Output: M = L(X) - final clustering of the dataset incorporating acquired values
1. Initialize TotalCost to the initial cost of X
2. Initialize the set of possible queries Q = {q_ij : x_ij is missing}
3. Repeat until the stopping criterion is met:
4.   Generate a clustering M = L(X)
5.   For each q_ij in Q, compute its utility score
6.   Select a subset S of the b queries with the highest scores
7.   For each q_ij in S: acquire the value of x_ij; X = X ∪ {x_ij}; TotalCost = TotalCost + C_ij
8.   Remove S from Q
9. Return M = L(X)

At every step of the AFA algorithm, the feature value which in expectation will yield the highest clustering improvement per unit cost is acquired. Fundamental to our approach is a utility function U(x_ij = x, C_ij), which quantifies the benefit of a specific value x for feature value x_ij acquired via the corresponding query q_ij at cost C_ij. The expected utility of query q_ij, EU(q_ij), is then defined as the expectation of the utility over the marginal distribution of the feature value x_ij:

    EU(q_ij) = Σ_x U(x_ij = x, C_ij) P(x_ij = x).

Since the true marginal distribution of each missing feature value x_ij is unknown, an empirical estimate of P(x_ij = x) can be obtained using probabilistic classifiers. For example, in the case of discrete (categorical) data, for each feature j a naive Bayes classifier M_j can be trained to estimate the feature's probability distribution based on the values of the other features of a given instance. The expectation can then be computed by piecewise summation over the possible values. For continuous attributes, the expected utility can be computed either with numerical methods such as Monte Carlo estimation, or by discretizing them and using probabilistic classifiers as described above.
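Algorithm 1 with the expected-utility scoring just described can be sketched as follows. This is a minimal illustration rather than the authors' implementation: missing values are marked with NaN, a caller-supplied `oracle` stands in for the actual acquisition, and a simple empirical marginal of each column replaces the per-feature naive Bayes models M_j (all function names are illustrative).

```python
import numpy as np

def expected_utility(X, i, j, values, utility):
    """EU(q_ij) = sum over candidate values x of P(x_ij = x) * U(x_ij = x, C_ij),
    by piecewise summation. P is estimated here from the empirical distribution
    of the observed entries in column j (a stand-in for naive Bayes)."""
    col = X[:, j]
    observed = col[~np.isnan(col)]
    return sum(np.mean(observed == v) * utility(i, j, v) for v in values)

def afa(X, C, values, oracle, cluster, utility, b, budget):
    """Algorithm 1: repeatedly acquire the b highest-scoring queries until the
    acquisition budget is exhausted or no missing values remain."""
    total_cost = 0.0
    Q = {(i, j) for i, j in zip(*np.where(np.isnan(X)))}
    model = cluster(X)
    while Q and total_cost < budget:
        # rank all remaining queries by expected utility, take the top batch
        ranked = sorted(Q, key=lambda q: expected_utility(X, *q, values, utility),
                        reverse=True)
        for i, j in ranked[:b]:
            X[i, j] = oracle(i, j)        # acquire the feature value
            total_cost += C[i, j]
            Q.discard((i, j))
        model = cluster(X)                # re-cluster with the acquired values
    return model, total_cost
```

The `utility` callback corresponds to U(x_ij = x, C_ij); either of the utility functions of Section 2.1 can be plugged in.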

2.1 Capturing the utility of a prospective acquisition

Devising a utility function U that captures the benefits of possible acquisition outcomes is the critical component of the AFA framework. Acquisitions aim to improve clustering quality. Clustering quality measures proposed in prior work can be loosely divided into external measures, such as pairwise F-measure [7], which are derived from a category distribution unseen at clustering time, and internal measures, e.g., the ratio between average inter-cluster and intra-cluster distances, which use only data available to the clustering algorithm. Since external measures cannot be assessed at the time of clustering, an acquisition policy must capture the value of acquisitions using the dataset at hand alone.

Most clustering algorithms optimize a specific objective function, which allows defining utility as the improvement in this objective per unit cost. For example, the objective of the popular K-means algorithm [5] is to minimize the sum of squared distances between every instance x_i and the centroid of the instance's cluster, μ_{y_i}:

    J(X) = Σ_i Σ_k (x_ik − μ_{y_i,k})²,

where y_i is the index of the cluster to which instance x_i is assigned, and missing feature values are omitted from the squared-distance computation. The objective-based utility of acquisition outcome x_ij = x can thus be defined as the cost-normalized reduction in the value of the objective function:

    U^Obj(x_ij = x, C_ij) = (J(X) − J(X_{x_ij = x})) / C_ij,

where the objective value after the acquisition, J(X_{x_ij = x}), is estimated following the relocation of cluster centroids caused by the acquisition.

While an objective-based utility function provides a well-motivated acquisition strategy, it may select feature values that improve cluster centroid locations without significantly changing the cluster assignments that often underlie external measures of the clustering outcome. The effect of such wasteful acquisitions can be significant, rendering objective-based utility a suboptimal strategy for improving external evaluation measures.
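The objective-based utility for K-means can be sketched directly from these definitions; a minimal illustration, assuming NaN marks missing values and with all names chosen for the example.

```python
import numpy as np

def kmeans_objective(X, centroids, assign):
    """J(X): sum of squared distances from each instance to its assigned
    centroid, with missing (NaN) feature values omitted from the sum."""
    return np.nansum((X - centroids[assign]) ** 2)

def objective_utility(X, centroids, assign, i, j, v, cost):
    """U^Obj(x_ij = v, C_ij): cost-normalized reduction in J when x_ij is
    set to v and the centroid of instance i's cluster is re-estimated."""
    X_new = X.copy()
    X_new[i, j] = v
    mu_new = centroids.copy()
    k = assign[i]
    # relocate only the affected centroid, per-feature mean over observed values
    mu_new[k] = np.nanmean(X_new[assign == k], axis=0)
    return (kmeans_objective(X, centroids, assign)
            - kmeans_objective(X_new, mu_new, assign)) / cost
```

Note that the utility can be negative: an acquired value far from the current centroid increases J even after the centroid is relocated, which is precisely the cost-normalized trade-off the score is meant to expose.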
Because internal objective functions may not relate well to external measures, we propose an alternative utility measure which approximates the qualitative impact of the acquisition on the clustering configuration. We define this utility as the number of instances whose cluster membership changes as the result of an acquisition, given a certain value of the acquired feature. Formally, given the current data matrix X, let y_i^(X) be the cluster assignment of point x_i before the acquisition, and y_i^(X_{x_ij = x}) its cluster assignment after the acquisition. The perturbation-based utility of acquiring value x for feature value x_ij is then defined as:

    U^Pert(x_ij = x, C_ij) = ( Σ_{i'=1}^{m} 1[ y_{i'}^(X) ≠ y_{i'}^(X_{x_ij = x}) ] ) / C_ij.

For K-means, the cluster assignments after the acquisition, Y^(X_{x_ij = x}), can be obtained by re-estimating the centroid of the cluster to which instance x_i is currently assigned, assuming the value x for feature x_ij, and then performing a single assignment step for all points to obtain the new set of cluster assignments. As we show below, this utility measure identifies highly informative acquisitions. Henceforth, we refer to this perturbation-based utility as Expected Utility (EU), and to the objective-based utility as Expected-Utility-Objective (EU-Objective).

2.2 Efficiency considerations: instance-based sampling

A significant challenge lies in the fact that exhaustively evaluating all potential acquisitions is computationally infeasible for datasets of even moderate size. We propose to make the selection tractable by evaluating only a sub-sample of the available queries. We specify an exploration parameter α which controls the complexity of the search. To select a batch of b queries, first a sub-sample of αb queries is selected from the available pool, and then the expected utility of each query in this sub-sample is evaluated. The value of α can be set depending on the amount of time the user is willing to spend on this process.

One approach is to draw this sample uniformly at random to make the computation feasible. However, it may be possible to improve performance by applying Expected Utility estimation to a particularly informative sample of queries. In particular, because the goal of clustering is to define boundaries between potential classes, instances near these boundaries have the most impact on cluster formation; the missing features of these instances therefore provide the most decisive information for adjusting the clustering boundaries. Formally, if μ_{y_i} and μ_{y_i'} are respectively the closest and second-closest centroids for instance x_i in the current clustering, we define the margin δ(x_i) of instance x_i as the difference between their distances from x_i, according to the distance metric D being used for clustering:

    δ(x_i) = D(x_i, μ_{y_i'}) − D(x_i, μ_{y_i}).

Given incomplete information about the positions of instances in the feature space, a smaller margin corresponds to lower confidence in an instance's current cluster assignment. For these instances, obtaining a better estimate of their position in the feature space is more likely to improve our ability to assign them to the correct cluster than for instances with large margins. Following this rationale, we rank all instances in ascending order of their margins under the current cluster assignments. Then, a set of αb queries from the top-ranked instances is selected for evaluation, where b is the desired batch size and α is the exploration parameter. This candidate set of queries is then subjected to the expected-utility evaluation described in the previous section. We refer to this approach as Instance-Based Sampling Expected Utility (IBS-EU).

3. Experimental evaluation

We evaluated our proposed approach on four datasets from the UCI repository [2]: iris, wine, letters-l, and protein, which have been used previously in a number of clustering studies.
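Before turning to the experiments, the perturbation-based utility of Section 2.1 and the margin ranking of Section 2.2 can be sketched together; a minimal illustration assuming NaN-marked missing values and Euclidean distance over observed features, with all names chosen for the example.

```python
import numpy as np

def nan_dist2(x, mu):
    """Squared distance between x and mu over x's observed features only."""
    return np.nansum((x - mu) ** 2)

def perturbation_utility(X, centroids, assign, i, j, v, cost):
    """U^Pert(x_ij = v, C_ij): number of instances whose cluster assignment
    changes after acquiring x_ij = v, per unit cost. The affected centroid
    is re-estimated, then a single assignment step is run over all points."""
    X_new = X.copy()
    X_new[i, j] = v
    mu = centroids.copy()
    mu[assign[i]] = np.nanmean(X_new[assign == assign[i]], axis=0)
    new_assign = np.array([np.argmin([nan_dist2(x, c) for c in mu])
                           for x in X_new])
    return np.sum(new_assign != assign) / cost

def margin_ranked(X, centroids):
    """Instance indices in ascending order of the margin
    delta(x_i) = D(x_i, mu') - D(x_i, mu), i.e. the gap between the
    second-closest and closest centroid distances."""
    deltas = []
    for x in X:
        d = np.sort([np.sqrt(nan_dist2(x, c)) for c in centroids])
        deltas.append(d[1] - d[0])
    return np.argsort(deltas)
```

In an IBS-EU step, the missing cells of the first few instances returned by `margin_ranked` would form the candidate pool of αb queries scored by the expected-utility computation.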
Features with continuous values in these datasets were discretized into 10 bins of equal width. Since feature acquisition costs are not available for these datasets, our first set of experiments assumes uniform acquisition costs for all feature values; experiments with other cost distributions follow. Discrete feature values enable the use of piecewise summation for the expectation calculation, which is computationally preferable, although continuous values can in principle also be used. We compare the proposed acquisition policies with a strategy that selects queries uniformly at random, using the K-means clustering algorithm throughout. The sampling parameter α of our methods is set to 10. We report results from 100 runs of each active acquisition policy. In each run, a small fraction of features is randomly selected for initialization for each instance in the dataset¹, and we evaluate clustering performance after each acquisition step. Lastly, because the datasets we consider have underlying class labels, we employ an external metric, pairwise F-measure, to evaluate clustering quality; empirically, we found no qualitative differences in our results across different external measures. Given a clustering and underlying class labels, pairwise precision and recall are defined, respectively, as the proportion of same-cluster instance pairs that share a class, and the proportion of same-class instance pairs that are placed in the same cluster. F-measure is the harmonic mean of precision and recall: F1 = (2 × Precision × Recall) / (Precision + Recall). The performance comparison of any two acquisition schemes A and B can be summarized by the average percentage increase in pairwise F-measure of A over B across all acquisition phases; we refer to this metric as the average % F-measure increase.

4. Results

Table 1 presents summary results for EU, EU-Objective, IBS-EU, and IBS-Random, which acquires feature values drawn uniformly at random from the informative instances selected by IBS.
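The pairwise F-measure follows directly from its definition; a small self-contained sketch (names are illustrative):

```python
from itertools import combinations

def pairwise_f1(clusters, labels):
    """Pairwise F-measure: precision is the fraction of same-cluster pairs
    that share a class; recall is the fraction of same-class pairs placed
    in the same cluster; F1 is their harmonic mean."""
    same_cluster = same_class = both = 0
    for a, b in combinations(range(len(labels)), 2):
        sc = clusters[a] == clusters[b]
        sl = labels[a] == labels[b]
        same_cluster += sc
        same_class += sl
        both += sc and sl
    precision = both / same_cluster if same_cluster else 0.0
    recall = both / same_class if same_class else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0
```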
¹ We randomly selected 1 of the 4 features for each instance in the iris dataset, 2 of 4 features for wine, and 3 of the 16 and 20 features for the letters-l and protein datasets, respectively.

Let us first examine the relative performance of the EU policy, which identifies acquisitions that are likely to affect the cluster assignments, and EU-Objective, which targets acquisitions that are expected to improve the clustering algorithm's internal objective function. Figure 2(a) presents clustering performance as a function of acquisition cost for the protein dataset, obtained with EU, EU-Objective, and random sampling. For all datasets, EU leads to better clustering than random query sampling, with improvements ranging from a 10% to a 32% increase in F-measure on the top 20% of acquisition phases. One can also observe the cost benefit of using EU to reach a desired level of performance: on the iris dataset, for example, Expected Utility achieves a pairwise F-measure of 0.8 with fewer than 300 feature values, while random sampling requires twice as many acquisitions to achieve the same result. In contrast to Expected Utility, the objective-based utility function used by EU-Objective is rather ineffective at improving pairwise F-measure. This is because the K-means objective is focused on producing tighter clusters, and an acquisition strategy based on it may select feature values that reduce this objective without changing any cluster assignments, resulting in no improvement with respect to external evaluation measures.

Table 1: Performance of different acquisition policies for clustering (average % F-measure increase over random sampling for EU, EU-Objective, IBS-EU, and IBS-Random on the iris, wine, letters-l, and protein datasets; the numeric entries are not recoverable in this copy).

Figure 2: Learning curves for alternative acquisition policies (pairwise F-measure vs. number of feature values acquired): (a) EU, EU-Objective, and random sampling on protein; (b) EU, IBS-EU, and IBS-Random on iris.

Now let us examine the benefit to EU of evaluating a subset of acquisitions from particularly informative instances, as captured by our instance-based sampling approach. Table 1 presents summary performance for IBS-EU and IBS-Random, and for the iris dataset, Figure 2(b) shows clustering quality after each acquisition phase obtained by EU, IBS-EU, and IBS-Random. On 3 of the 4 datasets, IBS-EU produces the highest average increase in pairwise F-measure compared to random sampling.
On these datasets, IBS-Random also performs substantially better than random sampling. These results demonstrate that our margin measure effectively identifies particularly informative instances for acquisition. Consequently, IBS-EU focuses the Expected Utility evaluation on a more promising set of queries, leading to better models on average, although the improvements of IBS-EU over EU are not very large.

Lastly, we evaluated the policies on the iris dataset under different cost distributions, assigning each feature a cost drawn uniformly at random from the range 1 to 100. For this evaluation we include a cost-sensitive benchmark policy, Cheapest-first, which selects acquisitions in order of increasing cost. Across all randomly assigned cost distributions, IBS-EU and Expected Utility consistently produce better clustering than random acquisition for a given cost. Figure 3 presents F-measure versus acquisition cost for two representative cost distributions. As shown, in settings where features have varying information value and non-negligible costs, EU's ability to capture the value of different feature values per unit cost becomes more critical: acquiring an uninformative feature value at a substantial cost incurs a significant loss, and EU and IBS-EU are more likely to avoid such losses. In contrast, the performance of Cheapest-first is inconsistent. It performs well when its underlying assumption holds and the cheapest features are also informative; in such cases EU does not perform as well, since it imperfectly estimates the expected improvement from each acquisition. When many inexpensive features are also uninformative, Cheapest-first can perform poorly, as shown by the early acquisition stages of Figure 3. EU, however, estimates the trade-off between cost and expected improvement in clustering quality, and although the estimation is imperfect, it consistently selects better queries than random acquisition under all cost structures.

Figure 3: Performance (pairwise F-measure vs. acquisition cost) under different feature-value cost structures for EU, IBS-EU, and Cheapest-first: (a) inexpensive features are also informative; (b) some expensive features are informative.

5. Conclusions

In this paper, we proposed an expected-utility approach to active feature-value acquisition for clustering, in which informative feature values are obtained based on the estimated expected improvement in clustering quality per unit cost. Experiments show that the Expected Utility approach consistently leads to better clustering than random sampling for a given acquisition cost.

6. References

[1] S. Basu, A. Banerjee, and R. J. Mooney. Active semi-supervision for pairwise constrained clustering. In Proceedings of the 2004 SIAM International Conference on Data Mining (SDM-04), April 2004.
[2] C. L. Blake and C. J. Merz. UCI repository of machine learning databases. mlearn/mlrepository.html.
[3] J. M. Buhmann and T. Zöller. Active learning for hierarchical pairwise data clustering. In ICPR.
[4] T. Hofmann and J. M. Buhmann. Active data clustering. In Advances in Neural Information Processing Systems 10.
[5] J. MacQueen. Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, 1967.
[6] P. Melville, M. Saar-Tsechansky, F. Provost, and R. Mooney. An expected utility approach to active feature-value acquisition. In Proceedings of the International Conference on Data Mining, Houston, TX, November 2005.
[7] M. Steinbach, G. Karypis, and V. Kumar. A comparison of document clustering techniques. In Proceedings of the KDD-2000 Workshop on Text Mining, 2000.
[8] D. Vu, M. Bilenko, P. Melville, and M. Saar-Tsechansky. Active information acquisition for improved clustering. Working paper, McCombs School of Business, May 2007.


More information

Fuzzy Modeling of the Complexity vs. Accuracy Trade-off in a Sequential Two-Stage Multi-Classifier System

Fuzzy Modeling of the Complexity vs. Accuracy Trade-off in a Sequential Two-Stage Multi-Classifier System Fuzzy Modelng of the Complexty vs. Accuracy Trade-off n a Sequental Two-Stage Mult-Classfer System MARK LAST 1 Department of Informaton Systems Engneerng Ben-Guron Unversty of the Negev Beer-Sheva 84105

More information

Related-Mode Attacks on CTR Encryption Mode

Related-Mode Attacks on CTR Encryption Mode Internatonal Journal of Network Securty, Vol.4, No.3, PP.282 287, May 2007 282 Related-Mode Attacks on CTR Encrypton Mode Dayn Wang, Dongda Ln, and Wenlng Wu (Correspondng author: Dayn Wang) Key Laboratory

More information

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science EECS 730 Introducton to Bonformatcs Sequence Algnment Luke Huan Electrcal Engneerng and Computer Scence http://people.eecs.ku.edu/~huan/ HMM Π s a set of states Transton Probabltes a kl Pr( l 1 k Probablty

More information

Synthesizer 1.0. User s Guide. A Varying Coefficient Meta. nalytic Tool. Z. Krizan Employing Microsoft Excel 2007

Synthesizer 1.0. User s Guide. A Varying Coefficient Meta. nalytic Tool. Z. Krizan Employing Microsoft Excel 2007 Syntheszer 1.0 A Varyng Coeffcent Meta Meta-Analytc nalytc Tool Employng Mcrosoft Excel 007.38.17.5 User s Gude Z. Krzan 009 Table of Contents 1. Introducton and Acknowledgments 3. Operatonal Functons

More information

Data Mining: Model Evaluation

Data Mining: Model Evaluation Data Mnng: Model Evaluaton Aprl 16, 2013 1 Issues: Evaluatng Classfcaton Methods Accurac classfer accurac: predctng class label predctor accurac: guessng value of predcted attrbutes Speed tme to construct

More information

Contrary to Popular Belief Incremental Discretization can be Sound, Computationally Efficient and Extremely Useful for Streaming Data

Contrary to Popular Belief Incremental Discretization can be Sound, Computationally Efficient and Extremely Useful for Streaming Data Contrary to Popular Belef Incremental Dscretzaton can be Sound, Computatonally Effcent and Extremely Useful for Streamng Data Geoffrey I. Webb Faculty of Informaton Technology, Monash Unversty, Vctora,

More information

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines (IJCSIS) Internatonal Journal of Computer Scence and Informaton Securty, Herarchcal Web Page Classfcaton Based on a Topc Model and Neghborng Pages Integraton Wongkot Srura Phayung Meesad Choochart Haruechayasak

More information

Load-Balanced Anycast Routing

Load-Balanced Anycast Routing Load-Balanced Anycast Routng Chng-Yu Ln, Jung-Hua Lo, and Sy-Yen Kuo Department of Electrcal Engneerng atonal Tawan Unversty, Tape, Tawan sykuo@cc.ee.ntu.edu.tw Abstract For fault-tolerance and load-balance

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information

Self-tuning Histograms: Building Histograms Without Looking at Data

Self-tuning Histograms: Building Histograms Without Looking at Data Self-tunng Hstograms: Buldng Hstograms Wthout Lookng at Data Ashraf Aboulnaga Computer Scences Department Unversty of Wsconsn - Madson ashraf@cs.wsc.edu Surajt Chaudhur Mcrosoft Research surajtc@mcrosoft.com

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Supervsed vs. Unsupervsed Learnng Up to now we consdered supervsed learnng scenaro, where we are gven 1. samples 1,, n 2. class labels for all samples 1,, n Ths s also

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z.

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z. TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS Muradalyev AZ Azerbajan Scentfc-Research and Desgn-Prospectng Insttute of Energetc AZ1012, Ave HZardab-94 E-mal:aydn_murad@yahoocom Importance of

More information

APPLIED MACHINE LEARNING

APPLIED MACHINE LEARNING Methods for Clusterng K-means, Soft K-means DBSCAN 1 Objectves Learn basc technques for data clusterng K-means and soft K-means, GMM (next lecture) DBSCAN Understand the ssues and major challenges n clusterng

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

Selecting Query Term Alterations for Web Search by Exploiting Query Contexts

Selecting Query Term Alterations for Web Search by Exploiting Query Contexts Selectng Query Term Alteratons for Web Search by Explotng Query Contexts Guhong Cao Stephen Robertson Jan-Yun Ne Dept. of Computer Scence and Operatons Research Mcrosoft Research at Cambrdge Dept. of Computer

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm Recommended Items Ratng Predcton based on RBF Neural Network Optmzed by PSO Algorthm Chengfang Tan, Cayn Wang, Yuln L and Xx Q Abstract In order to mtgate the data sparsty and cold-start problems of recommendaton

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

Maintaining temporal validity of real-time data on non-continuously executing resources

Maintaining temporal validity of real-time data on non-continuously executing resources Mantanng temporal valdty of real-tme data on non-contnuously executng resources Tan Ba, Hong Lu and Juan Yang Hunan Insttute of Scence and Technology, College of Computer Scence, 44, Yueyang, Chna Wuhan

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

Fitting: Deformable contours April 26 th, 2018

Fitting: Deformable contours April 26 th, 2018 4/6/08 Fttng: Deformable contours Aprl 6 th, 08 Yong Jae Lee UC Davs Recap so far: Groupng and Fttng Goal: move from array of pxel values (or flter outputs) to a collecton of regons, objects, and shapes.

More information

Three supervised learning methods on pen digits character recognition dataset

Three supervised learning methods on pen digits character recognition dataset Three supervsed learnng methods on pen dgts character recognton dataset Chrs Flezach Department of Computer Scence and Engneerng Unversty of Calforna, San Dego San Dego, CA 92093 cflezac@cs.ucsd.edu Satoru

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

EXTENDED BIC CRITERION FOR MODEL SELECTION

EXTENDED BIC CRITERION FOR MODEL SELECTION IDIAP RESEARCH REPORT EXTEDED BIC CRITERIO FOR ODEL SELECTIO Itshak Lapdot Andrew orrs IDIAP-RR-0-4 Dalle olle Insttute for Perceptual Artfcal Intellgence P.O.Box 59 artgny Valas Swtzerland phone +4 7

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation Intellgent Informaton Management, 013, 5, 191-195 Publshed Onlne November 013 (http://www.scrp.org/journal/m) http://dx.do.org/10.36/m.013.5601 Qualty Improvement Algorthm for Tetrahedral Mesh Based on

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS46: Mnng Massve Datasets Jure Leskovec, Stanford Unversty http://cs46.stanford.edu /19/013 Jure Leskovec, Stanford CS46: Mnng Massve Datasets, http://cs46.stanford.edu Perceptron: y = sgn( x Ho to fnd

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

Generalized Team Draft Interleaving

Generalized Team Draft Interleaving Generalzed Team Draft Interleavng Eugene Khartonov,2, Crag Macdonald 2, Pavel Serdyukov, Iadh Ouns 2 Yandex, Russa 2 Unversty of Glasgow, UK {khartonov, pavser}@yandex-team.ru 2 {crag.macdonald, adh.ouns}@glasgow.ac.uk

More information

Understanding K-Means Non-hierarchical Clustering

Understanding K-Means Non-hierarchical Clustering SUNY Albany - Techncal Report 0- Understandng K-Means Non-herarchcal Clusterng Ian Davdson State Unversty of New York, 1400 Washngton Ave., Albany, 105. DAVIDSON@CS.ALBANY.EDU Abstract The K-means algorthm

More information

Life Tables (Times) Summary. Sample StatFolio: lifetable times.sgp

Life Tables (Times) Summary. Sample StatFolio: lifetable times.sgp Lfe Tables (Tmes) Summary... 1 Data Input... 2 Analyss Summary... 3 Survval Functon... 5 Log Survval Functon... 6 Cumulatve Hazard Functon... 7 Percentles... 7 Group Comparsons... 8 Summary The Lfe Tables

More information

Optimal Workload-based Weighted Wavelet Synopses

Optimal Workload-based Weighted Wavelet Synopses Optmal Workload-based Weghted Wavelet Synopses Yoss Matas School of Computer Scence Tel Avv Unversty Tel Avv 69978, Israel matas@tau.ac.l Danel Urel School of Computer Scence Tel Avv Unversty Tel Avv 69978,

More information

Experiments in Text Categorization Using Term Selection by Distance to Transition Point

Experiments in Text Categorization Using Term Selection by Distance to Transition Point Experments n Text Categorzaton Usng Term Selecton by Dstance to Transton Pont Edgar Moyotl-Hernández, Héctor Jménez-Salazar Facultad de Cencas de la Computacón, B. Unversdad Autónoma de Puebla, 14 Sur

More information

Feature Selection as an Improving Step for Decision Tree Construction

Feature Selection as an Improving Step for Decision Tree Construction 2009 Internatonal Conference on Machne Learnng and Computng IPCSIT vol.3 (2011) (2011) IACSIT Press, Sngapore Feature Selecton as an Improvng Step for Decson Tree Constructon Mahd Esmael 1, Fazekas Gabor

More information

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Some materal adapted from Mohamed Youns, UMBC CMSC 611 Spr 2003 course sldes Some materal adapted from Hennessy & Patterson / 2003 Elsever Scence Performance = 1 Executon tme Speedup = Performance (B)

More information

Learning Semantics-Preserving Distance Metrics for Clustering Graphical Data

Learning Semantics-Preserving Distance Metrics for Clustering Graphical Data Learnng Semantcs-Preservng Dstance Metrcs for Clusterng Graphcal Data Aparna S. Varde, Elke A. Rundenstener Carolna Ruz Mohammed Manruzzaman,3 Rchard D. Ssson Jr.,3 Department of Computer Scence Center

More information

Hierarchical clustering for gene expression data analysis

Hierarchical clustering for gene expression data analysis Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally

More information

A Heuristic for Mining Association Rules In Polynomial Time*

A Heuristic for Mining Association Rules In Polynomial Time* Complete reference nformaton: Ylmaz, E., E. Trantaphyllou, J. Chen, and T.W. Lao, (3), A Heurstc for Mnng Assocaton Rules In Polynomal Tme, Computer and Mathematcal Modellng, No. 37, pp. 9-33. A Heurstc

More information

Network Intrusion Detection Based on PSO-SVM

Network Intrusion Detection Based on PSO-SVM TELKOMNIKA Indonesan Journal of Electrcal Engneerng Vol.1, No., February 014, pp. 150 ~ 1508 DOI: http://dx.do.org/10.11591/telkomnka.v1.386 150 Network Intruson Detecton Based on PSO-SVM Changsheng Xang*

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

ISSN: International Journal of Engineering and Innovative Technology (IJEIT) Volume 1, Issue 4, April 2012

ISSN: International Journal of Engineering and Innovative Technology (IJEIT) Volume 1, Issue 4, April 2012 Performance Evoluton of Dfferent Codng Methods wth β - densty Decodng Usng Error Correctng Output Code Based on Multclass Classfcaton Devangn Dave, M. Samvatsar, P. K. Bhanoda Abstract A common way to

More information

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and

More information

USING LINEAR REGRESSION FOR THE AUTOMATION OF SUPERVISED CLASSIFICATION IN MULTITEMPORAL IMAGES

USING LINEAR REGRESSION FOR THE AUTOMATION OF SUPERVISED CLASSIFICATION IN MULTITEMPORAL IMAGES USING LINEAR REGRESSION FOR THE AUTOMATION OF SUPERVISED CLASSIFICATION IN MULTITEMPORAL IMAGES 1 Fetosa, R.Q., 2 Merelles, M.S.P., 3 Blos, P. A. 1,3 Dept. of Electrcal Engneerng ; Catholc Unversty of

More information

Online Detection and Classification of Moving Objects Using Progressively Improving Detectors

Online Detection and Classification of Moving Objects Using Progressively Improving Detectors Onlne Detecton and Classfcaton of Movng Objects Usng Progressvely Improvng Detectors Omar Javed Saad Al Mubarak Shah Computer Vson Lab School of Computer Scence Unversty of Central Florda Orlando, FL 32816

More information

A Heuristic for Mining Association Rules In Polynomial Time

A Heuristic for Mining Association Rules In Polynomial Time A Heurstc for Mnng Assocaton Rules In Polynomal Tme E. YILMAZ General Electrc Card Servces, Inc. A unt of General Electrc Captal Corporaton 6 Summer Street, MS -39C, Stamford, CT, 697, U.S.A. egemen.ylmaz@gecaptal.com

More information

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET 1 BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET TZU-CHENG CHUANG School of Electrcal and Computer Engneerng, Purdue Unversty, West Lafayette, Indana 47907 SAUL B. GELFAND School

More information

Associative Based Classification Algorithm For Diabetes Disease Prediction

Associative Based Classification Algorithm For Diabetes Disease Prediction Internatonal Journal of Engneerng Trends and Technology (IJETT) Volume-41 Number-3 - November 016 Assocatve Based Classfcaton Algorthm For Dabetes Dsease Predcton 1 N. Gnana Deepka, Y.surekha, 3 G.Laltha

More information

Implementation Naïve Bayes Algorithm for Student Classification Based on Graduation Status

Implementation Naïve Bayes Algorithm for Student Classification Based on Graduation Status Internatonal Journal of Appled Busness and Informaton Systems ISSN: 2597-8993 Vol 1, No 2, September 2017, pp. 6-12 6 Implementaton Naïve Bayes Algorthm for Student Classfcaton Based on Graduaton Status

More information

Parallel matrix-vector multiplication

Parallel matrix-vector multiplication Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more

More information

APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT

APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT 3. - 5. 5., Brno, Czech Republc, EU APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT Abstract Josef TOŠENOVSKÝ ) Lenka MONSPORTOVÁ ) Flp TOŠENOVSKÝ

More information

A Multivariate Analysis of Static Code Attributes for Defect Prediction

A Multivariate Analysis of Static Code Attributes for Defect Prediction Research Paper) A Multvarate Analyss of Statc Code Attrbutes for Defect Predcton Burak Turhan, Ayşe Bener Department of Computer Engneerng, Bogazc Unversty 3434, Bebek, Istanbul, Turkey {turhanb, bener}@boun.edu.tr

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

Federated Search of Text-Based Digital Libraries in Hierarchical Peer-to-Peer Networks

Federated Search of Text-Based Digital Libraries in Hierarchical Peer-to-Peer Networks Federated Search of Text-Based Dgtal Lbrares n Herarchcal Peer-to-Peer Networks Je Lu School of Computer Scence Carnege Mellon Unversty Pttsburgh, PA 15213 jelu@cs.cmu.edu Jame Callan School of Computer

More information

Machine Learning 9. week

Machine Learning 9. week Machne Learnng 9. week Mappng Concept Radal Bass Functons (RBF) RBF Networks 1 Mappng It s probably the best scenaro for the classfcaton of two dataset s to separate them lnearly. As you see n the below

More information

K-means and Hierarchical Clustering

K-means and Hierarchical Clustering Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n gvng your own lectures. Feel free to use these sldes verbatm, or to modfy them to ft your

More information

A Two-Stage Algorithm for Data Clustering

A Two-Stage Algorithm for Data Clustering A Two-Stage Algorthm for Data Clusterng Abdolreza Hatamlou 1 and Salwan Abdullah 2 1 Islamc Azad Unversty, Khoy Branch, Iran 2 Data Mnng and Optmsaton Research Group, Center for Artfcal Intellgence Technology,

More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

Deep Classification in Large-scale Text Hierarchies

Deep Classification in Large-scale Text Hierarchies Deep Classfcaton n Large-scale Text Herarches Gu-Rong Xue Dkan Xng Qang Yang 2 Yong Yu Dept. of Computer Scence and Engneerng Shangha Jao-Tong Unversty {grxue, dkxng, yyu}@apex.sjtu.edu.cn 2 Hong Kong

More information

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION SHI-LIANG SUN, HONG-LEI SHI Department of Computer Scence and Technology, East Chna Normal Unversty 500 Dongchuan Road, Shangha 200241, P. R. Chna E-MAIL: slsun@cs.ecnu.edu.cn,

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information