Vehicle Fault Diagnostics Using Text Mining, Vehicle Engineering Structure and Machine Learning

Size: px
Start display at page:

Download "Vehicle Fault Diagnostics Using Text Mining, Vehicle Engineering Structure and Machine Learning"

Transcription

1 Internatonal Journal of Intellgent Informaton Systems 205; 4(3): Publshed onlne July 8, 205 ( do: 0.648/.s ISSN: (Prnt); ISSN: (Onlne) Vehcle Fault Dagnostcs Usng Text Mnng, Vehcle Engneerng Structure and Machne Learnng Y Lu Murphey, Lpng Huang, Hao Xng Wang, Ynghao Huang Department of Electrcal and Computer Engneerng, Unversty of Mchgan-Dearborn, Dearborn, USA Emal address: ylu@umch.edu (Y. L. Murphey) To cte ths artcle: Y Lu Murphey, Lpng Huang, HaoXng Wang, Ynghao Huang. Vehcle Fault Dagnostcs Usng Text Mnng, Vehcle Engneerng Structure and Machne Learnng. Internatonal Journal of Intellgent Informaton Systems. Vol. 4, No. 3, 205, pp do: 0.648/.s Abstract: Ths paper presents an ntellgent vehcle fault dagnostcs system, SeaProSel(Search-Prompt-Select). SeaProSel takes a casual descrpton of vehcle problems as nput and searches for a dagnostc code that accurately matches the problem descrpton. SeaProSel was developed usng automatc text classfcaton and machne learnng technques combned wth a prompt-and-select technque based on the vehcle dagnostc engneerng structure to provde robust classfcaton of the dagnostc code that accurately matches the problem descrpton. Machne learnng algorthms are developed to automatcally learn words and terms, and ther varatons commonly used n verbal descrptons of vehcle problems, and to buld a TCW(Term-Code-Weght) matrx that s used for measurng smlarty between a document vector and a dagnostc code class vector. When no exactly matched dagnostc code s found based on the drect search usng the TCW matrx, the SeaProSel system wll search the vehcle fault dagnostc structure for the proper questons to pose to the user n order to obtan more detals about the problem. A LSI (Latent Semantc Indexng) model s also presented and analyzed n the paper. The performances of the LSI model and TCW models are presented and dscussed. An n-depth study of dfferent term weght functons and ther performances are presented. All experments are conducted on real-world vehcle dagnostc data, and the results show that the proposed SeaProSel system generates accurate results effcently for vehcle fault dagnostcs. Keywords: Vehcle Fault Dagnostcs, Text Data Mnng, Machne Learnng, Vehcle Dagnostc Engneerng Structure, TCW, LSI. Introducton As computers and networks grow more powerful and data storage devces become more plentful and less costly, the amount of nformaton n dgtal form s exploded. The maorty of such dgtal data are n text form. Text data mnng has many applcatons ncludng text document search and categorzaton, webste search, customer servces, and automatc dagnostc systems [~4]. In ths research we focus on text documents that are casually typed or recorded. Many text mnng applcatons requre processng casual text data, whch often are n semstructured or unstructured text, such as clncal document analyss [3, 5], emals, nstant messages, free-text of medcal records, operatonal notes, emals, nstant messages, etc., and the applcaton of ths research s n automotve dagnostc text mnng. In automotve ndustry there are abundant nformaton avalable n casual natural language descrpton form that contan valuable vehcle fault dagnostc knowledge, marketng nformaton, consumer evaluaton or satsfacton of certan vehcle models, styles, accessores, etc [6]. For example, several thousands of vehcle problems are reported daly to varous auto servce shops. It s mportant to fnd root-cause of a vehcle problem quckly and accurately. In a typcal vehcle fault dagnostc process, vehcle problems are frst descrbed by customers n casual words and terms. A servce advsor records the customer s complants or descrpton of symptoms verbatm on the repar order, and then searches for a dagnostc code that matches the descrpton. The dagnostc code s used to gude the dagnoss and reparng processes. Due to the complexty of modern vehcles, the number of dagnostc codes can be n hundreds, whch makes manual searchng of correct dagnostc code dffcult and may lead to a lengthy and less accurate dagnoss and repar process, and, possbly, unnecessary part replacements. In order to mprove the

2 59 Y Lu Murphey et al.: Vehcle Fault Dagnostcs Usng Text Mnng, Vehcle Engneerng Structure and Machne Learnng accuracy and effcency of vehcle fault dagnostcs, t s mportant to develop an automated system that can help customers to report problems descrbed n casual words and terms, and techncans to quckly fnd the correct dagnostc code,.e., the root cause of the problems. There are several challenges nvolved n ths problem. The descrptons of vehcle problems provded by customers are often ll-structured. Most of such descrptons do not follow the Englsh grammar, and contan many msspelled words, self-nvented acronyms and shorthand descrptons. The descrptons of a problem by dfferent people vary based on the educaton and/or cultural background of the customers, and ther famlarty of vehcle termnologes and knowledge of automotve engneerng. For example, the term trunk used Unted States means the same thng as the term boot used n UK. One faulty symptom, for example, a nose s heard from the engne and the engne runs rough can be descrbed by customers n varous ways, such as engne knocks, hood squeak, engne msses dle, engne lopes, etc. The followng are examples of customer descrptons of the same vehcle problem: Customer : WENT ON A SALES ROAD TEST WITH CUST, VEHICLE WOULDNOT START, Customer 2: CHECK CAR WONT START, Customer 3: CK BATTERY HARD TO START. Hgh dmensons of terms and document classes. Snce there are typos and self-nvented acronyms and abbrevatons frequently occurrng n customer descrptons, the number of dstnct terms used n these documents s several tmes more than formally prnted documents. For example, the word engne has more than 20 dfferent spellngs n customers descrptons n our data collectons. Snce the output vector represents all the dagnostc codes used by a car manufacturng company, there can easly be several hundreds of dfferent document classes. These two hgh dmenson ssues pose challenges for generatng effcent and effectve response for a gven problem descrpton. In ths paper we present an ntellgent vehcle fault dagnostcs system, Search-Prompt-Select(SeaProSel), whch s developed by combnng automatc text categorzaton technques wth vehcle engneerng structure and machne learnng to provde effectve search functons for the dagnostc code that accurately matches a gven problem descrpton. The SeaProSel system uses machne learnng technques to automatcally learn words, terms and ther varatons commonly used n verbal descrpton of vehcle problems from tranng data, and ncorporate a vehcle fault dagnostc engneerng structure nto the search process to provde accurate dagnostc code that matches the problem descrpton. Ths paper s organzed as follows. Secton 2 presents a bref overvew of the state-of-art technologes for text mnng and document categorzaton, Secton 3 presents the proposed system, SeaProSel, Secton 4 presents the experment results generated from real-world vehcle dagnostc data, and Secton 5 concludes the paper. 2. Research n Text Mnng and Document Classfcaton Untl the late 80s, the most popular approach to text categorzaton are based on knowledge engneerng [7~9]. These approaches usually consst of a set of predefned rules that are encoded wth expert knowledge. Each rule s represented as a dsunctve normal form (DNF formula) followed by a category name. A document s classfed under a specfc category f t satsfes the DNF formula of the category. Ths DNF expresson s mostly defned by doman experts. If categores are updated or ported to a dfferent doman, doman experts need to ntervene to redefne DNF expressons for new categores from scratch. In recent years most technques used n text document classfcaton and categorzaton are developed based on machne learnng technologes, whch automatcally buld document classfers by learnng the characterstc of document categores from a set of tranng documents. The advantage of machne learnng s that t does not heavly rely on manual labors durng the model constructon stage. Its effectveness level, n many cases, s superor to that of professonal human work. Consequently, automatc text categorzaton has become a maor research area of machne learnng. Many text categorzaton systems have been developed usng dfferent machne learnng algorthms, ncludng k-nearest neghbor (K-NN), neural networks, latent semantc ndexng, probablstc models, support vector machne, and etc. The ntal applcaton of k-nearest Neghbor (K-NN) to text document categorzaton and classfcaton was ntroduced by Masand and hs colleagues [7, 0], and later t became a wdely used method n text classfcaton [~3]. In text document classfcaton and categorzaton, a document s often represented as a vector composed of a seres of selected words called as feature vector. A K-NN based text categorzaton system s to fnd the K documents n the tranng data that are most smlar to an nput unknown document. The category contans the maorty among the K best matched documents s consdered as the category to whch the unknown document belongs. The smlarty between the unknown document and each tranng document s measured by a smlarty functon, whch s crtcal n generatng accurate results [~3]. Neural networks (NNs) have been popular n text categorzaton and document retreval [4~6]. The most popular neural network archtecture for text classfcaton s a multlayer neural network traned wth the well-known backpropagaton algorthm usng supervsed learnng [7~22]. Self-Organzed Map(SOM), also known as the Kohonen network [23], s a popular unsupervsed neural network used n text classfcaton. A SOM network attempts to cluster the tranng data whle preservng the topologcal propertes of the nput space. Durng the tranng process, t bulds the network,.e. the map, by applyng a compettve process to nput examples. A fully traned SOM network can be used as a pattern classfer [3]. SOM has also been used n feature

3 Internatonal Journal of Intellgent Informaton Systems 205; 4(3): selecton for text categorzaton [24] and text clusterng [25, 26]. Probablstc Modelng has been used n text document classfcaton. A wdely used framework of probablstc model for text document classfcaton s derved from the Bayesan theorem of condtonal probablty [8, 27]: P( c) P( d c) P( c d ) =, P( d ) where d s the nput document, d s the feature vector that represents d, c s the th document class, P( d ) s the probablty that a document d represented by vector d occurs randomly, P( c ) the probablty of a randomly pcked document belongs to category c, P( d c) s the probablty of document d occurrng gven that document d s n document class c, and P( c d ) s the probablty of d, represented by vector d belongng to document class c. To smplfy the calculaton of the condtonal probablty, Naïve Bayes (NB) classfer has been appled to document classfcaton [9]. A Naïve Bayes classfer assumes that the condtonal probablty of each term n the feature vector for a gven class s ndependent of the condtonal probablty of other terms n the feature vector for a gven class. Ths assumpton s called class condtonal ndependence. It makes the computaton of the NB classfer far more effcent than the exponental complexty of a non-naïve Bayes approach snce t does not requre the term combnaton as predctors. Studes comparng dfferent classfcaton models have shown the performance of Naïve Bayes classfer s comparable wth neural network classfers and batch lnear classfer [28, 29]. Support vector machne (SVM) approach was developed based on the structural rsk mnmzaton theores n statstcal learnng [30]. A SVM maps the nput feature space to a hgh dmensonal space through a kernel functon. It then chooses the hyperplane wth the maxmum margn that can separate the postve from negatve examples n the feature space. Accordng to the structural rsk mnmzaton, the generalzaton error s bounded by the sum of the tranng set error and a term derved from the Vapnk-Chervonenks (VC) dmenson of the learnng machne. Unlke tradtonal artfcal neural networks (ANNs), whch mnmze the emprcal tranng error, SVM ams at mnmzng the upper bound of the generalzaton error, whch represents the error on unseen data for a classfer. Thus hgh generalzaton performance can be acheved. SVM can potentally learn a larger set of patterns and be able to scale better than artfcal neural networks [3]. Many publshed lteratures show that SVM learnng can lead to hgh performance n a broad range of pattern classfcaton applcatons [3~34]. SVMs have been popular n text classfcaton and categorzaton [35, 36]. SVM s desgned for two-class pattern classfcaton. However n text document classfcaton, most applcatons nvolve more than two categores of documents. Therefore a text categorzaton system developed usng SVMs usually use one of the followng two approaches to buld a multclass SVM system. Let N > 2 be the number of text categores. The frst approach would desgn N SVM classfers, each of whch dscrmnates the kth class aganst the remanng N- classes, k = to N. The SVM assocated wth the class k seeks a decson surface n the feature space that separates class k from all other classes. Collectvely the N SVM models result n N decson boundares [37]. When a new document x s submtted to the system, all N SVMs are appled to x, and the class represented by the SVM that generates the largest output value s assgned to the nput document x. The second approach s to tran N(N-)/2 SVMs, each of whch s traned to par-wsely separate two dfferent classes n the tranng data set. Dfferent votng strateges, such as Max-Wns [37] or drected acyclc graph (DAG) can be used to make the fnal classfcaton decson based on the results from the N(N-)/2 par-wse SVMs [38]. Even wth the advanced technologes dscussed above text mnng contnuous to be a challengng research area. In ths paper we present an nnovatve technque that combnes automated text document classfcaton wth doman knowledge to derve a search result that precsely matches the nput query. 3. SeaProSel: an Intellgent Vehcle Fault Dagnostc System All automotve companes develop ts own dagnostc codes that are used n ther vehcle fault dagnostc processes. Some companes may have several sets of dagnostc codes wth names such as CSC (Customer Symptom Codes), CCC (Customer Concern Codes), and etc. Wthout losng generalty, we refer to such a code system as a vehcle dagnostc code (VDC). The SeaProSel system s desgned to map a query descrpton to a specfc VDC that accurately matches the problem descrpton. Ths query-to-vdc mappng s a M-to-M mappng. Multple descrptons can be mapped to the same VDC, and one query can be mapped to multple VDCs due to ambguty n language. For example, the three customer descrptons gven n Secton have the same dagnostc code that represent the problem of ENGINE WOULD NOT START. In addton to the synonymy and polysemy problems exstng n general text documents, the documents occurrng n vehcle fault dagnostcs pose partcular challenges: typos, grammar errors, self-nvented terms and acronyms, napproprate usages of punctuatons, and etc. The proposed SeaProSel s desgned to deal wth these challengng ssues n order to generate a unque VDC that accurately matches the nput query. SeaProSel has a herarchcal matchng and searchng system that uses text data mnng technology to quckly retreve dagnostc code that accurately matches the

4 6 Y Lu Murphey et al.: Vehcle Fault Dagnostcs Usng Text Mnng, Vehcle Engneerng Structure and Machne Learnng query,.e. the problem descrpton provded by a user. In the cases that no unque dagnostc code s found, the system wll follow the gven automotve dagnostc system to prompt questons to user n an attempt to obtan more nformaton from the user about the vehcle problem. It then uses the answer provded by the user to search for the correct dagnostc code. Ths prompt and search process can be repeated untl a unque dagnostc code s found. Fgure llustrates the system archtecture of the SeaProSel. At the frst stage the SeaProSel drectly searches for a VDC that matches the nput query by usng a Vector Space Model. We present two approaches, frst s a Term- Code-Weght (TCW) model and the second a latent semantc ndexng (LSI) model. The TCW model s a weghted matrx that s obtaned through a machne learnng algorthm. The LSI model uses the reduced-rank matrces to approxmate the orgnal TCW matrx. Each category and query s converted nto a low-dmensonal vector n a LSI space. Both models along wth the crtcal research ssues related to the two models, such as term selecton and weght functons, are dscusson n depth n secton 3.A. If no exactly matched VDC s found, and the system outputs a lst of best matched VDCs and then enters the second stage, Prompt & Select, whch follows the vehcle dagnostc engneerng herarchy to prompt questons for user to select the more detaled and better descrbed vehcle problem. Based on the user s answers, the system generates a new lst, VDC_lst2, whch s a sublst of VDC_lst and contans the VDCs that satsfy the user selected descrptons. The Prompt & Select process s repeated untl a satsfactory VDC s found. Ths process s descrbed n Secton 3.B. Fgure. Overvew of SeaProSel System. A. Buldng a dagnostc document classfcaton model usng machne learnng In text data mnng, a Vector Space Model (VSM) s an algebrac representaton of text documents that contans vectors of dentfers or ndex terms such as words or phrases [39, 40]. In a VSM, all documents are represented n term weghted vectors. In ths research we nvestgate two VSM models, TCW(Term-Code-Weght) matrx and LSI matrx. Both models are bult usng machne learnng algorthms to represent vehcle fault dagnostc knowledge. The two-dmensonal TCW matrx, denoted as A MxN, s generated from a tranng data set Tr, where M s the number of effectve terms n Tr, and N s the number of dagnostc categores n Tr. The tranng data set contans samples of customer descrptons of vehcle problems, and each descrpton s assocated wth a correct dagnostc code assgned by automotve dagnostc experts. The machne learnng algorthm conssts of two maor computatonal components: Term Extracton, and TCW Matrx Constructon. The Term Extracton process nvolves the detecton of a lst of dstnctve and effectve terms, removal of punctuatons and stoppng words, word stemmng and word varaton detecton. The TCW matrx contans M terms generated by the Term Extracton process, and N vehcle dagnostc codes, whch represent document classes. An entry n a TCW, A MxN (,), represents the weght of the th term assocated wth the th dagnostc code, whch s generated based on the statstcal analyss of the th term occurrng n the tranng documents labeled wth th dagnostc code. The TCW matrx s used to map drectly from a problem descrpton to the best matched dagnostc codes. The TCW matrx s bult usng a machne learnng algorthm that contans the followng maor processes, Document preprocessng Document ndexng Term weght generaton Once we obtan the TCW matrx, the VDC that matches an nput document can be generated by the process, Vehcle Fault Dagnostcs usng a smlarty functon. ) Document Preprocessng: Two data nose problems assocate wth casual text documents, one s mproper use of punctuatons and specal symbols, and another mslabelng document categores n tranng data. The msuse of punctuatons n the text documents makes the text categorzaton less accurate. For example, the punctuatons n ACCEL., Dead.NEEDS make the two terms dfferent from ther correct forms, ACCEL and Dead and NEEDS. The exstence of such napproprate use of punctuaton results n a lot of addtonal entres n the term lst that actually should not exst, and cause msleadng statstcs. Three dfferent types of processes are mplemented. Frst type of processes nvolves the search for specal symbols or punctuatons appearng at one end or both ends of a word, for example, ACCEL., *DIESEL*, ****

5 Internatonal Journal of Intellgent Informaton Systems 205; 4(3): TOW, and IN ****. These symbols can be removed drectly wthout any possblty of alternatng the meanng of the word. The second type of processes s to search for specal symbols or punctuatons such as., (, ), &, *, /, +, etc, These symbols are ether removed or replaced by a space. The thrd type of processes s more sophstcated. It manly concerns the punctuatons such as,, :,., etc. When they occur as a part of ntals, numercal or tme/date formats, such as A.C., P.I.D., 2,000, 6.00, 4:00, 4 4, they are kept as part of the orgnal strng. If they occur between two words for example, engne,check, the punctuatons are replaced wth a space. In supervsed machne learnng, each tranng data sample s assgned a target class code,.e. dagnostc code n our applcaton. The class code assgnment s stll been done largely by dagnostc experts. Some documents may have the class code mssng, others may be assgned of multple codes because the person who assgn the class codes s not sure whch one s correct. We developed the followng procedure to deal wth ths problem. For all the documents wth mssng labels, we buld a standard dagnostc code matrx, DC, based on standard descrptons of dagnostc code, whch are avalable n mechancs handbook. We extract the tranng documents wth specfc dagnostc codes to form a subset of tranng data, denoted as TrC, whch s then used to generate the TCW p q matrx C R and term lst T_Lc, where p s the number of ndex terms, and q s the number of codes. For a document q wth n labels, X, X2,, Xn, the relevance between q and X s calculated usng the cosne smlarty functon shown below: s = sm( q, CX ) = Let the smlarty scores between q and the n dagnostc code vectors be s, s 2,, s n, and S max = Max{s =,, n }. The dagnostc code correspondng to S max s assgned to document q. In the cases that multple dagnostc codes are useful, we set a threshold th, and f the dfference between S max and s s smaller than th, dagnostc code X s also assgned to the document q. 2) Document Indexng: The terms used n the TCW matrx need to be derved automatcally from tranng documents, and carefully selected so they effectvely represent document contents. We developed the followng document ndexng algorthm to extract effectve ndexng terms automatcally from tranng data. Let us assume a collecton of documents are to be classfed nto N dagnostc codes or categores, (C, C 2, C N ), and we have tranng documents Tr, Tr 2,, Tr N, where Tr contans the tranng documents belongng to category, =,, N. The obectve of the followng algorthm s to generate a lst of ndexng p = q ( C ) X p p 2 2 ( q) (( CX ) ) = = 2. terms, T_L, where each term t T_ L that effectvely represents the contents n the documents contaned n Tr, where Tr = Tr... Tr N. The document ndexng algorthm contans the followng maor computatonal components. Step : Extract all dstnct terms from Tr to form an ntal term lst T_L. Step 2: Generate a stop word lst, stop_word lst, whch s used to make sure those words do not occur n the term lst T_L. T_L contans the words, such as the, about, an, and, etc. that provde lttle nformaton for document class dscrmnaton. It also contans words have no specfc meanng n a gven applcaton doman. For example, n vehcle fault dagnostc documents, terms such as customer, states, sad, ck, cust, drvng, etc. occur n documents of all classes. Step 3: We mplemented the well-known Porter Stemmng algorthm (or Porter stemmer ) [4] and appled t to the tranng data to generate groups of words that have the same stem, and the varant word forms s represented by one root word. The Porter Stemmng algorthm s based on the dea that the suffxes n the Englsh language mostly consst of a combnaton of smaller and smpler suffxes. It has fve computatonal processes. In each step, f a suffx rule matches wth a word, then the condtons attached to that rule are tested on the resultng stem. A condton, for example, may be the number of stem length after suffx removal must be greater than the threshold. For example, the suffx of ng can be safely removed from the word sngng, and the remanng part,.e. the stem sng replaces the orgnal word. Stemmng word processng reduces the dmensonalty to the word lst sgnfcantly. Step 4: Elmnatng low-frequency words. Two frequency thresholds d_th, w_th, are defned to remove words occurrng nfrequently. A term s removed from T_L f ts occurrng frequency n the number of dfferent vehcle dagnostc code categores s less than d_th or ts occurrng frequency n all tranng documents s less than w_th tmes. The optmal values for d_th and w_th can be obtaned through experments. Step 5: Elmnatng words evenly dstrbuted across all categores. Words that have even dstrbutons among all document categores are also removed from T_L, snce they appear n the same frequency over all document categores. Step 6. Output T_L, whch s used for buldng the TCW matrx descrbed below. 3) Modelng dagnostc documents usng a TCW Matrx: An entry n a TCW matrx A MxN, denoted as a,, s the weght of the th term n T_L belongng to the th VDC, for =,, M and =,, N. The weght n each entry n A MxN s a functon of the occurrence frequency of a term wth respect to a category. The functon s referred to a weght functon. Term weghtng s an mportant component for mprovng performance n the VSM based text mnng [42]. Terms need to be weghted accordng to ther mportance for a partcular

6 63 Y Lu Murphey et al.: Vehcle Fault Dagnostcs Usng Text Mnng, Vehcle Engneerng Structure and Machne Learnng document category and for the whole document collecton. A useful ndex term must fulfll a dual functon: t occurs n the documents of the same category wth hgh frequency so as to render the document retrevable, and t s useful to dstngush the documents of one category from the others. A term weght functon s usually a combnaton of a local weght and a global weght functon. The followng descrbes three popular local weght functons. Term Frequency: l = tf, whch s the occurrence frequency of term wthn document category, and 0 f tf = 0 Bnary: B =, f > 0 Log functon: l = Log 2 (tf +). A local weght functon provdes a measure of how well that a term descrbes the document contents n a partcular category. However, usng only local weght s not enough to evaluate the mportance of a term n the document classfcaton. Some terms, due to ther rarty use n a partcular category of documents, are more mportant n dentfyng these documents than others do. Some terms, however, because they appear n many documents, are not useful to dscrmnate documents n one category from the others. A global weght measure s used to reflect the overall mportance of the ndex term n the entre document collecton. Four well-known global weghts ntroduced by Dumas [43] are: Table. Popular weght functons. Normal: 2, tf gf GfIdf:, df ndocs Idf: log2 +, df plog( p) tf Entropy: where p =, log( ndocs) gf where df, the document frequency, s the total number of documents n the document collecton,.e. tranng data, that contan term, gf, the global frequency, s the frequency of term occurrng n the entre document collecton, and ndocs s the total number of documents n the document collecton. Dfferent weght functons transform the raw occurrence frequency of a term n a document to dfferent weghts. In general, the entry a, of a TCW matrx A s a functon of a local and a global weght components. The most commonly used term weght functons are lsted n Table. These weght functons have been evaluated through extensve experments and the results are dscussed n Secton 4. Based on these experments, the proposed SeaProSel system uses the tf-df weght functon n ts VCD drect search component. Entropy Gfdf Normal tf-df B-df tf * tf *( p ) tf * ndocs ndocs log( p ) gf 2 tf * log + B * log + log( ndocs) tf df df df B-normal Log-df Log-entropy log-gfdf log-norm B * log(tf +) * log(tf +) * log(tf +) * gf ndocs ( ) log(tf +) * 2 p (log + log( p) tf ) df log( ndocs) df tf 2 Fgure 2. A rank-k approxmaton matrx.

7 Internatonal Journal of Intellgent Informaton Systems 205; 4(3): ) Modelng dagnostc documents usng a LSI Matrx: A popular varant of TCW matrx s constructed usng latent semantc ndexng (LSI) method [44~46]. It uses the reduced-rank matrces to approxmate the orgnal TCW matrx. Each category and query s converted nto a low-dmenson vector and mapped nto the LSI space. Relevance measures for the user query are also performed n ths space. A LSI matrx s bult from the TCW matrx. A s T decomposed nto the product of three matrces A = UΣV, T T where U U = V V = I n, and Σ = dag( σ,..., σ n ), σ >0 for r, σ = 0for r+. Matrces U and V contan left and rght sngular vectors of A, respectvely, and dagonal matrx Σ contans sngular values. A rank-k approxmaton to A s T represented Ak = UkΣkVk, where Uk and Vk are constructed by takng only the k largest sngular values of Σ along wth ther correspondng columns n the matrces U and V respectvely. Ak s the unque matrx of rank k that s closest n the least squares sense to A. Fg. 2 llustrates the relatonshp between A and Ak. The SVD method attempts to capture most mportant underlyng structure n the assocaton of terms and documents. Snce k s usually much smaller than the number of terms m, some nose are elmnated by deletng low rankng columns. Because SVD s a strctly mathematcal method, the contents of the matrces are not nterpretable wth respect to the documents or terms t analyzes. The best rank k n the SVD model depends on the tranng data, whch wll be further dscussed n the experment secton. However, t s a powerful technque to reduce the dmenson of any term-by-document matrx. 5) Vehcle Fault Dagnostcs usng TCW and LSI matrces: The obectve of vehcle fault dagnostcs s to classfy the user query to a dagnostc code that accurately matches the nput query. Vehcle Fault Dagnostcs usng TCW algorthm conssts of two processes, formulatng query vector, and measurng smlarty between the term vector and a column vector n the TCW. An nput problem descrpton d s frstly preprocessed usng the same procedures as descrbed earler, ncludng removng unnecessary punctuatons, stop words and word stemmng, etc. It s then transformed nto a term vector q wth the same length M of T_L. Let q = ( q,..., q ) T M, where q s the frequency of the th term on T_L occurred n the query document d, =,, M. The classfcaton decson on whch VDC category that best matches wth q s made based on the smlarty measure between q and each column vector of A. Let the column vectors of A be a (,..., ) T = a am, =,, N. We use the followng cosne based smlarty measure to generate a smlarty score between the vector qand the column vector a, q a c r( q, a) = = q a After smlarty score s obtaned for every VDC category and the nput query, there are two approaches by whch our system can use to determne f the dagnostc codes wth the best smlarty score should be returned as matched code class. One method s to use a threshold: all dagnostc categores wth smlarty scores larger than the threshold are regarded as relevant and assgned to the query. The second method s to output the VDC category represented by the column vector n the TCW matrx that has the hghest smlarty score wth the nput query vector. In the LSI model, a user s query s represented by a vector n the reduced-rank space. From a user query, we frst construct the same term vector q as n the TCW model. Then q s converted nto the vector n the reduced-rank T space by the followng formula: qk = q UkΣ k. The classfcaton decson on whch VDC category that best matches wth s made based on the smlarty measure q k q k between and each column vector of Ak, the same process as n the TCW classfcaton process descrbed above. Both TCW based and LSI based VDC drect search system wll be evaluated n Secton 4. B. Integratng automatc search wth vehcle engneerng structure The proposed vehcle fault dagnostc system, SeaProSel, s an ntegraton of drect search usng the TCW matrx and the progressve prompt and select process based on a herarchcal vehcle fault dagnostc engneerng structure. A dagnostc code system s usually organzed n a herarchcal structure that contans multple levels of functonal descrptons, and each level provdes descrptons about a class of symptoms, specfc functon or component faults, condtons, and etc. In ths representaton, the vehcle fault dagnostc codes are represented n the leaf nodes, the root of the tree s the entre vehcle system, and the subsequent levels represent the herarches of subsystems, components or devces. Fgure 3 shows an example. The hghest layer has three functon groups. Under each functon group, there are sub-functon groups. For each functon group at level 2, there are component groups. Under each component group, there are dfferent categores of devatons, under each of whch, there s a layer of condtons. Each node n the tree s accompaned wth a bref descrpton. For example, a descrpton for a functon group could be Engne wth mountngs and equpment, a descrpton for a component category under the functon group could be startng, the descrptons for condtons under such functon group could be engne turns, cold start or unsure when, and a descrpton for a VDC could be, ENGINE WOULD NOT START. M qa M M 2 2 q a = =

8 65 Y Lu Murphey et al.: Vehcle Fault Dagnostcs Usng Text Mnng, Vehcle Engneerng Structure and Machne Learnng The SeaProSel system uses the TCW matrx to drectly obtan hghly matched dagnostc codes, nteracts wth user by promptng dagnostc questons based on the vehcle engneerng structure, takes user s selecton/answer to ether generate a dagnostc code that accurately match the user s answers or lead to the next level of functonal prompts. Fgure 4 shows the archtecture of the SeaProSel system developed based on the vehcle fault dagnostc system llustrated n Fgure 3. In Fgure 4, SQ represents the nput problem descrpton, SQ 2 represents the selected level 2 functon group, and SQ 3 represents the selected component/functonalty. The SeaProSel algorthm has the followng maor computatonal steps. Fgutre 3. A herarchcal vehcle fault dagnostc system archtecture. Fgure 4. Overvew of processes n SeaProSel for vehcle fault dagnostcs. Step : process the nput query document SQ usng the procedure, VDC Drect Search usng TCW. Let the output of the procedure be F (SQ) = {VDC, conf,, VDC k, conf k }, where conf conf 2, conf k Step 2: If = conf - conf 2 s hgh, then output VDC and ext Step 3: Fnd c k such that c = conf c conf c+ s hgh Step 4: Fnd the node H n the tree structure such that

9 Internatonal Journal of Intellgent Informaton Systems 205; 4(3): Found_Codes = {VDC,, VDC c } are all H s descendants, and no other nodes n the tree has ths property except H s parent nodes. Example : If Found_Codes = {VDC 8, VDC 9, VDC 0 }, the H code s Devaton 3. Example 2: If Found_Codes = {VDC 6, VDC 9, VDC 0 }, the H code s Functon Group. Step 5: Call the followng Prompt & Select procedure. Step 5. Followng the H node s drect descendants, and present the descrptons assocated wth the descendants to the user. For example, the descrptons of Condton, Condton 2 and Condton 3 under Devaton 3 are presented to the user Step 5.2 Based on the user selecton, f the unque VCD s found, output the VCD and ext the program. Step 5.3 Follow the descendant node selected by the user and fnd the VCDs that match the user s selectons, and denote them F2 (SQ2) = {VDC, conf,, VDC k, conf k }, where conf conf 2, conf k Step 5.4 If 2 = conf conf 2, s hgh, then output VDC and ext Step 5.5: goto Step Experments We were provded by an automotve company wth the herarchcal vehcle fault dagnostc system llustrated n Fgure 3. The herarchcal vehcle fault dagnostc system has 540 vehcle dagnostc codes, and each node s accompaned wth a general descrpton of the vehcle problems the node covers. We conducted three dfferent sets of experments to evaluate, respectvely, dfferent weght functons, TCW matrx verse the LSI matrx, and the entre SeaProSel system. The TCW and LSI matrces were all traned on a data set of 200,000 real-world customer descrptons of vehcle problems. After removng extraneous documents and elmnatng documents wth wrong labels, we had 99,552 vald documents as tranng data. After data preprocessng such as punctuaton preprocessng, Portel stemmng and typo removal, the number of ndex terms generated from the tranng data were reduced from 7033 to The TCW and LSI components as well as the weght functons were evaluated on TEST6K, a testng set of 6000 vehcle dagnostc documents collected from dfferent retaler servce shops n USA durng one week tme perod. All test documents were labeled wth true dagnostc codes by auto techncans. The followng evaluaton crtera are used to analyze system performances. For each nput query, two levels of matchng accuracy are measured: the low level of matchng (LLM) and hgh level matchng (HLM). If the VDC Drect Search system usng ether TCW or LSI matrx returns the correct VDC code,.e. t exactly matches one of the leaf nodes n the herarchcal vehcle fault dagnostc system shown Fgure 3, then t s a low level matchng. In ths case, the ProSeaSel system wll output the VDC code and termnate the search. If the VDC Drect Search system returns the code that does not match any of the leaf nodes but matches the correct hgher level categores n the herarchcal system shown n Fgure 3, then t s a hgh level matchng. For example, f an nput query s true VDC s vdc, but the output of the VDC Drect Search system s vdc5. Snce vdc and vdc5 have the same devaton category, the system has a wrong LLM, but a correct HLM. Based on these two types of matchng crtera, we defne two accuracy measures of system performances when a batch of test queres s used as test data, Exact Match Rate (EMR) and Category Match Rate (CMR). EMR s defned as the number of correct matched outputs n LLM over the sze of the tranng data, and CMR the number of correct outputs n HLM over the sze of the tranng data. EMR 90.00% 80.00% 70.00% 60.00% 50.00% 40.00% 30.00% 20.00% 0.00% 0.00% VSM(Cosne) VSM(Spearman) entropy Gfdf normal tf-df B-df B-normal log-df logentropy TCW(Cosne) TCW(Pearson) VSM(Pearson) TCW(Spearman) LSI(cosne) log-gfdf log-norm weght schemes Fgure 5. Effects of Weght Schemes and Smlarty Functons.

10 67 Y Lu Murphey et al.: Vehcle Fault Dagnostcs Usng Text Mnng, Vehcle Engneerng Structure and Machne Learnng Fgure 6. Performances of the LSI systems of varous K-values. (a) (b) Fgure 7. Two examples of query processes by SeaProSel system

11 Internatonal Journal of Intellgent Informaton Systems 205; 4(3): A. Evaluaton of weght functons As descrbed before, several weght schemes can be used n both TCW and LSI models. All weght functons shown n Table were mplemented n both the VSM and the LSI models. We used three dfferent smlarty functons: Cosne, Pearson, and Spearman n the query classfcaton processes. The results are shown n Fgure 5. It shows that the best result s generated by the TCW model that uses the tf-df weght functon combned wth cosne smlarty functon. Pearson smlarty measure used n the TCW model combned wth the tf-df weght functon acheved the smlar performance to that of cosne measure. Both of them are better than the Spearman smlarty functon. B. Evaluaton of TCW and LSI models The accuracy of the LSI model s heavly affected by value of rank, K. Snce the results above ndcates that the weght scheme tf-df combned wth cosne measure produced the hghest EMR, they are used n both TCW and LSI systems. In order to explore the effects of parameter K, we appled varous K values to construct the sngular value T decomposton matrces, Ak = UkΣ kvk, and compared the performances of these SVD matrces wth the TCW matrx. Theoretcally, the range of K-value s between to the number of columns of TCW matrx. However, our experments shows that EMR degrades rapdly when K-value s larger than 40 n ths applcaton. Fgure 6 showed that the CMR and EMR of LSI models wth K values between and 40, as well as the performances of the TCW classfcaton system. It appears that the best performance has been acheved wth K equal to 7. But the TCW outperformed the best LSI system by more than %. C. Evaluaton of SeaProSel We tested the entre SeaProSel system on a set of 3273 query examples, none of whch were ncluded n the tranng data. The performances are analyzed as follows. 97% of the test queres were answered wth unque and correct VDCs by the VDC drect search usng TCW Model wthout gong to the Prompt & Select process. The other 3% of the test queres were processed through subsequent Prompt & Select processes. At the end of the processes, correct VDCs were found for all test queres. Overall the prompt and select process was used at the rate of 0.065/query. Fgure 7 shows the processes of two test queres by the SeaProSel system. In Fgure 7 (a), the test query was Instrumentaton warnng lghts and chmes. The VDC drect search component returned multple mached VDCs. By fndng the H node of these VDCs, the Prompt & Select component present to the user the descrptons of dfferent problems at the component functonalty level (see Fgure 3) related to the nput query. After the user selected Text wndow and warmng symbol, the SeaProSel went on to the questons at the devaton level. When the user selected Red symbol and text message, a unque VDC code s found that matches the nput query as well as the answers selected by the user durng the Prompt & Select processes. In the second example (see Fgure 7 (b)), the test query s Engne would not start. The VDC drect search component returned multple VDCs, whch s represented as VDC_Lst. By fndng the H node of these VDCs on the VDC_lst, the Prompt & Select component gave descrptons of dfferent problems at level 2 functon groups related to the VDC_Lst. After the user selected Engne wth mountngs and equpment, the SeaProSel went on to dsplay the questons under the selected node at the Component Functonalty level. When the user selected Engne, the SeaProSel went on to dsplay the questons under the selected node at the Devaton level. When the user selected Engne turns, the SeaProSel went on to dsplay the questons under the selected node at the Condton level. When the user selected clckng sound at start attempt, a unque VDC, E2, s found that matches the nput query as well as the answers selected by the user durng the Prompt & Select processes. 5. Concluson We have presented an ntellgent vehcle fault dagnostc system, SeaProSel. SeaProSel conssts of two maor components, VDC Drect Search usng a VSM, and Prompt & Select. Two VSM technologes were developed, mplemented and evaluated, a TCW model and a LSI model. Both models were developed based on machne learnng and text mnng technques. The Prompt & Select component s a system that s bult upon a vehcle fault dagnostc engneerng structure wth a progressve process of query, select, and search to acheve effcent and accurate classfcaton of vehcle problem descrptons. We also presented algorthms for preprocessng text documents that contans spellng errors, typos and self-nvented terms, choosng effectve weght functons, and buldng an effectve TCW matrx and LSI matrces from a gven tranng data set. We have conducted extensve experments to evaluate the algorthms and the entre SeaProSel system. Based on our expermental results we conclude that, n the applcaton doman of vehcle fault dagnostc text documents, the tf-df weght functon gves the best performance when t s used n ether TCW or LSI models wth the smlarty functon beng ether Cosne or Pearson. In terms of the optmal ranks n the LSI systems, our experments show that the optmal K values are n the range of K=3 through K=9, wth K=7 gvng the best performance. When we compare the performances of the TCW model wth the best LSI model,.e. the LSI system used K=7, we notce that the TCW model outperforms the best LSI system by more than %. The SeaProSel system s mplemented based on a realworld vehcle dagnostc code system wth 540 dfferent code classes, and s evaluated on 3273 query documents, whch are verbatm vehcle problem descrptons by customers. The SeaProSel system acheved 97% accuracy n fndng the dagnostc codes drectly based on the VDC Drect Search usng TCW. Through the nnovatve processes of Prompt and Select procedure, the SeaProSel was able to fnd the correct dagnostc code 00% for all test queres.

12 69 Y Lu Murphey et al.: Vehcle Fault Dagnostcs Usng Text Mnng, Vehcle Engneerng Structure and Machne Learnng Our maor contrbutons are summarzed as follows. () Presented an nnovatve computatonal framework, SeaProSel, that combnes automatc search wth engneerng structural search through a Prompt & Select strategy. Expermental results show that SeaProSel s effectve n searchng for dagnostc code accurately matchng a gven problem descrpton. (2) Presented new algorthms for learnng automotve dagnostc code usng TCW matrx and LSI model. Based on our expermental results, TCW s more effectve n the applcaton of casual text document categorzaton (3) Presented new algorthms for preprocessng engneerng dagnostc documents. Although the applcaton doman we presented s n the area of vehcle fault dagnostc documents, some technques we presented are applcable to other applcatons that nvolve processng casual text documents such as automated queston answerng servces, and classfcaton of Tweets, nstant messages, and e-mal messages. References [] Fang, J., Guo, L., Wang, X. D., & Yang, N Ontology- Based Automatc Classfcaton and Rankng for Web Documents. Fourth Internatonal Conference on Fuzzy Systems and Knowledge Dscovery -FSKD, [2] Zhuang, F. Z.; Luo, P.; Shen, Z. Y.; He, Q.; Xong, Y. H.; Sh, Z. Z. & Xong, H Mnng Dstncton and Commonalty across Multple Domans Usng Generatve Model for Text Classfcaton. IEEE Transactons on Knowledge and Data Engneerng, Volume: 24, Issue:, Page(s): , 202. [3] Huang, Y.H., Selya, N., Murphey, Y. L., & Fredenthal, R. B Classfyng Independent Medcal Examnaton Reports usng SOM networks. Proceedng of the 6th Internatonal conference on Data Mnng, Las Vegas, Nevada, USA, 200, p [4] Mencıa, E. L., Park, S. H., & Fürnkranz, J Effcent votng predcton for parwse multlabel classfcaton. Neuro computng 73 pp.64 76, 200. [5] Zeng, Q.; Zhang, X.; Zhang, W.; L, Z. & Lu, L Extractng Clncal Informaton from Free-text of Pathology and Operaton Notes va Chnese Natural Language Processng. 200 IEEE Internatonal Conference on Bonformatcs and Bomedcne Workshops, pp , Hong Kong, 200. [6] Huang, Y. H., Murphey, Y. L., & Ge, Y Automotve dagnoss typo correcton usng doman knowledge and machne learnng. IEEE Symposum Seres on Computatonal Intellgence, 203. [7] Creecy, R.M., Masand, B. M., Smth, S. J., and Waltz, D. L Tradng MIPS and memory for knowledge engneerng: classfyng census returns on the Connecton Machne, Communcatons of the ACM, 35(8): p , 992. [8] Sebastan, F Machne learnng n automated text categorzaton. ACM Computng Surveys, (): p [9] Yang, Y. & Lu, X A re-examnaton of text categorzaton methods. Proc. 22th ACM Int. Conf. on Research and Development n Informaton Retreval (SIGIR'99) Berkeley, CA. [0] Masand, B., Lnoff, G., & Waltz, D Classfyng news stores usng memory based reasonng. Development n Informaton Retreval, 992: ACM Press, New York, US. [] Radovanovć, M. & Ivanovć, M Text mnng: approaches and applcatons, Nov Sad J. Math. Vol. 38, No. 3, 2008, [2] Lu, F. & Ba, Q. Y Refned weghted K-Nearest Neghbors algorthm for text categorzaton. Internatonal Conference on Intellgent Systems and Knowledge Engneerng (ISKE), 200. [3] Balwan. V., Kumar, V., Kumar, P., & Pascual, J KNN based Machne Learnng Approach for Text and Document Mnng. Internatonal Journal of Database Theory and Applcaton, Vol. 7, No., 204, pp [4] Baeza-Yates, R., Rbero-Neto, B., Modern Informaton Retreval, 999: Addson Wesley. [5] Syu, I., Lang, S.D. & Deo, N.; 996. Incorporatng latent semantc ndexng nto a neural network model for nformaton retreval. Proceedngs of the ffth nternatonal conference on Informaton and knowledge management, 996. [6] Chen. Z.H., N, C. W. and Murphey, Y. L., Neural Network Approaches for Text Document Categorzaton. IEEE Internatonal Jont Conference on Neural Networks, July, [7] Zhang, M.L. and Zhou, Z. H Multlabel Neural Networks wth Applcatons to Functonal Genomcs and Text Categorzaton. IEEE Transacton n Knowledge and Data Engneerng, Vol. 8, Issue 0, Oct [8] Cho, S.B. and Lee, J. H., Learnng Neural Network Ensemble for Practcal Text Classfcaton. Lecture Notes n Computer Scence, Volume 2690, Pages , [9] Yu, B.; Xu, Z. B. & L, C. H Latent semantc analyss for text categorzaton usng neural network. Knowledge- Based Systems, 2- pp , 2008 [20] Th, H. N. T.; Huu, O. N. & Ngoc, T. N. T.;203. A supervsed learnng method combne wth dmensonalty reducton n Vetnamese text summarzaton. IEEE Computng, Communcatons and IT Applcatons Conference (ComComAp), 203. [2] Vnodhn, G. & Chandrasekaran, R.M.; 204. Sentment classfcaton usng prncpal component analyss based neural network model. 204 Internatonal Conference on Informaton Communcaton and Embedded Systems (ICICES), 204 [22] L, C. H. and Park, S. C., An effcent document classfcaton model usng an mproved back propagaton neural network and sngular value decomposton. Expert Systems wth Applcatons, 36, pp , [23] Kohonen, T The self-organzng map. Proc. of the IEEE, 9, , 990.

13 Internatonal Journal of Intellgent Informaton Systems 205; 4(3): [24] Manomasupat, P., and Abmad k. Feature Selecton for text Categorzaton Usng Self Orgnzng Map. 2nd Internatonal Conference on Neural Network and Bran, 2005, IEEE press Vol 3, pp , [25] Lu, Y.C.; Wang, X.L.; & Wu, C.; ConSOM: A conceptonal self-organzng map model for text clusterng. Neurocomputng, 7(4-6), , [26] Lu, Y.C., Wu, C., & Lu, M. 20. Research of fast SOM clusterng for text nformaton. Expert Systems wth Applcatons, 38(8), , 20. [27] Lews, D.D Nave (Bayes) at forty:the ndependence assumpton n nformaton retreval. Proceedngs of ECML-98. Sprnger Verlag, Hedelberg, 998. [28] Fredman, N.; Geger, D.; Goldszmdt. M.; 997. Bayesan Network Classfers. Machne Learnng, November 997, Volume 29, Issue 2-3, pp [29] Theodords, S.; 205. Machne Learnng: A Bayesan and Optmzaton Perspectve. Academc Press, 205. [30] Vapnk, V.; 995. The Nature of Statstcal Learnng Theory. Sprnger Verlag, New York, 995. [3] Mukkamala, S., Janosk, G., Sung, A H Intruson Detecton Usng Neural Networks and Support Vector Machnes. Proceedngs of IEEE Internatonal Jont Conference on Neural Networks, IEEE Computer Socety Press, pp [32] Murphey, Y.L.; Chen, Z.H.; Putrus, M. & Feldkamp, L.A SVM learnng from large tranng data set. IEEE Internatonal Jont Conference on Neural Networks, July, [33] Hong, H.B.; Murphey, Y.L.; Gutchess, D. & Chang, T.S Identfyng knowledge doman and ncremental new class learnng n SVM. IEEE Internatonal Jont Conference on Neural Networks, July, [34] Chapelle, O. & Vapnk, V Model selecton for support vector machnes. In S.A. Solla, T.K. Leen, and K.R. Muller, edtors, Advances n Neural Informaton Processng Systems, volume 2. MIT Press, Cambrdge, MA, [35] Zhang, W.; Yoshda, T.; & Tang, X Text Classfcaton based on Mult-word wth Support Vector Machne. Knowledge-Based Systems, vol. 2, [36] Fenerer, I. & Karatzoglou, A., 200. Support Vector Machnes for Large Scale Text Mnng n R. 9th Internatonal Conference on Computatonal Statstcs, 200. [37] Hsu, Chh-We and Ln, Chh-Jen, A Comparson of Methods for Multclass Support Vector Machnes. IEEE Transactons On Neural Networks, VOL. 3, NO. 2, MARCH [38] Platt, J. C., Crstann, N., and Shawe-Taylor, J., Large margn DAG s for multclass classfcaton. Advances n Neural Informaton Processng Systems. Cambrdge, MA: MIT Press, vol. 2, pp , [39] Huang, L.P Intellgent Systems for text categorzaton and retreval. M.S. Thess, Department of Electrcal and Computer Engneerng, Unversty of Mchgan-Dearborn, [40] Raghavan, V.V., & Wong, S.K.M A Crtcal Analyss of Vector Space Model for Informaton Retreval. Journal of the Amerca Socety for Informaton Scence, (5): [4] Porter, M.F An algorthm for suffx strppng. Readngs n Informaton Retreval, 997. Morgan Kaufmann Publshers Inc. San Francsco, CA, USA. [42] Dumas, S.T., 99. Improvng the retreval of nformaton from external sources. Behavor Research Methods, Instruments and Computers, (2): p [43] Dumas, S.T., 990. Enhancng performance n latent semantc ndexng (LSI) retreval. Techncal Report Techncal Memorandum, Bellcore, 990. [44] Dumas, S.T., Furnas, G. W., Landauer, T. K. and Deerwester, S Usng latent semantc analyss to mprove nformaton retreval,. In Proceedngs of CHI'88: Conference on Human Factors n Computng New York: ACM. [45] Jessup, E. R., & Martn, J.H., 200. Takng a new look at the latent semantc analyss approach to nformaton retreval. Computatonal nformaton retreval, 200: p [46] Sebastan, F. & Rcerche, C. N., Machne learnng n automated text categorzaton. Journal of ACM Computng Surveys, Volume 34, Issue, March 2002.

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

The Research of Support Vector Machine in Agricultural Data Classification

The Research of Support Vector Machine in Agricultural Data Classification The Research of Support Vector Machne n Agrcultural Data Classfcaton Le Sh, Qguo Duan, Xnmng Ma, Me Weng College of Informaton and Management Scence, HeNan Agrcultural Unversty, Zhengzhou 45000 Chna Zhengzhou

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

Arabic Text Classification Using N-Gram Frequency Statistics A Comparative Study

Arabic Text Classification Using N-Gram Frequency Statistics A Comparative Study Arabc Text Classfcaton Usng N-Gram Frequency Statstcs A Comparatve Study Lala Khresat Dept. of Computer Scence, Math and Physcs Farlegh Dcknson Unversty 285 Madson Ave, Madson NJ 07940 Khresat@fdu.edu

More information

Solving two-person zero-sum game by Matlab

Solving two-person zero-sum game by Matlab Appled Mechancs and Materals Onlne: 2011-02-02 ISSN: 1662-7482, Vols. 50-51, pp 262-265 do:10.4028/www.scentfc.net/amm.50-51.262 2011 Trans Tech Publcatons, Swtzerland Solvng two-person zero-sum game by

More information

Classifying Acoustic Transient Signals Using Artificial Intelligence

Classifying Acoustic Transient Signals Using Artificial Intelligence Classfyng Acoustc Transent Sgnals Usng Artfcal Intellgence Steve Sutton, Unversty of North Carolna At Wlmngton (suttons@charter.net) Greg Huff, Unversty of North Carolna At Wlmngton (jgh7476@uncwl.edu)

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

UB at GeoCLEF Department of Geography Abstract

UB at GeoCLEF Department of Geography   Abstract UB at GeoCLEF 2006 Mguel E. Ruz (1), Stuart Shapro (2), June Abbas (1), Slva B. Southwck (1) and Davd Mark (3) State Unversty of New York at Buffalo (1) Department of Lbrary and Informaton Studes (2) Department

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Decson surface s a hyperplane (lne n 2D) n feature space (smlar to the Perceptron) Arguably, the most mportant recent dscovery n machne learnng In a nutshell: map the data to a predetermned

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

Performance Evaluation of Information Retrieval Systems

Performance Evaluation of Information Retrieval Systems Why System Evaluaton? Performance Evaluaton of Informaton Retreval Systems Many sldes n ths secton are adapted from Prof. Joydeep Ghosh (UT ECE) who n turn adapted them from Prof. Dk Lee (Unv. of Scence

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET 1 BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET TZU-CHENG CHUANG School of Electrcal and Computer Engneerng, Purdue Unversty, West Lafayette, Indana 47907 SAUL B. GELFAND School

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

Available online at Available online at Advanced in Control Engineering and Information Science

Available online at   Available online at   Advanced in Control Engineering and Information Science Avalable onlne at wwwscencedrectcom Avalable onlne at wwwscencedrectcom Proceda Proceda Engneerng Engneerng 00 (2011) 15000 000 (2011) 1642 1646 Proceda Engneerng wwwelsevercom/locate/proceda Advanced

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

Impact of a New Attribute Extraction Algorithm on Web Page Classification

Impact of a New Attribute Extraction Algorithm on Web Page Classification Impact of a New Attrbute Extracton Algorthm on Web Page Classfcaton Gösel Brc, Banu Dr, Yldz Techncal Unversty, Computer Engneerng Department Abstract Ths paper ntroduces a new algorthm for dmensonalty

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

Experiments in Text Categorization Using Term Selection by Distance to Transition Point

Experiments in Text Categorization Using Term Selection by Distance to Transition Point Experments n Text Categorzaton Usng Term Selecton by Dstance to Transton Pont Edgar Moyotl-Hernández, Héctor Jménez-Salazar Facultad de Cencas de la Computacón, B. Unversdad Autónoma de Puebla, 14 Sur

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Query Clustering Using a Hybrid Query Similarity Measure

Query Clustering Using a Hybrid Query Similarity Measure Query clusterng usng a hybrd query smlarty measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transacton on Computers, 3(3), 700-705. Query Clusterng Usng a Hybrd Query Smlarty Measure Ln Fu, Don Hoe-Lan

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

A Novel Term_Class Relevance Measure for Text Categorization

A Novel Term_Class Relevance Measure for Text Categorization A Novel Term_Class Relevance Measure for Text Categorzaton D S Guru, Mahamad Suhl Department of Studes n Computer Scence, Unversty of Mysore, Mysore, Inda Abstract: In ths paper, we ntroduce a new measure

More information

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines (IJCSIS) Internatonal Journal of Computer Scence and Informaton Securty, Herarchcal Web Page Classfcaton Based on a Topc Model and Neghborng Pages Integraton Wongkot Srura Phayung Meesad Choochart Haruechayasak

More information

Machine Learning. Topic 6: Clustering

Machine Learning. Topic 6: Clustering Machne Learnng Topc 6: lusterng lusterng Groupng data nto (hopefully useful) sets. Thngs on the left Thngs on the rght Applcatons of lusterng Hypothess Generaton lusters mght suggest natural groups. Hypothess

More information

Pruning Training Corpus to Speedup Text Classification 1

Pruning Training Corpus to Speedup Text Classification 1 Prunng Tranng Corpus to Speedup Text Classfcaton Jhong Guan and Shugeng Zhou School of Computer Scence, Wuhan Unversty, Wuhan, 430079, Chna hguan@wtusm.edu.cn State Key Lab of Software Engneerng, Wuhan

More information

CSCI 5417 Information Retrieval Systems Jim Martin!

CSCI 5417 Information Retrieval Systems Jim Martin! CSCI 5417 Informaton Retreval Systems Jm Martn! Lecture 11 9/29/2011 Today 9/29 Classfcaton Naïve Bayes classfcaton Ungram LM 1 Where we are... Bascs of ad hoc retreval Indexng Term weghtng/scorng Cosne

More information

Announcements. Supervised Learning

Announcements. Supervised Learning Announcements See Chapter 5 of Duda, Hart, and Stork. Tutoral by Burge lnked to on web page. Supervsed Learnng Classfcaton wth labeled eamples. Images vectors n hgh-d space. Supervsed Learnng Labeled eamples

More information

Network Intrusion Detection Based on PSO-SVM

Network Intrusion Detection Based on PSO-SVM TELKOMNIKA Indonesan Journal of Electrcal Engneerng Vol.1, No., February 014, pp. 150 ~ 1508 DOI: http://dx.do.org/10.11591/telkomnka.v1.386 150 Network Intruson Detecton Based on PSO-SVM Changsheng Xang*

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc.

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 0974-74 Volume 0 Issue BoTechnology 04 An Indan Journal FULL PAPER BTAIJ 0() 04 [684-689] Revew on Chna s sports ndustry fnancng market based on market -orented

More information

Combining Multiple Resources, Evidence and Criteria for Genomic Information Retrieval

Combining Multiple Resources, Evidence and Criteria for Genomic Information Retrieval Combnng Multple Resources, Evdence and Crtera for Genomc Informaton Retreval Luo S 1, Je Lu 2 and Jame Callan 2 1 Department of Computer Scence, Purdue Unversty, West Lafayette, IN 47907, USA ls@cs.purdue.edu

More information

A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION

A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION 1 THE PUBLISHING HOUSE PROCEEDINGS OF THE ROMANIAN ACADEMY, Seres A, OF THE ROMANIAN ACADEMY Volume 4, Number 2/2003, pp.000-000 A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION Tudor BARBU Insttute

More information

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION SHI-LIANG SUN, HONG-LEI SHI Department of Computer Scence and Technology, East Chna Normal Unversty 500 Dongchuan Road, Shangha 200241, P. R. Chna E-MAIL: slsun@cs.ecnu.edu.cn,

More information

A Deflected Grid-based Algorithm for Clustering Analysis

A Deflected Grid-based Algorithm for Clustering Analysis A Deflected Grd-based Algorthm for Clusterng Analyss NANCY P. LIN, CHUNG-I CHANG, HAO-EN CHUEH, HUNG-JEN CHEN, WEI-HUA HAO Department of Computer Scence and Informaton Engneerng Tamkang Unversty 5 Yng-chuan

More information

Deep Classification in Large-scale Text Hierarchies

Deep Classification in Large-scale Text Hierarchies Deep Classfcaton n Large-scale Text Herarches Gu-Rong Xue Dkan Xng Qang Yang 2 Yong Yu Dept. of Computer Scence and Engneerng Shangha Jao-Tong Unversty {grxue, dkxng, yyu}@apex.sjtu.edu.cn 2 Hong Kong

More information

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints Australan Journal of Basc and Appled Scences, 2(4): 1204-1208, 2008 ISSN 1991-8178 Sum of Lnear and Fractonal Multobjectve Programmng Problem under Fuzzy Rules Constrants 1 2 Sanjay Jan and Kalash Lachhwan

More information

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University

CAN COMPUTERS LEARN FASTER? Seyda Ertekin Computer Science & Engineering The Pennsylvania State University CAN COMPUTERS LEARN FASTER? Seyda Ertekn Computer Scence & Engneerng The Pennsylvana State Unversty sertekn@cse.psu.edu ABSTRACT Ever snce computers were nvented, manknd wondered whether they mght be made

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

An Image Fusion Approach Based on Segmentation Region

An Image Fusion Approach Based on Segmentation Region Rong Wang, L-Qun Gao, Shu Yang, Yu-Hua Cha, and Yan-Chun Lu An Image Fuson Approach Based On Segmentaton Regon An Image Fuson Approach Based on Segmentaton Regon Rong Wang, L-Qun Gao, Shu Yang 3, Yu-Hua

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

CHAPTER 3 SEQUENTIAL MINIMAL OPTIMIZATION TRAINED SUPPORT VECTOR CLASSIFIER FOR CANCER PREDICTION

CHAPTER 3 SEQUENTIAL MINIMAL OPTIMIZATION TRAINED SUPPORT VECTOR CLASSIFIER FOR CANCER PREDICTION 48 CHAPTER 3 SEQUENTIAL MINIMAL OPTIMIZATION TRAINED SUPPORT VECTOR CLASSIFIER FOR CANCER PREDICTION 3.1 INTRODUCTION The raw mcroarray data s bascally an mage wth dfferent colors ndcatng hybrdzaton (Xue

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

Machine Learning. Support Vector Machines. (contains material adapted from talks by Constantin F. Aliferis & Ioannis Tsamardinos, and Martin Law)

Machine Learning. Support Vector Machines. (contains material adapted from talks by Constantin F. Aliferis & Ioannis Tsamardinos, and Martin Law) Machne Learnng Support Vector Machnes (contans materal adapted from talks by Constantn F. Alfers & Ioanns Tsamardnos, and Martn Law) Bryan Pardo, Machne Learnng: EECS 349 Fall 2014 Support Vector Machnes

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

Face Recognition Based on SVM and 2DPCA

Face Recognition Based on SVM and 2DPCA Vol. 4, o. 3, September, 2011 Face Recognton Based on SVM and 2DPCA Tha Hoang Le, Len Bu Faculty of Informaton Technology, HCMC Unversty of Scence Faculty of Informaton Scences and Engneerng, Unversty

More information

A Knowledge Management System for Organizing MEDLINE Database

A Knowledge Management System for Organizing MEDLINE Database A Knowledge Management System for Organzng MEDLINE Database Hyunk Km, Su-Shng Chen Computer and Informaton Scence Engneerng Department, Unversty of Florda, Ganesvlle, Florda 32611, USA Wth the exploson

More information

Face Recognition University at Buffalo CSE666 Lecture Slides Resources:

Face Recognition University at Buffalo CSE666 Lecture Slides Resources: Face Recognton Unversty at Buffalo CSE666 Lecture Sldes Resources: http://www.face-rec.org/algorthms/ Overvew of face recognton algorthms Correlaton - Pxel based correspondence between two face mages Structural

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

An Improvement to Naive Bayes for Text Classification

An Improvement to Naive Bayes for Text Classification Avalable onlne at www.scencedrect.com Proceda Engneerng 15 (2011) 2160 2164 Advancen Control Engneerngand Informaton Scence An Improvement to Nave Bayes for Text Classfcaton We Zhang a, Feng Gao a, a*

More information

Learning-Based Top-N Selection Query Evaluation over Relational Databases

Learning-Based Top-N Selection Query Evaluation over Relational Databases Learnng-Based Top-N Selecton Query Evaluaton over Relatonal Databases Lang Zhu *, Wey Meng ** * School of Mathematcs and Computer Scence, Hebe Unversty, Baodng, Hebe 071002, Chna, zhu@mal.hbu.edu.cn **

More information

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval Proceedngs of the Thrd NTCIR Workshop Descrpton of NTU Approach to NTCIR3 Multlngual Informaton Retreval Wen-Cheng Ln and Hsn-Hs Chen Department of Computer Scence and Informaton Engneerng Natonal Tawan

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

Incremental Learning with Support Vector Machines and Fuzzy Set Theory

Incremental Learning with Support Vector Machines and Fuzzy Set Theory The 25th Workshop on Combnatoral Mathematcs and Computaton Theory Incremental Learnng wth Support Vector Machnes and Fuzzy Set Theory Yu-Mng Chuang 1 and Cha-Hwa Ln 2* 1 Department of Computer Scence and

More information

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research

More information

Journal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article. A selective ensemble classification method on microarray data

Journal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article. A selective ensemble classification method on microarray data Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(6):2860-2866 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A selectve ensemble classfcaton method on mcroarray

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

Detection of an Object by using Principal Component Analysis

Detection of an Object by using Principal Component Analysis Detecton of an Object by usng Prncpal Component Analyss 1. G. Nagaven, 2. Dr. T. Sreenvasulu Reddy 1. M.Tech, Department of EEE, SVUCE, Trupath, Inda. 2. Assoc. Professor, Department of ECE, SVUCE, Trupath,

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like:

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like: Self-Organzng Maps (SOM) Turgay İBRİKÇİ, PhD. Outlne Introducton Structures of SOM SOM Archtecture Neghborhoods SOM Algorthm Examples Summary 1 2 Unsupervsed Hebban Learnng US Hebban Learnng, Cntd 3 A

More information

Private Information Retrieval (PIR)

Private Information Retrieval (PIR) 2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

Improving Web Image Search using Meta Re-rankers

Improving Web Image Search using Meta Re-rankers VOLUME-1, ISSUE-V (Aug-Sep 2013) IS NOW AVAILABLE AT: www.dcst.com Improvng Web Image Search usng Meta Re-rankers B.Kavtha 1, N. Suata 2 1 Department of Computer Scence and Engneerng, Chtanya Bharath Insttute

More information

A User Selection Method in Advertising System

A User Selection Method in Advertising System Int. J. Communcatons, etwork and System Scences, 2010, 3, 54-58 do:10.4236/jcns.2010.31007 Publshed Onlne January 2010 (http://www.scrp.org/journal/jcns/). A User Selecton Method n Advertsng System Shy

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

SRBIR: Semantic Region Based Image Retrieval by Extracting the Dominant Region and Semantic Learning

SRBIR: Semantic Region Based Image Retrieval by Extracting the Dominant Region and Semantic Learning Journal of Computer Scence 7 (3): 400-408, 2011 ISSN 1549-3636 2011 Scence Publcatons SRBIR: Semantc Regon Based Image Retreval by Extractng the Domnant Regon and Semantc Learnng 1 I. Felc Raam and 2 S.

More information

Keyword-based Document Clustering

Keyword-based Document Clustering Keyword-based ocument lusterng Seung-Shk Kang School of omputer Scence Kookmn Unversty & AIrc hungnung-dong Songbuk-gu Seoul 36-72 Korea sskang@kookmn.ac.kr Abstract ocument clusterng s an aggregaton of

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

Web Document Classification Based on Fuzzy Association

Web Document Classification Based on Fuzzy Association Web Document Classfcaton Based on Fuzzy Assocaton Choochart Haruechayasa, Me-Lng Shyu Department of Electrcal and Computer Engneerng Unversty of Mam Coral Gables, FL 33124, USA charuech@mam.edu, shyu@mam.edu

More information

Reliable Negative Extracting Based on knn for Learning from Positive and Unlabeled Examples

Reliable Negative Extracting Based on knn for Learning from Positive and Unlabeled Examples 94 JOURNAL OF COMPUTERS, VOL. 4, NO. 1, JANUARY 2009 Relable Negatve Extractng Based on knn for Learnng from Postve and Unlabeled Examples Bangzuo Zhang College of Computer Scence and Technology, Jln Unversty,

More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines A Modfed Medan Flter for the Removal of Impulse Nose Based on the Support Vector Machnes H. GOMEZ-MORENO, S. MALDONADO-BASCON, F. LOPEZ-FERRERAS, M. UTRILLA- MANSO AND P. GIL-JIMENEZ Departamento de Teoría

More information

Feature Reduction and Selection

Feature Reduction and Selection Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components

More information

A fast algorithm for color image segmentation

A fast algorithm for color image segmentation Unersty of Wollongong Research Onlne Faculty of Informatcs - Papers (Arche) Faculty of Engneerng and Informaton Scences 006 A fast algorthm for color mage segmentaton L. Dong Unersty of Wollongong, lju@uow.edu.au

More information

Using Neural Networks and Support Vector Machines in Data Mining

Using Neural Networks and Support Vector Machines in Data Mining Usng eural etworks and Support Vector Machnes n Data Mnng RICHARD A. WASIOWSKI Computer Scence Department Calforna State Unversty Domnguez Hlls Carson, CA 90747 USA Abstract: - Multvarate data analyss

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification Introducton to Artfcal Intellgence V22.0472-001 Fall 2009 Lecture 24: Nearest-Neghbors & Support Vector Machnes Rob Fergus Dept of Computer Scence, Courant Insttute, NYU Sldes from Danel Yeung, John DeNero

More information

Document Representation and Clustering with WordNet Based Similarity Rough Set Model

Document Representation and Clustering with WordNet Based Similarity Rough Set Model IJCSI Internatonal Journal of Computer Scence Issues, Vol. 8, Issue 5, No 3, September 20 ISSN (Onlne): 694-084 www.ijcsi.org Document Representaton and Clusterng wth WordNet Based Smlarty Rough Set Model

More information

FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK

FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK L-qng Qu, Yong-quan Lang 2, Jng-Chen 3, 2 College of Informaton Scence and Technology, Shandong Unversty of Scence and Technology,

More information

A Novel Adaptive Descriptor Algorithm for Ternary Pattern Textures

A Novel Adaptive Descriptor Algorithm for Ternary Pattern Textures A Novel Adaptve Descrptor Algorthm for Ternary Pattern Textures Fahuan Hu 1,2, Guopng Lu 1 *, Zengwen Dong 1 1.School of Mechancal & Electrcal Engneerng, Nanchang Unversty, Nanchang, 330031, Chna; 2. School

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

User Tweets based Genre Prediction and Movie Recommendation using LSI and SVD

User Tweets based Genre Prediction and Movie Recommendation using LSI and SVD User Tweets based Genre Predcton and Move Recommendaton usng LSI and SVD Saksh Bansal, Chetna Gupta Department of CSE/IT Jaypee Insttute of Informaton Technology,sec-62 Noda, Inda sakshbansal76@gmal.com,

More information

Fuzzy Modeling of the Complexity vs. Accuracy Trade-off in a Sequential Two-Stage Multi-Classifier System

Fuzzy Modeling of the Complexity vs. Accuracy Trade-off in a Sequential Two-Stage Multi-Classifier System Fuzzy Modelng of the Complexty vs. Accuracy Trade-off n a Sequental Two-Stage Mult-Classfer System MARK LAST 1 Department of Informaton Systems Engneerng Ben-Guron Unversty of the Negev Beer-Sheva 84105

More information

Professional competences training path for an e-commerce major, based on the ISM method

Professional competences training path for an e-commerce major, based on the ISM method World Transactons on Engneerng and Technology Educaton Vol.14, No.4, 2016 2016 WIETE Professonal competences tranng path for an e-commerce maor, based on the ISM method Ru Wang, Pn Peng, L-gang Lu & Lng

More information

An Indian Journal FULL PAPER ABSTRACT KEYWORDS. Trade Science Inc.

An Indian Journal FULL PAPER ABSTRACT KEYWORDS. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 97-735 Volume Issue 9 BoTechnology An Indan Journal FULL PAPER BTAIJ, (9), [333-3] Matlab mult-dmensonal model-based - 3 Chnese football assocaton super league

More information