ABSTRACT. WEIQING, JIN. Fuzzy Classification Based On Fuzzy Association Rule Mining (Under the direction of Dr. Robert E. Young).


ABSTRACT

WEIQING, JIN. Fuzzy Classification Based On Fuzzy Association Rule Mining (Under the direction of Dr. Robert E. Young).

In fuzzy classification of high-dimensional datasets, the number of fuzzy rules increases exponentially with the increase of attributes. Fuzzy association rule mining with appropriate threshold values can help to design a fuzzy classifier by significantly decreasing the number of interesting rules. In this dissertation, we investigate the way to integrate fuzzy association rule mining and fuzzy classification. First, the framework of fuzzy association rule mining is presented, which incorporates fuzzy set modeling in an association rule mining technique. It avoids the sharp boundary problem caused by arbitrary determination of intervals on the domain of quantitative attributes, meanwhile presenting natural and clear knowledge in the form of linguistic rules. We study the impact of different fuzzy aggregation operators on the rule mining result. The selection of the operator should depend on the application context. Based on the framework of fuzzy association rule mining, we propose a heuristic method to construct the fuzzy classifier based on the set of fuzzy class association rules. We call this method the FCBA approach, where FCBA stands for fuzzy classification based on association. The objective is to build a classifier with strong classification ability. In the FCBA approach, we use the composite criteria of fuzzy support and fuzzy confidence as the rule weight to indicate the significance of the rule. Through our study, we find it is important to find a good combination of these two rule interestingness threshold values. The classification of each record is achieved by applying the classic fuzzy reasoning method, in which each record is classified as the consequent of the rule with the maximum product of the compatibility grade and the rule weight. We use well-known classification problems such as the Iris dataset, and a high-dimensional classification problem, the Wine dataset, to compare the proposed FCBA approach with other non-fuzzy and fuzzy classification approaches.
The empirical study shows that the FCBA approach performs well on these datasets in terms of both accuracy and interpretability.

FUZZY CLASSIFICATION BASED ON FUZZY ASSOCIATION RULE MINING

by

WEIQING JIN

A dissertation submitted to the Graduate Faculty of North Carolina State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy in INDUSTRIAL ENGINEERING

Raleigh, NC
2004

APPROVED BY:
Dr. Robert E. Young, Chair of Advisory Committee
Dr. Michael G. Kay, Advisory Committee
Dr. Denis R. Cormier, Advisory Committee
Dr. Laurie Williams, Advisory Committee

BIOGRAPHY

Weiqing Jin is a Ph.D. student in the Department of Industrial Engineering at North Carolina State University. He received his dual B.S. degrees in Material Engineering and International Finance from Shanghai Jiao Tong University, Shanghai, P.R. China, in 1996. He then obtained the M.S. degree in Mechanical Engineering from the same university in 1999. In the fall of 2000, he came to North Carolina State University to start his Ph.D. study in the Department of Industrial Engineering. His research interests include data mining, machine learning, artificial intelligence, pattern classification, and fuzzy reasoning. While working on his Ph.D., he also served as a teaching assistant for the database applications course and was a database engineer (co-op) in industry.

ACKNOWLEDGEMENTS

I would like to express my appreciation to all persons who have supported this research. I would like to express my sincere gratitude to my advisor, Dr. Robert E. Young, for his encouragement, thoughtful advice, kindness, and patience throughout my Ph.D. study. His guidance was not only a good inspiration in the past, but will also be a very valuable asset to me in the future. Special thanks are extended to Dr. Michael G. Kay for his constructive suggestions and valuable advice. I am also grateful to Dr. Denis R. Cormier and Dr. Laurie Williams for their teaching, their service as my committee members, and their thoughtful comments. I would also like to thank Dr. Mark Walker for being the Graduate Representative. I am thankful to Dr. David Dickey in the Statistics department for his excellent data mining course that helped to expand my research horizon. I am deeply indebted to my family for their support. Their love and encouragement made this study possible.

Table of Contents

List of Tables
List of Figures

1. Introduction
   1.1 Fuzzy Classification
   1.2 Integration of Association Rule Mining and Classification
   1.3 Research Scope and Objective
   1.4 Dissertation Organization
2. Classification
   2.1 Definition of Classification
   2.2 Data Preparation for Classification
   2.3 Traditional Classification Techniques
       2.3.1 Statistical Classification Methods
       2.3.2 Decision Tree Method
   2.4 Classification Accuracy
3. Association Rule Mining
   3.1 Definition of Association Rule Mining
   3.2 Apriori Algorithm
       3.2.1 Downward Closure Property
       3.2.2 Iterative Procedure
       3.2.3 Hash Tree Implementation
   3.3 Quantitative Association Rule Mining
4. Fuzzy Aggregation Operators and Fuzzy Rules
   4.1 Fuzzy Sets
   4.2 Fuzzy Aggregation Operator
       4.2.1 t-norm and t-conorm Operators
       4.2.2 Compensatory Operators
   4.3 Fuzzy Relations and Fuzzy Rules
5. Fuzzy Association Rule Mining
   5.1 Introduction
   5.2 Fuzzy Association Rule Mining
       5.2.1 Syntax and Interestingness Measurement
       5.2.2 Mining Algorithm
   5.3 Empirical Study
       5.3.1 The Auto-Mpg Dataset
       5.3.2 The Page Blocks Dataset
       5.3.3 Result Analysis and Discussion
6. Fuzzy Classification
   6.1 Advantages of Using Fuzzy Classifier
   6.2 General Fuzzy Classifier Model
   6.3 Rule Weight and Fuzzy Reasoning Method
7. Designing Fuzzy Classifier From Fuzzy Association Rule Mining
   7.1 Literature Review
       7.1.1 Classification on Non-Fuzzy Data
       7.1.2 Classification on Fuzzy Data
   7.2 Problem Formulation
   7.3 Fuzzy Class Association Rule Mining Algorithm
   7.4 Fuzzy Classification Based On Association (FCBA)
       7.4.1 Review of Adaptive Rule Weight Method
       7.4.2 FCBA Algorithm
8. Empirical Study of Fuzzy Classification Based On Association
   8.1 Fuzzy Partitioning
   8.2 The Iris Dataset
   8.3 The Wine Dataset
   8.4 Summary of Results
9. Conclusions and Future Research
   9.1 Conclusions
   9.2 Future Research
List of References

List of Tables

An example of a transaction database
People table and example of quantitative association rules
Mapping from quantitative attributes to Boolean attributes
Discrete interval method with overlap
Common t-norms and their dual t-conorms
Common fuzzy implication functions
Logical equivalence of A -> B and (not A) or B
Truth table of fuzzy association rules
Truth table of conjunction
Truth table of implication
A record of the mpg database
Attributes of the mpg dataset
First 10 records of the mpg database
The fuzzy set values transformed from data in Table
The normalized fuzzy set membership values
First ten fuzzy conjunctive association rules
Accuracy on the Iris dataset by non-fuzzy classification methods
Accuracy on the Iris dataset by the Adaptive method based on LV
Result of Iris data by the FCBA method based on LV
Performance comparison between GA and FCBA methods
Accuracy on the Iris dataset by fuzzy classification methods
Accuracy of non-fuzzy classification methods on the Wine dataset
CV accuracy on the Wine dataset by the GBML-based method
CV accuracy on the Wine dataset by the FCBA method

List of Figures

Classification process
Two-class separation by linear discrimination
Decision tree example for the concept of playing tennis
Simple-to-complex growth of decision tree
Empirical vs. true accuracy
Apriori algorithm
Hash tree illustration diagram
Discrete interval method
Fuzzy set membership functions
Alpha-cut, support, core and height
Linguistic variable temperature
Min and Max operators of fuzzy sets
Fuzzy set defined on the interval
An example of linguistic terms and its termset definition
Fuzzy association rule mining algorithm
Fuzzy sets of attribute mpg
Fuzzy sets of attribute acceleration
Fuzzy sets of attribute displacement
Fuzzy sets of attribute weight
Fuzzy sets of attribute horsepower
Number of rules vs. minimum support on un-normalized fuzzy set membership values
Number of rules vs. minimum support on normalized fuzzy set values
Fuzzy set membership values of attribute acceleration
Another fuzzification scheme for attribute acceleration
Number of rules vs. minimum support on different confidence values
Number of rules vs. minimum support (multiple linguistic terms)
Number of rules vs. minimum support (single most important linguistic term)
Effect of minimum support on the number of frequent termsets
The relationship between the minimum support and the number of rules
Fuzzy rule representation
Classification boundary by fuzzy rules and the look-up table method
Effect of rule weight
Fuzzy class association rule mining algorithm
FCBA algorithm
Attribute with five uniform fuzzy partitions
Effect of minimum support on accuracy on the Wine dataset for FCBA

Chapter 1

Introduction

1.1 Fuzzy Classification

Many real-world decision-making problems can be treated as classification [Weiss 1990]. For instance, when a financial bank plans to promote a new type of credit card, it will send out marketing mail to its current customers. The purpose of the mail is to promote the new credit card. However, in order to minimize the marketing campaign cost, the bank wants to target the portion of customers that will respond to the promotion mail and apply for the credit card. This problem can be treated as a classification problem that classifies customers into two groups: one group will respond to the promotion mail and the other will not. It would be helpful to management if we could construct a classifier based on linguistically interpretable rules. One of the approaches to solve this classification problem is to formulate a solution using a fuzzy rule-based classifier. A fuzzy classifier can extract rules in a linguistic format that is more interpretable compared to other nonlinear approaches such as neural networks. In fact, fuzzy rule-based classification systems have been widely applied in the pattern classification area [Kuncheva 2000]. Another characteristic of a fuzzy classifier is its smooth classification boundary, which results from the overlap between the fuzzy spaces. This helps in classification applications where a crisp partitioning boundary cannot be easily drawn; such examples include face and voice recognition, and handwriting verification. This dissertation studies the following fuzzy classification problem: given a set of training records represented in fuzzy membership values and associated class labels, we need to generate a fuzzy classifier containing a set of fuzzy linguistic rules to predict the class labels of unseen (future) records.

Different approaches for fuzzy rule-based classification systems have been proposed in the past. Fuzzy If-Then rules derived from numerical data were obtained by heuristic procedures [Ishibuchi 1992], [Abe 1995], [Nozaki 1996], neuro-fuzzy techniques [Pedrycz 1992], [Mrta 1997], [Nauck 1997], and genetic algorithms [Ishibuchi 1995], [Cordon 1998], [Gonzalez 1999]. Generally speaking, extracting fuzzy classification rules from numerical data involves two phases: first, partitioning each attribute's domain based on linguistic term definitions, and second, selecting significant classification rules. This type of approach is often referred to as the simple fuzzy grid method [Ishibuchi 1992]. However, the fast growth of information technology has resulted in large datasets with many attributes. For high-dimensional datasets, the simple fuzzy grid method suffers from the curse of dimensionality: the total number of fuzzy rules increases exponentially with the number of attributes. This usually results in a huge set of fuzzy rules to be examined. Thus, it becomes very difficult to construct a fuzzy classifier from datasets that contain many attributes. Compared to non-fuzzy data classification, the exponential combination effect is more severe for datasets with many attributes, since multiple fuzzy linguistic values are defined on each attribute's domain.

1.2 Integration of Association Rule Mining and Classification

Association rule mining aims to find certain association relationships among a set of attributes in a dataset. It finds all the rules in the dataset that satisfy pre-specified minimum support and minimum confidence values. By incorporating fuzzy set modeling in association rule mining, we can extract fuzzy association rules from datasets [Kuok 1998]. Fuzzy association rules containing linguistic terms can express more natural and interpretable knowledge. By extending the concepts from association rule mining, fuzzy support and fuzzy confidence can be used to measure rule interestingness in fuzzy association rule mining.
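As an illustration of how these two measures can be computed, consider the following minimal sketch. It is not code from this dissertation: the record layout, the term names, and the use of the min t-norm as the conjunction operator are all illustrative assumptions.

```python
# Sketch (illustrative only): fuzzy support and fuzzy confidence for a
# rule A -> B. Each record stores membership grades in [0, 1] for each
# fuzzy linguistic term; the min t-norm is assumed as the conjunction.

def fuzzy_support(records, antecedent, consequent):
    """Mean over all records of the aggregated membership in A and B."""
    total = 0.0
    for rec in records:
        grades = [rec[term] for term in antecedent + consequent]
        total += min(grades)          # min t-norm as conjunction
    return total / len(records)

def fuzzy_confidence(records, antecedent, consequent):
    """Fuzzy support of (A and B) divided by fuzzy support of A alone."""
    sup_ab = fuzzy_support(records, antecedent, consequent)
    sup_a = fuzzy_support(records, antecedent, [])
    return sup_ab / sup_a if sup_a > 0 else 0.0

records = [
    {"mpg_high": 0.9, "weight_low": 0.8},
    {"mpg_high": 0.2, "weight_low": 0.1},
]
print(fuzzy_support(records, ["weight_low"], ["mpg_high"]))    # ~0.45
print(fuzzy_confidence(records, ["weight_low"], ["mpg_high"])) # 1.0
```

A different t-norm (e.g., the algebraic product) would change the aggregated grades and therefore the mined rule set, which is exactly why the choice of aggregation operator studied later matters.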
Association rule mining and classification are two important topics in the data mining area [Chen 1996]. They can be viewed within a unifying framework of rule discovery

[Agrawal 1993b]. The rapidly growing available computing power has facilitated the use of these two techniques on large datasets. There have been some studies on applying the concepts of association rule mining in classification, though most of them were targeted at non-fuzzy data [Lent 1997], [Liu 1998], [Li 2001]. The main difference between a fuzzy classification rule and a fuzzy association rule is that a fuzzy classification rule contains only the class label as the rule consequent. Therefore, fuzzy classification rules can be treated as a subset of fuzzy association rules. By applying the fuzzy association rule mining technique, we can construct a fuzzy classifier from large datasets by choosing high-quality rules from among all possible rules. In order to select high-quality rules, a rule weight needs to be attached to each rule to indicate its significance. Different kinds of rule weights can be derived by individual or composite use of rule interestingness measurements such as fuzzy support and fuzzy confidence. Fuzzy support indicates the compatibility grade between all the data and the rule, while fuzzy confidence indicates the accuracy of the fuzzy rule. The goal of fuzzy classification is to construct a fuzzy classifier with strong classification ability. Classification accuracy and interpretability are two major criteria to evaluate fuzzy classifiers. Accuracy is the ability to correctly classify unseen data, while interpretability is the level of understanding and insight that is provided by the model.

1.3 Research Scope and Objective

This dissertation studies the integration of fuzzy association rule mining and fuzzy classification. In fuzzy association rule mining, the fuzzy aggregation operator plays an important role in the rule mining process. The selection of an appropriate operator depends on the application context. We will study the impact of different operators on fuzzy association rule mining. Fuzzy association rule mining allows us to extract rules from datasets with many attributes.
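The composite rule weight and the single-winner fuzzy reasoning method described above can be sketched as follows. This is an illustrative toy, not the dissertation's implementation: the rule data, term names, and the min t-norm for the compatibility grade are assumptions.

```python
# Sketch of the composite rule-weight idea: weight each rule by
# fuzzy support x fuzzy confidence, then classify a record by the single
# winner rule, i.e. the rule maximizing compatibility grade x rule weight.

def compatibility(rule_antecedent, record):
    # min t-norm over the record's membership grades in the rule's terms
    return min(record[term] for term in rule_antecedent)

def classify(rules, record):
    """rules: list of (antecedent_terms, class_label, fuzzy_sup, fuzzy_conf)."""
    best_label, best_score = None, -1.0
    for antecedent, label, f_sup, f_conf in rules:
        weight = f_sup * f_conf                       # composite rule weight
        score = compatibility(antecedent, record) * weight
        if score > best_score:
            best_label, best_score = label, score
    return best_label

rules = [
    (["petal_len_short"], "setosa",    0.30, 0.95),
    (["petal_len_long"],  "virginica", 0.25, 0.90),
]
record = {"petal_len_short": 0.9, "petal_len_long": 0.1}
print(classify(rules, record))  # "setosa" wins: 0.9*0.285 > 0.1*0.225
```

The trade-off mentioned above is visible here: fuzzy support rewards general rules that cover many records, fuzzy confidence rewards specific, accurate rules, and their product balances the two.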
Based on the results of fuzzy association rule mining, we propose a heuristic method to construct a fuzzy classifier. In the literature, the classification ability of a fuzzy classifier is mainly measured by the classification

accuracy and the interpretability. The accuracy of a fuzzy classifier on a dataset is defined as the percentage of correctly classified data among the total given data. The interpretability of a fuzzy classifier is measured by the total number of rules in the fuzzy classifier. Our objective is to design a fuzzy classifier with strong classification ability, meaning higher classification accuracy and a smaller number of rules. The rule weight indicates the significance of a fuzzy rule in fuzzy classifier design. Rule interestingness measurements such as fuzzy support and fuzzy confidence have been incorporated in the design of fuzzy classifiers to help pre-screen candidate fuzzy rules [Ishibuchi 2004]. The rule weight should provide a good trade-off between generality and specificity: a rule is general if it covers many records; a rule is specific if it covers few records with high accuracy. We propose the composite criterion of using both fuzzy support and fuzzy confidence as the rule weight in fuzzy classifier design. In the process of fuzzy classifier design, redundant rules need to be removed from the set of fuzzy association rules. In addition, the tie situation, in which multiple rules can classify one record, needs to be resolved. We will study these issues and propose effective methods to deal with them. In order to evaluate the proposed fuzzy classifier, we will compare our approach with various non-fuzzy and fuzzy classification methods on some well-known classification problems. Different evaluation techniques such as 10-fold cross validation and leaving-one-out will be used to estimate our fuzzy classifier's performance.

1.4 Dissertation Organization

This dissertation contains nine chapters. Chapter 1 introduces the problem of fuzzy classification from association rule mining. It also gives the research objective and the dissertation organization. Chapter 2 introduces the problem of classification, traditional classification methods, and performance evaluation techniques. Chapter 3 introduces the association rule mining problem and the Apriori algorithm.
The limitation of mining association rules from quantitative data is also presented. Chapter 4 introduces fuzzy aggregation operators and fuzzy rules that will be used in the framework of fuzzy

association rule mining and fuzzy classification. Then, in Chapter 5, the fuzzy association rule mining framework is presented, including the semantics, syntax, interestingness measurement, and mining algorithm. The results and discussion of an empirical study are also provided. Chapter 6 introduces a general fuzzy classification model and discusses the importance of the rule weight and the fuzzy reasoning method in designing a fuzzy classifier. Chapter 7 proposes the framework of designing a fuzzy classifier from fuzzy class association rules, a subset of fuzzy association rules. The problem of designing a fuzzy classifier from fuzzy class association rule mining is formulated, and the fuzzy class association rule mining algorithm is presented. The heuristic approach, fuzzy classification based on association (FCBA), is also proposed. Chapter 8 presents the empirical study comparing the FCBA approach with other fuzzy and non-fuzzy classification approaches from the literature. Chapter 9 draws conclusions and proposes future research directions.

Chapter 2

Classification

This chapter provides a brief introduction to classification. The definition of the classification problem is given in Section 2.1. The data preparation procedure is explained in Section 2.2. Traditional classification methods are introduced in Section 2.3, including statistical classification methods (Section 2.3.1) and the decision tree method (Section 2.3.2). Estimation techniques for classification accuracy are introduced in Section 2.4.

2.1 Definition of Classification

Many real-world decision-making problems fall into the general category of classification [Weiss 1990], [Michalski 1998]. The rule-based classification approach provides a modularized, clearly explained format for a decision, which is compatible with a human being's reasoning procedure. In the machine learning literature, classification usually means establishing rules by which we can classify new data into existing classes that are known in advance [Michie 1994]. Each data item is assumed to belong to a predetermined class, and this class is called its class label. Classification is also known as supervised learning, in the sense that the classification rules are established from given data whose class labels are known. This contrasts with unsupervised learning, such as clustering, in which the class label of each given data item, and even the number of classes to be learned, may not be known in advance. We present the definition of a non-fuzzy data classifier as follows. Notice the data used in the non-fuzzy classifier are all crisp values.

Definition 2.1: Non-fuzzy Classifier. Assume we have n records (patterns), each with m attributes (features), and let C = {C_j}, (j = 1, ..., q) be the set of class labels. Let T = {T_i}, (i = 1, ..., n) denote the set of records. A non-fuzzy data classifier is any mapping

    T -> C

That is, each record T_i is mapped to one class label in C.

Depending on the value of the class label, classification can be divided into different categories. In the first type of classification, the class label's value is discrete or nominal, such as Yes/No or High/Medium/Low. In the second type of classification, the class label's value is continuous or ordered. Different classification methods can be applied to these two types of problems. For instance, a decision tree is a common technique for the first type of classification problem, while regression is used for the second type. In this dissertation, we deal only with the first type of classification problem, in which the class label has only discrete or nominal values.

The classification process usually involves two stages, as illustrated in Figure 2.1. In the first stage, a classification model is built to describe a given set of data with known class labels [Han 2001]. Notice that in the context of classification, a data item is also referred to as a pattern, instance, record, etc. The data is often divided into training data and testing data. The data used to build the classification model are called training data; they are randomly selected from the sample population. The knowledge output of the model is usually represented in the form of classification rules or mathematical formulas. In the second stage of the classification process, the classification ability of the model is evaluated. One of the most common criteria to evaluate performance is to estimate the classification accuracy on the testing data. The accuracy is the percentage of the testing data that are correctly classified by the model. If the model has satisfactory classification ability, it will then be applied to classify future (unseen) data.

[Figure 2.1. Classification process: given data are split into training and testing data; a model built from the training data produces the knowledge output, and if its classification ability on the testing data is satisfactory, it is applied to future data.]

2.2 Data Preparation for Classification

In order to improve the classification ability and the efficiency of the classification process, some data preprocessing procedures are required. These include data cleaning, feature selection, and data transformation. We briefly introduce them here.

Data Cleaning

Data cleaning generally refers to removing noise and identifying anomalies in the data. How to deal with missing values is also involved in data cleaning. The purpose of data cleaning is to improve data quality and help reduce confusion in the classification model.

Feature Selection

Some of the attributes of the dataset might be irrelevant to the classification task. Including such attributes may increase the unnecessary computational burden and

mislead the classification effort. It thus helps to identify and remove those redundant attributes before classification. One common approach in feature selection is to use some form of statistical test or distance metric to decide stepwise whether an attribute is significant or not. In addition, some stopping condition is needed to determine when to stop the feature selection [Weiss 1991].

Data Transformation

Normalization is one of the most common techniques for data transformation. It scales all values of a given attribute so that they fall within a small specified range, such as from -1.0 to 1.0. In distance-metric-based classification methods, normalization prevents attributes with a large range from outweighing attributes with a smaller range. Besides, concept hierarchies may be incorporated into the classification process to compress the original data. Such compression requires the generalization of the data based on a predetermined concept hierarchy.

2.3 Traditional Classification Techniques

We briefly review two major categories of traditional classification techniques here. The first category is statistical classification methods, which include Bayesian classifiers, linear discriminants, and the k-nearest neighbor method. The second category is the decision tree method.

2.3.1 Statistical Classification Methods

The classification problem has been widely studied in pattern recognition and statistics, and many methods were developed in these two areas.

Bayesian Classifiers

Bayesian classifiers are the application of Bayesian analysis to classification problems [Russell 1995], [Berthold 1999], [Madigan 2003]. Assume a record is denoted as t, and all records are assigned to q known class labels, so we have a class label set C = {C_j} (j = 1, ..., q). The basic principle of a Bayesian classifier is that it classifies the record as the class C_j with the greatest posterior probability for this record. The posterior probability of a record t toward class label C_j satisfies

    P(C_j | t) >= P(C_k | t)  for all k  (j = 1, ..., q; k = 1, ..., q; k != j)    (2.1)

From Bayes' rule, we know that the posterior probability of a record t toward C_j is given as

    P(C_j | t) = P(t | C_j) P(C_j) / P(t)    (2.2)

where P(t | C_j) is the conditional probability of a given record for a specific class C_j. Simple mathematical manipulation of Bayes' rule shows that an alternative formulation of classifying a record t is to choose the class C_j satisfying

    P(t | C_j) P(C_j) >= P(t | C_k) P(C_k)  for all k != j    (2.3)

While we can use the proportion of each class in the given data to represent the prior probability of class C_j, it is fairly difficult to estimate the true population values of the conditional probability of a given record t for a specific class C_j. Many statistical classification methods can be viewed as approximations to the Bayes rule, with varying assumptions made to estimate the conditional probabilities. The assumption may be a characteristic distribution of the population, or a specific format for the decision solution itself [Weiss 1991].
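One of the simplest of the "varying assumptions" mentioned above is conditional independence of the attributes given the class, which yields the naive Bayes classifier. The following sketch is illustrative only (categorical attributes, no smoothing, made-up toy data), not a method from this chapter's references.

```python
from collections import Counter, defaultdict

# Illustrative naive Bayes classifier: P(t|Cj) is approximated as the
# product of per-attribute conditional probabilities, and Eq. (2.3) is
# applied by comparing P(t|Cj) P(Cj) across classes.

def train(records, labels):
    class_counts = Counter(labels)
    cond = defaultdict(Counter)      # (class, attr_index) -> value counts
    for rec, c in zip(records, labels):
        for i, v in enumerate(rec):
            cond[(c, i)][v] += 1
    return class_counts, cond

def classify(record, class_counts, cond):
    n = sum(class_counts.values())
    best, best_p = None, -1.0
    for c, cnt in class_counts.items():
        p = cnt / n                            # prior P(Cj) from proportions
        for i, v in enumerate(record):         # P(t|Cj) ~ prod_i P(v_i|Cj)
            p *= cond[(c, i)][v] / cnt
        if p > best_p:
            best, best_p = c, p
    return best

records = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "hot")]
labels = ["no", "no", "yes", "no"]
cc, cond = train(records, labels)
print(classify(("rain", "mild"), cc, cond))  # "yes"
```

Note that P(t) from Eq. (2.2) is omitted: it is the same for every class, so it does not change which class maximizes the posterior.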

Linear Discriminant

The linear discriminant is one of the most common forms of classifier, and it has a quite simple structure [Hastie 2001]. Assume the dataset has m attributes; the linear discriminant uses a linear combination of the attributes to separate, or discriminate, among the classes and assign the class label of a new record. For a dataset with m attributes, this means geometrically that the separating surface between the records will be an (m-1)-dimensional hyperplane. Figure 2.2 represents an idealized plane that separates two classes C_1 and C_2 in 3-dimensional space.

[Figure 2.2: Two-class separation by linear discrimination; a plane in the space of attributes AT_1, AT_2, AT_3 separates classes C_1 and C_2.]

The major advantage of the linear discriminant is its simple classification structure. The general form of any linear discriminant is given as follows:

    w_1 AT_1 + w_2 AT_2 + ... + w_m AT_m >= w_0    (2.4)

where AT_i (i = 1, ..., m) are the m attributes of the dataset, and w_i (i = 0, 1, ..., m) are the constant parameters to be estimated. The linear discriminant tends to perform well in practice, though different classes cannot always be separated by a simple linear combination of attributes.
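Applying a fitted discriminant is a one-line computation. The sketch below uses made-up weights rather than parameters estimated from data; it simply assigns a record to class C1 when the weighted sum of Eq. (2.4) reaches the threshold w_0.

```python
# Illustrative use of the linear discriminant form (Eq. 2.4). The weights
# here are arbitrary constants for demonstration, not estimated values.

def linear_discriminant(record, weights, w0):
    score = sum(w * at for w, at in zip(weights, record))
    return "C1" if score >= w0 else "C2"

print(linear_discriminant([2.0, 1.0, 0.5], weights=[1.0, -1.0, 2.0], w0=1.5))  # "C1"
```

Estimating the weights is the hard part; as the text notes, this is usually done under a distributional assumption such as normality.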

Moreover, more than one plane or line can be used to separate two classes. The major issue in applying linear discriminants is deciding the constant parameters of the linear discriminant. The most common approach is to decide those parameters under a certain assumption about the data distribution, such as a normal (Gaussian) distribution.

k-Nearest Neighbor Method

The k-nearest neighbor method first finds the k nearest neighbors of a new record, and then assigns the record to the class label that appears most frequently among the k neighbors [Mitchell 1997]. k is generally an odd number so that a tie situation will not happen. This method needs to calculate the distance between a new record and every existing record. Distance metrics such as absolute distance, Euclidean distance, and various normalized distances are used in calculating the distance between a new record and an old record. Generally, the distance is computed attribute by attribute and then combined. For absolute distance, the differences between the values of each attribute are added together. For Euclidean distance, the difference between the values of each attribute is squared and summed over all attributes; the square root of the sum is the Euclidean distance. In some cases different attributes may be scaled differently, such as in different units or with different conventions; it is then more appropriate to normalize the distance metric. For example, we can measure the distance in terms of standard deviations from the mean of each attribute. The major computational effort of the k-nearest neighbor method lies in the classification stage: the new record must be compared with every existing record in the dataset. This increases the computational effort, especially for huge datasets. On the other hand, the k-nearest neighbor method does not need an underlying assumption on the data distribution; it is therefore a non-parametric method. These two characteristics differentiate this method from parametric methods such as the linear discriminant method.
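A minimal k-nearest neighbor sketch using Euclidean distance, as described above, might look as follows (toy data, no normalization; requires Python 3.8+ for math.dist):

```python
import math
from collections import Counter

# Minimal k-nearest neighbor: find the k closest training records by
# Euclidean distance and take a majority vote over their class labels.

def knn_classify(train, labels, record, k=3):
    dists = sorted(
        (math.dist(rec, record), lab) for rec, lab in zip(train, labels)
    )
    votes = Counter(lab for _, lab in dists[:k])
    return votes.most_common(1)[0][0]

train = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8), (0.9, 1.1)]
labels = ["A", "A", "B", "B", "A"]
print(knn_classify(train, labels, (1.1, 1.0), k=3))  # "A"
```

The full sort makes the classification cost obvious: every training record is touched for every new record, which is exactly the computational burden the text describes for large datasets.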

2.3.2 Decision Tree Method

The decision tree method is one of the most widely used classification methods [Mitchell 1997]. A decision tree classifies data based on its top-down tree structure. Starting from the root node, each internal node in the tree specifies a test on a certain attribute of the dataset, and each branch from that node corresponds to one of the possible values of this attribute. A record is classified by first being tested by the root node attribute, then moving down the tree branch corresponding to the value of the attribute in the given record. A well-known decision tree example for the concept "Play Tennis" is given in Figure 2.3. Based on conditions such as weather outlook, humidity, and wind, this example uses a decision tree to classify Saturday mornings according to whether or not they are suitable for playing tennis.

[Figure 2.3. Decision tree example for the concept of playing tennis: the root tests Outlook (Sunny / Overcast / Rain); the Sunny branch tests Humidity (High -> No, Normal -> Yes); Overcast -> Yes; the Rain branch tests Wind (Strong -> No, Weak -> Yes).]

Since we will compare our fuzzy classification approach with the decision tree method in the empirical study section, we briefly explain the decision tree algorithm here. The decision tree method uses a statistical property, information gain, to measure how well a given attribute separates the training dataset according to the class label. Most decision tree algorithms use this information gain to select among the candidate attributes at each step while growing the tree. Information gain is based on the entropy concept commonly used in information theory.

Assume we have a given data set T and the class label set C = {C_j}, (j = 1, ..., q). The entropy of T is defined as:

    Entropy(T) = - sum_{j=1}^{q} p_j log2 p_j

where p_j is the proportion of T belonging to class C_j. The information gain of each candidate attribute is the expected reduction in entropy resulting from partitioning the records according to this attribute. Starting from the root node with all the records, the decision tree algorithm selects the attribute with the largest information gain. The process of selecting a new attribute and partitioning the records is repeated for each node down the tree. The algorithm stops at a certain leaf node only when all the attributes have been included in the path down to that leaf node, or when the records associated with that leaf node all belong to the same class.

Some decision tree methods such as ID3 and C4.5 [Quinlan 1993] perform a simple-to-complex search through the search space. They start from an empty tree, then consider trees with more attributes, guided by the information gain heuristic. This process is illustrated in Figure 2.4.

[Figure 2.4. Simple-to-complex growth of decision tree: an empty tree is successively expanded with attribute tests, one level at a time.]
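The entropy and information-gain computations above can be sketched directly (toy labels and attribute values; illustrative only):

```python
import math
from collections import Counter

# Entropy of a set of class labels, and the information gain of splitting
# the same records by one attribute's values, as defined above.

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, attribute_values):
    n = len(labels)
    remainder = 0.0
    for value, count in Counter(attribute_values).items():
        subset = [lab for lab, v in zip(labels, attribute_values) if v == value]
        remainder += (count / n) * entropy(subset)   # expected child entropy
    return entropy(labels) - remainder

labels = ["yes", "yes", "no", "no"]
outlook = ["sunny", "sunny", "rain", "rain"]
print(entropy(labels))                    # 1.0 (maximally mixed)
print(information_gain(labels, outlook))  # 1.0 (perfect separation)
```

A tree-growing algorithm would call information_gain once per candidate attribute at each node and split on the winner, repeating until the stopping conditions described above are met.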

The search space of the decision tree method is the set of all possible decision trees. At each node, only the single best attribute is chosen to partition the records, though other attributes may also be consistent with the records. Once the algorithm selects an attribute to test at a particular level of the tree, it does not backtrack to reconsider alternative attributes at a higher level. Therefore, it loses the capability to examine all possible decision trees in the search space that are consistent with the records.

2.4 Classification Accuracy

The purpose of classification is to build a classifier from given data that predicts correctly on future data. The most commonly used performance measurement is the classification accuracy. For a given finite number of records, the empirical accuracy is defined as the ratio of correct classifications to the number of given records:

    Empirical accuracy = (number of correct classifications) / (number of given records)

If the number of given records approaches infinity, then the empirical accuracy statistically becomes the true accuracy of the classifier under the actual population distribution. However, in a real situation there is only limited given data. Consequently, techniques to estimate the true accuracy from the empirical accuracy become very important; they are at least as important as studies of the classification method itself. Figure 2.5 illustrates the relationship between empirical accuracy and true accuracy.

[Figure 2.5. Empirical vs. true accuracy: the empirical accuracy of a classification model is measured on the given data, while the true accuracy concerns its performance on new data.]

In order to honestly estimate the true classification accuracy, the given data should be a random sample. This means that the given data should not be pre-selected or sifted by any specific human-involved criteria. Without a random sample, the empirical accuracy based on the given data will not be a good estimate of the true accuracy. Assuming the given data is random, we can obtain the empirical accuracy. Since the given data is always limited compared to the whole population, the natural question arises: how much data do we need in order to be confident that the empirical accuracy is a good estimate of the true accuracy? There has been theoretical study on this specific problem, called probably approximately correct (PAC) analysis [Valiant 1985], [Kearns 1994]. This study gives theoretical bounds on applying the empirical accuracy to future data; however, it indicates that a huge amount of data is needed to guarantee performance. Most of the time, a large population of data is inaccessible, and the PAC approach can be modified into practical methods to effectively estimate the true accuracy. In this section, we briefly review some common accuracy evaluation techniques: the holdout method, k-fold cross-validation, and the leaving-one-out method. In the empirical study section of this dissertation, we will mainly use the k-fold cross-validation and leaving-one-out techniques to estimate the true accuracy of the fuzzy classifier.

The Holdout Method

In the holdout method, the given data are divided into a training data set and a testing data set; thus, this method is also known as the train-and-test method. For instance, two thirds of the data are used for training to derive the classification model, while the rest of the data are used for testing the classifier's accuracy. If the holdout method is repeated k times using different partition schemes, it is also referred to as random subsampling, and the accuracy estimate is taken as the average of the accuracies from each iteration.
If the size of the given data is small, it is crucial to randomly divide the data into a training and a testing set, and this is often implemented by computer. The accuracy estimate produced by the holdout method is somewhat pessimistic, because not all the data are used in designing the classifier. Besides, a single train-and-test partition may bias the estimate.
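The random train-and-test division described above can be sketched as follows (the two-thirds split and the seed are illustrative choices, not prescribed by the dissertation):

```python
import random

def holdout_split(records, train_fraction=2/3, seed=0):
    """Randomly divide the given data into a training and a testing set."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

train, test = holdout_split(list(range(30)))
# two thirds for training, the rest for testing
print(len(train), len(test))   # 20 10
```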

k-fold Cross-Validation

k-fold cross-validation belongs to the family of resampling methods. For k-fold cross-validation, the cases are randomly divided into k mutually exclusive test partitions of approximately equal size. When one test partition is picked, the remaining partitions are used for training the classifier. The average accuracy rate over all k partitions is the cross-validation accuracy rate. This method has been extensively tested with varying numbers of partitions, and 10-fold cross-validation appeared to be the most appropriate [Weiss 1990]. k-fold cross-validation can be extended to stratified cross-validation, in which each partition is stratified so that the class distribution of the data in each partition is approximately the same as that in the original data.

Leaving-one-out Method

The leaving-one-out method can be regarded as a special case of k-fold cross-validation, where k is set to the size of the dataset. In each iteration, this method uses one record as the testing data, and all the rest are used for training the classifier. The average accuracy rate over all records is used as the accuracy estimate. The major advantage of the leaving-one-out technique is that it introduces less bias than other estimation techniques. However, the computational cost of applying leaving-one-out is high, especially for a large dataset.
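Both estimators above can be sketched in a few lines; this is an illustrative partitioning scheme, not the dissertation's implementation, and `train_and_test` is a hypothetical callback supplied by the user:

```python
import random

def k_fold_partitions(records, k, seed=0):
    """Randomly divide the records into k mutually exclusive test
    partitions of approximately equal size."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def cross_validation_accuracy(records, k, train_and_test):
    """Average accuracy over all k partitions; train_and_test(train, test)
    returns the accuracy of a classifier trained on train, tested on test."""
    folds = k_fold_partitions(records, k)
    accuracies = []
    for i, test_fold in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        accuracies.append(train_and_test(train, test_fold))
    return sum(accuracies) / k

# With k equal to the dataset size, this reduces to leaving-one-out.
folds = k_fold_partitions(list(range(10)), 5)
print([len(f) for f in folds])   # [2, 2, 2, 2, 2]
```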

Chapter 3

Association Rule Mining

Association rule mining is an active area in data mining. This chapter provides a brief introduction to the association rule mining problem. We give the definition of association rule mining and its classification scheme in Section 3.1. One of the most important association rule mining algorithms, the Apriori, is explained in Section 3.2. Section 3.3 presents the quantitative association rule mining problem and its limitations.

3.1 Definition of Association Rule Mining

Association rule mining is a major research area of data mining. Its objective is to find certain association relationships among a set of data items in a database. The association relationships are described in association rules. Association rule mining was originally motivated by market basket analysis, which studies the buying habits of customers [Agrawal 1993a]. It provides the answer to the following question: which groups of items are usually associated, so that customers are likely to purchase them together when they shop at the supermarket? The analysis can be performed on the retail data or customer transactions at the store. The result can be used for store layout, cross-marketing advertisement and direct mailing applications. Assume customers who buy a computer also tend to buy financial management software. In one layout strategy, those two items can be placed in close proximity in order to further promote the sale of such items together. In an alternative strategy, placing hardware and software at opposite ends of the store may entice customers who purchase such items to pick up other items along the way [Han 2001]. Association rule mining has now been extended to other areas such as geographical databases, network security, medical diagnosis, etc.

The formal definition of an association rule is given in [Agrawal 1994]: Let I = {i1, i2, ..., im} be a set of literals, called items. Let D be the set of all transactions, where each transaction t is a set of items such that t is a subset of I. Each transaction is associated with a unique identifier, called its TID. Let A be a set of items in I. A transaction t is said to contain A if and only if A is a subset of t. An association rule is of the form A => B, where A and B are subsets of I and A and B are disjoint. Within the rule A => B, A is called the antecedent while B is called the consequent of the rule.

Support and confidence are the two key measures of rule interestingness in association rule mining. A set of items X in I is referred to as an itemset. X is called a k-itemset if it contains k items. Let t_X denote the set of transactions that contain the itemset X, and |D| the total number of transactions.

Support (Sup) is the percentage of transactions in D that contain both A and B:

    Sup(A => B) = |t_A intersect t_B| / |D|    (3.1)

where |t_A intersect t_B| is the number of transactions that contain both A and B.

Confidence (Conf) is the percentage of transactions in D containing A that also contain B:

    Conf(A => B) = Sup(A => B) / Sup(A) = |t_A intersect t_B| / |t_A|    (3.2)

From these definitions, we can infer that the values of support and confidence range from 0 to 1. Given the example in Table 3.1, let us examine the support and confidence of a potential association rule: Bread => Butter.

    Sup(Bread => Butter) = |t_Bread intersect t_Butter| / |D| = 2/5 = 40%
    Conf(Bread => Butter) = Sup(Bread => Butter) / Sup(Bread) = 2/4 = 50%

Therefore, the rule Bread => Butter has a support of 40% and a confidence of 50%.
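The support and confidence computations can be checked directly in code; a minimal sketch over the five transactions of the Bread => Butter example:

```python
transactions = {
    1: {"Bread", "Salsa"},
    2: {"Beer", "Bread", "Eggs", "Jam"},
    3: {"Bread", "Butter"},
    4: {"Bread", "Butter", "Lettuce", "Salsa"},
    5: {"Beer", "Butter"},
}

def support(itemset):
    """Fraction of transactions containing every item in `itemset` (Eq. 3.1)."""
    return sum(itemset <= t for t in transactions.values()) / len(transactions)

def confidence(antecedent, consequent):
    """Sup(A => B) / Sup(A) (Eq. 3.2)."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"Bread", "Butter"}))        # 0.4
print(confidence({"Bread"}, {"Butter"}))   # 0.5
```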

Table 3.1. An example of a transaction database.

    TID   Items
    1     Bread, Salsa
    2     Beer, Bread, Eggs, Jam
    3     Bread, Butter
    4     Bread, Butter, Lettuce, Salsa
    5     Beer, Butter

Given user-specified minimum support and minimum confidence thresholds, association rule mining finds the rules whose support and confidence are larger than the respective thresholds. Generally, association rule mining can be described as a two-step process: First, find all itemsets whose support is above the predetermined minimum support. These itemsets are called the frequent itemsets or the large itemsets. Second, generate interesting association rules from the frequent itemsets.

Market basket analysis is just one form of association rule mining. There are other kinds of association rules besides this. Based on the types of values handled in the rule, a rule can be classified as either a Boolean association rule or a quantitative association rule. Boolean association rules concern only the associations between the presence or absence of items. Market basket analysis belongs to Boolean association rule mining. If either the antecedent or the consequent of the rule contains quantitative attributes, then it is a quantitative association rule. For instance, age(X, 30-39) AND income(X, 40K-50K) => number_of_cars(X, 3) is a quantitative association rule. If the items or attributes in an association rule involve only one dimension, it is a single-dimensional rule, while a multi-dimensional association rule involves two or more dimensions. The above quantitative association rule is a multi-dimensional association rule. Based on the levels of abstraction involved in the rule set, a rule can be classified as a single-level or a multi-level association rule. If the rule can find associations at different levels of abstraction, then it is a multi-level association rule. For instance, we may have Apple and Banana as items in the transactions, and Fruit as a category name for those items at a higher abstraction level. Hence, the rule Fruit => Juice is a multi-level association rule.

3.2 Apriori Algorithm

As mentioned previously, the process of association rule mining is divided into two phases. In the first phase, candidate itemsets are generated and counted by scanning the transaction data. If the number of times an itemset appears in the transactions is larger than the minimum support, the itemset is considered a frequent itemset. Itemsets containing only one single item are processed first. Large itemsets containing only single items are then combined to form candidate itemsets containing two items. This process is repeated until all large itemsets have been found. In the second phase, all possible association combinations for each large itemset are formed, and those with a calculated confidence value larger than the minimum confidence are output as association rules. In this section, we introduce the Apriori algorithm, the important algorithm used in binary association rule mining to find the frequent itemsets.

3.2.1 Downward Closure Property

The Apriori algorithm is based on a very important property: the downward closure property [Brin 1997]. The downward closure property states that if an itemset is frequent (its support value is above the minimum support), then all subsets of the itemset must be frequent as well. Assume an itemset X and an operation on an itemset OP(X). If X_1, ..., X_m are m subsets of the itemset X, the downward closure property holds as follows:

    OP(X) <= min(OP(X_1), ..., OP(X_m))    (3.3)

Notice the property holds for any conjunctive operation like minimum or multiplication.
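The downward closure property for support can be verified directly on a toy transaction database (the database and the helper names are ours, for illustration):

```python
from itertools import combinations

transactions = [{"A", "B"}, {"A", "B", "C"}, {"A", "B", "D", "E"}, {"B", "C", "D"}]

def support_count(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(itemset <= t for t in transactions)

# Every subset of an itemset is at least as frequent as the itemset itself,
# so support is a conjunctive measure in the sense of Eq. 3.3.
itemset = frozenset({"A", "B", "C"})
for r in range(1, len(itemset)):
    for subset in combinations(itemset, r):
        assert support_count(frozenset(subset)) >= support_count(itemset)
```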

3.2.2 Iterative Procedure

Apriori is an iterative, level-wise algorithm. It can be divided into two major steps: join and prune. In the join step, it first generates frequent itemsets with only one item (called frequent 1-itemsets), then frequent 2-itemsets, and so on. In the prune step, the downward closure property is applied to prune the candidate itemsets whose support value is below the minimum support. The following notation will be used hereafter:

    k-itemset    An itemset having k items
    L_k          Set of frequent k-itemsets (with minimum support)
    CAND_k       Set of candidate k-itemsets (CAND_k is a superset of L_k)

Join Step

In this step, k-itemsets are used to form (k+1)-itemsets. For example, after one full scan of the dataset, the algorithm first finds the set of frequent 1-itemsets, L_1. By joining L_1 with itself, L_2 can be found, and so forth.

Denote by l_i the i-th itemset in L_{k-1}, and by l_i[j] the j-th item in l_i. Two itemsets l_1 and l_2 in L_{k-1} are joinable only if their first (k-2) items are in common, that is:

    (l_1[1] = l_2[1]) AND (l_1[2] = l_2[2]) AND ... AND (l_1[k-2] = l_2[k-2]) AND (l_1[k-1] < l_2[k-1])

This ensures that no duplicate itemset in L_k is generated. The resulting candidate itemset will be l_1[1] l_1[2] ... l_1[k-1] l_2[k-1].

Prune Step

By joining L_{k-1} to itself, we get the candidate set CAND_k. Finding the frequency of the itemsets in CAND_k involves heavy computation, because CAND_k can become huge. By using the closure property, if any (k-1)-subset of an itemset in CAND_k is not in L_{k-1}, then that candidate itemset cannot be frequent either and can therefore be removed from CAND_k. This subset pruning can be implemented by maintaining a hash tree of all frequent itemsets. The Apriori algorithm terminates when there are no frequent k-itemsets [Han 2001]. Figure 3.1 shows the pseudo-code for the Apriori algorithm and its related procedures.

    L_1 = {frequent 1-itemsets};
    for (k = 2; L_{k-1} is not empty; k++) do begin
        CAND_k = apriori_gen(L_{k-1}, min_sup);   // generate candidate itemsets CAND_k
        for all transactions t in D do begin      // scan D for counts
            CAND_t = subset(CAND_k, t);           // get the subsets of t that are candidates
            for all candidates c in CAND_t do
                c.count++;
        end
        L_k = {c in CAND_k | c.count >= min_sup}
    end
    return the union of all L_k

    procedure apriori_gen(L_{k-1}, min_sup)
        for each itemset l_1 in L_{k-1} do begin
            for each itemset l_2 in L_{k-1} do begin
                if (l_1[1] = l_2[1]) AND ... AND (l_1[k-2] = l_2[k-2]) AND (l_1[k-1] < l_2[k-1]) then begin
                    c = join(l_1, l_2);           // join step, generate candidates
                    if infrequent_subsets(c, L_{k-1}) then
                        delete c;                 // use closure property to prune
                    else add c to CAND_k;
                end
            end
        end
        return CAND_k

Figure 3.1. Apriori algorithm [Han 2001].
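As a concrete illustration of the join-and-prune procedure in Figure 3.1, a simplified list-based Apriori can be sketched in Python (without the hash-tree optimization; the toy database is ours):

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Return all frequent itemsets with support count >= min_sup."""
    items = sorted({i for t in transactions for i in t})
    def count(c):
        return sum(set(c) <= t for t in transactions)
    L = [c for c in ([i] for i in items) if count(c) >= min_sup]
    frequent = list(L)
    while L:
        # Join step: merge itemsets sharing their first k-2 items.
        cands = [a + [b[-1]] for a in L for b in L
                 if a[:-1] == b[:-1] and a[-1] < b[-1]]
        # Prune step: drop candidates with an infrequent (k-1)-subset.
        prev = {tuple(l) for l in L}
        cands = [c for c in cands
                 if all(s in prev for s in combinations(c, len(c) - 1))]
        L = [c for c in cands if count(c) >= min_sup]
        frequent += L
    return frequent

db = [{"A", "B"}, {"A", "B", "C"}, {"A", "B", "D", "E"}, {"B", "C", "D"}]
print(apriori(db, 2))
# [['A'], ['B'], ['C'], ['D'], ['A', 'B'], ['B', 'C'], ['B', 'D']]
```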

3.2.3 Hash Tree Implementation

The Apriori algorithm is implemented in the form of a hash tree in [Agrawal 1994]. The hash tree contains two types of nodes: leaf nodes and interior nodes. The itemsets are stored in the leaf nodes. In an interior node, each bucket of the hash table points to another node. The root of the hash tree is defined to be at depth 1; an interior node at depth d points to nodes at depth d+1. When a new itemset is inserted into the hash tree, the algorithm traverses from the root level to the leaf level. At depth d, a hash function is applied to the d-th item of the itemset. After the number of itemsets stored in a leaf node reaches the specified limit, the leaf node is converted to an interior node. If the traversal is at a leaf node, the frequency of the itemsets in all the transactions is counted and recorded. If it is at an interior node and item i is hashed, then every item that comes after i will be hashed, and this procedure is recursively applied to the node in the corresponding bucket.

Suppose there is a simple transaction database as shown in Figure 3.2. It has four transactions, and each transaction contains a subset of the items A, B, C, D, E. By scanning the database, we form the 1-item candidate set ({A},{B},{C},{D},{E}) and store it in a 1-item hash tree based on the given hash function. Assume the maximum number of itemsets one leaf node can hold is two; then obviously one leaf node cannot hold all the 1-item candidate sets. Further, assume each internal node has three hash buckets. According to the hash function, if the d-th element of the transaction is A or B, the itemset goes to the left branch; if the d-th element is C or D, the itemset goes to the middle branch; similarly, E goes to the right branch. The value in parentheses is the frequency count of each item. Each itemset stored in the hash tree is compared with the transaction being scanned. If the transaction contains the itemset, then its count value is increased by 1. This repeats for each transaction (record) in the dataset.
In Figure 3.2, we assume the minimum support value is two. By applying the closure property, we can remove item {E} from the 1-item candidate set because its frequency count is one. Thus, we get the 1-item frequent set ({A},{B},{C},{D}). By joining the 1-item frequent set, we can get the 2-item candidate set: ({A,B},{A,C},{A,D},{B,C},{B,D},{C,D}). By examining the 2-itemset hash tree and the frequency count value of each itemset, we know that only {A,B}, {B,C} and {B,D} remain in the 2-item frequent set.

Figure 3.2. Hash tree illustration diagram. The transaction database contains four transactions: TID 1: A,B; TID 2: A,B,C; TID 3: A,B,D,E; TID 4: B,C,D. The minimum support is 2. Hash function: if the d-th element is A or B, go to the left branch; if C or D, the middle branch; if E, the right branch. Each leaf node can hold at most 2 itemsets, and each internal node has 3 hash buckets. The 1-itemset hash tree yields counts A(3), B(4), C(2), D(2), E(1), giving the 1-item frequent set {A},{B},{C},{D}. The 2-itemset hash tree yields counts AB(3), AC(1), AD(1), BC(2), BD(2), CD(1).
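The bucket routing of Figure 3.2 can be mimicked for the 1-itemset level with a plain dictionary (a stand-in for the hash tree, assuming the stated A/B, C/D, E hash function; not a full hash-tree implementation):

```python
transactions = [{"A", "B"}, {"A", "B", "C"}, {"A", "B", "D", "E"}, {"B", "C", "D"}]

def bucket(item):
    """Hash function of Figure 3.2: A,B -> left; C,D -> middle; E -> right."""
    return {"A": "left", "B": "left", "C": "middle",
            "D": "middle", "E": "right"}[item]

# Route each 1-itemset to its bucket and record its frequency count.
tree = {"left": {}, "middle": {}, "right": {}}
for item in "ABCDE":
    tree[bucket(item)][item] = sum(item in t for t in transactions)

print(tree)

# Items with a count below the minimum support of 2 (here only E) are removed.
frequent_1 = sorted(i for b in tree.values() for i, c in b.items() if c >= 2)
print(frequent_1)   # ['A', 'B', 'C', 'D']
```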

3.3 Quantitative Association Rule Mining

There has been much research in Boolean association rule mining. For instance, besides Apriori, the AprioriTID and AprioriHybrid algorithms were also proposed for mining Boolean association rules [Agrawal 1994]. However, real-world transaction data usually consist of quantitative data. How to extract useful rules from quantitative data presents a challenge in this research field. The problem of mining quantitative association rules was introduced by Srikant and Agrawal [Srikant 1996]. They proposed mining quantitative association rules by partitioning the attribute domain and combining the adjacent partitions to transform the problem into a binary association rule mining problem. Several other approaches based on the idea of mining association rules from interval clusters were also proposed [Miller 1997], [Zhang 1997]. This category of methods is often referred to as the discrete interval method in the literature. The discrete interval method divides the attribute domain into discrete intervals and measures their importance based on the frequency of the interval items. For illustration, a database table with two quantitative attributes and one categorical attribute is shown in Table 3.2. It shows the relationship between people's age, marital status and the number of cars they own.

Table 3.2. People table and example of quantitative association rules.

    RecordID   Age    Married   NumCars
    1          ...    No        ...
    2          ...    Yes       ...
    3          ...    No        ...
    4          ...    Yes       ...
    5          ...    Yes       2

    Rules (Sample)                                    Support   Confidence
    <Age: ...> and <Married: Yes> => <NumCars: 2>     40%       100%
    <NumCars: 0..> => <Married: No>                   40%       66.6%
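The discrete interval method amounts to a preprocessing step that maps each quantitative value to an interval item, after which Boolean mining applies; a minimal sketch (the interval width and the attribute values are illustrative assumptions):

```python
def to_interval_item(attribute, value, width):
    """Map a quantitative value to an interval item such as 'Age:30-39'."""
    lo = (value // width) * width
    return f"{attribute}:{lo}-{lo + width - 1}"

# A quantitative record becomes a set of Boolean interval items,
# ready for an algorithm such as Apriori.
record = {"Age": 34, "Income": 45}
items = {to_interval_item(a, v, 10) for a, v in record.items()}
print(sorted(items))   # ['Age:30-39', 'Income:40-49']
```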


SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE Dorna Purcaru Faculty of Automaton, Computers and Electroncs Unersty of Craoa 13 Al. I. Cuza Street, Craoa RO-1100 ROMANIA E-mal: dpurcaru@electroncs.uc.ro

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics Introducton G10 NAG Fortran Lbrary Chapter Introducton G10 Smoothng n Statstcs Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Smoothng Methods... 2 2.2 Smoothng Splnes and Regresson

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z.

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z. TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS Muradalyev AZ Azerbajan Scentfc-Research and Desgn-Prospectng Insttute of Energetc AZ1012, Ave HZardab-94 E-mal:aydn_murad@yahoocom Importance of

More information

Data Representation in Digital Design, a Single Conversion Equation and a Formal Languages Approach

Data Representation in Digital Design, a Single Conversion Equation and a Formal Languages Approach Data Representaton n Dgtal Desgn, a Sngle Converson Equaton and a Formal Languages Approach Hassan Farhat Unversty of Nebraska at Omaha Abstract- In the study of data representaton n dgtal desgn and computer

More information

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like:

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like: Self-Organzng Maps (SOM) Turgay İBRİKÇİ, PhD. Outlne Introducton Structures of SOM SOM Archtecture Neghborhoods SOM Algorthm Examples Summary 1 2 Unsupervsed Hebban Learnng US Hebban Learnng, Cntd 3 A

More information

Empirical Distributions of Parameter Estimates. in Binary Logistic Regression Using Bootstrap

Empirical Distributions of Parameter Estimates. in Binary Logistic Regression Using Bootstrap Int. Journal of Math. Analyss, Vol. 8, 4, no. 5, 7-7 HIKARI Ltd, www.m-hkar.com http://dx.do.org/.988/jma.4.494 Emprcal Dstrbutons of Parameter Estmates n Bnary Logstc Regresson Usng Bootstrap Anwar Ftranto*

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

Fuzzy Weighted Association Rule Mining with Weighted Support and Confidence Framework

Fuzzy Weighted Association Rule Mining with Weighted Support and Confidence Framework Fuzzy Weghted Assocaton Rule Mnng wth Weghted Support and Confdence Framework M. Sulaman Khan, Maybn Muyeba, Frans Coenen 2 Lverpool Hope Unversty, School of Computng, Lverpool, UK 2 The Unversty of Lverpool,

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

y and the total sum of

y and the total sum of Lnear regresson Testng for non-lnearty In analytcal chemstry, lnear regresson s commonly used n the constructon of calbraton functons requred for analytcal technques such as gas chromatography, atomc absorpton

More information

Classification / Regression Support Vector Machines

Classification / Regression Support Vector Machines Classfcaton / Regresson Support Vector Machnes Jeff Howbert Introducton to Machne Learnng Wnter 04 Topcs SVM classfers for lnearly separable classes SVM classfers for non-lnearly separable classes SVM

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS46: Mnng Massve Datasets Jure Leskovec, Stanford Unversty http://cs46.stanford.edu /19/013 Jure Leskovec, Stanford CS46: Mnng Massve Datasets, http://cs46.stanford.edu Perceptron: y = sgn( x Ho to fnd

More information

A User Selection Method in Advertising System

A User Selection Method in Advertising System Int. J. Communcatons, etwork and System Scences, 2010, 3, 54-58 do:10.4236/jcns.2010.31007 Publshed Onlne January 2010 (http://www.scrp.org/journal/jcns/). A User Selecton Method n Advertsng System Shy

More information

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science EECS 730 Introducton to Bonformatcs Sequence Algnment Luke Huan Electrcal Engneerng and Computer Scence http://people.eecs.ku.edu/~huan/ HMM Π s a set of states Transton Probabltes a kl Pr( l 1 k Probablty

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15 CS434a/541a: Pattern Recognton Prof. Olga Veksler Lecture 15 Today New Topc: Unsupervsed Learnng Supervsed vs. unsupervsed learnng Unsupervsed learnng Net Tme: parametrc unsupervsed learnng Today: nonparametrc

More information

Learning-Based Top-N Selection Query Evaluation over Relational Databases

Learning-Based Top-N Selection Query Evaluation over Relational Databases Learnng-Based Top-N Selecton Query Evaluaton over Relatonal Databases Lang Zhu *, Wey Meng ** * School of Mathematcs and Computer Scence, Hebe Unversty, Baodng, Hebe 071002, Chna, zhu@mal.hbu.edu.cn **

More information

USING GRAPHING SKILLS

USING GRAPHING SKILLS Name: BOLOGY: Date: _ Class: USNG GRAPHNG SKLLS NTRODUCTON: Recorded data can be plotted on a graph. A graph s a pctoral representaton of nformaton recorded n a data table. t s used to show a relatonshp

More information

Virtual Machine Migration based on Trust Measurement of Computer Node

Virtual Machine Migration based on Trust Measurement of Computer Node Appled Mechancs and Materals Onlne: 2014-04-04 ISSN: 1662-7482, Vols. 536-537, pp 678-682 do:10.4028/www.scentfc.net/amm.536-537.678 2014 Trans Tech Publcatons, Swtzerland Vrtual Machne Mgraton based on

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

Three supervised learning methods on pen digits character recognition dataset

Three supervised learning methods on pen digits character recognition dataset Three supervsed learnng methods on pen dgts character recognton dataset Chrs Flezach Department of Computer Scence and Engneerng Unversty of Calforna, San Dego San Dego, CA 92093 cflezac@cs.ucsd.edu Satoru

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

Parallel and Distributed Association Rule Mining - Dr. Giuseppe Di Fatta. San Vigilio,

Parallel and Distributed Association Rule Mining - Dr. Giuseppe Di Fatta. San Vigilio, Parallel and Dstrbuted Assocaton Rule Mnng - Dr. Guseppe D Fatta fatta@nf.un-konstanz.de San Vglo, 18-09-2004 1 Overvew Assocaton Rule Mnng (ARM) Apror algorthm Hgh Performance Parallel and Dstrbuted Computng

More information

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET

BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET 1 BOOSTING CLASSIFICATION ACCURACY WITH SAMPLES CHOSEN FROM A VALIDATION SET TZU-CHENG CHUANG School of Electrcal and Computer Engneerng, Purdue Unversty, West Lafayette, Indana 47907 SAUL B. GELFAND School

More information

Parallel matrix-vector multiplication

Parallel matrix-vector multiplication Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more

More information

Petri Net Based Software Dependability Engineering

Petri Net Based Software Dependability Engineering Proc. RELECTRONIC 95, Budapest, pp. 181-186; October 1995 Petr Net Based Software Dependablty Engneerng Monka Hener Brandenburg Unversty of Technology Cottbus Computer Scence Insttute Postbox 101344 D-03013

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT

APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT 3. - 5. 5., Brno, Czech Republc, EU APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT Abstract Josef TOŠENOVSKÝ ) Lenka MONSPORTOVÁ ) Flp TOŠENOVSKÝ

More information

Available online at Available online at Advanced in Control Engineering and Information Science

Available online at   Available online at   Advanced in Control Engineering and Information Science Avalable onlne at wwwscencedrectcom Avalable onlne at wwwscencedrectcom Proceda Proceda Engneerng Engneerng 00 (2011) 15000 000 (2011) 1642 1646 Proceda Engneerng wwwelsevercom/locate/proceda Advanced

More information

Alignment Results of SOBOM for OAEI 2010

Alignment Results of SOBOM for OAEI 2010 Algnment Results of SOBOM for OAEI 2010 Pegang Xu, Yadong Wang, Lang Cheng, Tany Zang School of Computer Scence and Technology Harbn Insttute of Technology, Harbn, Chna pegang.xu@gmal.com, ydwang@ht.edu.cn,

More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

Incremental Learning with Support Vector Machines and Fuzzy Set Theory

Incremental Learning with Support Vector Machines and Fuzzy Set Theory The 25th Workshop on Combnatoral Mathematcs and Computaton Theory Incremental Learnng wth Support Vector Machnes and Fuzzy Set Theory Yu-Mng Chuang 1 and Cha-Hwa Ln 2* 1 Department of Computer Scence and

More information

Associative Based Classification Algorithm For Diabetes Disease Prediction

Associative Based Classification Algorithm For Diabetes Disease Prediction Internatonal Journal of Engneerng Trends and Technology (IJETT) Volume-41 Number-3 - November 016 Assocatve Based Classfcaton Algorthm For Dabetes Dsease Predcton 1 N. Gnana Deepka, Y.surekha, 3 G.Laltha

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

GA-Based Learning Algorithms to Identify Fuzzy Rules for Fuzzy Neural Networks

GA-Based Learning Algorithms to Identify Fuzzy Rules for Fuzzy Neural Networks Seventh Internatonal Conference on Intellgent Systems Desgn and Applcatons GA-Based Learnng Algorthms to Identfy Fuzzy Rules for Fuzzy Neural Networks K Almejall, K Dahal, Member IEEE, and A Hossan, Member

More information

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe CSCI 104 Sortng Algorthms Mark Redekopp Davd Kempe Algorthm Effcency SORTING 2 Sortng If we have an unordered lst, sequental search becomes our only choce If we wll perform a lot of searches t may be benefcal

More information

(1) The control processes are too complex to analyze by conventional quantitative techniques.

(1) The control processes are too complex to analyze by conventional quantitative techniques. Chapter 0 Fuzzy Control and Fuzzy Expert Systems The fuzzy logc controller (FLC) s ntroduced n ths chapter. After ntroducng the archtecture of the FLC, we study ts components step by step and suggest a

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

Journal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article. A selective ensemble classification method on microarray data

Journal of Chemical and Pharmaceutical Research, 2014, 6(6): Research Article. A selective ensemble classification method on microarray data Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(6):2860-2866 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A selectve ensemble classfcaton method on mcroarray

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions

Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions Sortng Revew Introducton to Algorthms Qucksort CSE 680 Prof. Roger Crawfs Inserton Sort T(n) = Θ(n 2 ) In-place Merge Sort T(n) = Θ(n lg(n)) Not n-place Selecton Sort (from homework) T(n) = Θ(n 2 ) In-place

More information