
One-Class Training for Masquerade Detection

Ke Wang    Salvatore J. Stolfo
Computer Science Department, Columbia University
500 West 120th Street, New York, NY 10027
{kewang, ...}

Abstract

We extend prior research on masquerade detection using UNIX commands issued by users as the audit source. Previous studies using multi-class training require gathering data from multiple users to train specific profiles of self and non-self for each user. One-class training uses data representative of only one user. We apply one-class Naïve Bayes, using both the multivariate Bernoulli model and the multinomial model, and the one-class SVM algorithm. The results show that one-class training for this task works as well as multi-class training, with the great practical advantages of collecting much less data and more efficient training. One-class SVM using binary features performs best among the one-class training algorithms.

1. Introduction

The masquerade attack may be one of the most serious security problems. It commonly appears as spoofing, where an intruder impersonates another person and uses that person's identity, for example, by stealing their passwords or forging their email address. Masqueraders can be insiders or outsiders. As an outsider, the masquerader may try to gain superuser access from a remote location and can cause considerable damage or theft. A simpler insider attack can be executed against an unattended machine within a trusted domain. From the system's point of view, all of the operations executed by an insider masquerader may be technically legal and hence not detected by existing access control or authentication schemes. To catch such a masquerader, the only useful evidence is the operations he executes, i.e., his behavior. Thus, we can compare a user's recent behavior against his profile of typical behavior and recognize a security breach if the recent behavior departs sufficiently from the profiled behavior, indicating a possible masquerader.
The insider problem in computer security is shifting the attention of the research and commercial community from intrusion detection at the perimeter of network systems. Research and development is ongoing in the area of modeling user behaviors in order to detect anomalous misbehaviors of importance to security; for example, the behavior of user-issued OS commands as represented in this paper, and in email communications [17]. Considerable work is ongoing in certain communities to detect not only impersonation, but also to identify authors. For example, Sedelow [16] and De Vel [18] are two examples bracketing the length of time this topic has existed in the literature.

The masquerade problem is a challenging one. If the masquerader can mimic the user's behavior successfully, he won't be detected. In addition, if the user himself behaves much differently than his trained profile, the detector will misclassify him as a masquerader, which may cause annoying false alarms. There have been several attempts to solve this problem using command line sequences [14, 9]. The best results so far reported are 60-70% accuracy, with a false positive rate as low as 1-2%. The profiles were computed using supervised machine learning algorithms that classify training data acquired from multiple users. These approaches treated training user profiles as a multi-class supervised learning task, where the data gathered on a user is treated as an example of one class, i.e., a distinct user.

In this paper, we consider a different approach with substantial practical advantages. We examine the task of profiling a user by modeling his data exclusively, without using examples from other users, while achieving good detection performance and minimal false positive rates. We also consider alternative machine learning algorithms that may be employed for this one-class training approach. One-class training means that we use only the user's own legitimate examples of commands he issues to build the user's self profile.
Previous work uses both positive and negative examples to build both self and non-self profiles, except for Maxion [9], who considers the problem of determining how vulnerable a user's behavior may be to a mimicry attack. Here we extend this technique using one-class SVM. This is important in many contexts, especially when the only information available is the history of the user's activities. If a one-class training algorithm can achieve performance similar to that exhibited by a multi-class approach, we may provide a significant benefit in real security applications: much less data is required, and training can proceed independently of any other user. The study reported in this paper indicates that one-class training algorithms do indeed perform as well as two-class training approaches.

This self-profile idea is similar to the anomaly detection techniques widely used in intrusion detection systems [e.g., 2, 3]. For example, the anomaly detector of IDES [8] uses established normal usage profiles, which represent expected behavior, to identify any large usage deviation as a possible attack. Several methods have been used to model the normal data, for example, decision trees [7], neural networks [4], sparse Markov transducers [2], and Markov chains [19]. In this paper, we applied one-class Naïve Bayes and one-class SVM algorithms to the masquerade dataset of UNIX command sequences.

In previous work, we believe there were several methodological flaws in the manner in which data was acquired and used. The Schonlau dataset from [14] presents each user's command line data with a varying number of artificially created masquerade command blocks, ranging from 0 to 24, out of a total of 100 command blocks to be classified. The previous work considered only the average performance of a given method when applied to all of the 50*100 blocks of commands issued by the 50 users. However, since the masquerade blocks are randomly inserted into each user's data using some other user's command blocks, each user's data has a different number of masquerade blocks, and the contents of these masquerade blocks all differ. This data is not a good baseline for comparing the effectiveness of alternative detection methods, because one method might be better at detecting certain forms of masquerade attack while others are not. Unfortunately, given the way such masquerade blocks are distributed in the dataset, some algorithms appear to perform better than others while, in practice or in other contexts, this finding may not hold. To better compare the alternative methods proposed in this work, we follow the exhaustive 1v49 evaluation methodology from [9], which will be described in detail in the section on experimental methodology and results.
The ROC score [5] is used to compare several one-class training methods under different false positive rate restrictions. This approach provides a better view of which algorithm is better for most users. Even though one machine learning algorithm may perform better than another in terms of detection and false positive rate, our most important finding is that one-class training can indeed perform as well as multi-class training. Even so, the overall performance of both approaches indicates that there is much room for improvement.

The rest of the paper is organized as follows. Section 2 describes the dataset we used in our research and provides a review of recent work, and Section 3 discusses the one-class Naïve Bayes and one-class SVM methods. Section 4 describes the experimental setting and our results, and Section 5 concludes the paper with our analysis and findings.

2. Recent Work

Schonlau et al. [14] and Maxion [9] provide a dataset collected from keyboard commands on a UNIX platform. The dataset is publicly available online. Unfortunately, there are neither timestamps nor other detailed command line arguments available in the Schonlau dataset. Hence, the dataset in question consists simply of sequential truncated commands issued by each of 70 users. Each user is represented by 15,000 commands, collected over periods ranging from a few days to a few months. The authors randomly chose 50 users to serve as intrusion targets, and the remaining 20 users as masqueraders. The first 5,000 commands of each user are clean data (legitimately issued by the user), and the next 10,000 commands of the target users were randomly injected with commands issued by the 20 masquerade users, none of whom are among the 50 target users. The commands are grouped into blocks of 100 commands each. The commands in one block are either all clean or all masquerade attacks; the latter are called dirty blocks. The task studied in [14] is to accurately classify the dirty blocks as masquerade commands, while classifying the clean blocks as belonging to the user.
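Concretely, the block structure and the two feature representations used throughout the paper can be sketched as follows. This is a toy illustration: the command names, vocabulary, and synthetic stream below are invented for demonstration, not drawn from the Schonlau data.

```python
from collections import Counter

BLOCK_SIZE = 100  # the Schonlau data is scored in blocks of 100 commands

def to_blocks(commands, block_size=BLOCK_SIZE):
    """Split a user's command stream into fixed-size, non-overlapping blocks."""
    return [commands[i:i + block_size]
            for i in range(0, len(commands) - block_size + 1, block_size)]

def count_vector(block, vocab):
    """Multinomial (bag-of-words) features: per-command occurrence counts."""
    counts = Counter(block)
    return [counts.get(c, 0) for c in vocab]

def binary_vector(block, vocab):
    """Multivariate Bernoulli features: whether each command occurs at all."""
    present = set(block)
    return [1 if c in present else 0 for c in vocab]

vocab = ["ls", "cd", "cat", "vi"]          # toy vocabulary of distinct commands
stream = ["ls", "cd", "ls", "cat"] * 50    # 200 synthetic commands
blocks = to_blocks(stream)
assert len(blocks) == 2 and len(blocks[0]) == BLOCK_SIZE
```

Each test block then becomes one fixed-length vector, which is what makes the text-classification analogy discussed next so direct.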
From this point of view, the problem is very similar to a text classification problem, where the commands may be regarded as words or tokens in a document that is to be classified. Their work identifies six masquerade-detection schemes: Bayes 1-Step Markov, Hybrid Multi-Step Markov, Incremental Probabilistic Action Modeling (IPAM), Uniqueness, Sequence-Match, and Compression. These masquerade detection methods were applied directly to this dataset; the first 5,000 commands were used for training, and the remaining 10,000 commands, with masquerade blocks inserted, were used for testing. Testing and training were performed on a per-block basis, with blocks of 100 commands. The paper by Maxion [9] presents a clear and concise review of these six methods. The strategies behind each of these methods are quite different, but the results show that no method completely dominates another under cluster analysis and ROC curves.

Maxion [9] improved upon Schonlau's results by applying the Naïve Bayes classification algorithm using "bag of words" features. Naïve Bayes has long been used in text classification and proved to be very efficient in this context as well. Naïve Bayes was also used in earlier work [15] on classifying malicious code attachments in email messages. Maxion presents a detailed analysis of the origins of the classification errors, revealing why some users are good masqueraders and others are not. That paper designed another experiment, called 1v49, to perform this error analysis. We also use the 1v49 experimental setting in our work, but here it is

used to compare the performance of different classifiers when applied to multiple classes. The results for these reviewed methods are displayed in Table 1 and serve as a baseline for comparison.

    Method                  Hits     False Positives
    N. Bayes (updating)     61.5%    1.5%
    N. Bayes (no upd.)      66.2%    4.6%
    Uniqueness              39.4%    1.4%
    Hybrid Markov           49.3%    3.2%
    1-step Markov           69.3%    6.7%
    IPAM                    41.4%    2.7%
    Sequence Matching       36.8%    3.7%
    Compression             34.2%    5.0%

    Table 1. Results of the previously reviewed methods.

3. Machine learning methods

3.1. Learning task

For this masquerade detection problem, the learning task is to build a classifier that can accurately detect masquerade commands while not misclassifying the user's legitimate commands as a masquerade. Using the Schonlau dataset, which is organized as a set of blocks of 100 commands, the learning task is to compute a binary classifier whose input is a block of 100 commands and whose output is a classification of that block as either generated by a masquerader or not. The target classification is to detect the masquerader's command blocks. Hence, the masqueraders' data are positive examples, while the user's legitimate data are treated as negative examples. Thus, a true positive is a correctly detected masquerade block of 100 commands, while a false positive is a block of commands legitimately issued by the user but misclassified as a masquerade. In the following description, we call the masquerade blocks positive examples, and the legitimate blocks, those issued by the user himself, negative examples. One-class training means that a classifier is computed using only the negative examples of the user himself as training data; the resulting classifier is then used to classify both positive and negative data. Thus, the task is to positively identify masqueraders, not to positively identify a particular user.

3.2. One-class or two-class

Previous work considered the problem as a multi-class supervised training exercise. The dataset contains data for 50 users. For each user, a specific class, the first 5,000 commands are treated as negative examples, while the data from the other 49 users are treated as positive examples.
It is reasonable to assume the negative examples, which belong to the same user, were generated consistently, while the positive examples used in training belong to other users. For the masquerade problem, it is probably impossible and unreasonable to estimate how an attacker would behave. Thus, treating sets of other users' data as positive examples introduces a substantial bias (toward the behavior of those users, who probably were not behaving maliciously). We next present the means of implementing one-class training for the Naïve Bayes classifier and for SVM, using only data from a single user when training a classifier to profile that user.

3.3. Naïve Bayes Classifier

The Naïve Bayes classifier [12] is a simple and efficient supervised learning algorithm which has proven very effective in text classification and many other applications. It is based on Bayes' rule,

    p(u|d) = p(u) p(d|u) / p(d)

which calculates the probability of a class given an example. Applied to the masquerade problem, it calculates the likelihood that a command block belongs to a masquerader (non-self) or to a legitimate user. The different commands c_i, which are used as features here, are assumed independent of each other; this is the "naïve" part of the method. There are two common models used in the Naïve Bayes classifier: the multivariate Bernoulli model and the multinomial model [11]. In the multivariate Bernoulli event model, a vector of binary attributes is used to represent a document (in our case, a block of 100 commands), indicating whether each command does or does not occur in the document. The multinomial model uses the number of command occurrences to represent a document, the so-called bag-of-words approach, capturing the word frequency information in documents. According to McCallum and Nigam's results [11], the multivariate Bernoulli model performs better at small vocabulary sizes, while the multinomial model usually performs better at larger vocabulary sizes.
Because the vocabulary size (the number of distinct commands) of this masquerade problem is 856, which is moderate in size, we compare both of these models on this problem.

Multivariate Bernoulli model. Using the multivariate Bernoulli model, a command block d is represented as a binary vector d = (b_1(d), b_2(d), ..., b_m(d)), with b_i(d) set to 1 if the command c_i occurs at least once in the block. Here m is the total number of features, i.e., the number of distinct commands. Given p(c_i|u), the probability estimated for command c_i and user u from the training data, we can compute p(d|u) for a test block d as:

    p(d|u) = Π_{i=1..m} [ b_i(d) p(c_i|u) + (1 − b_i(d)) (1 − p(c_i|u)) ]        (1)

where p(c_i|u) is estimated with a Laplacean prior:

    p(c_i|u) = (1 + N(c_i, u)) / (2 + N(u))        (2)

N(u) is the number of training examples for user u, while N(c_i, u) is the number of documents containing the command c_i for user u.

Multinomial model. Using the standard bag-of-words approach, each command block is represented by a feature vector d = (n_1(d), n_2(d), ..., n_m(d)), where n_i(d) is the number of times command c_i appears in the command block d. Similarly, given p(c_i|u), the frequency estimate computed for command c_i and user u from the training data, we can compute p(d|u) for a test block d as:

    p(d|u) = Π_{i=1..m} p(c_i|u)^{n_i(d)}        (3)

where p(c_i|u) is derived from:

    p(c_i|u) = ( Σ_{j=1..N(u)} n_i(d_j) + α ) / ( Σ_{j=1..N(u)} Σ_{i=1..m} n_i(d_j) + α·m )        (4)

Here α is used for smoothing, which controls the sensitivity to previously unseen commands. (This implies there is a non-zero probability that any command may be issued by any user.) We set it to 0.01, following [9].

One-class Naïve Bayes. Adapting the above algorithm to one-class Naïve Bayes, which trains only on the user's own examples, is very simple. We compute p(c_i|u) only for user u's self profile. For the non-self profile, we can assume each command has equal probability 1/m, which is essentially random. Thus, given a test block d, we can compare p(d|self) with p(d|non-self). The larger the ratio of p(d|self) to p(d|non-self), the more likely it is that command block d is from user u. Applying the one-class Naïve Bayes algorithm to our specific dataset is also quite simple.
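The two one-class estimators above can be sketched in a few lines. This is an illustrative implementation of equations (1)-(4) over toy data (the commands and blocks are invented); scores are computed in log space to avoid underflow on 100-command products.

```python
import math
from collections import Counter

def train_multinomial(blocks, vocab, alpha=0.01):
    """Estimate p(c|u) per equation (4), with smoothing alpha = 0.01."""
    counts, total = Counter(), 0
    for block in blocks:
        counts.update(block)
        total += len(block)
    return {c: (counts.get(c, 0) + alpha) / (total + alpha * len(vocab))
            for c in vocab}

def log_score_multinomial(block, p_c):
    """log p(d|u) under equation (3); higher means more self-like."""
    return sum(math.log(p_c[c]) for c in block if c in p_c)

def train_bernoulli(blocks, vocab):
    """Estimate p(c|u) per equation (2): Laplacean prior over document counts."""
    doc_freq = Counter()
    for block in blocks:
        doc_freq.update(set(block))
    return {c: (1 + doc_freq.get(c, 0)) / (2 + len(blocks)) for c in vocab}

def log_score_bernoulli(block, p_c):
    """log p(d|u) under equation (1)."""
    present = set(block)
    return sum(math.log(p_c[c] if c in present else 1.0 - p_c[c]) for c in p_c)

# Toy self profile: five identical 100-command training blocks.
vocab = ["ls", "cd", "cat", "rm"]
self_blocks = [["ls", "cd", "ls", "cat"] * 25] * 5
p = train_multinomial(self_blocks, vocab)
# A familiar block scores far higher than a block of never-seen commands.
assert log_score_multinomial(["ls", "cd"] * 50, p) > log_score_multinomial(["rm"] * 100, p)
```

Since all test blocks have the same length, comparing log p(d|self) to a threshold is equivalent to comparing the self/non-self ratio, which is exactly the thresholding described next.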
Since each test document (a block of commands) has a fixed length of 100 commands, the probability of non-self is the same for all tested blocks. We therefore do not have to compute the probability of non-self; we may simply compare the probability of self to a threshold in order to decide whether a block is a masquerade block or not. Furthermore, we can easily adjust the threshold to control the false positive and detection rates.

3.4. One-class support vector machine

Support Vector Machines (SVMs) have been shown to be highly effective in text classification as well [6], among other important learning tasks. They are maximal-margin classifiers, rather than probabilistic like Naïve Bayes. In the two-class formulation, the basic idea is to map feature vectors to a high-dimensional space and to compute a hyperplane that not only separates the training vectors of the different classes, but also maximizes this separation by making the margin as large as possible. Schölkopf et al. [13] proposed a method to adapt the SVM algorithm to one-class SVM, which uses examples from only one class, instead of multiple classes, for training. The one-class SVM algorithm first maps input data into a high-dimensional feature space via a kernel function and treats the origin as the only example from the other class. It then iteratively finds the maximal-margin hyperplane that best separates the training data from the origin. Given a training set x_1, x_2, ..., x_l ∈ X, and Φ the feature mapping X → F into a high-dimensional space, we can define the kernel function as:

    k(x, y) = (Φ(x) · Φ(y))

Using kernel functions, the feature vectors need not be computed explicitly, greatly improving computational efficiency, since we can directly compute the kernel values and operate on their images. Some common kernels are the linear, polynomial, and radial basis function (RBF) kernels:

    Linear kernel:            k(x, y) = (x · y)
    p-th order polynomial:    k(x, y) = (x · y + 1)^p
    RBF kernel:               k(x, y) = exp(−||x − y||² / 2σ²)

Solving the one-class SVM problem is then equivalent to solving the dual quadratic programming (QP) problem:

    min_α (1/2) Σ_{i,j} α_i α_j k(x_i, x_j)
    subject to 0 ≤ α_i ≤ 1/(νl), Σ_i α_i = 1

where α_i is a Lagrange multiplier, which can be thought of as a weight on example x_i, and ν is a parameter that controls the trade-off between maximizing the number of data points contained by the hyperplane and the distance of the hyperplane from the origin. After solving for α, we can use a decision function to classify data. The decision function is:

    f(x) = sgn( Σ_i α_i k(x_i, x) − ρ )

where the offset ρ can be recovered by ρ = Σ_j α_j k(x_j, x_i).

In our work, we used LIBSVM 2.4 [1], available online, for our experiments. LIBSVM is an integrated tool for support vector classification and regression that implements Schölkopf's algorithm for one-class SVM. We used the default RBF kernel and the default parameter values for one-class SVM. Another issue to consider for one-class SVM is how to represent the features. We used both a word count representation and a binary representation, which are equivalent to the multinomial model and the multivariate Bernoulli model of the Naïve Bayes algorithm, respectively. The vectors are normalized to unit length.

4. Evaluation

We conducted two sets of experiments. In the test data, dirty command blocks were inserted under a probability distribution, drawn from users other than the 50 users whose commands were trained on. For our one-class algorithms, the test data remains the same, but we use only one user's training data (the first 5,000 commands) for training; no positive examples at all are used to compute the classifier. First we compare the Naïve Bayes classifier using the multivariate Bernoulli model and the multinomial model. Then we compare the one-class training algorithms against the multi-class training algorithms. None of the classification algorithms we tested update any parameters during testing, because we want to compare their pure performance. Maxion [9] used an empirically pre-fixed threshold to update the classifier during the experiment, which is not an intrinsic characteristic of the classification algorithm.
The first experiment repeats the experimental methodology of [14]. We show that the performance of one-class training is almost the same as the performance of multi-class training. This is a significant finding on its own. The second experiment aims to compare the performance of the two one-class training algorithms when applied to multiple users. Following [9], we call the first the SEA experiment, from the names of the authors of [14], Schonlau et al. The second experiment is called 1v49, because we train using only one user's data and test on all other 49 users' data.

4.1. SEA Experiment

Recall that in this experiment, the first 5,000 commands of a user serve as negative examples, and the first 5,000 commands of all the other 49 users serve as positive examples. The resultant classifier is tested on the remaining 10,000 commands of the user, which have the masquerade blocks inserted.

[Figure 1: ROC curves for the multivariate Bernoulli and multinomial Naïve Bayes models, under multi-class and one-class training.]

Figure 1 displays plots comparing the multivariate Bernoulli model and the multinomial model of the Naïve Bayes classifier. Under multi-class training, the multinomial model is clearly better than the Bernoulli model. But the difference is not so obvious in one-class training, especially when the false positive rate is low. We

thus compare both models in the following 1v49 experiment.

To compare the performance of the one-class training algorithms against the multi-class training algorithm on the same test data, we plot the ROC curves displayed in Figure 2. For the multi-class training algorithm, we use only the multinomial-model Naïve Bayes algorithm as the baseline for comparison, since it is better than the Bernoulli model and has been shown to be the best among the variety of methods described in [9]. For the one-class SVM, we compare both the binary and word count representations. From Figure 2, we can see that only one-class SVM using the word count representation is slightly worse than the other three methods. One-class SVM using the binary representation and one-class Naïve Bayes achieved almost the same performance as the two-class Naïve Bayes algorithm.

We also compare, in Figure 3, the performance of all the previous algorithms from Table 1 to the one-class SVM algorithm using binary features, which is the best among the one-class training algorithms. One-class SVM-binary is better than most of the previous algorithms, except the two-class multinomial Naïve Bayes algorithm with updating. This experiment confirmed our conjecture that for masquerade detection, one-class training is as effective as two-class training.

[Figures 2 and 3: ROC curves comparing the one-class training algorithms to multi-class Naïve Bayes, and to the previously reported methods.]

4.2. 1v49 Experiment

As we have pointed out, since the dataset randomly inserts masquerade blocks into each user's test commands (the 10,000 commands following the first 5,000), each user has a different number of dirty blocks, and the origins of these dirty blocks also differ. So the result of the SEA experiment may not illustrate the real performance of a classification algorithm. (There are too many unfixed parameters.) To better evaluate the performance of a classification algorithm, we can treat these 50 users as our selected sample of common users. If we can show that algorithm A is better than algorithm B for most of the 50 users, we can infer A is better than B in a general sense.
To meet this requirement, we follow the 1v49 experiment, but for a different purpose. We use one user's first 5,000 commands as negative training data to compute a classifier, without any positive training data. For test data, we use the non-masquerade blocks from the 10,000 additional commands of the same user as negative test data, and the other 49 users' first 5,000 commands as positive test data. This data is also organized in blocks of 100 commands. As we mentioned before, the same algorithm might perform quite differently for different users. Figure 4 illustrates the difference, showing the ROC curves for users 2, 20 and 40 using one-class SVM with the binary feature representation. Such differences occur no matter which algorithm is used; the difference is determined by the characteristics of each user.

[Figure 4: ROC curves for users 2, 20 and 40 using one-class SVM with binary features.]

To compare the different methods across multiple users, we compute the ROC score for each user. In general, a ROC score is the fraction of the area under the ROC curve; the larger the better. A ROC score of 1 means perfect detection without any false positives. Figure 5 shows the ROC scores for users 20 and 40 using the one-class SVM-binary algorithm.

[Figure 5: ROC scores for users 20 and 40 under the one-class SVM-binary algorithm.]

For the masquerade problem, we are most interested in the region of the ROC curve with a low false positive rate; otherwise, the annoyance level of false alarms would render the detector useless in practice. Therefore, we restrict the ROC score to the part of the curve with false positive rate lower than P%, which we call the ROC-P score. For example, if we want to restrict false positives to be lower than 5% of all command blocks, we compute ROC-5. Similar to the general ROC score, the ROC-P score is the fraction of the area under the ROC curve where the false positive rate is lower than P%. Figure 7 displays an example of ROC-10, based on the ROC curves of users 20 and 40. Only part of each ROC curve is drawn, to highlight the plots.

Figure 6 illustrates the performance of several one-class training algorithms as measured by ROC scores. The figure includes results for all 50 users. From Figure 6, we can see that one-class SVM using word-count features is the worst among the four algorithms. In the high ROC score region, with ROC scores higher than 0.8 (which is what we prefer), one-class SVM using binary features performs best of all. There is no big difference between Naïve Bayes using the multinomial model and the multivariate Bernoulli model.

[Figure 6: ROC scores of the one-class training algorithms for all 50 users.]

Since one-class SVM using binary features is generally better than one-class SVM using word count features, as depicted in Figure 6, in the following ROC-P comparison we compare only the one-class SVM with the binary representation against the multinomial-model and Bernoulli-model Naïve Bayes.
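The ROC-P score can be computed as a clipped, trapezoidal area under the ROC curve. The paper does not give a formula, so one detail below is our assumption: dividing the clipped area by P, so that a perfect detector scores 1 at any P. Scores here are self-likelihoods (higher = more self-like); blocks at or below the sweeping threshold are flagged as masquerades.

```python
def roc_points(self_scores, masq_scores):
    """ROC (FPR, TPR) points from per-block self-likelihood scores."""
    pts = [(0.0, 0.0)]
    for t in sorted(set(self_scores) | set(masq_scores)):
        fpr = sum(s <= t for s in self_scores) / len(self_scores)
        tpr = sum(s <= t for s in masq_scores) / len(masq_scores)
        pts.append((fpr, tpr))
    return pts

def roc_p_score(self_scores, masq_scores, p):
    """Area under the ROC curve restricted to FPR <= p, normalized by p
    (our reading of ROC-P: a perfect detector scores 1)."""
    pts = roc_points(self_scores, masq_scores)
    area = 0.0
    for (f0, t0), (f1, t1) in zip(pts, pts[1:]):
        lo, hi = f0, min(f1, p)
        if hi <= lo:
            continue  # segment lies entirely beyond the FPR cutoff
        def tpr_at(f):  # linear interpolation along this ROC segment
            return t0 if f1 == f0 else t0 + (t1 - t0) * (f - f0) / (f1 - f0)
        area += (hi - lo) * (tpr_at(lo) + tpr_at(hi)) / 2.0
    return area / p

# Perfect separation: every masquerade block scores below every self block,
# so ROC-5 (p = 0.05) is 1.
assert abs(roc_p_score([5, 6, 7, 8], [1, 2], 0.05) - 1.0) < 1e-9
```

Restricting the area this way is what lets the comparison below focus on the operating region a deployed detector would actually use.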
Figure 8 plots the comparison for ROC-5 and ROC-1, i.e., with false positives below 5% and 1%, respectively. From these two plots, we can

determine that one-class SVM using the binary feature representation is almost always better than the two one-class Naïve Bayes methods.

[Figure 7: ROC-10 curves for users 20 and 40. Figure 8: ROC-5 and ROC-1 scores for the three one-class algorithms.]

To compare the performance of the different algorithms on an individual user basis, we compare the ROC-P scores user by user. Figure 9 shows a user-by-user comparison of one-class SVM using the binary feature representation and one-class Naïve Bayes using the multinomial model, when the false positive rate is lower than 1%. Again we can see that, for most of the 50 users, one-class SVM with binary features is better than one-class Naïve Bayes using the multinomial model. However, there are still some users whose data exhibit better performance under one-class Naïve Bayes. This suggests that we could choose the best algorithm for each individual user to improve the whole system's performance.

[Figure 9: User-by-user comparison of one-class SVM-binary and one-class multinomial Naïve Bayes at a false positive rate below 1%.]

5. Discussion

From our work we can see that one-class SVM using binary features performs better than one-class Naïve Bayes and one-class SVM using word count features. Even so, masquerade detection is a very hard problem, and none of the three algorithms achieved very high accuracy with near-zero false positive rates for every user. This is partly caused by the inherent nature of the available data and the difficulty of the problem. We would like to reapply these methods to a richer set of data, as described by Maxion [10], incorporating command arguments. We also believe that temporal data associated with each user's sequential commands would provide considerable additional value to improve performance. Another problem to consider for the practical utility of these approaches is resiliency to direct attack; i.e., how could we protect the computed models from, for example, a mimicry attack by the masquerader?

In the experiments performed, we did not evaluate feature selection. We tested one-class SVM using 100,

200, and 300 of the most frequently used UNIX commands. Each of these results is worse than when we used all of the available UNIX commands, whose total number is around 870. We also conjectured that 2-gram features (adjacent pairs of commands) would perform better than individual commands (1-grams). However, we found that the results were worse when we used all of the 2-grams. In further work, we will evaluate feature selection methods to improve performance. For example, we believe a selection of features using both 1-grams and 2-grams may improve the quality of the user profiles, and thus the accuracy of the detector.

A system to detect masqueraders as described in this paper should not be viewed as a single detector, but rather as evidence to be correlated with other sensors and other detectors. Thus, although the detectors described herein and in prior work are seemingly not accurate enough when one wishes to limit false positives, it may be wise to relax the threshold to generate higher true positive rates. If the output of the detector were combined with other evidence (for example, file system access anomaly detection, or other sensors), it may be possible to substantially raise the bar in protecting hosts from malicious abuse.

6. Conclusion

In this paper, we address the masquerade detection problem with one-class training algorithms, which train only on a user's clean data. We have demonstrated that one-class training algorithms can achieve performance similar to multi-class methods, while requiring much less effort in data collection and centralized management. Besides masquerade detection, we believe one-class training is also suited to other intrusion detection problems where sample intrusion data is hard to get or too variable to cluster. We also give a detailed comparison of the performance of the different one-class algorithms as applied to multiple users.
The results show that for most users, one-class SVM using the binary feature representation is better than one-class Naïve Bayes and one-class SVM using the word count representation, especially when we want to restrict the false positive rate to a relatively low level. In our future work, we plan to include command arguments, not only truncated commands, as features to improve the accuracy of masquerade detection. As the number of features increases, we also plan to perform feature selection to find the most informative features and to discard those that have no value for the target task.

Acknowledgments

This work was partially supported by DARPA contract No. F. We also thank Prof. Tony Jebara for helpful suggestions and valuable comments.

References

[1] Chih-Chung Chang and Chih-Jen Lin, LIBSVM: A Library for Support Vector Machines, 2001. Software available online.
[2] Eleazar Eskin, Wenke Lee and Salvatore J. Stolfo, Modeling System Calls for Intrusion Detection with Dynamic Window Sizes, Proceedings of DISCEX II, June 2001.
[3] Stephanie Forrest, Steven A. Hofmeyr, Anil Somayaji, and Thomas A. Longstaff, A Sense of Self for UNIX Processes, In Proceedings of the IEEE Symposium on Security and Privacy, 1996.
[4] Anup K. Ghosh and Aaron Schwartzbard, A Study in Using Neural Networks for Anomaly and Misuse Detection, In Proceedings of the USENIX Security Symposium, 1999.
[5] M. Gribskov and N. L. Robinson, Use of Receiver Operating Characteristic (ROC) Analysis to Evaluate Sequence Matching, Computers and Chemistry, 20(1):25-33, 1996.
[6] Thorsten Joachims, Text Categorization with Support Vector Machines: Learning with Many Relevant Features, In Proc. of the European Conference on Machine Learning (ECML), 1998.
[7] W. Lee and S. J. Stolfo, Data Mining Approaches for Intrusion Detection, In Proceedings of the USENIX Security Symposium, 1998.
[8] T. Lunt, A. Tamaru, F. Gilham, R. Jagannathan, C. Jalali, H. S. Javitz, A. Valdes, and P. G. Neumann, A Real-Time Intrusion Detection Expert System, SRI CSL Technical Report SRI-CSL-90-05, June 1990.
[9] Maxion, Roy A.
and Townsend, Tahlia N., Masquerade Detection Using Truncated Command Lines, International Conference on Dependable Systems and Networks (DSN-02), Washington, D.C., June 2002.
[10] Maxion, Roy A., Masquerade Detection Using Enriched Command Lines, In International Conference on Dependable Systems & Networks (DSN-03), San Francisco, California, June 2003. IEEE Computer Society Press, Los Alamitos, California.
[11] A. McCallum and K. Nigam, A Comparison of Event Models for Naive Bayes Text Classification, AAAI-98 Workshop on Learning for Text Categorization, 1998.
[12] T. M. Mitchell, Bayesian Learning, Chapter 6 in Machine Learning, McGraw-Hill, 1997.
[13] B. Schölkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson, Estimating the Support of a High-Dimensional Distribution, Technical Report MSR-TR-99-87, Microsoft Research, 1999.

10 [4] M. Schonlau, W. DuMouchel, W. -H. Ju, A. F. Karr, M. Theus, and Y. Vard, Computer ntruson: Detectng masquerades, Statstcal Scence, 6():58-74, February 200. [5] Matthew G. Schultz, Eleazar Eskn, and Salvatore J. Stolfo, Malcous Emal Flter - A UNIX Mal Flter that Detects Malcous Wndows Executables, Proceedngs of USENIX Annual Techncal Conference - FREENIX Track, Boston, MA: June 200. [6] S. Y. Sedelow, The Computer n the Humantes and Fne Arts, ACM Computng Surveys 2(2): 89-0 (970) [7] Salvatore J. Stolfo, Shlomo Hershkop, Ke Wang, Olver Nmeskern, and Cha-We Hu, Behavor Proflng of Emal, st NSF/NIJ Symposum on Intellgence & Securty Informatcs (ISI 2003), June 2-3, 2003, Tucson, Arzona. [8] O. De Vel, A. Anderson, M. Corney, and G. Mohay, Mnng Emal Content for Author Identfcaton Forenscs, SIGMOD: Specal Secton on Data Mnng for Intruson Detecton and Threat Analyss, December 200. [9] Nong Ye, A Markov Chan Model of Temporal Behavor for Anomaly Detecton, Proceedngs of the IEEE Systems, Man, and Cybernetcs Informaton Assurance and Securty Workshop, 2000.
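To make the one-class profiling idea concrete, the following is a minimal sketch (not the authors' implementation) of a one-class multinomial Naïve Bayes profile of the kind discussed above: a user's profile is a smoothed command-frequency distribution estimated from that user's own data only, and a new block of commands is scored by its average log-likelihood under the profile. The command blocks and the smoothing value `alpha` below are illustrative; the experiments in this paper use blocks of truncated UNIX commands.

```python
# One-class multinomial Naive Bayes sketch for masquerade detection.
# Trained on a single user's own command blocks (no data from other users).
import math
from collections import Counter

def train_profile(blocks, alpha=0.01):
    """Estimate smoothed command probabilities from the user's own blocks."""
    counts = Counter(cmd for block in blocks for cmd in block)
    total = sum(counts.values())
    # Reserve one smoothed slot for any command never seen in training.
    denom = total + alpha * (len(counts) + 1)
    probs = {cmd: (n + alpha) / denom for cmd, n in counts.items()}
    unseen = alpha / denom
    return probs, unseen

def score(block, profile):
    """Average log-likelihood per command; lower means less like the user."""
    probs, unseen = profile
    return sum(math.log(probs.get(cmd, unseen)) for cmd in block) / len(block)

# Hypothetical training blocks of truncated commands for one user.
self_blocks = [
    ["ls", "cd", "vi", "gcc", "ls"],
    ["cd", "ls", "vi", "make", "gcc"],
]
profile = train_profile(self_blocks)

self_score = score(["ls", "vi", "cd", "make", "ls"], profile)   # familiar commands
masq_score = score(["ftp", "telnet", "rm", "rm", "ftp"], profile)  # unseen commands
```

A threshold on this score, tuned per user to meet a target false positive rate, then separates self from masquerader: here `self_score` is far higher than `masq_score`, since every command in the second block falls into the tiny smoothed probability mass reserved for unseen commands.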


More information

Security Enhanced Dynamic ID based Remote User Authentication Scheme for Multi-Server Environments

Security Enhanced Dynamic ID based Remote User Authentication Scheme for Multi-Server Environments Internatonal Journal of u- and e- ervce, cence and Technology Vol8, o 7 0), pp7-6 http://dxdoorg/07/unesst087 ecurty Enhanced Dynamc ID based Remote ser Authentcaton cheme for ult-erver Envronments Jun-ub

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Online Detection and Classification of Moving Objects Using Progressively Improving Detectors

Online Detection and Classification of Moving Objects Using Progressively Improving Detectors Onlne Detecton and Classfcaton of Movng Objects Usng Progressvely Improvng Detectors Omar Javed Saad Al Mubarak Shah Computer Vson Lab School of Computer Scence Unversty of Central Florda Orlando, FL 32816

More information

Network Intrusion Detection Based on PSO-SVM

Network Intrusion Detection Based on PSO-SVM TELKOMNIKA Indonesan Journal of Electrcal Engneerng Vol.1, No., February 014, pp. 150 ~ 1508 DOI: http://dx.do.org/10.11591/telkomnka.v1.386 150 Network Intruson Detecton Based on PSO-SVM Changsheng Xang*

More information

Three supervised learning methods on pen digits character recognition dataset

Three supervised learning methods on pen digits character recognition dataset Three supervsed learnng methods on pen dgts character recognton dataset Chrs Flezach Department of Computer Scence and Engneerng Unversty of Calforna, San Dego San Dego, CA 92093 cflezac@cs.ucsd.edu Satoru

More information

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines A Modfed Medan Flter for the Removal of Impulse Nose Based on the Support Vector Machnes H. GOMEZ-MORENO, S. MALDONADO-BASCON, F. LOPEZ-FERRERAS, M. UTRILLA- MANSO AND P. GIL-JIMENEZ Departamento de Teoría

More information

Simulation: Solving Dynamic Models ABE 5646 Week 11 Chapter 2, Spring 2010

Simulation: Solving Dynamic Models ABE 5646 Week 11 Chapter 2, Spring 2010 Smulaton: Solvng Dynamc Models ABE 5646 Week Chapter 2, Sprng 200 Week Descrpton Readng Materal Mar 5- Mar 9 Evaluatng [Crop] Models Comparng a model wth data - Graphcal, errors - Measures of agreement

More information

Feature Selection as an Improving Step for Decision Tree Construction

Feature Selection as an Improving Step for Decision Tree Construction 2009 Internatonal Conference on Machne Learnng and Computng IPCSIT vol.3 (2011) (2011) IACSIT Press, Sngapore Feature Selecton as an Improvng Step for Decson Tree Constructon Mahd Esmael 1, Fazekas Gabor

More information

Support Vector Machines. CS534 - Machine Learning

Support Vector Machines. CS534 - Machine Learning Support Vector Machnes CS534 - Machne Learnng Perceptron Revsted: Lnear Separators Bnar classfcaton can be veed as the task of separatng classes n feature space: b > 0 b 0 b < 0 f() sgn( b) Lnear Separators

More information