Detecting Outliers in High-Dimensional Datasets with Mixed Attributes


A. Koufakou 1, M. Georgiopoulos 1, and G.C. Anagnostopoulos 2
1 School of EECS, University of Central Florida, Orlando, FL, USA
2 Dept. of ECE, Florida Institute of Technology, Melbourne, FL, USA

Abstract - Outlier detection has attracted substantial attention in many applications and research areas. Examples include detection of network intrusions or credit card fraud. Many of the existing approaches are based on pair-wise distances among all points in the dataset. These approaches cannot easily extend to current datasets, which usually contain a mix of categorical and continuous attributes and may be scattered over large geographical areas. In addition, current datasets usually have a large number of dimensions. These datasets tend to be sparse, and traditional concepts such as Euclidean distance or nearest neighbor become unsuitable. We propose ODMAD, a fast outlier detection strategy intended for datasets containing mixed attributes. ODMAD takes into consideration the sparseness of the dataset, and is experimentally shown to be highly scalable with the number of points and number of attributes in the dataset.

Keywords: Outlier Detection, Mixed Attribute Datasets, High Dimensional Data, Large Datasets.

1 Introduction

Detecting outliers in data is a research field with many applications, such as credit card fraud detection [1], or discovering criminal activities in electronic commerce. Outlier detection approaches focus on detecting patterns that occur infrequently in the dataset, versus traditional data mining strategies that attempt to find regular or frequent patterns. One of the most widely accepted definitions of an outlier pattern is provided by Hawkins [2]: "An outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism."
Most of the existing research efforts in outlier detection have focused on datasets with a specific attribute type, and assume that attributes are only numerical and/or ordinal, or only categorical. In the case of data with categorical attributes, techniques which assume numerical data need to first map the categorical values to numerical values, a task which is not a straightforward process (e.g., the mapping of a marital status attribute (married or single) to a numerical attribute). In the case of continuous attributes, algorithms designed for categorical data might use discretization techniques to map intervals of continuous space into discrete values, which can lead to loss of information.

A second issue is that many applications for mining outliers require the mining of very large datasets (e.g. terabyte-scale data). This leads to the need for outlier detection algorithms which must scale well with the size and dimensionality of the dataset. Data may also be scattered across various geographical areas, which implies that transferring data to a central location and then detecting outliers is impractical, due to the size of the data, as well as data ownership and control issues. Thus, the algorithms designed to detect outliers must minimize the number of data scans, as well as the need for excessive communication and required synchronization.

A third issue is the high dimensionality of currently available data. Due to the large number of dimensions, the dataset becomes sparse, and in this type of setting, traditional concepts such as Euclidean distance between points, and nearest neighbor, become irrelevant [3]. Employing similarity measures that can handle sparse data becomes imperative. Also, inspecting several smaller views of the data can help uncover outliers which would otherwise be masked by other outliers if one were to look at the entire dataset at once.

In this paper, we extend our work in [4] and we propose an outlier detection approach for datasets that contain both categorical and continuous attributes.
Our method, Outlier Detection for Mixed Attribute Datasets (ODMAD), uses an anomaly score based on the categorical values of each data point. ODMAD then uses this score to find similarities among the points in the sparse continuous space. ODMAD is fast, efficiently handles sparse data, relies on minimal data scans, and lends itself to large and geographically distributed data.

The organization of this paper is as follows: Section 2 contains an overview of previous research in outlier detection. In Section 3, we present our outlier detection approach, ODMAD. Section 4 includes our experimental results, followed by our conclusions in Section 5.

2 Previous Work

The existing outlier detection work can be categorized as follows. Statistical-model based methods assume that a specific model describes the distribution of the data [5], which has the problem of obtaining the suitable model for each particular dataset and application [6]. Distance-based approaches (e.g. [7]) essentially compute distances among data points, and thus become quickly impractical for large datasets (e.g., a nearest neighbor method has quadratic complexity with respect to the number of dataset points). Bay and Schwabacher [8] propose a distance-based method based on randomization and pruning and claim its complexity is close to linear in practice. Distance-based methods require data to be in the same location or large amounts of data to be transferred from different locations, which makes them impractical for distributed data.

Clustering techniques can also be employed to first cluster the data, so that points that do not belong in the formed clusters are designated as outliers. However, these methods are focused on optimizing clustering rather than finding outliers [7]. Density-based methods estimate the density distribution of the data and identify outliers as those lying in relatively low-density regions (e.g. [9]). Although these methods are able to detect outliers not discovered by the distance-based methods, they become challenging for sparse high-dimensional data [10]. Other outlier detection efforts rely on Support Vector methods [11], Replicator Neural Networks [12], or using a relative degree of density with respect only to a few fixed reference points [13].

Most of the aforementioned techniques are geared towards numerical data and thus are more appropriate for numerical datasets or ordinal data that can be easily mapped to numerical values [14]. Another limitation of previous methods is the lack of scalability with respect to number of points and/or dimensionality of the dataset. Outlier detection techniques for categorical datasets have recently appeared in the literature (e.g. [15]).
In [4], we experimented with a number of representative outlier detection approaches for categorical data, and proposed AVF (Attribute Value Frequency), a simple, fast, and scalable method for categorical sets. Otey et al. [6] presented a distributed and dynamic outlier detection method for mixed attribute datasets that has linear runtime with respect to the number of data points; however, their runtime is exponential in the number of categorical attributes and quadratic in the number of numerical attributes.

Regarding sparseness in high-dimensional data, Ertoz et al. [3] use the cosine function for document clustering. They construct a shared nearest neighbor (SNN) graph, and then cluster together high-dimensional points based on their shared nearest neighbors.

In this paper, we extend our previous work in [4] and propose Outlier Detection for Mixed Attribute Datasets (ODMAD), an outlier detection approach for sparse data with both categorical and continuous attributes. ODMAD exhibits very good accuracy and performance, it is highly scalable with the number of points and dimensionality of the dataset, and can be easily applied to distributed data. We compare ODMAD with the technique in [6], which is the existing outlier detection approach for distributed, mixed attribute datasets.

3 ODMAD Algorithm

The outlier detection approach proposed in this paper, ODMAD, detects outliers based on the assumption that outliers are points with highly irregular or infrequent values. In [4], we showed how this idea could be used to effectively detect outliers in categorical data. ODMAD extends the work in [4] to explore outliers in the categorical and in the continuous space of attributes. In this mixed attribute space, an outlier can have irregular categorical values only (type a), or irregular continuous values only (type b), or both (type c). The algorithmic steps of ODMAD are below:

In the first step, we inspect the categorical space in order to detect data points with irregular categorical values. This enables us to detect outliers of type a and type c.
In the second step, we set aside the points found as irregular from the first step, and focus on the remaining points, in an attempt to detect the rest of the outliers (type b). Based on the categorical values of the remaining points, we concentrate on subsets extracted from the data, and work only on these subsets, one at a time. These subsets are considered so that we can identify outliers that would have otherwise been missed (masked) by more irregular outliers.

To illustrate our point, consider the scenario in Figure 1. Outlier point O2 is irregular with respect to the rest of the data points, while the second outlier, O1, is closer to the normal points. In this case, outlier point O2 masks the other outlier point, O1. One solution to this problem could be to sequentially remove outliers. This implies several data scans, which is impractical for large or distributed data. In Section 3.3, we explain in more detail how we address this issue by considering subsets of the data.

Figure 1: Masking Effect - Outlier O2 is more irregular than normal points and outlier O1, therefore O2 will likely mask O1.

3.1 Categorical Score

As shown in [4], the ideal outlier in a categorical dataset is one for which each and every one of its values is extremely irregular (or infrequent). The infrequent-ness of an attribute value can be measured by computing the number of times this value is assumed by the corresponding attribute in the dataset. In [4] we assigned a score to each data point in the dataset that reflects the frequency with which each attribute value of the point occurs. In this paper, we extend this notion of outlierness to cover the likely scenario where none of the single values in an outlier point are infrequent, but the co-occurrence of two or more of its attribute values is infrequent.

We consider a dataset D with n data points, $x_i$, i = 1..n. If each point $x_i$ has $m_c$ categorical attributes, we write $x_i = [x_{i1}, \ldots, x_{il}, \ldots, x_{im_c}]$, where $x_{il}$ is the value of the l-th attribute of $x_i$. Our anomaly score for each point makes use of the idea of an itemset (or set) from the frequent itemset mining literature [16]. Let I be the set of all possible combinations of attributes and their values in dataset D. Let S be a set of all sets d such that an attribute occurs only once in each set d:

$$S = \{ d : d \in \mathrm{PowerSet}(I) \,\wedge\, \forall\, l, k \in d,\ l \neq k \}$$

where l and k represent attributes whose values appear in set d. We also define the length of d, $|d|$, as the number of attribute values in d, and the frequency or support of set d as f(d), which is the number of points $x_i$ in dataset D which contain set d.

Following the reasoning stated earlier, a point is likely to be an outlier if it contains single values or sets of values that are infrequent. We say that a value or a set of values is infrequent if it appears less than minsup times in our data, where minsup is a user threshold.
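The support counts f(d) defined above can be illustrated with a short sketch. It is only a brute-force illustration, not the authors' implementation: the function names `itemset_supports` and `infrequent_sets`, the toy data, and the choice to treat minsup as an absolute count are all assumptions made here.

```python
from collections import Counter
from itertools import combinations

def itemset_supports(points, maxlen):
    """Count f(d) for every set d of (attribute index, value) pairs, |d| <= maxlen.

    Pairing each value with its attribute index guarantees that an
    attribute occurs at most once in any set d.
    """
    support = Counter()
    for point in points:
        items = list(enumerate(point))
        for length in range(1, maxlen + 1):
            for d in combinations(items, length):
                support[frozenset(d)] += 1
    return support

def infrequent_sets(support, minsup):
    """Keep only the sets that appear fewer than minsup times (assumed absolute count)."""
    return {d: f for d, f in support.items() if f < minsup}

# Hypothetical toy data: two categorical attributes per point.
points = [("tcp", "http"), ("tcp", "http"), ("tcp", "http"),
          ("udp", "http"), ("tcp", "ftp")]
support = itemset_supports(points, maxlen=2)
rare = infrequent_sets(support, minsup=2)
```

Enumerating every subset is exponential in the number of attributes, which is why a level-wise search over candidate sets is preferable on real data.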
Therefore, a good indicator to decide if $x_i$ is an outlier in the categorical attribute space is the score value, $\mathrm{Score}_1$, defined below:

$$\mathrm{Score}_1(x_i) = \sum_{\substack{d \subseteq x_i,\ 1 \le |d| \le \mathrm{MAXLEN} \\ f(d) < \mathit{minsup}}} \frac{1}{f(d) \cdot |d|} \qquad (1)$$

Essentially, we assign an anomaly score to each data point that depends on the infrequent subsets contained in this point. As shown in [6], we obtain a good outlier detection accuracy by only considering sets of length up to a user-entered MAXLEN. For example, let point $x_i$ = [a b c] and MAXLEN = 3; the possible subsets of $x_i$ are: a, b, c, ab, ac, bc, and abc. If subset d of $x_i$ is infrequent, i.e. f(d) < minsup, we increase the score of $x_i$ by the inverse of f(d) times the length of d. In our example, if f(ab) = 3 and minsup = 5, ab is an infrequent subset of $x_i$, and $\mathrm{Score}_1$ will increase by 1/(3 × 2) = 1/6.

A higher score implies that it is more likely that the point is an outlier. If a point does not contain any infrequent subsets, its score will be zero. $\mathrm{Score}_1$ is inversely proportional to the frequency, as well as to the length of each set d that belongs to $x_i$. Therefore, a point that has very infrequent single values will get a very high score; a point with moderately infrequent single values will get a moderately high score; and a point whose single values are all frequent and which has a few infrequent subsets will get a moderately low score.

We note that $\mathrm{Score}_1$ is similar to the one in [6]; however, the latter does not make any distinction between sets of different frequency. We use the frequency of the sets to further distinguish between points that contain the same number of infrequent values. The benefit of our score becomes pronounced with larger datasets: for example, consider a dataset with a million data points and minsup of 10%. Also assume two categorical values: a, that appears only once in the dataset, and b, that appears in the dataset slightly less than a hundred thousand times. Using our score, a data point containing value a (very infrequent) will have a much higher score than a point with value b.
Using the score by [6], the two values would contribute equally to the score. Therefore, our score better reflects the amount of irregularity in the data.

3.2 Continuous Score

Many existing outlier detection methods are based on distances between points in the entire dataset. In addition to the fact that this can be inefficient, especially for large or distributed data, it is very likely that in doing so, the algorithm might miss points which are not globally obvious outliers, but are easier to spot if we focus on a subset of our dataset. Furthermore, the notion of a nearest neighbor does not hold as well in high dimensional spaces because the distance between any two data points becomes almost the same [17].

In our case of mixed attribute data, it is reasonable to believe that data points that share a categorical value should also share similar continuous values. Therefore, we can restrict our search space by focusing on points that share a categorical value, and then rank these points based on similarity to each other.

One issue that arises is how to identify similarities between points in high-dimensional data. The most prevalent similarity or distance metric is the Euclidean distance, or the $L_2$-norm. Even though the Euclidean distance is valuable in relatively small dimensionalities, its usefulness decreases as the dimensionality grows. Let us consider the four points below, taken from the KDDCup 1999 dataset (described in more detail in 4.1): the first two points are normal and the second two points are outliers (we removed the columns that had identical values for all four points). Using Euclidean distance we find correctly that point 1 is closest to point 2 and vice versa, but for points 3 and 4 we find that each is closest in Euclidean distance to point 1, i.e. the two outliers are more similar to a normal point than to each other.

[Table: continuous attribute values of the four example points]

This is mainly because the Euclidean distance assigns equal importance to attributes with zero values as to attributes with non-zero values. In higher dimensionalities, the presence of an attribute is typically a lot more important than the absence of an attribute [3], as the data points in high dimensionalities are often sparse vectors. Cosine similarity is a commonly used similarity metric for clustering in very high-dimensional datasets, e.g. used for document clustering in [3]. The cosine similarity between two vectors is equal to the dot product of the two vectors divided by the individual vector norms. Assuming non-negative values, minimum cosine similarity is 0 (non-similar vectors) and maximum is 1 (identical vectors). In our example with the four points above, the cosine function assigns highest similarity between points 1 and 2, and between points 3 and 4, so it correctly identifies similarity between normal points (points 1 and 2) and between outlier points (points 3 and 4). In this paper, we used the cosine function to define similarities in the continuous space.

Consider a data point $x_i$ containing $m_c$ categorical values and $m_q$ continuous values. The categorical and continuous parts of $x_i$ are denoted by $x_i^c$ and $x_i^q$ respectively. Let a be one of the categorical values of $x_i$ which occurs with frequency f(a). We identify a subset of the data that includes the continuous vectors corresponding to all points that share value a: $\{ x_i^q : a \in x_i^c,\ i = 1..n \}$, which contains f(a) vectors. The cosine similarity between the mean vector of this set, $\mu_a$, and $x_i^q$ is below:

$$\cos(x_i^q, \mu_a) = \sum_{j=1}^{m_q} \frac{x_{ij}^q \, \mu_{aj}}{\|x_i^q\| \, \|\mu_a\|} \qquad (2)$$

where $\|x\|$ is the $L_2$-norm of vector x. Finally, we assign the following score to each $x_i$, for all categorical values a in $x_i^c$:

$$\mathrm{Score}_2(x_i) = \frac{1}{|x_i^c|} \sum_{a \in x_i^c} \cos(x_i^q, \mu_a) \qquad (3)$$

which is the summation of all cosine similarities for all categorical values a divided by the total number of values in the categorical part $x_i^c$.
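Equations (2) and (3) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the names `cosine` and `score2`, the dictionary `means` mapping each categorical value to its mean vector, and the toy numbers are hypothetical, and returning 0 for an all-zero vector is a convention chosen here rather than something the text specifies.

```python
import math

def cosine(u, v):
    """Eq. (2): dot product of u and v divided by the product of their L2 norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed convention for all-zero vectors
    return dot / (norm_u * norm_v)

def score2(x_q, x_c, means):
    """Eq. (3): average cosine similarity between the continuous part x_q
    and the mean vector of every categorical value in x_c."""
    return sum(cosine(x_q, means[a]) for a in x_c) / len(x_c)

# Hypothetical mean vectors for two categorical values.
means = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
example_score = score2([1.0, 0.0], ["a", "b"], means)  # cos = 1 for "a", 0 for "b"
```

Because cosine similarity ignores vector length, only the direction of each mean vector matters, which is what makes the measure suitable for sparse data.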
As minimum cosine similarity is 0 and maximum is 1, the points with similarity close to 0 are more likely to be outliers. Even though using the cosine similarity helps us better assess distances in a high-dimensional space, its use will not vastly improve our outlier detection accuracy in a large dataset with many different types of outliers. As we noted earlier in this section, we focus on specific subsets of the continuous space so as to identify outliers in smaller settings. In the next sections, we address the issue of having more than one outlier in a subset, and we outline which categorical values we use for $\mathrm{Score}_2$ in Eq. (3).

3.3 Improving Accuracy

Many methods (e.g. [7]) assume that outliers are the data points that are irregular in comparison to the rest of the dataset, and that they can be globally detected. However, in many real datasets there are multiple outliers with different characteristics, and their irregularity and detection depends on the rest of the outliers against which they are compared. This way, there could be outliers in our dataset that are masked by other, more irregular outliers (see Figure 1).

The solution that we propose is to further use the knowledge that we obtain from the categorical scores to help alleviate this issue. Based on Eq. (1), data points with highly infrequent categorical values will have a very high $\mathrm{Score}_1$. We can exclude these points with high $\mathrm{Score}_1$ from the computation of our continuous score in Equations (2)-(3). The exclusion of these outlier points can be done in the following manner: as we compute the frequencies and means for each categorical value in our dataset, we identify highly infrequent categorical values. Based on this information, we can update the means for the rest of the categorical values that co-occur with the highly infrequent values. The details on how we select the values to exclude from the continuous subsets are given in the following sections.
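The exclusion step described above can be sketched as follows. The sketch makes several assumptions that are not from the paper: it uses two passes over the data for clarity (the text describes updating the means while frequencies are gathered), it marks a value as highly infrequent when its frequency is at most a hypothetical threshold `low_sup`, and it divides each sum by the number of points actually included. Since cosine similarity is scale-invariant, the choice of divisor does not change the continuous score.

```python
from collections import Counter

def means_excluding_rare(cat_rows, cont_rows, low_sup):
    """Mean continuous vector per categorical value, skipping every point
    that contains a highly infrequent categorical value (frequency <= low_sup)."""
    freq = Counter(a for cats in cat_rows for a in cats)
    rare = {a for a, f in freq.items() if f <= low_sup}
    sums, counts = {}, Counter()
    for cats, vec in zip(cat_rows, cont_rows):
        if rare.intersection(cats):
            continue  # this point would distort the means of its other values
        for a in cats:
            acc = sums.setdefault(a, [0.0] * len(vec))
            for j, value in enumerate(vec):
                acc[j] += value
            counts[a] += 1
    return {a: [s / counts[a] for s in acc] for a, acc in sums.items()}

# Hypothetical data: the first point carries the rare value "a" and an extreme vector.
cat_rows = [("a", "b"), ("b", "c"), ("b", "c"), ("b", "c")]
cont_rows = [[100.0, 0.0], [1.0, 1.0], [3.0, 1.0], [2.0, 1.0]]
means = means_excluding_rare(cat_rows, cont_rows, low_sup=1)
```

Without the exclusion, the mean for value "b" would be pulled toward the extreme first point, which is the masking effect the section sets out to reduce.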
3.4 Algorithm

ODMAD consists of two phases: the first phase calculates the necessary quantities for the algorithm (categorical values, frequencies, sets, and means); the second phase goes over each point in the dataset and decides if each point is an outlier or not, based on the scores described in Sections 3.1 and 3.2. The pseudocode for the two phases is given in Figures 2 and 3, respectively.

As shown in Figure 2, for the score calculation from Eq. (1), we only gather the frequencies of certain sets: the pruned candidates. Pruned candidates are those infrequent sets such that all their subsets are frequent. These are the sets that are pruned at each phase of a Frequent Itemset Mining algorithm such as Apriori [16]. The reason behind this is that, as mentioned in Section 3.1, we are interested in either single categorical values that are infrequent, or infrequent sets containing single values that are frequent on their own. This makes ODMAD faster, as shown in Example 1.
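The pruned-candidate idea can be sketched with a small level-wise search in the spirit of Apriori. This is an illustrative sketch, not the authors' C++ code: the function name `pruned_candidates`, the toy data, and the use of an absolute count for minsup are assumptions.

```python
from collections import Counter
from itertools import combinations

def pruned_candidates(points, minsup, maxlen):
    """Return {set: support} for infrequent sets whose proper subsets are all frequent."""
    transactions = [list(enumerate(p)) for p in points]  # (attribute index, value) items
    pruned, frequent = {}, set()
    # Level 1: infrequent single values are pruned, the rest stay as building blocks.
    singles = Counter(item for t in transactions for item in t)
    for item, f in singles.items():
        if f < minsup:
            pruned[frozenset([item])] = f
        else:
            frequent.add(frozenset([item]))
    # Levels 2..maxlen: count only candidates built entirely from frequent subsets.
    for length in range(2, maxlen + 1):
        counts = Counter()
        for t in transactions:
            for combo in combinations(t, length):
                subsets = combinations(combo, length - 1)
                if all(frozenset(s) in frequent for s in subsets):
                    counts[frozenset(combo)] += 1
        for d, f in counts.items():
            if f < minsup:
                pruned[d] = f
            else:
                frequent.add(d)
    return pruned

# Hypothetical toy data: "z" is rare on its own; "b" and "d" are frequent but rarely co-occur.
points = [("b", "e")] * 3 + [("f", "d")] * 3 + [("b", "d"), ("z", "d")]
pruned = pruned_candidates(points, minsup=2, maxlen=2)
```

The pair containing "z" is never counted, because one of its subsets is already infrequent; skipping such supersets is where the time saving comes from.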

Input: D - dataset (n points, $m_c$ and $m_q$ attributes), minsup, MAXLEN
Output: G - pruned candidates & their frequencies; A - categorical values, means & frequencies
foreach point $x_i$ (i = 1..n) begin
    Add the categorical values of $x_i$, their frequencies, & their means to A;
end
foreach len = 2..MAXLEN begin
    Create candidate sets and get frequent itemsets;
    Add pruned sets & their frequencies to G;
end

Figure 2: First Phase of our Outlier Detection Approach ODMAD

Input: D - dataset (n points, $m_c$ and $m_q$ attributes), G, A, minsup, MAXLEN, window, $\Delta score_c$, $\Delta score_q$, low_sup, upper_sup
Output: outliers
foreach point $x_i$ (i = 1..n) begin
    foreach categorical value a in $x_i$ begin
        If f(a) < minsup
            $\mathrm{Score}_1(x_i)$ += 1 / f(a);
        If low_sup < f(a) <= upper_sup
            $\mathrm{Score}_2(x_i)$ += $\cos(x_i^q, \mu_a)$;
    end
    foreach pruned set d in G found in $x_i$ begin
        $\mathrm{Score}_1(x_i)$ += 1 / (f(d) × |d|);
    end
    If $\mathrm{Score}_1$ > $\Delta score_c$ × average $\mathrm{Score}_1$ in window or $\mathrm{Score}_2$ < $\Delta score_q$ × average $\mathrm{Score}_2$ in window
        flag($x_i$) = outlier;
    else normal, add $\mathrm{Score}_1$, $\mathrm{Score}_2$ to window scores;
end

Figure 3: Second Phase of our Outlier Detection Approach ODMAD

Example 1. Assume we have two points, each with three categorical attributes: $x_1$ = [a b c] and $x_2$ = [a b d]. If only single values a, c are infrequent with frequency equal to 5, the score is as follows:

$$\mathrm{Score}_1(x_1) = \frac{1}{f(a)} + \frac{1}{f(c)} = \frac{2}{5} = 0.4, \qquad \mathrm{Score}_1(x_2) = \frac{1}{f(a)} = \frac{1}{5} = 0.2$$

Since a and c are infrequent, we do not check any of their combinations with other values because they will also be infrequent. The sets we will not check are: ab, ac, ad, bc, cd. However, bd consists of frequent values, b and d, so we check its frequency. Assuming bd is infrequent, and f(bd) = 4, we increase the score of $x_2$:

$$\mathrm{Score}_1(x_2) = 0.2 + \frac{1}{f(bd) \cdot |bd|} = 0.2 + \frac{1}{4 \cdot 2} = 0.325$$

Note that at this point we stop increasing the score of both $x_1$ and $x_2$, because there are no more frequent sets. Therefore, in this scenario, we only need to check sets a, c, and bd, instead of all possible sets of length 1 to 3 contained in $x_1$, $x_2$.

As we identify categorical values and sets, we also update the corresponding mean vectors as discussed in Section 3.3.
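The arithmetic of Example 1 can be checked with a small sketch of the categorical part of the second phase. The function name `score1`, the frequencies chosen for the frequent values b and d (50), and the threshold minsup = 10 are hypothetical; only f(a) = f(c) = 5 and f(bd) = 4 come from the example.

```python
def score1(point_values, value_freq, pruned_sets, minsup):
    """Categorical score of one point: 1/f(a) for each infrequent single value,
    plus 1/(f(d) * |d|) for each pruned set d contained in the point."""
    score = 0.0
    for a in point_values:
        if value_freq[a] < minsup:
            score += 1.0 / value_freq[a]
    present = set(point_values)
    for d, f in pruned_sets.items():
        if d <= present:  # d is a subset of the point's values
            score += 1.0 / (f * len(d))
    return score

value_freq = {"a": 5, "b": 50, "c": 5, "d": 50}  # b, d frequencies are assumed
pruned_sets = {frozenset({"b", "d"}): 4}         # pruned sets of length >= 2
s1 = score1(["a", "b", "c"], value_freq, pruned_sets, minsup=10)
s2 = score1(["a", "b", "d"], value_freq, pruned_sets, minsup=10)
```

Here `s1` corresponds to the score of the first point (0.4) and `s2` to the score of the second point (0.2 + 1/8 = 0.325).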
We use a user-entered frequency threshold, called low_sup, to indicate what values we consider "highly infrequent": categorical values with frequency ≤ low_sup are marked as highly infrequent. As we described in 3.3, we exclude points that contain these highly infrequent values from the mean in Eq. (2) of all other categorical values they co-occur with. For example: assume point x contains categorical value a, with f(a) ≤ low_sup, and value b, with f(b) > low_sup. We exclude point x as follows:

$$\mu_b = \frac{1}{f(b)} \sum_{\substack{i=1 \\ b \in x_i^c,\ x_i \neq x}}^{n} x_i^q$$

In the second phase in Figure 3, we first find all categorical values in each point and update $\mathrm{Score}_1$ in Eq. (1) accordingly. We do the same for all the pruned sets contained in the current point. Also, for each categorical value, we compute $\mathrm{Score}_2$ using the updated mean we computed in the first phase. The continuous vectors we use are those that correspond to categorical values with frequency in (low_sup, upper_sup]. If a point has a value with frequency less than low_sup, its $\mathrm{Score}_2$ will be 1, as it contains a highly infrequent categorical value. If a point has no values with frequency in (low_sup, upper_sup], it will have a $\mathrm{Score}_2$ of 0. By applying a lower bound to the frequency range we exclude points with very infrequent categorical values, and by applying an upper bound we limit the number of data points to which we assign a score in the continuous domain.

Finally, as we scan and score the data points, we maintain a window of categorical and continuous scores. We also employ a delta value for the detection of abnormal scores: $\Delta score_c$ for the categorical scores and $\Delta score_q$ for the continuous scores. As we go over the points in the second phase, if a point has a score larger (smaller in the case of $\mathrm{Score}_2$) than the average score of the previous window of points by the corresponding $\Delta$ value, it is flagged as an outlier. Otherwise, the point is normal, and its non-zero scores are added to the window we maintain.
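The window-based flagging can be sketched as below. The exact form of the comparison is an assumption: this sketch reads the rule as a multiplicative threshold on the window average (categorical score above `delta_c` times the average, or continuous score below `delta_q` times the average), and it never flags a point while the corresponding window is still empty. The function name `flag_outliers` and all numbers are hypothetical.

```python
from collections import deque

def flag_outliers(scores, window, delta_c, delta_q):
    """scores: iterable of (score1, score2) pairs in scan order.
    Returns one boolean per point: True if flagged as an outlier."""
    win1, win2 = deque(maxlen=window), deque(maxlen=window)
    flags = []
    for s1, s2 in scores:
        avg1 = sum(win1) / len(win1) if win1 else None
        avg2 = sum(win2) / len(win2) if win2 else None
        too_high = avg1 is not None and s1 > delta_c * avg1  # assumed comparison
        too_low = avg2 is not None and s2 < delta_q * avg2   # assumed comparison
        is_outlier = too_high or too_low
        flags.append(is_outlier)
        if not is_outlier:
            # Only non-zero scores of normal points enter the windows.
            if s1 != 0:
                win1.append(s1)
            if s2 != 0:
                win2.append(s2)
    return flags

scores = [(0.1, 0.9), (0.12, 0.95), (0.1, 0.9), (2.0, 0.9), (0.1, 0.2), (0.11, 0.92)]
flags = flag_outliers(scores, window=3, delta_c=10, delta_q=0.5)
```

Keeping flagged points out of the windows prevents one extreme score from raising the baseline against which later points are judged.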
If each of the $m_c$ categorical attributes has an average of v distinct values, the complexity upper bound is below: T n MAXLEN j ( m v m + v ) j= m = O ( n m ( ) v + n v m ) m j

Therefore ODMAD scales linearly with the number of data points, n, and with the number of continuous attributes, $m_q$, but seems to be scaling exponentially with the number of categorical attributes $m_c$. In practice our algorithm runs faster because we are using only the pruned candidates for the categorical value-based score. Otey's method in [6] has exponential time with respect to categorical attributes, and quadratic with the number of continuous attributes. Moreover, the method in [6] requires a covariance matrix for each possible itemset in the dataset, while our method only requires a vector of length $m_q$ (the mean vector) for each categorical value.

Table 1. Detection rate on the KDDCup 1999 training datasets (10% training set and entire training set). Columns: ODMAD and Otey's approach for the 10% training set; ODMAD and Otey's approach for the entire training set. Rows (attack types): Back, Buffer overflow, FTP Write, Guess password, Imap, IP Sweep, Land, Load Module, Multihop, Neptune, Nmap, Perl, Phf, Pod, Port Sweep, Root Kit, Satan, Smurf, Spy, Teardrop, Warez client, Warez master.

Table 2. Execution time (seconds) for ODMAD versus Otey's approach on the KDDCup 1999 training datasets. Columns: ODMAD, Otey's approach. Rows: 10% training set, entire training set.

4 Experiments

4.1 Experimental Setup

We implemented our approach and Otey's approach [6] using C++. We ran our experiments on a workstation with a Pentium 4 1.99 GHz processor and 1 GB of RAM. We used the KDDCup 1999 intrusion detection dataset [18] from the UCI repository [19]. This dataset contains records that represent connections to a military computer network and multiple intrusions and attacks by unauthorized users. The raw binary TCP data were processed into features such as connection duration, protocol type, number of failed logins, etc. The KDD dataset contains a training set with 4,898,430 data points and a dataset with 10% of the training data points. There are 33 continuous attributes and 8 categorical attributes.
Due to the large number of attacks in these datasets, we preprocess them such that attack points are around 2% of the dataset, and we preserve the proportions of the various attacks. We follow the same concept as in [6]: since network traffic packets tend to occur in bursts for some intrusions, we look at bursts of packets in the data set. Our processed dataset based on the entire training set contains 983,550 instances with 0,769 attack instances, and similarly for the 10% training dataset.

We compare our method with the one proposed in [6] as it is the only existing distributed outlier detection approach for mixed attribute datasets that scales well with the number of data points. We evaluate both algorithms based on two measures: outlier detection accuracy, or the outliers correctly identified by the approach as outliers, and the false positive rate, reflecting the number of normal points erroneously identified as outliers. We also compare the execution time of the two algorithms using the same data.

4.2 Results

The outlier detection accuracy or detection rate reflects how many points we detect correctly as outliers. In the KDDCup set, if we detect one point in a burst of packets as an outlier we mark all points in the burst as outliers, as in [6]. The false positive rate is how many normal points we incorrectly detect as outliers versus the total number of normal points. In Table 1, we depict the detection rate achieved by ODMAD versus the approach in [6] (better rates are in bold). In Table 2 we show the execution time in seconds for the two approaches.

We used window = 40 for all experiments. We experimented with several values for the parameters of Otey's approach, and in Table 1 we present the best results (we used: δ = 35; minsup = 50% for the 10% set, and 10% for the entire training set; $\Delta score$ = 2). For our approach we used: upper_sup = minsup = 10%; low_sup = 2%; $\Delta score_c$ = 0, $\Delta score_q$ = .27 (10% set); and $\Delta score_c$ = 0, $\Delta score_q$ = .8 (entire training set).
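The two evaluation measures, including the burst rule, can be sketched as follows. This is one possible reading, stated as an assumption: bursts are defined only for attack points (normal points carry `None` as their burst identifier), a burst counts as fully detected when any of its points is flagged, and the false positive rate is computed per normal point. The function name and the toy labels are hypothetical.

```python
def detection_and_false_positive_rate(flagged, is_attack, burst_id):
    """Return (detection rate, false positive rate) for one run.

    flagged[i]   - True if point i was reported as an outlier
    is_attack[i] - True if point i is a labelled attack
    burst_id[i]  - burst identifier for attack points, None for normal points
    """
    hit_bursts = {b for f, atk, b in zip(flagged, is_attack, burst_id) if f and atk}
    attacks = detected = normals = false_pos = 0
    for f, atk, b in zip(flagged, is_attack, burst_id):
        if atk:
            attacks += 1
            detected += b in hit_bursts  # whole burst is credited once one point is hit
        else:
            normals += 1
            false_pos += f
    return detected / attacks, false_pos / normals

# Hypothetical labels: two attack bursts (3 and 2 points) followed by 3 normal points.
flagged   = [False, True, False, False, False, True, False, False]
is_attack = [True, True, True, True, True, False, False, False]
burst_id  = [0, 0, 0, 1, 1, None, None, None]
rates = detection_and_false_positive_rate(flagged, is_attack, burst_id)
```

In this toy run one flagged point credits the whole first burst, so three of five attack points count as detected, while one of three normal points is a false positive.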
As can be seen in Table 1, ODMAD has equal or better detection rate than Otey's approach for all but two of the attacks on the 10% training set, and all but three of the attacks for the entire training set. Moreover, the detection rates in Table 1 for the 10% dataset were achieved with a false positive rate of 4.32% for ODMAD and 6.99% for Otey's, while the detection rates for the entire training set were achieved with a false positive rate of 7.09% for ODMAD, and 3.32% for Otey's. Execution time for our approach is significantly faster as well, e.g. ODMAD processed the KDDCup 10% dataset in 38 seconds while it took Otey's approach 00 minutes for the same task. We attribute this mainly to the fact that the method in [6] creates and checks a covariance for each and every possible set of categorical values, while ODMAD looks at single categorical values and the mean of their continuous counterparts.

We observed similar accuracy and performance for the KDD Test set, and we also conducted experiments to explore how ODMAD's performance varies with respect to the parameters (results not shown here due to space). Detection and false positive rates decrease as $\Delta score$ increases, as it reflects the magnitude of difference between scores in the data. The larger $\Delta score$ is, the higher the score difference needs to be for a point to be an outlier, and ODMAD will return fewer and fewer outliers. Also, the overall results indicate that good values for upper_sup are close to the value for minsup, and for low_sup close to 1-3% depending on the dataset size.

5 Conclusions

We proposed Outlier Detection for Mixed Attribute Datasets (ODMAD), a fast outlier detection algorithm for mixed attribute data that responds well to sparse high-dimensional data. ODMAD identifies outliers based on categorical attributes first, and then focuses on subsets of data in the continuous space by utilizing information from the categorical attribute space. We experimented with the KDDCup 1999 dataset, a benchmark outlier detection dataset, in order to demonstrate the performance of our approach. We found that ODMAD in most instances exhibits higher outlier detection rates (accuracy) and lower false positive rates, compared to the existing work in the literature [6]. Furthermore, ODMAD relies on two data scans and is considerably faster than the competing work in [6]. Extending our work for distributed data is the focus of our future work.

Acknowledgements: This work was supported in part by NSF grants 03460, , , , , ,

References

[1] Bolton, R.J., Hand, D.J., Statistical fraud detection: A review, Statistical Science, 7.
[2] Hawkins, D., Identification of Outliers, Chapman and Hall.
[3] Ertoz, L., Steinbach, M., Kumar, V., Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data, Proc. of ACM Int'l Conf. on Knowledge Discovery and Data Mining (KDD).
[4] Koufakou, A., Ortiz, E., Georgiopoulos, M., Anagnostopoulos, G., Reynolds, K., A Scalable and Efficient Outlier Detection Strategy for Categorical Data, Int'l Conf. on Tools with Artificial Intelligence ICTAI, October.
[5] Barnett, V., Lewis, T., Outliers in Statistical Data, John Wiley, 1994.
[6] Otey, M.E., Ghoting, A., Parthasarathy, A., Fast Distributed Outlier Detection in Mixed-Attribute Data Sets, Data Mining and Knowledge Discovery.
[7] Knorr, E., Ng, R., and Tucakov, V., Distance-based outliers: Algorithms and applications, Very Large Databases Journal.
[8] Bay, S.D., Schwabacher, M., Mining distance-based outliers in near linear time with randomization and a simple pruning rule, Proc. of ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining.
[9] Breunig, M.M., Kriegel, H.-P., Ng, R.T., and Sander, J., LOF: Identifying density-based local outliers, Proc. of the ACM SIGMOD Int'l Conf. on Management of Data.
[10] Wei, L., Qian, W., Zhou, A., Jin, W., HOT: Hypergraph-based Outlier Test for Categorical Data, Proc. of 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining PAKDD.
[11] Tax, D., Duin, R., Support Vector Data Description, Machine Learning.
[12] Hawkins, S., He, H., Williams, G., Baxter, R., Outlier Detection Using Replicator Neural Networks, Data Warehousing and Knowledge Discovery.
[13] Pei, Y., Zaiane, O., Gao, Y., An Efficient Reference-based Approach to Outlier Detection in Large Dataset, IEEE Int'l Conference on Data Mining.
[14] Hodge, V., Austin, J., A Survey of Outlier Detection Methodologies, Artificial Intelligence Review, pp. 85.
[15] He, Z., Deng, S., Xu, X., A Fast Greedy algorithm for outlier mining, Proceedings of PAKDD.
[16] Agrawal, R., Srikant, R., Fast algorithms for mining association rules, Int'l Conf. on Very Large Data Bases, 1994.
[17] Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U., When is nearest neighbor meaningful?, Int'l Conf. Database Theory, 1999.
[18] Hettich, S., Bay, S., KDDCUP 1999 dataset, UCI KDD archive.
[19] Blake, C., Merz, C., UCI machine learning repository, 1998.


More information

Lazy Updates: An Efficient Technique to Continuously Monitoring Reverse knn

Lazy Updates: An Efficient Technique to Continuously Monitoring Reverse knn Lazy Updates: An Effiient Tehniue to Continuously onitoring Reverse k uhammad Aamir Cheema, Xuemin Lin, Ying Zhang, Wei Wang, Wenjie Zhang The University of ew South Wales, Australia ICTA, Australia {maheema,

More information

Cluster-based Cooperative Communication with Network Coding in Wireless Networks

Cluster-based Cooperative Communication with Network Coding in Wireless Networks Cluster-based Cooperative Communiation with Network Coding in Wireless Networks Zygmunt J. Haas Shool of Eletrial and Computer Engineering Cornell University Ithaa, NY 4850, U.S.A. Email: haas@ee.ornell.edu

More information

Multi-Channel Wireless Networks: Capacity and Protocols

Multi-Channel Wireless Networks: Capacity and Protocols Multi-Channel Wireless Networks: Capaity and Protools Tehnial Report April 2005 Pradeep Kyasanur Dept. of Computer Siene, and Coordinated Siene Laboratory, University of Illinois at Urbana-Champaign Email:

More information

Gradient based progressive probabilistic Hough transform

Gradient based progressive probabilistic Hough transform Gradient based progressive probabilisti Hough transform C.Galambos, J.Kittler and J.Matas Abstrat: The authors look at the benefits of exploiting gradient information to enhane the progressive probabilisti

More information

System-Level Parallelism and Throughput Optimization in Designing Reconfigurable Computing Applications

System-Level Parallelism and Throughput Optimization in Designing Reconfigurable Computing Applications System-Level Parallelism and hroughput Optimization in Designing Reonfigurable Computing Appliations Esam El-Araby 1, Mohamed aher 1, Kris Gaj 2, arek El-Ghazawi 1, David Caliga 3, and Nikitas Alexandridis

More information

Measurement of the stereoscopic rangefinder beam angular velocity using the digital image processing method

Measurement of the stereoscopic rangefinder beam angular velocity using the digital image processing method Measurement of the stereosopi rangefinder beam angular veloity using the digital image proessing method ROMAN VÍTEK Department of weapons and ammunition University of defense Kouniova 65, 62 Brno CZECH

More information

SVC-DASH-M: Scalable Video Coding Dynamic Adaptive Streaming Over HTTP Using Multiple Connections

SVC-DASH-M: Scalable Video Coding Dynamic Adaptive Streaming Over HTTP Using Multiple Connections SVC-DASH-M: Salable Video Coding Dynami Adaptive Streaming Over HTTP Using Multiple Connetions Samar Ibrahim, Ahmed H. Zahran and Mahmoud H. Ismail Department of Eletronis and Eletrial Communiations, Faulty

More information

Wide-baseline Multiple-view Correspondences

Wide-baseline Multiple-view Correspondences Wide-baseline Multiple-view Correspondenes Vittorio Ferrari, Tinne Tuytelaars, Lu Van Gool, Computer Vision Group (BIWI), ETH Zuerih, Switzerland ESAT-PSI, University of Leuven, Belgium {ferrari,vangool}@vision.ee.ethz.h,

More information

Fast Rigid Motion Segmentation via Incrementally-Complex Local Models

Fast Rigid Motion Segmentation via Incrementally-Complex Local Models Fast Rigid Motion Segmentation via Inrementally-Complex Loal Models Fernando Flores-Mangas Allan D. Jepson Department of Computer Siene, University of Toronto {mangas,jepson}@s.toronto.edu Abstrat The

More information

Performance Improvement of TCP on Wireless Cellular Networks by Adaptive FEC Combined with Explicit Loss Notification

Performance Improvement of TCP on Wireless Cellular Networks by Adaptive FEC Combined with Explicit Loss Notification erformane Improvement of TC on Wireless Cellular Networks by Adaptive Combined with Expliit Loss tifiation Masahiro Miyoshi, Masashi Sugano, Masayuki Murata Department of Infomatis and Mathematial Siene,

More information