Internet Traffic Classification using a Hidden Markov model


International Conference on Hybrid Intelligent Systems

José Everardo Bessa Maia, Department of Statistics and Computing, UECE - State University of Ceará, Fortaleza - Ceará - Brazil, jmaia@uece.br

Raimir Holanda Filho, Masters Course in Applied Computer Sciences, UNIFOR - University of Fortaleza, Fortaleza - Ceará - Brazil, raimir@unifor.br

Abstract

This paper examines the performance of a new Hidden Markov Model (HMM) structure used as the core of an Internet traffic classifier and compares the results against other models from the literature. Traffic modeling and classification are important in many areas, such as bandwidth management, traffic analysis, prediction and engineering, network planning, Quality of Service provisioning and anomalous traffic detection. The new HMM structure, which takes into account the sequences of packet payload sizes (PS) and inter-packet times (IPT), is obtained by concatenating a first part structured as a profile HMM with a second part structured as a fully-connected HMM. The first part captures the specific properties of the initial protocol packets, while the second part captures the statistical properties of the whole sequence in the flow. The resulting models are found to increase the accuracy of classifying the different traffic classes in the analysed dataset. The average accuracy obtained by the classifier is 62.5% after seeing only five packets, 80.0% after examining 13 packets and 95.5% after seeing the entire unidirectional flow.

Keywords: Internet Traffic Classification; Hidden Markov model.

I. INTRODUCTION

The ability to accurately classify and identify the network traffic associated with different applications is fundamental to numerous network activities, including network management and security monitoring, traffic modeling and network planning, accounting and Quality of Service provision [1].
It is at the basis of any modern network management platform. Despite the various approaches proposed for this task, no definitive answer has been found to date [2]. Real-time classification, performance that is independent of the network in which the algorithm was trained, and the completeness and accuracy of the classification are still challenges to be overcome. For this reason, and because of its relevance, traffic classification has become one of the hottest research topics in computer science and telecommunications.

Internet traffic classification is a hard task for several reasons. The traditional, direct approaches of relying on transport-level protocol ports or on payload inspection have rapidly become unreliable [3] or infeasible [4]. Moreover, in several network scenarios it is unrealistic to assume that all IP traffic classes are known a priori: some network protocols may be known, but novel protocols can appear and give rise to unknown classes. Additionally, a platform for traffic classification and application identification must meet the time constraints of its particular use. For example, a management function may require the application protocol to be identified after only the first few packets of a flow have been seen, while other functions can operate on the complete flow information. Moreover, the classification can be set at different levels of granularity, from a few very broad classes (e.g., interactive, transactional and bulk data classes) through intermediate degrees of discrimination (e.g., application protocol families) to the identification of the application itself (e.g., the application protocol). Together, these aspects make this a very difficult task and push the search for alternative techniques.
This paper analyzes the performance of a new type of Internet traffic classifier that combines the ideas of previous proposals [2], [5], [6], [7] in the hope of obtaining a model with the best properties of the original models. The classifier is based on a Hidden Markov Model (HMM) with a new structure. This structure, which takes into account the sequences of packet payload sizes (PS) and inter-packet times (IPT), is obtained by concatenating a first part structured as a profile HMM with a second part structured as a fully-connected HMM. The first part captures the specific properties of the initial protocol packets, while the second part captures the statistical properties of the whole sequence in the flow. The new HMM structure is used as the core of an Internet traffic classifier, and the evaluation results are compared against other models in the literature.

This study is based on the following four broad application classes and their reference applications, commonly found in IP networks: the Interactive class, represented by the Telnet, CounterStrike (CS) game and HTTP protocols; the Bulk data transfer class, with the FTP-data protocol; the Transactional class, represented by the HTTPS protocol; and the Continuous-Media (CM) Streaming class, represented by RealMedia streaming [8]. These reference applications are clearly within one class [1], are widely used and have server ports in the well-known port range.

The remainder of this paper is organized as follows. Section II presents the related work and places this work in context. Section III presents the HMM model used for classification. Section IV presents the real traffic traces used in the experiments and the measurement procedures. Section V presents the evaluation of the technique and a discussion of the results. Finally, Section VI concludes the paper.

II. RELATED WORK

The two main techniques used for traffic classification on an IP network, namely mapping the transport-layer source and destination ports to applications and recognizing payload signatures, become less effective or less practical every day. Much of the research in this area is therefore shifting to statistical and Machine Learning (ML) methods, which do not depend on fixed packet parameters or on inspection of packet contents. These classification techniques rely on the fact that different applications typically have distinct behavior patterns when communicating on a network. The traffic behavior patterns originate mainly from the protocol specifications, the application type itself or, in the case of interactive applications, the user behavior.

The model used in this work to recognize these distinct behavior patterns is based on HMMs. HMMs are appropriate models for the approximate matching of families of sequences that have different lengths and may contain insertions or omissions of some elements. These are typical phenomena in the packet sequences that constitute a communication flow in IP networks when parameters such as PS or IPT are considered. This work builds on, and is inspired by, the approaches developed in [2], [5], [6], [7], which are based on the statistical properties of the IPT or PS sequences (or of the joint sequence) present in the flows.
In [6], an approach based on profile HMMs is proposed in which a left-to-right structure is used for the state topology of the HMM. The authors present two classifiers working separately on IPTs or on PSs. In [7], the same authors extend the approach to account for IPT and PS jointly. The observable variables remain one-dimensional, and the joint IPT and PS information is taken into account via vector quantization based on K-means. Furthermore, a heuristic technique is used to account for different trace lengths. In these works the PS and IPT variables are discretized and the model considers packets in both directions.

In [2], the proposed model works directly on a two-dimensional continuous observable variable and thus exploits the joint IPT and PS information without any preprocessing such as vector quantization. The approach uses a fully-connected HMM structure for the state topology, which allows a reduction in the number of states and avoids post-processing. Although it is much less structured than the profile HMMs with respect to the traffic characteristics, it is still able to achieve good classification results, as reported by the authors. The model considers packets in one direction only.

In [5], based on the observation that applications have different packet sizes for control flows, a technique is proposed that applies unsupervised clustering (Simple K-Means) to the vector of sizes of the first k data packets of each TCP flow and provides more than 95% average accuracy in identifying traffic at the protocol level. Sequences made of only the first 4 to 10 packets were used to train HMMs and to attempt flow classification at an early stage. The model takes into account the initial packets in both directions, and only TCP traffic is considered.

The idea of the model presented in the next section is to use a profile HMM as the behavior memory of the first packets of a TCP connection and a fully-connected HMM to recognize the global behavior of the flow.
The hypothesis here is that this separates TCP and UDP traffic more efficiently and thus improves the classification results. The results reported in this paper test the simplifying assumption of using only the first packets of one of the directions, with the values of PS and IPT quantized using the k-means algorithm. The extension of the model to consider both traffic directions at the same time and a two-dimensional continuous observable variable is under investigation.

A large and varied range of other statistical techniques has been applied to the problem of traffic classification in IP networks, but these are not directly related to this work [9], [10], [11], [12], [13]. Among the latest are Support Vector Machine (SVM) algorithms [2] and classifier ensembles [2]. There are also many efforts that combine packet inspection and ML [14]. Surveys on this subject can be found in [4], [15].

III. THE MODEL

Consider sequences of observations $O = \{o_1, \ldots, o_N\}$, $N \geq 1$, defined over $S$, with $o_i \in S \subseteq \mathbb{R}^n$, and a set of admissible classes $\Omega = \{\omega_1, \ldots, \omega_c\}$, where $c = |\Omega|$. A sequence $O$ belongs to the space of sequences of length $N$, $S^N$. The sequence classification problem is to recognize the class $\omega_i$ of a sequence of observations.

A hidden Markov model (HMM) based classifier is a specific type of Bayesian classifier [16] in which the system being modeled is assumed to be a Markov process with unobserved states [17]. Each state has a probability distribution over the possible output symbols. Therefore, the sequence of symbols generated by an HMM gives some information about the sequence of states. In a Bayes classifier the decision is based on the classification cost:

$$c_j(O) = \sum_i c_{ij} \, P(\omega_i \mid O) \qquad (1)$$

where $O$ is the observed sequence, $c_{ij}$ is the cost of misclassifying an observation of class $i$ as class $j$, and $P(\omega_i \mid O)$ is the a posteriori probability of class $i$. The a posteriori probabilities can be calculated using Bayesian inversion,

$$P(\omega_i \mid O) = k \, P(O \mid \omega_i) \, P(\omega_i) \qquad (2)$$

which requires the probability distributions of the generated sequences for each class; these are generally unknown. Here $k$ is a normalization constant. In this work $c_{ij} = c$ is constant for all $i, j$ and $P(\omega_i) = p$ is constant for all $i$. In this Bayesian approach, the goal of the HMM (or of any other model one wishes to use) is to generate estimates of $P(O \mid \omega_i)$, learned from an available set of classified observations.

HMM theory will not be covered in detail here; for a comprehensive tutorial, see [17]. Basically, an HMM $\lambda$ is a 4-tuple $\lambda = (S, A, \pi, B)$, where $S$ is the set of states, $A$ is the transition matrix (representing the probabilities of transition between states), $\pi$ is a vector of initial state probabilities, and $B$ is the emission model, which describes the probability (density or mass) function of symbol emission from each state.

The standard HMM-based approach to sequence classification, adopted here, consists in training one HMM for each class; these are subsequently used as class-conditional densities in a standard Bayes classification paradigm. For example, assuming a priori equiprobable classes, an unknown sequence is classified into the class whose model shows the highest probability (likelihood) of having generated this sequence (this is the well-known maximum-likelihood (ML) classification rule) [18]. Thus an unknown sequence $O$ is assigned to the class showing the highest likelihood, i.e.

$$\mathrm{Class}(O) = \arg\max_i P(O \mid \lambda_i) \qquad (3)$$

where $\lambda_i$ is the HMM corresponding to the $i$th class. This requires training $C$ HMMs for a $C$-class problem.

Training was performed using the standard Baum-Welch algorithm [17]. This algorithm is an iterative forward-backward procedure that searches for the model parameters maximizing the probability that the model itself generates the sequences used in training.

The general structure of the HMM used in the classifier is shown in Fig-1. It has a fixed structure for all applications, with five match states, four delete states and four insert states in the profile HMM part and five states in the fully-connected HMM part, for a total of eighteen states. The classifier architecture for $C$ classes is composed of a bank of $C$ parallel HMMs and a decision block, which selects as the best estimate of the traffic class the one whose HMM generated the greatest likelihood for the test sequence.

The implemented model is discrete both in the states and in the observed symbols. To keep this model simple, some decisions had to be taken. It is designed for one-dimensional observable variables. The two variables, PS and IPT, were quantized separately on a non-linear scale using the k-means algorithm [18]. Using previous work as a guide and after some experimental trials, PS and $10 \log_{10}(\mathrm{IPT} / 1\,\mu\mathrm{s})$ (called dBµs in [6]) were each quantized to eight values. The Cartesian product of the two sets generates an output alphabet of sixty-four one-dimensional observable symbols.

The architecture of the classifier is shown in Fig-2. Even though it is conceptually composed of two parts, the model is operated as a single model both in the training phase and when calculating the likelihood of a sequence in the test phase. Note that no transitions are provided from the second part back to the first part. Thus the first part of each model represents a memory of the first-packets profile of each protocol family, which is not modified by the effect of training on the fully-connected part. During the classification of a test sequence, the likelihood of the first protocol packets is captured and retained in this part of the model, and it affects the likelihood of the whole sequence through its composition with the likelihood calculated by the second part.

Figure 1. Proposed HMM model structure. (Labels: insert states, delete states and match states in the profile HMM part; fully-connected HMM part.)

Figure 2. Architecture of the proposed classifier. (Blocks: PS stream and IPT stream; k-means quantizers producing symbols 1..64; HMM 1 to HMM C; class decision 1..C.)
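The symbol construction and the maximum-likelihood decision of Equation (3) can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' implementation. The function names (`to_symbols`, `log_likelihood`, `classify`) and the symbol ordering `ps_idx * 8 + ipt_idx` are assumptions made for the example; the centroids are assumed to have been obtained beforehand by k-means; and every state is treated as emitting, so the silent delete states of the profile part would need additional handling.

```python
import numpy as np

N_LEVELS = 8  # eight quantization levels per variable, as stated in the text


def to_symbols(ps, ipt_us, ps_centroids, ipt_centroids):
    """Map (PS, IPT) pairs to one of 64 symbols by nearest centroid.

    ipt_us holds inter-packet times in microseconds (assumed > 0);
    the IPT centroids live on the 10*log10(IPT / 1 us) scale.
    """
    ps = np.asarray(ps, dtype=float)
    ipt_db = 10.0 * np.log10(np.asarray(ipt_us, dtype=float))
    ps_idx = np.abs(ps[:, None] - ps_centroids[None, :]).argmin(axis=1)
    ipt_idx = np.abs(ipt_db[:, None] - ipt_centroids[None, :]).argmin(axis=1)
    # Assumed encoding of the Cartesian product into a single index 0..63.
    return ps_idx * N_LEVELS + ipt_idx


def log_likelihood(symbols, pi, A, B):
    """Scaled forward algorithm: log P(O | lambda) for a discrete HMM.

    pi: (n_states,), A: (n_states, n_states), B: (n_states, n_symbols).
    """
    log_p = 0.0
    alpha = None
    for t, s in enumerate(symbols):
        alpha = pi * B[:, s] if t == 0 else (alpha @ A) * B[:, s]
        scale = alpha.sum()
        if scale == 0.0:          # sequence impossible under this model
            return -np.inf
        log_p += np.log(scale)    # product of scales equals P(O | lambda)
        alpha = alpha / scale
    return log_p


def classify(symbols, models):
    """Pick the class whose HMM gives the highest likelihood (equal priors).

    models: dict mapping class name -> (pi, A, B).
    """
    scores = {name: log_likelihood(symbols, *m) for name, m in models.items()}
    return max(scores, key=scores.get)
```

In this sketch, the constraint that the second part never returns to the first part would appear as a block of zeros in `A`; Baum-Welch re-estimation keeps such zero entries at zero, which is one way to preserve a fixed topology during training.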

IV. DATA AND MEASUREMENTS

In this work the unit of analysis is the flow, defined by the 5-tuple (source IP, source port, destination IP, destination port, transport protocol) with a timeout of 60 seconds. Therefore, a model must be built for the traffic in each direction. This study considered only the traffic in one direction for each host (e.g., packets with port 25 or 80 for SMTP or HTTP, respectively). In Fig-3, only the traffic leaving the target hosts and reaching the client computer was taken into account.

To build the class models, each HMM in the bank was obtained via the Baum-Welch training algorithm [17] using only the sequences of its class. Each model was trained with a trace containing 400 flows. The algorithm starts from an initial model in which all symbols are equally likely in every state. As is standard, all packets with an empty payload and all flows with fewer than 10 packets have been excluded from both the training and the test sets.

To evaluate the model, we use data from two packet traces, collected in the laboratory and in a campus network. Ground-truth information was established through a human-supervised data verification process. Using database tools, flows were filtered by 5-tuple and their contents examined for labeling. All known information, such as well-known port numbers and packet payload contents, including some protocol signatures, was used to identify the application within the flows. Flows whose labeling was unreliable were simply discarded.

Fig-3 describes the measurement setup used to acquire the traces. As shown in this figure, to facilitate the acquisition and labeling of the training traces, one client at a time was run for each reference application, on a single computer running only that application. Each set was then visualized and labeled manually in the laboratory using database tools.
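The flow definition and filtering rules above can be sketched as follows. This is an illustrative Python sketch under stated assumptions rather than the procedure actually used for the traces: the packet record layout, the names `build_flows` and `inter_packet_times`, and the choice to let empty-payload packets refresh the 60-second timeout are all hypothetical.

```python
TIMEOUT_S = 60.0   # flow timeout stated in the text
MIN_PACKETS = 10   # flows with fewer packets are excluded, as in the text


def build_flows(packets):
    """Group time-sorted packets into unidirectional flows.

    packets: iterable of tuples
        (timestamp_s, src_ip, src_port, dst_ip, dst_port, proto, payload_len)
    Returns a list of dicts with the 5-tuple key, payload sizes and timestamps.
    """
    flows = []       # every flow opened so far, in order of first packet
    current = {}     # 5-tuple -> index of its currently open flow
    last_seen = {}   # 5-tuple -> timestamp of its most recent packet
    for ts, src_ip, src_port, dst_ip, dst_port, proto, payload_len in packets:
        key = (src_ip, src_port, dst_ip, dst_port, proto)
        # Open a new flow for an unseen 5-tuple or after an idle timeout.
        if key not in current or ts - last_seen[key] > TIMEOUT_S:
            flows.append({"key": key, "ps": [], "ts": []})
            current[key] = len(flows) - 1
        last_seen[key] = ts  # assumption: any packet keeps the flow alive
        if payload_len > 0:  # packets with empty payload are not recorded
            flow = flows[current[key]]
            flow["ps"].append(payload_len)
            flow["ts"].append(ts)
    return [f for f in flows if len(f["ps"]) >= MIN_PACKETS]


def inter_packet_times(flow):
    """IPT sequence: differences between consecutive recorded timestamps."""
    ts = flow["ts"]
    return [b - a for a, b in zip(ts, ts[1:])]
```

Because the key is the directed 5-tuple, the two directions of a connection form two separate flows, which matches the statement that a model must be built for the traffic in each direction.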
Note that although the acquisition point is an isolated router, the packets were collected through the campus network, and therefore the (PS, IPT) sequence obtained is not artificial. In Fig-3, the Internet router interfaces directly with the Internet. This set of traces is used to obtain the HMM models. A total of 2400 flows were collected, 400 flows of each application.

The validation traces were collected from a campus network router under regular traffic. The trace file was pre-processed to separate the flows of the target applications. Again, each set was visualized and labeled manually in the laboratory. For the less frequent applications, the traffic was induced by a user. A total of 1200 flows were collected, 200 flows of each application, using tcpdump. Data were collected on various days and at various times over 3 months. Flows in the streaming class are made up of small video clips.

For performance comparison, results are also included for three statistical classifiers operating at the flow level, namely the minimum-distance-to-centroid, 1-NN and Naive Bayes classifiers, as well as for the two HMM models that preceded this proposal. For the flow-level classifiers, 8 features were extracted from the flows: the mean and standard deviation of PS, IPT, flow size in packets and flow size in bytes. Computations were performed with the support of WEKA [19] and Matlab [20].

Figure 3. Diagram of the measurement topology. (Labels: computer, router, Internet, servers; clients: ftp, telnet, http, mail; router, campus network, servers: ftp, telnet, http, mail; data for training and classification.)

V. RESULTS AND DISCUSSION

The validation results are presented in three tables. Accuracy is the percentage of test flows classified correctly by the models. Table 1 shows the results of the experimental validation of the proposed classifier against five other models used in the literature, including the two models that inspired this proposal, the profile HMM (5P-HMM) and the fully-connected HMM (5F-HMM).
The centroid, 1-NN and Naive Bayes classifiers are widely known, and a detailed description of each can be found in the references noted in the table. The table reports the classification accuracy when the decision is taken after observing the first five packets, after the first thirteen packets, and after the whole flow. The decision point at five packets was chosen because it is the memory size of the profile HMM part; the decision point at thirteen packets was determined experimentally (ad hoc) as the point at which a substantial improvement over the previous decision was achieved. The results show the superior performance of the proposed mixed model (5-Profile+5-Fully HMM). The last three rows of the table seem to confirm the working hypothesis that the behavior memory of the first packets improves the quality of the final decision. For comparison, note that with six classes the random hit rate would be 16.67%, while the 5P5F-HMM algorithm achieves 62.5% after seeing only 5 packets of the unidirectional flows.

Table 2 shows the performance of each classifier per traffic class. Note that the gain of the new model is consistent and that it outperforms the other classifiers in all classes. Only for the mail and telnet applications is the accuracy below 94%. Compared with 5P-HMM and 5F-HMM, the biggest advantage of the new model is 15.0%, for the streaming class, and the smallest is 8.0%, for the http and telnet classes.

Table 3 is the confusion matrix produced by the new classifier for the test dataset. The values on the diagonal show that the lowest rate of correct classification is 92.5%, higher than the rates obtained in [2], 90.23%, and in [6], 81.69%. The largest share of confusion (3.5%) is the one in which mail is classified as telnet. From this table, all other performance measures, such as false positive and false negative rates, can be derived.

Table I. Classification results: average accuracy, percent correctly predicted (total: 1200 flows).
Columns: Classifier; 5 packets (%); 13 packets (%); whole flow (%).
Rows: Centroid classifier (CC) [17]; 1-NN [5]; Naive Bayes (NB) [14]; 5-Fully-connected HMM; 5-Profile HMM; 5-Profile+5-Fully HMM.
[Numerical entries are missing from this transcription.]

Table II. Classification results: accuracy per application, after seeing the whole flow, in percent (total: 1200 flows).
Columns: Classifier; http; telnet; ftp; mail; stream; CS; average.
Rows: CC; 1-NN; NB; 5F; 5P; 5P5F.
[Numerical entries are missing from this transcription.]

Table III. Classification results: confusion matrix, after seeing the whole flow, in percent.
Rows and columns: http; telnet; ftp; mail; stream; CS.
[Numerical entries are missing from this transcription.]

Tables 1, 2 and 3 indicate clear advantages of the proposed model. The average accuracy obtained by the classifier is 62.5% after seeing five packets, 80.0% after examining 13 packets and 95.5% after seeing the entire unidirectional flow. Although standard deviations are not included, all results presented are average values obtained over 30 executions in the manner of cross-validation, each with a randomly selected 20% of the flows used for validation and 80% for training, using the mix of all traces.

It is worth noting that the 5P-HMM and 5F-HMM models implemented here for comparison are one-dimensional, use unidirectional traffic and adopt a set of sixty-four discrete symbols, which differentiates them from the similar models in [2] and [6] that inspired them. The superior results presented in Table 3, when compared with those obtained in [2] and [6], may not be conclusive because of the different data and models used. Unfortunately, the scientific community does not yet have data repositories for benchmarking traffic classification algorithms.
However, the results clearly show that, on the considered comparison basis, the hybrid model proposed here provides gains over the previously mentioned models.

VI. CONCLUSION

Internet traffic classification is important in many areas, such as bandwidth management, Quality of Service provisioning, network security and anomalous traffic detection. This paper proposed and analyzed a new type of Internet traffic classifier that combines the ideas of previous proposals [2], [6] in the hope of obtaining a model with the best properties of the original models. The experimental validation of this new model shows an improvement in classification accuracy when compared with other methods, under the same conditions, for the analyzed dataset.

Despite the promising results already obtained with this model, this research continues in several directions. Evaluation using large traffic traces from different network locations is the next step of this work. In addition, variations that may optimize the performance of the classifier should also be tested. For example, considering the flows in both directions, using continuous probability distributions and investigating the best combination of the number of states in the two parts of the HMM (profile HMM part and fully-connected HMM part) for each application may significantly improve the results already obtained. These aspects are under investigation and will be presented in a future paper.

In addition, further work is in progress in two directions. First, we are implementing sequential decision, in which the classifier updates the classification decision on each new packet received from the stream. Second, we will add a third part to the classification model, based on statistical characteristics: a classifier based on aggregate statistics of the whole flow (Naive Bayes).
In this new classifier, the decision about the class to which the flow belongs can occur at three different times, depending on the degree of confidence previously reached by the classifier, which may also change its decision at the end of the flow. The first decision point occurs after the first five packets are seen and is based on the profile HMM, which captures the specific features of the first protocol packets. The second decision point occurs after seeing the whole flow and is based on the fully-connected HMM, which captures the properties of the whole sequence of packets in the flow. At the third point, the classifier calculates some aggregate statistics for the flow and re-evaluates the decision. As already anticipated, depending on the advisability of acting on it, the decision taken in one stage can be announced and still be corrected by the next stages. The evidence and confidence of the three stages are combined cumulatively for decision making, not in isolation.

Tests are also being carried out using the two parts as separate classifiers (in parallel) and calculating the sequence likelihood as a linear combination of the two. This classification mode also allows a balance between the likelihood of the first protocol packets and that of the whole sequence. This paper does not present results for this classification method.

REFERENCES

[1] M. Roughan, S. Sen, O. Spatscheck, and N. Duffield, "Class-of-service mapping for QoS: a statistical signature-based approach to IP traffic classification," in Proc. the 4th ACM SIGCOMM.
[2] A. Dainotti, W. Donato, A. Pescape, and P. S. Rossi, "Classification of network traffic via packet-level hidden Markov models," in IEEE GLOBECOM 2008.
[3] A. W. Moore and K. Papagiannaki, "Toward the accurate identification of network applications," in PAM 2005.
[4] H. Kim, K. Claffy, M. Fomenkov, D. Barman, M. Faloutsos, and K. Lee, "Internet traffic classification demystified: Myths, caveats, and the best practices," in ACM CoNEXT 2008.
[5] L. Bernaille, R. Teixeira, and K. Salamatian, "Early application identification," in ACM CoNEXT 2006.
[6] C. Wright, F. Monrose, and G. Masson, "HMM profiles for network traffic classification," in VizSEC/DMSEC, pp. 9-15.
[7] C. Wright, F. Monrose, and G. Masson, "Towards better protocol identification using profile HMMs," JHU Tech. Rep. JHU-SPAR051201.
[8] H. Sun, A. Vetro, and J. Xin, "An overview of scalable video streaming: Research articles," Wireless Comm. and Mobile Computing, vol. 7.
[9] A. Moore and D. Zuev, "Internet traffic classification using Bayesian analysis techniques," in ACM SIGMETRICS 2005.
[10] G. Szabo, D. Orincsay, S. Malomsoky, and I. Szabo, "On the validation of traffic classification algorithms," in PAM 2008.
[11] B.-C. Park, Y. J. Win, M.-S. Kim, and J. W. Hong, "Towards automated application signature generation for traffic identification," in NOMS 2008.
[12] J. Erman, M. Arlitt, and A. Mahanti, "Traffic classification using clustering algorithms," in SIGCOMM 2006.
[13] M. Crotti, M. Dusi, F. Gringoli, and L. Salgarelli, "Traffic classification through simple statistical fingerprinting," SIGCOMM Comput. Commun. Rev., vol. 37, no. 1, pp. 5-16.
[14] W. Li, M. Canini, A. W. Moore, and R. Bolla, "Efficient application identification and the temporal and spatial stability of classification schema," Computer Networks, vol. 56, no. 3.
[15] T. T. Nguyen and G. Armitage, "A survey of techniques for internet traffic classification using machine learning," IEEE Communications Surveys and Tutorials.
[16] K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd Edition. New York: Academic Press.
[17] L. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE, vol. 77, no. 2.
[18] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd Edition. New York: Wiley.
[19] Waikato, "Weka: Data mining software in Java."
[20] MathWorks, "Matlab."


More information

Identify P2P Traffic by Inspecting Data Transfer Behaviour

Identify P2P Traffic by Inspecting Data Transfer Behaviour Identify P2P Traffic by Inspecting Data Transfer Behaviour Mingjiang Ye, Jianping Wu, Ke Xu, Dah Ming Chiu 2 Tsinghua National Laboratory for Information Science and Technology, Department of Computer

More information

A Preliminary Performance Comparison of Two Feature Sets for Encrypted Traffic Classification

A Preliminary Performance Comparison of Two Feature Sets for Encrypted Traffic Classification A Preliminary Performance Comparison of Two Feature Sets for Encrypted Traffic Classification Riyad Alshammari and A. Nur Zincir-Heywood Dalhousie University, Faculty of Computer Science {riyad,zincir}@cs.dal.ca

More information

Early Application Identification

Early Application Identification Early Application Identification Laurent Bernaille Renata Teixeira Kave Salamatian Université Pierre et Marie Curie - LIP6/CNRS Which applications run on my network? Internet Edge Network (campus, enterprise)

More information

Packet Classification in Co-mingled Traffic Streams

Packet Classification in Co-mingled Traffic Streams Packet Classification in Co-mingled Traffic Streams Siddharth Maru, Timothy X Brown Dept. of Electrical, Computer and Energy Engineering University of Colorado at Boulder, CO 80309-0530 {siddharth.maru,timxb}@colorado.edu

More information

Keywords Machine learning, Traffic classification, feature extraction, signature generation, cluster aggregation.

Keywords Machine learning, Traffic classification, feature extraction, signature generation, cluster aggregation. Volume 3, Issue 12, December 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Survey on

More information

The Comparative Study of Machine Learning Algorithms in Text Data Classification*

The Comparative Study of Machine Learning Algorithms in Text Data Classification* The Comparative Study of Machine Learning Algorithms in Text Data Classification* Wang Xin School of Science, Beijing Information Science and Technology University Beijing, China Abstract Classification

More information

A Visualization Tool to Improve the Performance of a Classifier Based on Hidden Markov Models

A Visualization Tool to Improve the Performance of a Classifier Based on Hidden Markov Models A Visualization Tool to Improve the Performance of a Classifier Based on Hidden Markov Models Gleidson Pegoretti da Silva, Masaki Nakagawa Department of Computer and Information Sciences Tokyo University

More information

BitTorrent Traffic Classification

BitTorrent Traffic Classification BitTorrent Traffic Classification Atwin O. Calchand, Van T. Dinh, Philip Branch, Jason But Centre for Advanced Internet Architectures, Technical Report 090227A Swinburne University of Technology Melbourne,

More information

Traffic Classification through Joint Distributions of Packet-level Statistics

Traffic Classification through Joint Distributions of Packet-level Statistics Traffic Classification through Joint Distributions of Packet-level Statistics Alberto Dainotti and Antonio Pescapé University of Napoli Federico II (Italy) Email: {alberto,pescape}@unina.it Hyun-chul Kim

More information

Best First and Greedy Search Based CFS and Naïve Bayes Algorithms for Hepatitis Diagnosis

Best First and Greedy Search Based CFS and Naïve Bayes Algorithms for Hepatitis Diagnosis Best First and Greedy Search Based CFS and Naïve Bayes Algorithms for Hepatitis Diagnosis CHAPTER 3 BEST FIRST AND GREEDY SEARCH BASED CFS AND NAÏVE BAYES ALGORITHMS FOR HEPATITIS DIAGNOSIS 3.1 Introduction

More information

Normalized Texture Motifs and Their Application to Statistical Object Modeling

Normalized Texture Motifs and Their Application to Statistical Object Modeling Normalized Texture Motifs and Their Application to Statistical Obect Modeling S. D. Newsam B. S. Manunath Center for Applied Scientific Computing Electrical and Computer Engineering Lawrence Livermore

More information

An Empirical Study of Lazy Multilabel Classification Algorithms

An Empirical Study of Lazy Multilabel Classification Algorithms An Empirical Study of Lazy Multilabel Classification Algorithms E. Spyromitros and G. Tsoumakas and I. Vlahavas Department of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece

More information

Machine Learning based Traffic Classification using Low Level Features and Statistical Analysis

Machine Learning based Traffic Classification using Low Level Features and Statistical Analysis Machine Learning based Traffic using Low Level Features and Statistical Analysis Rajesh Kumar M.Tech Scholar PTU Regional Center (SBBSIET) Jalandhar, India TajinderKaur Assistant Professor SBBSIET Padhiana

More information

Automated Traffic Classification and Application Identification using Machine Learning. Sebastian Zander, Thuy Nguyen, Grenville Armitage

Automated Traffic Classification and Application Identification using Machine Learning. Sebastian Zander, Thuy Nguyen, Grenville Armitage Automated Traffic Classification and Application Identification using Machine Learning Sebastian Zander, Thuy Nguyen, Grenville Armitage {szander,tnguyen,garmitage}@swin.edu.au Centre for Advanced Internet

More information

Early traffic classification using Support Vector Machines

Early traffic classification using Support Vector Machines Early traffic classification using Support Vector Machines Gabriel Gómez Sena Facultad de Ingeniería Universidad de la República Montevideo, Uruguay ggomez@fing.edu.uy Pablo Belzarena Facultad de Ingeniería

More information

Network Management without Payload Inspection: Application Classification via Statistical Analysis of Bulk Flow Data

Network Management without Payload Inspection: Application Classification via Statistical Analysis of Bulk Flow Data Future Network and MobileSummit 2012 Conference Proceedings Paul Cunningham and Miriam Cunningham (Eds) IIMC International Information Management Corporation, 2012 ISBN: 978-1-905824-29-8 Network Management

More information

Chapter 5: Summary and Conclusion CHAPTER 5 SUMMARY AND CONCLUSION. Chapter 1: Introduction

Chapter 5: Summary and Conclusion CHAPTER 5 SUMMARY AND CONCLUSION. Chapter 1: Introduction CHAPTER 5 SUMMARY AND CONCLUSION Chapter 1: Introduction Data mining is used to extract the hidden, potential, useful and valuable information from very large amount of data. Data mining tools can handle

More information

Using Visual Motifs to Classify Encrypted Traffic

Using Visual Motifs to Classify Encrypted Traffic Using Visual Motifs to Classify Encrypted Traffic VizSEC'06 - November 3, 2006 Charles V Wright Fabian Monrose Gerald M Masson Johns Hopkins University Information Security Institute Traffic Classification:

More information

Investigating Two Different Approaches for Encrypted Traffic Classification

Investigating Two Different Approaches for Encrypted Traffic Classification Investigating Two Different Approaches for Encrypted Traffic Classification Riyad Alshammari and A. Nur Zincir-Heywood Faculty of Computer Science, Dalhousie University 6050 University Avenue Halifax,

More information

Using a genetic algorithm for editing k-nearest neighbor classifiers

Using a genetic algorithm for editing k-nearest neighbor classifiers Using a genetic algorithm for editing k-nearest neighbor classifiers R. Gil-Pita 1 and X. Yao 23 1 Teoría de la Señal y Comunicaciones, Universidad de Alcalá, Madrid (SPAIN) 2 Computer Sciences Department,

More information

Generalization and Optimization of Feature Set for Accurate Identification of P2P Traffic in the Internet using Neural Network

Generalization and Optimization of Feature Set for Accurate Identification of P2P Traffic in the Internet using Neural Network Generalization and Optimization of Feature Set for Accurate Identification of P2P Traffic in the Internet using Neural Network S. AGRAWAL, B.S. SOHI Department of Electronics & Communication Engineering

More information

Datasets Size: Effect on Clustering Results

Datasets Size: Effect on Clustering Results 1 Datasets Size: Effect on Clustering Results Adeleke Ajiboye 1, Ruzaini Abdullah Arshah 2, Hongwu Qin 3 Faculty of Computer Systems and Software Engineering Universiti Malaysia Pahang 1 {ajibraheem@live.com}

More information

Color-Based Classification of Natural Rock Images Using Classifier Combinations

Color-Based Classification of Natural Rock Images Using Classifier Combinations Color-Based Classification of Natural Rock Images Using Classifier Combinations Leena Lepistö, Iivari Kunttu, and Ari Visa Tampere University of Technology, Institute of Signal Processing, P.O. Box 553,

More information

Network Traffic Classification Using Correlation Information

Network Traffic Classification Using Correlation Information 104 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 24, NO. 1, JANUARY 2013 Network Traffic Classification Using Correlation Information Jun Zhang, Member, IEEE, Yang Xiang, Member, IEEE, Yu

More information

Performance Analysis of Data Mining Classification Techniques

Performance Analysis of Data Mining Classification Techniques Performance Analysis of Data Mining Classification Techniques Tejas Mehta 1, Dr. Dhaval Kathiriya 2 Ph.D. Student, School of Computer Science, Dr. Babasaheb Ambedkar Open University, Gujarat, India 1 Principal

More information

Semi-Supervised Clustering with Partial Background Information

Semi-Supervised Clustering with Partial Background Information Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject

More information

Use of Multi-category Proximal SVM for Data Set Reduction

Use of Multi-category Proximal SVM for Data Set Reduction Use of Multi-category Proximal SVM for Data Set Reduction S.V.N Vishwanathan and M Narasimha Murty Department of Computer Science and Automation, Indian Institute of Science, Bangalore 560 012, India Abstract.

More information

Internet Traffic Classification using Machine Learning

Internet Traffic Classification using Machine Learning Internet Traffic Classification using Machine Learning by Alina Lapina 2018, UiO, INF5050 Alina Lapina, Master student at IFI, Full stack developer at Ciber Experis 2 Based on Thuy T. T. Nguyen, Grenville

More information

Generalization of Signatures for SSH Encrypted Traffic Identification

Generalization of Signatures for SSH Encrypted Traffic Identification Generalization of Signatures for SSH Encrypted Traffic Identification Riyad Alshammari and A. Nur Zincir-Heywood Faculty of Computer Science, Dalhousie University 6050 University Avenue Halifax, NS, Canada

More information

Visualization of Internet Traffic Features

Visualization of Internet Traffic Features Visualization of Internet Traffic Features Jiraporn Pongsiri, Mital Parikh, Miroslova Raspopovic and Kavitha Chandra Center for Advanced Computation and Telecommunications University of Massachusetts Lowell,

More information

Random projection for non-gaussian mixture models

Random projection for non-gaussian mixture models Random projection for non-gaussian mixture models Győző Gidófalvi Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 92037 gyozo@cs.ucsd.edu Abstract Recently,

More information

Data Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005

Data Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005 Data Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005 Abstract Deciding on which algorithm to use, in terms of which is the most effective and accurate

More information

ECG782: Multidimensional Digital Signal Processing

ECG782: Multidimensional Digital Signal Processing ECG782: Multidimensional Digital Signal Processing Object Recognition http://www.ee.unlv.edu/~b1morris/ecg782/ 2 Outline Knowledge Representation Statistical Pattern Recognition Neural Networks Boosting

More information

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant

More information

BLINC: Multilevel Traffic Classification in the Dark

BLINC: Multilevel Traffic Classification in the Dark BLINC: Multilevel Traffic Classification in the Dark Thomas Karagiannis, UC Riverside Konstantina Papagiannaki, Intel Research Cambridge Michalis Faloutsos, UC Riverside The problem of workload characterization

More information

ABSTRACT. 1. Introduction. identificationn. remotely. P2P applications need hard to. most exciting. areas of Inter- centralized to. system.

ABSTRACT. 1. Introduction. identificationn. remotely. P2P applications need hard to. most exciting. areas of Inter- centralized to. system. Journal of Applied Mathematics and Physics,, 2013, 1, 56-62 http://dx.doi.org/10.4236/jamp..2013.14011 Published Online October 2013 (http://www.scirp.org/journal/jamp) EPFIA: Extensible P2P Flows Identification

More information

Hidden Markov Models. Slides adapted from Joyce Ho, David Sontag, Geoffrey Hinton, Eric Xing, and Nicholas Ruozzi

Hidden Markov Models. Slides adapted from Joyce Ho, David Sontag, Geoffrey Hinton, Eric Xing, and Nicholas Ruozzi Hidden Markov Models Slides adapted from Joyce Ho, David Sontag, Geoffrey Hinton, Eric Xing, and Nicholas Ruozzi Sequential Data Time-series: Stock market, weather, speech, video Ordered: Text, genes Sequential

More information

Heuristics to Classify Internet Backbone Traffic based on Connection Patterns

Heuristics to Classify Internet Backbone Traffic based on Connection Patterns Heuristics to Classify Internet Backbone Traffic based on Connection Patterns Wolfgang John and Sven Tafvelin Department of Computer Science and Engieneering Chalmers University of Technolgy Göteborg,

More information

Active Build-Model Random Forest Method for Network Traffic Classification

Active Build-Model Random Forest Method for Network Traffic Classification Active Build-Model Random Forest Method for Network Traffic Classification Alhamza Munther #1, Rozmie Razif #2, Shahrul Nizam #3, Naseer Sabri #4, Mohammed Anbar *5 #1, 2, 3, 4 School of Computer and Communication

More information

NOVEL HYBRID GENETIC ALGORITHM WITH HMM BASED IRIS RECOGNITION

NOVEL HYBRID GENETIC ALGORITHM WITH HMM BASED IRIS RECOGNITION NOVEL HYBRID GENETIC ALGORITHM WITH HMM BASED IRIS RECOGNITION * Prof. Dr. Ban Ahmed Mitras ** Ammar Saad Abdul-Jabbar * Dept. of Operation Research & Intelligent Techniques ** Dept. of Mathematics. College

More information

Flow-based Anomaly Intrusion Detection System Using Neural Network

Flow-based Anomaly Intrusion Detection System Using Neural Network Flow-based Anomaly Intrusion Detection System Using Neural Network tational power to analyze only the basic characteristics of network flow, so as to Intrusion Detection systems (KBIDES) classify the data

More information

TOWARDS HIGH-PERFORMANCE NETWORK APPLICATION IDENTIFICATION WITH AGGREGATE-FLOW CACHE

TOWARDS HIGH-PERFORMANCE NETWORK APPLICATION IDENTIFICATION WITH AGGREGATE-FLOW CACHE TOWARDS HIGH-PERFORMANCE NETWORK APPLICATION IDENTIFICATION WITH AGGREGATE-FLOW CACHE Fei He 1, 2, Fan Xiang 1, Yibo Xue 2,3 and Jun Li 2,3 1 Department of Automation, Tsinghua University, Beijing, China

More information

1. INTRODUCTION. AMS Subject Classification. 68U10 Image Processing

1. INTRODUCTION. AMS Subject Classification. 68U10 Image Processing ANALYSING THE NOISE SENSITIVITY OF SKELETONIZATION ALGORITHMS Attila Fazekas and András Hajdu Lajos Kossuth University 4010, Debrecen PO Box 12, Hungary Abstract. Many skeletonization algorithms have been

More information

Can Passive Mobile Application Traffic be Identified using Machine Learning Techniques

Can Passive Mobile Application Traffic be Identified using Machine Learning Techniques Dublin Institute of Technology ARROW@DIT Dissertations School of Computing 2015-03-10 Can Passive Mobile Application Traffic be Identified using Machine Learning Techniques Peter Holland Dublin Institute

More information

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate

More information

A Performance Evaluation of HMM and DTW for Gesture Recognition

A Performance Evaluation of HMM and DTW for Gesture Recognition A Performance Evaluation of HMM and DTW for Gesture Recognition Josep Maria Carmona and Joan Climent Barcelona Tech (UPC), Spain Abstract. It is unclear whether Hidden Markov Models (HMMs) or Dynamic Time

More information

A NEW HYBRID APPROACH FOR NETWORK TRAFFIC CLASSIFICATION USING SVM AND NAÏVE BAYES ALGORITHM

A NEW HYBRID APPROACH FOR NETWORK TRAFFIC CLASSIFICATION USING SVM AND NAÏVE BAYES ALGORITHM Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 6.017 IJCSMC,

More information

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,

More information

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate

More information

Robust Network Traffic Classification

Robust Network Traffic Classification IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 23, NO. 4, AUGUST 2015 1257 Robust Network Traffic Classification Jun Zhang, Member, IEEE, XiaoChen, Student Member, IEEE, YangXiang, Senior Member, IEEE, Wanlei

More information

Face recognition using Singular Value Decomposition and Hidden Markov Models

Face recognition using Singular Value Decomposition and Hidden Markov Models Face recognition using Singular Value Decomposition and Hidden Markov Models PETYA DINKOVA 1, PETIA GEORGIEVA 2, MARIOFANNA MILANOVA 3 1 Technical University of Sofia, Bulgaria 2 DETI, University of Aveiro,

More information

Machine Learning for Pre-emptive Identification of Performance Problems in UNIX Servers Helen Cunningham

Machine Learning for Pre-emptive Identification of Performance Problems in UNIX Servers Helen Cunningham Final Report for cs229: Machine Learning for Pre-emptive Identification of Performance Problems in UNIX Servers Helen Cunningham Abstract. The goal of this work is to use machine learning to understand

More information

Temporal Modeling and Missing Data Estimation for MODIS Vegetation data

Temporal Modeling and Missing Data Estimation for MODIS Vegetation data Temporal Modeling and Missing Data Estimation for MODIS Vegetation data Rie Honda 1 Introduction The Moderate Resolution Imaging Spectroradiometer (MODIS) is the primary instrument on board NASA s Earth

More information

Applying Supervised Learning

Applying Supervised Learning Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains

More information

An ICA based Approach for Complex Color Scene Text Binarization

An ICA based Approach for Complex Color Scene Text Binarization An ICA based Approach for Complex Color Scene Text Binarization Siddharth Kherada IIIT-Hyderabad, India siddharth.kherada@research.iiit.ac.in Anoop M. Namboodiri IIIT-Hyderabad, India anoop@iiit.ac.in

More information

Weka ( )

Weka (  ) Weka ( http://www.cs.waikato.ac.nz/ml/weka/ ) The phases in which classifier s design can be divided are reflected in WEKA s Explorer structure: Data pre-processing (filtering) and representation Supervised

More information

CS 543: Final Project Report Texture Classification using 2-D Noncausal HMMs

CS 543: Final Project Report Texture Classification using 2-D Noncausal HMMs CS 543: Final Project Report Texture Classification using 2-D Noncausal HMMs Felix Wang fywang2 John Wieting wieting2 Introduction We implement a texture classification algorithm using 2-D Noncausal Hidden

More information

Short Survey on Static Hand Gesture Recognition

Short Survey on Static Hand Gesture Recognition Short Survey on Static Hand Gesture Recognition Huu-Hung Huynh University of Science and Technology The University of Danang, Vietnam Duc-Hoang Vo University of Science and Technology The University of

More information

A Data Classification Algorithm of Internet of Things Based on Neural Network

A Data Classification Algorithm of Internet of Things Based on Neural Network A Data Classification Algorithm of Internet of Things Based on Neural Network https://doi.org/10.3991/ijoe.v13i09.7587 Zhenjun Li Hunan Radio and TV University, Hunan, China 278060389@qq.com Abstract To

More information

Computer Vision. Exercise Session 10 Image Categorization

Computer Vision. Exercise Session 10 Image Categorization Computer Vision Exercise Session 10 Image Categorization Object Categorization Task Description Given a small number of training images of a category, recognize a-priori unknown instances of that category

More information

Scalable Coding of Image Collections with Embedded Descriptors

Scalable Coding of Image Collections with Embedded Descriptors Scalable Coding of Image Collections with Embedded Descriptors N. Adami, A. Boschetti, R. Leonardi, P. Migliorati Department of Electronic for Automation, University of Brescia Via Branze, 38, Brescia,

More information

Fully Automatic Methodology for Human Action Recognition Incorporating Dynamic Information

Fully Automatic Methodology for Human Action Recognition Incorporating Dynamic Information Fully Automatic Methodology for Human Action Recognition Incorporating Dynamic Information Ana González, Marcos Ortega Hortas, and Manuel G. Penedo University of A Coruña, VARPA group, A Coruña 15071,

More information

Improving Time Series Classification Using Hidden Markov Models

Improving Time Series Classification Using Hidden Markov Models Improving Time Series Classification Using Hidden Markov Models Bilal Esmael Arghad Arnaout Rudolf K. Fruhwirth Gerhard Thonhauser University of Leoben TDE GmbH TDE GmbH University of Leoben Leoben, Austria

More information

Unsupervised Learning

Unsupervised Learning Networks for Pattern Recognition, 2014 Networks for Single Linkage K-Means Soft DBSCAN PCA Networks for Kohonen Maps Linear Vector Quantization Networks for Problems/Approaches in Machine Learning Supervised

More information

Hybrid Feature Selection for Modeling Intrusion Detection Systems

Hybrid Feature Selection for Modeling Intrusion Detection Systems Hybrid Feature Selection for Modeling Intrusion Detection Systems Srilatha Chebrolu, Ajith Abraham and Johnson P Thomas Department of Computer Science, Oklahoma State University, USA ajith.abraham@ieee.org,

More information

Training on multiple sub-flows to optimise the use of Machine Learning classifiers in real-world IP networks

Training on multiple sub-flows to optimise the use of Machine Learning classifiers in real-world IP networks Training on multiple sub-flows to optimise the use of Machine Learning classifiers in real-world IP networks Thuy T.T. Nguyen, Grenville Armitage Centre for Advanced Internet Architectures Swinburne University

More information

Computer Communications

Computer Communications Computer Communications 33 (2) 4 5 Contents lists available at ScienceDirect Computer Communications journal homepage: www.elsevier.com/locate/comcom Identify P2P traffic by inspecting data transfer behavior

More information

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data Ms. Gayatri Attarde 1, Prof. Aarti Deshpande 2 M. E Student, Department of Computer Engineering, GHRCCEM, University

More information

Computational Statistics The basics of maximum likelihood estimation, Bayesian estimation, object recognitions

Computational Statistics The basics of maximum likelihood estimation, Bayesian estimation, object recognitions Computational Statistics The basics of maximum likelihood estimation, Bayesian estimation, object recognitions Thomas Giraud Simon Chabot October 12, 2013 Contents 1 Discriminant analysis 3 1.1 Main idea................................

More information

An Efficient Approach for Color Pattern Matching Using Image Mining

An Efficient Approach for Color Pattern Matching Using Image Mining An Efficient Approach for Color Pattern Matching Using Image Mining * Manjot Kaur Navjot Kaur Master of Technology in Computer Science & Engineering, Sri Guru Granth Sahib World University, Fatehgarh Sahib,

More information

KEYWORDS: Clustering, RFPCM Algorithm, Ranking Method, Query Redirection Method.

KEYWORDS: Clustering, RFPCM Algorithm, Ranking Method, Query Redirection Method. IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY IMPROVED ROUGH FUZZY POSSIBILISTIC C-MEANS (RFPCM) CLUSTERING ALGORITHM FOR MARKET DATA T.Buvana*, Dr.P.krishnakumari *Research

More information

CANCER PREDICTION USING PATTERN CLASSIFICATION OF MICROARRAY DATA. By: Sudhir Madhav Rao &Vinod Jayakumar Instructor: Dr.

CANCER PREDICTION USING PATTERN CLASSIFICATION OF MICROARRAY DATA. By: Sudhir Madhav Rao &Vinod Jayakumar Instructor: Dr. CANCER PREDICTION USING PATTERN CLASSIFICATION OF MICROARRAY DATA By: Sudhir Madhav Rao &Vinod Jayakumar Instructor: Dr. Michael Nechyba 1. Abstract The objective of this project is to apply well known

More information

Using Hidden Markov Models to analyse time series data

Using Hidden Markov Models to analyse time series data Using Hidden Markov Models to analyse time series data September 9, 2011 Background Want to analyse time series data coming from accelerometer measurements. 19 different datasets corresponding to different

More information

CS145: INTRODUCTION TO DATA MINING

CS145: INTRODUCTION TO DATA MINING CS145: INTRODUCTION TO DATA MINING 08: Classification Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu October 24, 2017 Learnt Prediction and Classification Methods Vector Data

More information

DESIGN AND EVALUATION OF MACHINE LEARNING MODELS WITH STATISTICAL FEATURES

DESIGN AND EVALUATION OF MACHINE LEARNING MODELS WITH STATISTICAL FEATURES EXPERIMENTAL WORK PART I CHAPTER 6 DESIGN AND EVALUATION OF MACHINE LEARNING MODELS WITH STATISTICAL FEATURES The evaluation of models built using statistical in conjunction with various feature subset

More information

Effect of Principle Component Analysis and Support Vector Machine in Software Fault Prediction

Effect of Principle Component Analysis and Support Vector Machine in Software Fault Prediction International Journal of Computer Trends and Technology (IJCTT) volume 7 number 3 Jan 2014 Effect of Principle Component Analysis and Support Vector Machine in Software Fault Prediction A. Shanthini 1,

More information

Associating Terms with Text Categories

Associating Terms with Text Categories Associating Terms with Text Categories Osmar R. Zaïane Department of Computing Science University of Alberta Edmonton, AB, Canada zaiane@cs.ualberta.ca Maria-Luiza Antonie Department of Computing Science

More information

Person Authentication from Video of Faces: A Behavioral and Physiological Approach Using Pseudo Hierarchical Hidden Markov Models

Person Authentication from Video of Faces: A Behavioral and Physiological Approach Using Pseudo Hierarchical Hidden Markov Models Person Authentication from Video of Faces: A Behavioral and Physiological Approach Using Pseudo Hierarchical Hidden Markov Models Manuele Bicego 1, Enrico Grosso 1, and Massimo Tistarelli 2 1 DEIR - University

More information

2. Basic Task of Pattern Classification

2. Basic Task of Pattern Classification 2. Basic Task of Pattern Classification Definition of the Task Informal Definition: Telling things apart 3 Definition: http://www.webopedia.com/term/p/pattern_recognition.html pattern recognition Last

More information

Traffic Classification Using Visual Motifs: An Empirical Evaluation

Traffic Classification Using Visual Motifs: An Empirical Evaluation Traffic Classification Using Visual Motifs: An Empirical Evaluation Wilson Lian 1 Fabian Monrose 1 John McHugh 1,2 1 University of North Carolina at Chapel Hill 2 RedJack, LLC VizSec 2010 Overview Background

More information

Noise-based Feature Perturbation as a Selection Method for Microarray Data

Noise-based Feature Perturbation as a Selection Method for Microarray Data Noise-based Feature Perturbation as a Selection Method for Microarray Data Li Chen 1, Dmitry B. Goldgof 1, Lawrence O. Hall 1, and Steven A. Eschrich 2 1 Department of Computer Science and Engineering

More information

Det De e t cting abnormal event n s Jaechul Kim

Det De e t cting abnormal event n s Jaechul Kim Detecting abnormal events Jaechul Kim Purpose Introduce general methodologies used in abnormality detection Deal with technical details of selected papers Abnormal events Easy to verify, but hard to describe

More information

Machine Learning in Biology

Machine Learning in Biology Università degli studi di Padova Machine Learning in Biology Luca Silvestrin (Dottorando, XXIII ciclo) Supervised learning Contents Class-conditional probability density Linear and quadratic discriminant

More information

Object of interest discovery in video sequences

Object of interest discovery in video sequences Object of interest discovery in video sequences A Design Project Report Presented to Engineering Division of the Graduate School Of Cornell University In Partial Fulfillment of the Requirements for the

More information