Network Traffic Classification by Common Subsequence Finding


Krzysztof Fabjański and Tomasz Kruk

NASK, The Research Division, Wąwozowa 18, Warszawa, Poland

Abstract. This paper describes issues related to network traffic analysis, in particular the problem of network traffic identification and classification. It presents two bioinformatics methods, Clustal and Center Star, both adapted for network security purposes. Both methods extract a common subsequence from a multiple sequence alignment of more than two network attack signatures. This concept was inspired by bioinformatics solutions to the problem of finding similarities in a set of DNA, RNA or amino acid sequences. The paper also describes the test procedures and their results in detail, and closes with an evaluation of both methods and conclusions.

Keywords: network traffic analysis, anomaly detection, network intrusion detection systems, common subsequence finding, bioinformatics algorithms, Clustal algorithm, Center Star method, automated generation of network attack signatures

1 Introduction

The Internet has become one of the most popular tools, used by almost everyone. Note that the Internet and the World Wide Web (WWW) are not synonymous: the World Wide Web is one of the many services available on the Internet. The Internet consists of an enormous number of computer networks, which is why network security is so important. Network security is not only a set of technical methods for ensuring safety; it also includes a network security policy that should be obeyed. Institutions and companies introduce their own security policies, often based on known standards.
Unfortunately, this approach does not guarantee that valuable resources will remain unaffected, so more sophisticated methods should be introduced. One of the most recognized families of such systems is network intrusion detection systems. These systems raise alerts about unwanted and malicious activity registered in the network flow. Identifying a malicious network flow involves comparing the flow content with a predefined set of rules. The set of rules, also known as a set of network attack signatures, describes different Internet threats by mapping their content into a specific format.

Despite such methods of identifying malicious flows, there are many new Internet threats that have not been discovered yet. Fortunately, there are methods and heuristic approaches that make it possible to identify new Internet threats by following network trends and statistics. Although those methods are very promising, there is still a strong need for new algorithms. Such algorithms should be capable of analysing large collections of automatically produced attack signatures for network intrusion detection systems. Several new approaches have been proposed to support this process. One way of analysing attack signature collections is the bioinformatics approach. Multiple sequence alignment is a fundamental tool in bioinformatics analysis; it finds similarities embedded in a set of DNA, RNA or amino acid sequences. The bioinformatics approach can be adapted to the network traffic identification and classification problem.

The second section of this article presents different systems for network traffic analysis. Section three briefly describes two bioinformatics methods: Center Star and Clustal. The fourth section includes various test results. The last section discusses the complexity of the algorithms and their suitability for network traffic analysis.

M. Bubak et al. (Eds.): ICCS 2008, Part I, LNCS 5101, 2008. © Springer-Verlag Berlin Heidelberg 2008

2 Network Traffic Classification and Identification Problem

Computer threats are often the cause of unwanted incidents, which might cause irreversible damage to the system. From the scientific point of view, computer threats exploit particular vulnerabilities; threats, vulnerabilities and exposures should therefore be treated as distinct issues. One of the most popular and widespread groups of Internet threats is the Internet worm [1].
Intrusion detection systems (IDS) [2] detect malicious network flows mainly by analysing their content. Network intrusion detection systems (NIDS) are one example of IDS. NIDS are able to detect many types of malicious network traffic, including worms. One of the most popular NIDS is Snort, an open source program available for various architectures. Snort analyses the network flow by comparing its content (not only the payload) with a specific list of rules, and it uses a regular expression engine to enhance this analysis. As a result, Snort decides whether a particular network flow is malicious or regular. An example of a simple Snort rule is shown in Table 1.

Table 1. An exemplary Snort rule

alert udp $EXTERNAL_NET 22 -> $HTTP_SERVERS 22 ( msg:"misc slapper worm admin traffic"; content:" E "; depth:1; reference:url,isc.incidents.org/analysis.html?id=167; reference:url, classtype:trojan-activity; sid:1889; rev:5;)
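The content comparison performed by such a rule can be illustrated with a minimal sketch. It assumes only the usual meaning of the `content` and `depth` options (the pattern must occur within the first `depth` bytes of the payload); the function name `content_matches` and the sample payloads are hypothetical and are not part of Snort.

```python
from typing import Optional


def content_matches(payload: bytes, content: bytes, depth: Optional[int] = None) -> bool:
    """Hypothetical helper: check a single content/depth pair against a payload.

    If depth is given, only the first `depth` bytes of the payload are searched.
    """
    window = payload if depth is None else payload[:depth]
    return content in window


# Illustrative payloads (made up for this sketch).
print(content_matches(b"E-rest-of-packet", b"E", depth=1))   # pattern at offset 0
print(content_matches(b"xE-rest-of-packet", b"E", depth=1))  # pattern outside the window
```

A real rule combines several such options with header checks (protocol, addresses, ports), which this sketch deliberately leaves out.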

Snort works as a single-threaded application. It receives, decodes and analyses incoming packets, and it identifies unwanted malicious network flows by generating appropriate alerts. The main problem is that if the set of Snort rules is of poor quality, many false positive or false negative alerts can be expected. Therefore, classifying network attack signatures and improving their quality is a matter of great importance.

Very often NIDS are combined with systems for automated generation of network attack signatures, such as Honeycomb [3,4]. Tools that combine the functions of a NIDS and an automated signature creation system are known as network early warning systems (NEWS). An example of a network early warning system is Arakis [5]. NEWS employ very sophisticated methods of network traffic classification in order to speed up the identification of potential new Internet threats. The main problem in the classification and identification of network flows is the extraction of common regions from sets of network attack signatures [6].

Many techniques have been developed. One technique that allows network security specialists to distinguish regular network flow from suspicious flow is the use of honeypots [7]. A honeypot is a specially designed system that simulates some network resources in order to capture malicious flow. Generally, it consists of a part of an isolated, unprotected and monitored network with some virtually simulated computers that appear to hold valuable resources or information. Therefore, any flow that occurs inside the honeypot is assumed to be malicious by definition. Protocol analysis and pattern detection techniques applied to flows collected by honeypots result in the generation of network attack signatures. Generation of network attack signatures is mainly based on longest common substring extraction [4,8].
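The longest common substring extraction mentioned above can be sketched as follows. This is a simple dynamic-programming illustration under stated assumptions, not the suffix-tree approach of [8] and not Honeycomb's actual implementation; the function name and the two sample payloads are chosen here for illustration only.

```python
def longest_common_substring(a: bytes, b: bytes) -> bytes:
    """Return the first longest contiguous byte run shared by a and b."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)            # previous row of the DP table
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ends at a[i-2], b[j-2].
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


flow_a = b"GET /a/a.htm HTTP"
flow_b = b"GET / HTTP/1.1"
print(longest_common_substring(flow_a, flow_b))
```

The table costs O(|a| * |b|) time, which is acceptable for a sketch; suffix trees reduce this to linear time at the price of a more involved data structure.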
One of the tools that generates network attack signatures is Honeycomb.

3 Sequence Alignment

Sequence alignment is a tool that can be used to extract common regions from a set of network attack signatures [6]. Extraction of common regions is shown in Fig. 1. The task resembles that of a biologist, who identifies newly discovered genes by comparing them to families of genes whose function is already known. The comparison assigns the newly discovered genes to known families by common subsequence finding. The problem of extracting common regions from a set of network attack signatures is in fact the multiple sequence alignment (MSA) [9] problem. MSA is a generalization of pairwise alignment [10]: gaps are inserted into each string so that the resulting strings have equal length.

GET / HTTP
GET /a/a.htm HTTP
GET / HTTP/1.1

Fig. 1. Problem of the longest common subsequence finding

Although multiple sequence alignment is an NP-complete problem, there are many heuristic, probabilistic and other approaches that cope with it. A classification of those methods was proposed in [10]. Among these algorithms, two classical approaches were chosen. The first, and probably the most basic, is the Center Star method. It was chosen for network traffic identification; the main goal of this adaptation was to check whether the method can be used to extract common regions from network attack signatures. The second algorithm, required for the classification of network attack signatures, is Clustal. Both algorithms use global alignment, computed with the Needleman-Wunsch [20] algorithm.

3.1 Center Star Method

The Center Star method [11] belongs to the group of approximation algorithms. As mentioned before, multiple sequence alignment is an NP-complete problem, and the Center Star method approximates it, so its results may, but need not, be optimal. The Center Star method consists of three main steps. A detailed description of the method can be found in [11].

3.2 Clustal Algorithm

Clustering is a method that classifies objects into appropriate groups (clusters). Classification is based on a defined distance measure, and every object in a single cluster should share a common trait. Data clustering is widely used in many fields of science, including data mining, pattern recognition and bioinformatics. Data clustering algorithms can be divided into two main categories:

hierarchical methods: assign objects to particular clusters by measuring the distance between them,
partitioning algorithms: start with an initial partition and then optimize an objective function by an iterative control strategy. New clusters are generated and the cluster centers are then recomputed. Every cluster is represented by its gravity center or by its center object.

Hierarchical methods, in turn, fall into two types:

agglomerative: the clustering procedure begins with each element as a separate cluster. Merging them into larger clusters, we reach the point where all elements are classified into one big cluster,
divisive: the process starts from one big set, which is then divided into successively smaller subsets.

Clustal is an example of an agglomerative algorithm, also known as the bottom-up approach. During the implementation of the Clustal algorithm some modifications were made in order to adapt the method to the classification of network attack signatures. Instead of a profile representation of internal nodes in the dendrogram, a consensus sequence was used. The main reason is that so far the scoring scheme used for network traffic classification has a very basic structure. Assuming that a network flow can be represented as a sequence of extended ASCII characters, we have 1 for each match and 0 otherwise. Although the standard objective function is the only reasonable solution at the moment, there is some research [12] that may result in a proposal for a new scoring scheme.

4 Tests and Results

This section describes in detail the efficiency tests of the Center Star method and the Clustal algorithm. Tests were executed on an Intel(R) Xeon(TM) CPU 3.0 GHz computer equipped with kb of the total memory. The code was compiled with g++ (v4.1). The tests used external data sets extracted from the Arakis database, so some test results could be compared with the results of the Arakis algorithms. The data consisted of real network signatures suspected to be malicious. The Arakis algorithms were mainly based on the DBSCAN [19] clustering mechanism and edit distance measurement.

4.1 Center Star Method Tests

Fig. 2 (a) presents the experimental data set. The horizontal axis represents the total number of characters (the sum of the lengths of the network attack signatures). The vertical axis represents the number of processed signatures. This data set was used in the Center Star method tests. Fig. 2 (b) shows the execution time of the Center Star method, measured in seconds. Fig. 2 (c) shows the relation between the length of the multiple sequence alignment (MSA) and the length of the common subsequence extracted from the MSA. This relation gives information about the total length of the extracted subsequence. In the next test, we measured the average length of a single division in one signature.
Assuming that a single network attack signature may consist of many parts, this test gave approximate information about the quality of the extracted common subsequence. The greater the average length of a single division in a network attack signature, the lower the probability of false positive or false negative alerts. The Center Star method was compared with the Arakis algorithms; the results are shown in Fig. 2 (d). In some cases the Arakis algorithm seems to obtain better results than the Center Star method. Those cases were investigated closely, and it turned out that the reason lies in a different interpretation of cluster representatives. In some cases the Arakis algorithm does not update the cluster representatives even if some very long network attack signatures have expired. As a consequence, the average single division length of the common subsequence is overestimated.

4.2 Clustal Algorithm Tests

Most of the Clustal tests were performed in order to compare the results with those of the Arakis algorithm. The data used in the tests are shown in Fig. 3 (a).

Fig. 2. Center Star method tests: (a) the number of characters vs the number of signatures; (b) the Center Star method execution time; (c) the MSA and LCS relation; (d) average single division length of the common subsequence, Arakis vs Center Star

The next test (Fig. 3 (b)) investigates the Clustal algorithm execution time with respect to the total number of processed characters. Time was measured in seconds. Fig. 4 compares the Arakis clustering algorithm with the Clustal method. The comparison was made in order to show the main advantage of the Clustal algorithm, which is the possibility of adjustment. Two subfigures (a,c) present the number of clusters produced by the Arakis and Clustal algorithms. In the first subfigure (a), the Clustal algorithm produces a smaller number of clusters than the Arakis solution. However, the precision in that test was rather poor (b). The smaller number of clusters was achieved using EPS1 = 0.1. EPS1 is an epsilon that determines whether the investigated signature should be classified to a particular cluster. The condition is considered satisfied when the distance (Levenshtein distance [18]) between two signatures is greater than EPS1. Precision is expressed as the ratio of the MSA length to the LCS length. The closer the ratio is to 1, the better the precision. Better precision is obtained at the cost of a greater number of clusters. In the next subfigures (c,d), EPS1 was set to 0.9. For this value, the Clustal algorithm produces many more clusters than the Arakis algorithm. On the other hand, the precision gained in those two tests was very high.

Fig. 3. Clustal algorithm tests: (a) number of characters vs number of signatures; (b) the Clustal algorithm execution time

All four subfigures (a,b,c,d) were generated by running the Clustal algorithm with the standard scoring scheme (1 for a match and 0 otherwise). The parameters MATCH, MISMATCH and GAP_PENALTY were set according to the standard scoring scheme. The reason why the gap penalty was assigned the same value as a mismatch is straightforward: so far there is no scoring scheme for the ASCII alphabet, therefore only the trivial approach was presented. In this approach gap penalties were not considered. In an extended test procedure, however, different values were assigned to the gap penalties. Those test results were preliminary and thus are not published in this paper.

5 Evaluation and Conclusions

This section gives a detailed assessment of the two main methods. The assessment was based on theoretical assumptions and confronted with the empirical implementation of both methods.

5.1 Center Star Method Complexity

The Center Star method consists of three main phases, after which the multiple sequence alignment is found. In the first phase, all pairwise alignments are formed (distance matrix calculation). The worst-case complexity of this phase is $O((N^2 + 3N)\binom{K}{2})$, where K is the number of input signatures and N denotes the signature length. The second phase finds the signature that is "the closest" to the others. This step requires $O(K)$. In the last phase, the multiple sequence alignment is formed. The computational complexity of this phase is $O(2N\binom{K}{2})$. The Center Star method provides essential functionality in the common motif finding process: it allows us to extract the common subsequence from the multiple sequence alignment. This procedure requires $O(KN)$.
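The three phases and the final extraction step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the standard scoring scheme from the text (1 for a match, 0 for a mismatch or gap), represents gaps as `None`, picks the center as the signature with the highest sum of pairwise scores, and treats a column as part of the common subsequence only when all rows agree on the same character. All function names are hypothetical.

```python
from itertools import combinations

MATCH, MISMATCH, GAP = 1, 0, 0  # standard scoring scheme described in the text


def needleman_wunsch(a, b):
    """Global alignment; returns (score, aligned_a, aligned_b), gaps are None."""
    n, m = len(a), len(b)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = i * GAP
    for j in range(1, m + 1):
        s[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            s[i][j] = max(s[i - 1][j - 1] + sub, s[i - 1][j] + GAP, s[i][j - 1] + GAP)
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:  # traceback from the bottom-right corner
        sub = MATCH if i > 0 and j > 0 and a[i - 1] == b[j - 1] else MISMATCH
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and s[i][j] == s[i - 1][j] + GAP:
            out_a.append(a[i - 1]); out_b.append(None); i -= 1
        else:
            out_a.append(None); out_b.append(b[j - 1]); j -= 1
    return s[n][m], out_a[::-1], out_b[::-1]


def merge(msa, ac, as_):
    """Add one pairwise alignment (center ac, sequence as_) to the MSA.

    msa[0] is the center row; gaps already present are never removed.
    """
    cur, rows, new_seq, i, j = msa[0], [[] for _ in msa], [], 0, 0
    while i < len(cur) or j < len(ac):
        gap_cur = i < len(cur) and cur[i] is None
        gap_new = j < len(ac) and ac[j] is None
        if gap_cur and not gap_new:      # gap only in the existing MSA
            for r, row in zip(rows, msa):
                r.append(row[i])
            new_seq.append(None); i += 1
        elif gap_new and not gap_cur:    # gap only in the new pairwise alignment
            for r in rows:
                r.append(None)
            new_seq.append(as_[j]); j += 1
        else:                            # same center character, or a gap in both
            for r, row in zip(rows, msa):
                r.append(row[i])
            new_seq.append(as_[j]); i += 1; j += 1
    return rows + [new_seq]


def center_star(seqs):
    """Return the MSA as a list of rows; the center sequence is row 0."""
    k = len(seqs)
    score = {}
    for i, j in combinations(range(k), 2):       # phase 1: all pairwise scores
        score[i, j] = score[j, i] = needleman_wunsch(seqs[i], seqs[j])[0]
    center = max(range(k),                        # phase 2: choose the center
                 key=lambda i: sum(score[i, j] for j in range(k) if j != i))
    msa = [list(seqs[center])]
    for idx, seq in enumerate(seqs):              # phase 3: build the MSA
        if idx != center:
            _, ac, as_ = needleman_wunsch(seqs[center], seq)
            msa = merge(msa, ac, as_)
    return msa


def common_subsequence(msa):
    """Keep the columns in which every row holds the same (non-gap) character."""
    return "".join(col[0] for col in zip(*msa)
                   if col[0] is not None and all(ch == col[0] for ch in col))


signatures = ["GET / HTTP", "GET /a/a.htm HTTP", "GET / HTTP/1.1"]
print(common_subsequence(center_star(signatures)))  # GET / HTTP
```

For the three strings of Fig. 1 the first string is chosen as the center and every one of its characters is aligned with an equal character in the other two, so the sketch prints `GET / HTTP`. The pairwise phase dominates the running time, which is consistent with the $\binom{K}{2}$ factor in the complexity estimates above.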

Fig. 4. Number of clusters vs precision: comparison of the Arakis algorithm with the Clustal algorithm. Panels (a) and (c) plot the number of clusters (Clustal algorithm, Arakis algorithm); panels (b) and (d) plot the pattern length (LCS length, MSA length). Parameters: dist = 1, MATCH = 1, MISMATCH = 0, GAP_PENALTY = 0, with EPS1 = 0.1 for (a,b) and EPS1 = 0.9 for (c,d)

5.2 Clustal Algorithm Complexity

The Clustal algorithm involves very complicated and time consuming procedures: distance matrix calculation, dendrogram creation and the clustering mechanism. These three phases have the following computational complexities:

1. Distance matrix calculation - $O((N^2 + 3N)\binom{K}{2})$
2. Dendrogram creation - $O(\binom{K}{2} + 2(K-1)[\binom{K}{2} + N^2 + 4(N + K)])$
3. Clustering (reading the dendrogram and writing clusters to the file) - $O(K)$

All calculations of computational complexity were based on theoretical assumptions and source code analysis. The run-time dependencies shown in Fig. 3 (b) seem to confirm the results. Moreover, the presented computational complexities do not contradict the theoretical assumptions regarding the complexities of the presented methods.

To sum up, the Clustal and Center Star algorithms have both advantages and disadvantages. One of the biggest drawbacks of both algorithms is their high run-time complexity. On the other hand, the whole task is an NP-complete problem, so we cannot expect better run-time complexity. Both Clustal and the Center Star method can be modified in order to decrease this complexity. In the Center Star method, instead of finding all pairwise alignments, we can take a randomly selected sequence from the set of input signatures. After that, we can form the multiple sequence alignment by computing the pairwise alignments of the chosen sequence with the rest of the sequences. As a result, we would omit the process of choosing the center sequence, which involves the computation of all pairwise alignments in the set of network attack signatures. In the Clustal algorithm, in turn, instead of using the Neighbor-Joining algorithm [13][15][16][17] for dendrogram creation, we could use the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) [14]. UPGMA is faster than the Neighbor-Joining algorithm at the expense of precision. These improvements lead to better time complexity, but they result in worse common subsequence extraction and worse network traffic classification. In our case, better time complexity might turn out to be more important than the quality of the common subsequence extraction. Extraction of the common subsequence during the preprocessing phase should be performed in online mode, whereas clustering of already created signatures must be performed in offline mode.

This paper covers only the process of classification and identification of network attack signatures. There were no other tests checking the influence on the number of false positive or false negative alerts. Such experiments will be performed after we finally prove that bioinformatics methods are suitable for suspicious network traffic analysis. In further work, the adapted methods are expected to be developed continuously. Although the results of tests performed on real network traffic data are very promising, there is still an open issue related to proposing a new scoring function. Therefore, further work will focus on aspects related to statistics regarding the network traffic.
In the future, statistics will make it possible to represent particular families of Internet threats as profile structures. Profile structures will make it possible to create scoring matrices similar to those used in bioinformatics. Moreover, profile structures will make it possible to deal with Internet threats such as polymorphic worms. Furthermore, profiles will allow us to identify those regions in network traffic patterns that remain unchanged even in the case of polymorphic Internet threats.

References

1. Nazario, J.: Defense and Detection Strategies against Internet Worms. Artech House, Boston & London (2004)
2. Kreibich, C., Crowcroft, J.: Automated NIDS Signature Creation using Honeypots. University of Cambridge Computer Laboratory (2003)
3. Kreibich, C., Crowcroft, J.: Honeycomb - Creating Intrusion Detection Signatures Using Honeypots. In: Proceedings of the Second Workshop on Hot Topics in Networks (Hotnets II). Cambridge Massachusetts: ACM SIGCOMM, Boston (2003)
4. Rzewuski, C.: Bachelor's Thesis: SigSearch - automated signature generation system (in Polish). Warsaw University of Technology, The Faculty of Electronics and Information Technology (2005)
5. Kijewski, P., Kruk, T.: Arakis - a network early warning system (in Polish) (2006)
6. Kreibich, C., Crowcroft, J.: Efficient sequence alignment of network traffic. In: IMC 2006: Proceedings of the 6th ACM SIGCOMM on Internet measurement. ACM Press, Brazil (2006)
7. Bakos, G., Beale, J.: Honeypot Advantages & Disadvantages, Las Vegas, pp. 7-8 (November 2002)
8. Kreibich, C.: libstree - A generic suffix tree library
9. Gusfield, D.: Efficient method for multiple sequence alignment with guaranteed error bound. Report CSE-91-4, Computer Science Division, University of California, Davis (1991)
10. Reinert, K.: Introduction to Multiple Sequence Alignment. Algorithmische Bioinformatik WS 3, 1 3 (2005)
11. Bioinformatics Multiple sequence alignment
12. Kharrazi, M., Shanmugasundaram, K., Memon, N.: Network Abuse Detection via Flow Content Characterization. In: IEEE Workshop on Information Assurance and Security, United States Military Academy (2004)
13. Saitou, N., Nei, M.: The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 2 (1987)
14. Tajima, F.: A Simple Graphic Method for Reconstructing Phylogenetic Trees from Molecular Data. In: Reconstruction of Phylogenetic Trees, Department of Population Genetics, National Institute of Genetics, Japan (1990)
15. The Neighbor-Joining Method
16. Weng, Z.: Protein and DNA Sequence Analysis BE561. Boston University (2005)
17. Multiple alignment: heuristics
18. Levenshtein, V.: Binary codes capable of correcting insertions and reversals. Soviet Physics Doklady (1966)
19. Ester, M., Kriegel, H., Sander, J., Xiaowei, X.: A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In: Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining (KDD 1996), Institute for Computer Science, University of Munich (1996)
20. Fabjański, K.: Master's Thesis: Network Traffic Classification by Common Subsequence Finding. Warsaw University of Technology, The Faculty of Electronics and Information Technology, Warsaw (2007)


More information

CLUSTERING IN BIOINFORMATICS

CLUSTERING IN BIOINFORMATICS CLUSTERING IN BIOINFORMATICS CSE/BIMM/BENG 8 MAY 4, 0 OVERVIEW Define the clustering problem Motivation: gene expression and microarrays Types of clustering Clustering algorithms Other applications of

More information

CPSC 340: Machine Learning and Data Mining. Outlier Detection Fall 2016

CPSC 340: Machine Learning and Data Mining. Outlier Detection Fall 2016 CPSC 340: Machine Learning and Data Mining Outlier Detection Fall 2016 Admin Assignment 1 solutions will be posted after class. Assignment 2 is out: Due next Friday, but start early! Calculus and linear

More information

Data Clustering Hierarchical Clustering, Density based clustering Grid based clustering

Data Clustering Hierarchical Clustering, Density based clustering Grid based clustering Data Clustering Hierarchical Clustering, Density based clustering Grid based clustering Team 2 Prof. Anita Wasilewska CSE 634 Data Mining All Sources Used for the Presentation Olson CF. Parallel algorithms

More information

OPEN MP-BASED PARALLEL AND SCALABLE GENETIC SEQUENCE ALIGNMENT

OPEN MP-BASED PARALLEL AND SCALABLE GENETIC SEQUENCE ALIGNMENT OPEN MP-BASED PARALLEL AND SCALABLE GENETIC SEQUENCE ALIGNMENT Asif Ali Khan*, Laiq Hassan*, Salim Ullah* ABSTRACT: In bioinformatics, sequence alignment is a common and insistent task. Biologists align

More information

Artificial Immune System against Viral Attack

Artificial Immune System against Viral Attack Artificial Immune System against Viral Attack Hyungjoon Lee 1, Wonil Kim 2*, and Manpyo Hong 1 1 Digital Vaccine Lab, G,raduated School of Information and Communication Ajou University, Suwon, Republic

More information

Project Proposal. ECE 526 Spring Modified Data Structure of Aho-Corasick. Benfano Soewito, Ed Flanigan and John Pangrazio

Project Proposal. ECE 526 Spring Modified Data Structure of Aho-Corasick. Benfano Soewito, Ed Flanigan and John Pangrazio Project Proposal ECE 526 Spring 2006 Modified Data Structure of Aho-Corasick Benfano Soewito, Ed Flanigan and John Pangrazio 1. Introduction The internet becomes the most important tool in this decade

More information

GLOBEX Bioinformatics (Summer 2015) Multiple Sequence Alignment

GLOBEX Bioinformatics (Summer 2015) Multiple Sequence Alignment GLOBEX Bioinformatics (Summer 2015) Multiple Sequence Alignment Scoring Dynamic Programming algorithms Heuristic algorithms CLUSTAL W Courtesy of jalview Motivations Collective (or aggregate) statistic

More information

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University 09/28/2017 Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10.

More information

A Rule-Based Intrusion Alert Correlation System for Integrated Security Management *

A Rule-Based Intrusion Alert Correlation System for Integrated Security Management * A Rule-Based Intrusion Correlation System for Integrated Security Management * Seong-Ho Lee 1, Hyung-Hyo Lee 2, and Bong-Nam Noh 1 1 Department of Computer Science, Chonnam National University, Gwangju,

More information

Enhancing K-means Clustering Algorithm with Improved Initial Center

Enhancing K-means Clustering Algorithm with Improved Initial Center Enhancing K-means Clustering Algorithm with Improved Initial Center Madhu Yedla #1, Srinivasa Rao Pathakota #2, T M Srinivasa #3 # Department of Computer Science and Engineering, National Institute of

More information

Recent Research Results. Evolutionary Trees Distance Methods

Recent Research Results. Evolutionary Trees Distance Methods Recent Research Results Evolutionary Trees Distance Methods Indo-European Languages After Tandy Warnow What is the purpose? Understand evolutionary history (relationship between species). Uderstand how

More information

Notes. Reminder: HW2 Due Today by 11:59PM. Review session on Thursday. Midterm next Tuesday (10/10/2017)

Notes. Reminder: HW2 Due Today by 11:59PM. Review session on Thursday. Midterm next Tuesday (10/10/2017) 1 Notes Reminder: HW2 Due Today by 11:59PM TA s note: Please provide a detailed ReadMe.txt file on how to run the program on the STDLINUX. If you installed/upgraded any package on STDLINUX, you should

More information

International Journal Of Engineering And Computer Science ISSN: Volume 5 Issue 11 Nov. 2016, Page No.

International Journal Of Engineering And Computer Science ISSN: Volume 5 Issue 11 Nov. 2016, Page No. www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 5 Issue 11 Nov. 2016, Page No. 19054-19062 Review on K-Mode Clustering Antara Prakash, Simran Kalera, Archisha

More information

BBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler

BBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler BBS654 Data Mining Pinar Duygulu Slides are adapted from Nazli Ikizler 1 Classification Classification systems: Supervised learning Make a rational prediction given evidence There are several methods for

More information

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques 24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE

More information

BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha

BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio. 1990. CS 466 Saurabh Sinha Motivation Sequence homology to a known protein suggest function of newly sequenced protein Bioinformatics

More information

Activating Intrusion Prevention Service

Activating Intrusion Prevention Service Activating Intrusion Prevention Service Intrusion Prevention Service Overview Configuring Intrusion Prevention Service Intrusion Prevention Service Overview Intrusion Prevention Service (IPS) delivers

More information

Flow-based Worm Detection using Correlated Honeypot Logs

Flow-based Worm Detection using Correlated Honeypot Logs Flow-based Worm Detection using Correlated Honeypot Logs Falko Dressler, Wolfgang Jaegers, and Reinhard German Computer Networks and Communication Systems, University of Erlangen, Martensstr. 3, 91058

More information

Notes. Reminder: HW2 Due Today by 11:59PM. Review session on Thursday. Midterm next Tuesday (10/09/2018)

Notes. Reminder: HW2 Due Today by 11:59PM. Review session on Thursday. Midterm next Tuesday (10/09/2018) 1 Notes Reminder: HW2 Due Today by 11:59PM TA s note: Please provide a detailed ReadMe.txt file on how to run the program on the STDLINUX. If you installed/upgraded any package on STDLINUX, you should

More information

Cluster Analysis. Ying Shen, SSE, Tongji University

Cluster Analysis. Ying Shen, SSE, Tongji University Cluster Analysis Ying Shen, SSE, Tongji University Cluster analysis Cluster analysis groups data objects based only on the attributes in the data. The main objective is that The objects within a group

More information

Gene Clustering & Classification

Gene Clustering & Classification BINF, Introduction to Computational Biology Gene Clustering & Classification Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Introduction to Gene Clustering

More information

Analyzing Outlier Detection Techniques with Hybrid Method

Analyzing Outlier Detection Techniques with Hybrid Method Analyzing Outlier Detection Techniques with Hybrid Method Shruti Aggarwal Assistant Professor Department of Computer Science and Engineering Sri Guru Granth Sahib World University. (SGGSWU) Fatehgarh Sahib,

More information

Improving the Efficiency of Fast Using Semantic Similarity Algorithm

Improving the Efficiency of Fast Using Semantic Similarity Algorithm International Journal of Scientific and Research Publications, Volume 4, Issue 1, January 2014 1 Improving the Efficiency of Fast Using Semantic Similarity Algorithm D.KARTHIKA 1, S. DIVAKAR 2 Final year

More information

ANALYSIS AND EVALUATION OF DISTRIBUTED DENIAL OF SERVICE ATTACKS IDENTIFICATION METHODS

ANALYSIS AND EVALUATION OF DISTRIBUTED DENIAL OF SERVICE ATTACKS IDENTIFICATION METHODS ANALYSIS AND EVALUATION OF DISTRIBUTED DENIAL OF SERVICE ATTACKS IDENTIFICATION METHODS Saulius Grusnys, Ingrida Lagzdinyte Kaunas University of Technology, Department of Computer Networks, Studentu 50,

More information

Quiz section 10. June 1, 2018

Quiz section 10. June 1, 2018 Quiz section 10 June 1, 2018 Logistics Bring: 1 page cheat-sheet, simple calculator Any last logistics questions about the final? Logistics Bring: 1 page cheat-sheet, simple calculator Any last logistics

More information

Computational Genomics and Molecular Biology, Fall

Computational Genomics and Molecular Biology, Fall Computational Genomics and Molecular Biology, Fall 2015 1 Sequence Alignment Dannie Durand Pairwise Sequence Alignment The goal of pairwise sequence alignment is to establish a correspondence between the

More information

Information Integration of Partially Labeled Data

Information Integration of Partially Labeled Data Information Integration of Partially Labeled Data Steffen Rendle and Lars Schmidt-Thieme Information Systems and Machine Learning Lab, University of Hildesheim srendle@ismll.uni-hildesheim.de, schmidt-thieme@ismll.uni-hildesheim.de

More information

Anomaly Detection in Communication Networks

Anomaly Detection in Communication Networks Anomaly Detection in Communication Networks Prof. D. J. Parish High Speed networks Group Department of Electronic and Electrical Engineering D.J.Parish@lboro.ac.uk Loughborough University Overview u u

More information

ON HEURISTIC METHODS IN NEXT-GENERATION SEQUENCING DATA ANALYSIS

ON HEURISTIC METHODS IN NEXT-GENERATION SEQUENCING DATA ANALYSIS ON HEURISTIC METHODS IN NEXT-GENERATION SEQUENCING DATA ANALYSIS Ivan Vogel Doctoral Degree Programme (1), FIT BUT E-mail: xvogel01@stud.fit.vutbr.cz Supervised by: Jaroslav Zendulka E-mail: zendulka@fit.vutbr.cz

More information

IJSER. Virtualization Intrusion Detection System in Cloud Environment Ku.Rupali D. Wankhade. Department of Computer Science and Technology

IJSER. Virtualization Intrusion Detection System in Cloud Environment Ku.Rupali D. Wankhade. Department of Computer Science and Technology ISSN 2229-5518 321 Virtualization Intrusion Detection System in Cloud Environment Ku.Rupali D. Wankhade. Department of Computer Science and Technology Abstract - Nowadays all are working with cloud Environment(cloud

More information

Evolving SQL Queries for Data Mining

Evolving SQL Queries for Data Mining Evolving SQL Queries for Data Mining Majid Salim and Xin Yao School of Computer Science, The University of Birmingham Edgbaston, Birmingham B15 2TT, UK {msc30mms,x.yao}@cs.bham.ac.uk Abstract. This paper

More information

Pattern Mining in Frequent Dynamic Subgraphs

Pattern Mining in Frequent Dynamic Subgraphs Pattern Mining in Frequent Dynamic Subgraphs Karsten M. Borgwardt, Hans-Peter Kriegel, Peter Wackersreuther Institute of Computer Science Ludwig-Maximilians-Universität Munich, Germany kb kriegel wackersr@dbs.ifi.lmu.de

More information

Approaches to Efficient Multiple Sequence Alignment and Protein Search

Approaches to Efficient Multiple Sequence Alignment and Protein Search Approaches to Efficient Multiple Sequence Alignment and Protein Search Thesis statements of the PhD dissertation Adrienn Szabó Supervisor: István Miklós Eötvös Loránd University Faculty of Informatics

More information

Computational Molecular Biology

Computational Molecular Biology Computational Molecular Biology Erwin M. Bakker Lecture 3, mainly from material by R. Shamir [2] and H.J. Hoogeboom [4]. 1 Pairwise Sequence Alignment Biological Motivation Algorithmic Aspect Recursive

More information

Evaluation of Parallel Programs by Measurement of Its Granularity

Evaluation of Parallel Programs by Measurement of Its Granularity Evaluation of Parallel Programs by Measurement of Its Granularity Jan Kwiatkowski Computer Science Department, Wroclaw University of Technology 50-370 Wroclaw, Wybrzeze Wyspianskiego 27, Poland kwiatkowski@ci-1.ci.pwr.wroc.pl

More information

An experimental evaluation of a parallel genetic algorithm using MPI

An experimental evaluation of a parallel genetic algorithm using MPI 2009 13th Panhellenic Conference on Informatics An experimental evaluation of a parallel genetic algorithm using MPI E. Hadjikyriacou, N. Samaras, K. Margaritis Dept. of Applied Informatics University

More information

Polymorphic Blending Attacks. Slides by Jelena Mirkovic

Polymorphic Blending Attacks. Slides by Jelena Mirkovic Polymorphic Blending Attacks Slides by Jelena Mirkovic 1 Motivation! Polymorphism is used by malicious code to evade signature-based IDSs Anomaly-based IDSs detect polymorphic attacks because their byte

More information

INF4820. Clustering. Erik Velldal. Nov. 17, University of Oslo. Erik Velldal INF / 22

INF4820. Clustering. Erik Velldal. Nov. 17, University of Oslo. Erik Velldal INF / 22 INF4820 Clustering Erik Velldal University of Oslo Nov. 17, 2009 Erik Velldal INF4820 1 / 22 Topics for Today More on unsupervised machine learning for data-driven categorization: clustering. The task

More information

Intrusion Detection System using AI and Machine Learning Algorithm

Intrusion Detection System using AI and Machine Learning Algorithm Intrusion Detection System using AI and Machine Learning Algorithm Syam Akhil Repalle 1, Venkata Ratnam Kolluru 2 1 Student, Department of Electronics and Communication Engineering, Koneru Lakshmaiah Educational

More information

INF4820, Algorithms for AI and NLP: Hierarchical Clustering

INF4820, Algorithms for AI and NLP: Hierarchical Clustering INF4820, Algorithms for AI and NLP: Hierarchical Clustering Erik Velldal University of Oslo Sept. 25, 2012 Agenda Topics we covered last week Evaluating classifiers Accuracy, precision, recall and F-score

More information

Multiple Sequence Alignment Using Reconfigurable Computing

Multiple Sequence Alignment Using Reconfigurable Computing Multiple Sequence Alignment Using Reconfigurable Computing Carlos R. Erig Lima, Heitor S. Lopes, Maiko R. Moroz, and Ramon M. Menezes Bioinformatics Laboratory, Federal University of Technology Paraná

More information

A Network Intrusion Detection System Architecture Based on Snort and. Computational Intelligence

A Network Intrusion Detection System Architecture Based on Snort and. Computational Intelligence 2nd International Conference on Electronics, Network and Computer Engineering (ICENCE 206) A Network Intrusion Detection System Architecture Based on Snort and Computational Intelligence Tao Liu, a, Da

More information

Active defence through deceptive IPS

Active defence through deceptive IPS Active defence through deceptive IPS Authors Apostolis Machas, MSc (Royal Holloway, 2016) Peter Komisarczuk, ISG, Royal Holloway Abstract Modern security mechanisms such as Unified Threat Management (UTM),

More information

WEB USAGE MINING: ANALYSIS DENSITY-BASED SPATIAL CLUSTERING OF APPLICATIONS WITH NOISE ALGORITHM

WEB USAGE MINING: ANALYSIS DENSITY-BASED SPATIAL CLUSTERING OF APPLICATIONS WITH NOISE ALGORITHM WEB USAGE MINING: ANALYSIS DENSITY-BASED SPATIAL CLUSTERING OF APPLICATIONS WITH NOISE ALGORITHM K.Dharmarajan 1, Dr.M.A.Dorairangaswamy 2 1 Scholar Research and Development Centre Bharathiar University

More information

Clustering. CE-717: Machine Learning Sharif University of Technology Spring Soleymani

Clustering. CE-717: Machine Learning Sharif University of Technology Spring Soleymani Clustering CE-717: Machine Learning Sharif University of Technology Spring 2016 Soleymani Outline Clustering Definition Clustering main approaches Partitional (flat) Hierarchical Clustering validation

More information

ECLT 5810 Clustering

ECLT 5810 Clustering ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping

More information

Comparison of Sequence Similarity Measures for Distant Evolutionary Relationships

Comparison of Sequence Similarity Measures for Distant Evolutionary Relationships Comparison of Sequence Similarity Measures for Distant Evolutionary Relationships Abhishek Majumdar, Peter Z. Revesz Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln,

More information

Intrusion Detection - Snort

Intrusion Detection - Snort Intrusion Detection - Snort 1 Sometimes, Defenses Fail Our defenses aren t perfect Patches aren t applied promptly enough AV signatures not always up to date 0-days get through Someone brings in an infected

More information

Text clustering based on a divide and merge strategy

Text clustering based on a divide and merge strategy Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 55 (2015 ) 825 832 Information Technology and Quantitative Management (ITQM 2015) Text clustering based on a divide and

More information

You will discuss topics related to ethical hacking, information risks, and security techniques which hackers will seek to circumvent.

You will discuss topics related to ethical hacking, information risks, and security techniques which hackers will seek to circumvent. IDPS Effectiveness and Primary Takeaways You will discuss topics related to ethical hacking, information risks, and security techniques which hackers will seek to circumvent. IDPS Effectiveness and Primary

More information

EFFICIENT ATTRIBUTE REDUCTION ALGORITHM

EFFICIENT ATTRIBUTE REDUCTION ALGORITHM EFFICIENT ATTRIBUTE REDUCTION ALGORITHM Zhongzhi Shi, Shaohui Liu, Zheng Zheng Institute Of Computing Technology,Chinese Academy of Sciences, Beijing, China Abstract: Key words: Efficiency of algorithms

More information

Phylogenetics. Introduction to Bioinformatics Dortmund, Lectures: Sven Rahmann. Exercises: Udo Feldkamp, Michael Wurst

Phylogenetics. Introduction to Bioinformatics Dortmund, Lectures: Sven Rahmann. Exercises: Udo Feldkamp, Michael Wurst Phylogenetics Introduction to Bioinformatics Dortmund, 16.-20.07.2007 Lectures: Sven Rahmann Exercises: Udo Feldkamp, Michael Wurst 1 Phylogenetics phylum = tree phylogenetics: reconstruction of evolutionary

More information

Objective of clustering

Objective of clustering Objective of clustering Discover structures and patterns in high-dimensional data. Group data with similar patterns together. This reduces the complexity and facilitates interpretation. Expression level

More information

CPSC 340: Machine Learning and Data Mining. Finding Similar Items Fall 2017

CPSC 340: Machine Learning and Data Mining. Finding Similar Items Fall 2017 CPSC 340: Machine Learning and Data Mining Finding Similar Items Fall 2017 Assignment 1 is due tonight. Admin 1 late day to hand in Monday, 2 late days for Wednesday. Assignment 2 will be up soon. Start

More information

ECLT 5810 Clustering

ECLT 5810 Clustering ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping

More information

Virtual CMS Honey pot capturing threats In web applications 1 BADI ALEKHYA, ASSITANT PROFESSOR, DEPT OF CSE, T.J.S ENGINEERING COLLEGE

Virtual CMS Honey pot capturing threats In web applications 1 BADI ALEKHYA, ASSITANT PROFESSOR, DEPT OF CSE, T.J.S ENGINEERING COLLEGE International Journal of Scientific & Engineering Research, Volume 4, Issue 4, April-2013 1492 Virtual CMS Honey pot capturing threats In web applications 1 BADI ALEKHYA, ASSITANT PROFESSOR, DEPT OF CSE,

More information

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University 09/25/2017 Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10.

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)

More information

Flow-based Anomaly Intrusion Detection System Using Neural Network

Flow-based Anomaly Intrusion Detection System Using Neural Network Flow-based Anomaly Intrusion Detection System Using Neural Network tational power to analyze only the basic characteristics of network flow, so as to Intrusion Detection systems (KBIDES) classify the data

More information

Analyzing Dshield Logs Using Fully Automatic Cross-Associations

Analyzing Dshield Logs Using Fully Automatic Cross-Associations Analyzing Dshield Logs Using Fully Automatic Cross-Associations Anh Le 1 1 Donald Bren School of Information and Computer Sciences University of California, Irvine Irvine, CA, 92697, USA anh.le@uci.edu

More information

Crises Management in Multiagent Workflow Systems

Crises Management in Multiagent Workflow Systems Crises Management in Multiagent Workflow Systems Małgorzata Żabińska Department of Computer Science, AGH University of Science and Technology, al. Mickiewicza 30, 30-059 Kraków, Poland zabinska@agh.edu.pl

More information

Dynamic Programming & Smith-Waterman algorithm

Dynamic Programming & Smith-Waterman algorithm m m Seminar: Classical Papers in Bioinformatics May 3rd, 2010 m m 1 2 3 m m Introduction m Definition is a method of solving problems by breaking them down into simpler steps problem need to contain overlapping

More information

A Survey And Comparative Analysis Of Data

A Survey And Comparative Analysis Of Data A Survey And Comparative Analysis Of Data Mining Techniques For Network Intrusion Detection Systems In Information Security, intrusion detection is the act of detecting actions that attempt to In 11th

More information

Mahalanobis Distance Map Approach for Anomaly Detection

Mahalanobis Distance Map Approach for Anomaly Detection Edith Cowan University Research Online Australian Information Security Management Conference Conferences, Symposia and Campus Events 2010 Mahalanobis Distance Map Approach for Anomaly Detection Aruna Jamdagnil

More information

Similarity Matrix Based Session Clustering by Sequence Alignment Using Dynamic Programming

Similarity Matrix Based Session Clustering by Sequence Alignment Using Dynamic Programming Similarity Matrix Based Session Clustering by Sequence Alignment Using Dynamic Programming Dr.K.Duraiswamy Dean, Academic K.S.Rangasamy College of Technology Tiruchengode, India V. Valli Mayil (Corresponding

More information

Keyword Extraction by KNN considering Similarity among Features

Keyword Extraction by KNN considering Similarity among Features 64 Int'l Conf. on Advances in Big Data Analytics ABDA'15 Keyword Extraction by KNN considering Similarity among Features Taeho Jo Department of Computer and Information Engineering, Inha University, Incheon,

More information

Analysis and Extensions of Popular Clustering Algorithms

Analysis and Extensions of Popular Clustering Algorithms Analysis and Extensions of Popular Clustering Algorithms Renáta Iváncsy, Attila Babos, Csaba Legány Department of Automation and Applied Informatics and HAS-BUTE Control Research Group Budapest University

More information

Special course in Computer Science: Advanced Text Algorithms

Special course in Computer Science: Advanced Text Algorithms Special course in Computer Science: Advanced Text Algorithms Lecture 8: Multiple alignments Elena Czeizler and Ion Petre Department of IT, Abo Akademi Computational Biomodelling Laboratory http://www.users.abo.fi/ipetre/textalg

More information

A Parallel Community Detection Algorithm for Big Social Networks

A Parallel Community Detection Algorithm for Big Social Networks A Parallel Community Detection Algorithm for Big Social Networks Yathrib AlQahtani College of Computer and Information Sciences King Saud University Collage of Computing and Informatics Saudi Electronic

More information

e-ccc-biclustering: Related work on biclustering algorithms for time series gene expression data

e-ccc-biclustering: Related work on biclustering algorithms for time series gene expression data : Related work on biclustering algorithms for time series gene expression data Sara C. Madeira 1,2,3, Arlindo L. Oliveira 1,2 1 Knowledge Discovery and Bioinformatics (KDBIO) group, INESC-ID, Lisbon, Portugal

More information

Taxonomically Clustering Organisms Based on the Profiles of Gene Sequences Using PCA

Taxonomically Clustering Organisms Based on the Profiles of Gene Sequences Using PCA Journal of Computer Science 2 (3): 292-296, 2006 ISSN 1549-3636 2006 Science Publications Taxonomically Clustering Organisms Based on the Profiles of Gene Sequences Using PCA 1 E.Ramaraj and 2 M.Punithavalli

More information

On the Efficacy of Haskell for High Performance Computational Biology

On the Efficacy of Haskell for High Performance Computational Biology On the Efficacy of Haskell for High Performance Computational Biology Jacqueline Addesa Academic Advisors: Jeremy Archuleta, Wu chun Feng 1. Problem and Motivation Biologists can leverage the power of

More information

Long Read RNA-seq Mapper

Long Read RNA-seq Mapper UNIVERSITY OF ZAGREB FACULTY OF ELECTRICAL ENGENEERING AND COMPUTING MASTER THESIS no. 1005 Long Read RNA-seq Mapper Josip Marić Zagreb, February 2015. Table of Contents 1. Introduction... 1 2. RNA Sequencing...

More information

Distributed and clustering techniques for Multiprocessor Systems

Distributed and clustering techniques for Multiprocessor Systems www.ijcsi.org 199 Distributed and clustering techniques for Multiprocessor Systems Elsayed A. Sallam Associate Professor and Head of Computer and Control Engineering Department, Faculty of Engineering,

More information

Clustering of Proteins

Clustering of Proteins Melroy Saldanha saldanha@stanford.edu CS 273 Project Report Clustering of Proteins Introduction Numerous genome-sequencing projects have led to a huge growth in the size of protein databases. Manual annotation

More information

A Parallel Evolutionary Algorithm for Discovery of Decision Rules

A Parallel Evolutionary Algorithm for Discovery of Decision Rules A Parallel Evolutionary Algorithm for Discovery of Decision Rules Wojciech Kwedlo Faculty of Computer Science Technical University of Bia lystok Wiejska 45a, 15-351 Bia lystok, Poland wkwedlo@ii.pb.bialystok.pl

More information