A Comparison of Word Frequency and N-Gram Based Vulnerability Categorization Using SOM
Melanie Tupper
Supervised by Nur Zincir-Heywood
Faculty of Computer Science, Dalhousie University
October 2008

Abstract: Network attackers exploit software vulnerabilities on network computers to facilitate successful attacks. Many organizations keep track of the existing software vulnerabilities in the form of vulnerability databases. However, categorizing vulnerabilities is difficult due to the large number of different attributes maintained. In this work we apply a data-clustering algorithm (SOM) to two different representations of the information contained in an existing online vulnerability database. After identifying the more valuable approach for this task, we are able to identify critical vulnerability features inherent in the dataset.
Acknowledgements

I would like to thank the Computing Research Association's Committee on the Status of Women in Computing Research (CRA-W) and the Natural Sciences and Engineering Research Council of Canada (NSERC) for supporting this research. I would like to thank my mentor, Dr. Nur Zincir-Heywood, for her inspiration, and my husband, Stewart Hardie, for his encouragement, love, and support.
Table of Contents

Section 1 - Introduction
1.1 Motivation
1.2 Overview
Section 2 - Related Work
Section 3 - Methodology
3.1 Data Preprocessing
3.2 SOM Toolkit
Section 4 - Results
Section 5 - Conclusions and Future Work
References
Appendix A: Stop Words
Appendix B: Unlabeled U-Matrix Representations
Appendix C: Labeled U-Matrix Representations
Section 1 - Introduction

1.1 Motivation

Network attackers exploit software vulnerabilities on network computers to facilitate successful attacks. There are many types of software vulnerabilities, which differ in various attributes, including the level of authorization needed for execution, the impact on the target network, and the complexity of the exploit. Without software vulnerabilities, attacks would not be possible. Therefore, understanding these vulnerabilities holds the key to configuring secure computer networks.

Many organizations keep track of the existing software vulnerabilities in the form of vulnerability databases. These databases maintain a wealth of information about specific vulnerabilities. Data collected may include when the vulnerability first appeared, which programs or operating systems are affected, what effects a successful exploit would have on the target network, and whether a patch for the vulnerable software is available. However, each database may record different vulnerability attributes, and categorizing vulnerabilities is difficult due to the large number of different attributes maintained.

By applying SOM, a data clustering algorithm, to an online vulnerability database, we will be able to identify critical vulnerability features inherent in the data. Furthermore, analysis of these features will allow us to propose a standardization strategy for vulnerability classification in the future.

1.2 Overview

Section 2 provides a brief introduction to SOM and n-grams, as well as a review of the current state of vulnerability classification. Section 3 describes each stage of our research and Section 4 summarizes our results. We present our conclusions and offer suggestions for furthering this research in Section 5.
Section 2 - Related Work

Data clustering is a classification technique used to group data into subsets based on data commonalities. A Self-Organizing Feature Map (SOM), also commonly referred to as an unsupervised learning network, is a data clustering algorithm that groups the data according to features or categories that are inherent in the dataset [1]. Recent research efforts utilizing the SOM algorithm have presented a wide variety of applications, including document classification [2] and computer network attack behavior categorization [3].

Vulnerability classification is a research area that is currently attracting much attention due to the potential benefits of a standardized vulnerability classification scheme. Li et al. [4] propose a system for standardizing vulnerability categories by applying the SOM algorithm to a word frequency vector representing the text.

While word frequency vectors are often used as input to data clustering algorithms, another approach is proving useful in areas of text categorization: n-grams. An n-gram is defined to be a smaller sequence of n items from a larger sequence [5]. In the case of text, items can refer to either characters or words. In our application, we use n-grams of words. For example, 'ceramics collected by' is a 3-gram appearing in the Google n-gram corpus; each word is considered to be an item. Recent works that have used n-gram vectors as input to the SOM algorithm include a comparison of text clustering [6] and categorization [7]. In this work we compare word vector and n-gram approaches as applied to the problem of classifying vulnerabilities using the SOM algorithm.

As mentioned above, there is currently no standardized vulnerability classification system. Most vulnerability databases and vulnerability scanning software include a proprietary set of vulnerability categories. Vulnerability standardization would provide a platform for future databases and vulnerability scanners, as well as offer security administrators a common frame of reference when considering new and existing vulnerabilities that may affect specific network configurations.
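To make the definition of a word n-gram concrete, the following minimal Python sketch slides a window of n words across a text. The function name `word_ngrams` is hypothetical and is not part of any tool used in this work; it only illustrates the definition given in this section.

```python
def word_ngrams(text, n=2):
    """Return the word n-grams of text, in order, as a list of tuples."""
    words = text.split()
    # Slide a window of n words across the sequence; a text with fewer
    # than n words yields an empty list.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


if __name__ == "__main__":
    print(word_ngrams("ceramics collected by", 3))
    print(word_ngrams("allows remote attackers", 2))
```

With n=3 the phrase 'ceramics collected by' yields a single 3-gram, while a three-word phrase yields two overlapping bigrams.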
Section 3 - Methodology

3.1 Data Preprocessing

Our goal of identifying critical vulnerability features prompted us to explore and compare the various online vulnerability databases. Since we want to work with the descriptions of the different vulnerabilities, we sought out data in a downloadable format. Of the vulnerability databases that we explored, only one offers a downloadable version of the data: the Common Vulnerabilities and Exposures (CVE) database [8]. At this point it is worth noting that CVE differentiates between two types of vulnerabilities: entries and candidates. Entries are accepted vulnerabilities, whereas candidates are currently under review. For our purposes, we consider only the entries, and download the corresponding file in XML format.

Having the data in XML format allows us to process it easily using simple Unix sed commands. Figure 3.1 shows a sample of the downloaded file before processing. We remove all unnecessary data from the file, leaving only the text located between the <desc> and </desc> tags.

<item type="cve" name="cve " seq=" ">
<status>entry</status>
<desc>microsoft Data Access Component Internet Publishing Provider and earlier allows remote attackers to bypass Security Zone restrictions via WebDAV requests.</desc>
<refs>
<ref source="ms" url="
<ref source="ciac" url=" 074.shtml">L-074</ref>
<ref source="xf" url="
</refs>
</item>

Figure 3.1. CVE Entry in XML Format
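The extraction step was carried out with sed; the short Python sketch below shows an equivalent idea under stated assumptions. It uses a regular expression rather than an XML parser, which mirrors the pattern-based spirit of sed. The names `DESC_PATTERN` and `extract_descriptions` and the sample text are hypothetical.

```python
import re

# Matches the text between <desc> and </desc>; DOTALL lets a description
# span several lines of the file.
DESC_PATTERN = re.compile(r"<desc>(.*?)</desc>", re.DOTALL)


def extract_descriptions(xml_text):
    """Return one whitespace-normalized description per <desc> element."""
    return [" ".join(match.split()) for match in DESC_PATTERN.findall(xml_text)]


if __name__ == "__main__":
    sample = """
    <item type="cve">
      <status>entry</status>
      <desc>Example description of a
      vulnerability.</desc>
      <refs></refs>
    </item>
    """
    print(extract_descriptions(sample))
```

Each returned string corresponds to one vulnerability, so the order of the list can later serve as the row order of the input vectors.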
Next we remove the punctuation, numbers, and stop words from the text. Numbers and punctuation are, again, removed using simple Unix commands. Stop words, also known as noise words, are removed with a Java program. In short, the program stores the stop words in a data structure, then compares each word in the text against the stored words. If the word is found, it is discarded; if not, it is written to an output file. The complete list of stop words was obtained online [9] and can be found in Appendix A. The resulting file is the text corpus, or simply the corpus.

For the purpose of the experiment, we wish to compare two approaches, which we will refer to as the word vector and bigram vector approaches. For the word approach, we write a Java program that stores a list of all the words that occur in the corpus and the number of times each word occurs. Since the SOM algorithm does not perform well if more than 3000 vectors are presented to it, we employ a reduction technique based on the word counts to reduce the word space. We discard any word that occurs fewer than three times in the corpus and write each remaining word to an output file, one word per line. This technique reduces the number of words to the allowable limit; the resulting file is known as the word file.

For the bigram approach we begin by constructing a master list of all n-grams (with n=2) of words contained in the corpus. This is done using a Perl script written by Ted Pedersen, which can be found online [10]. Figure 3.2 shows a sample of the output generated by this script.

allows<>remote<>
remote<>attackers<>
allows<>local<>
local<>users<>
denial<>service<>
execute<>arbitrary<>
cause<>denial<>
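The stop-word filtering and word-count reduction described earlier in this section for the word approach can be sketched in Python as follows. This is an illustration only, not the Java program used in this work: the function names are hypothetical, lowercasing the text is an assumption, and a set is used for the stop words so that each lookup is a single membership test.

```python
import re
from collections import Counter


def clean_description(text, stop_words):
    """Lowercase text, drop punctuation and numbers, and remove stop words."""
    # Keeping alphabetic runs only discards digits and punctuation.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in stop_words]


def build_word_list(corpus, min_count=3):
    """Return the sorted words that occur at least min_count times in corpus.

    corpus is a list of token lists, one per vulnerability description.
    """
    counts = Counter(word for tokens in corpus for word in tokens)
    return sorted(word for word, count in counts.items() if count >= min_count)


if __name__ == "__main__":
    stop_words = {"the", "to", "via"}
    print(clean_description("Allows remote attackers to bypass the zone.", stop_words))
```

The default `min_count=3` matches the rule of discarding words that occur fewer than three times.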
The output in Figure 3.2 displays each n-gram's component words separated by <>, followed by three numbers. The first number is the frequency count for the n-gram; the second number indicates how often the first word occurs as the leftmost word in any n-gram of the text. Similarly, the last number indicates how many times the second word occurs as the rightmost word in any n-gram of the text. For the purpose of this investigation, we do not require the second and third numbers. Therefore, to prepare the resulting file of n-grams for use, we remove the diamond characters and the two unused numbers, and limit the bigrams considered to those that occur more than twice in the corpus. The result is known as the bigram file.

Since the SOM toolkit we will be using requires that the data be in a vector format, we still need to generate such vectors before we are able to run the SOM algorithm on the data. To do this, we write a program that does the following:

- Determine the number of vulnerabilities, V, represented by the corpus
- Read the word file into a word array of length N
- Construct and initialize a two-dimensional integer array of size V x N
- For each vulnerability, iterate through the corpus considering each word in sequence
- Compare the current word to the entries of the word array
- When a match is found, increment the integer at the corresponding index of the integer array
- After the entire corpus has been considered, write the integers to a file, with each row of the two-dimensional array written to one line

The resulting file is the word vector file. The steps above are repeated for the bigram file to produce a bigram vector file. These vector files represent the frequency of occurrence of each word or bigram and are in the format required by the SOM toolkit.
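The vector-generation steps listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the program used in this work: the function names are hypothetical, the corpus is assumed to be a list of token lists (one per vulnerability), and a dictionary lookup replaces the linear comparison against the word array. Setting n=2 gives the bigram vectors.

```python
from collections import Counter


def ngram_terms(tokens, n):
    """Return the space-joined word n-grams of a token list (n=1 gives words)."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_vectors(corpus, terms, n=1):
    """Return a V x N matrix of term counts, one row per vulnerability."""
    index = {term: j for j, term in enumerate(terms)}  # term -> column
    matrix = [[0] * len(terms) for _ in corpus]        # V x N zeros
    for row, tokens in zip(matrix, corpus):
        for term, count in Counter(ngram_terms(tokens, n)).items():
            if term in index:                          # ignore discarded terms
                row[index[term]] += count
    return matrix


def write_vectors(matrix, path):
    """Write one vulnerability per line as space-separated counts."""
    with open(path, "w") as out:
        for row in matrix:
            out.write(" ".join(map(str, row)) + "\n")


if __name__ == "__main__":
    corpus = [["remote", "attackers", "remote"], ["buffer", "overflow"]]
    print(build_vectors(corpus, ["attackers", "overflow", "remote"]))
```

Any header lines that a particular SOM toolkit expects at the top of its data file are not produced by this sketch.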
3.2 SOM Toolkit

In the course of our testing, we first run the SOM algorithm on the bigram vectors using various map sizes and compare the quantization errors of the results to determine the smallest acceptable map size. We then run the SOM algorithm on the word vector file using the optimal map size as determined from the bigram tests. The SOM algorithm implemented by the SOM toolkit consists of four stages as described in [11]:

1. Initialization
2. Training
3. Quantization Error Evaluation
4. Visualization

In the initialization stage, random values are assigned to the reference vectors using a command such as:

randinit -din bigram.dat -cout bigrm.cod -xdim 6 -ydim 6 -topol hexa -neigh gaussian

The above command specifies the input and output files, the map dimensions, the lattice type, and the neighborhood function type. Map sizes considered are 6 x 6, 10 x 12, 14 x 14, and 18 x 12.

The second stage, map training, consists of two phases: the first phase orders the reference vectors, whereas the second phase fine-tunes the vector values. Both phases use a command structured as:

vsom -din bigram.dat -cin bigrm.cod -cout bigram.cod -rlen <epochs> -alpha 0.5 -radius 15

The rlen parameter determines the number of training iterations, or epochs. We consider various values of this parameter for each map size. The alpha value specifies the learning rate, whereas the radius determines the neighborhood radius. Since the first phase is meant to be the coarser of the two, we choose an alpha of 0.5 with a radius of 15. Fine-tuning these values for phase two, we choose an alpha value of 0.04 and a radius of 4. These values were suggested to me by other students who have previously conducted research using the SOM_PAK algorithms.

The third stage involves evaluating the quantization errors of the various maps by calculating the average error over the entire data sample in the original data file. This is done with the following command:

qerror -din bigram.dat -cin bigram.cod

For the final stage, visualization, we choose to consider the U-Matrix representations of the resulting maps. After comparing the bigram and word approaches, we choose the approach corresponding to the U-Matrix representation that yields distinguishable clusters. Since SOM_PAK does not label the nodes of the U-Matrix, we also use the SOM Toolbox for MATLAB [12] to generate a labeled, colorized version of the preferred U-Matrix. We evaluate the cluster labels, and propose a high-level categorization scheme for the representative vulnerabilities.

The map quantization errors and visualizations are considered in the following section. Conclusions and further applications of this work are considered in Section 5.
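The first three stages can be illustrated with a small pure-Python sketch. It is not the SOM_PAK implementation: it assumes a rectangular lattice instead of the hexagonal one used here, a linear decay of the learning rate and radius (an assumption about the schedule), and hypothetical function names. The alpha and radius values of the two phases follow the ones quoted above.

```python
import math
import random


def best_matching_unit(codebook, vector):
    """Return the index of the reference vector closest to vector."""
    return min(range(len(codebook)), key=lambda k: math.dist(codebook[k], vector))


def train_phase(codebook, coords, data, rlen, alpha, radius, rng):
    """Run rlen training steps with linearly decaying alpha and radius."""
    for t in range(rlen):
        frac = 1.0 - t / rlen
        a = alpha * frac                    # learning rate decays toward 0
        r = 1.0 + (radius - 1.0) * frac     # radius decays toward 1
        x = rng.choice(data)
        bmu = best_matching_unit(codebook, x)
        for k, ref in enumerate(codebook):
            # Gaussian neighborhood measured on the map lattice.
            d2 = math.dist(coords[k], coords[bmu]) ** 2
            h = a * math.exp(-d2 / (2.0 * r * r))
            codebook[k] = [w + h * (xi - w) for w, xi in zip(ref, x)]


def quantization_error(codebook, data):
    """Average distance from each data vector to its best matching unit."""
    total = sum(math.dist(codebook[best_matching_unit(codebook, x)], x) for x in data)
    return total / len(data)


def train_som(data, xdim, ydim, rlen1, rlen2, seed=0):
    """Initialize and train a map in two phases; return the reference vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    lo = [min(x[j] for x in data) for j in range(dim)]
    hi = [max(x[j] for x in data) for j in range(dim)]
    coords = [(i, j) for i in range(xdim) for j in range(ydim)]
    # Stage 1: random initialization within the range of the data.
    codebook = [[rng.uniform(lo[j], hi[j]) for j in range(dim)] for _ in coords]
    # Stage 2: coarse ordering phase, then fine-tuning phase.
    train_phase(codebook, coords, data, rlen1, 0.5, 15, rng)
    train_phase(codebook, coords, data, rlen2, 0.04, 4, rng)
    return codebook


if __name__ == "__main__":
    data = [[0, 0], [0, 1], [10, 10], [10, 11]]
    codebook = train_som(data, 6, 6, 200, 200)
    print(quantization_error(codebook, data))  # Stage 3
```

Comparing the value returned by `quantization_error` across map sizes and training lengths corresponds to the comparison carried out with qerror.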
Section 4 - Results

Table 4.1 below shows the quantization error values calculated from the first round of testing. These tests included initialization and training of four map sizes for the bigram approach.

Table 4.1. Bigram Vector Approach Quantization Error Values
Map Size | Number of Epochs | Quantization Error
[table values missing from the source transcription]

Based on the above results for the bigram approach, we determine that a 6 x 6 map size is sufficient and suspect 1 million epochs is sufficient as well. For the second round of testing we run three tests using the word approach for 1 million, 3 million, and 5 million epochs. The results are summarized in Table 4.2.

Table 4.2. Word Vector Approach Quantization Error Values
Map Size | Number of Epochs | Quantization Error
[table values missing from the source transcription]
The quantization error values in Tables 4.1 and 4.2 appear to plateau, which supports our hypothesis that 1 million epochs is a sufficient number of training iterations for this dataset. This is further demonstrated below in the U-Matrix representations for 1 million (Figure 4.1) and 5 million epochs (Figure 4.2). The subtle difference between the two figures indicates a plateau in values. A comparison of U-Matrix diagrams for the n-gram and word approaches can be found in Appendix B.

Figure 4.1. U-Matrix for N-gram approach with 1 million epochs

Figure 4.2. U-Matrix for N-gram approach with 5 million epochs
By comparing the error values for the two approaches, we determine that the n-gram approach produced a more valuable result due to the lower quantization error values. Having declared the n-gram approach more valuable, we use the resulting U-Matrix representation after labeling, as shown in Figure 4.3, to suggest high-level vulnerability categories. A larger version of Figure 4.3 can be found in Appendix C.

Figure 4.3. Labeled U-Matrix for N-gram approach with 5 million epochs

In Figure 4.3, colours closer to the blue end of the spectrum indicate that the values of neighbouring nodes are similar, whereas nodes coloured from the other end of the spectrum represent values that are less similar. With this in mind, we can identify four prominent node clusters. Examining the labels of these nodes, we look for words or bigrams that are the same or similar in meaning. The words or bigrams that occur repeatedly for the four clusters are: (1) "remote" and "quot"; (2) "remote attackers" and "denial service"; (3) "local" and "buffer overflow"; and (4) "arbitrary files".
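The similarity values that a U-Matrix colours can be approximated with the sketch below, which assigns each node the mean distance between its reference vector and those of its lattice neighbours. It is a simplified illustration: it assumes a rectangular lattice with four neighbours per node and a row-major layout of the reference vectors, the name `u_matrix` is hypothetical, and toolkit implementations typically also draw cells between nodes.

```python
import math


def u_matrix(codebook, xdim, ydim):
    """Return a ydim-by-xdim grid of mean distances to lattice neighbours.

    codebook[y * xdim + x] is the reference vector of the node at (x, y).
    """
    grid = []
    for y in range(ydim):
        row = []
        for x in range(xdim):
            here = codebook[y * xdim + x]
            neighbours = ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1))
            dists = [math.dist(here, codebook[ny * xdim + nx])
                     for nx, ny in neighbours
                     if 0 <= nx < xdim and 0 <= ny < ydim]
            # A 1 x 1 map has no neighbours; report 0.0 in that case.
            row.append(sum(dists) / len(dists) if dists else 0.0)
        grid.append(row)
    return grid


if __name__ == "__main__":
    print(u_matrix([[0.0], [1.0], [3.0]], 3, 1))
```

Low values mark nodes whose neighbours are similar, which is what the blue regions indicate; high values mark the boundaries between clusters.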
Section 5 - Conclusions and Future Work

In this work, the objective was to compare the results of applying the SOM algorithm to two different representations of a dataset for the purpose of identifying vulnerability features inherent in the data. To this end we identified a textual representation of software vulnerabilities and performed various preprocessing tasks until we were left with only the vulnerability descriptions. Using this text, we constructed two different representations of the data, words and n-grams, and presented both to SOM, a data-clustering algorithm.

We ran the SOM algorithm for both the word and n-gram vector approaches and compared the resulting quantization error values to determine an acceptable map size and number of iterations. Through a process of trial and error, we determined that a map size of 6 x 6 nodes and 1 million iterations of training are sufficient for the dataset. We also compared the error values for the word and n-gram trials and concluded that the n-gram approach produced more valuable U-Matrix representations. After labeling the n-gram U-Matrix representation, we identified four high-level vulnerability categories.

Future work includes analysis of the defining features of the identified high-level categories, which will allow us to propose a standardization strategy for vulnerability classification. Such a classification scheme can be compared to other available categorization taxonomies, including those of online vulnerability databases and vulnerability scanning software. Since this research has many implications related to network vulnerabilities, another possibility for the future is to incorporate vulnerability classification factors based on this research into a novel or existing security metric. A standardized categorization scheme based on this information could also be incorporated into existing vulnerability scanners or applied to existing vulnerability databases.
References

[1] SOM Toolbox: Intro to SOM by Teuvo Kohonen.
[2] Luo X., Zincir-Heywood A. N., A Comparison of SOM Based Document Categorization Systems, Proceedings of the IEEE International Joint Conference on Neural Networks.
[3] Kayacik H. G., Zincir-Heywood A. N., Using Self-Organizing Maps to Build an Attack Map for Forensic Analysis, Proceedings of the ACM International Conference on Privacy, Security, and Trust (PST 2006).
[4] Li Y., Venter H. S., and Eloff J. H. P., Categorizing Vulnerabilities using Data Clustering Techniques. Available online.
[5] N-gram.
[6] Amine A., Elberrichi Z., Simonet M., and Malki M., Evaluation and Comparison of Concept Based and N-Grams Based Text Clustering Using SOM. Available online.
[7] Berger H., and Merkl D., A Comparison of Support Vector Machines and Self-Organizing Maps for Categorization. Available online.
[8] Common Vulnerabilities and Exposures (CVE).
[9] Stopwords.
[10] Ted Pedersen, Ngram Statistics Package (NSP).
[11]
[12] SOM Toolbox.
Appendix A: Stop Words

a about above across after afterwards again against all almost alone along already also although always am among amongst amoungst amount an and another any anyhow anyone anything anyway anywhere are around as at back be became because become becomes becoming been before beforehand behind being below beside besides between beyond bill both bottom but by call can cannot cant co computer con could couldnt cry de describe detail do done down due during each eg eight either eleven else elsewhere empty enough etc even ever every everyone everything everywhere except few fifteen fify fill find fire first five for former formerly forty found four from front full further get give go had has hasnt have he hence her here hereafter hereby herein hereupon hers herself him himself his how however hundred i ie if in inc indeed interest into is it its itself keep last latter latterly least less ltd made many may me meanwhile might mill mine more moreover most mostly move much must my myself name namely neither never nevertheless next nine no nobody none noone nor not nothing now nowhere of off often on once one only onto or other others otherwise our ours ourselves out over own part per perhaps please put rather re same see seem seemed seeming seems serious several she should show side since sincere six sixty so some somehow someone something sometime sometimes somewhere still such system take ten than that the their them themselves then thence there thereafter thereby therefore therein thereupon these they thick thin third this those though three through throughout thru thus to together too top toward towards twelve twenty two un under until up upon us very via was we well were what whatever when whence whenever where whereafter whereas whereby wherein whereupon wherever whether which while whither who whoever whole whom whose why will with within without would yet you your yours yourself yourselves
Appendix B: Unlabeled U-Matrix Representations

Figure B.1. 6x6 Bigram approach with 1 million epochs

Figure B.2. 6x6 Word approach with 1 million epochs
Appendix C: Labeled U-Matrix Representations

Figure C.1. 6x6 Bigram approach with 1 million epochs (Left)

Figure C.2. 6x6 Bigram approach with 1 million epochs (Right)
More informationContent-based Management of Document Access. Control
Content-based Management of Document Access Control Edgar Weippl, Ismail Khalil Ibrahim Software Competence Center Hagenberg Hauptstr. 99, A-4232 Hagenberg, Austria {edgar.weippl, ismail.khalil-ibrahim}@scch.at
More informationGiftGuide/GiftStory/GiftCalcs User Reference Manual
GiftGuide/GiftStory/GiftCalcs User Reference Manual Copyright (c) 2011 PG Calc Incorporated All rights reserved. Unauthorized use or duplication is prohibited. revised 6/16/2011 Table of Contents Getting
More informationVery Fast Image Retrieval
Very Fast Image Retrieval Diogo André da Silva Romão Abstract Nowadays, multimedia databases are used on several areas. They can be used at home, on entertainment systems or even in professional context
More informationPreprocessing of Stream Data using Attribute Selection based on Survival of the Fittest
Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest Bhakti V. Gavali 1, Prof. Vivekanand Reddy 2 1 Department of Computer Science and Engineering, Visvesvaraya Technological
More informationSupervised classification of law area in the legal domain
AFSTUDEERPROJECT BSC KI Supervised classification of law area in the legal domain Author: Mees FRÖBERG (10559949) Supervisors: Evangelos KANOULAS Tjerk DE GREEF June 24, 2016 Abstract Search algorithms
More informationA Comparison of Text-Categorization Methods applied to N-Gram Frequency Statistics
A Comparison of Text-Categorization Methods applied to N-Gram Frequency Statistics Helmut Berger and Dieter Merkl 2 Faculty of Information Technology, University of Technology, Sydney, NSW, Australia hberger@it.uts.edu.au
More informationUsability Test Report: Bento results interface 1
Usability Test Report: Bento results interface 1 Summary Emily Daly and Ian Sloat conducted usability testing on the functionality of the Bento results interface. The test was conducted at the temporary
More informationRandom projection for non-gaussian mixture models
Random projection for non-gaussian mixture models Győző Gidófalvi Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 92037 gyozo@cs.ucsd.edu Abstract Recently,
More informationAUTOMATED STUDENT S ATTENDANCE ENTERING SYSTEM BY ELIMINATING FORGE SIGNATURES
AUTOMATED STUDENT S ATTENDANCE ENTERING SYSTEM BY ELIMINATING FORGE SIGNATURES K. P. M. L. P. Weerasinghe 149235H Faculty of Information Technology University of Moratuwa June 2017 AUTOMATED STUDENT S
More informationAdvanced Operations Research Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology, Madras
Advanced Operations Research Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology, Madras Lecture 18 All-Integer Dual Algorithm We continue the discussion on the all integer
More informationLecture 05 I/O statements Printf, Scanf Simple statements, Compound statements
Programming, Data Structures and Algorithms Prof. Shankar Balachandran Department of Computer Science and Engineering Indian Institute of Technology, Madras Lecture 05 I/O statements Printf, Scanf Simple
More informationAssociative Cellular Learning Automata and its Applications
Associative Cellular Learning Automata and its Applications Meysam Ahangaran and Nasrin Taghizadeh and Hamid Beigy Department of Computer Engineering, Sharif University of Technology, Tehran, Iran ahangaran@iust.ac.ir,
More informationUsability Test Report: Requesting Library Material 1
Usability Test Report: Requesting Library Material 1 Summary Emily Daly and Kate Collins conducted usability testing on the processes of requesting library material. The test was conducted at the temporary
More informationSecurity. 1 Introduction. Alex S. 1.1 Authentication
Security Alex S. 1 Introduction Security is one of the most important topics in the IT field. Without some degree of security, we wouldn t have the Internet, e-commerce, ATM machines, emails, etc. A lot
More information6.001 Notes: Section 6.1
6.001 Notes: Section 6.1 Slide 6.1.1 When we first starting talking about Scheme expressions, you may recall we said that (almost) every Scheme expression had three components, a syntax (legal ways of
More informationMemory Addressing, Binary, and Hexadecimal Review
C++ By A EXAMPLE Memory Addressing, Binary, and Hexadecimal Review You do not have to understand the concepts in this appendix to become well-versed in C++. You can master C++, however, only if you spend
More informationSampling PCA, enhancing recovered missing values in large scale matrices. Luis Gabriel De Alba Rivera 80555S
Sampling PCA, enhancing recovered missing values in large scale matrices. Luis Gabriel De Alba Rivera 80555S May 2, 2009 Introduction Human preferences (the quality tags we put on things) are language
More informationObjective and Subjective Specifications
2017-7-10 0 Objective and Subjective Specifications Eric C.R. Hehner Department of Computer Science, University of Toronto hehner@cs.utoronto.ca Abstract: We examine specifications for dependence on the
More information(Refer Slide Time: 06:01)
Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering Indian Institute of Technology, Delhi Lecture 28 Applications of DFS Today we are going to be talking about
More informationUnsupervised learning in Vision
Chapter 7 Unsupervised learning in Vision The fields of Computer Vision and Machine Learning complement each other in a very natural way: the aim of the former is to extract useful information from visual
More informationCombinatorics Prof. Dr. L. Sunil Chandran Department of Computer Science and Automation Indian Institute of Science, Bangalore
Combinatorics Prof. Dr. L. Sunil Chandran Department of Computer Science and Automation Indian Institute of Science, Bangalore Lecture - 5 Elementary concepts and basic counting principles So, welcome
More informationCHAPTER 4 PROPOSED ARCHITECTURE FOR INCREMENTAL PARALLEL WEBCRAWLER
CHAPTER 4 PROPOSED ARCHITECTURE FOR INCREMENTAL PARALLEL WEBCRAWLER 4.1 INTRODUCTION In 1994, the World Wide Web Worm (WWWW), one of the first web search engines had an index of 110,000 web pages [2] but
More informationOptimization of Boyer-Moore-Horspool-Sunday Algorithm
Optimization of Boyer-Moore-Horspool-Sunday Algorithm Rionaldi Chandraseta - 13515077 Program Studi Teknik Informatika Sekolah Teknik Elektro dan Informatika, Institut Teknologi Bandung Bandung, Indonesia
More informationThe semicolon [ ; ] is a powerful mark of punctuation with three uses.
The Semicolon Recognize a semicolon when you see one. The semicolon [ ; ] is a powerful mark of punctuation with three uses. The first appropriate use of the semicolon is to connect two related sentences.
More informationGuarantee permanent Model/Code consistency:
White paper Model driven Engineering Softeam 2000 Page 1 / 14 Guarantee permanent Model/Code consistency: "Model driven Engineering" (MDE) versus "Roundtrip engineering" (RTE) White Paper SOFTEAM 2000
More informationCOSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor
COSC160: Detection and Classification Jeremy Bolton, PhD Assistant Teaching Professor Outline I. Problem I. Strategies II. Features for training III. Using spatial information? IV. Reducing dimensionality
More informationResPubliQA 2010
SZTAKI @ ResPubliQA 2010 David Mark Nemeskey Computer and Automation Research Institute, Hungarian Academy of Sciences, Budapest, Hungary (SZTAKI) Abstract. This paper summarizes the results of our first
More informationAI32 Guide to Weka. Andrew Roberts 1st March 2005
AI32 Guide to Weka Andrew Roberts http://www.comp.leeds.ac.uk/andyr 1st March 2005 1 Introduction Weka is an excellent system for learning about machine learning techniques. Of course, it is a generic
More informationAttack Class: Address Spoofing
ttack Class: ddress Spoofing L. Todd Heberlein, Matt ishop Department of Computer Science University of California Davis, C 95616 bstract We present an analysis of a class of attacks we call address spoofing.
More informationSource Code Author Identification Based on N-gram Author Profiles
Source Code Author Identification Based on N-gram Author files Georgia Frantzeskou, Efstathios Stamatatos, Stefanos Gritzalis, Sokratis Katsikas Laboratory of Information and Communication Systems Security
More informationChapter 3. Set Theory. 3.1 What is a Set?
Chapter 3 Set Theory 3.1 What is a Set? A set is a well-defined collection of objects called elements or members of the set. Here, well-defined means accurately and unambiguously stated or described. Any
More informationThis shows a typical architecture that enterprises use to secure their networks: The network is divided into a number of segments Firewalls restrict
1 This shows a typical architecture that enterprises use to secure their networks: The network is divided into a number of segments Firewalls restrict access between segments This creates a layered defense
More informationHow invariants help writing loops Author: Sander Kooijmans Document version: 1.0
How invariants help writing loops Author: Sander Kooijmans Document version: 1.0 Why this document? Did you ever feel frustrated because of a nasty bug in your code? Did you spend hours looking at the
More informationRecording end-users security events: A step towards increasing usability
Section 1 Network Systems Engineering Recording end-users security events: A step towards increasing usability Abstract D.Chatziapostolou and S.M.Furnell Network Research Group, University of Plymouth,
More informationClustering and Visualisation of Data
Clustering and Visualisation of Data Hiroshi Shimodaira January-March 28 Cluster analysis aims to partition a data set into meaningful or useful groups, based on distances between data points. In some
More informationSeismic regionalization based on an artificial neural network
Seismic regionalization based on an artificial neural network *Jaime García-Pérez 1) and René Riaño 2) 1), 2) Instituto de Ingeniería, UNAM, CU, Coyoacán, México D.F., 014510, Mexico 1) jgap@pumas.ii.unam.mx
More informationBox It Up (A Graphical Look)
. Name Date A c t i v i t y 1 0 Box It Up (A Graphical Look) The Problem Ms. Hawkins, the physical sciences teacher at Hinthe Middle School, needs several open-topped boxes for storing laboratory materials.
More informationProject Report. Prepared for: Dr. Liwen Shih Prepared by: Joseph Hayes. April 17, 2008 Course Number: CSCI
University of Houston Clear Lake School of Science & Computer Engineering Project Report Prepared for: Dr. Liwen Shih Prepared by: Joseph Hayes April 17, 2008 Course Number: CSCI 5634.01 University of
More information(Refer Slide Time: 00:01:30)
Digital Circuits and Systems Prof. S. Srinivasan Department of Electrical Engineering Indian Institute of Technology, Madras Lecture - 32 Design using Programmable Logic Devices (Refer Slide Time: 00:01:30)
More informationA Taxonomy of Web Search
A Taxonomy of Web Search by Andrei Broder 1 Overview Ø Motivation Ø Classic model for IR Ø Web-specific Needs Ø Taxonomy of Web Search Ø Evaluation Ø Evolution of Search Engines Ø Conclusions 2 1 Motivation
More informationFunctional Programming in Haskell Prof. Madhavan Mukund and S. P. Suresh Chennai Mathematical Institute
Functional Programming in Haskell Prof. Madhavan Mukund and S. P. Suresh Chennai Mathematical Institute Module # 02 Lecture - 03 Characters and Strings So, let us turn our attention to a data type we have
More informationAADL Graphical Editor Design
AADL Graphical Editor Design Peter Feiler Software Engineering Institute phf@sei.cmu.edu Introduction An AADL specification is a set of component type and implementation declarations. They are organized
More informationProgramming in C++ Prof. Partha Pratim Das Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur
Programming in C++ Prof. Partha Pratim Das Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture - 31 Static Members Welcome to Module 16 of Programming in C++.
More informationComparison of FP tree and Apriori Algorithm
International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 10, Issue 6 (June 2014), PP.78-82 Comparison of FP tree and Apriori Algorithm Prashasti
More informationAddressing (Cont. Introduction to Networks)
Addressing (Cont. Introduction to Networks) - Introduction - Purpose of this course - Your work - Definitions - The machine identification - Segmentation of networks - One address for the price of two
More informationMULTI-VIEW TARGET CLASSIFICATION IN SYNTHETIC APERTURE SONAR IMAGERY
MULTI-VIEW TARGET CLASSIFICATION IN SYNTHETIC APERTURE SONAR IMAGERY David Williams a, Johannes Groen b ab NATO Undersea Research Centre, Viale San Bartolomeo 400, 19126 La Spezia, Italy Contact Author:
More informationThe Dynamic Typing Interlude
CHAPTER 6 The Dynamic Typing Interlude In the prior chapter, we began exploring Python s core object types in depth with a look at Python numbers. We ll resume our object type tour in the next chapter,
More informationA Novel Categorized Search Strategy using Distributional Clustering Neenu Joseph. M 1, Sudheep Elayidom 2
A Novel Categorized Search Strategy using Distributional Clustering Neenu Joseph. M 1, Sudheep Elayidom 2 1 Student, M.E., (Computer science and Engineering) in M.G University, India, 2 Associate Professor
More informationCHAPTER 6 HYBRID AI BASED IMAGE CLASSIFICATION TECHNIQUES
CHAPTER 6 HYBRID AI BASED IMAGE CLASSIFICATION TECHNIQUES 6.1 INTRODUCTION The exploration of applications of ANN for image classification has yielded satisfactory results. But, the scope for improving
More informationAmbiguity Handling in Mobile-capable Social Networks
Ambiguity Handling in Mobile-capable Social Networks Péter Ekler Department of Automation and Applied Informatics Budapest University of Technology and Economics peter.ekler@aut.bme.hu Abstract. Today
More informationDATA MINING TEST 2 INSTRUCTIONS: this test consists of 4 questions you may attempt all questions. maximum marks = 100 bonus marks available = 10
COMP717, Data Mining with R, Test Two, Tuesday the 28 th of May, 2013, 8h30-11h30 1 DATA MINING TEST 2 INSTRUCTIONS: this test consists of 4 questions you may attempt all questions. maximum marks = 100
More informationThe Goal of this Document. Where to Start?
A QUICK INTRODUCTION TO THE SEMILAR APPLICATION Mihai Lintean, Rajendra Banjade, and Vasile Rus vrus@memphis.edu linteam@gmail.com rbanjade@memphis.edu The Goal of this Document This document introduce
More informationLecture 6 Binary Search
Lecture 6 Binary Search 15-122: Principles of Imperative Computation (Spring 2018) Frank Pfenning One of the fundamental and recurring problems in computer science is to find elements in collections, such
More informationSEER AKADEMI LINUX PROGRAMMING AND SCRIPTINGPERL 7
SEER AKADEMI LINUX PROGRAMMING AND SCRIPTINGPERL 7 Hi everyone once again welcome to this lecture we are actually the course is Linux programming and scripting we have been talking about the Perl, Perl
More information===============================================================================
We have looked at how to use public key crypto (mixed with just the right amount of trust) for a website to authenticate itself to a user's browser. What about when Alice needs to authenticate herself
More information