File Classification Using Sub-Sequence Kernels
|
|
- Justin Rich
- 6 years ago
- Views:
Transcription
1 DIGITAL FORENSIC RESEARCH CONFERENCE File Classification Using Sub-Sequence Kernels By Olivier DeVel Presented At The Digital Forensic Research Conference DFRWS 2003 USA Cleveland, OH (Aug 6 th - 8 th ) DFRWS is dedicated to the sharing of knowledge and ideas about digital forensics research. Ever since it organized the first open workshop devoted to digital forensics in 2001, DFRWS continues to bring academics and practitioners together in an informal environment. As a non-profit, volunteer organization, DFRWS sponsors technical working groups, annual conferences and challenges to help drive the direction of research and development.
2 1 File Classification Using Sub- Sequence Kernels Olivier de Vel Computer Forensics Group DSTO, Australia
3 2 Presentation: Document mining and computer forensics Semi-structured documents Broadband k-spectrum kernel Coloured generalized suffix tree representation Experimental methodology Results and Conclusion
4 3 Computer Forensics: Focus on Document Mining Eg, disk surface contents structured data semi-structured data unstructured data
5 4 Presentation: Document mining and computer forensics Semi-structured documents Broadband k-spectrum kernel Coloured generalized suffix tree representation Experimental methodology Results and Conclusion
6 5 Semi-Structured Documents May or may not include natural language text May be compound documents Include low-level, short-range structures. For example: byte block identifiers mark-up tokens
7 6 Semi-Structured Documents Traditional text mining techniques generally not appropriate.
8 7 Presentation: Document mining and computer forensics Semi-structured documents Broadband k-spectrum kernel Coloured generalized suffix tree representation Experimental methodology Results and Conclusion
9 8 Broadband k-spectrum Kernel: Use an extension to the string kernel. String (k-spectrum) Kernel: Is the set of all k-length contiguous subsequences that a sequence x i (of alphabet T, size T =l) contains Feature space dimension is up to T k (eg, if k=4 and T = 256 number of dimensions 10 9 ) X i = BOOKOOPERPPO (k=4) BOOK OOKO OKOO KOOP OOPE etc. Vector representation: ( 1 (x i ), 1 (x i ),, 1 (x i )) where 1, 2,, l T k
10 9 Broadband k-spectrum Kernel: Broadband k-spectrum Kernel: BOOKOOPERPPO Is the set of all (0, 1,..k)-length contiguous subsequences that a sequence (of alphabet T) contains Feature space dimension is even larger! (k=4) B, BO, BOO, BOOK O, OO, OOK, OOKO O, OK, OKO, OKOO K, KO, KOO, KOOP O, OO, OOP, OOPE etc.
11 10 Broadband k-spectrum Kernel: Matrix representation, (BB) (x i ) : ( 1 (1)(x i ) 2 (1)(x i ) l (1)(x i ), 1 (2)(x i ) 2 (2)(x i ) l (2)(x i ), : : : 1 (2)(x i ) 2 (2)(x i ) l (2)(x i )) where i (j) T j for i = 1, 2,..l
12 11 Broadband k-spectrum Kernel: What s the Big Deal? This allows us to define the inner product between two input sequences, x r and x s as: GramMatrix(x r, x s ) (BB) (x r ) (BB) (x s ) and use the kernel trick when maximising the margin of separation between two classes. The decision function for the SVM learning algorithm is: f(x) = sign( I y i [ (BB) (x r ) (BB) (x s )] + b) in the transformed feature space rather than the input feature space.
13 12 Presentation: Document mining and computer forensics Semi-structured documents Broadband k-spectrum kernel Coloured generalized suffix tree representation Experimental methodology Results and Conclusion
14 13 Coloured Generalized Suffix Tree (CGST): The CGST is a labelling of the GST representation of a sequence of symbols: It stores a count vector to be stored at each node in the GST each vector element value equals the frequency of the subsequence in the sequence. BOOKOOPERPPO PPEKOOBOKKOO (k=4) B O K K (1,1) (1,1) (0,1) O (1,0) E K O (1,1) (0,1) R (0,1) P (1,0) (1,0) (1,0) K K O (1,3) (0,1) (0,1) (1,2) (0,1) K (1,0) O (0,1) P O O O B (1,2) (0,1) (0,1) P (1,0)
15 14 CGST: Computing the Kernel (Gram) Matrix For N input sequences, the NxN Gram matrix G is computed as follows: initialize G r,s = 0 x r and x s traverse the CGST at each node, compute the product of each vector element pair and sum to G r,s
16 15 Presentation: Document mining and computer forensics Semi-structured documents Broadband k-spectrum kernel Coloured generalized suffix tree representation Experimental methodology Results and Conclusion
17 16 Experimental Methodology: Corpus Document corpus for a controlled experiment : 5 document classes (MSWord, JPEG, MSExcel, Java source, Adobe PDF) 465 documents (0.11KB min. to 523.3KB max. size) data are scaled to unit standard deviation and zero mean
18 17 Experimental Methodology: Classifier SVM light as the (two-way) classifier, Obtain two-way categorisation matrix for each document type category, using 10-fold cross-validation sampling, Calculate per-document-type category classification performance statistics precision, recall and F 1 statistics.
19 18 Presentation: Document mining and computer forensics Semi-structured documents Broadband k-spectrum kernel Coloured generalized suffix tree representation Experimental methodology Results and Conclusion
20 19 Experiments: Performance Results Classification performance was evaluated using the following parameters: Minimum and maximum sub-sequence window offset lengths,
21 20 Experiments: Performance Results Minimum and Maximum Sub-sequence Window Offset Length: Definition: For example; sub-sequence window (length = 5) MinSeqLength (=2) MaxSeqLength (=4)
22 21 Experiments: Performance Results Minimum Sub-sequence Offset Length, MinSeqLength: Optimal MinSeqLength= Use broadband spectrum 70 F1 Performance (%) PDF Faster drop-off for non-text Minimum Sub-sequence Offset Length 5 Java MSExcel documents (except JPEG) Document Type JPEG MSWord [Parameters: MaxStringLength=256 MaxSeqLength=5 MinSeqCount=10]
23 22 Experiments: Performance Results Maximum Sub-sequence Offset Length, MaxSeqLength: Optimal value is 4<MaxSeqLength<7 F1 Performance (%) No significant improvements Maximum Sub-sequence Offset Length JPEG MSWord PDF Java MSExcel Document Type for large sub-sequence offset values
24 23 Conclusions: Promising semi-structured document categorization results using the broadband k-spectrum kernel. Experiments suggest: use broadband kernels rather than narrow band kernels (ie fixed k-length sequence), document length can be small, use a minimum suffix count to achieve improved classification performance and tree compression. Extensions: Increase the number of document types Investigate techniques for capturing longer-range data structures in documents
25 24 Questions? Advertisement: Computer and Intrusion Forensics by G. Mohay, A. Anderson, B.Collie, O. de Vel and R. McKemmish Artech House ISBN (2003)
Categories of Digital Investigation Analysis Techniques Based On The Computer History Model
DIGITAL FORENSIC RESEARCH CONFERENCE Categories of Digital Investigation Analysis Techniques Based On The Computer History Model By Brian Carrier, Eugene Spafford Presented At The Digital Forensic Research
More informationThe Normalized Compression Distance as a File Fragment Classifier
DIGITAL FORENSIC RESEARCH CONFERENCE The Normalized Compression Distance as a File Fragment Classifier By Stefan Axelsson Presented At The Digital Forensic Research Conference DFRWS 2010 USA Portland,
More informationFile Fragment Encoding Classification: An Empirical Approach
DIGITAL FORENSIC RESEARCH CONFERENCE File Fragment Encoding Classification: An Empirical Approach By Vassil Roussev and Candice Quates Presented At The Digital Forensic Research Conference DFRWS 2013 USA
More informationExtracting Hidden Messages in Steganographic Images
DIGITAL FORENSIC RESEARCH CONFERENCE Extracting Hidden Messages in Steganographic Images By Tu-Thach Quach Presented At The Digital Forensic Research Conference DFRWS 2014 USA Denver, CO (Aug 3 rd - 6
More informationRanking Algorithms For Digital Forensic String Search Hits
DIGITAL FORENSIC RESEARCH CONFERENCE Ranking Algorithms For Digital Forensic String Search Hits By Nicole Beebe and Lishu Liu Presented At The Digital Forensic Research Conference DFRWS 2014 USA Denver,
More informationDeveloping a New Digital Forensics Curriculum
DIGITAL FORENSIC RESEARCH CONFERENCE Developing a New Digital Forensics Curriculum By Anthony Lang, Masooda Bashir, Roy Campbell and Lizanne Destefano Presented At The Digital Forensic Research Conference
More informationSystem for the Proactive, Continuous, and Efficient Collection of Digital Forensic Evidence
DIGITAL FORENSIC RESEARCH CONFERENCE System for the Proactive, Continuous, and Efficient Collection of Digital Forensic Evidence By Clay Shields, Ophir Frieder and Mark Maloof Presented At The Digital
More informationMonitoring Access to Shared Memory-Mapped Files
DIGITAL FORENSIC RESEARCH CONFERENCE Monitoring Access to Shared Memory-Mapped Files By Christian Sarmoria, Steve Chapin Presented At The Digital Forensic Research Conference DFRWS 2005 USA New Orleans,
More informationA Novel Approach of Mining Write-Prints for Authorship Attribution in Forensics
DIGITAL FORENSIC RESEARCH CONFERENCE A Novel Approach of Mining Write-Prints for Authorship Attribution in E-mail Forensics By Farkhund Iqbal, Rachid Hadjidj, Benjamin Fung, Mourad Debbabi Presented At
More informationRapid Forensic Imaging of Large Disks with Sifting Collectors
DIGITAL FORENSIC RESEARCH CONFERENCE Rapid Forensic Imaging of Large Disks with Sifting Collectors By Jonathan Grier and Golden Richard Presented At The Digital Forensic Research Conference DFRWS 2015
More informationA Correlation Method for Establishing Provenance of Timestamps in Digital Evidence
DIGITAL FORENSIC RESEARCH CONFERENCE A Correlation Method for Establishing Provenance of Timestamps in Digital Evidence By Bradley Schatz, George Mohay, Andrew Clark Presented At The Digital Forensic Research
More informationAn Evaluation Platform for Forensic Memory Acquisition Software
DIGITAL FORENSIC RESEARCH CONFERENCE An Evaluation Platform for Forensic Memory Acquisition Software By Stefan Voemel and Johannes Stuttgen Presented At The Digital Forensic Research Conference DFRWS 2013
More informationAutomated Identification of Installed Malicious Android Applications
DIGITAL FORENSIC RESEARCH CONFERENCE Automated Identification of Installed Malicious Android Applications By Mark Guido, Justin Grover, Jared Ondricek, Dave Wilburn, Drew Hunt and Thanh Nguyen Presented
More informationCAT Detect: A Tool for Detecting Inconsistency in Computer Activity Timelines
DIGITAL FORENSIC RESEARCH CONFERENCE CAT Detect: A Tool for Detecting Inconsistency in Computer Activity Timelines By Andrew Marrington, Ibrahim Baggili, George Mohay and Andrew Clark Presented At The
More informationA Functional Reference Model of Passive Network Origin Identification
DIGITAL FORENSIC RESEARCH CONFERENCE A Functional Reference Model of Passive Network Origin Identification By Thomas Daniels Presented At The Digital Forensic Research Conference DFRWS 2003 USA Cleveland,
More informationUnification of Digital Evidence from Disparate Sources
DIGITAL FORENSIC RESEARCH CONFERENCE Unification of Digital Evidence from Disparate Sources By Philip Turner Presented At The Digital Forensic Research Conference DFRWS 2005 USA New Orleans, LA (Aug 17
More informationHash Based Disk Imaging Using AFF4
DIGITAL FORENSIC RESEARCH CONFERENCE Hash Based Disk Imaging Using AFF4 By Michael Cohen and Bradley Schatz Presented At The Digital Forensic Research Conference DFRWS 2010 USA Portland, OR (Aug 2 nd -
More informationDesign Tradeoffs for Developing Fragmented Video Carving Tools
DIGITAL FORENSIC RESEARCH CONFERENCE Design Tradeoffs for Developing Fragmented Video Carving Tools By Eoghan Casey and Rikkert Zoun Presented At The Digital Forensic Research Conference DFRWS 2014 USA
More informationFast Indexing Strategies for Robust Image Hashes
DIGITAL FORENSIC RESEARCH CONFERENCE Fast Indexing Strategies for Robust Image Hashes By Christian Winter, Martin Steinebach and York Yannikos Presented At The Digital Forensic Research Conference DFRWS
More informationBreaking the Performance Wall: The Case for Distributed Digital Forensics
DIGITAL FORENSIC RESEARCH CONFERENCE Breaking the Performance Wall: The Case for Distributed Digital Forensics By Vassil Roussev, Golden Richard Presented At The Digital Forensic Research Conference DFRWS
More informationA Study of User Data Integrity During Acquisition of Android Devices
DIGITAL FORENSIC RESEARCH CONFERENCE By Namheun Son, Yunho Lee, Dohyun Kim, Joshua I. James, Sangjin Lee and Kyungho Lee Presented At The Digital Forensic Research Conference DFRWS 2013 USA Monterey, CA
More informationOn Criteria for Evaluating Similarity Digest Schemes
DIGITAL FORENSIC RESEARCH CONFERENCE On Criteria for Evaluating Similarity Digest Schemes By Jonathan Oliver Presented At The Digital Forensic Research Conference DFRWS 2015 EU Dublin, Ireland (Mar 23
More informationPrivacy-Preserving Forensics
DIGITAL FORENSIC RESEARCH CONFERENCE Privacy-Preserving Email Forensics By Frederik Armknecht, Andreas Dewald and Michael Gruhn Presented At The Digital Forensic Research Conference DFRWS 2015 USA Philadelphia,
More informationHoneynets and Digital Forensics
DIGITAL FORENSIC RESEARCH CONFERENCE Honeynets and Digital Forensics By Lance Spitzner Presented At The Digital Forensic Research Conference DFRWS 2004 USA Baltimore, MD (Aug 11 th - 13 th ) DFRWS is dedicated
More informationA Framework for Attack Patterns Discovery in Honeynet Data
DIGITAL FORENSIC RESEARCH CONFERENCE A Framework for Attack Patterns Discovery in Honeynet Data By Olivier Thonnard, Marc Dacier Presented At The Digital Forensic Research Conference DFRWS 2008 USA Baltimore,
More informationMultidimensional Investigation of Source Port 0 Probing
DIGITAL FORENSIC RESEARCH CONFERENCE Multidimensional Investigation of Source Port 0 Probing By Elias Bou-Harb, Nour-Eddine Lakhdari, Hamad Binsalleeh and Mourad Debbabi Presented At The Digital Forensic
More informationDesigning Robustness and Resilience in Digital Investigation Laboratories
DIGITAL FORENSIC RESEARCH CONFERENCE Designing Robustness and Resilience in Digital Investigation Laboratories By Philipp Amann and Joshua James Presented At The Digital Forensic Research Conference DFRWS
More informationLeveraging CybOX to Standardize Representation and Exchange of Digital Forensic Information
DIGITAL FORENSIC RESEARCH CONFERENCE Leveraging CybOX to Standardize Representation and Exchange of Digital Forensic Information By Eoghan Casey, Greg Back, and Sean Barnum Presented At The Digital Forensic
More informationCSE 417T: Introduction to Machine Learning. Lecture 22: The Kernel Trick. Henry Chai 11/15/18
CSE 417T: Introduction to Machine Learning Lecture 22: The Kernel Trick Henry Chai 11/15/18 Linearly Inseparable Data What can we do if the data is not linearly separable? Accept some non-zero in-sample
More informationAndroid Forensics: Automated Data Collection And Reporting From A Mobile Device
DIGITAL FORENSIC RESEARCH CONFERENCE Android Forensics: Automated Data Collection And Reporting From A Mobile Device By Justin Grover Presented At The Digital Forensic Research Conference DFRWS 2013 USA
More informationArchival Science, Digital Forensics and New Media Art
DIGITAL FORENSIC RESEARCH CONFERENCE Archival Science, Digital Forensics and New Media Art By Dianne Dietrich and Frank Adelstein Presented At The Digital Forensic Research Conference DFRWS 2015 USA Philadelphia,
More informationBinComp: A Stratified Approach to Compiler Provenance Attribution
DIGITAL FORENSIC RESEARCH CONFERENCE BinComp: A Stratified Approach to Compiler Provenance Attribution By Saed Alrabaee, Paria Shirani, Mourad Debbabi, Ashkan Rahimian and Lingyu Wang Presented At The
More informationCan Digital Evidence Endure the Test of Time?
DIGITAL FORENSIC RESEARCH CONFERENCE By Michael Duren, Chet Hosmer Presented At The Digital Forensic Research Conference DFRWS 2002 USA Syracuse, NY (Aug 6 th - 9 th ) DFRWS is dedicated to the sharing
More informationAutomatic training example selection for scalable unsupervised record linkage
Automatic training example selection for scalable unsupervised record linkage Peter Christen Department of Computer Science, The Australian National University, Canberra, Australia Contact: peter.christen@anu.edu.au
More informationA Strategy for Testing Hardware Write Block Devices
DIGITAL FORENSIC RESEARCH CONFERENCE A Strategy for Testing Hardware Write Block Devices By James Lyle Presented At The Digital Forensic Research Conference DFRWS 2006 USA Lafayette, IN (Aug 14 th - 16
More informationRobust PDF Table Locator
Robust PDF Table Locator December 17, 2016 1 Introduction Data scientists rely on an abundance of tabular data stored in easy-to-machine-read formats like.csv files. Unfortunately, most government records
More informationCS370: Operating Systems [Spring 2017] Dept. Of Computer Science, Colorado State University
Frequently asked questions from the previous class survey CS 370: OPERATING SYSTEMS [MEMORY MANAGEMENT] Shrideep Pallickara Computer Science Colorado State University MS-DOS.COM? How does performing fast
More informationAutomated Classification. Lars Marius Garshol Topic Maps
Automated Classification Lars Marius Garshol Topic Maps 2007 2007-03-21 Automated classification What is it? Why do it? 2 What is automated classification? Create parts of a topic map
More informationIntroduction to Kernels (part II)Application to sequences p.1
Introduction to Kernels (part II) Application to sequences Liva Ralaivola liva@ics.uci.edu School of Information and Computer Science Institute for Genomics and Bioinformatics Introduction to Kernels (part
More informationBehavioral Data Mining. Lecture 10 Kernel methods and SVMs
Behavioral Data Mining Lecture 10 Kernel methods and SVMs Outline SVMs as large-margin linear classifiers Kernel methods SVM algorithms SVMs as large-margin classifiers margin The separating plane maximizes
More informationCHEMINSTRUMENTS DYNAMIC SHEAR TESTER MODEL DS-1000 OPERATING INSTRUCTIONS
CHEMINSTRUMENTS DYNAMIC SHEAR TESTER MODEL DS-1000 OPERATING INSTRUCTIONS Overview...3 Operation...4 Load Cell Calibration...5 Units...5 Speed...5 Offset...5 Auto Return...6 Break Percent...6 Help...6
More informationLECTURE 5: DUAL PROBLEMS AND KERNELS. * Most of the slides in this lecture are from
LECTURE 5: DUAL PROBLEMS AND KERNELS * Most of the slides in this lecture are from http://www.robots.ox.ac.uk/~az/lectures/ml Optimization Loss function Loss functions SVM review PRIMAL-DUAL PROBLEM Max-min
More informationSupport vector machine (II): non-linear SVM. LING 572 Fei Xia
Support vector machine (II): non-linear SVM LING 572 Fei Xia 1 Linear SVM Maximizing the margin Soft margin Nonlinear SVM Kernel trick A case study Outline Handling multi-class problems 2 Non-linear SVM
More informationIrfan Ahmed, Kyung-Suk Lhee, Hyun-Jung Shin and Man-Pyo Hong
Chapter 5 FAST CONTENT-BASED FILE TYPE IDENTIFICATION Irfan Ahmed, Kyung-Suk Lhee, Hyun-Jung Shin and Man-Pyo Hong Abstract Digital forensic examiners often need to identify the type of a file or file
More informationCoding for Random Projects
Coding for Random Projects CS 584: Big Data Analytics Material adapted from Li s talk at ICML 2014 (http://techtalks.tv/talks/coding-for-random-projections/61085/) Random Projections for High-Dimensional
More informationPredicting the Types of File Fragments
Predicting the Types of File Fragments William C. Calhoun and Drue Coles Department of Mathematics, Computer Science and Statistics Bloomsburg, University of Pennsylvania Bloomsburg, PA 17815 Thanks to
More informationCode Type Revealing Using Experiments Framework
Code Type Revealing Using Experiments Framework Rami Sharon, Sharon.Rami@gmail.com, the Open University, Ra'anana, Israel. Ehud Gudes, ehud@cs.bgu.ac.il, Ben-Gurion University, Beer-Sheva, Israel. Abstract.
More informationSupervised Learning Classification Algorithms Comparison
Supervised Learning Classification Algorithms Comparison Aditya Singh Rathore B.Tech, J.K. Lakshmipat University -------------------------------------------------------------***---------------------------------------------------------
More informationCSE141 Problem Set #4 Solutions
CSE141 Problem Set #4 Solutions March 5, 2002 1 Simple Caches For this first problem, we have a 32 Byte cache with a line length of 8 bytes. This means that we have a total of 4 cache blocks (cache lines)
More informationProfile-based String Kernels for Remote Homology Detection and Motif Extraction
Profile-based String Kernels for Remote Homology Detection and Motif Extraction Ray Kuang, Eugene Ie, Ke Wang, Kai Wang, Mahira Siddiqi, Yoav Freund and Christina Leslie. Department of Computer Science
More information12 Classification using Support Vector Machines
160 Bioinformatics I, WS 14/15, D. Huson, January 28, 2015 12 Classification using Support Vector Machines This lecture is based on the following sources, which are all recommended reading: F. Markowetz.
More informationAnnotated Suffix Trees for Text Clustering
Annotated Suffix Trees for Text Clustering Ekaterina Chernyak and Dmitry Ilvovsky National Research University Higher School of Economics Moscow, Russia echernyak,dilvovsky@hse.ru Abstract. In this paper
More informationApproach Research of Keyword Extraction Based on Web Pages Document
2017 3rd International Conference on Electronic Information Technology and Intellectualization (ICEITI 2017) ISBN: 978-1-60595-512-4 Approach Research Keyword Extraction Based on Web Pages Document Yangxin
More informationText Classification using String Kernels
Text Classification using String Kernels Huma Lodhi John Shawe-Taylor Nello Cristianini Chris Watkins Department of Computer Science Royal Holloway, University of London Egham, Surrey TW20 0E, UK fhuma,
More informationClassification. 1 o Semestre 2007/2008
Classification Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2007/2008 Slides baseados nos slides oficiais do livro Mining the Web c Soumen Chakrabarti. Outline 1 2 3 Single-Class
More informationSelf-tuning ongoing terminology extraction retrained on terminology validation decisions
Self-tuning ongoing terminology extraction retrained on terminology validation decisions Alfredo Maldonado and David Lewis ADAPT Centre, School of Computer Science and Statistics, Trinity College Dublin
More informationData Mining: Concepts and Techniques. Chapter 9 Classification: Support Vector Machines. Support Vector Machines (SVMs)
Data Mining: Concepts and Techniques Chapter 9 Classification: Support Vector Machines 1 Support Vector Machines (SVMs) SVMs are a set of related supervised learning methods used for classification Based
More informationSTUDYING OF CLASSIFYING CHINESE SMS MESSAGES
STUDYING OF CLASSIFYING CHINESE SMS MESSAGES BASED ON BAYESIAN CLASSIFICATION 1 LI FENG, 2 LI JIGANG 1,2 Computer Science Department, DongHua University, Shanghai, China E-mail: 1 Lifeng@dhu.edu.cn, 2
More informationThe Bhopal School of Social Sciences, Bhopal
Marking scheme for M.Sc. (Computer Science) II Sem. Semester II Paper Title of the paper Theory CCE Total I Data Structure & Algorithms 70 30 100 II Operating System 70 30 100 III Computer Networks with
More informationKBSVM: KMeans-based SVM for Business Intelligence
Association for Information Systems AIS Electronic Library (AISeL) AMCIS 2004 Proceedings Americas Conference on Information Systems (AMCIS) December 2004 KBSVM: KMeans-based SVM for Business Intelligence
More informationSurveying The User Space Through User Allocations
DIGITAL FORENSIC RESEARCH CONFERENCE Surveying The User Space Through User Allocations By Andrew White, Bradley Schatz and Ernest Foo Presented At The Digital Forensic Research Conference DFRWS 2012 USA
More informationChapter 5: Summary and Conclusion CHAPTER 5 SUMMARY AND CONCLUSION. Chapter 1: Introduction
CHAPTER 5 SUMMARY AND CONCLUSION Chapter 1: Introduction Data mining is used to extract the hidden, potential, useful and valuable information from very large amount of data. Data mining tools can handle
More informationA Novel Support Vector Machine Approach to High Entropy Data Fragment Classification
A Novel Support Vector Machine Approach to High Entropy Data Fragment Classification Q. Li 1, A. Ong 2, P. Suganthan 2 and V. Thing 1 1 Cryptography & Security Dept., Institute for Infocomm Research, Singapore
More informationK- Nearest Neighbors(KNN) And Predictive Accuracy
Contact: mailto: Ammar@cu.edu.eg Drammarcu@gmail.com K- Nearest Neighbors(KNN) And Predictive Accuracy Dr. Ammar Mohammed Associate Professor of Computer Science ISSR, Cairo University PhD of CS ( Uni.
More informationSVM in Oracle Database 10g: Removing the Barriers to Widespread Adoption of Support Vector Machines
SVM in Oracle Database 10g: Removing the Barriers to Widespread Adoption of Support Vector Machines Boriana Milenova, Joseph Yarmus, Marcos Campos Data Mining Technologies Oracle Overview Support Vector
More informationNew String Kernels for Biosequence Data
Workshop on Kernel Methods in Bioinformatics New String Kernels for Biosequence Data Christina Leslie Department of Computer Science Columbia University Biological Sequence Classification Problems Protein
More informationLecture 6 K- Nearest Neighbors(KNN) And Predictive Accuracy
Lecture 6 K- Nearest Neighbors(KNN) And Predictive Accuracy Machine Learning Dr.Ammar Mohammed Nearest Neighbors Set of Stored Cases Atr1... AtrN Class A Store the training samples Use training samples
More informationExercise 4. AMTH/CPSC 445a/545a - Fall Semester October 30, 2017
Exercise 4 AMTH/CPSC 445a/545a - Fall Semester 2016 October 30, 2017 Problem 1 Compress your solutions into a single zip file titled assignment4.zip, e.g. for a student named Tom
More informationNaïve Bayes for text classification
Road Map Basic concepts Decision tree induction Evaluation of classifiers Rule induction Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Support
More informationIMPROVING INFORMATION RETRIEVAL BASED ON QUERY CLASSIFICATION ALGORITHM
IMPROVING INFORMATION RETRIEVAL BASED ON QUERY CLASSIFICATION ALGORITHM Myomyo Thannaing 1, Ayenandar Hlaing 2 1,2 University of Technology (Yadanarpon Cyber City), near Pyin Oo Lwin, Myanmar ABSTRACT
More informationAdvanced Data Mining Techniques
Advanced Data Mining Techniques David L. Olson Dursun Delen Advanced Data Mining Techniques Dr. David L. Olson Department of Management Science University of Nebraska Lincoln, NE 68588-0491 USA dolson3@unl.edu
More informationWeb Service Matchmaking Using Web Search Engine and Machine Learning
International Journal of Web Engineering 2012, 1(1): 1-5 DOI: 10.5923/j.web.20120101.01 Web Service Matchmaking Using Web Search Engine and Machine Learning Incheon Paik *, Eigo Fujikawa School of Computer
More informationModule 4. Non-linear machine learning econometrics: Support Vector Machine
Module 4. Non-linear machine learning econometrics: Support Vector Machine THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION Introduction When the assumption of linearity
More informationOutline of the course
Outline of the course Introduction to Digital Libraries (15%) Description of Information (30%) Access to Information (30%) User Services (10%) Additional topics (15%) Buliding of a (small) digital library
More informationESERCITAZIONE PIATTAFORMA WEKA. Croce Danilo Web Mining & Retrieval 2015/2016
ESERCITAZIONE PIATTAFORMA WEKA Croce Danilo Web Mining & Retrieval 2015/2016 Outline Weka: a brief recap ARFF Format Performance measures Confusion Matrix Precision, Recall, F1, Accuracy Question Classification
More informationInformation Assurance In A Distributed Forensic Cluster
DIGITAL FORENSIC RESEARCH CONFERENCE Information Assurance In A Distributed Forensic Cluster By Nicholas Pringle and Mikhaila Burgess Presented At The Digital Forensic Research Conference DFRWS 2014 EU
More informationTEXT CHAPTER 5. W. Bruce Croft BACKGROUND
41 CHAPTER 5 TEXT W. Bruce Croft BACKGROUND Much of the information in digital library or digital information organization applications is in the form of text. Even when the application focuses on multimedia
More informationDeterminant of homography-matrix-based multiple-object recognition
Determinant of homography-matrix-based multiple-object recognition 1 Nagachetan Bangalore, Madhu Kiran, Anil Suryaprakash Visio Ingenii Limited F2-F3 Maxet House Liverpool Road Luton, LU1 1RS United Kingdom
More informationTopic 1 Classification Alternatives
Topic 1 Classification Alternatives [Jiawei Han, Micheline Kamber, Jian Pei. 2011. Data Mining Concepts and Techniques. 3 rd Ed. Morgan Kaufmann. ISBN: 9380931913.] 1 Contents 2. Classification Using Frequent
More informationSVM cont d. Applications face detection [IEEE INTELLIGENT SYSTEMS]
SVM cont d A method of choice when examples are represented by vectors or matrices Input space cannot be readily used as attribute-vector (e.g. too many attrs) Kernel methods: map data from input space
More informationPacket Classification using Support Vector Machines with String Kernels
RESEARCH ARTICLE Packet Classification using Support Vector Machines with String Kernels Sarthak Munshi *Department Of Computer Engineering, Pune Institute Of Computer Technology, Savitribai Phule Pune
More informationMachine Learning: Algorithms and Applications Mockup Examination
Machine Learning: Algorithms and Applications Mockup Examination 14 May 2012 FIRST NAME STUDENT NUMBER LAST NAME SIGNATURE Instructions for students Write First Name, Last Name, Student Number and Signature
More informationClassification by Support Vector Machines
Classification by Support Vector Machines Florian Markowetz Max-Planck-Institute for Molecular Genetics Computational Molecular Biology Berlin Practical DNA Microarray Analysis 2003 1 Overview I II III
More informationChakra Chennubhotla and David Koes
MSCBIO/CMPBIO 2065: Support Vector Machines Chakra Chennubhotla and David Koes Nov 15, 2017 Sources mmds.org chapter 12 Bishop s book Ch. 7 Notes from Toronto, Mark Schmidt (UBC) 2 SVM SVMs and Logistic
More informationAdding Source Code Searching Capability to Yioop
Adding Source Code Searching Capability to Yioop Advisor - Dr Chris Pollett Committee Members Dr Sami Khuri and Dr Teng Moh Presented by Snigdha Rao Parvatneni AGENDA Introduction Preliminary work Git
More information2. Design Methodology
Content-aware Email Multiclass Classification Categorize Emails According to Senders Liwei Wang, Li Du s Abstract People nowadays are overwhelmed by tons of coming emails everyday at work or in their daily
More informationPredict the Likelihood of Responding to Direct Mail Campaign in Consumer Lending Industry
Predict the Likelihood of Responding to Direct Mail Campaign in Consumer Lending Industry Jincheng Cao, SCPD Jincheng@stanford.edu 1. INTRODUCTION When running a direct mail campaign, it s common practice
More informationChapter 3: Important Concepts (3/29/2015)
CISC 3595 Operating System Spring, 2015 Chapter 3: Important Concepts (3/29/2015) 1 Memory from programmer s perspective: you already know these: Code (functions) and data are loaded into memory when the
More informationIEE 520 Data Mining. Project Report. Shilpa Madhavan Shinde
IEE 520 Data Mining Project Report Shilpa Madhavan Shinde Contents I. Dataset Description... 3 II. Data Classification... 3 III. Class Imbalance... 5 IV. Classification after Sampling... 5 V. Final Model...
More informationTutorial on Machine Learning Tools
Tutorial on Machine Learning Tools Yanbing Xue Milos Hauskrecht Why do we need these tools? Widely deployed classical models No need to code from scratch Easy-to-use GUI Outline Matlab Apps Weka 3 UI TensorFlow
More informationSupport Vector Machines
Support Vector Machines Chapter 9 Chapter 9 1 / 50 1 91 Maximal margin classifier 2 92 Support vector classifiers 3 93 Support vector machines 4 94 SVMs with more than two classes 5 95 Relationshiop to
More informationRobust 1-Norm Soft Margin Smooth Support Vector Machine
Robust -Norm Soft Margin Smooth Support Vector Machine Li-Jen Chien, Yuh-Jye Lee, Zhi-Peng Kao, and Chih-Cheng Chang Department of Computer Science and Information Engineering National Taiwan University
More informationDATA SCIENCE INTRODUCTION QSHORE TECHNOLOGIES. About the Course:
DATA SCIENCE About the Course: In this course you will get an introduction to the main tools and ideas which are required for Data Scientist/Business Analyst/Data Analyst/Analytics Manager/Actuarial Scientist/Business
More informationTransductive Learning: Motivation, Model, Algorithms
Transductive Learning: Motivation, Model, Algorithms Olivier Bousquet Centre de Mathématiques Appliquées Ecole Polytechnique, FRANCE olivier.bousquet@m4x.org University of New Mexico, January 2002 Goal
More informationMulti-version Data recovery for Cluster Identifier Forensics Filesystem with Identifier Integrity
Multi-version Data recovery for Cluster Identifier Forensics Filesystem with Identifier Integrity Mohammed Alhussein, Duminda Wijesekera Department of Computer Science George Mason University Fairfax,
More informationContent Based Image Retrieval system with a combination of Rough Set and Support Vector Machine
Shahabi Lotfabadi, M., Shiratuddin, M.F. and Wong, K.W. (2013) Content Based Image Retrieval system with a combination of rough set and support vector machine. In: 9th Annual International Joint Conferences
More information6 Model selection and kernels
6. Bias-Variance Dilemma Esercizio 6. While you fit a Linear Model to your data set. You are thinking about changing the Linear Model to a Quadratic one (i.e., a Linear Model with quadratic features φ(x)
More informationCS294-1 Final Project. Algorithms Comparison
CS294-1 Final Project Algorithms Comparison Deep Learning Neural Network AdaBoost Random Forest Prepared By: Shuang Bi (24094630) Wenchang Zhang (24094623) 2013-05-15 1 INTRODUCTION In this project, we
More informationAutomatic Domain Partitioning for Multi-Domain Learning
Automatic Domain Partitioning for Multi-Domain Learning Di Wang diwang@cs.cmu.edu Chenyan Xiong cx@cs.cmu.edu William Yang Wang ww@cmu.edu Abstract Multi-Domain learning (MDL) assumes that the domain labels
More informationCPSC 340: Machine Learning and Data Mining. Logistic Regression Fall 2016
CPSC 340: Machine Learning and Data Mining Logistic Regression Fall 2016 Admin Assignment 1: Marks visible on UBC Connect. Assignment 2: Solution posted after class. Assignment 3: Due Wednesday (at any
More information