Syntactic N-grams as Machine Learning Features for Natural Language Processing. Marvin Gülzow. Basics. Approach. Results.

Similar documents
NLP in practice, an example: Semantic Role Labeling

The Perceptron. Simon Šuster, University of Groningen. Course Learning from data November 18, 2013

CSE 573: Artificial Intelligence Autumn 2010

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review

Exam Marco Kuhlmann. This exam consists of three parts:

ECG782: Multidimensional Digital Signal Processing

Decision Trees Dr. G. Bharadwaja Kumar VIT Chennai

Application of Support Vector Machine Algorithm in Spam Filtering

Search Engines. Information Retrieval in Practice

Applying Supervised Learning

Slides for Data Mining by I. H. Witten and E. Frank

Equation to LaTeX. Abhinav Rastogi, Sevy Harris. I. Introduction. Segmentation.

DATA MINING LECTURE 10B. Classification k-nearest neighbor classifier Naïve Bayes Logistic Regression Support Vector Machines

Statistical parsing. Fei Xia Feb 27, 2009 CSE 590A

Lecture 7: Decision Trees

Support Vector Machines

Decision tree learning

CS224n: Natural Language Processing with Deep Learning. Lecture Notes: Part IV, Dependency Parsing. Winter 2019

Evaluation of different biological data and computational classification methods for use in protein interaction prediction.

LECTURE 5: DUAL PROBLEMS AND KERNELS. * Most of the slides in this lecture are from

Support Vector Machines

Machine Learning Techniques for Data Mining

Network Traffic Measurements and Analysis

Announcements. CS 188: Artificial Intelligence Spring Generative vs. Discriminative. Classification: Feature Vectors. Project 4: due Friday.

Machine Learning: Algorithms and Applications Mockup Examination

Comparative analysis of classifier algorithms in data mining. Aikjot Kaur Narula, Dr. Raman Maini

Detection and Extraction of Events from Emails

CSEP 573: Artificial Intelligence

Data Mining in Bioinformatics Day 1: Classification

Lecture outline. Decision-tree classification

Feature Extractors. CS 188: Artificial Intelligence Fall Some (Vague) Biology. The Binary Perceptron. Binary Decision Rule.

Machine Learning for. Artem Lind & Aleskandr Tkachenko

Supervised vs unsupervised clustering

Transition-Based Dependency Parsing with Stack Long Short-Term Memory

Transition-based Parsing with Neural Nets

Structured Perceptron. Ye Qiu, Xinghui Lu, Yue Lu, Ruofei Shen

SEMANTIC COMPUTING. Lecture 8: Introduction to Deep Learning. TU Dresden, 7 December Dagmar Gromann International Center For Computational Logic

CS 229 Midterm Review

Naïve Bayes for text classification

Support Vector Machines + Classification for IR

Automatic Summarization

Data Mining Practical Machine Learning Tools and Techniques. Slides for Chapter 6 of Data Mining by I. H. Witten and E. Frank

SUPERVISED LEARNING METHODS. Stanley Liang, PhD Candidate, Lassonde School of Engineering, York University Helix Science Engagement Programs 2018

Data Preprocessing. Supervised Learning

Machine Learning in GATE

Support Vector Machines

Large-Scale Syntactic Processing: Parsing the Web. JHU 2009 Summer Research Workshop

Structured Learning. Jun Zhu

Review on Text Mining

Keyword Extraction by KNN considering Similarity among Features

Link Prediction for Social Network

CPSC 340: Machine Learning and Data Mining. Probabilistic Classification Fall 2017

Semi-supervised Learning

Dynamic Feature Selection for Dependency Parsing

Lecture 9: Support Vector Machines

Domain Based Named Entity Recognition using Naive Bayes

Text classification II CE-324: Modern Information Retrieval Sharif University of Technology

Support Vector Machines

Announcements. CS 188: Artificial Intelligence Spring Classification: Feature Vectors. Classification: Weights. Learning: Binary Perceptron

Discriminative Training with Perceptron Algorithm for POS Tagging Task

Support Vector Machines.

Classification. 1st Semester 2007/2008

Data Mining Practical Machine Learning Tools and Techniques

Dependency Parsing CMSC 723 / LING 723 / INST 725. Marine Carpuat. Fig credits: Joakim Nivre, Dan Jurafsky & James Martin

NLP Chain. Giuseppe Castellucci Web Mining & Retrieval a.a. 2013/2014

Introduction to object recognition. Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and others

Classification: Basic Concepts, Decision Trees, and Model Evaluation

Beyond Bags of Features

Machine Learning for NLP

Sentiment Analysis on Twitter Data using KNN and SVM

CSE4334/5334 DATA MINING

A Comparison of Text-Categorization Methods applied to N-Gram Frequency Statistics

Random Forest A. Fornaser

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

CS229 Final Project: Predicting Expected Response Times

Domain-specific Concept-based Information Retrieval System

All lecture slides will be available at CSC2515_Winter15.html

ADVANCED CLASSIFICATION TECHNIQUES

Accelerometer Gesture Recognition

Machine Learning with MATLAB --classification

Naïve Bayes Classification. Material borrowed from Jonathan Huang, I. H. Witten's and E. Frank's Data Mining, Jeremy Wyatt, and others

Supervised Learning Classification Algorithms Comparison

Log- linear models. Natural Language Processing: Lecture Kairit Sirts

Density estimation. In density estimation problems, we are given a random sample from an unknown density. Our objective is to estimate

A novel supervised learning algorithm and its use for Spam Detection in Social Bookmarking Systems

CS Machine Learning

Using Character n-gram Profiles. University of the Aegean

Data Engineering. Data preprocessing and transformation

Classification: Feature Vectors

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition

Classification Algorithms in Data Mining

On the automatic classification of app reviews

Base Noun Phrase Chunking with Support Vector Machines

BRACE: A Paradigm For the Discretization of Continuously Valued Data

FLL: Answering World History Exams by Utilizing Search Results and Virtual Examples

COMP 551 Applied Machine Learning Lecture 13: Unsupervised learning

Natural Language Processing

Natural Language Processing Pipelines to Annotate BioC Collections with an Application to the NCBI Disease Corpus

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition

Data Mining and Knowledge Discovery Practice notes 2

Transcription:


Table of Contents: Sections 1–6

TL;DR: Introduce SN-grams. Use them for authorship attribution. Compare machine learning approaches: J48 (decision tree), SVM, Naïve Bayes. SVM + SN-grams work well.

Section 1: Basics

N-grams: Definition [2]
n-gram: w = (w_1, ..., w_n) ∈ Σ^n, i.e. n sequential items from a text.
Item: characters, words, phonetic units, linguistic features, ...
Sequential: a neighborhood relation is required.
Uses: text fragments, probabilistic features.
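
A minimal sketch of n-gram extraction in Python (word items, sliding window; the function name and sample sentence are illustrative):

def ngrams(items, n):
    # All contiguous n-grams of a sequence of items.
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

words = "Cars with wheels can move".split()
print(ngrams(words, 2))
# [('Cars', 'with'), ('with', 'wheels'), ('wheels', 'can'), ('can', 'move')]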

SN-grams: Definition
SN-gram: an n-gram obtained based on the order in which the elements appear in syntactic trees [1], [2].
Items: SR-tags (syntactic-relation tags).
Neighborhood relation: the tags lie on the same path in the tree.
Tree: parse result according to a formal grammar.
Issue: the language processing itself? The Stanford NLP suite is used to obtain the SR-tags.

SN-gram Example
Cars with wheels can move:
1 -> move/VB (root)
2 -> Cars/NNS (nsubj)
3 -> wheels/NNS (nmod:with)
4 -> with/IN (case)
5 -> can/MD (aux)
Ships with hulls can move:
1 -> move/VB (root)
2 -> Ships/NNS (nsubj)
3 -> hulls/NNS (nmod:with)
4 -> with/IN (case)
5 -> can/MD (aux)

SN-gram Example [figures: dependency trees for both sentences, not preserved in the transcription]

Resulting SN-grams: (aux), (nsubj, nmod), (nmod, case), (nsubj, nmod, case). Independent of content.
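
A sketch of SN-gram extraction in Python; the dependency tree is hand-coded from the example parse above, and reading "lie on the same path" as contiguous subsequences of root-to-leaf SR-tag paths is one interpretation of the definition, not the paper's code:

# Child word -> (head word, syntactic relation), from the example parse.
tree = {
    "Cars":   ("move", "nsubj"),
    "wheels": ("Cars", "nmod"),
    "with":   ("wheels", "case"),
    "can":    ("move", "aux"),
}

def children(head):
    return [(w, rel) for w, (h, rel) in tree.items() if h == head]

def paths(node, prefix=()):
    # Yield the SR-tag sequence of every root-to-leaf path.
    kids = children(node)
    if not kids:
        yield prefix
    for word, rel in kids:
        yield from paths(word, prefix + (rel,))

def sn_grams(root, n):
    grams = set()
    for path in paths(root):
        grams.update(tuple(path[i:i + n]) for i in range(len(path) - n + 1))
    return grams

print(sn_grams("move", 2))  # {('nsubj', 'nmod'), ('nmod', 'case')}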

SN-grams
Advantages:
Real syntactic neighbors: no arbitrary influence from the content.
Assumption: captures the author's writing style.
Disadvantages:
Preprocessing is expensive (but needed only once).
Parser quality determines the results.
Good parsers are not available for every language.

Support Vector Machines (SVM) [4]
Deterministic binary classifier: linear separation of classes.
Separator: a hyperplane; the gap between the classes has maximum width.
What about non-linearly separable data?

Kernel trick
Add dimensions: warp the data.
Transformation via a kernel function.
Restricted to numerical data.
Multiclass classification via multiple binary classifications.

SVM learning [6]
1 Choose an appropriate kernel (done by a human)
2 Project the data into the target vector space
3 Find the optimum separator: maximize the distance of each object to the separator
The items defining the border are the support vectors.
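
These steps, sketched with scikit-learn's SVC on a toy XOR problem (illustration only; the paper itself used WEKA/LibSVM):

from sklearn.svm import SVC

X = [[0, 0], [1, 1], [0, 1], [1, 0]]  # XOR: not linearly separable
y = [0, 0, 1, 1]

clf = SVC(kernel="rbf")   # step 1: the kernel choice is the human decision
clf.fit(X, y)             # steps 2 and 3 happen implicitly inside the solver
print(clf.predict([[0, 1]]))      # [1]
print(len(clf.support_vectors_))  # the items defining the border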

Support Vector Machines (SVM) [5]
Advantages:
Non-linear separation.
Tunable to noise; more robust against biased data.
A unique, global solution exists.
High accuracy.
Disadvantages:
Only works on numerical data.
The learned model is not interpretable.
Training is in O(n²).

SVMs in WEKA/LibSVM
SVMs work on numerical data; we have nominal data.
The relation tags must be mapped to numbers.
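
One common mapping, sketched with scikit-learn's DictVectorizer (one-hot/count encoding; the slides do not say which encoding the authors used):

from sklearn.feature_extraction import DictVectorizer

# Hypothetical SN-gram counts per document: nominal features.
docs = [{"nsubj,nmod": 3, "aux": 1},
        {"nmod,case": 2, "aux": 4}]

vec = DictVectorizer()
X = vec.fit_transform(docs)         # sparse numeric matrix, usable by an SVM
print(vec.get_feature_names_out())  # ['aux' 'nmod,case' 'nsubj,nmod']
print(X.toarray())                  # [[1. 0. 3.] [4. 2. 0.]]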

Naïve Bayes [4]
How many times does an attribute appear in a class?
Look at each attribute of the item to classify; the probabilities determine the class.
Each classified object contributes to the training set.
Used as a reference for other learners.

Naïve Bayes [4]
Bayes' theorem: P(A|B) = P(B|A) · P(A) / P(B)
Naïve assumption: all attributes are independent, so
P(E = (a_1, ..., a_n) | h) = ∏_{a_i ∈ E} P(a_i | h)
P(a_i | h) = (# data from class h with A_i = a_i) / (# data from class h)
The object is assigned the most probable class.
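
A toy Naïve Bayes by counting, mirroring the formulas above (the data is invented, and real implementations add smoothing so unseen attribute values do not zero out the product):

from collections import Counter, defaultdict

train = [({"len": "short", "caps": "yes"}, "spam"),
         ({"len": "short", "caps": "no"},  "ham"),
         ({"len": "long",  "caps": "no"},  "ham")]

classes = Counter(label for _, label in train)
counts = defaultdict(Counter)           # counts[h][(A_i, a_i)]
for attrs, label in train:
    counts[label].update(attrs.items())

def predict(attrs):
    def score(h):
        p = classes[h] / len(train)     # P(h)
        for item in attrs.items():      # product of P(a_i | h)
            p *= counts[h][item] / classes[h]
        return p
    return max(classes, key=score)

print(predict({"len": "short", "caps": "yes"}))  # spam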

Naïve Bayes
Advantages:
Easy to implement; fast implementations are possible.
Learns with each example.
Reasonably accurate.
The standard comparison for other classifiers.
Disadvantages:
Attributes are usually not independent.
Probabilities may be unavailable.

J48 [6]
Decision tree builder, entropy based: which attribute yields the highest information gain?
Builds an optimum decision tree.
Human-interpretable model.

J48 - Information Gain [6]
Given: a labelled dataset T. Find: the attribute that discriminates best between the classes.
Calculate the entropy of the training set T:
e(T) = -∑_{i=1}^{k} p_i log₂ p_i
Calculate the information gain for attribute A:
IG(T, A) = e(T) - ∑_{i=1}^{m} (|T_i| / |T|) · e(T_i)
The tree splits the data on this attribute; repeat.
Other split criteria: Gini index, χ², random, ...
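
Both formulas in a short Python sketch (toy data; the attribute and class values are invented):

import math
from collections import Counter

def entropy(labels):
    # e(T) = -sum p_i * log2(p_i) over the class proportions p_i.
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    # IG(T, A) = e(T) - sum |T_i|/|T| * e(T_i) over the splits T_i.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    rest = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - rest

rows = [{"outlook": "sunny"}, {"outlook": "sunny"}, {"outlook": "rain"}]
labels = ["no", "yes", "yes"]
print(round(info_gain(rows, labels, "outlook"), 3))  # 0.252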

J48
Advantages:
The model can be interpreted for other uses.
Fast classification (precomputed model).
Can cope with missing values (parser errors).
Disadvantages:
Requires pruning.
Sensitive to noise.
The greedy approach can get stuck.

Section 2: Approach

Dataset
English novels: Booth Tarkington (13), George Vaizey (13), Louis Tracy (13).
24 for training, 11 for classification.
Total of 6.1 MB.

Algorithm
1 Parse the corpus using StanfordNLP
2 Extract the syntactic relations (SR-tags)
3 Construct the SN-gram profiles
4 Classify as usual
5 Establish a baseline using the other classifiers
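
A sketch of steps 3-5 in Python, assuming the SR-tag SN-grams per document have already been extracted in steps 1-2 (the profile cut-off, kernel, and toy documents are illustrative; the slides note the paper gives no concrete parameters):

from collections import Counter
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import SVC

def profile(grams, size):
    # Keep the `size` most frequent SN-grams of a document.
    return dict(Counter("_".join(g) for g in grams).most_common(size))

train = [[("nsubj", "nmod"), ("aux",)],   # hypothetical document 1
         [("nmod", "case"), ("aux",)]]    # hypothetical document 2
labels = ["Tarkington", "Vaizey"]

vec = DictVectorizer()
X = vec.fit_transform([profile(g, 400) for g in train])
clf = SVC(kernel="linear").fit(X, labels)

test = [("nsubj", "nmod")]                # unseen hypothetical document
print(clf.predict(vec.transform([profile(test, 400)])))
# ['Tarkington']: shares its SN-gram with document 1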

Experiments
Feature types: word based, POS (part of speech), character based, SR-tags.
Vary the n-gram size from 2 to 5 and the profile size from 400 to 11000.
Use J48 and NB as baselines.

Section 3: Results

Results in brief
All classifiers achieve better than 50% accuracy.
SVMs outperform the other classifiers.
SR-tags yield better results than the other tag types.
Bigrams and trigrams do better than 4- and 5-grams.
100% accuracy in some cases [1], [3].

// Show tables from the paper now.
goto PAPER_RESULTS;

Section 4: Discussion

Positive
SN-grams are demonstrably more accurate than the other approaches.
Able to reliably identify the author within a small pool of candidate authors.
Solid theoretical basis (SVM and parsing).
Hard to hide an author's grammatical habits.

Negative
Parsing takes considerable time on 39 novels (mentioned in the paper, as expected).
The parser has an extreme influence on the result.
What about weird texts? Non-natives "with the speaking of bad grammatics", fantasy/sci-fi bogus words.
SVM models are not interpretable.

Paper quality
Positive:
Good explanation of SN-grams.
Thorough comparison of many cases.
Clear results; a new, practical method found.
Negative:
Hard to reproduce: the examples are inconsistent, no concrete parameters are given (for the learners!), and tool versions are missing.
Small set of candidate authors (3).

Section 5

Not just yet :(

Section 6: References

G. Sidorov et al.: Syntactic N-grams as Machine Learning Features for Natural Language Processing. CIC-IPN, Mexico; University of the Aegean, Greece.
Stanford Coursera lecture on language modeling. https://class.coursera.org/nlp/lecture/17
Efstathios Stamatatos: A Survey of Modern Authorship Attribution Methods. Dept. of Information and Communication Systems Eng., University of the Aegean.
Michael Berthold and Iris Adae: SVMs and rule learners. Lecture held at the University of Constance, winter term 2014/15.
Laura Auria and Rouslan A. Moro: Support Vector Machines (SVM) as a Technique for Solvency Analysis. DIW Berlin, 2008.