Handling Place References in Text

1 Handling Place References in Text

2 Introduction
- Most (geographic) information is available in the form of textual documents.
- Place reference resolution involves two subtasks:
  - Recognition: delimiting occurrences of place references in text
  - Disambiguation: resolving place references to geo-coordinates
- It is an essential task in Geographic Information Retrieval: it supports access through geography to textual documents.
- Existing methods mostly rely on hand-tuned heuristics, which are labor-intensive to develop, optimize and maintain.
- Current research focuses on data-driven methods.
- The main problems are related to natural language ambiguity.

3 Ambiguity
- Geographic/non-geographic ambiguity: place names that also have other, non-geographic meanings (e.g., Reading in England, Buffalo in the US). It should be addressed while recognizing place references.
- Geographic/geographic ambiguity: multiple distinct places share the same name (almost every major city in Europe has a sister city of the same name in the New World). It should be addressed while disambiguating place references.

4 Place reference recognition
- Approaches based on dictionaries
  - Sliding window approaches
  - Aho-Corasick algorithm (finite state automaton)
  - Good recall, good performance, often poor precision
- Approaches based on rules
  - Regular expression patterns (finite state automaton)
  - Grammatical rules
  - Large human effort involved in creating the rules
- Approaches based on machine learning
  - Hidden Markov Models
  - Conditional Random Fields
  - Good generalization behavior, but requires large amounts of training data
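As a rough illustration of the dictionary-based, sliding-window approach listed above, the sketch below slides a window of up to `max_len` tokens over the text and reports windows that appear in a list of place names. The function name, the `max_len` parameter and the tiny name list are assumptions made for this example, not part of the original slides.

```python
def find_place_candidates(tokens, gazetteer_names, max_len=3):
    """Return (start, end, text) spans whose token window is a known place name.

    The widest window is tried first at each position, and matched spans
    do not overlap. Matching is case-insensitive.
    """
    names = {tuple(n.lower().split()) for n in gazetteer_names}
    spans = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the widest window first so "Campo Grande" beats "Campo".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            window = tuple(t.lower() for t in tokens[i:i + n])
            if window in names:
                match = (i, i + n, " ".join(tokens[i:i + n]))
                break
        if match:
            spans.append(match)
            i = match[1]
        else:
            i += 1
    return spans


names = ["Campo Grande", "Reading", "Buffalo"]  # toy gazetteer (assumption)
print(find_place_candidates("Yesterday Bruno Martins went to Campo Grande".split(), names))
# The verb "reading" is also matched here: good recall, poor precision.
print(find_place_candidates("I was reading in Buffalo".split(), names))
```

The second call shows why pure dictionary lookup tends to have poor precision: it cannot tell the verb "reading" from the town of Reading.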

5 Aho-Corasick String Matching
Locates all occurrences of any of a finite number of keywords (e.g., location names) in a string of text. It consists of two steps:
1. Constructing a finite state pattern matching machine from the keywords
2. Using the pattern matching machine to process the text string in a single pass

6 Pattern Matching Machine
- Let $P = \{y_1, y_2, \ldots, y_k\}$ be a finite set of string patterns, which we shall call keywords.
- Let $x$ be an arbitrary string, which we shall call the text string (i.e., the document).
- The behavior of the pattern matching machine is dictated by three functions: a goto function $g$, a failure function $f$, and an output function $output$.

7 The Pattern Matching Machine
- Goto function g: maps a pair consisting of a state and an input symbol into a state or fail.
- Failure function f: maps a state into a state, and is consulted whenever the goto function reports fail. It provides fast transitions from a failed pattern match to other branches of the tree whose prefix matches a suffix of the text read so far. For example, a search for "cat" in a tree that does not contain "cat" but contains "cart" fails at the node prefixed by "ca"; a branch for "attribute" might then be the best lateral transition.
- Output function: associates a (possibly empty) set of keyword patterns with every state.

8 Aho-Corasick Algorithm
[Figure: pattern tree (state machine) for the pattern set { he, she, his, hers }. The goto function is drawn as black arrows, the failure function as blue arrows, and the output function as red dots.]

9 Aho-Corasick Search Algorithm
- l: the starting position in the text string T
- c: the current character of T to be compared with a character on the tree K
- w: the current node on the tree K
Input: pattern set P and text T
Output: all occurrences in T of any pattern from P
Algorithm: Aho-Corasick
    l = 1; c = 1; w = root of K
    repeat
        while there is an edge (w, w') labeled with T[c]
            if w' is numbered by pattern i then
                report that p_i occurs in T starting at l
            w = w'; c++
        w = failure(w); l = c - length-prefix(w)
    until c > |T|
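The two steps (building the machine, then scanning the text in a single pass) can be sketched in Python as follows. This is a minimal sketch of the textbook goto/failure/output formulation rather than a line-by-line transcription of the pseudocode above; the function names are hypothetical, states are integers with 0 as the root, and reported positions are 0-based.

```python
from collections import deque


def build_automaton(keywords):
    """Build the goto, failure and output tables for a set of keywords."""
    goto = [{}]        # goto[state][symbol] -> next state
    output = [set()]   # output[state] -> keywords recognized at this state
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(word)

    # Breadth-first pass: the failure link of a state points to the state
    # for the longest proper suffix of its path that is also a keyword prefix.
    fail = [0] * len(goto)
    queue = deque(goto[0].values())   # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Keywords ending at the failure state also end here.
            output[nxt] |= output[fail[nxt]]
    return goto, fail, output


def search(text, automaton):
    """Return (start_index, keyword) for every keyword occurrence in text."""
    goto, fail, output = automaton
    state = 0
    matches = []
    for pos, ch in enumerate(text):
        # Follow failure links until a goto transition exists (or we hit the root).
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in sorted(output[state]):
            matches.append((pos - len(word) + 1, word))
    return matches


automaton = build_automaton(["he", "she", "his", "hers"])
print(search("ushers", automaton))
```

For place reference recognition, the keywords would be the place names of a gazetteer and the text would be the document.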

10 Hidden Markov Models
HMMs are the standard sequence modeling tool in NLP and IE.
[Figure: the HMM drawn as a finite state model and as a graphical model. States ... S_{t-1}, S_t, S_{t+1} ... are linked by transitions, and each state S_t generates an observation O_t. The state sequence generates the observation sequence o_1 o_2 ... o_8.]
$$P(\vec{s}, \vec{o}) = \prod_{t=1}^{|\vec{o}|} P(s_t \mid s_{t-1}) \, P(o_t \mid s_t)$$
Parameters, for all states $S = \{s_1, s_2, \ldots\}$:
- Start state probabilities: $P(s_1)$, which play the role of the $t = 1$ factor $P(s_1 \mid s_0)$ in the product
- Transition probabilities: $P(s_t \mid s_{t-1})$
- Observation (emission) probabilities: $P(o_t \mid s_t)$
Training: maximize the probability of the training observations.
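To make the factorization concrete, the sketch below evaluates the joint probability of a state sequence and an observation sequence in log space. The two-state model and all of its probabilities are invented purely for illustration, and the dictionary layout (`start_p`, `trans_p`, `emit_p`) is an assumption of this example.

```python
import math


def joint_log_prob(states, observations, start_p, trans_p, emit_p):
    """log P(s, o) = log P(s_1) + log P(o_1|s_1) + sum_t [log P(s_t|s_{t-1}) + log P(o_t|s_t)]."""
    logp = math.log(start_p[states[0]]) + math.log(emit_p[states[0]][observations[0]])
    for t in range(1, len(states)):
        logp += math.log(trans_p[states[t - 1]][states[t]])
        logp += math.log(emit_p[states[t]][observations[t]])
    return logp


# Toy parameters (assumptions, not estimated from any corpus).
start_p = {"other": 0.8, "loc": 0.2}
trans_p = {"other": {"other": 0.7, "loc": 0.3},
           "loc": {"other": 0.4, "loc": 0.6}}
emit_p = {"other": {"to": 0.5, "Campo": 0.1, "Grande": 0.1},
          "loc": {"to": 0.05, "Campo": 0.4, "Grande": 0.4}}

lp = joint_log_prob(["other", "loc", "loc"], ["to", "Campo", "Grande"],
                    start_p, trans_p, emit_p)
print(math.exp(lp))  # 0.8 * 0.5 * 0.3 * 0.4 * 0.6 * 0.4
```

Working with log probabilities avoids numerical underflow when the sequences are long.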

11 Placename Extraction with HMMs
Given a sequence of observations:
    Yesterday Bruno Martins went to Campo Grande
and a trained HMM with the states person name, location name and background, find the most likely state sequence (Viterbi): $\arg\max_{\vec{s}} P(\vec{s}, \vec{o})$
Any words said to be generated by the designated location name state are extracted as a location name:
    Location name: Campo Grande

12 B-I-O Encoding
Encode the chunking problem of recognizing place references as a tagging problem of assigning classes to individual word tokens: Begin_place, Inside_place, Other.
    Yesterday  Bruno  Martins  went  to  Campo  Grande.
    O          B_per  I_per    O     O   B_loc  I_loc
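Going back from token-level tags to chunks is a simple linear scan. The sketch below assumes the tag spelling used in the example above (`B_per`, `I_loc`, `O`); the function name is hypothetical.

```python
def bio_to_spans(tokens, tags):
    """Group B_x / I_x tags into (entity_type, text) chunks."""
    spans = []
    current_type, current_tokens = None, []

    def flush():
        # Emit the chunk collected so far, if any.
        if current_tokens:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        prefix, _, etype = tag.partition("_")
        if prefix == "B" or (prefix == "I" and etype != current_type):
            # A B_ tag, or an I_ tag with no matching open chunk, starts a new chunk.
            flush()
            current_type, current_tokens = etype, [token]
        elif prefix == "I":
            current_tokens.append(token)
        else:
            # "O": close any open chunk.
            flush()
            current_type, current_tokens = None, []
    flush()
    return spans


tokens = "Yesterday Bruno Martins went to Campo Grande".split()
tags = ["O", "B_per", "I_per", "O", "O", "B_loc", "I_loc"]
print(bio_to_spans(tokens, tags))
```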

13 Hidden Markov Models
Learning the model with training data: general algorithm based on Expectation-Maximization (EM)
1. Initialise model $\lambda_0$
2. Compute new model $\lambda$, using $\lambda_0$ and the observed sequence
3. Adjust the model: $\lambda_0 \leftarrow \lambda$
4. Repeat steps 2 and 3 until $\log P(X, Y \mid \lambda) - \log P(X, Y \mid \lambda_0) < d$
Using the model (i.e., decoding)
- Choose the output label sequence that maximizes the probability of the token observation sequence
- Viterbi: a dynamic programming algorithm that keeps the best label sequence at each instance

14 The Viterbi Algorithm
The algorithm sweeps through all the tag possibilities for each word, computing the best sequence leading to each possibility.
Dynamic programming approach: the key to the algorithm's efficiency is that we only need to know the best sequences leading to the previous word, because of the Markov assumption used in the model.

15 The Viterbi Algorithm
Let T = # of tags in our annotation problem (e.g., B-I-O tags for each entity type)
Let W = # of words in the text to be annotated

/* Initialization Step */
for t = 1 to T
    Score(t, 1) = Pr(Word_1 | Tag_t) * Pr(Tag_t)
    BackPtr(t, 1) = 0

/* Iteration Step */
for w = 2 to W
    for t = 1 to T
        Score(t, w) = Pr(Word_w | Tag_t) * MAX_{j=1..T} (Score(j, w-1) * Pr(Tag_t | Tag_j))
        BackPtr(t, w) = index of j that gave the max above

/* Sequence Identification */
Seq(W) = t that maximizes Score(t, W)
for w = W-1 to 1
    Seq(w) = BackPtr(Seq(w+1), w+1)
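The pseudocode above translates fairly directly into Python. The sketch below keeps the same Score and BackPtr tables, with 0-based indices and dictionaries keyed by tag name; the two-tag model and its probabilities are made up for illustration only.

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most likely tag sequence for words under the given HMM."""
    # Initialization step: score[0][t] = Pr(word_0 | t) * Pr(t)
    score = [{t: start_p[t] * emit_p[t].get(words[0], 0.0) for t in tags}]
    back = [{t: None for t in tags}]

    # Iteration step: extend the best sequence ending in each previous tag.
    for w in range(1, len(words)):
        score.append({})
        back.append({})
        for t in tags:
            best_prev = max(tags, key=lambda j: score[w - 1][j] * trans_p[j][t])
            score[w][t] = (emit_p[t].get(words[w], 0.0)
                           * score[w - 1][best_prev] * trans_p[best_prev][t])
            back[w][t] = best_prev

    # Sequence identification: follow the back pointers from the best final tag.
    seq = [max(tags, key=lambda t: score[-1][t])]
    for w in range(len(words) - 1, 0, -1):
        seq.append(back[w][seq[-1]])
    seq.reverse()
    return seq


# Toy parameters (assumptions, not estimated from any corpus).
tags = ["O", "LOC"]
start_p = {"O": 0.9, "LOC": 0.1}
trans_p = {"O": {"O": 0.7, "LOC": 0.3}, "LOC": {"O": 0.4, "LOC": 0.6}}
emit_p = {"O": {"went": 0.4, "to": 0.4, "Campo": 0.1, "Grande": 0.1},
          "LOC": {"went": 0.05, "to": 0.05, "Campo": 0.5, "Grande": 0.4}}

print(viterbi(["went", "to", "Campo", "Grande"], tags, start_p, trans_p, emit_p))
```

The running time is proportional to W * T^2. A practical implementation would typically sum log probabilities instead of multiplying raw probabilities, to avoid underflow on long texts.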

16 Disambiguation and Gazetteers
- Place reference disambiguation relies on (external) gazetteer data for places.
- A gazetteer is a database associating place names with the corresponding place metadata.
- It is similar to an address geocoding service.

17 Some Popular Gazetteer Services
- The Alexandria Digital Library (ADL) Gazetteer
  - Pioneering effort in defining data models and XML access protocols for managing gazetteer data
  - The dataset was built by integrating data from multiple sources, but usage requires a private license
- The geonames.org world gazetteer
  - Dataset built by integrating data from multiple sources, with 8 million geographic names, in multiple languages, for more than 6.5 million unique geographic features
  - Geographic features are only associated with centroid coordinates, as opposed to polygons or MBRs (minimum bounding rectangles)
  - Does not include historical place names or time periods
- The Getty Thesaurus of Geographic Names (TGN)
  - Describes about 1 million places around the globe, with alternative names in multiple languages
  - Usage of TGN data requires a private license
  - Includes historical place names (associated with time periods), but not names of historical periods
- The Yahoo! GeoPlanet database
- Many more

18 Place Reference Disambiguation
Most approaches leverage contextual information:
- External: information in gazetteers (e.g., population, types, ...)
- Internal: words and other entities surrounding the place reference
Disambiguation heuristics can be grouped into:
- Default senses: disambiguate to the most important candidate referent, estimated based on geometric area or population.
- Spatial minimalism: disambiguate to the candidate that minimizes the distance to other place references in the same context, or the geometric area that covers all place references in the same context.
- Attribute coherence: disambiguate to the candidate referent whose attributes (e.g., the place type) are similar to those mentioned in the textual context where the reference appears.
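Two of these heuristics are easy to sketch in code. In the example below, the candidate records, their field names, and the rough coordinates and population figures are illustrative assumptions rather than values taken from a real gazetteer; distances use the haversine formula on a spherical Earth of radius 6371 km.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def default_sense(candidates):
    """Default sense: pick the candidate with the largest population."""
    return max(candidates, key=lambda c: c["population"])


def spatial_minimalism(candidates, context_coords):
    """Pick the candidate closest (in total distance) to the other places in context."""
    return min(candidates,
               key=lambda c: sum(haversine_km(c["coords"], p) for p in context_coords))


# Candidate referents for the reference "London" (rough, illustrative values).
candidates = [
    {"id": "London, UK", "coords": (51.51, -0.13), "population": 8_000_000},
    {"id": "London, Ontario", "coords": (42.98, -81.25), "population": 400_000},
]
toronto = (43.65, -79.38)  # another place mentioned in the same context

print(default_sense(candidates)["id"])
print(spatial_minimalism(candidates, [toronto])["id"])
```

The two heuristics disagree on this example, which is one reason why hand-tuned combinations of heuristics are hard to optimize and maintain.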

19 Disambiguation with Machine Learning
- Disambiguation can be seen as a problem of ranking candidate referents and choosing the best candidate.
- The ranking can be based on an estimate of the geospatial distance between the candidate referent and the referent corresponding to the true disambiguation.
- Regression models are used to estimate this geospatial distance:
  - Several features are correlated with the geospatial distance
  - The goal is to find a function that combines the available features in order to estimate the geospatial distance associated with the candidate
  - Options include linear regression, genetic programming and SVM regression
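A minimal sketch of the ranking-by-regression idea, using ordinary least squares as the regression model. Everything here is an assumption made for illustration: the two features (name similarity and log10 of population), the synthetic training targets, and the function names. The solver is written out in plain Python so the example has no dependencies.

```python
def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    rows = [[1.0] + list(x) for x in X]  # prepend a bias term
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(w, x):
    """Estimated distance (km) between a candidate and the true referent."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def rank_candidates(w, candidates):
    """Sort candidates by estimated distance; the first one is the chosen referent."""
    return sorted(candidates, key=lambda c: predict(w, c["features"]))


# Synthetic training data: (name similarity, log10 population) -> distance in km.
X = [(1.0, 6.0), (0.5, 4.0), (0.9, 5.0), (0.2, 3.0)]
y = [200.0, 2300.0, 800.0, 3500.0]
w = fit_linear(X, y)

ranked = rank_candidates(w, [
    {"id": "candidate A", "features": (0.6, 4.5)},
    {"id": "candidate B", "features": (0.95, 5.5)},
])
print([c["id"] for c in ranked])
```

Any regressor with a fit/predict interface could be swapped in for `fit_linear` and `predict`; only the ranking step depends on the predicted distances.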

20 Disambiguation with Machine Learning

21 Disambiguation Features
- String similarity between the candidate name for the referent and the reference string in the text
- Population count for the candidate referent
- Geospatial area for the candidate referent
- Number of alternative names for the candidate referent
- Geospatial distance between the candidate referent and the closest interpretation for place references in the same textual unit (e.g., the same paragraph)
- Area of the convex hull covering the candidate referent and all candidates of place references in the same text unit
- Many more have been tested in the related literature
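Most of these features can be computed with a few lines of code per candidate. The sketch below covers the first five (the convex hull area is left out for brevity). The candidate record layout, the use of `difflib.SequenceMatcher` as the similarity measure, taking the best similarity over all known names, and the illustrative attribute values are all assumptions of this example.

```python
import math
from difflib import SequenceMatcher


def name_similarity(reference, candidate_name):
    """Similarity in [0, 1] between the text reference and a gazetteer name."""
    return SequenceMatcher(None, reference.lower(), candidate_name.lower()).ratio()


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def candidate_features(reference, candidate, context_coords):
    """Build a feature dictionary for one candidate referent."""
    alt_names = candidate.get("alt_names", [])
    names = [candidate["name"]] + alt_names
    return {
        "name_similarity": max(name_similarity(reference, n) for n in names),
        "population": candidate["population"],
        "area_km2": candidate["area_km2"],
        "num_alt_names": len(alt_names),
        # Distance to the closest other place in the same textual unit.
        "km_to_closest_context_place": min(
            (haversine_km(candidate["coords"], p) for p in context_coords),
            default=None),
    }


# Hypothetical gazetteer record with rough, illustrative values.
lisbon = {"name": "Lisboa", "alt_names": ["Lisbon", "Lisbonne"],
          "population": 500_000, "area_km2": 100.0, "coords": (38.72, -9.14)}
porto = (41.15, -8.61)  # another place mentioned in the same paragraph

print(candidate_features("Lisbon", lisbon, [porto]))
```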

22 State of the art results

23 Current research challenges
- Some commercial services already exist:
  - Yahoo! Placemaker
  - Metacarta Text Geotagging Service
- But there are many open research challenges:
  - Multilingual place reference resolution with machine learning
    - Requires more annotation standards/corpora, such as SpatialML
    - Using advanced sequence tagging models
  - Considering other geospatial reference resolution tasks:
    - Resolution of geospatial relations given in text
    - Fine-grained classification of place references in text

24 Questions?


More information

Modeling Sequence Data

Modeling Sequence Data Modeling Sequence Data CS4780/5780 Machine Learning Fall 2011 Thorsten Joachims Cornell University Reading: Manning/Schuetze, Sections 9.1-9.3 (except 9.3.1) Leeds Online HMM Tutorial (except Forward and

More information

A Family of Contextual Measures of Similarity between Distributions with Application to Image Retrieval

A Family of Contextual Measures of Similarity between Distributions with Application to Image Retrieval A Family of Contextual Measures of Similarity between Distributions with Application to Image Retrieval Florent Perronnin, Yan Liu and Jean-Michel Renders Xerox Research Centre Europe (XRCE) Textual and

More information

Digital Libraries: Language Technologies

Digital Libraries: Language Technologies Digital Libraries: Language Technologies RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Recall: Inverted Index..........................................

More information

Applying Auto-Data Classification Techniques for Large Data Sets

Applying Auto-Data Classification Techniques for Large Data Sets SESSION ID: PDAC-W02 Applying Auto-Data Classification Techniques for Large Data Sets Anchit Arora Program Manager InfoSec, Cisco The proliferation of data and increase in complexity 1995 2006 2014 2020

More information

Token Gazetteer and Character Gazetteer for Named Entity Recognition

Token Gazetteer and Character Gazetteer for Named Entity Recognition Token Gazetteer and Character Gazetteer for Named Entity Recognition Giang Nguyen, Štefan Dlugolinský, Michal Laclavík, Martin Šeleng Institute of Informatics, Slovak Academy of Sciences Dúbravská cesta

More information

Lecture 5: Markov models

Lecture 5: Markov models Master s course Bioinformatics Data Analysis and Tools Lecture 5: Markov models Centre for Integrative Bioinformatics Problem in biology Data and patterns are often not clear cut When we want to make a

More information

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant

More information

INSPIRE WS2 METADATA: Describing GeoSpatial Data

INSPIRE WS2 METADATA: Describing GeoSpatial Data WS2 METADATA: Describing GeoSpatial Data Susana Fontano Planning General concepts about metadata The use of standards Items about the creation of metadata Software How to create metadata The ISO19115 Standard

More information

Tokenization and Sentence Segmentation. Yan Shao Department of Linguistics and Philology, Uppsala University 29 March 2017

Tokenization and Sentence Segmentation. Yan Shao Department of Linguistics and Philology, Uppsala University 29 March 2017 Tokenization and Sentence Segmentation Yan Shao Department of Linguistics and Philology, Uppsala University 29 March 2017 Outline 1 Tokenization Introduction Exercise Evaluation Summary 2 Sentence segmentation

More information

Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach

Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach Abstract Automatic linguistic indexing of pictures is an important but highly challenging problem for researchers in content-based

More information

Extending Blaise Capabilities in Complex Data Collections

Extending Blaise Capabilities in Complex Data Collections Extending Blaise Capabilities in Complex Data Collections Paul Segel and Kathleen O Reagan,Westat International Blaise Users Conference, April 2012, London, UK Summary: Westat Visual Survey (WVS) was developed

More information

ECP-2007-GEO OneGeology-Europe. Annex 1: Cookbook

ECP-2007-GEO OneGeology-Europe. Annex 1: Cookbook ECP-2007-GEO-317001 OneGeology-Europe Annex 1: Cookbook for creating multilingual metadata records using the OneGeology-Europe Metadata system (MIcKA) Authors: Lucie Kondrová, Robert Tomas, Štěpán Kafka

More information

Revealing the Modern History of Japanese Philosophy Using Digitization, Natural Language Processing, and Visualization

Revealing the Modern History of Japanese Philosophy Using Digitization, Natural Language Processing, and Visualization Revealing the Modern History of Japanese Philosophy Using Digitization, Natural Language Katsuya Masuda *, Makoto Tanji **, and Hideki Mima *** Abstract This study proposes a framework to access to the

More information

Enhanced retrieval using semantic technologies:

Enhanced retrieval using semantic technologies: Enhanced retrieval using semantic technologies: Ontology based retrieval as a new search paradigm? - Considerations based on new projects at the Bavarian State Library Dr. Berthold Gillitzer 28. Mai 2008

More information

Succinct dictionary matching with no slowdown

Succinct dictionary matching with no slowdown LIAFA, Univ. Paris Diderot - Paris 7 Dictionary matching problem Set of d patterns (strings): S = {s 1, s 2,..., s d }. d i=1 s i = n characters from an alphabet of size σ. Queries: text T occurrences

More information

NOVEL HYBRID GENETIC ALGORITHM WITH HMM BASED IRIS RECOGNITION

NOVEL HYBRID GENETIC ALGORITHM WITH HMM BASED IRIS RECOGNITION NOVEL HYBRID GENETIC ALGORITHM WITH HMM BASED IRIS RECOGNITION * Prof. Dr. Ban Ahmed Mitras ** Ammar Saad Abdul-Jabbar * Dept. of Operation Research & Intelligent Techniques ** Dept. of Mathematics. College

More information

Getting Started with Omeka Music Library Association March 5, 2016

Getting Started with Omeka Music Library Association March 5, 2016 Quick setup v Sign up for a basic Omeka.net account at http://omeka.net. Additional help with creating an account can be found on the Manage Websites & Account page http://info.omeka.net/manage- an- account/]

More information

Assessing the Quality of Natural Language Text

Assessing the Quality of Natural Language Text Assessing the Quality of Natural Language Text DC Research Ulm (RIC/AM) daniel.sonntag@dfki.de GI 2004 Agenda Introduction and Background to Text Quality Text Quality Dimensions Intrinsic Text Quality,

More information

Using idocument for Document Categorization in Nepomuk Social Semantic Desktop

Using idocument for Document Categorization in Nepomuk Social Semantic Desktop Using idocument for Document Categorization in Nepomuk Social Semantic Desktop Benjamin Adrian, Martin Klinkigt,2 Heiko Maus, Andreas Dengel,2 ( Knowledge-Based Systems Group, Department of Computer Science

More information

S Y N T A X A N A L Y S I S LR

S Y N T A X A N A L Y S I S LR LR parsing There are three commonly used algorithms to build tables for an LR parser: 1. SLR(1) = LR(0) plus use of FOLLOW set to select between actions smallest class of grammars smallest tables (number

More information

Finite Math Linear Programming 1 May / 7

Finite Math Linear Programming 1 May / 7 Linear Programming Finite Math 1 May 2017 Finite Math Linear Programming 1 May 2017 1 / 7 General Description of Linear Programming Finite Math Linear Programming 1 May 2017 2 / 7 General Description of

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

NATURAL LANGUAGE PROCESSING

NATURAL LANGUAGE PROCESSING NATURAL LANGUAGE PROCESSING LESSON 9 : SEMANTIC SIMILARITY OUTLINE Semantic Relations Semantic Similarity Levels Sense Level Word Level Text Level WordNet-based Similarity Methods Hybrid Methods Similarity

More information

New in WorldMap Version 1.5 Center for Geographic Analysis, Harvard

New in WorldMap Version 1.5 Center for Geographic Analysis, Harvard New in Version 1.5 Center for Geographic Analysis, Harvard 1.0 Overview This document provides guidance for the new Version 1.5 features. For information on the other parts of the system please use the

More information

Mining Web Data. Lijun Zhang

Mining Web Data. Lijun Zhang Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

More information

Homework 2: HMM, Viterbi, CRF/Perceptron

Homework 2: HMM, Viterbi, CRF/Perceptron Homework 2: HMM, Viterbi, CRF/Perceptron CS 585, UMass Amherst, Fall 2015 Version: Oct5 Overview Due Tuesday, Oct 13 at midnight. Get starter code from the course website s schedule page. You should submit

More information

Invariant Recognition of Hand-Drawn Pictograms Using HMMs with a Rotating Feature Extraction

Invariant Recognition of Hand-Drawn Pictograms Using HMMs with a Rotating Feature Extraction Invariant Recognition of Hand-Drawn Pictograms Using HMMs with a Rotating Feature Extraction Stefan Müller, Gerhard Rigoll, Andreas Kosmala and Denis Mazurenok Department of Computer Science, Faculty of

More information

School of Computing and Information Systems The University of Melbourne COMP90042 WEB SEARCH AND TEXT ANALYSIS (Semester 1, 2017)

School of Computing and Information Systems The University of Melbourne COMP90042 WEB SEARCH AND TEXT ANALYSIS (Semester 1, 2017) Discussion School of Computing and Information Systems The University of Melbourne COMP9004 WEB SEARCH AND TEXT ANALYSIS (Semester, 07). What is a POS tag? Sample solutions for discussion exercises: Week

More information

US Geo-Explorer User s Guide. Web:

US Geo-Explorer User s Guide. Web: US Geo-Explorer User s Guide Web: http://usgeoexplorer.org Updated on October 26, 2016 TABLE OF CONTENTS Introduction... 3 1. System Interface... 5 2. Administrative Unit... 7 2.1 Region Selection... 7

More information

An Introduction to Hidden Markov Models

An Introduction to Hidden Markov Models An Introduction to Hidden Markov Models Max Heimel Fachgebiet Datenbanksysteme und Informationsmanagement Technische Universität Berlin http://www.dima.tu-berlin.de/ 07.10.2010 DIMA TU Berlin 1 Agenda

More information

Information Extraction Techniques in Terrorism Surveillance

Information Extraction Techniques in Terrorism Surveillance Information Extraction Techniques in Terrorism Surveillance Roman Tekhov Abstract. The article gives a brief overview of what information extraction is and how it might be used for the purposes of counter-terrorism

More information

Automated Extraction of Event Details from Text Snippets

Automated Extraction of Event Details from Text Snippets Automated Extraction of Event Details from Text Snippets Kavi Goel, Pei-Chin Wang December 16, 2005 1 Introduction We receive emails about events all the time. A message will typically include the title

More information

CIF Changes to the specification. 27 July 2011

CIF Changes to the specification. 27 July 2011 CIF Changes to the specification 27 July 2011 This document specifies changes to the syntax and binary form of CIF. We refer to the current syntax specification of CIF as CIF1, and the new specification

More information