Handling Place References in Text
- Arthur Manning
2 Introduction
- Most (geographic) information is available in the form of textual documents
- Place reference resolution involves two subtasks:
  - Recognition: delimiting occurrences of place references in text
  - Disambiguation: resolving place references to geo-coordinates
- Essential task in Geographic Information Retrieval
  - Supports access through geography to textual documents
- Existing methods mostly rely on hand-tuned heuristics
  - Labor-intensive to develop, optimize and maintain
- Current research focuses on data-driven methods
- Main problems are related to natural language ambiguity!
3 Ambiguity
- Geographic/non-geographic ambiguity: place names that also have other, non-geographic meanings
  - Examples: Reading in England, Buffalo in the US
  - Should be addressed while recognizing place references
- Geographic/geographic ambiguity: multiple distinct places that share the same name
  - Example: many major European cities have a namesake in the New World
  - Should be addressed while disambiguating place references
4 Place reference recognition
- Approaches based on dictionaries
  - Sliding window approaches
  - Aho-Corasick algorithm (finite state automaton)
  - Good recall, good runtime performance, often poor precision
- Approaches based on rules
  - Regular expression patterns (finite state automaton)
  - Grammatical rules
  - Large human effort involved in creating the rules
- Approaches based on machine learning
  - Hidden Markov Models
  - Conditional Random Fields
  - Good generalization behavior, but require large amounts of training data
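The sliding window variant of the dictionary approach can be sketched as follows. This is a minimal illustration, not the method used by any particular system: the function name `find_place_candidates`, the `max_len` window size and the tiny name set are assumptions made for the example.

```python
def find_place_candidates(tokens, gazetteer_names, max_len=3):
    """Return (start, end, text) spans whose token window appears in the dictionary.

    `end` is exclusive. The name set and max_len are illustrative assumptions.
    """
    names = {name.lower() for name in gazetteer_names}
    matches = []
    i = 0
    while i < len(tokens):
        found = None
        # Prefer the longest window starting at position i.
        for size in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + size])
            if candidate.lower() in names:
                found = (i, i + size, candidate)
                break
        if found:
            matches.append(found)
            i = found[1]  # continue after the matched window
        else:
            i += 1
    return matches


names = {"Campo Grande", "Reading", "Buffalo"}
spans = find_place_candidates("Yesterday Bruno Martins went to Campo Grande".split(), names)
false_hit = find_place_candidates("Reading books is fun".split(), names)
```

The second call shows why precision is often poor: the window matches "Reading" even though the word is not used as a place name there.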
5 Aho-Corasick String Matching
- Locates all occurrences of any of a finite number of keywords (e.g., location names) in a string of text.
- Consists of two steps:
  1. Constructing a finite state pattern matching machine from the keywords
  2. Using the pattern matching machine to process the text string in a single pass
6 Pattern Matching Machine
- Let $P = \{y_1, y_2, \ldots, y_k\}$ be a finite set of string patterns, which we shall call keywords
- Let $x$ be an arbitrary string, which we shall call the text string (i.e., the document)
- The behavior of the pattern matching machine is dictated by three functions:
  - a goto function $g$
  - a failure function $f$
  - an output function $output$
7 The Pattern Matching Machine
- Goto function g: maps a pair consisting of a state and an input symbol into a state or fail.
- Failure function f: maps a state into a state, and is consulted whenever the goto function reports fail.
  - Gives fast transitions from a failed match to another branch of the tree whose prefix matches a suffix of the text read so far
  - Example: a search for cat in a tree that contains cart but not cat fails at the node with prefix ca; since the suffix a is a prefix of attribute, a branch for attribute might be the best lateral transition
- Output function: associates a (possibly empty) set of keyword patterns with every state.
8 Aho-Corasick Algorithm
[Figure: pattern tree state machine for the pattern set { he, she, his, hers }. Goto function: black arrows. Failure function: blue arrows. Output function: red dots.]
9 Aho-Corasick Search Algorithm
- l: the starting position in text string T
- c: the current character of T to be compared with a character on the tree K
- w: the current node on the tree K
- Input: pattern set P and text T
- Output: all occurrences in T of any pattern from P

Algorithm: Aho-Corasick
    l = 1; c = 1; w = root of K
    repeat
        while there is an edge (w, w') labeled with T[c]
            if w' is numbered by pattern i then
                report that P_i occurs in T starting at l
            w = w'; c++
        w = failure(w) and l = c - length-prefix(w)
    until c > |T|
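The two steps, building the machine and scanning the text in one pass, can be sketched in Python as below. This is a minimal illustration of the standard construction rather than a transcription of the pseudocode above: the names `build_automaton` and `search` are chosen for the example, and positions are 0-based where the slide uses 1-based positions.

```python
from collections import deque


def build_automaton(keywords):
    """Build the goto, failure and output functions for a set of keywords."""
    goto = [{}]        # goto[state][symbol] -> next state; state 0 is the root
    output = [set()]   # output[state] -> keywords recognized on reaching state
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(word)

    failure = [0] * len(goto)
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:                     # breadth-first, so shallower states are done first
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            fallback = failure[state]
            while fallback and ch not in goto[fallback]:
                fallback = failure[fallback]
            failure[nxt] = goto[fallback].get(ch, 0)
            # Keywords ending at the failure state also end here.
            output[nxt] |= output[failure[nxt]]
    return goto, failure, output


def search(text, keywords):
    """Return (start, keyword) pairs for every occurrence, start being 0-based."""
    goto, failure, output = build_automaton(keywords)
    state, hits = 0, []
    for pos, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = failure[state]   # consult the failure function on a mismatch
        state = goto[state].get(ch, 0)
        for word in output[state]:
            hits.append((pos - len(word) + 1, word))
    return hits


hits = sorted(search("ushers", ["he", "she", "his", "hers"]))
```

With the pattern set from the previous slide, scanning "ushers" reports she, he and hers without ever moving backwards in the text.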
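10 Hidden Markov Models
- HMMs are the standard sequence modeling tool in NLP and IE
- Two views of the same model:
  - Finite state model: states connected by transitions, each emitting observations
  - Graphical model: $\ldots \to S_{t-1} \to S_t \to S_{t+1} \to \ldots$, where each state $S_t$ generates an observation $O_t$
- Generates a state sequence and an observation sequence $o_1, o_2, \ldots, o_8$ with joint probability

  $P(\vec{s}, \vec{o}) = \prod_{t=1}^{|\vec{o}|} P(s_t \mid s_{t-1}) \, P(o_t \mid s_t)$

- Parameters, for all states $S = \{s_1, s_2, \ldots\}$:
  - Start state probabilities: $P(s_1)$, used in place of $P(s_1 \mid s_0)$ for the first position
  - Transition probabilities: $P(s_t \mid s_{t-1})$
  - Observation (emission) probabilities: $P(o_t \mid s_t)$
- Training: maximize the probability of the training observations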
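The joint probability of a state sequence and an observation sequence can be computed directly from the three parameter tables. The sketch below assumes a toy two-state model (`bg` for background, `loc` for location name); all probabilities are made-up values for illustration, and the start state probability is used for the first position.

```python
import math

# Toy parameters, assumed for illustration only.
START = {"bg": 0.8, "loc": 0.2}
TRANS = {"bg": {"bg": 0.7, "loc": 0.3}, "loc": {"bg": 0.4, "loc": 0.6}}
EMIT = {
    "bg": {"to": 0.5, "Campo": 0.25, "Grande": 0.25},
    "loc": {"to": 0.1, "Campo": 0.5, "Grande": 0.4},
}


def joint_log_prob(states, observations, start=START, trans=TRANS, emit=EMIT):
    """Log of P(s, o): start and emission terms for t = 1, then one
    transition term and one emission term for every later position."""
    logp = math.log(start[states[0]]) + math.log(emit[states[0]][observations[0]])
    for t in range(1, len(states)):
        logp += math.log(trans[states[t - 1]][states[t]])
        logp += math.log(emit[states[t]][observations[t]])
    return logp


obs = ["to", "Campo", "Grande"]
p_location = math.exp(joint_log_prob(["bg", "loc", "loc"], obs))
p_background = math.exp(joint_log_prob(["bg", "bg", "bg"], obs))
```

Under these toy numbers the labeling that treats "Campo Grande" as a location (0.0144) is more probable than the all-background labeling (0.01225). Summing logarithms rather than multiplying probabilities avoids numerical underflow on longer texts.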
11 Placename Extraction with HMMs
- Given a sequence of observations: Yesterday Bruno Martins went to Campo Grande
- and a trained HMM with states: person name, location name, background
- Find the most likely state sequence (Viterbi): $\arg\max_{\vec{s}} P(\vec{s}, \vec{o})$
- Any words said to be generated by the designated location name state are extracted as a location name:
  - Location name: Campo Grande
12 B-I-O Encoding
- Encode the chunking problem of recognizing place references as a tagging problem of assigning classes to individual word tokens.
- Classes: Begin_place, Inside_place, Other

  Yesterday  Bruno  Martins  went  to  Campo  Grande.
  O          B_per  I_per    O     O   B_loc  I_loc
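Converting between entity spans and per-token B-I-O tags is mechanical, as the following sketch shows. The function names `bio_encode` and `bio_decode` and the span format (token indices, end exclusive) are assumptions made for the example.

```python
def bio_encode(tokens, spans):
    """Turn (start, end, label) token spans (end exclusive) into B-I-O tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B_{label}"
        for i in range(start + 1, end):
            tags[i] = f"I_{label}"
    return tags


def bio_decode(tokens, tags):
    """Recover (label, text) entities from a B-I-O tag sequence."""
    entities, current = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B_"):
            current = [tag[2:], [token]]
            entities.append(current)
        elif tag.startswith("I_") and current and current[0] == tag[2:]:
            current[1].append(token)
        else:
            current = None  # an O tag or a stray I tag closes the open entity
    return [(label, " ".join(words)) for label, words in entities]


tokens = ["Yesterday", "Bruno", "Martins", "went", "to", "Campo", "Grande"]
tags = bio_encode(tokens, [(1, 3, "per"), (5, 7, "loc")])
entities = bio_decode(tokens, tags)
```

Once the text is encoded this way, any per-token sequence tagger (such as an HMM) can be trained on the tags, and its output decoded back into place references.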
13 Hidden Markov Models
- Learning the model with training data
  - General algorithm based on Expectation-Maximization (EM):
    1. Initialise model $\lambda_0$
    2. Compute new model $\lambda$, using $\lambda_0$ and the observed sequence
    3. Adjust the model: $\lambda_0 \leftarrow \lambda$
    4. Repeat steps 2 and 3 until $\log P(X, Y \mid \lambda) - \log P(X, Y \mid \lambda_0) < d$
- Using the model (i.e., decoding)
  - Choose the output label sequence that maximizes the probability of the token observation sequence
  - Viterbi: a dynamic programming algorithm that keeps the best label sequence at each instance
14 The Viterbi Algorithm
- The algorithm sweeps through all the tag possibilities for each word, computing the best sequence leading to each possibility.
- Dynamic programming approach: the key that makes this algorithm efficient is that we only need to know the best sequences leading to the previous word, because of the Markov assumption used in the model.
15 The Viterbi Algorithm
Let T = # of tags in our annotation problem (e.g., B-I-O tags for each entity type)
    W = # of words in the text to be annotated

/* Initialization Step */
for t = 1 to T
    Score(t, 1) = Pr(Word_1 | Tag_t) * Pr(Tag_t)
    BackPtr(t, 1) = 0

/* Iteration Step */
for w = 2 to W
    for t = 1 to T
        Score(t, w) = Pr(Word_w | Tag_t) * MAX_{j=1..T} (Score(j, w-1) * Pr(Tag_t | Tag_j))
        BackPtr(t, w) = index of j that gave the max above

/* Sequence Identification */
Seq(W) = t that maximizes Score(t, W)
for w = W-1 to 1
    Seq(w) = BackPtr(Seq(w+1), w+1)
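The three steps of the pseudocode map onto Python as sketched below. The toy two-state model (`bg` for background, `loc` for location name) and all of its probabilities are assumptions made for the example; the code multiplies raw probabilities to mirror the slide, whereas log-probabilities would be the safer choice for long texts.

```python
# Toy parameters, assumed for illustration only.
TAGS = ["bg", "loc"]
START = {"bg": 0.8, "loc": 0.2}
TRANS = {"bg": {"bg": 0.7, "loc": 0.3}, "loc": {"bg": 0.4, "loc": 0.6}}
EMIT = {
    "bg": {"to": 0.5, "Campo": 0.25, "Grande": 0.25},
    "loc": {"to": 0.1, "Campo": 0.5, "Grande": 0.4},
}


def viterbi(words, tags=TAGS, start=START, trans=TRANS, emit=EMIT):
    """Return the most probable tag sequence for `words`."""
    # Initialization: score[w][t] is the probability of the best
    # tag sequence that ends in tag t at word w.
    score = [{t: start[t] * emit[t].get(words[0], 0.0) for t in tags}]
    back = [{t: None for t in tags}]

    # Iteration: extend the best sequences one word at a time.
    for w in range(1, len(words)):
        score.append({})
        back.append({})
        for t in tags:
            best_prev = max(tags, key=lambda j: score[w - 1][j] * trans[j][t])
            score[w][t] = (emit[t].get(words[w], 0.0)
                           * score[w - 1][best_prev] * trans[best_prev][t])
            back[w][t] = best_prev

    # Sequence identification: follow the back pointers from the best final tag.
    seq = [max(tags, key=lambda t: score[-1][t])]
    for w in range(len(words) - 1, 0, -1):
        seq.append(back[w][seq[-1]])
    return seq[::-1]


best = viterbi(["to", "Campo", "Grande"])
```

For this toy model the decoder labels "to" as background and both "Campo" and "Grande" as location. Only the scores for the previous word are consulted at each step, which is what keeps the cost at W x T x T operations.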
16 Disambiguation and Gazetteers
- Place reference disambiguation relies on (external) gazetteer data for places.
- A gazetteer is a database associating place names with the corresponding place metadata
  - Similar to an address geocoding service
17 Some Popular Gazetteer Services
- The Alexandria Digital Library (ADL) Gazetteer
  - Pioneering effort in defining data models and XML access protocols for managing gazetteer data
  - Their dataset was built by integrating data from multiple sources, but usage requires a private license
- The geonames.org world gazetteer
  - Dataset built by integrating data from multiple sources, with 8 million geographic names, in multiple languages, for more than 6.5 million unique geographic features
  - Geographic features are only associated with centroid coordinates, as opposed to polygons or MBRs
  - Does not include historical place names or time periods
- The Getty Thesaurus of Geographical Names (TGN)
  - Describes about 1 million places around the globe, with alternative names in multiple languages
  - Usage of TGN data requires a private license
  - Includes historical place names (associated with time periods), but not names of historical periods
- The Yahoo! Geoplanet Database
- Many more
18 Place Reference Disambiguation
- Most approaches leverage contextual information:
  - External: information in gazetteers (e.g., population, types, ...)
  - Internal: words and other entities surrounding the place reference
- Disambiguation heuristics can be grouped into:
  - Default senses: disambiguate to the most important candidate referent, with importance estimated from geometric area or population.
  - Spatial minimalism: disambiguate to the candidate that minimizes the distance to other place references in the same context, or the geometric area that covers all place references in the same context.
  - Attribute coherence: disambiguate to the candidate referent whose attributes (e.g., the place type) are similar to those mentioned in the textual context where the reference appears.
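The first two heuristics are simple enough to sketch in a few lines. The `GAZETTEER` dictionary below is a hypothetical stand-in for a real gazetteer service, with rough, rounded coordinates and populations chosen only for illustration; great-circle distance is computed with the haversine formula.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


# Hypothetical gazetteer entries; values are rough and for illustration only.
GAZETTEER = {
    "London": [
        {"id": "London, UK", "coords": (51.51, -0.13), "population": 8_000_000},
        {"id": "London, Ontario", "coords": (42.98, -81.25), "population": 400_000},
    ],
    "Toronto": [
        {"id": "Toronto, Ontario", "coords": (43.65, -79.38), "population": 2_700_000},
    ],
}


def default_sense(name):
    """Default senses: pick the most populous candidate."""
    return max(GAZETTEER[name], key=lambda c: c["population"])


def spatial_minimalism(name, context_coords):
    """Spatial minimalism: pick the candidate closest to the other references."""
    def total_distance(candidate):
        return sum(haversine_km(candidate["coords"], other) for other in context_coords)
    return min(GAZETTEER[name], key=total_distance)


by_default = default_sense("London")["id"]
toronto = GAZETTEER["Toronto"][0]["coords"]
by_context = spatial_minimalism("London", [toronto])["id"]
```

The two heuristics can disagree: population alone selects London in the UK, while a text that also mentions Toronto pulls the choice towards London, Ontario.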
19 Disambiguation with Machine Learning
- Disambiguation can be seen as a problem of ranking candidate referents and choosing the best candidate
- The ranking can be based on an estimate of the geospatial distance between the candidate referent and the referent corresponding to the true disambiguation
- Regression models are used to estimate this geospatial distance
  - Several features are correlated with the geospatial distance
  - Find a function that combines the available features in order to estimate the geospatial distance associated with the candidate
  - Linear regression
  - Genetic Programming
  - SVM regression
20 Disambiguation with Machine Learning
21 Disambiguation Features
- String similarity between the candidate name for the referent and the reference string in the text
- Population count for the candidate referent
- Geospatial area for the candidate referent
- Number of alternative names for the candidate referent
- Geospatial distance between the candidate referent and the closest interpretation for place references in the same textual unit (e.g., the same paragraph)
- Area of the convex hull covering the candidate referent and all candidates of place references in the same text unit
- Many more have been tested in the related literature
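Several of these features can be computed with the standard library alone. The sketch below is an assumption-laden illustration: the candidate record layout (`name`, `coords`, `population`, `alt_names`) is hypothetical, the example values are rough, `difflib.SequenceMatcher` stands in for whichever string similarity measure a real system would use, and the area and convex hull features are left out because they need polygon data.

```python
import math
from difflib import SequenceMatcher


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def candidate_features(reference, candidate, context_coords):
    """Feature dictionary for one candidate referent of one place reference.

    context_coords holds coordinates of candidate interpretations for the
    other place references in the same textual unit.
    """
    nearest = min(
        (haversine_km(candidate["coords"], c) for c in context_coords),
        default=0.0,  # no other references in the textual unit
    )
    return {
        "name_similarity": SequenceMatcher(
            None, reference.lower(), candidate["name"].lower()).ratio(),
        "population": candidate["population"],
        "alt_name_count": len(candidate["alt_names"]),
        "nearest_context_km": nearest,
    }


# Hypothetical candidate record with rough illustrative values.
lisbon = {"name": "Lisbon", "coords": (38.72, -9.14),
          "population": 500_000, "alt_names": ["Lisboa", "Lissabon"]}
feats = candidate_features("lisbon", lisbon, [(38.72, -9.14)])
```

A regression model would take one such feature vector per candidate and predict the distance to the true referent; the candidate with the smallest predicted distance is then chosen.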
22 State of the art results
23 Current research challenges
- Some commercial services already exist:
  - Yahoo! Placemaker
  - Metacarta Text Geotagging Service
- But there are many open research challenges:
  - Multilingual place reference resolution with machine learning
    - Requires more annotation standards and corpora, such as SpatialML
  - Using advanced sequence tagging models
  - Considering other geospatial reference resolution tasks:
    - Resolution of geospatial relations given in text
    - Fine-grained classification of place references in text
24 Questions?
More informationAssessing the Quality of Natural Language Text
Assessing the Quality of Natural Language Text DC Research Ulm (RIC/AM) daniel.sonntag@dfki.de GI 2004 Agenda Introduction and Background to Text Quality Text Quality Dimensions Intrinsic Text Quality,
More informationUsing idocument for Document Categorization in Nepomuk Social Semantic Desktop
Using idocument for Document Categorization in Nepomuk Social Semantic Desktop Benjamin Adrian, Martin Klinkigt,2 Heiko Maus, Andreas Dengel,2 ( Knowledge-Based Systems Group, Department of Computer Science
More informationS Y N T A X A N A L Y S I S LR
LR parsing There are three commonly used algorithms to build tables for an LR parser: 1. SLR(1) = LR(0) plus use of FOLLOW set to select between actions smallest class of grammars smallest tables (number
More informationFinite Math Linear Programming 1 May / 7
Linear Programming Finite Math 1 May 2017 Finite Math Linear Programming 1 May 2017 1 / 7 General Description of Linear Programming Finite Math Linear Programming 1 May 2017 2 / 7 General Description of
More informationJoint Entity Resolution
Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute
More informationNATURAL LANGUAGE PROCESSING
NATURAL LANGUAGE PROCESSING LESSON 9 : SEMANTIC SIMILARITY OUTLINE Semantic Relations Semantic Similarity Levels Sense Level Word Level Text Level WordNet-based Similarity Methods Hybrid Methods Similarity
More informationNew in WorldMap Version 1.5 Center for Geographic Analysis, Harvard
New in Version 1.5 Center for Geographic Analysis, Harvard 1.0 Overview This document provides guidance for the new Version 1.5 features. For information on the other parts of the system please use the
More informationMining Web Data. Lijun Zhang
Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems
More informationHomework 2: HMM, Viterbi, CRF/Perceptron
Homework 2: HMM, Viterbi, CRF/Perceptron CS 585, UMass Amherst, Fall 2015 Version: Oct5 Overview Due Tuesday, Oct 13 at midnight. Get starter code from the course website s schedule page. You should submit
More informationInvariant Recognition of Hand-Drawn Pictograms Using HMMs with a Rotating Feature Extraction
Invariant Recognition of Hand-Drawn Pictograms Using HMMs with a Rotating Feature Extraction Stefan Müller, Gerhard Rigoll, Andreas Kosmala and Denis Mazurenok Department of Computer Science, Faculty of
More informationSchool of Computing and Information Systems The University of Melbourne COMP90042 WEB SEARCH AND TEXT ANALYSIS (Semester 1, 2017)
Discussion School of Computing and Information Systems The University of Melbourne COMP9004 WEB SEARCH AND TEXT ANALYSIS (Semester, 07). What is a POS tag? Sample solutions for discussion exercises: Week
More informationUS Geo-Explorer User s Guide. Web:
US Geo-Explorer User s Guide Web: http://usgeoexplorer.org Updated on October 26, 2016 TABLE OF CONTENTS Introduction... 3 1. System Interface... 5 2. Administrative Unit... 7 2.1 Region Selection... 7
More informationAn Introduction to Hidden Markov Models
An Introduction to Hidden Markov Models Max Heimel Fachgebiet Datenbanksysteme und Informationsmanagement Technische Universität Berlin http://www.dima.tu-berlin.de/ 07.10.2010 DIMA TU Berlin 1 Agenda
More informationInformation Extraction Techniques in Terrorism Surveillance
Information Extraction Techniques in Terrorism Surveillance Roman Tekhov Abstract. The article gives a brief overview of what information extraction is and how it might be used for the purposes of counter-terrorism
More informationAutomated Extraction of Event Details from Text Snippets
Automated Extraction of Event Details from Text Snippets Kavi Goel, Pei-Chin Wang December 16, 2005 1 Introduction We receive emails about events all the time. A message will typically include the title
More informationCIF Changes to the specification. 27 July 2011
CIF Changes to the specification 27 July 2011 This document specifies changes to the syntax and binary form of CIF. We refer to the current syntax specification of CIF as CIF1, and the new specification
More information