Center for Reflected Text Analytics. Lecture 2 Annotation tools & Segmentation


1 Center for Reflected Text Analytics Lecture 2 Annotation tools & Segmentation

2 Summary of Part 1
- Annotation theory
  - Guidelines
  - Inter-annotator agreement
  - Inter-subjective annotations
- Annotation exercise
  - Discuss disagreements with your neighbor
  - Improve annotation guidelines
University of Stuttgart

3 Annotation Tool Support
Tools can support the annotation process at various stages
- Managing multiple annotators
  - Assign documents to annotate
  - Supervise their progress
- Analyse disagreements
  - Display disagreements (only)
  - Calculate quantitative IAA (κ)
- Create a gold standard
  - Make decisions on disagreements
  - Record final decisions
Usable tools: see handout
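The quantitative IAA step can be sketched in a few lines. The slides only name κ, so this sketch assumes Cohen's κ for two annotators who labelled the same items; the label lists are invented for illustration and are not data from the lecture.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' label proportions, summed over labels.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical annotations of six tokens by two annotators.
annotator_1 = ["PER", "PER", "LOC", "O", "O", "PER"]
annotator_2 = ["PER", "LOC", "LOC", "O", "O", "O"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 2))
```

Raw agreement here is 4 out of 6, but κ is lower because part of that agreement would be expected by chance.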

4 Segmentation

5 Segmentation Tool Download =

6 Segmentation
- Abstract definition
  - The task of separating a text into multiple parts ("segments")
  - No meaning of a segment implied
- Segmentation according to various criteria based on
  - Structure (chapters, acts, letters, speeches)
  - Linguistics (sentences, paragraphs)
  - Narrative content (scenes, time, place)
  - Content (topics under discussion)
- No generic criterion covering multiple research questions

7 Segmentation Viewpoints
- Focus on segments: spans of text
- Focus on segment boundaries: positions in a text
- The views are equivalent; we will switch between them when appropriate
[Figure: a text divided into Segment 1, Segment 2, Segment 3 and Segment 4, separated by three segment boundaries]
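The equivalence of the two views can be shown with a small round trip. This is a sketch under assumptions the slides do not state: segments are contiguous, cover the whole text, and are given as end-exclusive character offsets; the function names and offsets are hypothetical.

```python
def segments_to_boundaries(segments):
    """Turn (start, end) spans into the positions between adjacent segments."""
    return [end for _, end in segments[:-1]]

def boundaries_to_segments(boundaries, text_length):
    """Turn boundary positions back into (start, end) spans covering the text."""
    cuts = [0] + sorted(boundaries) + [text_length]
    return list(zip(cuts[:-1], cuts[1:]))

# Four segments of a 500-character text, as in the figure: three boundaries.
segments = [(0, 120), (120, 300), (300, 410), (410, 500)]
boundaries = segments_to_boundaries(segments)
print(boundaries)
print(boundaries_to_segments(boundaries, 500) == segments)
```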

8 Entities + Segments = Networks
[Figure: co-occurrence network with the nodes Mary, Peter and Paul]

9 Entities + Segments = Networks
- Slightly more abstract description
- Segmented text with the appearing entities: {A, B}, {A, B, B, B, A}, {A, C}
- Convert into a (square) adjacency matrix
  - Diagonal is typically uninteresting
  - Matrix is symmetric

        A   B   C
    A   -   2   1
    B   2   -   0
    C   1   0   -

- Create network
  - A node is created for each row (or column)
  - An edge is created for each cell, weighted according to the cell value
[Figure: network with the nodes A, B and C]
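The conversion from segments to matrix to network can be sketched as follows. The counting rule is an assumption inferred from the example matrix: two entities co-occur once per segment in which both appear, however often they are mentioned there. Function names are hypothetical.

```python
from itertools import combinations

def cooccurrence_matrix(segments):
    """Count, for each entity pair, the segments in which both entities occur."""
    entities = sorted({e for seg in segments for e in seg})
    index = {e: i for i, e in enumerate(entities)}
    matrix = [[0] * len(entities) for _ in entities]
    for seg in segments:
        # set(seg): repeated mentions inside one segment count only once.
        for a, b in combinations(sorted(set(seg)), 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1  # keep the matrix symmetric
    return entities, matrix

def weighted_edges(entities, matrix):
    """Read edges from the upper triangle; zero cells yield no edge."""
    return [(entities[i], entities[j], matrix[i][j])
            for i in range(len(entities))
            for j in range(i + 1, len(entities))
            if matrix[i][j] > 0]

segments = [["A", "B"], ["A", "B", "B", "B", "A"], ["A", "C"]]
entities, matrix = cooccurrence_matrix(segments)
edges = weighted_edges(entities, matrix)
print(matrix)
print(edges)
```

Because the matrix is symmetric, reading only the upper triangle is enough to obtain every undirected edge once.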

10 Segmentation Annotation
- Theoretically
  - Segments can be annotated just like entity references
  - Both cover sequences of words
  - Appropriate annotation guidelines would define when to annotate segments
- Practically
  - Segmentation criterion is closely tied to the research question
  - No reasonable generic abstraction layer that works for multiple research questions and/or text corpora
  - Single texts contain only a few segments
  - Many more annotated texts are needed for any kind of automation

11 Segment Annotation Tool
- Web-based UI
- Beta software
- Automatic annotation through rules and tools
- Entity annotation
  - Stanford Named Entity Recognizer (Finkel et al., 2005)
    - Only proper names, no descriptive noun phrases
  - Rules (regular expressions) to specify the entity references
- Segment annotation
  - Rules (regular expressions) to specify the segment boundaries
  - Unsupervised segmentation algorithm (TextTiling; Hearst, 1994)
- Network export
  - Gephi

12 Gephi
- Network tool
- Free and open source
- Wide range of metric, filter and layout algorithms
- Network editing (e.g., merge nodes)
- Plugins
- Export into static images

13 Demo

14 Regular Expressions
- Useful text processing skills 101
- A powerful way to describe sets of character sequences
- Many search tools support REs, and all programming languages do
- Looks cryptic, but is quite systematic
- REs on the slides and handout are enclosed in forward slashes / / for readability; the slashes don't need to be typed in the tool
- Basics
  - Many regular characters stand for themselves
    - The RE /a/ finds occurrences of the character "a"
  - Sequences of characters stand for sequences of themselves
    - The RE /the/ finds occurrences of the string "the"

15 Regular Expressions
- Meta characters ("quantifiers") are applied to the previous character
  - ?: previous character is optional (0-1 times)
    - /them?/ finds both "the" and "them"
  - +: previous character one or more times
    - /ab+/ finds "ab", "abb", "abbb", ...
  - *: the Kleene star, previous character zero or more times
    - /ab*/ finds "a", "ab", "abb", "abbb", ...
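The three quantifiers can be tried out with Python's `re` module. This is only a sketch: the tool used in the lecture may use a different regex dialect, and the helper `matching` is a hypothetical name. `re.fullmatch` is used so that a candidate counts only if the whole string fits the pattern.

```python
import re

def matching(pattern, candidates):
    """Return the candidates that the pattern matches completely."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

optional = matching(r"them?", ["the", "them", "they"])       # m is optional
one_or_more = matching(r"ab+", ["a", "ab", "abb", "abbb"])   # at least one b
zero_or_more = matching(r"ab*", ["a", "ab", "abb", "abbb"])  # any number of b
print(optional, one_or_more, zero_or_more)
```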

16 Regular Expressions
- Alternations and character classes
  - /(re1|re2)/ matches everything that either re1 or re2 matches
    - /(good|better|best)/ finds the positive, comparative and superlative forms of the adjective "good"
    - /great(er|est)?/ finds "great" and its comparative and superlative forms
      - The question mark makes the suffixes optional
  - We can mark alternatives on character level in square brackets: [ ]
    - /[Tt]he/ finds upper and lower case forms of "the"
  - Square brackets support ranges of characters
    - /[A-Z]/ finds upper-case characters (beware: locale)
    - /[0-9]/ finds digits
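A short sketch of alternation and character classes in Python's `re` module. Two details are additions for the sake of the example, not part of the slides: the groups are written as non-capturing `(?:...)` so that `re.findall` returns whole matches, and `\b` word boundaries keep "great" from matching inside "greatly". The example sentences are invented.

```python
import re

# Alternation: any one of the listed forms.
good_forms = re.findall(r"\b(?:good|better|best)\b",
                        "A good plan, a better plan, the best plan.")

# Optional suffix: great, greater, greatest (but not greatly).
great_forms = re.findall(r"\bgreat(?:er|est)?\b",
                         "great, greater, greatest, greatly")

# Character class and character range.
the_forms = re.findall(r"[Tt]he", "The cat saw the dog.")
digits = re.findall(r"[0-9]", "Room 42")

print(good_forms, great_forms, the_forms, digits)
```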

17 Regular Expressions
- Special cases and exceptions
  - The dot . matches any character
    - /a.*b/ finds everything that begins with "a" and ends with "b"
  - Escape character: backslash
    - In order to find a literal dot, we need to suppress its special meaning
    - /.*\.doc/ finds everything that ends in ".doc" (e.g., filenames)
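The effect of escaping the dot is easiest to see side by side. A minimal sketch in Python's `re` module; the filenames and the test string are invented.

```python
import re

filenames = ["report.doc", "reportXdoc", "notes.txt", "a.doc.bak"]

# Unescaped dot: any character may stand before "doc".
unescaped = [f for f in filenames if re.fullmatch(r".*.doc", f)]
# Escaped dot: a literal "." is required before "doc".
escaped = [f for f in filenames if re.fullmatch(r".*\.doc", f)]

# .* is greedy: the match runs from the first "a" to the last "b".
greedy = re.search(r"a.*b", "xxaYYbZZbxx").group()

print(unescaped, escaped, greedy)
```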

18 Regular Expressions
- Real examples
  - "Chapter 10.": /Chapter [0-9]+\./
  - "Chapter V." (Roman numerals): /Chapter [IVXCM]+\./
    - Beware: possible over-matching
  - Dates such as "MAY 22.", "AUGUST 23.": /[A-Z]+ [0-9]+\./
    - Beware: possible over-matching
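Such patterns can serve directly as segment boundary rules. The sketch below splits a text at every chapter heading and then shows the over-matching risk of the date pattern; the function name and both sample texts are hypothetical, and the lecture's tool may apply its rules differently.

```python
import re

CHAPTER = re.compile(r"Chapter [0-9]+\.")

def split_at_headings(text, pattern=CHAPTER):
    """Start a new segment at every match of the heading pattern."""
    starts = [m.start() for m in pattern.finditer(text)]
    if not starts:
        return [text]
    cuts = starts + [len(text)]
    segments = [text[a:b] for a, b in zip(cuts[:-1], cuts[1:])]
    if starts[0] > 0:
        segments.insert(0, text[:starts[0]])  # material before the first heading
    return segments

sample = "Preface. Chapter 1. In which we begin. Chapter 2. In which we go on."
parts = split_at_headings(sample)

# Over-matching: the date rule also fires on "ROOM 101."
date_hits = re.findall(r"[A-Z]+ [0-9]+\.", "MAY 22. We left. ROOM 101. was empty.")

print(parts)
print(date_hits)
```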

19 TextTiling
- Hearst (1994)
- Unsupervised segmentation algorithm, developed for expository texts
- Compares the lexicon in a window to the left and to the right of a target sentence gap
[Figure: windows of sentences to the left and right of the sentence boundary between sentences n and n+1 (step size = 3); each window is turned into a word vector, v_1 and v_2, and dist(v_1, v_2) gives the value d_n for this gap]

20 TextTiling
- Hearst (1994)
[Figure: the windows are moved along the text by the step size, giving one distance value per examined sentence gap: d_n, d_{n+3}, ...]
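The window comparison can be sketched as below. This is a strongly simplified illustration, not Hearst's full algorithm: it uses raw word frequencies and cosine distance on whole sentences and leaves out smoothing and the selection of boundaries from the distance curve. Window size, step size, function names and the sample sentences are all assumptions made for the example.

```python
import math
import re
from collections import Counter

def word_vector(sentences):
    """Bag-of-words frequency vector for a window of sentences."""
    return Counter(w for s in sentences for w in re.findall(r"[a-z]+", s.lower()))

def cosine_distance(v1, v2):
    dot = sum(v1[w] * v2[w] for w in v1 if w in v2)
    norm = (math.sqrt(sum(c * c for c in v1.values()))
            * math.sqrt(sum(c * c for c in v2.values())))
    return 1.0 if norm == 0 else 1.0 - dot / norm

def gap_distances(sentences, window=2, step=1):
    """Distance d_n between the windows left and right of each examined gap.

    Gap n lies between sentences[n - 1] and sentences[n].
    """
    distances = {}
    for gap in range(window, len(sentences) - window + 1, step):
        left = word_vector(sentences[gap - window:gap])
        right = word_vector(sentences[gap:gap + window])
        distances[gap] = cosine_distance(left, right)
    return distances

sentences = [
    "The ship left the harbour.",
    "The ship sailed east.",
    "The ship met a storm.",
    "Stocks fell sharply.",
    "Markets and stocks closed lower.",
    "Investors sold stocks.",
]
distances = gap_distances(sentences)
best_gap = max(distances, key=distances.get)
print(distances)
print(best_gap)
```

The gap with the largest distance is the most plausible segment boundary; here it is the gap where the vocabulary changes from shipping to stock markets.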

21 TextTiling
- Hearst (1994)
- More powerful algorithms are available
  - E.g., topic segmentation
- Clear adaptation possibilities
  - How to create word vectors?
    - Which words are included (function/content words)?
    - Which value is represented in the vector (frequency, tf*idf, information, ...)?
  - How to calculate similarity/distance?
    - Cosine, Manhattan, ...
- But: evaluation is hard
  - No gold standard available
  - Different expectations
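Two of these adaptation choices can be made concrete in a few lines: which words enter the vector, and which distance is used. The sketch uses Manhattan distance on frequency vectors, once with and once without function words. The stop list is a tiny hypothetical one and the two token windows are invented.

```python
from collections import Counter

# Tiny hypothetical stop list; a real one would be much longer.
FUNCTION_WORDS = {"the", "a", "an", "and", "of", "in", "to"}

def content_vector(tokens):
    """Frequency vector over content words only."""
    return Counter(t for t in tokens if t not in FUNCTION_WORDS)

def manhattan_distance(v1, v2):
    """Sum of absolute frequency differences over all words in either vector."""
    return sum(abs(v1[w] - v2[w]) for w in set(v1) | set(v2))

left = "the ship left the harbour".split()
right = "the ship and the storm".split()

with_function_words = manhattan_distance(Counter(left), Counter(right))
content_only = manhattan_distance(content_vector(left), content_vector(right))
print(with_function_words, content_only)
```

Even on this toy input the two settings give different distances, which is why such choices need to be reported along with the results.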

22 Hands-On Session 2
- Go to the segment annotation tool
- Load a text of your liking (it's better if you are familiar with it)
- Add entity references by applying the Stanford NER system
  - Briefly check whether the important entities are included ("Passepartout", for instance, is not)
  - You can add specific names by specifying regular expressions
- Add reasonable segment annotations
- Export a GEXF file and load it into Gephi
- Play with various options and see how the network changes
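For orientation, this is roughly what a GEXF file for a small co-occurrence network contains. The sketch writes a minimal GEXF document with Python's standard library; it assumes the GEXF 1.2 draft layout (a `graph` element with `nodes` and `edges`) and is not the export code of the lecture's tool. The names come from the co-occurrence example; the weights are made up.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"  # assumed GEXF version

def to_gexf(edges):
    """Serialise weighted undirected edges [(source, target, weight), ...] as GEXF text."""
    labels = sorted({name for s, t, _ in edges for name in (s, t)})
    ids = {label: str(i) for i, label in enumerate(labels)}
    root = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(root, "graph",
                          {"mode": "static", "defaultedgetype": "undirected"})
    nodes = ET.SubElement(graph, "nodes")
    for label in labels:
        ET.SubElement(nodes, "node", {"id": ids[label], "label": label})
    edge_list = ET.SubElement(graph, "edges")
    for i, (s, t, w) in enumerate(edges):
        ET.SubElement(edge_list, "edge",
                      {"id": str(i), "source": ids[s], "target": ids[t],
                       "weight": str(w)})
    return ET.tostring(root, encoding="unicode")

gexf_text = to_gexf([("Mary", "Peter", 2), ("Mary", "Paul", 1)])
print(gexf_text)
```

Saving the printed text to a file with the extension .gexf should give something Gephi can open, provided the assumed layout is right.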

A bit of theory: Algorithms

A bit of theory: Algorithms A bit of theory: Algorithms There are different kinds of algorithms Vector space models. e.g. support vector machines Decision trees, e.g. C45 Probabilistic models, e.g. Naive Bayes Neural networks, e.g.

More information

--- stands for the horizontal line.

--- stands for the horizontal line. Content Proofs on zoxiy Subproofs on zoxiy Constants in proofs with quantifiers Boxed constants on zoxiy Proofs on zoxiy When you start an exercise, you re already given the basic form of the proof, with

More information

A Multilingual Social Media Linguistic Corpus

A Multilingual Social Media Linguistic Corpus A Multilingual Social Media Linguistic Corpus Luis Rei 1,2 Dunja Mladenić 1,2 Simon Krek 1 1 Artificial Intelligence Laboratory Jožef Stefan Institute 2 Jožef Stefan International Postgraduate School 4th

More information

Lecture 14: Annotation

Lecture 14: Annotation Lecture 14: Annotation Nathan Schneider (with material from Henry Thompson, Alex Lascarides) ENLP 23 October 2016 1/14 Annotation Why gold 6= perfect Quality Control 2/14 Factors in Annotation Suppose

More information

University of Sheffield, NLP. Chunking Practical Exercise

University of Sheffield, NLP. Chunking Practical Exercise Chunking Practical Exercise Chunking for NER Chunking, as we saw at the beginning, means finding parts of text This task is often called Named Entity Recognition (NER), in the context of finding person

More information

Ranked Retrieval. Evaluation in IR. One option is to average the precision scores at discrete. points on the ROC curve But which points?

Ranked Retrieval. Evaluation in IR. One option is to average the precision scores at discrete. points on the ROC curve But which points? Ranked Retrieval One option is to average the precision scores at discrete Precision 100% 0% More junk 100% Everything points on the ROC curve But which points? Recall We want to evaluate the system, not

More information

from Pavel Mihaylov and Dorothee Beermann Reviewed by Sc o t t Fa r r a r, University of Washington

from Pavel Mihaylov and Dorothee Beermann Reviewed by Sc o t t Fa r r a r, University of Washington Vol. 4 (2010), pp. 60-65 http://nflrc.hawaii.edu/ldc/ http://hdl.handle.net/10125/4467 TypeCraft from Pavel Mihaylov and Dorothee Beermann Reviewed by Sc o t t Fa r r a r, University of Washington 1. OVERVIEW.

More information

Clustering analysis of gene expression data

Clustering analysis of gene expression data Clustering analysis of gene expression data Chapter 11 in Jonathan Pevsner, Bioinformatics and Functional Genomics, 3 rd edition (Chapter 9 in 2 nd edition) Human T cell expression data The matrix contains

More information

Regular Expressions for Linguists: A Life Skill

Regular Expressions for Linguists: A Life Skill .. Regular Expressions for Linguists: A Life Skill Michael Yoshitaka Erlewine mitcho@mitcho.com Hackl Lab Turkshop March 2013 Regular Expressions What are regular expressions? Regular Expressions (aka

More information

The Goal of this Document. Where to Start?

The Goal of this Document. Where to Start? A QUICK INTRODUCTION TO THE SEMILAR APPLICATION Mihai Lintean, Rajendra Banjade, and Vasile Rus vrus@memphis.edu linteam@gmail.com rbanjade@memphis.edu The Goal of this Document This document introduce

More information

What s new in VisibleThread Docs Words Matter. Oct 19 th, Webinar

What s new in VisibleThread Docs Words Matter. Oct 19 th, Webinar What s new in VisibleThread Docs 2.14 Words Matter Oct 19 th, 2017 - Webinar Operational Notes & Agenda 1. Call will last between 30-40 minutes 2. Please ask any questions using the Questions facility

More information

Parallel Concordancing and Translation. Michael Barlow

Parallel Concordancing and Translation. Michael Barlow [Translating and the Computer 26, November 2004 [London: Aslib, 2004] Parallel Concordancing and Translation Michael Barlow Dept. of Applied Language Studies and Linguistics University of Auckland Auckland,

More information

A Mathematical Model For Treatment Selection Literature

A Mathematical Model For Treatment Selection Literature A Mathematical Model For Treatment Selection Literature G. Duncan and W. W. Koczkodaj Abstract Business Intelligence tools and techniques, when applied to a data store of bibliographical references, can

More information

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Exam Name MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. 1) A process has a: 1) A) pronoun label B) noun phrase label C) verb phrase label D) adjective

More information

String Vector based KNN for Text Categorization

String Vector based KNN for Text Categorization 458 String Vector based KNN for Text Categorization Taeho Jo Department of Computer and Information Communication Engineering Hongik University Sejong, South Korea tjo018@hongik.ac.kr Abstract This research

More information

Cluster Validation. Ke Chen. Reading: [25.1.2, KPM], [Wang et al., 2009], [Yang & Chen, 2011] COMP24111 Machine Learning

Cluster Validation. Ke Chen. Reading: [25.1.2, KPM], [Wang et al., 2009], [Yang & Chen, 2011] COMP24111 Machine Learning Cluster Validation Ke Chen Reading: [5.., KPM], [Wang et al., 9], [Yang & Chen, ] COMP4 Machine Learning Outline Motivation and Background Internal index Motivation and general ideas Variance-based internal

More information

VisibleThread for Docs 2.12 Product Update & Release Notes

VisibleThread for Docs 2.12 Product Update & Release Notes VisibleThread for Docs 2.12 Product Update & Release Notes October 2015 This major release focused on enhanced usability along with adding some great new features and tweaking some old ones. Here are the

More information

Big Data Management and NoSQL Databases

Big Data Management and NoSQL Databases NDBI040 Big Data Management and NoSQL Databases Lecture 10. Graph databases Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz http://www.ksi.mff.cuni.cz/~holubova/ndbi040/ Graph Databases Basic

More information

NLP Final Project Fall 2015, Due Friday, December 18

NLP Final Project Fall 2015, Due Friday, December 18 NLP Final Project Fall 2015, Due Friday, December 18 For the final project, everyone is required to do some sentiment classification and then choose one of the other three types of projects: annotation,

More information

Multimedia Information Extraction and Retrieval Term Frequency Inverse Document Frequency

Multimedia Information Extraction and Retrieval Term Frequency Inverse Document Frequency Multimedia Information Extraction and Retrieval Term Frequency Inverse Document Frequency Ralf Moeller Hamburg Univ. of Technology Acknowledgement Slides taken from presentation material for the following

More information

WebAnno: a flexible, web-based annotation tool for CLARIN

WebAnno: a flexible, web-based annotation tool for CLARIN WebAnno: a flexible, web-based annotation tool for CLARIN Richard Eckart de Castilho, Chris Biemann, Iryna Gurevych, Seid Muhie Yimam #WebAnno This work is licensed under a Attribution-NonCommercial-ShareAlike

More information

Tokenization and Sentence Segmentation. Yan Shao Department of Linguistics and Philology, Uppsala University 29 March 2017

Tokenization and Sentence Segmentation. Yan Shao Department of Linguistics and Philology, Uppsala University 29 March 2017 Tokenization and Sentence Segmentation Yan Shao Department of Linguistics and Philology, Uppsala University 29 March 2017 Outline 1 Tokenization Introduction Exercise Evaluation Summary 2 Sentence segmentation

More information

Tools for Annotating and Searching Corpora Practical Session 1: Annotating

Tools for Annotating and Searching Corpora Practical Session 1: Annotating Tools for Annotating and Searching Corpora Practical Session 1: Annotating Stefanie Dipper Institute of Linguistics Ruhr-University Bochum Corpus Linguistics Fest (CLiF) June 6-10, 2016 Indiana University,

More information

Analytical Evaluation

Analytical Evaluation Analytical Evaluation November 7, 2016 1 Questions? 2 Overview of Today s Lecture Analytical Evaluation Inspections Performance modelling 3 Analytical Evaluations Evaluations without involving users 4

More information

Final Project Discussion. Adam Meyers Montclair State University

Final Project Discussion. Adam Meyers Montclair State University Final Project Discussion Adam Meyers Montclair State University Summary Project Timeline Project Format Details/Examples for Different Project Types Linguistic Resource Projects: Annotation, Lexicons,...

More information

Compiler Design 1. Introduction to Programming Language Design and to Compilation

Compiler Design 1. Introduction to Programming Language Design and to Compilation Compiler Design 1 Introduction to Programming Language Design and to Compilation Administrivia Lecturer: Kostis Sagonas (Hus 1, 352) Course home page: http://user.it.uu.se/~kostis/teaching/kt1-11 If you

More information

Higher National Unit Specification. General information for centres. Unit title: CAD: 3D Modelling. Unit code: DW13 34

Higher National Unit Specification. General information for centres. Unit title: CAD: 3D Modelling. Unit code: DW13 34 Higher National Unit Specification General information for centres Unit code: DW13 34 Unit purpose: This Unit is designed to introduce candidates to computerised 3D modelling and enable them to understand

More information

Tables & Figures Abstracts ANSC 5307

Tables & Figures Abstracts ANSC 5307 Tables & Figures Abstracts ANSC 5307 Components of Tables & Figures 1. Stand alone Should not need text to explain what s in the table Should not repeat values from the tables or figures verbatim in the

More information

SoundWriter 2.0 Manual

SoundWriter 2.0 Manual SoundWriter 2.0 Manual 1 Overview SoundWriter 2.0 Manual John W. Du Bois SoundWriter (available free from http://www.linguistics.ucsb.edu/projects/transcription, for Windows only) is software designed

More information

IBM Watson Application Developer Workshop. Watson Knowledge Studio: Building a Machine-learning Annotator with Watson Knowledge Studio.

IBM Watson Application Developer Workshop. Watson Knowledge Studio: Building a Machine-learning Annotator with Watson Knowledge Studio. IBM Watson Application Developer Workshop Lab02 Watson Knowledge Studio: Building a Machine-learning Annotator with Watson Knowledge Studio January 2017 Duration: 60 minutes Prepared by Víctor L. Fandiño

More information

SEARCH ENGINE OPTIMIZATION Noun The process of maximizing the number of visitors to a particular website by ensuring that the site appears high on the list of results returned by a search engine such as

More information

successes without magic London,

successes without magic London, (\d)(?:\u0020 \u0209 \u202f \u200a){0,1}((m mm cm km V mv µv l ml C Nm A ma bar s kv Hz khz M Hz t kg g mg W kw MW Ah mah N kn obr min µm µs Pa MPa kpa hpa mbar µf db)\b) ^\t*'.+?' => ' (\d+)(,)(\d+)k

More information

Creating an Accessible Microsoft Word document

Creating an Accessible Microsoft Word document Creating an Accessible Microsoft Word document Use Built-in Formatting Styles Using built-in formatting styles could be the single most important step in making documents accessible. Built-in formatting

More information

MAXQDA and Chapter 9 Coding Schemes

MAXQDA and Chapter 9 Coding Schemes MAXQDA and Chapter 9 Coding Schemes Chapter 9 discusses how the structures of coding schemes, alternate groupings are key to moving forward with analysis. The nature and structures of the coding scheme

More information

1DL321: Kompilatorteknik I (Compiler Design 1) Introduction to Programming Language Design and to Compilation

1DL321: Kompilatorteknik I (Compiler Design 1) Introduction to Programming Language Design and to Compilation 1DL321: Kompilatorteknik I (Compiler Design 1) Introduction to Programming Language Design and to Compilation Administrivia Lecturer: Kostis Sagonas (kostis@it.uu.se) Course home page (of previous year):

More information

Automatic Data Analysis in Visual Analytics Selected Methods

Automatic Data Analysis in Visual Analytics Selected Methods Automatic Data Analysis in Visual Analytics Selected Methods Multimedia Information Systems 2 VU (SS 2015, 707.025) Vedran Sabol Know-Center March 15 th, 2016 2 Lecture Overview Visual Analytics Overview

More information

Lecture 23: Domain-Driven Design (Part 1)

Lecture 23: Domain-Driven Design (Part 1) 1 Lecture 23: Domain-Driven Design (Part 1) Kenneth M. Anderson Object-Oriented Analysis and Design CSCI 6448 - Spring Semester, 2005 2 Goals for this lecture Introduce the main concepts of Domain-Driven

More information

27 Formulas and Variables

27 Formulas and Variables 27 Formulas and Variables Formulas and variables enable you to add custom calculations within reports. One advantage of variables is they are given a name and are re-usable across the whole document, whereas

More information

ATLAS.ti 8 THE NEXT LEVEL.

ATLAS.ti 8 THE NEXT LEVEL. ATLAS.ti 8 THE NEXT LEVEL. SOPHISTICATED DATA ANALYSIS. EASY TO USE LIKE NEVER BEFORE. FREE! BETA www.cloud.atlasti.com www.atlasti.com ATLAS.ti 8 AND ATLAS.ti DATA ANALYSIS WITH ATLAS.ti IS EASIER, FASTER

More information

ELCO3: Entity Linking with Corpus Coherence Combining Open Source Annotators

ELCO3: Entity Linking with Corpus Coherence Combining Open Source Annotators ELCO3: Entity Linking with Corpus Coherence Combining Open Source Annotators Pablo Ruiz, Thierry Poibeau and Frédérique Mélanie Laboratoire LATTICE CNRS, École Normale Supérieure, U Paris 3 Sorbonne Nouvelle

More information

Performing Matrix Operations on the TI-83/84

Performing Matrix Operations on the TI-83/84 Page1 Performing Matrix Operations on the TI-83/84 While the layout of most TI-83/84 models are basically the same, of the things that can be different, one of those is the location of the Matrix key.

More information

2. Design Methodology

2. Design Methodology Content-aware Email Multiclass Classification Categorize Emails According to Senders Liwei Wang, Li Du s Abstract People nowadays are overwhelmed by tons of coming emails everyday at work or in their daily

More information

Step-by-Step Localization Eva Müller

Step-by-Step Localization Eva Müller Step-by-Step Localization Eva Müller Questions, answers and procedures for a successful localization process Steps in localization projects range from what is to be localized, who performs the localization

More information

POFT 2301 INTERMEDIATE KEYBOARDING LECTURE NOTES

POFT 2301 INTERMEDIATE KEYBOARDING LECTURE NOTES INTERMEDIATE KEYBOARDING LECTURE NOTES Be sure that you are reading the textbook information and the notes on the screen as you complete each part of the lessons in this Gregg Keyboarding Program (GDP).

More information

Information Retrieval and Organisation

Information Retrieval and Organisation Information Retrieval and Organisation Chapter 16 Flat Clustering Dell Zhang Birkbeck, University of London What Is Text Clustering? Text Clustering = Grouping a set of documents into classes of similar

More information

Final Project Grading Criteria, CSCI 588, Fall 2001

Final Project Grading Criteria, CSCI 588, Fall 2001 Final Project Grading Criteria, CSCI 588, Fall 2001 Each team is required to hand out one hard copy for your final project on December 4, 2001. The date is firm. No delayed submission will be accepted.

More information

Using Fields, Forms, and Indexes

Using Fields, Forms, and Indexes Lesson 6 Page 1 Using Fields, Forms, and Indexes Lesson Skill Matrix Skill Exam Objective Objective Number Working with Fields Working with Forms Creating Indexes Add custom fields. Modify field properties.

More information

Part I: Data Mining Foundations

Part I: Data Mining Foundations Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web and the Internet 2 1.3. Web Data Mining 4 1.3.1. What is Data Mining? 6 1.3.2. What is Web Mining?

More information

The Muc7 T Corpus. 1 Introduction. 2 Creation of Muc7 T

The Muc7 T Corpus. 1 Introduction. 2 Creation of Muc7 T The Muc7 T Corpus Katrin Tomanek and Udo Hahn Jena University Language & Information Engineering (JULIE) Lab Friedrich-Schiller-Universität Jena, Germany {katrin.tomanek udo.hahn}@uni-jena.de 1 Introduction

More information

An Adaptive Framework for Named Entity Combination

An Adaptive Framework for Named Entity Combination An Adaptive Framework for Named Entity Combination Bogdan Sacaleanu 1, Günter Neumann 2 1 IMC AG, 2 DFKI GmbH 1 New Business Department, 2 Language Technology Department Saarbrücken, Germany E-mail: Bogdan.Sacaleanu@im-c.de,

More information

Digital Humanities. Tutorial Regular Expressions. March 10, 2014

Digital Humanities. Tutorial Regular Expressions. March 10, 2014 Digital Humanities Tutorial Regular Expressions March 10, 2014 1 Introduction In this tutorial we will look at a powerful technique, called regular expressions, to search for specific patterns in corpora.

More information

How To Write Maintainable Engineering Specifications. Forrest Warthman

How To Write Maintainable Engineering Specifications. Forrest Warthman 1 How To Write Maintainable Engineering Specifications Forrest Warthman 2 Outline Motivations and audience Editing and vector-graphics tools Document formats and templates Inserting figures and tables

More information

University of Sheffield, NLP. Chunking Practical Exercise

University of Sheffield, NLP. Chunking Practical Exercise Chunking Practical Exercise Chunking for NER Chunking, as we saw at the beginning, means finding parts of text This task is often called Named Entity Recognition (NER), in the context of finding person

More information

Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information

Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Satoshi Sekine Computer Science Department New York University sekine@cs.nyu.edu Kapil Dalwani Computer Science Department

More information

The KNIME Text Processing Plugin

The KNIME Text Processing Plugin The KNIME Text Processing Plugin Kilian Thiel Nycomed Chair for Bioinformatics and Information Mining, University of Konstanz, 78457 Konstanz, Deutschland, Kilian.Thiel@uni-konstanz.de Abstract. This document

More information

Case-Based Reasoning. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. Parametric / Non-parametric.

Case-Based Reasoning. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. Parametric / Non-parametric. CS 188: Artificial Intelligence Fall 2008 Lecture 25: Kernels and Clustering 12/2/2008 Dan Klein UC Berkeley Case-Based Reasoning Similarity for classification Case-based reasoning Predict an instance

More information

CS 188: Artificial Intelligence Fall 2008

CS 188: Artificial Intelligence Fall 2008 CS 188: Artificial Intelligence Fall 2008 Lecture 25: Kernels and Clustering 12/2/2008 Dan Klein UC Berkeley 1 1 Case-Based Reasoning Similarity for classification Case-based reasoning Predict an instance

More information

CMPSCI 250: Introduction to Computation. Lecture #28: Regular Expressions and Languages David Mix Barrington 2 April 2014

CMPSCI 250: Introduction to Computation. Lecture #28: Regular Expressions and Languages David Mix Barrington 2 April 2014 CMPSCI 250: Introduction to Computation Lecture #28: Regular Expressions and Languages David Mix Barrington 2 April 2014 Regular Expressions and Languages Regular Expressions The Formal Inductive Definition

More information

GUIDELINES FOR MASTER OF SCIENCE INTERNSHIP THESIS

GUIDELINES FOR MASTER OF SCIENCE INTERNSHIP THESIS GUIDELINES FOR MASTER OF SCIENCE INTERNSHIP THESIS Dear Participant of the MScIS Program, If you have chosen to follow an internship, one of the requirements is to write a Thesis. This document gives you

More information

Using Microsoft Office 2003 Intermediate Word Handout INFORMATION TECHNOLOGY SERVICES California State University, Los Angeles Version 1.

Using Microsoft Office 2003 Intermediate Word Handout INFORMATION TECHNOLOGY SERVICES California State University, Los Angeles Version 1. Using Microsoft Office 2003 Intermediate Word Handout INFORMATION TECHNOLOGY SERVICES California State University, Los Angeles Version 1.2 Summer 2010 Table of Contents Intermediate Microsoft Word 2003...

More information

1. Query and manipulate data with Entity Framework.

1. Query and manipulate data with Entity Framework. COLLEGE OF INFORMATION TECHNOLOGY DEPARTMENT OF MULTIMEDIA SCIENCE COURSE SYLLABUS/SPECIFICATION CODE & TITLE: ITMS 434 Developing Windows Azure and Web Services (MCSD 20486) WEIGHT: 2-2-3 PREREQUISITE:

More information

AEMLog Users Guide. Version 1.01

AEMLog Users Guide. Version 1.01 AEMLog Users Guide Version 1.01 INTRODUCTION...2 DOCUMENTATION...2 INSTALLING AEMLOG...4 AEMLOG QUICK REFERENCE...5 THE MAIN GRAPH SCREEN...5 MENU COMMANDS...6 File Menu...6 Graph Menu...7 Analysis Menu...8

More information

Cloze Wizard Version 2.0

Cloze Wizard Version 2.0 Cloze Wizard Version 2.0 Rush Software 1991-2005 Proofing and Testing By Simon Fitzgibbons www.rushsoftware.com.au support@rushsoftware.com.au CONTENTS Overview... p 3 Technical Support... p 4 Installation...

More information

(Refer Slide Time: 00:02:00)

(Refer Slide Time: 00:02:00) Computer Graphics Prof. Sukhendu Das Dept. of Computer Science and Engineering Indian Institute of Technology, Madras Lecture - 18 Polyfill - Scan Conversion of a Polygon Today we will discuss the concepts

More information

University of Sheffield, NLP Annotation and Evaluation

University of Sheffield, NLP Annotation and Evaluation Annotation and Evaluation Diana Maynard, Niraj Aswani University of Sheffield Topics covered Defining annotation guidelines Manual annotation using the GATE GUI Annotation schemas and how they change the

More information

COMP 110 Project 1 Programming Project Warm-Up Exercise

COMP 110 Project 1 Programming Project Warm-Up Exercise COMP 110 Project 1 Programming Project Warm-Up Exercise Creating Java Source Files Over the semester, several text editors will be suggested for students to try out. Initially, I suggest you use JGrasp,

More information

Evaluation & Systems. Ling573 Systems & Applications April 7, 2016

Evaluation & Systems. Ling573 Systems & Applications April 7, 2016 Evaluation & Systems Ling573 Systems & Applications April 7, 2016 Evaluation: Scoring without models Roadmap Content selection: Unsupervised word-weighting approaches Non-trivial baseline system example:

More information

Matlab notes Matlab is a matrix-based, high-performance language for technical computing It integrates computation, visualisation and programming usin

Matlab notes Matlab is a matrix-based, high-performance language for technical computing It integrates computation, visualisation and programming usin Matlab notes Matlab is a matrix-based, high-performance language for technical computing It integrates computation, visualisation and programming using familiar mathematical notation The name Matlab stands

More information

Guide for Creating Accessible Content in D2L. Office of Distance Education. J u n e 2 1, P a g e 0 27

Guide for Creating Accessible Content in D2L. Office of Distance Education. J u n e 2 1, P a g e 0 27 Guide for Creating Accessible Content in D2L Learn how to create accessible web content within D2L from scratch. The guidelines listed in this guide will help ensure the content becomes WCAG 2.0 AA compliant.

More information
