Exam III March 17, 2010


CIS 4930 NLP
Exam III, March 17, 2010

Print Your Name:
Total Score:

Your work is to be done individually. The exam is worth 106 points (six points of extra credit are available throughout the exam) and it has twelve questions. Unless a problem directly instructs you differently, there are no known errors within this document. If you are instructed to use specific functionality to solve a problem, then follow the guidelines given. Otherwise, you are allowed to use anything from Python modules, provided you include all statements allowing access to such functionality.

Here is the simplified Brown Tag Set for your reference. Unless otherwise specified, all corpora will be tagged using the definitions of this set.

    Tag   Meaning
    ADJ   Adjective
    ADV   Adverb
    CNJ   Conjunction
    DET   Determiner
    EX    Existential
    FW    Foreign Word
    MOD   Modal Verb
    N     Noun
    NP    Proper Noun
    NUM   Number
    PRO   Pronoun
    P     Preposition
    TO    The Word "to"
    UH    Interjection
    V     Verb
    VD    Past Tense
    VG    Present Participle
    VN    Past Participle
    WH    wh Determiner

1. [6 pts] Define and describe the following parts of speech.

    (a) Noun: a person, place or thing.
    (b) Past Participle: the form of a verb used to make perfect tenses and passive forms of verbs; the verb form following some form of the verb "have".

2. [5 pts] Define and describe Bayes' Rule.

    A theorem for finding the probability of a fact A being true given that fact B is true.

3. [5 pts] Define and describe the Null Hypothesis Test.

    The technique of setting up a hypothesis to be nullified or refuted in order to support an alternative hypothesis.
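The definition in question 2 is usually written as P(A|B) = P(B|A) P(A) / P(B). The following is a minimal sketch of that calculation. The function name `bayes_posterior` and the probabilities are hypothetical values chosen for illustration; they do not come from the exam.

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b


# Hypothetical numbers: A = "token is a noun", B = "token ends in -s".
p_noun = 0.25          # P(A)
p_s_given_noun = 0.5   # P(B|A)
p_s = 0.25             # P(B)

posterior = bayes_posterior(p_s_given_noun, p_noun, p_s)
print(posterior)  # P(noun | ends in -s) under the assumed numbers
```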

4. [6 pts] State the formula for Pearson's Chi-Square Test.

5. [6 pts] Using the values: a total of 5,000 tokens on the course schedule page, "CIS" occurring 48 times, "4930" occurring 11 times, and "CIS 4930" occurring 10 times, create the table of data used by Pearson's Chi-Square Test.

                 CIS     !CIS
    4930          10        1
    !4930         38     4950

6. [6 pts] We would like to calculate the mean differential between the tokens "4930" and "4905" when each is preceded by "CIS". In the Spring 2010 course schedule, "CIS" occurs 48 times, "CIS 4930" occurs 10 times, and "CIS 4905" occurs 1 time. Resolve your calculation as much as you can by hand; you may leave your result in fractional form.

$$\frac{C(w_1 w) - C(w_2 w)}{\sqrt{C(w_1 w) + C(w_2 w)}} = \frac{10 - 1}{\sqrt{10 + 1}} = \frac{9}{\sqrt{11}}$$
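For reference, Pearson's chi-square statistic is commonly stated as the sum over all table cells of (O - E)^2 / E, where O is the observed count and E is the count expected if rows and columns were independent. The sketch below applies that general definition to the 2x2 table from question 5 and checks it against the usual 2x2 shortcut form. The function names are hypothetical, and this is an illustration, not the exam's official answer to question 4.

```python
def chi_square_2x2(o11, o12, o21, o22):
    """Pearson's chi-square for a 2x2 table via sum((O - E)^2 / E)."""
    observed = [[o11, o12], [o21, o22]]
    n = o11 + o12 + o21 + o22
    row_totals = [o11 + o12, o21 + o22]
    col_totals = [o11 + o21, o12 + o22]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


def chi_square_2x2_shortcut(o11, o12, o21, o22):
    """Equivalent closed form for the 2x2 case."""
    n = o11 + o12 + o21 + o22
    numerator = n * (o11 * o22 - o12 * o21) ** 2
    denominator = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    return numerator / denominator


# Cell values taken from the table in question 5.
chi2 = chi_square_2x2(10, 1, 38, 4950)
chi2_short = chi_square_2x2_shortcut(10, 1, 38, 4950)
print(chi2, chi2_short)
```

A 2x2 table has one degree of freedom, so a statistic above 3.841 rejects independence at the 0.05 level.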

7. [8 pts] A tagger exists within the file: Tagger.pkl. Show how to read this tagger into your program for re-use.

    from cPickle import load   # Python 2; in Python 3 use: from pickle import load
    input = open('Tagger.pkl', 'rb')
    tagger = load(input)
    input.close()

8. [8 pts] You are given a list of tagged sentences called training. Show how to create a bigram tagger using this set of training data and the tagger you read in from the prior question as your backoff tagger.

    import nltk
    t1 = nltk.BigramTagger(training, backoff=tagger)

9. [4 pts] Given a list of untagged data called data, show how to tag this data using the tagger you created in the prior question.

    t1.tag(data)
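To show what the backoff chain in questions 7 through 9 does conceptually, here is a standard-library-only sketch. It is not NLTK's implementation: the functions `train_unigram`, `train_bigram`, and `tag`, the toy training data, and the default tag are all hypothetical stand-ins. It uses Python 3's `pickle` for the save and reload step.

```python
import pickle
import tempfile
from collections import Counter, defaultdict


def train_unigram(tagged_sents):
    """Map each word to its most frequent tag."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def train_bigram(tagged_sents):
    """Map (previous tag, word) to the most frequent tag in that context."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        prev = None  # None marks the start of a sentence
        for word, tag in sent:
            counts[(prev, word)][tag] += 1
            prev = tag
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}


def tag(words, bigram_model, unigram_model, default="N"):
    """Tag with the bigram model, backing off to unigram, then to a default."""
    tagged, prev = [], None
    for word in words:
        guess = bigram_model.get((prev, word))
        if guess is None:  # unseen context: back off
            guess = unigram_model.get(word, default)
        tagged.append((word, guess))
        prev = guess
    return tagged


# Toy training data (hypothetical).
training = [
    [("the", "DET"), ("dog", "N"), ("barks", "V")],
    [("dogs", "N"), ("bark", "V")],
]
models = (train_bigram(training), train_unigram(training))

# Save the trained models and read them back, as in question 7.
with tempfile.TemporaryFile() as f:
    pickle.dump(models, f)
    f.seek(0)
    restored = pickle.load(f)

tagged = tag(["dog", "barks", "cat"], *restored)
print(tagged)
```

Here "dog" at the start of a sentence was never seen by the bigram model, so the unigram model supplies its tag, and the unknown word "cat" falls through to the default.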

10. [16 pts] Create a method that will receive a tagged corpus and a specific tag. The method will search the corpus for the tag that most commonly follows the tag received. Return a list composed of: the specified tag, the most commonly following tag, and the frequency with which the most common tag follows the specified tag.

    def findmostcommonfollower(tagged_corpus, specified):
        from collections import defaultdict
        words = tagged_corpus.tagged_words(simplify_tags=True)
        followers = defaultdict(int)    # tag -> times it follows `specified`
        length = len(words)
        totalcount = 0                  # occurrences of `specified`
        for i in range(length):
            if words[i][1] == specified:
                totalcount += 1
                if i != length - 1:
                    followers[words[i + 1][1]] += 1
        maxcount = 0
        maxtag = None
        for each in followers.keys():
            nextcount = followers[each]
            if nextcount > maxcount:
                maxcount = nextcount
                maxtag = each
        return [specified, maxtag, (maxcount + 0.0) / totalcount]
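The same counting logic can be tried without a corpus object. The sketch below is a variant that takes a plain list of (word, tag) pairs and uses `collections.Counter`. The name `most_common_follower`, the hand-tagged sample sentence, and the choice to return `None` and `0.0` when the tag has no followers are assumptions made for illustration.

```python
from collections import Counter


def most_common_follower(tagged_words, specified):
    """Return [specified, most common following tag, relative frequency]."""
    # Every occurrence of the tag counts, even one at the very end.
    total = sum(1 for _, tag in tagged_words if tag == specified)
    followers = Counter(
        next_tag
        for (_, tag), (_, next_tag) in zip(tagged_words, tagged_words[1:])
        if tag == specified
    )
    if not followers:
        return [specified, None, 0.0]
    best_tag, best_count = followers.most_common(1)[0]
    return [specified, best_tag, best_count / total]


# Hand-tagged sample (hypothetical tags from the simplified Brown set).
sample = [
    ("the", "DET"), ("drink", "N"), ("spilled", "VD"), ("from", "P"),
    ("my", "PRO"), ("glass", "N"), ("and", "CNJ"), ("landed", "VD"),
    ("on", "P"), ("my", "PRO"), ("new", "ADJ"), ("shoes", "N"),
]

result = most_common_follower(sample, "P")
print(result)  # ['P', 'PRO', 1.0]
```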

11. [20 pts] Prepositional phrases are made up of a preposition and some set of following words, and end with a noun (the object of the preposition). Create a method that will receive a tagged corpus. The method will search the corpus for all prepositions and return a list of tuples (or sub-lists) containing: the preposition, the entire prepositional phrase (including the preposition), and the number of tokens (words) within the prepositional phrase. Consider "the drink spilled from my glass and landed on my new shoes"; your method will return: [('from', 'from my glass', 3), ('on', 'on my new shoes', 4)].

    def findprepositions(tagged_corpus):
        words = tagged_corpus.tagged_words(simplify_tags=True)
        results = []
        lastprep = None     # preposition of the phrase being built, if any
        count = 0
        phrase = ''
        for each in words:
            if each[1] == 'P':
                # A preposition starts a new phrase.
                lastprep = each[0]
                count = 1
                phrase = lastprep
            elif lastprep is not None:
                phrase += ' ' + each[0]
                count += 1
                if each[1] == 'N':
                    # A noun closes the phrase.
                    results.append([lastprep, phrase, count])
                    lastprep = None
        return results
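The example sentence from question 11 can be used to check this logic. The sketch below is a variant that works on a plain list of (word, tag) pairs and returns tuples. The function name `find_prepositional_phrases` and the hand-assigned tags are assumptions for illustration.

```python
def find_prepositional_phrases(tagged_words):
    """Return (preposition, phrase, token count) for each P ... N span."""
    results = []
    tokens = None  # tokens of the phrase being built, or None when outside one
    for word, tag in tagged_words:
        if tag == "P":
            tokens = [word]            # a preposition starts a new phrase
        elif tokens is not None:
            tokens.append(word)
            if tag == "N":             # a noun closes the phrase
                results.append((tokens[0], " ".join(tokens), len(tokens)))
                tokens = None
    return results


# The sentence from question 11, tagged by hand (hypothetical tags).
sentence = [
    ("the", "DET"), ("drink", "N"), ("spilled", "VD"), ("from", "P"),
    ("my", "PRO"), ("glass", "N"), ("and", "CNJ"), ("landed", "VD"),
    ("on", "P"), ("my", "PRO"), ("new", "ADJ"), ("shoes", "N"),
]

phrases = find_prepositional_phrases(sentence)
print(phrases)  # [('from', 'from my glass', 3), ('on', 'on my new shoes', 4)]
```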

12. [16 pts] The feminine pronouns are: she, her, herself, and hers and the masculine pronouns are: he, him, himself, and his. Create a method that will receive a tagged corpus and return the ratio of feminine pronouns to masculine pronouns.

    def findgenderratio(tagged_corpus):
        words = tagged_corpus.tagged_words(simplify_tags=True)
        feminine = 0
        masculine = 0
        for each in words:
            if each[1] == 'PRO':
                if each[0] in ('she', 'her', 'herself', 'hers'):
                    feminine += 1
                elif each[0] in ('he', 'him', 'himself', 'his'):
                    masculine += 1
        return (feminine + 0.0) / masculine
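Two edge cases are worth considering for question 12: a pronoun capitalized at the start of a sentence, and a corpus with no masculine pronouns. The sketch below is one possible variant that handles both. It takes a plain list of (word, tag) pairs; the function name `gender_ratio`, the case-insensitive matching, the `None` return for an undefined ratio, and the sample data are assumptions, not requirements stated in the exam.

```python
FEMININE = {"she", "her", "herself", "hers"}
MASCULINE = {"he", "him", "himself", "his"}


def gender_ratio(tagged_words):
    """Return feminine/masculine pronoun ratio, or None if undefined."""
    feminine = masculine = 0
    for word, tag in tagged_words:
        if tag != "PRO":
            continue
        lowered = word.lower()  # assumption: "She" counts the same as "she"
        if lowered in FEMININE:
            feminine += 1
        elif lowered in MASCULINE:
            masculine += 1
    if masculine == 0:
        return None  # assumption: report an undefined ratio instead of raising
    return feminine / masculine


# Hand-tagged sample (hypothetical): 2 feminine and 4 masculine pronouns.
sample = [
    ("She", "PRO"), ("told", "VD"), ("him", "PRO"), ("that", "CNJ"),
    ("he", "PRO"), ("left", "VD"), ("his", "PRO"), ("keys", "N"),
    ("in", "P"), ("her", "PRO"), ("car", "N"), ("himself", "PRO"),
]

ratio = gender_ratio(sample)
print(ratio)  # 0.5
```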