Course introduction. Marco Kuhlmann Department of Computer and Information Science. TDDE17 Introduction to Language Technology (2017)

Size: px
Start display at page:

Download "Course introduction. Marco Kuhlmann Department of Computer and Information Science. TDDE17 Introduction to Language Technology (2017)"

Transcription

1 TDDE17 Introduction to Language Technology (2017) Course introduction Marco Kuhlmann Department of Computer and Information Science This work is licensed under a Creative Commons Attribution 4.0 International License.

2 What is language technology? Language technology is technology for the analysis and interpretation of natural language. not programming languages! Language technology is an interdisciplinary research area involving computer science, linguistics, and cognitive science. related names: natural language processing, computational linguistics

3 We are drowning in information but starved for knowledge. John Naisbitt (1982)

4 Total number of pages indexed by Google Source: statisticbrain.com

5 The Knowledge Gap unstructured data (text) analyst language technology structured data (database) analyst

6 Information extraction As of 15 Mar 2002, Hawaii state health officials reported one additional recent case of dengue fever and 6 cases that occurred last year but were not confirmed by laboratory testing until Attribute Value docno ProMed doc_date disease_name dengue fever norm_stime norm_etime victim_types Source: Grishman et al. (2002) location Hawaii

7 Commercial interest Sponsors of the ACL 2015 conference

8 Two challenges Ambiguity A linguistic expression can often mean several different things. Time flies like an arrow. Fruit flies like a banana. Contextuality A linguistic expression can only be fully interpreted in a context. A: Are you coming tonight? B: I need to work on my lab.

9 Ambiguity causes combinatorial explosion jag bad om en kort bit PN VB PP DT JJ NN NN NN SN PN AB VB PL RG NN AB NN Example by Joakim Nivre

10 Different parts-of-speech have different frequencies Word / tag PN VB PP DT JJ NN jag bad om en kort bit Data from the Stockholm Umeå Corpus

11 Different parts-of-speech have different frequencies Word / tag PN VB PP DT JJ NN jag 99,45 % 0,00 % 0,00 % 0,00 % 0,00 % 0,55 % bad 0,00 % 80,39 % 0,00 % 0,00 % 0,00 % 19,61 % om 0,00 % 0,00 % 100,00 % 0,00 % 0,00 % 0,00 % en 2,42 % 0,00 % 0,00 % 97,57 % 0,00 % 0,01 % kort 0,00 % 0,00 % 0,00 % 0,00 % 87,41 % 12,59 % bit 0,00 % 0,00 % 0,00 % 0,00 % 0,00 % 100,00 % Data from the Stockholm Umeå Corpus

12 What you will learn in this course basic methods and techniques for the analysis and interpretation of words, sentences, and texts language technology systems question answering systems validation methods tools, software libraries, and data

13 Recurring questions How does this method work? often some kind of algorithm or mathematical formula How can we evaluate this method? typically some evaluation measure, such as accuracy How does this method use data? estimate probabilities, learn weights of a neural network,

14 How text is stored on a computer

15 How text is stored on a computer Text is stored as a sequence of bytes. Each byte consists of 8 bits of information, yielding 256 different values. Bytes encode characters according to some encoding scheme. Unicode has been developed with the ambition to specify code points for all naturally occurring characters. natural languages (even extinct), mathematical symbols, emoji,

16 Sample page from the Unicode specification Unicode version 9.0 (June 2016): 128,237 different characters

17 UTF-8 8-bit Unicode Transformation Format Unicode has slots for 2 32 = 4,294,967,296 different characters. To encode Unicode characters into bytes, a single character is represented using more than one byte. character = 1 byte, 128 2,047 = 2 bytes, ,535 = 3 bytes, This scheme is called UTF-8 (8-bit Unicode Transformation Format) and is the most widely used encoding scheme today. August 2016: 7 out of 8 websites (Source: w3techs.com)

18 Varfà r blir det sã hã r? s å [SPC] h ä r Unicode UTF Latin s à [SPC] h à r Example by Per Starbäck

19 Text segmentation

20 Tokenisation Tokenisation refers to the task of segmenting a text into individual words or word-like units. numbers, punctuation marks Tokenisation is harder than splitting at whitespace. we re; don t; bl.a.; Hewlett-Packard; San Francisco Tokenisation is even harder for non-european languages. Chinese word segmentation

21 Tokenisation Raw text before tokenisation List of tokens after tokenisation The gorgeously elaborate continuation of The Lord of the Rings trilogy is so huge that a column of words cannot adequately describe co-writer/director Peter Jackson s expanded vision of J.R.R. Tolkien s Middle-earth. The gorgeously elaborate continuation of The Lord of the Rings trilogy is so huge that a column of words cannot adequately describe co-writer / director Peter Jackson s expanded vision of J.R.R. Tolkien s Middle-earth.

22 Word tokens and word types Rose is a rose is a rose is a rose. Gertrude Stein ( ) Corpus Tokens Types Shakespeare ca. 884,000 ca. 31,000 Riksmöte 2012/2013 4,645,560 96,114 Google Ngrams 1,176,470,663 13,588,391

23 Two very simple tokenisers # tokenise a sequence of lines using whitespace def tokenize(lines): for line in lines: for token in line.split(): yield token # tokenise a sequence of lines using a regular expression def tokenize(regex, lines): for line in lines: for match in re.finditer(regex, line): yield match.group(0)

24 Normalisation Lowercasing windows vs. Windows Harmonisation of spelling variants colour, color; gaol, jail; metre, meter Stemming (suffix removal) wanted want, wanting want, happily happily

25 Stop words a about above after again against all am an and any are aren t as at be because been before being below between both but by can t cannot could couldn t did didn t do does doesn t doing don t down during each few for from further had hadn t has hasn t have haven t having he he d he ll he s her here here s hers herself him himself his how how s i i d i ll i m i ve if in into is isn t it it s its itself let s me more most mustn t my myself no nor not of off on once only or other ought our ours ourselves out over own same shan t she she d she ll she s should shouldn t so some such than that that s the their theirs them themselves then there there s these they they d they ll they re they ve this those through to too under until up very was wasn t we we d we ll we re we ve were weren t what what s when when s where where s which while who who s whom why why s with won t would wouldn t you you d you ll you re you ve your yours yourself yourselves

26 Sentence segmentation For some applications, we want to identify not only words but also higher-level units such as sentences and paragraphs. Sentence segmentation refers to the task of dividing a text into individual sentences. Sentence segmentation is harder than splitting at periods. We visited the U.S. After that, we visited Canada.

27 From form to meaning, and back again pragmatics semantics analyse syntax generate words speech

Course introduction. Marco Kuhlmann Department of Computer and Information Science. Language Technology (2018)

Course introduction. Marco Kuhlmann Department of Computer and Information Science. Language Technology (2018) Language Technology (2018) Course introduction Marco Kuhlmann Department of Computer and Information Science This work is licensed under a Creative Commons Attribution 4.0 International License. What is

More information

1 Jane s dress is... yours. A the same than B the same to C similar than D similar to

1 Jane s dress is... yours. A the same than B the same to C similar than D similar to Test 5A 1 Jane s dress is... yours. A the same than B the same to C similar than D similar to 2 We ve proved that he was guilty but he... doesn t admit it. A yet B already C still D no longer 3 If I...

More information

INF FALL NATURAL LANGUAGE PROCESSING. Jan Tore Lønning, Lecture 4, 10.9

INF FALL NATURAL LANGUAGE PROCESSING. Jan Tore Lønning, Lecture 4, 10.9 1 INF5830 2015 FALL NATURAL LANGUAGE PROCESSING Jan Tore Lønning, Lecture 4, 10.9 2 Working with texts From bits to meaningful units Today: 3 Reading in texts Character encodings and Unicode Word tokenization

More information

Research Tools: DIY Text Tools

Research Tools: DIY Text Tools As with the other Research Tools, the DIY Text Tools are primarily designed for small research projects at the undergraduate level. What are the DIY Text Tools for? These tools are designed to help you

More information

Modals TEST 8. Book 1 Part C. 1. Why didn t you help him? You have done it. A) must B) can C) could D) were to E) was able to

Modals TEST 8. Book 1 Part C. 1. Why didn t you help him? You have done it. A) must B) can C) could D) were to E) was able to TEST 8 1. Why didn t you help him? You have done it. A) must B) can C) could D) were to E) was able to 2. A: I phone you tonight? B: Yes, you. A) may / may B) must / might C) could / can t D) shouldn t

More information

Trombone players produce different pitches partly by varying the length of a tube.

Trombone players produce different pitches partly by varying the length of a tube. Trombone players produce different pitches partly by varying the length of a tube. 7 Variables A variable is a connection between a name and a value.* That sounds simple enough, but some complexities arise

More information

Information Retrieval and Organisation

Information Retrieval and Organisation Information Retrieval and Organisation Dell Zhang Birkbeck, University of London 2016/17 IR Chapter 02 The Term Vocabulary and Postings Lists Constructing Inverted Indexes The major steps in constructing

More information

The Unicode Standard Version 11.0 Core Specification

The Unicode Standard Version 11.0 Core Specification The Unicode Standard Version 11.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers

More information

LBSC 690: Information Technology Lecture 05 Structured data and databases

LBSC 690: Information Technology Lecture 05 Structured data and databases LBSC 690: Information Technology Lecture 05 Structured data and databases William Webber CIS, University of Maryland Spring semester, 2012 Interpreting bits "my" 13.5801 268 010011010110 3rd Feb, 2014

More information

OR, you can download the file nltk_data.zip from the class web site, using a URL given in class.

OR, you can download the file nltk_data.zip from the class web site, using a URL given in class. NLP Lab Session Week 2 September 8, 2011 Frequency Distributions and Bigram Distributions Installing NLTK Data Reinstall nltk-2.0b7.win32.msi and Copy and Paste nltk_data from H:\ nltk_data to C:\ nltk_data,

More information

Princeton University. Computer Science 217: Introduction to Programming Systems. Data Types in C

Princeton University. Computer Science 217: Introduction to Programming Systems. Data Types in C Princeton University Computer Science 217: Introduction to Programming Systems Data Types in C 1 Goals of C Designers wanted C to: Support system programming Be low-level Be easy for people to handle But

More information

Written Communication

Written Communication Module 2: Written Communication 1 Your Passport to Professionalism: Module 2 Written Communication Step 1 Learn Introduction Sooner or later, you will need to communicate in writing. You will write down

More information

Topics in Parsing: Context and Markovization; Dependency Parsing. COMP-599 Oct 17, 2016

Topics in Parsing: Context and Markovization; Dependency Parsing. COMP-599 Oct 17, 2016 Topics in Parsing: Context and Markovization; Dependency Parsing COMP-599 Oct 17, 2016 Outline Review Incorporating context Markovization Learning the context Dependency parsing Eisner s algorithm 2 Review

More information

Rosa Rawlings BTEC Level 2 Extended Certificate in Information and Creative Technology Unit 3: A Digital Portfolio Assignment 3: Review Your Portfolio

Rosa Rawlings BTEC Level 2 Extended Certificate in Information and Creative Technology Unit 3: A Digital Portfolio Assignment 3: Review Your Portfolio How The Final Portfolio Is Suitable For The Intended Audience: My portfolio of PDF files of my work which showcases my work to the company. I have created a range of written documents, images and videos

More information

Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information

Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Satoshi Sekine Computer Science Department New York University sekine@cs.nyu.edu Kapil Dalwani Computer Science Department

More information

EDAN20 Language Technology Chapter 13: Dependency Parsing

EDAN20 Language Technology   Chapter 13: Dependency Parsing EDAN20 Language Technology http://cs.lth.se/edan20/ Pierre Nugues Lund University Pierre.Nugues@cs.lth.se http://cs.lth.se/pierre_nugues/ September 19, 2016 Pierre Nugues EDAN20 Language Technology http://cs.lth.se/edan20/

More information

Tokenization and Sentence Segmentation. Yan Shao Department of Linguistics and Philology, Uppsala University 29 March 2017

Tokenization and Sentence Segmentation. Yan Shao Department of Linguistics and Philology, Uppsala University 29 March 2017 Tokenization and Sentence Segmentation Yan Shao Department of Linguistics and Philology, Uppsala University 29 March 2017 Outline 1 Tokenization Introduction Exercise Evaluation Summary 2 Sentence segmentation

More information

Binary, Hexadecimal and Octal number system

Binary, Hexadecimal and Octal number system Binary, Hexadecimal and Octal number system Binary, hexadecimal, and octal refer to different number systems. The one that we typically use is called decimal. These number systems refer to the number of

More information

Lecture 14: Annotation

Lecture 14: Annotation Lecture 14: Annotation Nathan Schneider (with material from Henry Thompson, Alex Lascarides) ENLP 23 October 2016 1/14 Annotation Why gold 6= perfect Quality Control 2/14 Factors in Annotation Suppose

More information

Digital Libraries: Language Technologies

Digital Libraries: Language Technologies Digital Libraries: Language Technologies RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Recall: Inverted Index..........................................

More information

Representing text on the computer: ASCII, Unicode, and UTF 8

Representing text on the computer: ASCII, Unicode, and UTF 8 Representing text on the computer: ASCII, Unicode, and UTF 8 STAT/CS 287 Jim Bagrow Question: computers can only understand numbers. In particular, only two numbers, 0 and 1 (binary digits or bits). So

More information

UNIVERSITY OF EDINBURGH COLLEGE OF SCIENCE AND ENGINEERING SCHOOL OF INFORMATICS INFR08008 INFORMATICS 2A: PROCESSING FORMAL AND NATURAL LANGUAGES

UNIVERSITY OF EDINBURGH COLLEGE OF SCIENCE AND ENGINEERING SCHOOL OF INFORMATICS INFR08008 INFORMATICS 2A: PROCESSING FORMAL AND NATURAL LANGUAGES UNIVERSITY OF EDINBURGH COLLEGE OF SCIENCE AND ENGINEERING SCHOOL OF INFORMATICS INFR08008 INFORMATICS 2A: PROCESSING FORMAL AND NATURAL LANGUAGES Saturday 10 th December 2016 09:30 to 11:30 INSTRUCTIONS

More information

LING/C SC/PSYC 438/538. Lecture 3 Sandiway Fong

LING/C SC/PSYC 438/538. Lecture 3 Sandiway Fong LING/C SC/PSYC 438/538 Lecture 3 Sandiway Fong Today s Topics Homework 4 out due next Tuesday by midnight Homework 3 should have been submitted yesterday Quick Homework 3 review Continue with Perl intro

More information

Admin PARSING. Backoff models: absolute discounting. Backoff models: absolute discounting 3/4/11. What is (xy)?

Admin PARSING. Backoff models: absolute discounting. Backoff models: absolute discounting 3/4/11. What is (xy)? Admin Updated slides/examples on backoff with absolute discounting (I ll review them again here today) Assignment 2 Watson vs. Humans (tonight-wednesday) PARING David Kauchak C159 pring 2011 some slides

More information

Math 126 Number Theory

Math 126 Number Theory Math 16 Number Theory Prof. D. Joyce, Clark University 8 Mar 006 Due Friday. Page 155: exercises 1,, 7. Choose one of the three and write it up completely. Whichever one you choose, find all those solutions

More information

5 R1 The one green in the same place so either of these could be green.

5 R1 The one green in the same place so either of these could be green. Page: 1 of 20 1 R1 Now. Maybe what we should do is write out the cases that work. We wrote out one of them really very clearly here. [R1 takes out some papers.] Right? You did the one here um where you

More information

Refresher on Dependency Syntax and the Nivre Algorithm

Refresher on Dependency Syntax and the Nivre Algorithm Refresher on Dependency yntax and Nivre Algorithm Richard Johansson 1 Introduction This document gives more details about some important topics that re discussed very quickly during lecture: dependency

More information

Introduction to Normalization and Modern Collation

Introduction to Normalization and Modern Collation Introduction to Normalization and Modern Collation Roozbeh Pournader Sharif FarsiWeb, Inc. roozbeh@farsiweb.info The gap that needed filling For compatibility reasons, Unicode has more than one way to

More information

PROBLEM SOLVING 11. July 24, 2012

PROBLEM SOLVING 11. July 24, 2012 PROBLEM SOLVING 11 COMPUTER SCIENCE 61A July 24, 2012 Today s section will be a kind of Meta-Section, we are going to walk through some medium to hard-ish problems in Scheme, and we will discuss some methods

More information

The Ultimate Supreme Keyword Combos

The Ultimate Supreme Keyword Combos SEOPressor Presents: The Definitive Guide To Keyword Strategy The Ultimate Supreme Keyword Combos Copyright 2014 SEOPressor All Rights Reserved A Contents Chapter 3 Keyword Combos and Nuances Identifying

More information

Text Analytics Introduction (Part 1)

Text Analytics Introduction (Part 1) Text Analytics Introduction (Part 1) Maha Althobaiti, Udo Kruschwitz, Massimo Poesio School of Computer Science and Electronic Engineering University of Essex udo@essex.ac.uk 23 September 2015 Text Analytics

More information

CoVec. Covington Vector Semantics Tools. Michael A. Covington, Ph.D September 19

CoVec. Covington Vector Semantics Tools. Michael A. Covington, Ph.D September 19 CoVec Covington Vector Semantics Tools Michael A. Covington, Ph.D. 2016 September 19 Page 2 of 10 About CoVec CoVec, the Covington Vector Semantics Tools, is a set of software tools for comparing words

More information

0/41. Alice Who? Authentication Protocols. Andreas Zeller/Stephan Neuhaus. Lehrstuhl Softwaretechnik Universität des Saarlandes, Saarbrücken

0/41. Alice Who? Authentication Protocols. Andreas Zeller/Stephan Neuhaus. Lehrstuhl Softwaretechnik Universität des Saarlandes, Saarbrücken 0/41 Alice Who? Authentication Protocols Andreas Zeller/Stephan Neuhaus Lehrstuhl Softwaretechnik Universität des Saarlandes, Saarbrücken The Menu 1/41 Simple Authentication Protocols The Menu 1/41 Simple

More information

SchoolMessenger App. Teacher User Guide - Web. West Corporation. 100 Enterprise Way, Suite A-300. Scotts Valley, CA

SchoolMessenger App. Teacher User Guide - Web. West Corporation. 100 Enterprise Way, Suite A-300. Scotts Valley, CA SchoolMessenger App Teacher User Guide - Web West Corporation 100 Enterprise Way, Suite A-300 Scotts Valley, CA 95066 800-920-3897 www.schoolmessenger.com Contents Welcome!... 3 SchoolMessenger and the

More information

CS1 Lecture 3 Jan. 22, 2018

CS1 Lecture 3 Jan. 22, 2018 CS1 Lecture 3 Jan. 22, 2018 Office hours for me and for TAs have been posted, locations will change check class website regularly First homework available, due Mon., 9:00am. Discussion sections tomorrow

More information

WBJS Grammar Glossary Punctuation Section

WBJS Grammar Glossary Punctuation Section WBJS Grammar Glossary Punctuation Section Punctuation refers to the marks used in writing that help readers understand what they are reading. Sometimes words alone are not enough to convey a writer s message

More information

Apache UIMA and Mayo ctakes

Apache UIMA and Mayo ctakes Apache and Mayo and how it is used in the clinical domain March 16, 2012 Apache and Mayo Outline 1 Apache and Mayo Outline 1 2 Introducing Pipeline Modules Apache and Mayo What is? (You - eee - muh) Unstructured

More information

Hidden Markov Models. Natural Language Processing: Jordan Boyd-Graber. University of Colorado Boulder LECTURE 20. Adapted from material by Ray Mooney

Hidden Markov Models. Natural Language Processing: Jordan Boyd-Graber. University of Colorado Boulder LECTURE 20. Adapted from material by Ray Mooney Hidden Markov Models Natural Language Processing: Jordan Boyd-Graber University of Colorado Boulder LECTURE 20 Adapted from material by Ray Mooney Natural Language Processing: Jordan Boyd-Graber Boulder

More information

Alfresco Voice Writing for the user interface

Alfresco Voice Writing for the user interface Alfresco Voice Writing for the user interface One day I will find the right words, and they will be simple. Jack Kerouac Contents Writing Tone - how to write Language - what to write Capitalization Abbreviations

More information

Using Dreamweaver CC. 5 More Page Editing. Bulleted and Numbered Lists

Using Dreamweaver CC. 5 More Page Editing. Bulleted and Numbered Lists Using Dreamweaver CC 5 By now, you should have a functional template, with one simple page based on that template. For the remaining pages, we ll create each page based on the template and then save each

More information

What Are CSS and DHTML?

What Are CSS and DHTML? 6/14/01 10:31 AM Page 1 1 What Are CSS and DHTML? c h a p t e r ch01.qxd IN THIS CHAPTER What Is CSS? What Is DHTML? DHTML vs. Flash Browser Hell What You Need to Know Already Welcome to the world of CSS

More information

Software Compare and Contrast

Software Compare and Contrast Microsoft Software Compare and Contrast Word Easy to navigate. Compatible with all PC computers. Very versatile. There are lots of templates that can be used to create flyers, calendars, resumes, etc.

More information

Chapter 5: Character and Paragraph Formatting in MS Word

Chapter 5: Character and Paragraph Formatting in MS Word Chapter 5: Character and Paragraph Formatting in MS Word 5.0 Learning Objectives At the end of this unit, you should be able to: 1. Perform character formatting 2. Change font size and color. 3. Paragraph

More information

Cryptography Worksheet

Cryptography Worksheet Cryptography Worksheet People have always been interested in writing secret messages. In ancient times, people had to write secret messages to keep messengers and interceptors from reading their private

More information

Introduction to Text Mining. Hongning Wang

Introduction to Text Mining. Hongning Wang Introduction to Text Mining Hongning Wang CS@UVa Who Am I? Hongning Wang Assistant professor in CS@UVa since August 2014 Research areas Information retrieval Data mining Machine learning CS@UVa CS6501:

More information

Text Input and Conditionals

Text Input and Conditionals Text Input and Conditionals Text Input Many programs allow the user to enter information, like a username and password. Python makes taking input from the user seamless with a single line of code: input()

More information

Let s get parsing! Each component processes the Doc object, then passes it on. doc.is_parsed attribute checks whether a Doc object has been parsed

Let s get parsing! Each component processes the Doc object, then passes it on. doc.is_parsed attribute checks whether a Doc object has been parsed Let s get parsing! SpaCy default model includes tagger, parser and entity recognizer nlp = spacy.load('en ) tells spacy to use "en" with ["tagger", "parser", "ner"] Each component processes the Doc object,

More information

why you should use Ruby

why you should use Ruby LUKE KANIES why you should use Ruby Luke Kanies runs Reductive Labs (http://reductivelabs.com), a startup producing OSS software for centralized, automated server administration. He has been a UNIX sysadmin

More information

The Problem With NODUPLICATES, Continued

The Problem With NODUPLICATES, Continued The Problem With NODUPLICATES, Continued Jack Hamilton First Health West Sacramento, California JackHamilton@FirstHealth.com moredupsov.doc Wednesday, Wed Apr 28 1999 12:37 PM Page 1 of 11 What Should

More information

Introduction to Information Retrieval

Introduction to Information Retrieval Introduction to Information Retrieval http://informationretrieval.org IIR 1: Boolean Retrieval Hinrich Schütze Institute for Natural Language Processing, University of Stuttgart 2011-05-03 1/ 36 Take-away

More information

the magazine of the Marketing Research and Intelligence Association YEARS OF RESEARCH INTELLIGENCE A FUTURESPECTIVE

the magazine of the Marketing Research and Intelligence Association YEARS OF RESEARCH INTELLIGENCE A FUTURESPECTIVE the magazine of the Marketing Research and Intelligence Association vuemay 2010 5 YEARS OF RESEARCH INTELLIGENCE A FUTURESPECTIVE If You Want to Rank in Google, Start by Fixing Your Site You have an informative,

More information

How to Add or Invite Colleagues

How to Add or Invite Colleagues Page 1 of 5 How to Add or Invite Colleagues This how-to document contains four sections, addressing the most common questions about Point K collaboration features: Do my colleagues have to be co-workers

More information

Introduction to Text Mining. Aris Xanthos - University of Lausanne

Introduction to Text Mining. Aris Xanthos - University of Lausanne Introduction to Text Mining Aris Xanthos - University of Lausanne Preliminary notes Presentation designed for a novice audience Text mining = text analysis = text analytics: using computational and quantitative

More information

Homework & NLTK. CS 181: Natural Language Processing Lecture 9: Context Free Grammars. Motivation. Formal Def of CFG. Uses of CFG.

Homework & NLTK. CS 181: Natural Language Processing Lecture 9: Context Free Grammars. Motivation. Formal Def of CFG. Uses of CFG. C 181: Natural Language Processing Lecture 9: Context Free Grammars Kim Bruce Pomona College pring 2008 Homework & NLTK Review MLE, Laplace, and Good-Turing in smoothing.py Disclaimer: lide contents borrowed

More information

Digging Deeper Reaching Further

Digging Deeper Reaching Further Digging Deeper Reaching Further Libraries Empowering Users to Mine the HathiTrust Digital Library Resources Module 3: Working with Textual Data Instructor Guide Further reading: go.illinois.edu/ddrf-resources

More information

Ortolang Tools : MarsaTag

Ortolang Tools : MarsaTag Ortolang Tools : MarsaTag Stéphane Rauzy, Philippe Blache, Grégoire de Montcheuil SECOND VARIAMU WORKSHOP LPL, Aix-en-Provence August 20th & 21st, 2014 ORTOLANG received a State aid under the «Investissements

More information

CSC401 Natural Language Computing

CSC401 Natural Language Computing CSC401 Natural Language Computing Jan 19, 2018 TA: Willie Chang Varada Kolhatkar, Ka-Chun Won, and Aryan Arbabi) Mascots: r/sandersforpresident (left) and r/the_donald (right) To perform sentiment analysis

More information

Java/RealJ Troubleshooting Guide

Java/RealJ Troubleshooting Guide Java/RealJ Troubleshooting Guide by Bob Clark / Sharon Curtis / Simon Jones, September 2000 Some of these tips you will come across during your practical sessions, however we felt it would be helpful to

More information

Chapter 2. Architecture of a Search Engine

Chapter 2. Architecture of a Search Engine Chapter 2 Architecture of a Search Engine Search Engine Architecture A software architecture consists of software components, the interfaces provided by those components and the relationships between them

More information

Exam Marco Kuhlmann. This exam consists of three parts:

Exam Marco Kuhlmann. This exam consists of three parts: TDDE09, 729A27 Natural Language Processing (2017) Exam 2017-03-13 Marco Kuhlmann This exam consists of three parts: 1. Part A consists of 5 items, each worth 3 points. These items test your understanding

More information

CSE 7/5337: Information Retrieval and Web Search Introduction and Boolean Retrieval (IIR 1)

CSE 7/5337: Information Retrieval and Web Search Introduction and Boolean Retrieval (IIR 1) CSE 7/5337: Information Retrieval and Web Search Introduction and Boolean Retrieval (IIR 1) Michael Hahsler Southern Methodist University These slides are largely based on the slides by Hinrich Schütze

More information

How do I find the Penrith Squash Booking website? Simply go to on your computer/tablet/ phone.

How do I find the Penrith Squash Booking website? Simply go to   on your computer/tablet/ phone. Booking Website FAQ How do I find the Penrith Squash Booking website?... 2 I have been a member before the 1 st March 2018 - how do I register myself to use the booking site?... 2 I am a new member - how

More information

Grade 6 Math Circles November 6 & Relations, Functions, and Morphisms

Grade 6 Math Circles November 6 & Relations, Functions, and Morphisms Faculty of Mathematics Waterloo, Ontario N2L 3G1 Centre for Education in Mathematics and Computing Relations Let s talk about relations! Grade 6 Math Circles November 6 & 7 2018 Relations, Functions, and

More information

Nonstandard Models of True Arithmetic. other models of true arithmetic (the set of sentences true in the standard model) that aren t

Nonstandard Models of True Arithmetic. other models of true arithmetic (the set of sentences true in the standard model) that aren t Nonstandard Models of True Arithmetic We have talked a lot about the standard model of the language of arithmetic, but there are other models of true arithmetic (the set of sentences true in the standard

More information

Evaluation of Named Entity Recognition in Dutch online criminal complaints

Evaluation of Named Entity Recognition in Dutch online criminal complaints Evaluation of Named Entity Recognition in Dutch online criminal complaints Marijn Schraagen Floris Bex Matthieu Brinkhuis Utrecht University June 12, 2017 Internet fraud Online trade is widespread Transactions

More information

IBM Atlas Suite Users Guide: Legal Matter Reports. for IBM Atlas Suite v6

IBM Atlas Suite Users Guide: Legal Matter Reports. for IBM Atlas Suite v6 IBM Atlas Suite Users Guide: Legal Matter Reports for IBM Atlas Suite v6 IBM Atlas Suite Users Guide: Legal Matter Reports This edition applies to version 6.0 of IBM Atlas Suite (product numbers 5725-D75,

More information

Natural Language Processing

Natural Language Processing Natural Language Processing Basic Text Processing! Thanks to Dan Jurafsky and Chris Manning for reuse of (some) slides! Basic text processing Before we can start processing a piece of text: Segment text

More information

CS1 Lecture 3 Jan. 18, 2019

CS1 Lecture 3 Jan. 18, 2019 CS1 Lecture 3 Jan. 18, 2019 Office hours for Prof. Cremer and for TAs have been posted. Locations will change check class website regularly First homework assignment will be available Monday evening, due

More information

Work Smart: Make presence work for you

Work Smart: Make presence work for you Work Smart: Make presence work for you Use this guide to help stay in touch with the people who are important to you and those who are key to getting your work done. What do you want to do? Let people

More information

A Word-Based Communication Program

A Word-Based Communication Program A Word-Based Communication Program PRENTKE ROMICH COMPANY 1022 Heyl Road Wooster, Ohio 44691 Telephone: (330) 262-1984 Fax: (330) 263-4829 E-Mail: info@prentrom.com Website: www.prentrom.com LIBERATOR

More information

School of Computing and Information Systems The University of Melbourne COMP90042 WEB SEARCH AND TEXT ANALYSIS (Semester 1, 2017)

School of Computing and Information Systems The University of Melbourne COMP90042 WEB SEARCH AND TEXT ANALYSIS (Semester 1, 2017) Discussion School of Computing and Information Systems The University of Melbourne COMP9004 WEB SEARCH AND TEXT ANALYSIS (Semester, 07). What is a POS tag? Sample solutions for discussion exercises: Week

More information

Introduction to Information Retrieval (Manning, Raghavan, Schutze)

Introduction to Information Retrieval (Manning, Raghavan, Schutze) Introduction to Information Retrieval (Manning, Raghavan, Schutze) Chapter 3 Dictionaries and Tolerant retrieval Chapter 4 Index construction Chapter 5 Index compression Content Dictionary data structures

More information

Structured Prediction Basics

Structured Prediction Basics CS11-747 Neural Networks for NLP Structured Prediction Basics Graham Neubig Site https://phontron.com/class/nn4nlp2017/ A Prediction Problem I hate this movie I love this movie very good good neutral bad

More information

Introduction to Lexing and Parsing

Introduction to Lexing and Parsing Introduction to Lexing and Parsing ECE 351: Compilers Jon Eyolfson University of Waterloo June 18, 2012 1 Riddle Me This, Riddle Me That What is a compiler? 1 Riddle Me This, Riddle Me That What is a compiler?

More information

The Goal of this Document. Where to Start?

The Goal of this Document. Where to Start? A QUICK INTRODUCTION TO THE SEMILAR APPLICATION Mihai Lintean, Rajendra Banjade, and Vasile Rus vrus@memphis.edu linteam@gmail.com rbanjade@memphis.edu The Goal of this Document This document introduce

More information

Dynamic Feature Selection for Dependency Parsing

Dynamic Feature Selection for Dependency Parsing Dynamic Feature Selection for Dependency Parsing He He, Hal Daumé III and Jason Eisner EMNLP 2013, Seattle Structured Prediction in NLP Part-of-Speech Tagging Parsing N N V Det N Fruit flies like a banana

More information

This Paragraph Is Capitalized And The First Letter In Each Word Should Therefore Appear In Uppercase.

This Paragraph Is Capitalized And The First Letter In Each Word Should Therefore Appear In Uppercase. HtmlArea Supports HTML 4.0 and CSS 2.1 Contents 1. Text properties with Backgroundimage 2. List 3. External links 4. Table 5. Header Content 6. Image gallery 7. Form elements 8. Position-zindex 9. Invoice

More information

Momental. Adrienne I. (Observer) Juliana C. (Computer) Meredith M. (Greeter/Facilitator) Nhien T. (Observer)

Momental. Adrienne I. (Observer) Juliana C. (Computer) Meredith M. (Greeter/Facilitator) Nhien T. (Observer) Momental Adrienne I. (Observer) Juliana C. (Computer) Meredith M. (Greeter/Facilitator) Nhien T. (Observer) Introduction Momental Help the moment you need it. We are designing an application to allow Stanford

More information

WHAT'S NEW WITH SALESFORCE FOR OUTLOOK

WHAT'S NEW WITH SALESFORCE FOR OUTLOOK WHAT'S NEW WITH SALESFORCE FOR OUTLOOK Salesforce for Outlook v2.9.3 Salesforce for Outlook v2.9.3 includes improvements to the sync process for recurring events and to the installer. Sync Installer When

More information

MIS 0855 Data Science (Section 006) Fall 2017 In-Class Exercise (Day 18) Finding Bad Data in Excel

MIS 0855 Data Science (Section 006) Fall 2017 In-Class Exercise (Day 18) Finding Bad Data in Excel MIS 0855 Data Science (Section 006) Fall 2017 In-Class Exercise (Day 18) Finding Bad Data in Excel Objective: Find and fix a data set with incorrect values Learning Outcomes: Use Excel to identify incorrect

More information

Outlook Calendar 2013

Outlook Calendar 2013 Outlook Calendar 2013 Work Together Add appointments, view and share calendars Open Calendar View your calendar Create an appointment Change appointment details How other staff see your calendar Open another

More information

ES01-KA

ES01-KA Technological Empowerment for VET trainers. An Open Educational Resource (OER) to train VET trainers in the design and use of m-learning methodologies. Dragon Dictation Dragon Dictation for the iphone

More information

Helping Users Sync Contacts and Events with Exchange Sync (Beta)

Helping Users Sync Contacts and Events with Exchange Sync (Beta) Helping Users Sync Contacts and Events with Exchange Sync (Beta) Salesforce, Winter 16 @salesforcedocs Last updated: December 10, 2015 Copyright 2000 2015 salesforce.com, inc. All rights reserved. Salesforce

More information

1.7 Limit of a Function

1.7 Limit of a Function 1.7 Limit of a Function We will discuss the following in this section: 1. Limit Notation 2. Finding a it numerically 3. Right and Left Hand Limits 4. Infinite Limits Consider the following graph Notation:

More information

Lesson 1A - First Java Program HELLO WORLD With DEBUGGING examples. By John B. Owen All rights reserved 2011, revised 2015

Lesson 1A - First Java Program HELLO WORLD With DEBUGGING examples. By John B. Owen All rights reserved 2011, revised 2015 Lesson 1A - First Java Program HELLO WORLD With DEBUGGING examples By John B. Owen All rights reserved 2011, revised 2015 Table of Contents Objectives Hello World Lesson Sequence Compile Errors Lexical

More information

boolean queries Inverted index query processing Query optimization boolean model September 9, / 39

boolean queries Inverted index query processing Query optimization boolean model September 9, / 39 boolean model September 9, 2014 1 / 39 Outline 1 boolean queries 2 3 4 2 / 39 taxonomy of IR models Set theoretic fuzzy extended boolean set-based IR models Boolean vector probalistic algebraic generalized

More information

Textual Analysis & Introduction to Python

Textual Analysis & Introduction to Python Textual Analysis & Introduction to Python Feb 18 2016 CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 1 Tuesday s Class Wrap-up Tuesday s Class Using Matrix Multiplication (MMULT) What

More information

CISC-124. Casting. // this would fail because we can t assign a double value to an int // variable

CISC-124. Casting. // this would fail because we can t assign a double value to an int // variable CISC-124 20180122 Today we looked at casting, conditionals and loops. Casting Casting is a simple method for converting one type of number to another, when the original type cannot be simply assigned to

More information

On my honor I affirm that I have neither given nor received inappropriate aid in the completion of this exercise.

On my honor I affirm that I have neither given nor received inappropriate aid in the completion of this exercise. CS 2413 Data Structures EXAM 3 Fall 2016, Page 1 of 10 Student Name: Student ID # OU Academic Integrity Pledge On my honor I affirm that I have neither given nor received inappropriate aid in the completion

More information

Introduction to Information Retrieval

Introduction to Information Retrieval Introduction Inverted index Processing Boolean queries Course overview Introduction to Information Retrieval http://informationretrieval.org IIR 1: Boolean Retrieval Hinrich Schütze Institute for Natural

More information

Introduction to Parallel Computing Errata

Introduction to Parallel Computing Errata Introduction to Parallel Computing Errata John C. Kirk 27 November, 2004 Overview Book: Introduction to Parallel Computing, Second Edition, first printing (hardback) ISBN: 0-201-64865-2 Official book website:

More information

Introduction 1. Chapter 1

Introduction 1. Chapter 1 This PDF file is an excerpt from The Unicode Standard, Version 5.2, issued and published by the Unicode Consortium. The PDF files have not been modified to reflect the corrections found on the Updates

More information

ISYS1055/1057 Database Concepts Week 6: Tute/Lab SQL Programming

ISYS1055/1057 Database Concepts Week 6: Tute/Lab SQL Programming School of Science/ Computer Science and Information Technology ISYS1055/1057 Database Concepts Week 6: Tute/Lab SQL Programming Semester 2 2018 1 Objective The objectives of this tute/lab session are:

More information

Applied Natural Language Processing

Applied Natural Language Processing Applied Natural Language Processing Info 256 Lecture 2: Words (Jan 24, 2019) David Bamman, UC Berkeley Words as dimensionality reduction cat https://brenocon.com/blog/2012/07/the-60000-cat-deep-belief-networks-make-less-sense-for-language-than-vision/

More information

Karlen Communications Citations and Bibliography in Word. Karen McCall, M.Ed.

Karlen Communications Citations and Bibliography in Word. Karen McCall, M.Ed. Karlen Communications Citations and Bibliography in Word Karen McCall, M.Ed. Table of Contents Introduction... 3 Choose a Document Style Guide... 3 Citations... 4 Manage Sources Master List Dialog... 5

More information

Google Search Appliance

Google Search Appliance Google Search Appliance Search Appliance Internationalization Google Search Appliance software version 7.2 and later Google, Inc. 1600 Amphitheatre Parkway Mountain View, CA 94043 www.google.com GSA-INTL_200.01

More information

Recall that strings and tuples are immutable datatypes, while lists are mutable datatypes. What does this mean?

Recall that strings and tuples are immutable datatypes, while lists are mutable datatypes. What does this mean? 6.189 Day 4 Readings How To Think Like A Computer Scientist, chapters 7 and 8 6.01 Fall 2009 Course Notes page 27-29 ( Lists and Iterations over lists ; List Comprehensions is optional); sections 3.2-3.4

More information

How to Compute Halting

How to Compute Halting 2015-1-2 0 How to Compute Halting Eric C.R. Hehner Department of Computer Science, University of Toronto hehner@cs.utoronto.ca Abstract: A consistently specified halting function may be computed. Halting

More information

Mr G s Java Jive. #11: Formatting Numbers

Mr G s Java Jive. #11: Formatting Numbers Mr G s Java Jive #11: Formatting Numbers Now that we ve started using double values, we re bound to run into the question of just how many decimal places we want to show. This where we get to deal with

More information

Understanding PowerPoint s Text Capabilities

Understanding PowerPoint s Text Capabilities Page 1 of 14 Chapter 3: Working with Text In this chapter z Understanding PowerPoint s Text Capabilities z Adding Text z Formatting Text z Using Bullets z Using Numbered Lists z Checking Spelling and Style

More information