Course introduction. Marco Kuhlmann Department of Computer and Information Science. TDDE17 Introduction to Language Technology (2017)
|
|
- Gregory Hopkins
- 5 years ago
- Views:
Transcription
1 TDDE17 Introduction to Language Technology (2017) Course introduction Marco Kuhlmann Department of Computer and Information Science This work is licensed under a Creative Commons Attribution 4.0 International License.
2 What is language technology? Language technology is technology for the analysis and interpretation of natural language. not programming languages! Language technology is an interdisciplinary research area involving computer science, linguistics, and cognitive science. related names: natural language processing, computational linguistics
3 We are drowning in information but starved for knowledge. John Naisbitt (1982)
4 Total number of pages indexed by Google Source: statisticbrain.com
5 The Knowledge Gap unstructured data (text) analyst language technology structured data (database) analyst
6 Information extraction As of 15 Mar 2002, Hawaii state health officials reported one additional recent case of dengue fever and 6 cases that occurred last year but were not confirmed by laboratory testing until Attribute Value docno ProMed doc_date disease_name dengue fever norm_stime norm_etime victim_types Source: Grishman et al. (2002) location Hawaii
7 Commercial interest Sponsors of the ACL 2015 conference
8 Two challenges Ambiguity A linguistic expression can often mean several different things. Time flies like an arrow. Fruit flies like a banana. Contextuality A linguistic expression can only be fully interpreted in a context. A: Are you coming tonight? B: I need to work on my lab.
9 Ambiguity causes combinatorial explosion jag bad om en kort bit PN VB PP DT JJ NN NN NN SN PN AB VB PL RG NN AB NN Example by Joakim Nivre
10 Different parts-of-speech have different frequencies Word / tag PN VB PP DT JJ NN jag bad om en kort bit Data from the Stockholm Umeå Corpus
11 Different parts-of-speech have different frequencies Word / tag PN VB PP DT JJ NN jag 99,45 % 0,00 % 0,00 % 0,00 % 0,00 % 0,55 % bad 0,00 % 80,39 % 0,00 % 0,00 % 0,00 % 19,61 % om 0,00 % 0,00 % 100,00 % 0,00 % 0,00 % 0,00 % en 2,42 % 0,00 % 0,00 % 97,57 % 0,00 % 0,01 % kort 0,00 % 0,00 % 0,00 % 0,00 % 87,41 % 12,59 % bit 0,00 % 0,00 % 0,00 % 0,00 % 0,00 % 100,00 % Data from the Stockholm Umeå Corpus
12 What you will learn in this course basic methods and techniques for the analysis and interpretation of words, sentences, and texts language technology systems question answering systems validation methods tools, software libraries, and data
13 Recurring questions How does this method work? often some kind of algorithm or mathematical formula How can we evaluate this method? typically some evaluation measure, such as accuracy How does this method use data? estimate probabilities, learn weights of a neural network,
14 How text is stored on a computer
15 How text is stored on a computer Text is stored as a sequence of bytes. Each byte consists of 8 bits of information, yielding 256 different values. Bytes encode characters according to some encoding scheme. Unicode has been developed with the ambition to specify code points for all naturally occurring characters. natural languages (even extinct), mathematical symbols, emoji,
16 Sample page from the Unicode specification Unicode version 9.0 (June 2016): 128,237 different characters
17 UTF-8 8-bit Unicode Transformation Format Unicode has slots for 2 32 = 4,294,967,296 different characters. To encode Unicode characters into bytes, a single character is represented using more than one byte. character = 1 byte, 128 2,047 = 2 bytes, ,535 = 3 bytes, This scheme is called UTF-8 (8-bit Unicode Transformation Format) and is the most widely used encoding scheme today. August 2016: 7 out of 8 websites (Source: w3techs.com)
18 Varfà r blir det sã hã r? s å [SPC] h ä r Unicode UTF Latin s à [SPC] h à r Example by Per Starbäck
19 Text segmentation
20 Tokenisation Tokenisation refers to the task of segmenting a text into individual words or word-like units. numbers, punctuation marks Tokenisation is harder than splitting at whitespace. we re; don t; bl.a.; Hewlett-Packard; San Francisco Tokenisation is even harder for non-european languages. Chinese word segmentation
21 Tokenisation Raw text before tokenisation List of tokens after tokenisation The gorgeously elaborate continuation of The Lord of the Rings trilogy is so huge that a column of words cannot adequately describe co-writer/director Peter Jackson s expanded vision of J.R.R. Tolkien s Middle-earth. The gorgeously elaborate continuation of The Lord of the Rings trilogy is so huge that a column of words cannot adequately describe co-writer / director Peter Jackson s expanded vision of J.R.R. Tolkien s Middle-earth.
22 Word tokens and word types Rose is a rose is a rose is a rose. Gertrude Stein ( ) Corpus Tokens Types Shakespeare ca. 884,000 ca. 31,000 Riksmöte 2012/2013 4,645,560 96,114 Google Ngrams 1,176,470,663 13,588,391
23 Two very simple tokenisers # tokenise a sequence of lines using whitespace def tokenize(lines): for line in lines: for token in line.split(): yield token # tokenise a sequence of lines using a regular expression def tokenize(regex, lines): for line in lines: for match in re.finditer(regex, line): yield match.group(0)
24 Normalisation Lowercasing windows vs. Windows Harmonisation of spelling variants colour, color; gaol, jail; metre, meter Stemming (suffix removal) wanted want, wanting want, happily happily
25 Stop words a about above after again against all am an and any are aren t as at be because been before being below between both but by can t cannot could couldn t did didn t do does doesn t doing don t down during each few for from further had hadn t has hasn t have haven t having he he d he ll he s her here here s hers herself him himself his how how s i i d i ll i m i ve if in into is isn t it it s its itself let s me more most mustn t my myself no nor not of off on once only or other ought our ours ourselves out over own same shan t she she d she ll she s should shouldn t so some such than that that s the their theirs them themselves then there there s these they they d they ll they re they ve this those through to too under until up very was wasn t we we d we ll we re we ve were weren t what what s when when s where where s which while who who s whom why why s with won t would wouldn t you you d you ll you re you ve your yours yourself yourselves
26 Sentence segmentation For some applications, we want to identify not only words but also higher-level units such as sentences and paragraphs. Sentence segmentation refers to the task of dividing a text into individual sentences. Sentence segmentation is harder than splitting at periods. We visited the U.S. After that, we visited Canada.
27 From form to meaning, and back again pragmatics semantics analyse syntax generate words speech
Course introduction. Marco Kuhlmann Department of Computer and Information Science. Language Technology (2018)
Language Technology (2018) Course introduction Marco Kuhlmann Department of Computer and Information Science This work is licensed under a Creative Commons Attribution 4.0 International License. What is
More information1 Jane s dress is... yours. A the same than B the same to C similar than D similar to
Test 5A 1 Jane s dress is... yours. A the same than B the same to C similar than D similar to 2 We ve proved that he was guilty but he... doesn t admit it. A yet B already C still D no longer 3 If I...
More informationINF FALL NATURAL LANGUAGE PROCESSING. Jan Tore Lønning, Lecture 4, 10.9
1 INF5830 2015 FALL NATURAL LANGUAGE PROCESSING Jan Tore Lønning, Lecture 4, 10.9 2 Working with texts From bits to meaningful units Today: 3 Reading in texts Character encodings and Unicode Word tokenization
More informationResearch Tools: DIY Text Tools
As with the other Research Tools, the DIY Text Tools are primarily designed for small research projects at the undergraduate level. What are the DIY Text Tools for? These tools are designed to help you
More informationModals TEST 8. Book 1 Part C. 1. Why didn t you help him? You have done it. A) must B) can C) could D) were to E) was able to
TEST 8 1. Why didn t you help him? You have done it. A) must B) can C) could D) were to E) was able to 2. A: I phone you tonight? B: Yes, you. A) may / may B) must / might C) could / can t D) shouldn t
More informationTrombone players produce different pitches partly by varying the length of a tube.
Trombone players produce different pitches partly by varying the length of a tube. 7 Variables A variable is a connection between a name and a value.* That sounds simple enough, but some complexities arise
More informationInformation Retrieval and Organisation
Information Retrieval and Organisation Dell Zhang Birkbeck, University of London 2016/17 IR Chapter 02 The Term Vocabulary and Postings Lists Constructing Inverted Indexes The major steps in constructing
More informationThe Unicode Standard Version 11.0 Core Specification
The Unicode Standard Version 11.0 Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers
More informationLBSC 690: Information Technology Lecture 05 Structured data and databases
LBSC 690: Information Technology Lecture 05 Structured data and databases William Webber CIS, University of Maryland Spring semester, 2012 Interpreting bits "my" 13.5801 268 010011010110 3rd Feb, 2014
More informationOR, you can download the file nltk_data.zip from the class web site, using a URL given in class.
NLP Lab Session Week 2 September 8, 2011 Frequency Distributions and Bigram Distributions Installing NLTK Data Reinstall nltk-2.0b7.win32.msi and Copy and Paste nltk_data from H:\ nltk_data to C:\ nltk_data,
More informationPrinceton University. Computer Science 217: Introduction to Programming Systems. Data Types in C
Princeton University Computer Science 217: Introduction to Programming Systems Data Types in C 1 Goals of C Designers wanted C to: Support system programming Be low-level Be easy for people to handle But
More informationWritten Communication
Module 2: Written Communication 1 Your Passport to Professionalism: Module 2 Written Communication Step 1 Learn Introduction Sooner or later, you will need to communicate in writing. You will write down
More informationTopics in Parsing: Context and Markovization; Dependency Parsing. COMP-599 Oct 17, 2016
Topics in Parsing: Context and Markovization; Dependency Parsing COMP-599 Oct 17, 2016 Outline Review Incorporating context Markovization Learning the context Dependency parsing Eisner s algorithm 2 Review
More informationRosa Rawlings BTEC Level 2 Extended Certificate in Information and Creative Technology Unit 3: A Digital Portfolio Assignment 3: Review Your Portfolio
How The Final Portfolio Is Suitable For The Intended Audience: My portfolio of PDF files of my work which showcases my work to the company. I have created a range of written documents, images and videos
More informationNgram Search Engine with Patterns Combining Token, POS, Chunk and NE Information
Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Satoshi Sekine Computer Science Department New York University sekine@cs.nyu.edu Kapil Dalwani Computer Science Department
More informationEDAN20 Language Technology Chapter 13: Dependency Parsing
EDAN20 Language Technology http://cs.lth.se/edan20/ Pierre Nugues Lund University Pierre.Nugues@cs.lth.se http://cs.lth.se/pierre_nugues/ September 19, 2016 Pierre Nugues EDAN20 Language Technology http://cs.lth.se/edan20/
More informationTokenization and Sentence Segmentation. Yan Shao Department of Linguistics and Philology, Uppsala University 29 March 2017
Tokenization and Sentence Segmentation Yan Shao Department of Linguistics and Philology, Uppsala University 29 March 2017 Outline 1 Tokenization Introduction Exercise Evaluation Summary 2 Sentence segmentation
More informationBinary, Hexadecimal and Octal number system
Binary, Hexadecimal and Octal number system Binary, hexadecimal, and octal refer to different number systems. The one that we typically use is called decimal. These number systems refer to the number of
More informationLecture 14: Annotation
Lecture 14: Annotation Nathan Schneider (with material from Henry Thompson, Alex Lascarides) ENLP 23 October 2016 1/14 Annotation Why gold 6= perfect Quality Control 2/14 Factors in Annotation Suppose
More informationDigital Libraries: Language Technologies
Digital Libraries: Language Technologies RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Recall: Inverted Index..........................................
More informationRepresenting text on the computer: ASCII, Unicode, and UTF 8
Representing text on the computer: ASCII, Unicode, and UTF 8 STAT/CS 287 Jim Bagrow Question: computers can only understand numbers. In particular, only two numbers, 0 and 1 (binary digits or bits). So
More informationUNIVERSITY OF EDINBURGH COLLEGE OF SCIENCE AND ENGINEERING SCHOOL OF INFORMATICS INFR08008 INFORMATICS 2A: PROCESSING FORMAL AND NATURAL LANGUAGES
UNIVERSITY OF EDINBURGH COLLEGE OF SCIENCE AND ENGINEERING SCHOOL OF INFORMATICS INFR08008 INFORMATICS 2A: PROCESSING FORMAL AND NATURAL LANGUAGES Saturday 10 th December 2016 09:30 to 11:30 INSTRUCTIONS
More informationLING/C SC/PSYC 438/538. Lecture 3 Sandiway Fong
LING/C SC/PSYC 438/538 Lecture 3 Sandiway Fong Today s Topics Homework 4 out due next Tuesday by midnight Homework 3 should have been submitted yesterday Quick Homework 3 review Continue with Perl intro
More informationAdmin PARSING. Backoff models: absolute discounting. Backoff models: absolute discounting 3/4/11. What is (xy)?
Admin Updated slides/examples on backoff with absolute discounting (I ll review them again here today) Assignment 2 Watson vs. Humans (tonight-wednesday) PARING David Kauchak C159 pring 2011 some slides
More informationMath 126 Number Theory
Math 16 Number Theory Prof. D. Joyce, Clark University 8 Mar 006 Due Friday. Page 155: exercises 1,, 7. Choose one of the three and write it up completely. Whichever one you choose, find all those solutions
More information5 R1 The one green in the same place so either of these could be green.
Page: 1 of 20 1 R1 Now. Maybe what we should do is write out the cases that work. We wrote out one of them really very clearly here. [R1 takes out some papers.] Right? You did the one here um where you
More informationRefresher on Dependency Syntax and the Nivre Algorithm
Refresher on Dependency yntax and Nivre Algorithm Richard Johansson 1 Introduction This document gives more details about some important topics that re discussed very quickly during lecture: dependency
More informationIntroduction to Normalization and Modern Collation
Introduction to Normalization and Modern Collation Roozbeh Pournader Sharif FarsiWeb, Inc. roozbeh@farsiweb.info The gap that needed filling For compatibility reasons, Unicode has more than one way to
More informationPROBLEM SOLVING 11. July 24, 2012
PROBLEM SOLVING 11 COMPUTER SCIENCE 61A July 24, 2012 Today s section will be a kind of Meta-Section, we are going to walk through some medium to hard-ish problems in Scheme, and we will discuss some methods
More informationThe Ultimate Supreme Keyword Combos
SEOPressor Presents: The Definitive Guide To Keyword Strategy The Ultimate Supreme Keyword Combos Copyright 2014 SEOPressor All Rights Reserved A Contents Chapter 3 Keyword Combos and Nuances Identifying
More informationText Analytics Introduction (Part 1)
Text Analytics Introduction (Part 1) Maha Althobaiti, Udo Kruschwitz, Massimo Poesio School of Computer Science and Electronic Engineering University of Essex udo@essex.ac.uk 23 September 2015 Text Analytics
More informationCoVec. Covington Vector Semantics Tools. Michael A. Covington, Ph.D September 19
CoVec Covington Vector Semantics Tools Michael A. Covington, Ph.D. 2016 September 19 Page 2 of 10 About CoVec CoVec, the Covington Vector Semantics Tools, is a set of software tools for comparing words
More information0/41. Alice Who? Authentication Protocols. Andreas Zeller/Stephan Neuhaus. Lehrstuhl Softwaretechnik Universität des Saarlandes, Saarbrücken
0/41 Alice Who? Authentication Protocols Andreas Zeller/Stephan Neuhaus Lehrstuhl Softwaretechnik Universität des Saarlandes, Saarbrücken The Menu 1/41 Simple Authentication Protocols The Menu 1/41 Simple
More informationSchoolMessenger App. Teacher User Guide - Web. West Corporation. 100 Enterprise Way, Suite A-300. Scotts Valley, CA
SchoolMessenger App Teacher User Guide - Web West Corporation 100 Enterprise Way, Suite A-300 Scotts Valley, CA 95066 800-920-3897 www.schoolmessenger.com Contents Welcome!... 3 SchoolMessenger and the
More informationCS1 Lecture 3 Jan. 22, 2018
CS1 Lecture 3 Jan. 22, 2018 Office hours for me and for TAs have been posted, locations will change check class website regularly First homework available, due Mon., 9:00am. Discussion sections tomorrow
More informationWBJS Grammar Glossary Punctuation Section
WBJS Grammar Glossary Punctuation Section Punctuation refers to the marks used in writing that help readers understand what they are reading. Sometimes words alone are not enough to convey a writer s message
More informationApache UIMA and Mayo ctakes
Apache and Mayo and how it is used in the clinical domain March 16, 2012 Apache and Mayo Outline 1 Apache and Mayo Outline 1 2 Introducing Pipeline Modules Apache and Mayo What is? (You - eee - muh) Unstructured
More informationHidden Markov Models. Natural Language Processing: Jordan Boyd-Graber. University of Colorado Boulder LECTURE 20. Adapted from material by Ray Mooney
Hidden Markov Models Natural Language Processing: Jordan Boyd-Graber University of Colorado Boulder LECTURE 20 Adapted from material by Ray Mooney Natural Language Processing: Jordan Boyd-Graber Boulder
More informationAlfresco Voice Writing for the user interface
Alfresco Voice Writing for the user interface One day I will find the right words, and they will be simple. Jack Kerouac Contents Writing Tone - how to write Language - what to write Capitalization Abbreviations
More informationUsing Dreamweaver CC. 5 More Page Editing. Bulleted and Numbered Lists
Using Dreamweaver CC 5 By now, you should have a functional template, with one simple page based on that template. For the remaining pages, we ll create each page based on the template and then save each
More informationWhat Are CSS and DHTML?
6/14/01 10:31 AM Page 1 1 What Are CSS and DHTML? c h a p t e r ch01.qxd IN THIS CHAPTER What Is CSS? What Is DHTML? DHTML vs. Flash Browser Hell What You Need to Know Already Welcome to the world of CSS
More informationSoftware Compare and Contrast
Microsoft Software Compare and Contrast Word Easy to navigate. Compatible with all PC computers. Very versatile. There are lots of templates that can be used to create flyers, calendars, resumes, etc.
More informationChapter 5: Character and Paragraph Formatting in MS Word
Chapter 5: Character and Paragraph Formatting in MS Word 5.0 Learning Objectives At the end of this unit, you should be able to: 1. Perform character formatting 2. Change font size and color. 3. Paragraph
More informationCryptography Worksheet
Cryptography Worksheet People have always been interested in writing secret messages. In ancient times, people had to write secret messages to keep messengers and interceptors from reading their private
More informationIntroduction to Text Mining. Hongning Wang
Introduction to Text Mining Hongning Wang CS@UVa Who Am I? Hongning Wang Assistant professor in CS@UVa since August 2014 Research areas Information retrieval Data mining Machine learning CS@UVa CS6501:
More informationText Input and Conditionals
Text Input and Conditionals Text Input Many programs allow the user to enter information, like a username and password. Python makes taking input from the user seamless with a single line of code: input()
More informationLet s get parsing! Each component processes the Doc object, then passes it on. doc.is_parsed attribute checks whether a Doc object has been parsed
Let s get parsing! SpaCy default model includes tagger, parser and entity recognizer nlp = spacy.load('en ) tells spacy to use "en" with ["tagger", "parser", "ner"] Each component processes the Doc object,
More informationwhy you should use Ruby
LUKE KANIES why you should use Ruby Luke Kanies runs Reductive Labs (http://reductivelabs.com), a startup producing OSS software for centralized, automated server administration. He has been a UNIX sysadmin
More informationThe Problem With NODUPLICATES, Continued
The Problem With NODUPLICATES, Continued Jack Hamilton First Health West Sacramento, California JackHamilton@FirstHealth.com moredupsov.doc Wednesday, Wed Apr 28 1999 12:37 PM Page 1 of 11 What Should
More informationIntroduction to Information Retrieval
Introduction to Information Retrieval http://informationretrieval.org IIR 1: Boolean Retrieval Hinrich Schütze Institute for Natural Language Processing, University of Stuttgart 2011-05-03 1/ 36 Take-away
More informationthe magazine of the Marketing Research and Intelligence Association YEARS OF RESEARCH INTELLIGENCE A FUTURESPECTIVE
the magazine of the Marketing Research and Intelligence Association vuemay 2010 5 YEARS OF RESEARCH INTELLIGENCE A FUTURESPECTIVE If You Want to Rank in Google, Start by Fixing Your Site You have an informative,
More informationHow to Add or Invite Colleagues
Page 1 of 5 How to Add or Invite Colleagues This how-to document contains four sections, addressing the most common questions about Point K collaboration features: Do my colleagues have to be co-workers
More informationIntroduction to Text Mining. Aris Xanthos - University of Lausanne
Introduction to Text Mining Aris Xanthos - University of Lausanne Preliminary notes Presentation designed for a novice audience Text mining = text analysis = text analytics: using computational and quantitative
More informationHomework & NLTK. CS 181: Natural Language Processing Lecture 9: Context Free Grammars. Motivation. Formal Def of CFG. Uses of CFG.
C 181: Natural Language Processing Lecture 9: Context Free Grammars Kim Bruce Pomona College pring 2008 Homework & NLTK Review MLE, Laplace, and Good-Turing in smoothing.py Disclaimer: lide contents borrowed
More informationDigging Deeper Reaching Further
Digging Deeper Reaching Further Libraries Empowering Users to Mine the HathiTrust Digital Library Resources Module 3: Working with Textual Data Instructor Guide Further reading: go.illinois.edu/ddrf-resources
More informationOrtolang Tools : MarsaTag
Ortolang Tools : MarsaTag Stéphane Rauzy, Philippe Blache, Grégoire de Montcheuil SECOND VARIAMU WORKSHOP LPL, Aix-en-Provence August 20th & 21st, 2014 ORTOLANG received a State aid under the «Investissements
More informationCSC401 Natural Language Computing
CSC401 Natural Language Computing Jan 19, 2018 TA: Willie Chang Varada Kolhatkar, Ka-Chun Won, and Aryan Arbabi) Mascots: r/sandersforpresident (left) and r/the_donald (right) To perform sentiment analysis
More informationJava/RealJ Troubleshooting Guide
Java/RealJ Troubleshooting Guide by Bob Clark / Sharon Curtis / Simon Jones, September 2000 Some of these tips you will come across during your practical sessions, however we felt it would be helpful to
More informationChapter 2. Architecture of a Search Engine
Chapter 2 Architecture of a Search Engine Search Engine Architecture A software architecture consists of software components, the interfaces provided by those components and the relationships between them
More informationExam Marco Kuhlmann. This exam consists of three parts:
TDDE09, 729A27 Natural Language Processing (2017) Exam 2017-03-13 Marco Kuhlmann This exam consists of three parts: 1. Part A consists of 5 items, each worth 3 points. These items test your understanding
More informationCSE 7/5337: Information Retrieval and Web Search Introduction and Boolean Retrieval (IIR 1)
CSE 7/5337: Information Retrieval and Web Search Introduction and Boolean Retrieval (IIR 1) Michael Hahsler Southern Methodist University These slides are largely based on the slides by Hinrich Schütze
More informationHow do I find the Penrith Squash Booking website? Simply go to on your computer/tablet/ phone.
Booking Website FAQ How do I find the Penrith Squash Booking website?... 2 I have been a member before the 1 st March 2018 - how do I register myself to use the booking site?... 2 I am a new member - how
More informationGrade 6 Math Circles November 6 & Relations, Functions, and Morphisms
Faculty of Mathematics Waterloo, Ontario N2L 3G1 Centre for Education in Mathematics and Computing Relations Let s talk about relations! Grade 6 Math Circles November 6 & 7 2018 Relations, Functions, and
More informationNonstandard Models of True Arithmetic. other models of true arithmetic (the set of sentences true in the standard model) that aren t
Nonstandard Models of True Arithmetic We have talked a lot about the standard model of the language of arithmetic, but there are other models of true arithmetic (the set of sentences true in the standard
More informationEvaluation of Named Entity Recognition in Dutch online criminal complaints
Evaluation of Named Entity Recognition in Dutch online criminal complaints Marijn Schraagen Floris Bex Matthieu Brinkhuis Utrecht University June 12, 2017 Internet fraud Online trade is widespread Transactions
More informationIBM Atlas Suite Users Guide: Legal Matter Reports. for IBM Atlas Suite v6
IBM Atlas Suite Users Guide: Legal Matter Reports for IBM Atlas Suite v6 IBM Atlas Suite Users Guide: Legal Matter Reports This edition applies to version 6.0 of IBM Atlas Suite (product numbers 5725-D75,
More informationNatural Language Processing
Natural Language Processing Basic Text Processing! Thanks to Dan Jurafsky and Chris Manning for reuse of (some) slides! Basic text processing Before we can start processing a piece of text: Segment text
More informationCS1 Lecture 3 Jan. 18, 2019
CS1 Lecture 3 Jan. 18, 2019 Office hours for Prof. Cremer and for TAs have been posted. Locations will change check class website regularly First homework assignment will be available Monday evening, due
More informationWork Smart: Make presence work for you
Work Smart: Make presence work for you Use this guide to help stay in touch with the people who are important to you and those who are key to getting your work done. What do you want to do? Let people
More informationA Word-Based Communication Program
A Word-Based Communication Program PRENTKE ROMICH COMPANY 1022 Heyl Road Wooster, Ohio 44691 Telephone: (330) 262-1984 Fax: (330) 263-4829 E-Mail: info@prentrom.com Website: www.prentrom.com LIBERATOR
More informationSchool of Computing and Information Systems The University of Melbourne COMP90042 WEB SEARCH AND TEXT ANALYSIS (Semester 1, 2017)
Discussion School of Computing and Information Systems The University of Melbourne COMP9004 WEB SEARCH AND TEXT ANALYSIS (Semester, 07). What is a POS tag? Sample solutions for discussion exercises: Week
More informationIntroduction to Information Retrieval (Manning, Raghavan, Schutze)
Introduction to Information Retrieval (Manning, Raghavan, Schutze) Chapter 3 Dictionaries and Tolerant retrieval Chapter 4 Index construction Chapter 5 Index compression Content Dictionary data structures
More informationStructured Prediction Basics
CS11-747 Neural Networks for NLP Structured Prediction Basics Graham Neubig Site https://phontron.com/class/nn4nlp2017/ A Prediction Problem I hate this movie I love this movie very good good neutral bad
More informationIntroduction to Lexing and Parsing
Introduction to Lexing and Parsing ECE 351: Compilers Jon Eyolfson University of Waterloo June 18, 2012 1 Riddle Me This, Riddle Me That What is a compiler? 1 Riddle Me This, Riddle Me That What is a compiler?
More informationThe Goal of this Document. Where to Start?
A QUICK INTRODUCTION TO THE SEMILAR APPLICATION Mihai Lintean, Rajendra Banjade, and Vasile Rus vrus@memphis.edu linteam@gmail.com rbanjade@memphis.edu The Goal of this Document This document introduce
More informationDynamic Feature Selection for Dependency Parsing
Dynamic Feature Selection for Dependency Parsing He He, Hal Daumé III and Jason Eisner EMNLP 2013, Seattle Structured Prediction in NLP Part-of-Speech Tagging Parsing N N V Det N Fruit flies like a banana
More informationThis Paragraph Is Capitalized And The First Letter In Each Word Should Therefore Appear In Uppercase.
HtmlArea Supports HTML 4.0 and CSS 2.1 Contents 1. Text properties with Backgroundimage 2. List 3. External links 4. Table 5. Header Content 6. Image gallery 7. Form elements 8. Position-zindex 9. Invoice
More informationMomental. Adrienne I. (Observer) Juliana C. (Computer) Meredith M. (Greeter/Facilitator) Nhien T. (Observer)
Momental Adrienne I. (Observer) Juliana C. (Computer) Meredith M. (Greeter/Facilitator) Nhien T. (Observer) Introduction Momental Help the moment you need it. We are designing an application to allow Stanford
More informationWHAT'S NEW WITH SALESFORCE FOR OUTLOOK
WHAT'S NEW WITH SALESFORCE FOR OUTLOOK Salesforce for Outlook v2.9.3 Salesforce for Outlook v2.9.3 includes improvements to the sync process for recurring events and to the installer. Sync Installer When
More informationMIS 0855 Data Science (Section 006) Fall 2017 In-Class Exercise (Day 18) Finding Bad Data in Excel
MIS 0855 Data Science (Section 006) Fall 2017 In-Class Exercise (Day 18) Finding Bad Data in Excel Objective: Find and fix a data set with incorrect values Learning Outcomes: Use Excel to identify incorrect
More informationOutlook Calendar 2013
Outlook Calendar 2013 Work Together Add appointments, view and share calendars Open Calendar View your calendar Create an appointment Change appointment details How other staff see your calendar Open another
More informationES01-KA
Technological Empowerment for VET trainers. An Open Educational Resource (OER) to train VET trainers in the design and use of m-learning methodologies. Dragon Dictation Dragon Dictation for the iphone
More informationHelping Users Sync Contacts and Events with Exchange Sync (Beta)
Helping Users Sync Contacts and Events with Exchange Sync (Beta) Salesforce, Winter 16 @salesforcedocs Last updated: December 10, 2015 Copyright 2000 2015 salesforce.com, inc. All rights reserved. Salesforce
More information1.7 Limit of a Function
1.7 Limit of a Function We will discuss the following in this section: 1. Limit Notation 2. Finding a it numerically 3. Right and Left Hand Limits 4. Infinite Limits Consider the following graph Notation:
More informationLesson 1A - First Java Program HELLO WORLD With DEBUGGING examples. By John B. Owen All rights reserved 2011, revised 2015
Lesson 1A - First Java Program HELLO WORLD With DEBUGGING examples By John B. Owen All rights reserved 2011, revised 2015 Table of Contents Objectives Hello World Lesson Sequence Compile Errors Lexical
More informationboolean queries Inverted index query processing Query optimization boolean model September 9, / 39
boolean model September 9, 2014 1 / 39 Outline 1 boolean queries 2 3 4 2 / 39 taxonomy of IR models Set theoretic fuzzy extended boolean set-based IR models Boolean vector probalistic algebraic generalized
More informationTextual Analysis & Introduction to Python
Textual Analysis & Introduction to Python Feb 18 2016 CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 1 Tuesday s Class Wrap-up Tuesday s Class Using Matrix Multiplication (MMULT) What
More informationCISC-124. Casting. // this would fail because we can t assign a double value to an int // variable
CISC-124 20180122 Today we looked at casting, conditionals and loops. Casting Casting is a simple method for converting one type of number to another, when the original type cannot be simply assigned to
More informationOn my honor I affirm that I have neither given nor received inappropriate aid in the completion of this exercise.
CS 2413 Data Structures EXAM 3 Fall 2016, Page 1 of 10 Student Name: Student ID # OU Academic Integrity Pledge On my honor I affirm that I have neither given nor received inappropriate aid in the completion
More informationIntroduction to Information Retrieval
Introduction Inverted index Processing Boolean queries Course overview Introduction to Information Retrieval http://informationretrieval.org IIR 1: Boolean Retrieval Hinrich Schütze Institute for Natural
More informationIntroduction to Parallel Computing Errata
Introduction to Parallel Computing Errata John C. Kirk 27 November, 2004 Overview Book: Introduction to Parallel Computing, Second Edition, first printing (hardback) ISBN: 0-201-64865-2 Official book website:
More informationIntroduction 1. Chapter 1
This PDF file is an excerpt from The Unicode Standard, Version 5.2, issued and published by the Unicode Consortium. The PDF files have not been modified to reflect the corrections found on the Updates
More informationISYS1055/1057 Database Concepts Week 6: Tute/Lab SQL Programming
School of Science/ Computer Science and Information Technology ISYS1055/1057 Database Concepts Week 6: Tute/Lab SQL Programming Semester 2 2018 1 Objective The objectives of this tute/lab session are:
More informationApplied Natural Language Processing
Applied Natural Language Processing Info 256 Lecture 2: Words (Jan 24, 2019) David Bamman, UC Berkeley Words as dimensionality reduction cat https://brenocon.com/blog/2012/07/the-60000-cat-deep-belief-networks-make-less-sense-for-language-than-vision/
More informationKarlen Communications Citations and Bibliography in Word. Karen McCall, M.Ed.
Karlen Communications Citations and Bibliography in Word Karen McCall, M.Ed. Table of Contents Introduction... 3 Choose a Document Style Guide... 3 Citations... 4 Manage Sources Master List Dialog... 5
More informationGoogle Search Appliance
Google Search Appliance Search Appliance Internationalization Google Search Appliance software version 7.2 and later Google, Inc. 1600 Amphitheatre Parkway Mountain View, CA 94043 www.google.com GSA-INTL_200.01
More informationRecall that strings and tuples are immutable datatypes, while lists are mutable datatypes. What does this mean?
6.189 Day 4 Readings How To Think Like A Computer Scientist, chapters 7 and 8 6.01 Fall 2009 Course Notes page 27-29 ( Lists and Iterations over lists ; List Comprehensions is optional); sections 3.2-3.4
More informationHow to Compute Halting
2015-1-2 0 How to Compute Halting Eric C.R. Hehner Department of Computer Science, University of Toronto hehner@cs.utoronto.ca Abstract: A consistently specified halting function may be computed. Halting
More informationMr G s Java Jive. #11: Formatting Numbers
Mr G s Java Jive #11: Formatting Numbers Now that we ve started using double values, we re bound to run into the question of just how many decimal places we want to show. This where we get to deal with
More informationUnderstanding PowerPoint s Text Capabilities
Page 1 of 14 Chapter 3: Working with Text In this chapter z Understanding PowerPoint s Text Capabilities z Adding Text z Formatting Text z Using Bullets z Using Numbered Lists z Checking Spelling and Style
More information