Algorithms for NLP. Language Modeling II. Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley


1 Algorithms for NLP Language Modeling II Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley

2 Announcements
- You should be able to really start the project after today's lecture
- Get familiar with bit-twiddling in Java (e.g. &, |, <<, >>)
- No external libraries / code
- We will go over KN again in recitation (edge cases)
- Tentative office hours: Me: Maria: Hieu: Akshay:

3 Language Models
- Language models are distributions over sentences
- N-gram models are built from local conditional probabilities
- The methods we've seen are backed by corpus n-gram counts:
  $\hat{P}(w_i \mid w_{i-1}, w_{i-2}) = \frac{c(w_{i-2}, w_{i-1}, w_i)}{c(w_{i-2}, w_{i-1})}$
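
The count-based estimate above can be sketched in a few lines of Java. This is a minimal illustration, not the course's reference code: the class name `TrigramMle` and its methods are hypothetical, and n-grams are stored as space-joined strings purely for readability (the storage slides later in the deck explain why that is wasteful).

```java
import java.util.HashMap;
import java.util.Map;

/** Relative-frequency trigram estimates backed by raw corpus counts (illustrative sketch). */
final class TrigramMle {
    private final Map<String, Integer> trigramCounts = new HashMap<>();
    // Counts each two-word context once per word that follows it,
    // so it equals the sum over w of c(w2, w1, w).
    private final Map<String, Integer> contextCounts = new HashMap<>();

    /** Adds every trigram in the tokenized sentence to the count tables. */
    void observe(String[] words) {
        for (int i = 2; i < words.length; i++) {
            String context = words[i - 2] + " " + words[i - 1];
            trigramCounts.merge(context + " " + words[i], 1, Integer::sum);
            contextCounts.merge(context, 1, Integer::sum);
        }
    }

    /** Returns c(w2, w1, w) / c(w2, w1), or 0 if the context was never seen. */
    double prob(String w2, String w1, String w) {
        String context = w2 + " " + w1;
        int contextCount = contextCounts.getOrDefault(context, 0);
        if (contextCount == 0) {
            return 0.0;
        }
        return trigramCounts.getOrDefault(context + " " + w, 0) / (double) contextCount;
    }
}
```

Any trigram that never occurred gets probability zero here, which is the problem smoothing addresses.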

4 Kneser-Ney Smoothing
Kneser-Ney smoothing combines two ideas:
- Discount and reallocate, like absolute discounting
- In the backoff model, word probabilities are proportional to context fertility, not frequency:
  $P(w) \propto |\{w' : c(w', w) > 0\}|$
Theory and practice:
- Practice: KN smoothing has repeatedly been shown to be both effective and efficient
- Theory: KN smoothing as approximate inference in a hierarchical Pitman-Yor process [Teh, 2006]

5 Kneser-Ney Edge Cases
All orders recursively discount and back off:
$P_k(w \mid \mathrm{prev}_{k-1}) = \frac{\max(c'(\mathrm{prev}_{k-1}, w) - d,\ 0)}{\sum_v c'(\mathrm{prev}_{k-1}, v)} + \alpha(\mathrm{prev}_{k-1})\, P_{k-1}(w \mid \mathrm{prev}_{k-2})$
- The unigram base case does not need to discount (though it can)
- Alpha is computed to make the probability normalize (but if the context count is zero, then fully back off)
- For the highest order, c' is the token count of the n-gram. For all others it is the context fertility of the n-gram (see Chen and Goodman p. 18):
  $c'(x) = |\{u : c(u, x) > 0\}|$
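
The normalizing weight alpha can be written out explicitly. The following is a sketch under two assumptions that the slide does not state: the discount satisfies 0 < d <= 1 and the counts c' are integers, so every nonzero count is at least d. The symbol N_{1+} (the number of distinct words seen after the context) is introduced here for convenience.

```latex
% Mass left to the discounted term after subtracting d from every seen word:
\sum_w \frac{\max(c'(\mathrm{prev}_{k-1}, w) - d,\ 0)}{\sum_v c'(\mathrm{prev}_{k-1}, v)}
  = 1 - \frac{d \cdot N_{1+}(\mathrm{prev}_{k-1})}{\sum_v c'(\mathrm{prev}_{k-1}, v)},
\qquad
N_{1+}(\mathrm{prev}_{k-1}) = |\{w : c'(\mathrm{prev}_{k-1}, w) > 0\}|

% The lower-order distribution sums to 1 over w, so normalization requires:
\alpha(\mathrm{prev}_{k-1}) = \frac{d \cdot N_{1+}(\mathrm{prev}_{k-1})}{\sum_v c'(\mathrm{prev}_{k-1}, v)}
```

When the context count in the denominator is zero, this expression is undefined, which is why that case falls back entirely to the lower-order model.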

6 Idea 4: Big Data There's no data like more data.

7 Data >> Method?
Having more data is better...
[Figure: entropy vs. n-gram order for Katz and KN smoothing at training sizes 100,000, 1,000,000, 10,000,000, and all]
... but so is using a better estimator
Another issue: N > 3 has huge costs in speech recognizers

8 Tons of Data? [Brants et al, 2007]

9 What about

10 Unknown Words?
What about totally unseen words?
- Most LM applications are closed vocabulary
  - ASR systems will only propose words that are in their pronunciation dictionary
  - MT systems will only propose words that are in their phrase tables (modulo special models for numbers, etc.)
- In principle, one can build open vocabulary LMs
  - E.g. models over character sequences rather than word sequences
  - Back-off needs to go down into a "generate new word" model
  - Typically if you need this, a high-order character model will do

11 What's in an N-Gram?
Just about every local correlation!
- Word class restrictions: will have been
- Morphology: she, they
- Semantic class restrictions: danced the
- Idioms: add insult to
- World knowledge: ice caps have
- Pop culture: the empire strikes
But not the long-distance ones:
- The computer which I had just put into the machine room on the fifth floor.

12 What Actually Works?
- Trigrams and beyond:
  - Unigrams, bigrams generally useless
  - Trigrams much better
  - 4-, 5-grams and more are really useful in MT, but gains are more limited for speech
- Discounting
  - Absolute discounting, Good-Turing, held-out estimation, Witten-Bell, etc.
- Context counting
  - Kneser-Ney construction of lower-order models
- See [Chen+Goodman] reading for tons of graphs
[Graph from Joshua Goodman]


14 Linguistic Pain?
The N-gram assumption hurts one's inner linguist!
- Many linguistic arguments that language isn't regular
  - Long-distance dependencies
  - Recursive structure
Answers:
- N-grams only model local correlations, but they get them all
- As N increases, they catch even more correlations
- N-gram models scale much more easily than structured LMs
Not convinced?
- Can build LMs out of our grammar models (later in the course)
- Take any generative model with words at the bottom and marginalize out the other variables

15 What Gets Captured?
Bigram model:
- [texaco, rose, one, in, this, issue, is, pursuing, growth, in, a, boiler, house, said, mr., gurria, mexico, 's, motion, control, proposal, without, permission, from, five, hundred, fifty, five, yen]
- [outside, new, car, parking, lot, of, the, agreement, reached]
- [this, would, be, a, record, november]
PCFG model:
- [This, quarter, 's, surprisingly, independent, attack, paid, off, the, risk, involving, IRS, leaders, and, transportation, prices, .]
- [It, could, be, announced, sometime, .]
- [Mr., Toseland, believes, the, average, defense, economy, is, drafted, from, slightly, more, than, 12, stocks, .]

16 Other Techniques?
Lots of other techniques:
- Maximum entropy LMs (soon)
- Neural network LMs (soon)
- Syntactic / grammar-structured LMs (much later)

17 How to Build an LM

18 Tons of Data Good LMs need lots of n-grams! [Brants et al, 2007]

19 Storing Counts
Key function: map from n-grams to counts
- searching for the best
- searching for the right
- searching for the cheapest
- searching for the perfect
- searching for the truth
- searching for the
- searching for the most
- searching for the latest
- searching for the next
- searching for the lowest
- searching for the name 8402
- searching for the finest 8171

20 Example: Google N-Grams

21 Efficient Storage

22 Naïve Approach
c(cat) = 12, hash(cat) = 2
c(the) = 87, hash(the) =
c(and) = 76, hash(and) = 5
c(dog) = 11, hash(dog) =
c(have) = ?, hash(have) = 2
[Figure: hash table with key and value columns, holding cat 12, the 87, and 76, dog 11]

23 A Simple Java Hashmap?
Per 3-gram:
- 1 Pointer = 8 bytes
- 1 Map.Entry = 8 bytes (obj) + 3x8 bytes (pointers)
- 1 Double = 8 bytes (obj) + 8 bytes (double)
- 1 String[] = 8 bytes (obj) + 3x8 bytes (pointers)
- ... at best, Strings are canonicalized
Total: > 88 bytes
Obvious alternatives:
- Sorted arrays
- Open addressing

24 Open Address Hashing
c(cat) = 12, hash(cat) = 2
c(the) = 87, hash(the) = 2
c(and) = 76, hash(and) = 5
c(dog) = 11, hash(dog) =
[Figure: empty hash table with key and value columns]

25 Open Address Hashing
c(cat) = 12, hash(cat) = 2
c(the) = 87, hash(the) = 2
c(and) = 76, hash(and) =
c(dog) = 11, hash(dog) =
c(have) = ?, hash(have) = 2
[Figure: hash table with key and value columns, slots 0 to 7, now holding cat, the, and, dog]

26 Open Address Hashing
c(cat) = 12, hash(cat) = 2
c(the) = 87, hash(the) = 2
c(and) = 76, hash(and) = 5
c(dog) = 11, hash(dog) =
[Figure: larger hash table with key and value columns, slots running up to 15]

27 Efficient Hashing
- Closed address hashing
  - Resolve collisions with chains
  - Easier to understand but bigger
- Open address hashing
  - Resolve collisions with probe sequences
  - Smaller but easy to mess up
- Direct-address hashing
  - No collision resolution
  - Just eject previous entries
  - Not suitable for core LM storage
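
As a concrete picture of "resolve collisions with probe sequences", here is a minimal open-address table using linear probing over primitive arrays. It is a sketch under stated assumptions, not the course's implementation: the class `OpenAddressCounts`, the `-1` empty sentinel (which assumes keys are non-negative n-gram encodings), the multiplicative hash in `mix`, and the fixed capacity without resizing are all choices made here for illustration.

```java
import java.util.Arrays;

/** Open-address hash table from non-negative long keys to int counts (linear probing). */
final class OpenAddressCounts {
    private static final long EMPTY = -1L; // sentinel; assumes real keys are non-negative
    private final long[] keys;
    private final int[] values;
    private int size = 0;

    OpenAddressCounts(int capacity) {
        keys = new long[capacity];
        values = new int[capacity];
        Arrays.fill(keys, EMPTY);
    }

    /** Scrambles the key so that nearby keys do not land in nearby slots. */
    private static long mix(long key) {
        long h = key * 0x9E3779B97F4A7C15L;
        return h ^ (h >>> 32);
    }

    /** Returns the slot holding key, or the first empty slot on its probe sequence. */
    private int findSlot(long key) {
        int slot = (int) Math.floorMod(mix(key), (long) keys.length);
        while (keys[slot] != EMPTY && keys[slot] != key) {
            slot = (slot + 1) % keys.length; // linear probe, wrapping around
        }
        return slot;
    }

    /** Inserts or overwrites the value for key. */
    void put(long key, int value) {
        int slot = findSlot(key);
        if (keys[slot] == EMPTY) {
            // Keep one slot free so that probing always terminates.
            if (size == keys.length - 1) {
                throw new IllegalStateException("table full; a real table would resize");
            }
            keys[slot] = key;
            size++;
        }
        values[slot] = value;
    }

    /** Returns the stored value, or 0 for a key that was never inserted. */
    int get(long key) {
        int slot = findSlot(key);
        return keys[slot] == key ? values[slot] : 0;
    }
}
```

Two parallel primitive arrays cost 12 bytes per slot here, with no per-entry objects or pointers. The "easy to mess up" part shows in the details: the probe loop only terminates if at least one slot stays empty, and deletion (omitted) cannot simply clear a slot without breaking later probe sequences.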


29 Integer Encodings
[Figure: the n-gram "the cat laughed" shown with its word ids and its n-gram count, 233]

30 Bit Packing
Got 3 numbers under $2^{20}$ to store?
20 bits + 20 bits + 20 bits: fits in a primitive 64-bit long
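
The packing step can be sketched with shifts and masks. The class `TrigramPacker` and its method names are hypothetical; the only thing taken from the slide is the layout of three 20-bit fields inside one `long`.

```java
/** Packs three word ids, each below 2^20, into a single 64-bit long. */
final class TrigramPacker {
    private static final int BITS = 20;
    private static final long MASK = (1L << BITS) - 1; // twenty 1-bits

    /** Layout, from high to low bits: 4 unused bits, id1, id2, id3. */
    static long pack(int id1, int id2, int id3) {
        checkRange(id1);
        checkRange(id2);
        checkRange(id3);
        return ((long) id1 << (2 * BITS)) | ((long) id2 << BITS) | (long) id3;
    }

    /** Recovers the word id at position 0, 1, or 2 of the packed trigram. */
    static int unpack(long packed, int position) {
        if (position < 0 || position > 2) {
            throw new IllegalArgumentException("position must be 0, 1, or 2");
        }
        int shift = (2 - position) * BITS;
        return (int) ((packed >>> shift) & MASK);
    }

    private static void checkRange(int id) {
        // Rejects negative ids as well, because sign extension sets the high bits.
        if ((id & ~MASK) != 0) {
            throw new IllegalArgumentException("id does not fit in 20 bits: " + id);
        }
    }
}
```

The casts to `long` before shifting matter: shifting an `int` by 40 in Java only uses the low five bits of the shift distance, so the high field would silently land in the wrong place.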

31 Integer Encodings
[Figure: the word ids of "the cat laughed" combined into a single n-gram encoding, stored with its n-gram count, 233]

32 Rank Values c(the) = < bits to represent integers between 0 and bits 35 bits n-gram encoding count

33 Rank Values
# unique counts = … < …
20 bits to represent ranks of all counts
[Figure: record layout with a 60-bit n-gram encoding and a 20-bit rank, plus a lookup table from rank to freq]
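
The rank trick replaces each stored count with a small index into a table of the distinct count values. A minimal sketch follows; the class `CountRanks` is hypothetical, and assigning ranks in ascending order of count is an arbitrary choice made here (any fixed one-to-one assignment works).

```java
import java.util.Arrays;

/** Maps each distinct count to a small integer rank and back. */
final class CountRanks {
    private final long[] rankToCount; // distinct counts in ascending order; the index is the rank

    CountRanks(long[] counts) {
        rankToCount = Arrays.stream(counts).distinct().sorted().toArray();
    }

    /** Returns the rank to store next to an n-gram encoding in place of its count. */
    int rankOf(long count) {
        int rank = Arrays.binarySearch(rankToCount, count);
        if (rank < 0) {
            throw new IllegalArgumentException("count was not in the training data: " + count);
        }
        return rank;
    }

    /** Recovers the original count from a stored rank. */
    long countOf(int rank) {
        return rankToCount[rank];
    }

    /** Number of distinct counts, which determines how many bits a rank needs. */
    int numRanks() {
        return rankToCount.length;
    }
}
```

The saving comes from the fact that many n-grams share the same count, so the number of distinct counts is far smaller than the largest count.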

34 So Far
- Word indexer
- N-gram encoding scheme
  - unigram: $f(id) = id$
  - bigram: $f(id_1, id_2) = ?$
  - trigram: $f(id_1, id_2, id_3) = ?$
- Count DB: unigram, bigram, trigram
- Rank lookup

35 Hashing vs Sorting

36 Context Tries

37 Tries

38 Context Encodings [Many details from Pauls and Klein, 2011]

39 Context Encodings

40 N-Gram Lookup

41 Compression

42 Idea: Differential Compression

43 Variable Length Encodings
[Figure: an example encoding split into two parts, "Length in Unary" followed by "Number in Binary"]
[Elias, 75]
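
One standard code with this "length in unary, number in binary" shape is the Elias gamma code. Whether the slide's figure shows exactly this variant is an assumption; the sketch below uses a `String` of '0' and '1' characters purely for readability, where a real implementation would write into a bit stream. The class name `EliasGamma` is hypothetical.

```java
/** Elias gamma code: (bit length - 1) zeros, then the number in binary. */
final class EliasGamma {

    /** Encodes n >= 1 as a string of '0' and '1' characters. */
    static String encode(long n) {
        if (n < 1) {
            throw new IllegalArgumentException("gamma codes are defined for n >= 1");
        }
        String binary = Long.toBinaryString(n); // always starts with '1'
        StringBuilder out = new StringBuilder();
        for (int i = 1; i < binary.length(); i++) {
            out.append('0'); // unary length prefix
        }
        return out.append(binary).toString();
    }

    /** Decodes a single gamma code found at the start of bits. */
    static long decode(String bits) {
        int zeros = 0;
        while (bits.charAt(zeros) == '0') {
            zeros++;
        }
        // The number occupies zeros + 1 bits, starting at the first '1'.
        return Long.parseLong(bits.substring(zeros, 2 * zeros + 1), 2);
    }
}
```

Small numbers get short codes (1 becomes "1", 5 becomes "00101"), which pairs naturally with differential compression, where most stored values are small gaps between sorted keys.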

44 Speed-Ups

45 Rolling Queries

46 Idea: Fast Caching LM can be more than 10x faster w/ direct-address caching
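
A direct-address cache sits in front of the slower exact lookup and keeps only the most recent entry per slot. The sketch below is illustrative only: the class `DirectAddressCache`, the `slowLookup` callback standing in for the real LM query, and the `misses` counter are all hypothetical names introduced here.

```java
import java.util.function.LongToDoubleFunction;

/** Direct-address cache: one entry per slot, collisions simply evict the old entry. */
final class DirectAddressCache {
    private final long[] keys;
    private final double[] values;
    private final boolean[] filled;
    private final LongToDoubleFunction slowLookup; // stands in for the full LM query
    int misses = 0; // number of times the slow lookup was needed

    DirectAddressCache(int size, LongToDoubleFunction slowLookup) {
        this.keys = new long[size];
        this.values = new double[size];
        this.filled = new boolean[size];
        this.slowLookup = slowLookup;
    }

    /** Returns the cached value for key, computing and caching it on a miss. */
    double get(long key) {
        long h = key * 0x9E3779B97F4A7C15L;
        int slot = (int) Math.floorMod(h ^ (h >>> 32), (long) keys.length);
        if (filled[slot] && keys[slot] == key) {
            return values[slot]; // hit: no probing, no chains
        }
        misses++;
        double value = slowLookup.applyAsDouble(key);
        keys[slot] = key; // eject whatever was stored here before
        values[slot] = value;
        filled[slot] = true;
        return value;
    }
}
```

Because the full key is stored and checked, the cache never returns a wrong answer; a collision only costs a repeated slow lookup. That is why ejecting entries is acceptable for a cache but not for the core LM storage.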

47 Approximate LMs
- Simplest option: hash-and-hope
  - Array of size K ~ N
  - (optional) store hash of keys
  - Store values in direct-address
  - Collisions: store the max
  - What kind of errors can there be?
- More complex options, like Bloom filters (originally for membership, but see Talbot and Osborne 07), perfect hashing, etc.
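
To make the error question concrete, here is a minimal hash-and-hope table that stores only values, with no keys and no optional key hash. The class `HashAndHopeCounts` and its hash function are assumptions made for this sketch.

```java
/** Hash-and-hope: values only, direct-addressed, keeping the max on collision. */
final class HashAndHopeCounts {
    private final int[] table;

    HashAndHopeCounts(int size) {
        table = new int[size];
    }

    private int slot(long key) {
        long h = key * 0x9E3779B97F4A7C15L;
        h ^= h >>> 32;
        return (int) Math.floorMod(h, (long) table.length);
    }

    /** Stores count for key; on collision the larger count survives. */
    void put(long key, int count) {
        int s = slot(key);
        table[s] = Math.max(table[s], count);
    }

    /** Returns whatever count is stored in the key's slot. */
    int get(long key) {
        return table[slot(key)];
    }
}
```

With this variant the errors are one-sided: a stored n-gram can come back with a count that is too high but never too low, and an n-gram that was never stored can come back with a nonzero count. Storing a short hash of each key alongside the value, the optional step on the slide, reduces the second kind of error at the cost of extra bits per slot.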

48 Maximum Entropy Models

49 Improving on N-Grams?
N-grams don't combine multiple sources of evidence well
P(construction | After the demolition was completed, the)
Here:
- "the" gives syntactic constraint
- "demolition" gives semantic constraint
- Unlikely the interaction between these two has been densely observed
We'd like a model that can be more statistically efficient

50 Maximum Entropy LMs
Want a model over completions y given a context x:
$P(y \mid x) = P(\text{close the door} \mid \text{close the})$
Want to characterize the important aspects of y = (v, x) using a feature function f
f might include:
- Indicator of v (unigram)
- Indicator of v, previous word (bigram)
- Indicator whether v occurs in x (cache)
- Indicator of v and each non-adjacent previous word

51 Some Definitions
- INPUTS: close the
- CANDIDATE SET: {close the door, close the table, ...}
- CANDIDATES: close the table
- TRUE OUTPUTS: close the door
- FEATURE VECTORS: close in x ∧ v = door; v_{-1} = the ∧ v = door; door in x and v

52 Linear Models: Maximum Entropy
Maximum entropy (logistic regression)
- Use the scores as probabilities:
  - Make positive
  - Normalize
- Maximize the (log) conditional likelihood of training data
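
The equations for these two steps did not survive in the transcript. The standard log-linear form is written out below as a sketch; the notation (weight vector w, feature function f, candidate y' ranging over the candidate set, true output y_i^* for training input x_i) is assumed here and may differ from the symbols on the original slide.

```latex
% "Make positive" (exponentiate the score) and "normalize" (divide by the sum over candidates):
P(y \mid x; w) = \frac{\exp\left(w^\top f(y)\right)}{\sum_{y'} \exp\left(w^\top f(y')\right)}

% Log conditional likelihood of the training data, to be maximized over w:
L(w) = \sum_i \log P(y_i^* \mid x_i; w)
```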

53 Maximum Entropy II
- Motivation for maximum entropy:
  - Connection to maximum entropy principle (sort of)
  - Might want to do a good job of being uncertain on noisy cases...
  - ... in practice, though, posteriors are pretty peaked
- Regularization (smoothing)

54 Derivative for Maximum Entropy
- Big weights are bad
- Expected feature vector over possible candidates
- Total count of feature n in correct candidates
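
The three labels above annotate a gradient whose formula was lost in the transcript. A sketch of the standard result follows, assuming a log-linear model P(y | x; w) proportional to exp(w^T f(y)) and an L2 penalty with strength k; the penalty form and the symbol k are assumptions introduced here.

```latex
% Regularized objective: log conditional likelihood minus an L2 penalty
L(w) = \sum_i \log P(y_i^* \mid x_i; w) - k \lVert w \rVert^2

% Partial derivative with respect to the weight of feature n
\frac{\partial L(w)}{\partial w_n}
  = \underbrace{\sum_i f_n(y_i^*)}_{\text{total count of feature } n \text{ in correct candidates}}
  - \underbrace{\sum_i \sum_{y} P(y \mid x_i; w)\, f_n(y)}_{\text{expected count over possible candidates}}
  - \underbrace{2 k\, w_n}_{\text{big weights are bad}}
```

At the optimum of the unregularized objective the first two terms cancel, so the model's expected feature counts match the observed ones.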

55 Convexity
The maxent objective is nicely behaved:
- Differentiable (so many ways to optimize)
- Convex (so no local optima*)
[Figure: a convex function beside a non-convex one]
Convexity guarantees a single, global maximum value because any higher points are greedily reachable

56 Unconstrained Optimization
- Once we have a function f, we can find a local optimum by iteratively following the gradient
- For convex functions, a local optimum will be global
- Basic gradient ascent isn't very efficient, but there are simple enhancements which take into account previous gradients: conjugate gradient, L-BFGS
- Online methods (e.g. AdaGrad) now very popular
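
Basic gradient ascent, the baseline the slide calls inefficient, can be sketched as follows. The class `GradientAscent`, the fixed step size, and the fixed iteration count are simplifications assumed here; practical optimizers such as L-BFGS choose steps adaptively and test for convergence.

```java
import java.util.function.UnaryOperator;

/** Plain fixed-step gradient ascent on a differentiable objective. */
final class GradientAscent {

    /**
     * Repeatedly moves the weights a small step in the direction of the gradient.
     *
     * @param gradient   maps a weight vector to the gradient of the objective there
     * @param start      initial weights (not modified)
     * @param stepSize   fixed learning rate
     * @param iterations number of update steps
     */
    static double[] maximize(UnaryOperator<double[]> gradient, double[] start,
                             double stepSize, int iterations) {
        double[] w = start.clone();
        for (int t = 0; t < iterations; t++) {
            double[] g = gradient.apply(w);
            for (int n = 0; n < w.length; n++) {
                w[n] += stepSize * g[n]; // ascent: step along the gradient
            }
        }
        return w;
    }
}
```

For example, for the objective -(w0 - 3)^2 - (w1 + 1)^2 the gradient is (-2(w0 - 3), -2(w1 + 1)), and starting from the origin the iterates approach the unique maximum at (3, -1). For the maxent objective, the gradient callback would compute observed minus expected feature counts.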

57 Implicit Representation


More information

IE in Context. Machine Learning Problems for Text/Web Data

IE in Context. Machine Learning Problems for Text/Web Data Machine Learning Problems for Text/Web Data Lecture 24: Document and Web Applications Sam Roweis Document / Web Page Classification or Detection 1. Does this document/web page contain an example of thing

More information

CS 343: Artificial Intelligence

CS 343: Artificial Intelligence CS 343: Artificial Intelligence Naïve Bayes Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188

More information

Web Search. Lecture Objectives. Text Technologies for Data Science INFR Learn about: 11/14/2017. Instructor: Walid Magdy

Web Search. Lecture Objectives. Text Technologies for Data Science INFR Learn about: 11/14/2017. Instructor: Walid Magdy Text Technologies for Data Science INFR11145 Web Search Instructor: Walid Magdy 14-Nov-2017 Lecture Objectives Learn about: Working with Massive data Link analysis (PageRank) Anchor text 2 1 The Web Document

More information

Artificial Intelligence Naïve Bayes

Artificial Intelligence Naïve Bayes Artificial Intelligence Naïve Bayes Instructors: David Suter and Qince Li Course Delivered @ Harbin Institute of Technology [M any slides adapted from those created by Dan Klein and Pieter Abbeel for CS188

More information

CS 124/LING 180/LING 280 From Languages to Information Week 2: Group Exercises on Language Modeling Winter 2018

CS 124/LING 180/LING 280 From Languages to Information Week 2: Group Exercises on Language Modeling Winter 2018 CS 124/LING 180/LING 280 From Languages to Information Week 2: Group Exercises on Language Modeling Winter 2018 Dan Jurafsky Tuesday, January 23, 2018 1 Part 1: Group Exercise We are interested in building

More information

Announcements. CS 188: Artificial Intelligence Spring Today. Example: Map-Coloring. Example: Cryptarithmetic.

Announcements. CS 188: Artificial Intelligence Spring Today. Example: Map-Coloring. Example: Cryptarithmetic. CS 188: Artificial Intelligence Spring 2010 Lecture 5: CSPs II 2/2/2010 Pieter Abbeel UC Berkeley Many slides from Dan Klein Announcements Project 1 due Thursday Lecture videos reminder: don t count on

More information

CPSC 340: Machine Learning and Data Mining. More Linear Classifiers Fall 2017

CPSC 340: Machine Learning and Data Mining. More Linear Classifiers Fall 2017 CPSC 340: Machine Learning and Data Mining More Linear Classifiers Fall 2017 Admin Assignment 3: Due Friday of next week. Midterm: Can view your exam during instructor office hours next week, or after

More information

Efficient Data Structures for Massive N-Gram Datasets

Efficient Data Structures for Massive N-Gram Datasets Efficient ata Structures for Massive N-Gram atasets Giulio Ermanno Pibiri University of Pisa and ISTI-CNR Pisa, Italy giulio.pibiri@di.unipi.it Rossano Venturini University of Pisa and ISTI-CNR Pisa, Italy

More information

Announcements: projects

Announcements: projects Announcements: projects 805 students: Project proposals are due Sun 10/1. If you d like to work with 605 students then indicate this on your proposal. 605 students: the week after 10/1 I will post the

More information

CS281 Section 3: Practical Optimization

CS281 Section 3: Practical Optimization CS281 Section 3: Practical Optimization David Duvenaud and Dougal Maclaurin Most parameter estimation problems in machine learning cannot be solved in closed form, so we often have to resort to numerical

More information

Privacy and Security in Online Social Networks Department of Computer Science and Engineering Indian Institute of Technology, Madras

Privacy and Security in Online Social Networks Department of Computer Science and Engineering Indian Institute of Technology, Madras Privacy and Security in Online Social Networks Department of Computer Science and Engineering Indian Institute of Technology, Madras Lecture - 25 Tutorial 5: Analyzing text using Python NLTK Hi everyone,

More information

CS5112: Algorithms and Data Structures for Applications

CS5112: Algorithms and Data Structures for Applications CS5112: Algorithms and Data Structures for Applications Lecture 3: Hashing Ramin Zabih Some figures from Wikipedia/Google image search Administrivia Web site is: https://github.com/cornelltech/cs5112-f18

More information

A Brief Look at Optimization

A Brief Look at Optimization A Brief Look at Optimization CSC 412/2506 Tutorial David Madras January 18, 2018 Slides adapted from last year s version Overview Introduction Classes of optimization problems Linear programming Steepest

More information

Query Evaluation Strategies

Query Evaluation Strategies Introduction to Search Engine Technology Term-at-a-Time and Document-at-a-Time Evaluation Ronny Lempel Yahoo! Labs (Many of the following slides are courtesy of Aya Soffer and David Carmel, IBM Haifa Research

More information

CS 188: Artificial Intelligence Fall Announcements

CS 188: Artificial Intelligence Fall Announcements CS 188: Artificial Intelligence Fall 2006 Lecture 22: Naïve Bayes 11/14/2006 Dan Klein UC Berkeley Announcements Optional midterm On Tuesday 11/21 in class Review session 11/19, 7-9pm, in 306 Soda Projects

More information

Announcements. CS 188: Artificial Intelligence Fall Machine Learning. Classification. Classification. Bayes Nets for Classification

Announcements. CS 188: Artificial Intelligence Fall Machine Learning. Classification. Classification. Bayes Nets for Classification CS 88: Artificial Intelligence Fall 00 Lecture : Naïve Bayes //00 Announcements Optional midterm On Tuesday / in class Review session /9, 7-9pm, in 0 Soda Projects. due /. due /7 Dan Klein UC Berkeley

More information

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2016

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2016 CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2016 A2/Midterm: Admin Grades/solutions will be posted after class. Assignment 4: Posted, due November 14. Extra office hours:

More information

LANGUAGE MODEL SIZE REDUCTION BY PRUNING AND CLUSTERING

LANGUAGE MODEL SIZE REDUCTION BY PRUNING AND CLUSTERING LANGUAGE MODEL SIZE REDUCTION BY PRUNING AND CLUSTERING Joshua Goodman Speech Technology Group Microsoft Research Redmond, Washington 98052, USA joshuago@microsoft.com http://research.microsoft.com/~joshuago

More information

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Sorting lower bound and Linear-time sorting Date: 9/19/17

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Sorting lower bound and Linear-time sorting Date: 9/19/17 601.433/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Sorting lower bound and Linear-time sorting Date: 9/19/17 5.1 Introduction You should all know a few ways of sorting in O(n log n)

More information

Text Analytics. Index-Structures for Information Retrieval. Ulf Leser

Text Analytics. Index-Structures for Information Retrieval. Ulf Leser Text Analytics Index-Structures for Information Retrieval Ulf Leser Content of this Lecture Inverted files Storage structures Phrase and proximity search Building and updating the index Using a RDBMS Ulf

More information

Information Retrieval

Information Retrieval Introduction to Information Retrieval Lecture 4: Index Construction 1 Plan Last lecture: Dictionary data structures Tolerant retrieval Wildcards Spell correction Soundex a-hu hy-m n-z $m mace madden mo

More information

Information Retrieval

Information Retrieval Information Retrieval Suan Lee - Information Retrieval - 04 Index Construction 1 04 Index Construction - Information Retrieval - 04 Index Construction 2 Plan Last lecture: Dictionary data structures Tolerant

More information

Steven Skiena. skiena

Steven Skiena.   skiena Lecture 22: Introduction to NP-completeness (1997) Steven Skiena Department of Computer Science State University of New York Stony Brook, NY 11794 4400 http://www.cs.sunysb.edu/ skiena Among n people,

More information

Administrative. Distributed indexing. Index Compression! What I did last summer lunch talks today. Master. Tasks

Administrative. Distributed indexing. Index Compression! What I did last summer lunch talks today. Master. Tasks Administrative Index Compression! n Assignment 1? n Homework 2 out n What I did last summer lunch talks today David Kauchak cs458 Fall 2012 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture5-indexcompression.ppt

More information

CS 188: Artificial Intelligence Spring Announcements

CS 188: Artificial Intelligence Spring Announcements CS 188: Artificial Intelligence Spring 2011 Lecture 20: Naïve Bayes 4/11/2011 Pieter Abbeel UC Berkeley Slides adapted from Dan Klein. W4 due right now Announcements P4 out, due Friday First contest competition

More information

Mr G s Java Jive. #11: Formatting Numbers

Mr G s Java Jive. #11: Formatting Numbers Mr G s Java Jive #11: Formatting Numbers Now that we ve started using double values, we re bound to run into the question of just how many decimal places we want to show. This where we get to deal with

More information

Detection and Extraction of Events from s

Detection and Extraction of Events from  s Detection and Extraction of Events from Emails Shashank Senapaty Department of Computer Science Stanford University, Stanford CA senapaty@cs.stanford.edu December 12, 2008 Abstract I build a system to

More information

Search Engines. Information Retrieval in Practice

Search Engines. Information Retrieval in Practice Search Engines Information Retrieval in Practice All slides Addison Wesley, 2008 Classification and Clustering Classification and clustering are classical pattern recognition / machine learning problems

More information

CIS 4930/6930 Spring 2014 Introduction to Data Science /Data Intensive Computing. University of Florida, CISE Department Prof.

CIS 4930/6930 Spring 2014 Introduction to Data Science /Data Intensive Computing. University of Florida, CISE Department Prof. CIS 4930/6930 Spring 2014 Introduction to Data Science /Data Intensive Computing University of Florida, CISE Department Prof. Daisy Zhe Wang Text To Knowledge IR and Boolean Search Text to Knowledge (IE)

More information

Digital Libraries: Language Technologies

Digital Libraries: Language Technologies Digital Libraries: Language Technologies RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Recall: Inverted Index..........................................

More information

Text Analytics. Index-Structures for Information Retrieval. Ulf Leser

Text Analytics. Index-Structures for Information Retrieval. Ulf Leser Text Analytics Index-Structures for Information Retrieval Ulf Leser Content of this Lecture Inverted files Storage structures Phrase and proximity search Building and updating the index Using a RDBMS Ulf

More information

Natural Language Processing

Natural Language Processing Natural Language Processing Information Retrieval Potsdam, 14 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the course book Outline 2 1 Introduction 2 Indexing Block Document

More information

CSCI 599: Applications of Natural Language Processing Information Retrieval Retrieval Models (Part 3)"

CSCI 599: Applications of Natural Language Processing Information Retrieval Retrieval Models (Part 3) CSCI 599: Applications of Natural Language Processing Information Retrieval Retrieval Models (Part 3)" All slides Addison Wesley, Donald Metzler, and Anton Leuski, 2008, 2012! Language Model" Unigram language

More information

CPSC 340: Machine Learning and Data Mining

CPSC 340: Machine Learning and Data Mining CPSC 340: Machine Learning and Data Mining Fundamentals of learning (continued) and the k-nearest neighbours classifier Original version of these slides by Mark Schmidt, with modifications by Mike Gelbart.

More information