N-Gram Language Modelling including Feed-Forward NNs. Kenneth Heafield. University of Edinburgh
1 N-Gram Language Modelling including Feed-Forward NNs Kenneth Heafield University of Edinburgh
2 History of Language Modelling. Kenneth Heafield, University of Edinburgh
4 p(type | Predictive) > p(tyler | Predictive)
5 p(lose | Win or) > p(loose | Win or) [Church et al., 2007]
6 Heated indoor swimming pool
7 Chambre → Bedroom
8 présidente de la Chambre des représentants → chairwoman of the Bedroom of Representatives
9 présidente de la Chambre des représentants → chairwoman of the House of Representatives
10 p(chairwoman of the House of Representatives) > p(chairwoman of the Bedroom of Representatives)
11 p(another one bites the dust.) > p(another one rides the bus.)
12 Prediction, Spelling, Translation, Speech. Essential component: the language model. p(moses compiles) = ?
13 Sequence Models: Chain Rule. p(moses compiles) = p(moses) p(compiles | moses)
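The chain rule above can be sketched directly: a sentence's log probability is the sum of each word's conditional log probability given its history. The probability values below are made-up illustrations, not estimates from any corpus.

```python
import math

def sentence_log_prob(tokens, cond_prob):
    """Chain rule: log p(w1..wn) = sum_i log p(w_i | w_1..w_{i-1})."""
    total = 0.0
    for i, word in enumerate(tokens):
        total += math.log(cond_prob(word, tuple(tokens[:i])))
    return total

# Hypothetical conditional probabilities for p(moses compiles).
table = {("moses", ()): 0.001, ("compiles", ("moses",)): 0.01}
logp = sentence_log_prob(["moses", "compiles"],
                         lambda w, h: table[(w, h)])
```

Summing in log space avoids underflow when sentences are long and the per-word probabilities are small.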
14 Sequence Model: log p(<s> iran is one of the few countries . </s>) = log p(iran | <s>) + log p(is | <s> iran) + log p(one | <s> iran is) + log p(of | <s> iran is one) + log p(the | <s> iran is one of) + log p(few | <s> iran is one of the) + log p(countries | <s> iran is one of the few) + log p(. | <s> iran is one of the few countries) + log p(</s> | <s> iran is one of the few countries .)
15 Explicit begin and end of sentence.
16 Where do these probabilities come from?
17 Probabilities from Text: model p(raw | in the)
18 Estimating from Text: "help in the search for an answer." "Copper burned in the raw wood." "put forward in the paper" "Highs in the 50s to lower 60s." Four occurrences of "in the", one followed by "raw" → p(raw | in the) = 1/4
19 p(ugrasena | in the) ≈ 0
21 Problem: "in the Ugrasena" was not seen, but could happen. p(ugrasena | in the) = count(in the ugrasena) / count(in the) = 0?
22 Back off to a shorter context: count(the ugrasena) / count(the)
23 Stupid Backoff: drop context until the count is non-zero [Brants et al., 2007]. Can we be less stupid?
24 Smoothing: "in the Ugrasena" was not seen, but could happen. 1. Backoff: maybe "the Ugrasena" was seen? 2. Neural networks: a classifier predicts the next word.
25 Backoff Smoothing: "in the Ugrasena" was not seen → try "the Ugrasena": p(ugrasena | in the) ≈ p(ugrasena | the)
26 "the Ugrasena" was not seen either → try "ugrasena": p(ugrasena | the) ≈ p(ugrasena)
27 With penalties: p(ugrasena | in the) = p(ugrasena | the) b(in the) and p(ugrasena | the) = p(ugrasena) b(the). The backoff b is a penalty for not matching the context.
28 Backoff Language Model. Unigrams (word, log p, log b): <s>, iran, is, one, of. Bigrams (words, log p, log b): <s> iran, iran is, is one, one of. Trigrams (words, log p): <s> iran is 1.1; iran is one 2.0; is one of.
29 Query: log p(is | <s> iran) = 1.1 (the trigram "<s> iran is" matches directly).
30 Query: p(of | iran is). No trigram "iran is of" and no bigram "is of", so back off to the unigram with penalties: log p(of | iran is) = log p(of) + log b(is) + log b(iran is) = 2.5 + 1.4 + log b(iran is) = 4.3
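The query on the slide above can be sketched as a recursion: if the full n-gram is in the model, use its log probability; otherwise add the context's log backoff penalty and retry with one less word of context. The table values below are illustrative (stored as negative logs); in particular log b(iran is) = -0.4 is a made-up example value, since the slide does not show it.

```python
# Illustrative backoff tables loosely patterned on the slide's example.
LOGP = {("<s>", "iran", "is"): -1.1, ("of",): -2.5}
LOGB = {("is",): -1.4, ("iran", "is"): -0.4}

def log_prob(word, context, logp=LOGP, logb=LOGB):
    """Backoff query: match the longest n-gram, paying a backoff
    penalty b for each context word that has to be dropped."""
    ngram = context + (word,)
    if ngram in logp:
        return logp[ngram]          # n-gram found: use it directly
    if not context:
        return float("-inf")        # word not in the model at all
    penalty = logb.get(context, 0.0)
    return penalty + log_prob(word, context[1:], logp, logb)

full = log_prob("is", ("<s>", "iran"))    # direct trigram hit
backed = log_prob("of", ("iran", "is"))   # backs off twice to p(of)
```

The second query mirrors slide 30: it pays b(iran is) and b(is) on the way down to the unigram p(of).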
31 Close words matter more, though long distance matters too: grammatical structure, topical coherence, words tend to repeat, cross-sentence dependencies.
32 Where do p and b come from? Text! Kneser-Ney, Witten-Bell, Good-Turing.
33 Kneser-Ney: common high-quality smoothing. 1. Adjust 2. Normalize 3. Interpolate
34 Adjusted counts: for trigrams, the count in the text; for lower orders, the number of unique words to the left. Lower orders are used when a trigram did not match: how freely does the text associate with new words?
35 Input trigram counts: "are one of" 1, "is one of" 5, "are two of" 3. Output adjusted 1-gram: "of" 2. Output adjusted 2-grams: "one of" 2, "two of" 1.
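The adjustment step above can be sketched as follows: lower-order adjusted counts are the number of distinct words observed immediately to the left, computed from the trigram counts on the slide.

```python
from collections import defaultdict

def adjusted_counts(trigram_counts):
    """Kneser-Ney adjusted counts: trigrams keep their text counts;
    each lower-order n-gram counts the unique words seen to its left."""
    left_of_bigram = defaultdict(set)
    left_of_unigram = defaultdict(set)
    for (w1, w2, w3) in trigram_counts:
        left_of_bigram[(w2, w3)].add(w1)
        left_of_unigram[(w3,)].add(w2)
    return ({k: len(v) for k, v in left_of_bigram.items()},
            {k: len(v) for k, v in left_of_unigram.items()})

tri = {("are", "one", "of"): 1, ("is", "one", "of"): 5,
       ("are", "two", "of"): 3}
bigram_adj, unigram_adj = adjusted_counts(tri)
```

This reproduces the slide's example: "of" gets adjusted count 2 (it follows the two unique words "one" and "two"), "one of" gets 2, and "two of" gets 1.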
36 Discounting and Normalization: save mass for unseen events, then normalize. pseudo(w_n | w_1^{n-1}) = (adjusted(w_1^n) − discount_n(adjusted(w_1^n))) / Σ_x adjusted(w_1^{n-1} x)
37 Input adjusted 3-gram counts: "are one of" 1, "are one that" 2, "is one of" 5. Output pseudo-probabilities: "are one of" 0.26, "are one that" 0.47, "is one of".
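The discount-and-normalize formula above can be sketched directly. The discount values below are assumptions chosen so that the "are one" rows reproduce the slide's example outputs (0.26 and 0.47); real Kneser-Ney estimates the discounts from counts of counts in the training data.

```python
from collections import defaultdict

def pseudo_probs(adjusted, discount):
    """pseudo(w | context) = (a - D(a)) / (sum of a over the context),
    where a is the adjusted count; the discount D saves probability
    mass for unseen events, and the context total normalizes."""
    totals = defaultdict(float)
    for ngram, a in adjusted.items():
        totals[ngram[:-1]] += a
    return {ng: (a - discount(a)) / totals[ng[:-1]]
            for ng, a in adjusted.items()}

adj = {("are", "one", "of"): 1, ("are", "one", "that"): 2,
       ("is", "one", "of"): 5}
# Assumed discounts, keyed by adjusted count (0.7 for larger counts).
D = {1: 0.22, 2: 0.59}
ps = pseudo_probs(adj, lambda a: D.get(a, 0.7))
```

For the context "are one" the denominator is 1 + 2 = 3, so pseudo(of | are one) = (1 − 0.22)/3 = 0.26 and pseudo(that | are one) = (2 − 0.59)/3 = 0.47.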
38 Interpolate: Sparsity vs. Specificity. Interpolate unigrams with the uniform distribution: p(of) = pseudo(of) + backoff(ε) · 1/|vocabulary|
39 Interpolate bigrams with unigrams, etc.: p(of | one) = pseudo(of | one) + backoff(one) p(of)
40 Input (n-gram, pseudo, interpolation weight): "of" 0.1 with backoff(ε) = 0.1; "one of" 0.2 with backoff(one) = 0.3; "are one of" 0.4 with backoff(are one) = 0.2. Output: p(of), p(of | one), p(of | are one).
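The interpolation chain above can be computed bottom-up: unigrams mix with the uniform distribution, then each higher order mixes with the order below it. The vocabulary size of 10 is a made-up value for the example; the pseudo counts and backoff weights are the ones shown on the slide.

```python
def interpolate(pseudo, backoff, vocab_size):
    """Interpolate each order with the next lower order; unigrams
    interpolate with the uniform distribution 1/|vocab|."""
    p = {}
    for ngram in sorted(pseudo, key=len):   # shortest n-grams first
        lower = 1.0 / vocab_size if len(ngram) == 1 else p[ngram[1:]]
        p[ngram] = pseudo[ngram] + backoff[ngram[:-1]] * lower
    return p

pseudo = {("of",): 0.1, ("one", "of"): 0.2, ("are", "one", "of"): 0.4}
backoff = {(): 0.1, ("one",): 0.3, ("are", "one"): 0.2}
p = interpolate(pseudo, backoff, vocab_size=10)  # vocab size assumed
```

With these numbers, p(of) = 0.1 + 0.1/10 = 0.11, p(of | one) = 0.2 + 0.3 · 0.11 = 0.233, and p(of | are one) = 0.4 + 0.2 · 0.233 = 0.4466.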
41 Kneser-Ney Intuition. Adjust: measure association with new words. Normalize: leave space for unseen events. Interpolate: handle sparsity. How do we implement it?
42 LM queries often account for more than 50% of the CPU [Green et al., WMT 2014]. 500 billion entries in my largest model. Need speed and memory efficiency.
43 Counting n-grams: "<s> Australia is one of the few" yields the 5-gram counts "<s> Australia is one of" 1, "Australia is one of the" 1, "is one of the few" 1. Hash table? Runs out of RAM.
45 Spill to Disk When RAM Runs Out: Text → Hash Table → File
46 Split Data: Text → Hash Tables → Files
47 Split and Merge: Text → Hash Tables → Sort → Files → Merge Sort
48 Training. Problem: batch process a large number of records. Solution: split/merge. Stupid backoff in one pass; Kneser-Ney in three passes.
49 Training is designed for mutable batch access. What about queries?
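The split/merge pipeline above can be sketched in miniature: fill a bounded hash table, spill sorted shards (lists standing in for on-disk files) whenever it fills, then stream a merge sort over the shards and sum the counts of equal n-grams. The shard size here is tiny purely to force spills.

```python
import heapq
from itertools import groupby

def count_ngrams_external(tokens, n, shard_size):
    """Split/merge counting: bounded hash table, sorted spill shards,
    then one streaming merge-sort pass that sums equal n-grams."""
    shards, table = [], {}
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        table[gram] = table.get(gram, 0) + 1
        if len(table) >= shard_size:          # "RAM" is full: spill
            shards.append(sorted(table.items()))
            table = {}
    if table:
        shards.append(sorted(table.items()))
    merged = heapq.merge(*shards)             # streaming merge of runs
    return {gram: sum(c for _, c in grp)
            for gram, grp in groupby(merged, key=lambda kv: kv[0])}

counts = count_ngrams_external("a b a b a b".split(), 2, shard_size=2)
```

Because every shard is sorted, the merge phase touches each record once and never needs all counts in memory at the same time, which is the point of the split/merge design.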
50 Query: stupid(w_n | w_1^{n-1}) = count(w_1^n) / count(w_1^{n-1}) if count(w_1^n) > 0, else 0.4 · stupid(w_n | w_2^{n-1}). Example: stupid(few | is one of the) = count(is one of the few) / count(is one of the) = 5/12.
51 Example: stupid(periwinkle | is one of the): count(is one of the periwinkle) = 0, count(one of the periwinkle) = 0, count(of the periwinkle) = 0, count(the periwinkle) = 3, count(the) = ...
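The stupid backoff recursion above is short enough to write out. The counts come from the slides, except count("the") = 100, which is a made-up stand-in for the value that was cut off on the slide.

```python
def stupid_backoff(word, context, count, alpha=0.4):
    """Brants et al. (2007): score = count(ngram)/count(context) when
    the n-gram was seen, otherwise alpha times the score with one less
    word of context. Scores are not normalized probabilities."""
    ngram = context + (word,)
    if count.get(ngram, 0) > 0:
        return count[ngram] / count[context]
    if not context:
        return 0.0   # never seen at all
    return alpha * stupid_backoff(word, context[1:], count, alpha)

count = {("is", "one", "of", "the", "few"): 5,
         ("is", "one", "of", "the"): 12,
         ("the", "periwinkle"): 3,
         ("the",): 100}   # count("the") is an assumed value
seen = stupid_backoff("few", ("is", "one", "of", "the"), count)
rare = stupid_backoff("periwinkle", ("is", "one", "of", "the"), count)
```

The rare query drops three words of context, so its score is 0.4³ · 3/count(the): each level of backoff multiplies in another factor of 0.4.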
52 Save Memory: Forget Keys. Giant hash table with n-grams as keys and counts as values. Replace the n-grams with 64-bit hashes: store hash("is one of") instead of "is one of". Ignore collisions.
53 Birthday bound: √(2^64) = 2^32 ≈ 4 billion, so there is a low chance of collision until about 4 billion entries.
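The forget-the-keys trick can be sketched as follows. A truncated SHA-256 stands in here for whatever fast 64-bit hash a real implementation would use; the point is only that the table stores an 8-byte integer per key instead of the n-gram text, and lookups hash the query the same way.

```python
import hashlib

def hash64(ngram):
    """Map an n-gram to a 64-bit integer; the key text is then
    discarded and hash collisions are simply ignored."""
    data = " ".join(ngram).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

counts = {hash64(("is", "one", "of")): 5}   # key is just a 64-bit int
hit = counts.get(hash64(("is", "one", "of")), 0)
miss = counts.get(hash64(("in", "the", "zoo")), 0)
```

A miss can in principle collide with a stored hash and return a wrong count, but by the birthday bound on slide 53 that stays unlikely until the table holds on the order of 2^32 entries.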
54 Default Hash Table: boost::unordered_map and __gnu_cxx::hash_map. n-grams hash into a bucket array of pointers to entries.
55 Lookup requires two random memory accesses.
56 Linear Probing Hash Table: 1.5 buckets/entry (4 bigrams → 6 buckets). Ideal bucket = hash mod buckets. Resolve bucket collisions using the next free bucket. Buckets: "iran is" (ideal 0, hash 0x959e48455f4a2e90, count 3); empty; "is one" (ideal 2, hash 0x186a7caef34acf16, count 5); "one of" (ideal 2, hash 0xac…db8dac, count 2); "<s> iran" (ideal 4, hash 0xf0ae9c2442c6920e, count 1); empty.
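The probing scheme above can be sketched as a small class: the ideal bucket is hash mod buckets, and a bucket collision walks forward (wrapping around) to the next free slot, so lookups touch consecutive memory.

```python
class ProbingTable:
    """Linear probing: ideal bucket = hash mod buckets; on a bucket
    collision, walk forward (wrapping) to the next free bucket."""
    def __init__(self, n_entries, load=1.5):
        self.n = int(n_entries * load)       # e.g. 4 entries -> 6 buckets
        self.keys = [None] * self.n
        self.vals = [0] * self.n

    def _slot(self, key):
        i = hash(key) % self.n
        while self.keys[i] is not None and self.keys[i] != key:
            i = (i + 1) % self.n             # probe the next bucket
        return i

    def put(self, key, value):
        i = self._slot(key)
        self.keys[i], self.vals[i] = key, value

    def get(self, key, default=0):
        i = self._slot(key)
        return self.vals[i] if self.keys[i] == key else default

t = ProbingTable(4)   # the slide's 4 bigrams -> 6 buckets
for bigram, c in {("iran", "is"): 3, ("is", "one"): 5,
                  ("one", "of"): 2, ("<s>", "iran"): 1}.items():
    t.put(bigram, c)
```

With the load factor under 1, at least one bucket stays empty, so a probe for a missing key always terminates at an empty slot rather than looping forever.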
57 Minimal Perfect Hash Table: maps every n-gram to a unique integer in [0, |n-grams|); use these as array offsets.
58 Entries not in the model also get assigned offsets, so store a fingerprint of each n-gram.
59 Low memory, but potential for false positives.
60 Less Memory: Sorted Array. Look up "zebra" in a dictionary. Binary search: open in the middle, O(log n) time. Interpolation search: open near the end, O(log log n) time.
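The dictionary analogy above can be made concrete: instead of opening in the middle like binary search, interpolation search estimates the target's position from its value, which gives O(log log n) expected probes on uniformly distributed keys (such as 64-bit n-gram hashes).

```python
def interpolation_search(arr, target):
    """Search a sorted array by guessing the position from the value:
    the "zebra" lookup opens near the end of the dictionary."""
    lo, hi = 0, len(arr) - 1
    while lo <= hi and arr[lo] <= target <= arr[hi]:
        if arr[hi] == arr[lo]:
            mid = lo                          # all keys equal here
        else:
            # Linearly interpolate the target's likely index.
            mid = lo + (target - arr[lo]) * (hi - lo) // (arr[hi] - arr[lo])
        if arr[mid] == target:
            return mid
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

keys = list(range(0, 1000, 7))   # stand-in for sorted n-gram hashes
```

On skewed key distributions the position estimate can be badly off, which is why this pays off mainly for hash-like, near-uniform keys.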
61 [Chart: lookups/µs versus number of entries for probing, hash_set, unordered_set, interpolation search, binary search, and set.]
62 Trie: reverse n-grams, arrange in a trie. [Diagram: reversed n-grams over <s>, Australia, is, one, are, of sharing trie nodes.]
63 Saving More RAM: quantization (store approximate values); collapse probability and backoff.
64 Implementation Summary: implementation involves a sparse mapping. Options: hash table, probing hash table, minimal perfect hash table, sorted array with binary or interpolation search.
65 Problem with Backoff: in "in the Ugrasena kingdom", "Ugrasena" is unseen, so the unigram is used to predict "kingdom": p(kingdom | in the Ugrasena) = p(kingdom). One rare word of context breaks conditioning.
66 More Conditioning: word class/cluster models; skip-grams (skip one or more context words); neural networks.
68 Neural N-grams (aka Feed-Forward). Language modeling as classification: input "in the Ugrasena", output "kingdom". Predict (classify) the next word given the preceding words.
69 Turning Words into Vectors: assign each word a unique row. [Diagram: one row each for "in", "the", "Ugrasena", "kingdom".]
70 Query: Concatenate Vectors. "in" + "the" + "Ugrasena" → neural network → prediction.
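The query pipeline above can be sketched end to end: look up and concatenate the context word vectors, apply one hidden layer, then softmax over the vocabulary to score every candidate next word. All dimensions and the random weights below are toy values for illustration; a real model is trained and far larger.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, EMBED, HIDDEN, ORDER = 6, 4, 8, 3   # toy sizes, assumed

E = rng.normal(0, 0.1, (VOCAB, EMBED))        # one vector (row) per word
W1 = rng.normal(0, 0.1, (ORDER * EMBED, HIDDEN))
W2 = rng.normal(0, 0.1, (HIDDEN, VOCAB))

def predict(context_ids):
    """Feed-forward n-gram LM query: concatenate context vectors,
    hidden layer, then softmax over the vocabulary."""
    x = np.concatenate([E[i] for i in context_ids])
    h = np.tanh(x @ W1)
    logits = h @ W2
    e = np.exp(logits - logits.max())          # numerically stable softmax
    return e / e.sum()

dist = predict([0, 1, 2])   # ids standing in for "in", "the", "Ugrasena"
```

The softmax over all of `VOCAB` is exactly the normalization cost that the later vocabulary-size slides worry about.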
71 Issue: Large Vocabulary. The one-hot input has |vocab| · |history| dimensions, so the input matrix has |vocab| · |history| · |hidden| parameters. Say |vocab| is 1 billion: about 1 trillion parameters.
72 Dealing with Vocabulary Size: word vectors; map unpopular words to unknown or a part of speech; don't use words: characters or short snippets.
73 Turning Words into Vectors: vectors from the input matrix... or your favorite ACL paper.
74 Use Word Vectors: steal the word representation from another task (language modeling, IR, SVD). Reduce from a |vocab|-dimensional one-hot input to a 100-dimensional dense input. Pro: the other task might have more data. Con: does not jointly learn the word representation.
75 Output Vocabulary: the output is still a |vocab|-way classification → large output matrix, lots to normalize over → slow evaluation, overfitting.
76 Dealing with Vocabulary Size: word vectors; map unpopular words to unknown or a part of speech; don't use words: characters or short snippets.
77 Translation Modeling: p(bedroom | of the American, de la Chambre des représentants des États-Unis) [Devlin et al., 2014]
78 Conclusion: language models measure fluency. Neural networks and backoff are the dominant formalisms... but you should probably use recurrent networks.
Hashing CptS 223 Advanced Data Structures Larry Holder School of Electrical Engineering and Computer Science Washington State University 1 Overview Hashing Technique supporting insertion, deletion and
More informationFinding Similar Sets. Applications Shingling Minhashing Locality-Sensitive Hashing
Finding Similar Sets Applications Shingling Minhashing Locality-Sensitive Hashing Goals Many Web-mining problems can be expressed as finding similar sets:. Pages with similar words, e.g., for classification
More informationLecture 16. Reading: Weiss Ch. 5 CSE 100, UCSD: LEC 16. Page 1 of 40
Lecture 16 Hashing Hash table and hash function design Hash functions for integers and strings Collision resolution strategies: linear probing, double hashing, random hashing, separate chaining Hash table
More informationModels for Document & Query Representation. Ziawasch Abedjan
Models for Document & Query Representation Ziawasch Abedjan Overview Introduction & Definition Boolean retrieval Vector Space Model Probabilistic Information Retrieval Language Model Approach Summary Overview
More informationLearning Data Systems Components
Work partially done at Learning Data Systems Components Tim Kraska [Disclaimer: I am NOT talking on behalf of Google] Comments on Social Media Joins Sorting HashMaps Tree Bloom Filter
More informationGraphs. Data Structures and Algorithms CSE 373 SU 18 BEN JONES 1
Graphs Data Structures and Algorithms CSE 373 SU 18 BEN JONES 1 Warmup Discuss with your neighbors: Come up with as many kinds of relational data as you can (data that can be represented with a graph).
More informationPlease note that some of the resources used in this assignment require a Stanford Network Account and therefore may not be accessible.
Please note that some of the resources used in this assignment require a Stanford Network Account and therefore may not be accessible. CS 224N / Ling 237 Programming Assignment 1: Language Modeling Due
More informationIndexing. CS6200: Information Retrieval. Index Construction. Slides by: Jesse Anderton
Indexing Index Construction CS6200: Information Retrieval Slides by: Jesse Anderton Motivation: Scale Corpus Terms Docs Entries A term incidence matrix with V terms and D documents has O(V x D) entries.
More informationCOMP 551 Applied Machine Learning Lecture 16: Deep Learning
COMP 551 Applied Machine Learning Lecture 16: Deep Learning Instructor: Ryan Lowe (ryan.lowe@cs.mcgill.ca) Slides mostly by: Class web page: www.cs.mcgill.ca/~hvanho2/comp551 Unless otherwise noted, all
More informationIII Data Structures. Dynamic sets
III Data Structures Elementary Data Structures Hash Tables Binary Search Trees Red-Black Trees Dynamic sets Sets are fundamental to computer science Algorithms may require several different types of operations
More informationInformation Retrieval
Introduction to Information Retrieval Lecture 4: Index Construction 1 Plan Last lecture: Dictionary data structures Tolerant retrieval Wildcards Spell correction Soundex a-hu hy-m n-z $m mace madden mo
More informationCOSC160: Data Structures Hashing Structures. Jeremy Bolton, PhD Assistant Teaching Professor
COSC160: Data Structures Hashing Structures Jeremy Bolton, PhD Assistant Teaching Professor Outline I. Hashing Structures I. Motivation and Review II. Hash Functions III. HashTables I. Implementations
More informationData Structure Lecture#22: Searching 3 (Chapter 9) U Kang Seoul National University
Data Structure Lecture#22: Searching 3 (Chapter 9) U Kang Seoul National University U Kang 1 In This Lecture Motivation of collision resolution policy Open hashing for collision resolution Closed hashing
More informationHashing for searching
Hashing for searching Consider searching a database of records on a given key. There are three standard techniques: Searching sequentially start at the first record and look at each record in turn until
More informationFile Structures and Indexing
File Structures and Indexing CPS352: Database Systems Simon Miner Gordon College Last Revised: 10/11/12 Agenda Check-in Database File Structures Indexing Database Design Tips Check-in Database File Structures
More informationCS698F Advanced Data Management. Instructor: Medha Atre. Aug 11, 2017 CS698F Adv Data Mgmt 1
CS698F Advanced Data Management Instructor: Medha Atre Aug 11, 2017 CS698F Adv Data Mgmt 1 Recap Query optimization components. Relational algebra rules. How to rewrite queries with relational algebra
More informationCOS 318: Operating Systems. File Systems. Topics. Evolved Data Center Storage Hierarchy. Traditional Data Center Storage Hierarchy
Topics COS 318: Operating Systems File Systems hierarchy File system abstraction File system operations File system protection 2 Traditional Data Center Hierarchy Evolved Data Center Hierarchy Clients
More informationWorst-case running time for RANDOMIZED-SELECT
Worst-case running time for RANDOMIZED-SELECT is ), even to nd the minimum The algorithm has a linear expected running time, though, and because it is randomized, no particular input elicits the worst-case
More information1. Attempt any three of the following: 15
(Time: 2½ hours) Total Marks: 75 N. B.: (1) All questions are compulsory. (2) Make suitable assumptions wherever necessary and state the assumptions made. (3) Answers to the same question must be written
More informationIndexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel
Indexing Week 14, Spring 2005 Edited by M. Naci Akkøk, 5.3.2004, 3.3.2005 Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel Overview Conventional indexes B-trees Hashing schemes
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/25/2013 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 3 In many data mining
More informationCPSC 340: Machine Learning and Data Mining. Probabilistic Classification Fall 2017
CPSC 340: Machine Learning and Data Mining Probabilistic Classification Fall 2017 Admin Assignment 0 is due tonight: you should be almost done. 1 late day to hand it in Monday, 2 late days for Wednesday.
More informationFile Organization and Storage Structures
File Organization and Storage Structures o Storage of data File Organization and Storage Structures Primary Storage = Main Memory Fast Volatile Expensive Secondary Storage = Files in disks or tapes Non-Volatile
More informationAdministrative. Distributed indexing. Index Compression! What I did last summer lunch talks today. Master. Tasks
Administrative Index Compression! n Assignment 1? n Homework 2 out n What I did last summer lunch talks today David Kauchak cs458 Fall 2012 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture5-indexcompression.ppt
More information