KenLM: Faster and Smaller Language Model Queries

1 KenLM: Faster and Smaller Language Model Queries
Kenneth Heafield, Carnegie Mellon
July 30, 2011
kheafield.com/code/kenlm

2 What KenLM Does
Answer language model queries using less time and memory.
Example queries, each scoring the final word given the words before it:
log p(<s> iran), log p(<s> iran is), log p(<s> iran is one), log p(<s> iran is one of), log p(iran is one of the), log p(is one of the few)

4 Related Work
Downloadable Baselines
  SRI: Popular and considered fast but high-memory
  IRST: Open source, low-memory, single-threaded
  Rand: Low-memory lossy compression
  MIT: Mostly estimates models but also does queries
Papers Without Code
  TPT: Better memory locality
  Sheffield: Lossy compression techniques
After KenLM's Public Release
  Berkeley: Java; slower and larger than KenLM

7 Why I Wrote KenLM
Decoding takes too long
  Answer queries quickly
  Load quickly with memory mapping
  Thread-safe
Bigger models
  Conserve memory
SRI doesn't compile
  Distribute and compile with decoders

8 Outline
  1 State
  2 Probing
  3 Perplexity, Translation

9 Example Language Model
Unigrams (Words, log p, Back): <s>, iran, is, one, of
Bigrams (Words, log p, Back): <s> iran, iran is, is one, one of
Trigrams (Words, log p):
  <s> iran is   -1.1
  iran is one   -2.0
  is one of     -0.3

10 Example Queries
Query: <s> iran is
  log p(<s> iran is) = -1.1
Query: iran is of
  log p(of)            -2.5
  Backoff(is)          -1.4
  Backoff(iran is)     -0.4
  log p(iran is of)  = -4.3
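The second query can be reproduced with a minimal backoff scorer. This is a sketch under stated assumptions, not KenLM's code: the names PROB, BACKOFF, and score are hypothetical, only the table entries that survive in this transcript are included, Backoff(iran is) = -0.4 is inferred from the total of -4.3, and a missing backoff is treated as 0.

```python
# Hypothetical toy model; values come from the slides where they survive.
PROB = {
    ("of",): -2.5,
    ("<s>", "iran"): -3.3,
    ("<s>", "iran", "is"): -1.1,
    ("iran", "is", "one"): -2.0,
    ("is", "one", "of"): -0.3,
}
BACKOFF = {
    ("is",): -1.4,
    ("iran", "is"): -0.4,  # inferred: -4.3 - (-2.5) - (-1.4)
}


def score(context, word):
    """Return log p(word | context) using backoff."""
    penalty = 0.0
    while True:
        ngram = context + (word,)
        if ngram in PROB:
            return PROB[ngram] + penalty
        if not context:
            raise KeyError(f"unknown word: {word}")
        # The n-gram was not found: charge the backoff of the current
        # context (0 if absent), then retry with a shorter context.
        penalty += BACKOFF.get(context, 0.0)
        context = context[1:]


print(round(score(("<s>", "iran"), "is"), 1))  # found directly as a trigram
print(round(score(("iran", "is"), "of"), 1))   # backs off twice to the unigram
```

This sketch tries the longest n-gram first. The slides list lookups starting from the unigram, which yields the same score.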

13 Lookups Performed by Queries
Query: <s> iran is
  Lookups: 1 is; 2 iran is; 3 <s> iran is
  Score: log p(<s> iran is) = -1.1
  State: Backoff(is), Backoff(iran is)
Query: iran is of
  Lookups: 1 of; 2 is of (not found); 3 is; 4 iran is
  Score: log p(of) -2.5; Backoff(is) -1.4; Backoff(iran is) -0.4; log p(iran is of) = -4.3

15 Stateful Query Pattern
Query                          State after the query
(start)                        Backoff(<s>)
log p(<s> iran) = -3.3         Backoff(iran), Backoff(<s> iran)
log p(<s> iran is) = -1.1      Backoff(is), Backoff(iran is)
log p(iran is one) = -2.0      Backoff(one), Backoff(is one)
log p(is one of) = -0.3        Backoff(of), Backoff(one of)
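The pattern above, where each query hands the next one the backoffs it may need, might be sketched as follows. Everything here is an illustrative assumption rather than KenLM's API: ORDER, make_state, and score are hypothetical names, the toy model holds only the entries that survive in this transcript, Backoff(iran is) = -0.4 is inferred from the -4.3 total on the example-queries slide, and missing backoffs are treated as 0.

```python
ORDER = 3  # trigram model, as in the example

# Hypothetical toy model; only entries that survive in the transcript.
PROB = {
    ("of",): -2.5,
    ("<s>", "iran"): -3.3,
    ("<s>", "iran", "is"): -1.1,
    ("iran", "is", "one"): -2.0,
    ("is", "one", "of"): -0.3,
}
BACKOFF = {("is",): -1.4, ("iran", "is"): -0.4}  # -0.4 is inferred


def make_state(words):
    """State = last ORDER-1 words plus the backoff of each context suffix."""
    words = tuple(words)[-(ORDER - 1):]
    # backoffs[k-1] is the backoff of the last k context words.
    backoffs = tuple(BACKOFF.get(words[-k:], 0.0) for k in range(1, len(words) + 1))
    return words, backoffs


def score(state, word):
    """Return (log p(word | state), state for the next query)."""
    words, backoffs = state
    for skipped in range(len(words) + 1):
        ngram = words[skipped:] + (word,)
        if ngram in PROB:
            # Charge the stored backoff of every context longer than the
            # match, without looking those contexts up again.
            charged = sum(backoffs[len(words) - s - 1] for s in range(skipped))
            return PROB[ngram] + charged, make_state(words + (word,))
    raise KeyError(f"unknown word: {word}")


state = make_state(["<s>"])
total = 0.0
for w in ["iran", "is", "one", "of"]:
    logp, state = score(state, w)
    total += logp
print(round(total, 1))
```

The caller only threads the returned state into the next call. In this simplified version the state always keeps ORDER-1 words.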

16 Data Structures
Probing: Fast. Uses hash tables.
Trie: Small. Uses sorted arrays.
Trie with compressed pointers: Smaller.
Key Subproblem
Sparse lookup: efficiently retrieve values for sparse keys.

17 Sparse Lookup Speed
[Figure: lookups/µs (log scale, 1 to 100) versus number of entries for probing, hash set, unordered, interpolation, binary search, and set.]

19 Linear Probing Hash Table
Store 64-bit hashes and ignore collisions.
Bigrams (Words, Hash, log p, Back)
  <s> iran   0xf0ae9c2442c6920e
  iran is    0x959e48455f4a2e
  is one     0x186a7caef34acf
  one of     0xac db8dac

20 Linear Probing Hash Table
1.5 buckets/entry (so buckets = 6).
Ideal bucket = hash mod buckets.
Resolve bucket collisions using the next free bucket.
Bigrams array (Bucket, Words, Ideal, Hash, log p, Back)
  0  iran is    0  0x959e48455f4a2e
  1  (empty)       0x0  0  0
  2  is one     2  0x186a7caef34acf
  3  one of     2  0xac db8dac
  4  <s> iran   4  0xf0ae9c2442c6920e
  5  (empty)       0x0  0  0
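The bucket layout on this slide can be reproduced with a small linear probing table. This is a sketch, not KenLM's implementation: the class name ProbingTable is hypothetical, the hash values are small stand-ins chosen to give the ideal buckets shown on the slide (the real 64-bit hashes are partly garbled in this transcript), the n-gram text stands in for the (log p, Back) payload, and wrapping around at the end of the array is an assumption.

```python
class ProbingTable:
    """Linear probing table keyed by precomputed integer hashes."""

    EMPTY = 0  # as on the slide, a zero hash marks an empty bucket

    def __init__(self, entries, multiplier=1.5):
        self.buckets = int(entries * multiplier)
        self.keys = [self.EMPTY] * self.buckets
        self.values = [None] * self.buckets

    def insert(self, key, value):
        # Start at the ideal bucket and take the next free one.
        # Terminates while fewer than self.buckets entries are stored.
        i = key % self.buckets
        while self.keys[i] != self.EMPTY:
            i = (i + 1) % self.buckets
        self.keys[i] = key
        self.values[i] = value

    def find(self, key):
        # Scan from the ideal bucket until the key or an empty bucket.
        i = key % self.buckets
        while self.keys[i] != self.EMPTY:
            if self.keys[i] == key:
                return self.values[i]
            i = (i + 1) % self.buckets
        return None


# Stand-in hashes: 0x12 % 6 == 0, 0x14 % 6 == 2, 0x1a % 6 == 2, 0x16 % 6 == 4.
table = ProbingTable(entries=4)
for key, words in [(0x16, "<s> iran"), (0x12, "iran is"),
                   (0x14, "is one"), (0x1A, "one of")]:
    table.insert(key, words)

print(table.values)
print(table.find(0x1A))
```

Here "one of" has ideal bucket 2, which "is one" already occupies, so it lands in bucket 3. A lookup for it reads two adjacent buckets.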

21 Probing Data Structure
Unigrams (Words, log p, Back): stored in an array.
Bigrams (Words, log p, Back): stored in a probing hash table.
Trigrams (Words, log p): stored in a probing hash table.

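22 Probing Hash Table Summary
Hash tables are fast, but memory is 24 bytes/entry: 1.5 buckets/entry times 16 bytes per bucket, assuming an 8-byte hash plus a 4-byte log p and a 4-byte backoff.
Next: saving memory with a trie.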

24 Trie: Sorted Arrays
Sort in suffix order. Encode suffix using pointers.
Unigrams array (Words, log p, Back, Ptr): <s>, iran, is, one, of
Bigrams array (Words, log p, Back, Ptr): <s> iran, <s> is, iran is, one is, <s> one, is one, one of
Trigrams array (Words, log p):
  <s> iran is   -1.1
  <s> one is    -2.3
  iran is one   -2.0
  <s> one of    -0.5
  is one of     -0.3

25 Interpolation Search in the Trie
Each trie node is a sorted array, for example the bigrams "* is" (Words, log p, Back, Ptr): <s> is, iran is, one is.
Interpolation search, O(log log n): $\text{pivot} = |A| \cdot \frac{\text{key} - A.\text{first}}{A.\text{last} - A.\text{first}}$
Binary search, O(log n): $\text{pivot} = \frac{|A|}{2}$
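A minimal interpolation search over a sorted list of integer keys, to make the pivot rule concrete. It is a generic textbook sketch, not KenLM's code: the function name is hypothetical, and the pivot is written with explicit lo and hi indices instead of |A|. The O(log log n) behaviour is an expected-case figure that relies on keys being spread roughly uniformly.

```python
def interpolation_search(a, key):
    """Return an index i with a[i] == key, or -1 if key is absent.

    `a` must be sorted in non-decreasing order.
    """
    lo, hi = 0, len(a) - 1
    while lo <= hi and a[lo] <= key <= a[hi]:
        if a[lo] == a[hi]:
            # All remaining keys are equal, and key lies between them.
            return lo
        # Estimate the position by linear interpolation between the
        # first and last key of the remaining range.
        pivot = lo + (hi - lo) * (key - a[lo]) // (a[hi] - a[lo])
        if a[pivot] == key:
            return pivot
        if a[pivot] < key:
            lo = pivot + 1
        else:
            hi = pivot - 1
    return -1


keys = [2, 3, 5, 8, 13, 21, 34, 55]
print(interpolation_search(keys, 13))
print(interpolation_search(keys, 4))
```

Binary search would replace the pivot line with the midpoint `(lo + hi) // 2` and ignore the key values.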

26 Saving Memory
Bit-Level Packing: Store word index and pointer using the minimum number of bits.
Optional Quantization: Cluster floats into $2^q$ bins, store q bits/float (same as IRSTLM).
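Bit-level packing can be illustrated by sizing each field from the largest value it must hold and storing records back to back. This is a sketch only: the function names and record values are hypothetical, and a Python integer stands in for the packed byte buffer a real implementation would use.

```python
def bits_required(max_value):
    """Minimum number of bits that can represent 0..max_value."""
    return max(1, max_value.bit_length())


def pack(records, word_bits, ptr_bits):
    """Pack (word_index, pointer) records into one integer bit buffer."""
    width = word_bits + ptr_bits
    buf = 0
    for i, (word, ptr) in enumerate(records):
        buf |= ((word << ptr_bits) | ptr) << (i * width)
    return buf


def unpack(buf, i, word_bits, ptr_bits):
    """Read record i back out of the bit buffer."""
    width = word_bits + ptr_bits
    record = (buf >> (i * width)) & ((1 << width) - 1)
    return record >> ptr_bits, record & ((1 << ptr_bits) - 1)


# Hypothetical records: vocabulary of 5 words, pointers up to 5.
records = [(0, 0), (1, 1), (4, 3), (2, 5)]
word_bits = bits_required(max(w for w, _ in records))
ptr_bits = bits_required(max(p for _, p in records))
buf = pack(records, word_bits, ptr_bits)

print(word_bits, ptr_bits)
print([unpack(buf, i, word_bits, ptr_bits) for i in range(len(records))])
```

Each record here takes 6 bits instead of two 32-bit integers. The cost is shift-and-mask work on every read.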

27 Compress Pointers
Bigrams (Words, log p, Back, Ptr): <s> iran, iran is, one is, <s> one, is one, one of
The Ptr column is increasing.

28-30 Compress Pointers
[Table with columns Offset, Ptr, Binary, followed by a table with columns Chopped, Offset; the cell values are not recoverable.]
Raj and Whittaker (2003)
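One way to exploit increasing pointers, in the spirit of the chopped-bits table these slides attribute to Raj and Whittaker (2003), is to drop the leading bits of each pointer and record only where each leading-bit value begins. The layout below is an assumption made for illustration, with hypothetical function names and made-up pointer values; it is not taken from KenLM or from the cited paper.

```python
from bisect import bisect_right


def compress(pointers, total_bits, chop_bits):
    """Split sorted pointers into low bits plus a table of offsets.

    offsets[h] is the index of the first pointer whose chopped
    (leading) bits are >= h.
    """
    low_bits = total_bits - chop_bits
    mask = (1 << low_bits) - 1
    low = [p & mask for p in pointers]
    offsets = []
    i = 0
    for h in range(1 << chop_bits):
        while i < len(pointers) and (pointers[i] >> low_bits) < h:
            i += 1
        offsets.append(i)
    return low, offsets, low_bits


def lookup(low, offsets, low_bits, i):
    """Reconstruct pointer i from its low bits and the offset table."""
    # The leading bits are the last h whose offset is <= i.
    h = bisect_right(offsets, i) - 1
    return (h << low_bits) | low[i]


# Made-up increasing 4-bit pointers; chop the 2 leading bits.
pointers = [0, 1, 9, 13, 15]
low, offsets, low_bits = compress(pointers, total_bits=4, chop_bits=2)

print(offsets)
print([lookup(low, offsets, low_bits, i) for i in range(len(pointers))])
```

Each stored pointer shrinks by chop_bits bits. In return, every read needs a small search of the offset table, which has one entry per leading-bit value.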

31 Trie Summary
Save memory: bit packing, quantization, and pointer compression.

32 Outline
  1 State
  2 Probing
  3 Perplexity, Translation

33 Perplexity Task
Score the English Gigaword corpus.
Model: SRILM 5-gram from Europarl + De-duplicated News Crawl
Measurements
  Queries/ms: Excludes loading and file reading time
  Loaded Memory: Resident after loading
  Peak Memory: Peak virtual after scoring

34 Perplexity Task: Exact Models
[Figure: queries/ms (0 to 2000) versus loaded and peak memory (GB) for Ken Probing and two other Ken variants, SRI, SRI Compact, IRST, IRST Inverted, and MIT.]

35 Perplexity Task: Berkeley Always Quantizes to 19 bits
[Figure: queries/ms versus memory (GB) for Ken Probing and other Ken variants, IRST, IRST Inverted, Berkeley Compress, Berkeley Scroll, Berkeley Hash, SRI, SRI Compact, and MIT; some points are labeled 19.]

36 Perplexity Task: RandLM from an ARPA file
[Figure: queries/ms versus memory (GB) for Ken Probing and other Ken variants, Berkeley Scroll, Berkeley Hash, Berkeley Compress, IRST, IRST Inverted, SRI, SRI Compact, MIT, and Rand Backoff with p(false) = $2^{-8}$; some points are labeled 8 or 19.]

37 Translation Task
Translate 3003 sentences using Moses.
System: WMT 2011 French-English baseline, Europarl+News LM
Measurements
  Time: Total wall time, including loading
  Memory: Total resident memory after decoding

38 Moses Benchmarks: 8 Threads
[Figure: time (h) versus memory (GB) for Probing, SRI, and Rand Backoff with p(false) = $2^{-8}$; some points are labeled 8.]

39 Moses Benchmarks: Single Threaded
[Figure: time (h) versus memory (GB) for Probing, IRST, SRI, and Rand Backoff with p(false) = $2^{-8}$; some points are labeled 8.]

40 Comparison to RandLM (Unpruned Model, One Thread)
[Figure: time (h) versus memory (GB) for Rand Backoff at p(false) = $2^{-8}$ and $2^{-10}$, and for Rand Stupid Backoff at p(false) = $2^{-8}$ and a second setting whose value is missing.]

41 Conclusion
Maximize speed and accuracy subject to memory.
Probing > Trie > Trie with compressed pointers > RandLM Stupid for both speed and memory.
Distributed with decoders: Moses, cdec, Joshua (file, KLanguageModel, use kenlm=true)
kheafield.com/code/kenlm/
