Amusing algorithms and data-structures that power Lucene and Elasticsearch. Adrien Grand

Size: px

Start display at page:

Download "Amusing algorithms and data-structures that power Lucene and Elasticsearch. Adrien Grand"

Bonnie Shaw
5 years ago
Views:

1 Amusing algorithms and data-structures that power Lucene and Elasticsearch Adrien Grand

2 Agenda conjunctions regexp queries numeric doc values compression cardinality aggregation

3 How are conjunctions implemented?

4 Inverted index elastic 0 49 index 5 lucene shard 9 0 Terms dictionary Doc freq Postings lists 4

5 Inverted index next next elastic index lucene shard 9 0 Terms dictionary Doc freq Postings lists 5

6 Inverted index advance(0) Uses skip lists elastic 0 49 advance( search ) Uses a tiny in-memory terms index. index lucene shard Terms dictionary Doc freq Postings lists 6

7 Conjunctions. Sort by cost

8 Conjunctions Sort by cost. Leap frog!

9 Conjunctions next

10 Conjunctions next advance() TOO FAR

11 Conjunctions next advance() advance() TOO FAR

12 Conjunctions next advance() advance() TOO FAR 5 already on

13 Conjunctions next advance() advance() TOO FAR already on advance() MATCH 7

14 Conjunctions next advance() advance() TOO FAR already on advance() next 7 MATCH 7 4

15 Conjunctions next advance() advance() TOO FAR already on advance() next 7 MATCH 7 advance(7) TOO FAR 5

16 Conjunctions next advance() advance() TOO FAR already on advance() next 7 MATCH 7 advance(7) TOO FAR 6

17 Conjunctions next advance() advance() TOO FAR already on advance() next 7 MATCH 7 advance(7) advance() TOO FAR 7

18 Conjunctions next advance() advance() TOO FAR already on advance() next 7 MATCH 7 advance(7) advance() advance() TOO FAR 8

19 Conjunctions next advance() advance() TOO FAR already on advance() next 7 MATCH 7 advance(7) advance() advance() advance() TOO FAR MATCH 9

20 Conjunctions next advance() advance() TOO FAR already on advance() next 7 MATCH 7 advance(7) advance() advance() advance() TOO FAR MATCH next END 0

21 How do regexp queries work?

22 Regexp queries elastic 0 49 Challenge: find matching terms and merge postings lists index 5 Naive way: lucene search iterate over terms - evaluate regexp against every term shard 9 0 SLOWWWWWW

23 Regexp queries Ela[Ss]tic.* E l a S t i c s *

24 Regexp queries Not limited to regexps Fuzzy queries too! example: es~ 4

25 How are numeric doc values compressed? 5

26 Aggregation Execution What is average price of green docs? color blue green red Doc IDs 0, 4, 5 (inverted index) 6

27 Aggregation Execution What is average price of green docs? color Doc IDs Doc ID price blue green red 0, 4, ( ) = (field data) 7

28 Field Data and Doc Values Field Data In-memory, lives on JVM Heap All-or-nothing Lazily constructed at query-time Doc Values Disk-based, leverages OS FS cache Pages in/out of FS cache Precomputed at index-time Allows better compression 8

29 Allows better compression Lots of cool tricks, let s dive in! 9

30 Default strategy Delta Encoding + bit packing Doc ID price Encoding Compute min and max values: min= max=9 max - min = 7 so deltas require bits per value Encode the min value Encode deltas on bits per value: [,0,0,,7,] requires 8 bits 0

31 Default strategy Delta Encoding + bit packing Doc ID price Decoding For doc ID d Read bits at offset d* bits Add the min value

32 Less than 56 unique values Table encoding Doc ID price Encoding Dedup and sort values: [, 4, 5, 9] Encode the table index for every doc: [, 0, 0,,, ] bits per value

33 Less than 56 unique values Table encoding Doc ID price Decoding For doc ID d: Read bits at offset d* bits Gives table index t Read value in table at index t

34 Timestamps without full precision GCD compression Doc ID price 7 5 Encoding Compute minimum value: Compute deltas: [0, 0, 0, 0, 70, 50] Compute GCD of the deltas: 0 Encode min value and GCD Then encode quotients using bit packing [,,,0,7,5]: bits per value 4

35 Timestamps without full precision GCD compression Doc ID price 7 5 Decoding For doc ID d: Read bits at offset d* bits Multiply by the GCD Add the min value 5

36 How does the Cardinality agg work? bit-pattern observable magic 6

37 Distinct Counts: Naive Solution Cardinality agg == SELECT Count(Distinct foo) 7

38 Distinct Counts: Naive Solution Maintain a set of all values blue green red Cardinality == set.size() set.size() == n Memory usage == n * size of each term (Ignoring set overhead) 8

39 Distinct Counts: Naive Solution Gets worse in distributed environment blue green red Node abc xyz abc green xyz Node Node 9

40 Distinct Counts: Naive Solution Gets worse in distributed environment blue green red Node abc xyz abc green xyz Node Merge blue abc green green abc red xyz xyz Node 4 Node 40

41 Distinct Counts: HyperLogLog++ Cardinality agg uses HyperLogLog++ instead Approximates cardinality Uses only a few Kb of memory for billions of distinct values Fast! Lossless unions 4

42 Bit-Observable Patterns Let s flip some coins Probability of a run n 4

43 Bit-Observable Patterns Let s flip some coins Probability of a run n 5 heads in a row 4

44 Bit-Observable Patterns Let s flip some coins Probability of a run n 5 heads in a row 0 heads in a row

45 Bit-Observable Patterns Let s flip some coins Probability of a run n 5 heads in a row 0 heads in a row Could do this in one sitting Might take several days 45

46 Key Insight: ^(Length of the run) ~= duration of coin flipping 46

47 Bit-Observable Patterns Let s hash values, instead of flipping coins v = 45 h(v) = Run of zero 47

48 Bit-Observable Patterns Let s hash values, instead of flipping coins v = 45 Set register to h(v) =

49 Bit-Observable Patterns Let s hash values, instead of flipping coins v = 456 h(v) = Run of zeros 49

50 Bit-Observable Patterns Let s hash values, instead of flipping coins v = 456 Set register to h(v) =

51 Bit-Observable Patterns Let s hash values, instead of flipping coins v = 948 h(v) = Don t update register Run of zeros 5

52 Key Insight: ^(Length of the run) ~= cardinality duration of coin flipping Probability of a run n 5 zeros in a row 0 zeros in a row ~ distinct values ~ distinct values 5

53 What if you get unlucky on first value? v = 98 h(v) = Run of 0 zeros oops :( 5

54 Stochastic Averaging Solution: keep multiple counters v = 98 h(v) =

55 Stochastic Averaging Solution: keep multiple counters v = 98 h(v) = Use first bits as register index Run of 0 zeros 55

56 Stochastic Averaging Solution: keep multiple counters 0 v = 98 h(v) = Set register[0] to 0 56

57 Stochastic Averaging Solution: keep multiple counters 0 v = 748 h(v) = Use first bits as register index Run of zero 57

58 Stochastic Averaging Solution: keep multiple counters v = 748 h(v) = Set register[] to 58

59 Stochastic Averaging 0 ^0=04 ^=4 ^=8 ^= Naive approach: sum = 08 In practice harmonic mean times number of registers works better: 4 * 4 / (/04 + /4 + /8 + /) =8. Gives less weight to outliers 59

60 Other neat attributes Registers are small! bits 60

61 Other neat attributes Unions are lossless! U = Take max of each register 5 6

62 Other neat attributes Which is perfect for distributed environments Merge Node Node Node 4 Node 6

63 In closing Stop worrying and learn to love approximate algorithms 6

64 Thank

Is Elasticsearch the Answer?

High-Performance Big-Data Computation Solution Is Elasticsearch the Answer? Yoav Melamed Navigation The need Optional solutions What is Elasticsearch Not out of the box Shard limitations and capabilities