COMP 423 lecture 11 Jan. 28, 2008


Adaptive Huffman and arithmetic methods are universal in the sense that the encoder can adapt to the statistics of the source. But adaptation is computationally expensive, particularly when a k-th order Markov approximation is needed for some k > 2. As we know, the k-th order approximation approaches the source entropy rate as k grows. For example, for English text, a second order Markov approximation would require estimating the probability of all possible triplets (about 35^3 = 42,875, the 35 symbols being {a-z, (, ), ... etc.}), which is impractical. Arithmetic codes are inherently adaptive, but they are slow and work best on binary files. The dictionary-based methods, such as the LZ family of encoders, do not use any statistical model, nor do they use a variable-size prefix code. Yet they are universal, adaptive, reasonably fast, and use a modest amount of storage and computational resources. Variants of the LZ algorithm form the basis of Unix compress, gzip, pkzip, stacker, and of modems operating at more than 14.4 kbps.

Dictionary Models

The dictionary model allows several consecutive symbols, called phrases, stored in a dictionary, to be encoded as an address in the dictionary. Usually an adaptive model is used, where the dictionary is built from previously encoded text. As the text is compressed, previously encountered substrings are added to the dictionary. Almost all adaptive dictionary models originated from the original papers by Ziv and Lempel, which led to several families of LZ coding techniques. Here we will present a couple of those techniques.

LZ77 Algorithms

The prior text constitutes the codebook or the dictionary. Rather than keeping an explicit dictionary, the decoded text up to the current time can be used as the dictionary. The figure below shows the characters just decoded while the decoder is looking at a triplet (5, 3, x): the number 5 denotes how far back to look into the already decoded text stream, the number 3 gives the length of the phrase matched beginning at the first character of the yet-unencoded part of the text, and x gives the next character from the input. The match plus this character becomes the next phrase added.

[Figure: a decoded output stream and the corresponding encoded output, a sequence of triples (0,0,.), (0,0,.), (2,1,.), (3,2,.), (5,3,.), (10,1,.), where each "." is the literal next character.]

LZ77 Algorithm with a Finite Buffer

[Figure: two buffers of width W joined as a shift register, with positions p, l and s marked.]

Two buffers of finite size W, called the search (left) and look-ahead (right) buffers, are connected as a shift register. The text to be encoded is shifted in from right to left, initially placing W symbols in the right buffer and filling the left buffer with the first character of the text. The information transmitted is (p, l, s), and the buffer is then shifted l+1 places left. Actually, rather than transmitting p itself, the backward offset into the search buffer is transmitted. The process is repeated until the text is fully encoded. Here l = the maximum length of a substring starting at position p in the search buffer (measured from its right end) that matches a substring in the look-ahead buffer beginning at position 1, and s = the next symbol after the match in the right buffer.

[Worked example: a short text over the alphabet {a, b, c} is encoded step by step; the successive outputs are (1,1,.), (2,1,c), (3,4,.), (9,8,c).]

The decoding process is quite obvious. Since the first character is not known to the decoder, the text is usually prefixed with a known dummy character agreed upon by the encoder and decoder. Also, note that the pattern being matched may spill over into the look-ahead buffer, as in step 3 above.

Read 5.3 and 5.4 from K. Sayood, pp. 118-133.

A Formal Description of LZ77 with Sliding Window W

The main idea of the algorithm is to use a dictionary to store the strings previously encountered. The encoder maintains a sliding window W in which the inputs are shifted from right to left. The window is split into two parts. The search buffer, the left part, is the current dictionary, holding the recently encoded characters or symbols. The right part of the window is called the look-ahead buffer, containing the text to be encoded. In a practical implementation, the size of the search buffer could be several thousand bytes (8K or 16K), whereas the look-ahead buffer is very small (less than 100 bytes). The encoder searches the search buffer for the longest match beginning with the first character in the look-ahead buffer. The encoded output is a triple (B, l, ch), where B is the distance traversed backwards (the offset) in the search buffer, l is the length of the match, and ch is the next character in the look-ahead buffer, at which the match fails. In case l = 0 and B = 0, the character ch keeps the encoding process going.

To encode text T[1...N] with a sliding window of W characters:

Algorithm to Encode
Set p = 1 /* p points to the next character in T to be coded */
While there is text remaining to be encoded do
  { Search for T[p] in the search buffer;
    If T[p] does not appear, then { output (0, 0, T[p]); p = p+1 }
    Else { suppose that matches occur at offsets m1 < m2 < ... < ms with lengths l1, l2, ..., ls.
      Let l = max(l1, l2, ..., ls), occurring at offset mmax = mi for some i, 1 <= i <= s.
      If more than one li has the same value l, take the mmax closest to the end of the search buffer.
      Note, the value of p is incremented by the amount l while the pattern matching operation takes place.
      Output the triple (B = mmax, l, ch = T[p+1]);
      Set p = p+2 }
  } endwhile
/* Assume that the offsets are measured leftward, beginning at the last character of the search buffer, while the text is always indexed in the positive direction, from left to right. */

Algorithm to Decode
Set p = 1 /* next character of T to be decoded */
For each input triple (B, l, ch) do
  { If B = l = 0 then { T[p] := ch; p = p+1 }
    else { T[p..p+l-1] = T[B, B-1, ..., B-l+1]; T[p+l] = ch; p = p+l+1 }
    Shift the buffer contents left by l+1 places }

In step 2, selecting the last match rather than the first or second simplifies the encoder, since the algorithm only has to keep track of the details of the last string match. But selecting the first match (the greedy approach) may make the values of the offsets smaller, so they can be compressed further using a statistical coder such as Huffman (such a method, by Bernhard, is called LZH).
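The encode/decode pseudocode above can be sketched in Python. This is a minimal illustration rather than the lecture's exact procedure: the window and look-ahead sizes are arbitrary choices, and the offset B is measured backwards from the current position, as in the pseudocode.

```python
def lz77_encode(text, window=4096, lookahead=16):
    """Encode text as (B, l, ch) triples: B = backward offset into the
    search buffer, l = match length, ch = the next (unmatched) character."""
    triples, p = [], 0
    while p < len(text):
        best_len, best_off = 0, 0
        for s in range(max(0, p - window), p):
            l = 0
            # the match may spill over from the search buffer into the
            # look-ahead buffer (s + l may reach p and beyond)
            while (l < lookahead and p + l < len(text) - 1
                   and text[s + l] == text[p + l]):
                l += 1
            if l > 0 and l >= best_len:
                best_len, best_off = l, p - s
        ch = text[p + best_len]          # the character at which the match fails
        triples.append((best_off, best_len, ch))
        p += best_len + 1
    return triples

def lz77_decode(triples):
    out = []
    for off, l, ch in triples:
        for _ in range(l):
            out.append(out[-off])        # the copy may overlap itself
        out.append(ch)
    return "".join(out)
```

As in the pseudocode, the final character of every phrase travels as the literal ch, so the decoder needs no special cases.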

Note, the string matching operation may begin in the search buffer but may spill over into the look-ahead buffer, which may even make the length l bigger than the look-ahead buffer.

[Figure: a window whose match starts in the search buffer and continues into the look-ahead buffer.]

The LZ77 method was improved in the 1980's and 1990's in several ways:

- Use a variable-size Huffman code for the length (l) and offset (B) fields. (A fixed format needs log2 B bits for the search-buffer offset and log2 l bits to denote l for the look-ahead buffer.)

- Increase the sizes of the buffers to find longer and longer matches. The search time would increase; a more sophisticated data structure (a TRIE) may improve the search time.

- Use a circular queue for the sliding window. In the sliding window, all the text characters have to be moved left after each match; a circular queue avoids this.

Example: the different states of a 16-character buffer on the input sid-eastman-easily (example taken from David Salomon, p. 157).

(a) s i d - e a s t . . . . . . . .      (b) s i d - e a s t m a n - e a s i
    S               E                        S                             E

In (a), a 16-byte array is shown with only 8 bytes occupied, S denoting the start point and E the end point. In (b), all 16 bytes are occupied.

(c) l i d - e a s t m a n - e a s i      (d) l i d - e a s t m a n - e a s i
    E S                                      E     S

In (c), the character s is deleted and the character l inserted; now E is located to the left of S. In (d), the two letters id have been effectively deleted, although they are still present in the buffer.

(e) l y - - e a s t m a n - e a s i      (f) l y - t e a s t m a n - e a s i
        E   S                                    E S

In (e), the two characters y- have been appended and the pointer E moved. In (f), the pointers show that the buffer ends at ...teas and starts at tman.... Inserting new symbols into the circular queue and moving the pointers is thus equivalent to shifting the contents of the queue; no actual shifting or moving is necessary.

- Eliminate the third element of the triple (ch) by adding an extra flag bit.

LZSS

The improved version is called LZSS. It uses a circular queue for the look-ahead buffer, holds the search buffer (the dictionary) in a binary search tree, and creates tokens with only 2 fields.

Example: "sid-eastman-clumsily-teases-sea-sick-seals"

sid-eastman-clum | sily-...
Search Buffer (16) | Look-Ahead Buffer (5)

The encoder scans the search buffer, creating 12 five-character strings (five being the size of the look-ahead buffer), which are stored in RAM along with a binary search tree, each node with its offset.

16,sid-e   15,id-ea   14,d-eas   13,-east   12,eastm   11,astma
10,stman    9,tman-    8,man-c    7,an-cl    6,n-clu    5,-clum

The first symbol in the look-ahead buffer is 's'. Two strings beginning with 's' are found, at offsets 16 and 10, of which 16 leads to the longer match 'si', of length 2. The encoder emits (16,2). The next window is sid-eastman-clumsi|ly-te... The tree is updated by deleting 'sid-e' and 'id-ea' and inserting the two new strings 'clums' and 'lumsi'. Note, the strings deleted are always from the top addresses in RAM, and the strings added are from the bottom of the RAM. This statement is true in general: if there is a longer, k-letter match, the window has to be shifted k positions. A simple procedure to update the tree is to take the first 5-letter string in the search buffer, find it in the tree, delete it, slide the buffer one position to the right, prepare the string consisting of the last 5 letters of the search buffer, and add this to the tree; this has to be repeated k times. If the tree becomes unbalanced after several insertions and deletions, an AVL tree can be used. Note that the number of nodes in the tree remains constant. The token created has only two elements; if no match is found, the character is transmitted without any change, with a flag. The flags can be collected into 1 byte so that 8 tokens are transmitted together. A typical size of the search buffer is 2 to 8 Kbytes, with a look-ahead buffer of 32 bytes.
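A rough Python sketch of LZSS's two-field tokens follows. The flag bit is represented here by a tag, a linear scan stands in for the binary search tree, and the 16/5 buffer sizes and minimum match length of 2 echo the example above; all of these are implementation choices for illustration, not prescribed by the notes.

```python
def lzss_encode(text, window=16, lookahead=5, min_match=2):
    """LZSS sketch: emit ('M', offset, length) for matches of at least
    min_match characters, otherwise a flagged literal ('L', ch)."""
    tokens, p = [], 0
    while p < len(text):
        best_len, best_off = 0, 0
        for s in range(max(0, p - window), p):
            l = 0
            while (l < lookahead and p + l < len(text)
                   and text[s + l] == text[p + l]):
                l += 1
            if l > best_len:
                best_len, best_off = l, p - s
        if best_len >= min_match:
            tokens.append(("M", best_off, best_len))
            p += best_len
        else:
            tokens.append(("L", text[p]))   # literal with flag
            p += 1
    return tokens

def lzss_decode(tokens):
    out = []
    for t in tokens:
        if t[0] == "L":
            out.append(t[1])
        else:
            _, off, l = t
            for _ in range(l):
                out.append(out[-off])
    return "".join(out)
```

On the example text the first sixteen characters come out as literals, after which the encoder emits the ("M", 16, 2) token for 'si', matching the (16,2) token in the text.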

LZ78 (Lempel-Ziv-78)

One of the major drawbacks of LZ77 is the implicit assumption that like patterns occur close together, so that they can be found during the string matching operation. If the like patterns are separated by gaps longer than the buffer size, LZ77 will not compress at all. An extreme example is a text in which matching substrings are always farther apart than the window:

Search Buffer | Look-Ahead Buffer

There will be no string match, and each character will be sent with a flag, leading to expansion rather than compression. For another example, say the word "economy" occurs many times in the text, but the occurrences are sufficiently far apart that it will never be compressed. A better strategy is to store the commonly occurring strings in a dictionary rather than letting them slide away; that is, there is no window to limit how far back substrings can be referenced. This is the basic principle of LZ78, which builds up a dictionary of common phrases. The decoder performs the identical operation, creating the same dictionary dynamically and in sync. The output is a sequence of tokens of two items <i, c>, where i is a pointer (address) into the dictionary and c is the next character.

LZ78 Algorithm

The family of LZ algorithms uses an adaptive dictionary-based scheme to compress text strings. The basic idea is to replace a substring of the text with a pointer (initially 0) into a table (codebook or dictionary) where that substring occurred previously. Let S be the portion of the string already parsed, whose longest matching substring is already in the table at location j, and let s be the new symbol that follows it. Transmit (j, s) and repeat the process beginning at the next symbol after s. Enter at the current (pointer+1) location the matched substring concatenated with s. Initialize j = 0.

Example. Message: aa_bba_ccccc_ddddd_e

Pointer   Longest Substring   Transmitted Token (j,s)
 1         a                   0,a
 2         a_                  1,_
 3         b                   0,b
 4         ba                  3,a
 5         _                   0,_
 6         c                   0,c
 7         cc                  6,c
 8         c_                  6,_
 9         d                   0,d
10         dd                  9,d
11         dd_                 10,_
12         e                   0,e

The decoder can build an identical table at the receiving end. LZ78 can be looked upon as parsing the input string into phrases, which are entered in the dictionary. Thus the string ababbabab is parsed into the phrases a, b, ab, ba, bab and entered into a phrase dictionary as

Phrase #   Phrase   Output Token
1          a        (0,a)
2          b        (0,b)
3          ab       (1,b)
4          ba       (2,a)
5          bab      (4,b)

where phrase number 0 stands for the null phrase. Using a table to store the phrases is not very storage efficient. A more efficient method is to use a data structure called a TRIE (or digital search tree), as shown below. The characters of each phrase specify a path from the root of the TRIE to the node that contains the number of that phrase. The characters to be encoded are used to traverse the TRIE until the path is blocked, either because there is no onward path for the indicated character or because a leaf node is reached. The node at which the block occurs gives the phrase number for the output. The blocking character is appended to the output, and a new node is created corresponding to the new phrase in the codebook or dictionary.

[TRIE figure: root 0 with one branch per alphabet symbol; the phrases above occupy nodes 1 through 5.]

If the input alphabet is large, the TRIE may have several pointers emanating from each node, which gives rise to the problem of allocating enough storage at the creation of each node for all possible future pointers. A linked-list data structure representing a sparse pointer array may do a better job. A faster and simpler method is to use a hash table, in which the current node number and the next input character are hashed to determine where the next node can be found.

The TRIE data structure continues to grow as coding proceeds, and eventually it may become too large. Several strategies can be used when memory is full:

- Remove the TRIE and initialize the process again.
- Stop any further updates, at the cost of less compression.
- Partially rebuild it using only the last few hundred bytes of coded text, so that some knowledge from the prior adaptation is retained.

Encoding for LZ78 is faster than LZ77, but decoding is slower, since the decoder must store the parsed phrases. One variant of the LZ78 scheme, called LZW, has been used widely in compression systems.

LZW (Lempel-Ziv-Welch Algorithm)

The main difference between LZW and LZ78 is that the encoded output consists only of a string of phrase numbers; the explicit next character is not part of the output. This is made possible by initializing the dictionary (or the TRIE) with all letters of the alphabet.

Example 1. Text = abcabbcabba. The dictionary D is initialized with three nodes 1, 2 and 3 corresponding to the alphabet A = (a, b, c). Encoding:

a   is in D; ab   not in D: add 4, output 1
b   is in D; bc   not in D: add 5, output 2
c   is in D; ca   not in D: add 6, output 3
ab  is in D; abb  not in D: add 7, output 4
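The hash-table idea above, hashing (current node, next character) to find the next node, can be sketched as follows. Node numbers double as phrase numbers, and the test string is illustrative.

```python
def lz78_encode_trie(text):
    """LZ78 parse over a hash-table trie: the pair (current node, next
    character) keys the child node, so no per-node pointer arrays are needed."""
    child = {}                         # (node, ch) -> node number
    tokens, node, next_id = [], 0, 1
    for ch in text:
        if (node, ch) in child:
            node = child[(node, ch)]   # follow the trie path
        else:
            tokens.append((node, ch))  # path blocked: emit (phrase, ch)
            child[(node, ch)] = next_id
            next_id += 1
            node = 0                   # restart at the root
    if node:                           # input ended mid-phrase
        tokens.append((node, None))
    return tokens
```

This produces the same tokens as the table-based encoder while storing only one hash entry per trie edge.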

bc  is in D; bca  not in D: add 8, output 5
abb is in D; abba not in D: add 9, output 7
a   is in D; end of input:  output 1

Parsing: a b c ab bc abb a
Encoder output: 1 2 3 4 5 7 1

The decoder does the reverse operation. It starts with the initial dictionary D and keeps adding new nodes as it receives the code sequence from the encoder. Decode 1 2 3 4 5 7 1:

1: output a
2: output b;   ab   not in D: add 4
3: output c;   bc   not in D: add 5
4: output ab;  ca   not in D: add 6
5: output bc;  abb  not in D: add 7
7: output abb; bca  not in D: add 8
1: output a;   abba not in D: add 9

Note how the decoder creates a new node: immediately after emitting its output, it forms the string consisting of the last phrase concatenated with the first character of the current phrase. If this string is not in the dictionary, it creates a new node for it with the next available number.
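A Python sketch of the LZW encoder with the numbering used here (phrases numbered from 1). The example text abcabbcabba is an assumption, reconstructed to be consistent with the parse and the output 1 2 3 4 5 7 1 shown above.

```python
def lzw_encode(text, alphabet):
    """LZW: output only phrase numbers; the dictionary is pre-loaded
    with the alphabet, numbered from 1."""
    d = {ch: i + 1 for i, ch in enumerate(alphabet)}
    codes, phrase = [], ""
    for ch in text:
        if phrase + ch in d:
            phrase += ch                   # keep extending the match
        else:
            codes.append(d[phrase])
            d[phrase + ch] = len(d) + 1    # new phrase = match + next char
            phrase = ch
    codes.append(d[phrase])                # flush the final match
    return codes
```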

[TRIE figure for Example 2: root 0 with the alphabet nodes and the phrases built during encoding, numbered up to 9.]

Example 2. T = ... Note that the encoder has used the phrase 9 immediately after it has been constructed. The final output of the encoder is: 1 2 1 3 3 4 6 9.

Decoding

The decoding proceeds smoothly until the number 6, producing output and creating phrases up to 8 in the dictionary, but the decoder does not know what phrase 9 is! Fortunately, the decoder knows the beginning of the new phrase: it is the previous phrase followed by some character x, where x is as yet unknown. If we now concatenate the last decoded phrase with this new phrase, we can see what the text must look like; since the last phrase extended by x is exactly what was entered as phrase 9, the character x must be the first character of that phrase. Thus phrase 9 is determined, and decoding can proceed. In general, whenever a phrase is referenced as soon as the encoder has created it, the last character of the phrase must be the same as its first character. Despite this little problem in decoding, LZW works well, giving good compression and an efficient implementation. The following description of the algorithm is based on the description in WMB [1990]. Note that ++ means concatenation.

Encoding Algorithm
1. Set p = 1 /* p is an index into the text T[1..N]. */
2. For d = 0 to q-1 do D[d] = d /* D is the TRIE; assume the alphabet A = (0,1,2,...,q-1) is represented by numbers, which also denote the first q nodes or phrase numbers. */
3. d = q-1 /* d points to the last entry in the dictionary; the next node number starts at q. */
4. While the input stream is not exhausted do
4.1 Trace TRIE D to find the largest match beginning at T[p]. Suppose the match terminates at phrase number c and the length of the match is k.
4.2 Output code c.
4.3 d = d+1 /* Add a new entry to the TRIE. */
4.4 p = p+k
4.5 Set D[d] = D[c]++T[p] /* Create the new phrase by concatenating the last phrase with the first character of the next phrase. */

LZW Algorithm

This algorithm eliminates the need to transmit the next character, as in the LZ78 algorithm. The dictionary is initialized to contain all characters in the alphabet. New phrases are added to the dictionary by appending the first character of the next phrase. The algorithm is best described by using a trie data structure to represent all distinct phrases in the dictionary, and is illustrated below.

[Figure: successive states of the trie while encoding the text of Example 1, alphabet = (a, b, c), transmitted message = 1 2 3 4 5 7 1; the final trie and its height-balanced binary tree assign codewords so that the transmitted code 1234571 becomes 001001100101010011000.]

Decoding Algorithm

Steps 1, 2, 3 are the same as in encoding, setting up the initial TRIE (dictionary). Let the code sequence be S = c1 c2 ... ck.

Step 4: Decode c1: output D(c1).
Step 5: for j = 2 to k do begin
  If cj is in D, then { output D(cj); create new_phrase by concatenating D(c(j-1)) with the first character of D(cj), if this phrase is not in D; }
  else { new_phrase = D(c(j-1)) ++ F(c(j-1)); output new_phrase } /* F(c(j-1)) is the first character of the last phrase decoded. */
  d = d+1; D(d) = new_phrase /* Enter the new phrase in D. */
end

LZW has been fine-tuned and has several variants. The Unix compress utility is one such variant: it uses a variable-length code to represent the phrase numbers and puts a maximum limit on the size of the phrase number. If the compression performance afterwards degrades, the dictionary is rebuilt from scratch.
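The decoding algorithm above, including the else branch for a code that is not yet in the dictionary, can be sketched in Python. Alphabet numbering starts from 1 as before; the test inputs are illustrative.

```python
def lzw_decode(codes, alphabet):
    """LZW decoder. When a code is not yet in the dictionary, the phrase
    must be the previous phrase plus its own first character."""
    d = {i + 1: ch for i, ch in enumerate(alphabet)}
    prev = d[codes[0]]
    out = [prev]
    for c in codes[1:]:
        if c in d:
            cur = d[c]
        else:                          # phrase referenced right after creation
            cur = prev + prev[0]
        d[len(d) + 1] = prev + cur[0]  # prev phrase ++ first char of current
        out.append(cur)
        prev = cur
    return "".join(out)
```

Decoding [1, 3] over the alphabet (a, b) exercises the special case: code 3 is unknown when it arrives, and the decoder infers the phrase aa.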