The dictionary model allows several consecutive symbols, called phrases
- Tyler Barrett
- 6 years ago
Transcription
Adaptive Huffman and arithmetic methods are universal in the sense that the encoder can adapt to the statistics of the source. But adaptation is computationally expensive, particularly when a k-th order Markov approximation is needed for some k > 2. As we know, the k-th order approximation approaches the source entropy rate as k → ∞. For example, for English text, a second order Markov approximation requires estimating the probability of all possible triplets (about 35^3 = 42,875, where the 35 symbols are {a-z, (, ), ... etc.}), which is impractical. Arithmetic coding is inherently adaptive, but it is slow and works well mainly for binary files.

The dictionary-based methods such as the LZ family of encoders do not use any statistical model, nor do they use a variable-size prefix code. Yet they are universal, adaptive, reasonably fast and use a modest amount of storage and computational resources. Variants of the LZ algorithm form the basis of Unix compress, gzip, pkzip, stacker and of modems operating at more than 14.4 KBPS.

Dictionary Models

The dictionary model allows several consecutive symbols, called phrases, stored in a dictionary, to be encoded as an address in the dictionary. Usually an adaptive model is used, where the dictionary is built from previously encoded text. As the text is compressed, previously encountered substrings are added to the dictionary. Almost all adaptive dictionary models originated from the original papers by Ziv and Lempel, which led to several families of LZ coding techniques. Here we will present a couple of those techniques.
LZ77 algorithms

The prior text constitutes the codebook or the dictionary. Rather than keeping an explicit dictionary, the decoded text up to the current time can be used as a dictionary. The figure below shows the characters just decoded, with the decoder looking at the triplet (5,3,): the number 5 denotes how far back to look into the already decoded text stream, the number 3 gives the length of the phrase matched beginning at the first character of the yet un-encoded part of the text, and the character gives the next character from the input. This yields the next phrase to be added.

[Figure: decoded output shown above the encoded output (0,0,) (0,0,) (2,1,) (3,2,) (5,3,) (10,1,)]

LZ77 Algorithm with Finite Buffer

[Figure: search buffer and look-ahead buffer, each of size W, showing position p, match length L and next symbol S]

Two buffers of finite size W, called the search (left) and the look-ahead (right) buffers, are connected as a shift register. The text to be encoded is shifted in from right to left, initially placing W symbols in the right buffer and filling the left buffer with the first character of the text. The information transmitted is (p,L,S) and the buffer is shifted L+1 places left. Actually, rather than transmitting p, the offset backward in the search buffer is transmitted. The process is repeated until the text is fully encoded.

L = maximum length of the first substring from the right end of the search buffer, starting at position p, that matches a substring in the look-ahead buffer beginning at position 1.
S = the next symbol after the match in the right buffer.
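The way a decoder consumes such triplets can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function name `lz77_decode` is hypothetical, and the characters in the sample triplets are assumptions, since only the numeric fields of the figure are available here.

```python
def lz77_decode(triples):
    """Rebuild text from (offset, length, char) triples.

    offset counts backwards from the end of the text decoded so far;
    (0, 0, ch) simply emits the literal character ch.
    """
    out = []
    for offset, length, ch in triples:
        start = len(out) - offset
        # Copy one character at a time so that a match may run past the
        # end of the already decoded text (overlapping copy).
        for k in range(length):
            out.append(out[start + k])
        out.append(ch)
    return "".join(out)


# Numeric fields follow the figure; the characters 'a' and 'b' are assumed.
triples = [(0, 0, "a"), (0, 0, "b"), (2, 1, "a"),
           (3, 2, "b"), (5, 3, "b"), (10, 1, "a")]
print(lz77_decode(triples))
```

Copying character by character is a deliberate choice: it lets a triplet such as (1, 3, 'b') repeat the last character three times even though only one character exists when the copy starts.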
[Figure: four encoding steps on the sliding buffers, with outputs (1,1,), (2,1,c), (3,4,) and (9,8,c)]

The decoding process is quite obvious. Since the first character is not known to the decoder, the text is usually appended with a known dummy character agreed upon by the encoder and decoder. Also, note that the pattern being matched may spill over into the look-ahead buffer, as in step 3 above.

Read 5.3 and 5.4 from K. Sayood. Pp

A formal description of LZ77 with Sliding Window W

The main idea of the algorithm is to use a dictionary to store the strings previously encountered. The encoder maintains a sliding window W in which the inputs are shifted from right to left. The window is split into two parts:

- The search buffer, which is the current dictionary, holding the recently encoded characters or symbols.
- The right part of the window, called the look-ahead buffer, containing the text to be encoded.

In a practical implementation, the size of the search buffer could be several thousand bytes (8K or 16K) whereas the look-ahead buffer is very small (less than 100 bytes). The encoder searches the search buffer looking for the longest match beginning with the first character in the look-ahead buffer. The encoded output is a triple (B, l, ch), where B is the distance traversed backwards (the offset in the search buffer), l is the length of the match and ch is the next character in the look-ahead buffer, for which the match fails. In the case l=0, B=0, the character ch keeps the encoding process going.
To encode text T[1...N] with a sliding window of W characters.

Algorithm to Encode

Set p ← 1   /* p points to the next character in T to be coded */
While there is text remaining to be encoded do {
    Search for the first T[p] in the search buffer;
    If T[p] does not appear then { output (0,0,T[p]); p ← p+1 }
    Else {
        Suppose that matches occur at offsets m_1 < m_2 < ... < m_s with lengths l_1, l_2, ..., l_s.
        Let l = max(l_1, l_2, ..., l_s) at offset m_max = m_i for some i, 1 ≤ i ≤ s.
        If more than one l_i has the same value l, take the value of m_max closest to the end of the search buffer.
        Note, the value of p is incremented by an amount l while the pattern matching operation takes place, so T[p] is now the first character after the match.
        Output triple (B = m_max, l, ch = T[p]); Set p ← p+1
    }
} endwhile

/* Assume that the offsets are measured in the left direction beginning at the last character of the search buffer, while text is always indexed in the positive direction from left to right. */

To Decode

Set p ← 1   /* next character of T to be decoded */
For each triple (B, l, ch) input do {
    If B = l = 0 then { T[p] := ch; p ← p+1 }
    else { T[p..p+l-1] ← T[B, B-1, ..., B-l+1]; T[p+l] ← ch; p ← p+l+1 }
    Shift buffer contents left by l+1 places
}

In step 2, selecting the last match rather than the first or second simplifies the encoder, since the algorithm only has to keep track of the details of the last string match. But selecting the first match (greedy approach) may make the values of the offsets smaller, and hence they can be compressed further using a statistical coder such as Huffman (such a method by Berhard is called LZH).
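The encoding loop above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the name `lz77_encode` and the default buffer sizes are hypothetical, indices are 0-based, the search is a plain linear scan, and ties are broken in favour of the match closest to the end of the search buffer.

```python
def lz77_encode(text, search_size=4096, lookahead_size=32):
    """Return a list of (offset, length, char) triples for text."""
    triples = []
    n = len(text)
    p = 0
    while p < n:
        best_len, best_off = 0, 0
        # Leave at least one character to serve as the explicit next char.
        max_len = min(lookahead_size, n - p - 1)
        for start in range(max(0, p - search_size), p):
            k = 0
            # start + k may run past p: the match spills into the look-ahead.
            while k < max_len and text[start + k] == text[p + k]:
                k += 1
            # ">=" prefers later starts, i.e. smaller offsets, on ties.
            if k > 0 and k >= best_len:
                best_len, best_off = k, p - start
        triples.append((best_off, best_len, text[p + best_len]))
        p += best_len + 1
    return triples


print(lz77_encode("aabaabaab"))
# [(0, 0, 'a'), (1, 1, 'b'), (3, 5, 'b')]
```

In the last triple the match of length 5 starts only 3 characters back, so it overlaps the text being encoded, which is the spill-over case mentioned in the lecture.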
Note, the string matching operation may begin in the search buffer but may spill over into the look-ahead buffer, which may even make the length l bigger than the look-ahead buffer.

[Figure: search buffer and look-ahead buffer, with a match that runs from the search buffer into the look-ahead buffer]

The LZ77 method was improved in the 1980's and 1990's in several ways:

- Use a variable-size Huffman code for the length (l) and offset (B) fields. (A fixed format needs log_2 B bits for the offset in the search buffer and log_2 l bits to denote l for the look-ahead buffer.)
- Increase the sizes of the buffers to find longer and longer matches. The search time would increase. A more sophisticated data structure (TRIE) may improve the search time.
- Use a circular queue for the sliding window. In the sliding window, all the text characters have to be moved left after each match. A circular queue avoids this.

Example: the different states of a 16-character buffer for the input sid-eastman-easily (example taken from David Salomon, p.157).

(a) sid-east            (S = start, E = end)
(b) sid-eastman-easi
(c) lid-eastman-easi
(d) lid-eastman-easi

In (a), a 16-byte array is shown with only 8 bytes occupied, S denoting the start point and E denoting the end point. In (b), all 16 bytes are occupied. In (c), the character s is deleted and the character l inserted. Now E is located to the left of S. In (d), the two letters id have been effectively deleted although they are still present in the buffer.
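States (a) to (c) can be reproduced with a small ring buffer. The sketch below is illustrative only: the class name `RingWindow` and its methods are hypothetical, and the start pointer plus a size counter stand in for the S and E pointers of the figure.

```python
class RingWindow:
    """Fixed-capacity sliding window stored as a circular queue."""

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.start = 0      # index of the oldest character (pointer S)
        self.size = 0       # number of occupied cells

    def push(self, ch):
        cap = len(self.buf)
        end = (self.start + self.size) % cap   # cell just past pointer E
        self.buf[end] = ch
        if self.size < cap:
            self.size += 1
        else:
            # Window full: the oldest character was overwritten, so only
            # the start pointer moves; nothing is shifted.
            self.start = (self.start + 1) % cap

    def contents(self):
        """Window contents in logical (oldest to newest) order."""
        cap = len(self.buf)
        return "".join(self.buf[(self.start + i) % cap]
                       for i in range(self.size))

    def raw(self):
        """Physical array contents, '.' marking unused cells."""
        return "".join("." if c is None else c for c in self.buf)


w = RingWindow(16)
for ch in "sid-east":
    w.push(ch)
print(w.raw())        # sid-east........   state (a)
for ch in "man-easi":
    w.push(ch)
print(w.raw())        # sid-eastman-easi   state (b)
w.push("l")
print(w.raw())        # lid-eastman-easi   state (c)
print(w.contents())   # id-eastman-easil
```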
(e) ly--eastman-easi
(f) ly-teastman-easi

In (e), the two characters y- have been appended and pointer E moved. In (f), the pointers show that the buffer ends at teas and starts at tman. Inserting new symbols into the circular queue and moving the pointers is thus equivalent to shifting the contents of the queue. No actual shifting or moving is necessary.

- Eliminate the third element of the triple (ch) by adding an extra flag bit.

LZSS

The improved version is called LZSS. It

- uses a circular queue for the look-ahead buffer,
- holds the search buffer (the dictionary) in a binary search tree, and
- creates tokens with only 2 fields.

Example: "sid-eastman-clumsily-teases-sea-sick-seals"

sid-eastman-clum               sily-...
Temporary search buffer (16)   Look-ahead buffer (5)

The encoder scans the search buffer creating 12 five-character strings (five being the size of the look-ahead buffer), which are stored in RAM along with a binary search tree, each node with its offset.
[Figure: binary search tree whose nodes hold the twelve (offset, string) pairs listed below]

String   Offset
sid-e    16
id-ea    15
d-eas    14
-east    13
eastm    12
astma    11
stman    10
tman-     9
man-c     8
an-cl     7
n-clu     6
-clum     5

The first symbol in the look-ahead buffer is 's'. Two words are found, at offsets 16 and 10, of which 16 leads to the longer match 'si' of length 2. The encoder emits (16,2). The next window is sid-eastman-clumsily-te... The tree is updated by deleting 'sid-e' and 'id-ea' and inserting the two new strings 'clums' and 'lumsi'. Note, the words deleted are always from the top addresses in RAM, and the words added are from the bottom of the RAM. This statement is true in general: if there is a longer k-letter match, the window has to be shifted k positions. A simple procedure to update the tree is to take the first 5-letter word in the search buffer, find it in the tree, delete it, slide the buffer by one position to the right, prepare a string consisting of the last 5 letters in the search buffer and add this to the tree. This has to be repeated k times. If the tree becomes unbalanced after several insertions and deletions, an AVL tree can be used. Note the number of nodes in the tree remains constant.

The token created has only two elements. If no match is found, the character is transmitted without any change, with a flag. The flags could be collected in 1 byte and 8 tokens could be transmitted together. A typical size of the search buffer is 2 to 8 Kbytes, and of the look-ahead buffer 32 bytes.
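The string table and the first token of this example can be checked with a short Python sketch. It is an illustration under assumptions: the names `window_strings` and `lzss_token` are hypothetical, a linear scan replaces the binary search tree, and the flag values 1 (match) and 0 (literal) are chosen here, not taken from the lecture.

```python
def window_strings(search, n):
    """All n-character strings in the search buffer with their offsets.

    Offsets count from the right end of the buffer, so the leftmost
    string of a 16-character buffer has offset 16.
    """
    return [(len(search) - i, search[i:i + n])
            for i in range(len(search) - n + 1)]


def lzss_token(search, lookahead):
    """Return (1, offset, length) for a match, or (0, char) for a literal."""
    best_offset, best_len = 0, 0
    for offset, word in window_strings(search, len(lookahead)):
        k = 0
        while k < len(word) and word[k] == lookahead[k]:
            k += 1
        if k > best_len:
            best_offset, best_len = offset, k
    if best_len == 0:
        return (0, lookahead[0])
    return (1, best_offset, best_len)


search, lookahead = "sid-eastman-clum", "sily-"
for offset, word in window_strings(search, len(lookahead)):
    print(offset, word)
print(lzss_token(search, lookahead))   # (1, 16, 2)
```

A real LZSS encoder would typically also require a minimum match length before emitting a match token; that threshold is left out here to stay close to the example.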
LZ78 (Lempel-Ziv-78)

One of the major drawbacks of LZ77 is the implicit assumption that like patterns occur close together, so that they can be found during the string matching operation. If the like patterns are separated by gaps longer than the buffer size, LZ77 will not compress at all. An extreme example is:

[Figure: example text split between the search buffer and the look-ahead buffer]

There will be no string match and each character will be sent with a flag, leading to expansion rather than compression. For another example, say the word "economy" occurs many times in the text, but the occurrences are sufficiently far apart that the word will never be compressed. A better strategy is to store the commonly occurring strings in a dictionary rather than letting them slide away. This means there is no window to limit how far back the substrings can be referenced. This is the basic principle of LZ78, which builds up a dictionary of common phrases. The decoder performs the identical operation, creating the same dictionary dynamically and in sync. The output is a sequence of tokens consisting of two items <i, c>, where i is a pointer address into the dictionary and c is the next character.

LZ78 Algorithm

The family of LZ algorithms uses an adaptive dictionary-based scheme to compress text strings. The basic idea is to replace a substring of the text with a pointer (initially 0) into a table (codebook or dictionary) where that substring occurred previously.

[Figure: string already parsed, followed by the longest substring already in the table at location j, followed by the new symbol S]

Transmit (j,S) and repeat the process beginning at the next symbol after S. Enter the longest substring concatenated with S at location (current pointer + 1). Initialize j=0.
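The parsing rule just described can be sketched in Python. This is a minimal sketch with assumed names: `lz78_encode` is hypothetical, the table is a plain Python dict keyed by phrase rather than a trie, and the empty-character token emitted at end of input is a convention chosen here for the case where the text ends inside a known phrase.

```python
def lz78_encode(text):
    """Parse text into LZ78 tokens (j, s).

    j is the table location of the longest known phrase (0 for the
    null phrase) and s is the symbol that follows it.
    """
    table = {}       # phrase -> location 1, 2, 3, ...
    tokens = []
    phrase = ""
    for ch in text:
        if phrase + ch in table:
            phrase += ch                     # keep extending the match
        else:
            tokens.append((table.get(phrase, 0), ch))
            table[phrase + ch] = len(table) + 1
            phrase = ""
    if phrase:
        # Input ended inside a known phrase: flush it without a symbol.
        tokens.append((table[phrase], ""))
    return tokens


# The input string is an assumed example, not taken from the lecture.
print(lz78_encode("aa_bbb_cccc_ddddd_e"))
```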
Example

Message: aa_bbb_cccc_ddddd_e

Pointer   Longest substring   Transmitted information (j,s)
1         a                   0,a
2         a_                  1,_
3         b                   0,b
4         bb                  3,b
5         _                   0,_
6         c                   0,c
7         cc                  6,c
8         c_                  6,_
9         d                   0,d
10        dd                  9,d
11        dd_                 10,_
12        e                   0,e

The decoder can build an identical table at the receiving end.

LZ78 can be looked upon as parsing the input string into phrases, which are entered in the static dictionary. Thus, the string is parsed into phrases and entered into the phrase dictionary as

Phrase #   Phrase   Output token
1                   (0,)
2                   (0,)
3                   (1,)
4                   (2,)
5                   (4,)

where phrase number 0 stands for the null phrase.

Using a table to store the phrases is not very storage efficient. A more efficient method is to use a data structure called a TRIE (or digital search tree), as shown below. The characters of each phrase specify a path from the root of the TRIE to the node that contains the number of the phrase. The characters to be encoded are used to traverse the TRIE until the path is blocked, either because there is no onward path for the indicated character or because a leaf node is reached. The node at which the block occurs gives the phrase number for output. The character is appended to the output and a new node is created, corresponding to a new phrase in the codebook or dictionary.
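The remark that the decoder can build an identical table is easy to check in code. The sketch below is illustrative: `lz78_decode` is a hypothetical name, the table is a Python list with the null phrase at index 0, and the characters a and b in the first tokens are assumptions, since those fields are not legible in the transcription.

```python
def lz78_decode(tokens):
    """Rebuild the text and the phrase table from LZ78 tokens (j, s)."""
    table = [""]          # location 0 holds the null phrase
    pieces = []
    for j, s in tokens:
        phrase = table[j] + s     # known phrase extended by one symbol
        table.append(phrase)      # entered at the next free location
        pieces.append(phrase)
    return "".join(pieces), table


tokens = [(0, "a"), (1, "_"), (0, "b"), (3, "b"), (0, "_"), (0, "c"),
          (6, "c"), (6, "_"), (0, "d"), (9, "d"), (10, "_"), (0, "e")]
text, table = lz78_decode(tokens)
print(text)
for location, phrase in enumerate(table[1:], start=1):
    print(location, phrase)
```

No dictionary is ever transmitted: every table entry is recoverable from the tokens seen so far, which is what keeps the encoder and decoder in sync.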
If the input alphabet is large, the TRIE may have many pointers emanating from each node, which gives rise to the problem of allocating enough storage at the beginning of each node for all possible future pointers. A linked list data structure representing a sparse pointer array may do a better job. A faster and simpler method is to use a hash table in which the current node number and the next input character are hashed to determine where the next node can be found.

The TRIE data structure continues to grow as coding proceeds, and eventually it may become too large. Several strategies can be used when memory is full:

- The TRIE is removed and the process is initialized again.
- Stop any further updates, at the cost of less compression.
- Partially rebuild it using only the last few hundred bytes of coded text, so that some knowledge from prior adaptation is retained.

Encoding for LZ78 is faster than for LZ77, but decoding is slower since the decoder must store the parsed phrases. One variant of the LZ78 scheme, called LZW, has been used widely in compression systems.

LZW (Lempel-Ziv-Welch Algorithm)

The main difference between LZW and LZ78 is that the encoding consists of a string of phrase numbers, and the explicit next characters are not part of the output. This is done by initializing the dictionary or the TRIE with all letters of the alphabet.

[Figure: initial TRIE with root 0 and nodes 1, 2, 3 for the letters a, b, c]

Example 1: abcabbcabba

The dictionary D is initialized with three nodes 1, 2 and 3 corresponding to the alphabet A = (a, b, c).

Encoding

a is in D, ab not in D, add 4, output 1
b is in D, bc not in D, add 5, output 2
c is in D, ca not in D, add 6, output 3
ab is in D, abb not in D, add 7, output 4
bc is in D, bca not in D, add 8, output 5
abb is in D, abba not in D, add 9, output 7
a is in D, output 1

Parsing: a b c ab bc abb a
Encoder output: 1 2 3 4 5 7 1

The decoder does the reverse operation. It starts with the initial dictionary D and keeps adding new nodes as it receives the node sequence from the encoder.

Decode

Code   Output   Dictionary update
1      a        a is in D
2      b        ab not in D, add 4
3      c        bc not in D, add 5
4      ab       ca not in D, add 6
5      bc       abb not in D, add 7
7      abb      bca not in D, add 8
1      a        abba not in D, add 9

Note how the decoder creates a new node. Immediately after putting out the output, it creates a string: the last phrase concatenated with the first character of the current phrase. If this is not in the dictionary, it creates a new node with the next available number.
Example 2

[Figure: TRIE for Example 2]

T =

Note the encoder has used the phrase 9 immediately after it has been constructed. The final output of the encoder is:
Decoding

The decoding will proceed smoothly till number 6, creating phrases up to 8 in the dictionary, but the decoder does not know what phrase 9 is! Fortunately, the decoder knows the beginning of the new phrase: it is the last decoded phrase followed by a character x, where x is unknown yet. Because the encoder used phrase 9 immediately after constructing it, the text decoded from phrase 9 begins with the first character of the last phrase, and x is that same character. Thus phrase 9 must be the last phrase concatenated with its own first character, and decoding will proceed. Whenever a phrase is referenced as soon as the encoder has created it, the last character of the phrase must be the same as the first character. Despite this little problem in decoding, LZW works well, giving good compression and an efficient implementation.

The following description of the algorithm is based on the description in WMB [1990]. Note ++ means concatenation.

Encoding Algorithm

1 Set p = 1   /* p is an index to text T[1...N] */
2 For d = 0 to q-1 do D[d] = d   /* D is the TRIE; assume the alphabet A = (0,1,2,...,q-1) is represented by numbers which also denote the first q nodes or phrase numbers */
3 d = q-1   /* d points to the last entry in the dictionary; the next node number starts at q */
4 While input stream not exhausted do
  4.1 Trace TRIE D to find the longest match beginning at T[p]. Suppose the match terminates at phrase number c and the length of the match is k
  4.2 Output code c
  4.3 d = d+1   /* add new entry to TRIE */
  4.4 p = p+k
  4.5 Set D[d] = D[c] ++ T[p]   /* create new phrase by concatenating the last phrase with the first character of the next phrase */
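The encoding steps can be sketched in Python as follows. This is a minimal sketch under assumptions, not the WMB formulation itself: the name `lzw_encode` is hypothetical, a dict keyed by phrase string stands in for the TRIE, phrase numbers start at 1 as in Example 1 (the algorithm above numbers them from 0), and every character of the text is assumed to belong to the alphabet.

```python
def lzw_encode(text, alphabet):
    """Return the list of phrase numbers for text.

    The dictionary starts with one phrase per alphabet letter,
    numbered 1..q in the order given.
    """
    code_of = {ch: i + 1 for i, ch in enumerate(alphabet)}
    next_code = len(alphabet) + 1
    codes = []
    phrase = ""
    for ch in text:
        if phrase + ch in code_of:
            phrase += ch                      # longest match still growing
        else:
            codes.append(code_of[phrase])     # output code of the match
            code_of[phrase + ch] = next_code  # match ++ next character
            next_code += 1
            phrase = ch                       # next match starts here
    if phrase:
        codes.append(code_of[phrase])
    return codes


print(lzw_encode("abcabbcabba", "abc"))   # [1, 2, 3, 4, 5, 7, 1]
```

The input "abcabbcabba" is the text assumed for Example 1; the printed codes agree with the code column of that example's decoding table.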
LZW Algorithm

This algorithm eliminates the need to transmit the next character, as in the LZ78 algorithm. The dictionary is initialized to contain all characters in the alphabet. New phrases are added to the dictionary by appending the first character of the next phrase. The algorithm is best described by using a trie data structure to represent all distinct phrases in the dictionary. The algorithm is illustrated below.

[Figure: the trie for alphabet (a, b, c) at successive stages of encoding, the final trie and its height-balanced binary tree, with the transmitted code]
Decoding Algorithm

Steps 1, 2 and 3 are the same as in encoding, setting up the initial TRIE or dictionary. Let the code sequence be S = c_1 c_2 ... c_k.

Step 4: Decode c_1; output D(c_1)
Step 5: for j = 2 to k do
begin
    If c_j is in D then {
        output D(c_j);
        create new_phrase by concatenating D(c_{j-1}) with the first character of D(c_j), if this phrase is not in D
    }
    else {
        new_phrase = D(c_{j-1}) ++ F(c_{j-1});
        output new_phrase
    }   /* F(c_{j-1}) is the first character of the last phrase decoded */
    d = d+1; D(d) = new_phrase   /* enter new phrase number in D */
end

LZW has been fine-tuned and has several variants. The Unix compress is one such variant. Compress uses a variable-length code to represent the phrase number and puts a maximum limit on the size of the phrase number. If the compression performance degrades afterwards, the dictionary is rebuilt from scratch.
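A Python sketch of the decoding loop, including the case where a code is used right after the encoder created it, might look like this. The name `lzw_decode` is hypothetical, phrase numbers start at 1 as in Example 1, and a dict from number to string replaces the TRIE; treat it as an illustration rather than the definitive algorithm.

```python
def lzw_decode(codes, alphabet):
    """Rebuild the text from a list of LZW phrase numbers."""
    phrase_of = {i + 1: ch for i, ch in enumerate(alphabet)}
    next_code = len(alphabet) + 1
    if not codes:
        return ""
    prev = phrase_of[codes[0]]
    pieces = [prev]
    for code in codes[1:]:
        if code in phrase_of:
            cur = phrase_of[code]
        elif code == next_code:
            # Phrase used immediately after creation: it must be the
            # last phrase followed by that phrase's own first character.
            cur = prev + prev[0]
        else:
            raise ValueError("invalid LZW code: %d" % code)
        # New entry: last phrase ++ first character of current phrase.
        phrase_of[next_code] = prev + cur[0]
        next_code += 1
        pieces.append(cur)
        prev = cur
    return "".join(pieces)


print(lzw_decode([1, 2, 3, 4, 5, 7, 1], "abc"))   # abcabbcabba
print(lzw_decode([1, 2, 3, 5], "ab"))             # abababa
```

In the second call, code 5 arrives before the decoder has entered phrase 5, so the `elif` branch reconstructs it as "ab" followed by "a".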