Expert Systems with Applications

Expert Systems with Applications 36 (2009)

Contents lists available at ScienceDirect

A new document representation using term frequency and vectorized graph connectionists with application to document retrieval

Tommy W.S. Chow *, Haijun Zhang, M.K.M. Rahman

Department of Electronic Engineering, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong

Keywords: Graph representation; Multiple features; Document retrieval; Self-organizing map

Abstract

This paper presents a new document representation with vectorized multiple features including term frequency and term-connection-frequency. A document is represented by an undirected and a directed graph, respectively. Then terms and vectorized graph connectionists are extracted from the graphs by employing several feature extraction methods. This hybrid document feature representation more accurately reflects the underlying semantics that are difficult to achieve from the currently used term histograms, and it facilitates the matching of complex graphs. At the application level, we develop a document retrieval system based on the self-organizing map (SOM) to speed up the retrieval process. We perform extensive experimental verification, and the results suggest that the proposed method is computationally efficient and accurate for document retrieval. © 2009 Elsevier Ltd. All rights reserved.

1. Introduction

Internet access, such as the World Wide Web (WWW), has made document retrieval increasingly demanding, as collecting and searching documents has become an integral part of many people's lives. Accuracy and speed are two key measurements of effective retrieval methodologies. Existing document retrieval systems use statistical methods and natural language processing (NLP) approaches combined with different document representations and query structures. Document retrieval has attracted much interest in the information retrieval community. Document retrieval refers to finding similar documents for a given user's query. A user's query can range from a full description of a document to a few keywords.
Most of the extensively used retrieval approaches are keyword-based searching methods, in which untrained users provide a few keywords to a search engine, which finds the relevant documents and returns them in a list. Another type of document retrieval uses a query document to search for similar ones. Using an entire document as a query performs well in improving retrieval accuracy, but it is more computationally demanding than the keyword-based method. In addition to the retrieval task, document classification and clustering have also become important in organizing the massive amount of document data; they use similar feature extraction approaches to facilitate the classification and clustering process.

Until now, most conventional models have used rough document features, such as terms in documents, as feature units. Usually the connections among terms are overlooked, which results in losing important semantic information of documents. Thus, there is a need for a more effective document representation scheme to enhance the performance of relevant document data mining.

* Corresponding author. E-mail address: eetchow@cityu.edu.hk (T.W.S. Chow).

Most currently used methods of document representation in text data mining are based on vector space, probabilistic and language models. The vector space model (VSM) (Salton & McGill, 1983), with the most popular and widely used tf-idf scheme, uses a basic vocabulary of words or terms for feature description. The term frequency (tf) is the number of occurrences of each term, and the inverse document frequency (idf) is a function of the number of documents in which a term occurs. A term-weighted vector is then constructed for each document using tf and idf. Similarity between two documents is then measured using cosine distance or any other distance function (Zobel & Moffat, 1998). Thus, the VSM scheme reduces the arbitrary length of the term vector of each document to a fixed length. However, a lengthy vector is required for describing the frequency information of terms, because the number of words involved is usually huge.
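As an illustration of the tf-idf scheme and cosine measure just described, the following is a minimal sketch and not the authors' implementation. The function names `tfidf_vectors` and `cosine` and the toy corpus are hypothetical, and the plain tf × log2(N/df) weighting is one common variant assumed here for simplicity.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, tf-idf vectors) for a list of tokenized documents."""
    n_docs = len(docs)
    vocab = sorted({term for doc in docs for term in doc})
    # Document frequency: number of documents in which each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # term frequency within this document
        vectors.append([tf[t] * math.log2(n_docs / doc_freq[t]) for t in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus (hypothetical): every document gets a vector of the same fixed length.
docs = [["graph", "document"], ["graph", "retrieval"], ["music"]]
vocab, vectors = tfidf_vectors(docs)
```

Note that the vector length equals the vocabulary size, which is why the representation becomes lengthy for a realistic corpus.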
This causes a significant increase in computational burden, making the VSM model impractical for a large corpus. In addition, the VSM scheme reveals little statistical structure about a document. To overcome these shortcomings, researchers have proposed several dimensionality reduction methods such as latent semantic indexing (LSI) (Deerwester & Dumais, 1990), probabilistic latent semantic indexing (PLSI) (Hofmann, 1999), latent Dirichlet allocation (LDA) (Blei, Ng, & Jordan, 2003) and the exponential family harmonium model (EFHM) (Welling, Rosen-Zvi, & Hinton, 2004). LSI maps the documents and terms to a latent space representation by employing a linear projection to compress the feature vector of the VSM model into a low dimension. In addition

to feature compression, the LSI model is useful in encoding the semantics (Berry, Dumais, & O'Brien, 1995). A step forward in probabilistic models is PLSI, which defines a proper generative model of data: each word in a document is modeled as a sample from a mixture distribution, and factor representations are developed for the mixture components. Recognizing the overfitting problems and the lack of description at the level of documents, Blei et al. (2003) introduced a further extension in this regard, latent Dirichlet allocation. LDA is viewed as a three-level hierarchical Bayesian model, in which each document is modeled as a finite mixture over an underlying set of topics. Using a probabilistic approach then provides an explicit representation of a document. Compared with LDA, the exponential family harmonium model is an alternative two-layer model using exponential family distributions and the semantics of undirected models. EFHM is able to reduce the feature dimension significantly, using a few latent variables to represent a document.

Apart from these probabilistic models, the language model (Ponte & Croft, 1998), an alternative to the traditional tf.idf relevance models, has become quite popular. The relevance of a document to a given query is ranked by using statistical techniques and the underlying language model of the document. Moreover, Erkan (2006) introduced a language model-based document representation using random walks for document clustering.

Speeding up a retrieval system for real-time application is another equally important issue. In addition to the above-mentioned techniques for reducing feature dimension, which improve the retrieval speed to some extent, other attempts such as clustering methods and novel system structures have been suggested to speed up the retrieval operation. Rooney, Patterson, Galushka, and Dobrynin (2006) introduced the idea that clustering of a large document corpus could be used for speeding up document retrieval.
In order to reduce the searching effort, the scheme narrows the searching scope by comparing a query to groups of documents that are clustered according to the nature of the documents. Fuzzy concepts employing clustering techniques (Horng, Chen, Chang, & Lee, 2005; Ridvan, Tutuncu, & Allahverdi, 2007) were also used in document retrieval. Other than clustering techniques, a new file structure (Du, Ghanta, Maly, & Sharrock, 1989) was also suggested to speed up the retrieval process.

Despite the progress in the area of document retrieval, most reported techniques are largely based on the typical term frequency information of the bag-of-words model. This approach ignores the connections among terms. All the above-mentioned approaches use the independent word as the feature unit. These feature schemes are a rough representation of a document. For example, two documents containing similar term frequencies may be contextually different when the spatial distributions of terms are very different, i.e., "school", "computer", and "science" mean something very different when they appear in different parts of a document compared to the case of "school of computer science", where they appear together. In addition, with the evolution of natural language, an increasing number of combinatorial words have emerged, such as "computer network", "neural network", "complex network", "nature network", etc. Thus, using only term frequency information from the bag-of-words model is not the most effective way to account for contextual similarity, which includes the word inter-connections and the spatial distribution of words throughout the document. The semantics may be very different depending on whether the term-connections are considered or not.

To address the above shortcomings and improve the retrieval accuracy, in this paper we introduce graphs for document representation, which allows more semantic information to be included. It is worth mentioning that graph representation for documents is not new.
An interesting application of graph representation, describing word links from the perspective of an evolving complex network for human language study, can be found in Dorogovtsev and Mendes (2001) and Cancho and Sole (2001). In Schenker, Last, Bunke, and Kandel (2003) and Schenker, Last, Bunke, and Kandel (2004), different directed graphs with a few most frequent terms as nodes were defined to represent a document, and a k-nearest neighbor (k-NN) algorithm with different graph matching distances based on the maximum common subgraph was applied to web document classification. Although it is quite successful in enhancing the classification accuracy, graph matching is accomplished in polynomial time, which makes it impractical for large data sets. Apart from the computation time limitation, there may be difficulties in finding the maximum common subgraph (subgraph isomorphism) between two documents. It is very difficult to define reasonable common subgraphs that are able to capture the semantic similarity between documents, because portions of documents are usually not exactly similar (generally, only a few parts are similar).

In this paper, in order to avoid a time-consuming matching process, we first extract term-connections from graph representations with extensive feature extraction methods. Each document is then projected into a feature vector space, forming the term-connection-frequency (tcf) together with the term frequency (tf). This approach enables more semantic information to be utilized for document data mining. In application to the retrieval process, we employ SOM to accelerate the searching operations by matching each document to topologically ordered neurons, and we further use query feedback to improve the retrieval accuracy.

In summary, the contribution of this paper is twofold. First, we propose a new composite vector for representing a document, which combines the traditional term frequency with the term-connection-frequency extracted from graphs. Multiple features are able to express more semantic information about the word inter-connections and the spatial distribution of words throughout the document. As a result, the representation enhances the document retrieval accuracy.
Second, vectorized graph connectionists facilitate the matching of complex graphs, especially when the system handles large datasets. The vectorized graph uses a fixed-length vector, resulting in a substantial reduction in computational cost. We employ SOM together with a relevance feedback approach to improve the computational efficiency and the document retrieval accuracy. The method using vectorized multiple features can serve as a unified feature extraction framework for performing both document retrieval and document classification.

The remaining sections of this paper are organized as follows. Undirected and directed graph representations of documents are introduced in Section 2. In Section 3, three extraction schemes for term-connection features are described in detail, and multiple features are projected into vector space. Section 4 presents the SOM implementation for multiple features of documents. The SOM-based retrieval system is described in Section 5. Extensive simulation results followed by discussions are presented in Section 6. The paper ends with conclusions and future work propositions in Section 7.

2. Graph representations of documents

2.1. Directed graph representation

In our work, we use graphs to represent each document in a corpus. It is quite straightforward to apply a directed graph to express the semantics using the terms in the sequence in which they appear in the document. First, we remove the stop words (a set of common words such as "in", "the", "are", etc.), which deliver little discriminative information. Then, we use the rest of the terms to form a directed graph. A directed graph $\vec{G}$ for a document is denoted by $\vec{G} = (\vec{V}, \vec{E}, \vec{\phi}, \vec{\theta})$, where $\vec{V}$ represents a set of vertices (i.e. terms), $\vec{E}$ is a set of edges or connections between terms, $\vec{\phi}: \vec{V} \to \vec{L}_V$ assigns an attribute (i.e. term frequency) to each vertex of $\vec{V}$, and, similarly, $\vec{\theta}: \vec{E} \to \vec{L}_E$ assigns an attribute (i.e. term-connection-frequency) to each edge of $\vec{E}$. For example, Fig. 1 illustrates what such a graph would look like for the sentence "we found it significantly more expensive for sending money to Mexico, but slightly less for sending money to the United Kingdom". Note that we use only a single vertex for each term even if the term appears more than once in the document. In an early implementation, we used a single vertex to represent a term chain consisting of two or three words that appear together throughout the document, but later found that using only a single vertex for each term is sufficient and improves the performance of our application. Each vertex is labeled with a term frequency measure that indicates how many times the related term appears in the document. Similarly, each edge is labeled with a term-connection-frequency measure that indicates how many times the connected terms appear together in the document. Here, connected means that two terms are adjacent to each other in the specified sequence in the document.

2.2. Undirected graph representation

Using a directed graph according to the sequence of words is able to represent the semantics of a document. In many cases, however, the sequence of words is interchangeable while conveying the same semantics in human language. For example, "computer science" can be expressed as "science of computer", which delivers the same meaning. Thus, in this paper we also propose to use an undirected graph for the representation of each document. Similarly, we first remove the stop words, and develop an undirected graph $G$ for each document, denoted by $G = (V, E, \phi, \theta)$, where the notations are similar to those of the directed graph $\vec{G}$. Fig. 2 illustrates the same example discussed in the previous section by using an undirected graph. Likewise, each vertex is labeled with a term frequency measure that indicates how many times the related term appears in the document. Each edge is labeled with a term-connection-frequency measure that indicates how many times the connected terms appear together in the document.
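The two graph representations can be sketched on the paper's example sentence as follows. This is a minimal illustration rather than the authors' code: the name `build_graph` is hypothetical, the stop-word set is only the one listed for this example, stemming is omitted, and adjacency is assumed to be evaluated after stop-word removal.

```python
import re
from collections import Counter

# Stop words named in the paper's example; a real system would use a fuller list.
STOP_WORDS = {"we", "it", "more", "for", "to", "but", "less", "the"}


def build_graph(text, directed=True):
    """Return (vertex labels, edge labels) as term and term-connection counts."""
    tokens = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    tf = Counter(tokens)  # one vertex per distinct term, labeled with its frequency
    tcf = Counter()       # one edge per connected pair, labeled with its frequency
    for a, b in zip(tokens, tokens[1:]):
        # Directed edges keep the word order; undirected edges ignore it.
        edge = (a, b) if directed else tuple(sorted((a, b)))
        tcf[edge] += 1
    return tf, tcf


SENTENCE = ("we found it significantly more expensive for sending money to Mexico, "
            "but slightly less for sending money to the United Kingdom")
tf, tcf = build_graph(SENTENCE, directed=True)
utf, utcf = build_graph(SENTENCE, directed=False)
```

Under these assumptions the vertices "sending" and "money" carry frequency 2, and the edge between them carries term-connection-frequency 2 in both graphs.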
Here, connected means that two terms are adjacent to each other regardless of the word sequence.

Fig. 1. Directed graph as an example: "we found it significantly more expensive for sending money to Mexico, but slightly less for sending money to the United Kingdom". (Here, "we", "it", "more", "for", "to", "but", "less", "for", "the" are stop words that are removed.)

Fig. 2. Undirected graph as an example: "we found it significantly more expensive for sending money to Mexico, but slightly less for sending money to the United Kingdom". (Here, "we", "it", "more", "for", "to", "but", "less", "for", "the" are stop words that are removed.)

3. Multiple features extraction

In this section, we describe the multiple features (terms and term-connections) extraction approaches, which extract more information from each document for better document analysis.

3.1. Term-frequency-based feature extraction

First, extract all the words from all documents in the database except for stop words, and apply a stemming algorithm to each word. Here, the Porter stemming algorithm (Porter, 1980) is applied to extract the stem of each word, and stems are used as basic features instead of the original words. Thus, "send", "sent" and "sending" are all considered the same word. Store the stemmed words together with the term-frequency $f^t$ and the document-frequency $f^t_d$. Then, construct the vocabulary based on term-frequency features. We use a term-weighting measure to calculate the weight of each word, which is similar to VSM (Salton & Buckley, 1996):

\[ W^t = \sqrt{f^t} \times idf, \tag{1} \]

where the inverse-document-frequency is $idf = \log_2 (N / f^t_d)$, and $N$ is the total number of documents in the corpus. Then, the words are sorted in descending order according to their weights, and the first $N_t$ words are selected to construct the vocabulary. The choice of $N_t$ depends on the database.

3.2. Term-connection-frequency-based feature extraction

Feature extraction of the term-connection-frequency is based on the word vocabulary constructed in Section 3.1. We use the terms in the word vocabulary to build a directed or undirected graph for each document.
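The term weighting of Eq. (1) and the vocabulary selection of Section 3.1 can be sketched as below. This is a hedged illustration: `select_vocabulary` is a hypothetical name, the documents are assumed to be already stemmed and stripped of stop words, and ties are broken alphabetically, which the paper does not specify.

```python
import math
from collections import Counter


def select_vocabulary(docs, n_t):
    """Rank terms by W = sqrt(f_t) * log2(N / f_d) and keep the top n_t terms."""
    n_docs = len(docs)
    term_freq = Counter(t for doc in docs for t in doc)      # f_t over the corpus
    doc_freq = Counter(t for doc in docs for t in set(doc))  # f_d: documents containing t
    weights = {
        t: math.sqrt(term_freq[t]) * math.log2(n_docs / doc_freq[t])
        for t in term_freq
    }
    # Descending weight; alphabetical order breaks ties (an assumption).
    ranked = sorted(weights, key=lambda t: (-weights[t], t))
    return ranked[:n_t], weights


# Toy stemmed corpus (hypothetical).
docs = [["send", "send", "money"], ["money", "graph"], ["money"]]
vocabulary, weights = select_vocabulary(docs, n_t=2)
```

A term that occurs in every document, such as "money" in this toy corpus, receives zero weight because its idf is log2(1) = 0, so it is the first to be excluded from the vocabulary.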
Based on the graph representations, if we directly used graph matching methods to calculate the semantic similarity as in Schenker et al. (2004), much time and storage space would be wasted for large datasets, because the adjacent matrix of each document is so sparse. The adjacent matrix $A^k$ $(k = 1, 2, \ldots, N)$ for graph $G_k$ (or $\vec{G}_k$) is denoted by $A^k = [A^k_{ij}]_{N_t \times N_t}$, where $A^k_{ij} = f^{k,tc}_{ij}$ represents the term-connection-frequency between term $i$ and term $j$ in document $k$. Then, we calculate the total term-connection-frequency adjacent matrix over all the documents (i.e. $A = \sum_{k=1}^{N} A^k$). We also store the document frequency $f^{tc}_{d,ij}$ for the term-connection between term $i$ and term $j$ in the database. We then use three schemes to extract the term-connection-frequency and construct a term-connection-based vocabulary. The first one, the simplest way, is the top term-connection-based method, which selects the $N_{tc}$ most frequent term-connections from matrix $A$. Second, we use the same weighting measure as in Eq. (1) to calculate the weight of each term-connection for a pair of terms:

\[ W^{tc}_{ij} = \sqrt{f^{tc}_{ij}} \times idf^{tc}_{ij}, \tag{2} \]

where the inverse-document-frequency is $idf^{tc}_{ij} = \log_2 (N / f^{tc}_{d,ij})$. Then, we sort the term-connections by their weights in descending order and select the first $N_{tc}$ term-connections. Finally, we use a similar entropy-based measure (Lochbaum & Streeter, 1989) to weight each term-connection:

\[ entropy_{ij} = \sum_{k=1}^{N} \frac{A^k_{ij}}{f^{tc}_{d,ij}} \log_2 \left( \frac{f^{tc}_{d,ij}}{A^k_{ij}} \right). \tag{3} \]

3.3. Projection to vector space

After features are extracted from documents, we can use appropriate data reduction methods to obtain a lower dimensional feature, owing to the curse of dimensionality. Here, we apply PCA, a popular tool for projecting higher dimensional data into a lower dimensional feature without losing much statistical information, to construct a document histogram vector that combines the term-frequency feature and the term-connection-frequency feature. The overall procedure of extracting multiple features is summarized as follows.

(1) Extract words from all the documents in the corpus except for stop words and apply stemming to each word. Calculate the weight of each word according to Eq. (1), and select the first $N_t$ words to construct the term-frequency-based vocabulary.

(2) Build a graph for each document using the selected words as nodes and calculate the total adjacent matrix $A$. Select the first $N_{tc}$ term-connections (or the indexes of edges in the graph) based on the above-mentioned three schemes (i.e. top tcf-based, weighted tcf-based, and entropy tcf-based) to construct the term-connection-frequency-based vocabulary.

(3) Calculate the term histograms and term-connection histograms that represent the multiple features of documents. Each element of the histograms indicates the number of times that the corresponding term or term-connection appears in a document. Finally, we normalize the histogram as follows:

\[ H = \left[ h^t_1 \; h^t_2 \; \cdots \; h^t_{N_t} \; h^{tc}_1 \; h^{tc}_2 \; \cdots \; h^{tc}_{N_{tc}} \right], \quad h^t_i = f^t(n^t_i), \quad h^{tc}_j = f^{tc}(n^{tc}_j), \tag{4} \]

where $n^t_i$ is the frequency of the $i$th term in the vocabulary, $n^{tc}_j$ is the frequency of the $j$th term-connection in the vocabulary, and $f^t(\cdot)$ and $f^{tc}(\cdot)$ are normalization functions. In fact, there are many alternative normalization approaches such as mean normalization, maximum normalization, or traditional tf-idf (i.e. VSM) normalization. In this paper, we use the tf-idf normalization method on the histogram vector:

\[ h^t_i = \frac{n^t_i}{\sum_{i=1}^{N_t} n^t_i} \log_2 \left( \frac{N}{(f^t_d)_i} \right), \quad h^{tc}_j = \frac{n^{tc}_j}{\sum_{j=1}^{N_{tc}} n^{tc}_j} \log_2 \left( \frac{N}{(f^{tc}_d)_j} \right). \tag{5} \]

(4) Use the normalized histogram to construct the PCA projection matrix. We use the MATLAB toolbox (Esben, 2005) to compute the PCA projection matrix.

(5) Project the normalized histogram into the lower dimensional PCA feature by using the PCA projection matrix. The PCA features are computed as follows:

\[ F_h = \left[ F^t_h \; F^{tc}_h \right], \quad F^t_h = \left[ h^t_1 \; h^t_2 \; \cdots \; h^t_{N_t} \right] B_t, \quad F^{tc}_h = \left[ h^{tc}_1 \; h^{tc}_2 \; \cdots \; h^{tc}_{N_{tc}} \right] B_{tc}, \tag{6} \]

where $B_t$ and $B_{tc}$ are the projection matrixes with dimension $N_t \times m^t_F$ and $N_{tc} \times m^{tc}_F$, respectively, and $m^t_F$ and $m^{tc}_F$ are the dimensions of the projected features from the term-frequency and the term-connection-frequency, respectively. The projected features in $F_h$ are ordered according to their statistical importance. In our application, both $m^t_F$ and $m^{tc}_F$ are set to 100.

(6) Save the projection matrixes for computing the features of a new query document.

The multiple features of a query document are extracted in the same way, except for steps (1), (2) and (4), which are computed only once over the database.

The hybrid feature vector extracted above can be used in various document applications such as document classification, categorization and retrieval. In this paper, we use only document retrieval as an application example to provide insights into the spatial structure of documents and to exhibit the performance of the representation. Other applications can easily be combined with this framework using the above hybrid document feature.

4. SOM implementation for document retrieval

The self-organizing map (SOM) (Kohonen, 1997) is a versatile unsupervised neural network used for dimension reduction, vector quantization and visualization. It is able to preserve a topologically ordered output map, where input data are mapped onto a small number of neurons.
In this way, SOM can form a mapping from a document corpus to topologically ordered neurons. There have been many attempts to employ SOM for document feature projection to reduce the data dimensionality (Ampazis & Perantonis, 2004; Honkela, Kaski, Lagus, & Kohonen, 1997). SOM has been used for document organization and web mining (Antonio et al., 2008; Georgakis, Kotropoulos, Xafopoulos, & Pitas, 2004). Document clustering and browsing using SOM are introduced in Isa, Kallimani, and Lee (2008) and Freeman and Yin (2005). In this paper, we employ SOM to speed up the retrieval process.

SOM consists of $M$ neurons located on a regular low dimensional grid, which is usually a 2-D grid. The lattice of the grid is either hexagonal or rectangular. The SOM algorithm is iterative. Each neuron $i$ contains a $d$-dimensional feature vector $w_i = [w_{i1} \; w_{i2} \; \cdots \; w_{id}]^T$. At each training step $t$, a sample data vector $x(t)$ is randomly chosen from the training set. Distances between $x(t)$ and all the feature vectors in the grid are computed. The winning neuron, denoted by $c$, is the neuron with the feature vector closest to $x(t)$:

\[ c = \arg\max_i \left( F(x(t), w_i) \right), \quad i \in \{1, 2, \ldots, M\}, \tag{7} \]

where $F(\cdot)$ is a distance function that computes the similarity between $x(t)$ and $w_i$. In this paper, we use the cosine distance to define the similarity:

\[ F(x(t), w_i) = C \, \frac{X_t \cdot W_t}{\|X_t\| \, \|W_t\|} + (1 - C) \, \frac{X_{tc} \cdot W_{tc}}{\|X_{tc}\| \, \|W_{tc}\|}, \tag{8} \]

where

\[ X_t = \left[ (x(t))_1, \ldots, (x(t))_{m^t_F} \right]^T, \quad W_t = \left[ w_{i1}, \ldots, w_{i m^t_F} \right]^T, \]
\[ X_{tc} = \left[ (x(t))_{m^t_F + 1}, \ldots, (x(t))_{m^t_F + m^{tc}_F} \right]^T, \quad W_{tc} = \left[ w_{i(m^t_F + 1)}, \ldots, w_{i(m^t_F + m^{tc}_F)} \right]^T, \]

where $\cdot$ indicates the dot product operation, and $C$ $(0 \le C \le 1)$ is a weight parameter that balances the importance of the term-frequency feature and the term-connection-frequency feature. The first part of the expression computes the similarity based on the tf feature, and the second part computes the similarity based on the tcf feature. It is worth noting that the expression gives users the flexibility to change the value of $C$ to balance this similarity measure according to their expectations. In this paper, the effect of the parameter $C$ is also studied in Section 6. Here, the neighborhood kernel function is taken to be a Gaussian function, and the learning rate decreases monotonically with iteration. We do not include the details of the training process of SOM. Readers are referred to any standard book such as Kohonen (1997).

5. SOM-based retrieval system framework

5.1. Pre-processing of the retrieval system

The pre-processing for document retrieval can be summarized as follows.

(1) Save the previously constructed vocabulary base and PCA projection matrix for multiple features.
(2) Train the SOM with all the documents of the database.
(3) Save the indexes of the training data with their winning neurons, which are used for document retrieval.
(4) Save the feature vectors (i.e. weights) of the SOM neurons. Save the SOM inputs of all documents constructed during the training process, which are used for relevance feedback.

Thus the trained SOM is ready to perform the retrieval task for any new query document.

5.2. Document retrieval

A document is represented by a histogram vector that combines the term-frequency feature and the term-connection-frequency feature. Each document is indexed against its winning neuron on the SOM grid. This association between document and neuron, constructed in the pre-processing stage, is prepared for document retrieval. The overall SOM-based retrieval system can be summarized as follows.

(1) For a given query document, extract its multiple features. Compute the projected feature using the pre-stored vocabulary base and PCA projection matrix.
(2) Match the projected feature to find the most similar neurons on the SOM grid and return their attached documents.
(3) Go through the sorted neurons in descending order and add their attached documents to the retrieval list until at least a user-defined number $N_{ret}$ of documents has been appended. Here, the total number of documents can be larger than $N_{ret}$.
(4) Sort the documents in the retrieval list by comparing them with the query according to Eq. (8). Return the first $N_{ret}$ documents to the user.

5.3. Relevance feedback

In order to make this system practical, we also provide an interface that allows users to browse through the preliminarily retrieved documents and give relevance feedback to the retrieval system. In this study, we use query modification (Chow, Rahman, & Wu, 2006) for the relevance feedback operation. The modified query data $X_{new}$ is obtained by averaging all the features of the query and the relevant documents:

\[ X_{new} = \frac{1}{N_R + 1} \left( X_q + \sum_{r=1}^{N_R} X_r \right), \tag{9} \]

where $N_R$ is the number of relevant documents, $X_q$ is the feature vector of the query document, and $X_r$ is the feature vector of the $r$th relevant document. After query modification, the retrieval process can be summarized as follows.

(1) Match the modified query feature vector $X_{new}$ with the neurons on the SOM grid.
(2) Sort the neurons in descending order according to their similarity to the query $X_{new}$.
(3) Go through the sorted neurons in descending order and add their attached documents to the retrieval list until at least a user-defined number $N_{ret}$ of documents has been appended.
(4) Sort the documents in the retrieval list of step (3) by comparing them with the new query, and then return the first $N_{ret}$ documents to the user.

In fact, this relevance feedback process can be executed several times. In this study, we use this process only once.

6. Experimental results and discussion

6.1. Database and experimental setup

In this study, the document database Html_CityU, which consists of 25 categories, was used for all simulations. Each category includes 400 documents, making a total of 10,000 documents. The corpus was split into a training set and a test set that is used for queries.
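The retrieval steps of Section 5.2 and the query modification of Eq. (9) can be sketched as follows, assuming an already trained SOM. Everything here is illustrative rather than the authors' implementation: the names `similarity`, `modify_query` and `retrieve`, the data layout (a list of neuron weight vectors, a list of attached document ids per neuron, and a dict of document features), and the default C = 0.5 are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity(x, w, m_t, c=0.5):
    """Eq. (8): C * cosine on the tf part + (1 - C) * cosine on the tcf part.

    The first m_t components are the tf feature, the rest the tcf feature.
    """
    return c * cosine(x[:m_t], w[:m_t]) + (1 - c) * cosine(x[m_t:], w[m_t:])


def modify_query(x_q, relevant):
    """Eq. (9): average the query with the feature vectors marked relevant."""
    n = len(relevant) + 1
    return [(q + sum(r[i] for r in relevant)) / n for i, q in enumerate(x_q)]


def retrieve(query, neurons, attached, features, m_t, n_ret, c=0.5):
    """Collect documents from the best-matching neurons, then rank them by Eq. (8)."""
    order = sorted(range(len(neurons)),
                   key=lambda i: similarity(query, neurons[i], m_t, c),
                   reverse=True)
    candidates = []
    for i in order:  # visit neurons from most to least similar
        candidates.extend(attached[i])
        if len(candidates) >= n_ret:  # stop once at least n_ret documents are collected
            break
    candidates.sort(key=lambda d: similarity(query, features[d], m_t, c), reverse=True)
    return candidates[:n_ret]
```

Relevance feedback then amounts to calling `retrieve` again with `modify_query(query, [features[d] for d in marked_relevant])`, where `marked_relevant` is the hypothetical list of documents the user accepted.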
One thousand test documents were randomly selected from the 25 categories, and the remaining 9000 documents were used for training. In order to provide a more real-life testing platform, we established this database from documents whose sizes range from a few hundred words to over 20 thousand words. For each category, 400 documents were retrieved from Google using a set of keywords. Some of the keywords are shared among different categories, but the set of keywords for one category is different from that of the other categories. The database can be found online at

Parameters of the SOM algorithm are set as follows. The size of the SOM grid was set to 30 × 30; the initial learning rate was set to 0.3; the initial radius of the neighborhood function was set to half the length of the SOM square grid; the total number of training iterations was set to 27,000 (i.e. three epochs multiplied by 9000 documents); and the balance weight parameter $C$ for the term-frequency and term-connection-frequency features was set to . All the above parameters were found to deliver good performance. It was also noticed that mild deviation from these settings would not have a noticeable effect on the overall performance. After SOM training, the test set was used to verify the performance of this work. All the simulations were performed on a PC with Intel Core GHz and 2 GB memory. The feature extraction programs were written in the Java programming language, and all the document retrieval programs were tested in Matlab.

6.2. Results and discussion

In this section, we present our simulation results using multiple features and the SOM-based retrieval system. In our comparative study, we compare the results of SOM with those of the direct method, using a single feature and multiple features. In the direct method (which is similar to LSI), the query data with projected feature vector is compared directly with the documents in the training set by using the cosine distance as in Eq. (8). To quantify the retrieval results, we used precision and recall values averaged over the query documents from the test set, retrieving up to 360 documents for each query.
The precision and recall measures are defined as follows:

\[ \text{Precision} = \frac{\text{No. of correctly retrieved documents}}{\text{No. of total retrieved documents}}, \tag{10} \]

\[ \text{Recall} = \frac{\text{No. of correctly retrieved documents}}{\text{No. of total documents in relevant category}}. \tag{11} \]

First, we summarize the results obtained by using the undirected graph for document representation with different feature extraction approaches in Tables 1-3. The number of retrieved documents, i.e. the most similar training documents from the dataset for every query, varies from 1 to 360; the precision and recall values for the cases of 10, 40, and 360 retrieved documents are listed. The performances of different retrieval approaches (SOM, RF in SOM, and the direct method) are compared under the different feature extraction schemes. In order to show the effect of the term-connection-frequency, we summarize the values of precision and recall with different numbers of retrieved documents when the term-connection-frequency (tcf) is used as a single feature (i.e. balance weight parameter C = 0). We also list the results with the term-frequency (tf) as a single feature (i.e. C = 1).

Table 1. Retrieval results with the top-frequency-based feature extraction method by using the undirected graph for document representation. Columns: feature extraction scheme, query type, feature, and precision (%) and recall (%) against no. of retrieved documents. Rows: SOM, RF in SOM, and direct method, each with tf + tcf, tf, and tcf features.

Table 2. Retrieval results with the weighted-frequency-based feature extraction method by using the undirected graph for document representation. Columns and rows as in Table 1.

Table 3. Retrieval results with the entropy-based feature extraction method by using the undirected graph for document representation. Columns and rows as in Table 1.

From Table 1, based on the top-frequency feature extraction method, it is observed that the different retrieval tools using features that combine tf with tcf achieve a significant improvement in retrieval accuracy with 10 retrieved documents. The approach of multiple features consistently delivers better results than those using a single feature (either tf or tcf) when the number of retrieved documents increases from 10 to 360. Similar results are also obtained for the recall measurement. On the other hand, SOM with relevance feedback delivers the best retrieval results among the three query types, and the pure SOM approach performs better than the direct method. For the weighted-frequency-based feature extraction method shown in Table 2, using multiple features obtains about 2.5% improvement compared with using only the tf feature. RF in SOM encouragingly delivers better performance, with 92.40% precision and 2.57% recall for 10 documents retrieved. From Table 3, using multiple features consistently performs well with the entropy feature extraction approach. SOM and RF in SOM achieve 79.44% and 81.09% in precision, respectively, even for 360 documents retrieved.

Fig. 3 visually summarizes the precision results against the number of retrieved documents and the precision results against recall values for the top-frequency-based feature extraction scheme. Fig. 3 shows the results of using different features for document retrieval, i.e. the single tf feature and the combination of tcf with tf, and it also shows the results of the first retrieval as well as the retrieval results after relevance feedback for each feature combination. It is observed that retrieval precision decreases as the number of retrieved documents increases. Combining tf and tcf achieves about 98% precision when only one document is retrieved. Using multiple features obtains the highest improvement in accuracy when 20 documents are retrieved. The sharp slope approximates a right angle in Fig. 3(b) due to the relationship between the precision and recall measurements. The number of retrieved documents varying from 50 to 200 shows interesting results: the precision value decreases at an insignificant rate using tf and tcf features, whilst it increases slightly for the case of using only tf. Similar visual results are shown in Figs. 4 and 5 for the weighted-frequency-based and entropy-based feature extraction schemes, respectively, with different feature combinations.

In summary, according to the above quantitative and visual results, it is observed that using multiple features obtains a significant improvement in retrieval accuracy compared with using the term-frequency feature only, in all the cases.
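The precision and recall measures of Eqs. (10) and (11) can be computed per query as in the sketch below. The function name `precision_recall` and the dict-based category lookup are hypothetical; a document is assumed to count as correctly retrieved when it belongs to the query's category.

```python
def precision_recall(retrieved, query_category, categories, n_relevant):
    """Return (precision, recall) for one query.

    retrieved: list of retrieved document ids
    categories: dict mapping document id -> category label
    n_relevant: total number of documents in the query's category
    """
    correct = sum(1 for d in retrieved if categories[d] == query_category)
    precision = correct / len(retrieved) if retrieved else 0.0
    recall = correct / n_relevant if n_relevant else 0.0
    return precision, recall


# Toy example (hypothetical labels): one of two retrieved documents is correct.
categories = {"a": "x", "b": "x", "c": "y"}
p, r = precision_recall(["a", "c"], "x", categories, n_relevant=2)
```

As a consistency check on the reported figures, 92.40% precision at 10 retrieved documents corresponds to 9.24 correct documents on average, and 9.24/360 ≈ 2.57% recall, which matches the reported recall if each category holds 360 training documents (an inference from 400 documents per category and the 9000/1000 split, assuming the test documents are spread evenly over the categories).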
This indictes tht the ddition of term-connection feture is significnt in providing discriminte informtion for document retrievl. Compred with single term-frequency feture, using multiple fetures improves No. of Retrieved Document Recll Fig. 3. Retrievl results sed on top-frequency feture extrction using undirected grph: () precision ginst no. of retrieved document nd () precision ginst recll No. of Retrieved Document Recll Fig. 4. Retrievl results sed on weighted-frequency feture extrction using undirected grph: () precision ginst no. of retrieved document nd () precision ginst recll.

8 2030 T.W.S. Chow et l. / Expert Systems with Applictions 36 (2009) t lest 2% quntittively in precision. It is more interesting to note tht SOM pproch delivers etter performnce thn direct method fter comining tf nd tcf fetures compred with the cse of using single feture. This is elieved to e minly ttriuted to the choice of normliztion methods (here, we used VSM normliztion). The use of relevnce feedck (i.e. RF-SOM) is useful s it is le to consistently improve the retrievl ccurcy from out 2% to 3% compred with the first retrievl using SOM pproch without relevnce feedck. In our study, top-frequency-sed method performs etter thn weighted-frequency-sed method, when we used SOM pproch without relevnce feedck together with undirected grph for document representtion. After relevnce feedck is conducted, weighted-frequency-sed method shows etter results thn top-frequency-sed feture extrction scheme. Interesting results re oserved for entropy feture extrction pproch. Entropy-sed method exhiits the worst performnce compred with other two feture extrction pproches when few documents re retrieved. As the numer of retrieved documents increses to 360, entropy pproch delivers the est performnce compred to other two methods No. of Retrieved Document Recll Fig. 5. Retrievl results sed on entropy feture extrction using undirected grph: () precision ginst no. of retrieved document nd () precision ginst recll. Tle 4 Retrievl results with top-frequency-sed feture extrction method y using directed grph for document representtion. Feture extrction scheme Query type Feture No. of retrieved documents (%) Recll (%) Top-frequency-sed method SOM tf + tcf tf tcf RF in SOM tf + tcf tf tcf Direct method tf + tcf tf tcf Tle 5 Retrievl results with weighted-frequency-sed feture extrction method y using directed grph for document representtion. Feture extrction scheme Query type Feture No. 
of retrieved documents (%) Recll (%) Weighted-frequency-sed method SOM tf + tcf tf tcf RF in SOM tf + tcf tf tcf Direct method tf + tcf tf tcf
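The precision and recall measurements used throughout the tables and figures can be computed as in the minimal sketch below. It assumes a ranked list of retrieved document identifiers and a set of documents known to be relevant to the query. The function name `precision_recall_at_n` and the toy identifiers are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Set, Tuple


def precision_recall_at_n(
    ranked_ids: Sequence[str], relevant_ids: Set[str], n: int
) -> Tuple[float, float]:
    """Return (precision, recall) over the top-n entries of a ranked result list.

    precision = relevant retrieved / retrieved
    recall    = relevant retrieved / all relevant
    """
    if n <= 0:
        raise ValueError("n must be positive")
    top_n = ranked_ids[:n]
    hits = sum(1 for doc_id in top_n if doc_id in relevant_ids)
    precision = hits / len(top_n) if top_n else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical toy data: four retrieved documents, three relevant ones.
    ranked = ["d1", "d7", "d3", "d9"]
    relevant = {"d1", "d3", "d5"}
    for n in (2, 4):
        p, r = precision_recall_at_n(ranked, relevant, n)
        print(f"n={n}: precision={p:.3f}, recall={r:.3f}")
```

For a fixed number R of relevant documents, the two measures are tied by recall = precision × N / R, which is why the precision-recall curve bends so sharply. For instance, assuming R = 360 relevant documents per query (an assumption, chosen to match the maximum retrieval size used in the experiments), 92.40% precision over 10 retrieved documents gives 0.9240 × 10 / 360 ≈ 2.57% recall.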

The reported results using the recall measurement show performance similar to the above comparative analysis across the different types of features and retrieval approaches.

Tables 4 to 6 summarize the retrieval results using a directed graph for document representation with different feature extraction approaches. Figs. 6 to 8 illustrate the visual retrieval results, including different feature extraction methods. Similar comparative retrieval results are obtained using the SOM-based retrieval system. Using multiple features consistently performs better than using a single feature, with a precision improvement of about 2% to 3%. The SOM approach, as well as relevance feedback, exhibits better performance than the direct method. When a directed graph is used for document representation, the top-frequency-based and the weighted-frequency-based feature extraction schemes deliver better retrieval results than the entropy-based approach. It is observed that the weighted-frequency-based feature extraction method exhibits better performance than the top-frequency-based approach in directed graph representation. From comparative studies and analysis of the different document representations, it is believed that the choice between an undirected graph and a directed graph depends on the testing datasets. Our comparative study and analysis indicate that the superior performance delivered by the SOM-based retrieval system is attributed to the co-existence of the SOM properties of self-organizing, topological ordering and non-linear projection.

In order to show the searching speed of the retrieval system, Table 7 summarizes the average execution time of the different retrieval approaches on the same simulation platform. The SOM approach is much faster than the direct method, with an improvement in retrieval speed of about 36%. Relevance feedback with SOM also improves the retrieval speed by about 25%.

In order to ensure the generalization of the proposed system, we also examined the retrieval results from 10-fold cross-validation sets. Each validation set includes 1000 documents without any overlapped data. Tables 8 and 9 list the precision and recall results from the 10-fold cross-validation sets using different document representations. Similar results are obtained from the 10 validation sets, which indicates the stability of the system.

Finally, we studied the robustness of the proposed system over SOM initialization. We summarize the precision results from different training sessions in Tables 10 and 11, using different graphs for document representation. It is observed that the SOM approach delivers similar retrieval results under different initializations.

Table 6
Retrieval results with the entropy-based feature extraction method using a directed graph for document representation.
Columns: feature extraction scheme, query type, feature, no. of retrieved documents, precision (%), recall (%). Rows: entropy-based method; query types SOM, RF in SOM, and direct method; features tf + tcf, tf, tcf for each query type.

Fig. 6. Retrieval results based on top-frequency feature extraction using directed graph: (a) precision against no. of retrieved documents and (b) precision against recall.

Fig. 7. Retrieval results based on weighted-frequency feature extraction using undirected graph: (a) precision against no. of retrieved documents and (b) precision against recall.

Fig. 8. Retrieval results based on entropy feature extraction using undirected graph: (a) precision against no. of retrieved documents and (b) precision against recall.

Table 7
Query time with different methods.
Columns: query type (SOM, RF in SOM, direct method). Row: query time (s).

Table 8
Retrieval results with SOM using an undirected graph for document representation from 10-fold cross-validation sets.
Columns: cross-validation set, query type, no. of retrieved documents, precision (%), recall (%). Rows: SOM and RF in SOM for each of the 10 cross-validation sets.

Table 9
Retrieval results with SOM using a directed graph for document representation from 10-fold cross-validation sets.
Columns: cross-validation set, query type, no. of retrieved documents, precision (%), recall (%). Rows: SOM and RF in SOM for each of the 10 cross-validation sets.

Table 10
Precision results with SOM from different training sessions using an undirected graph for document representation (%).
Columns: training sessions, query type, no. of retrieved documents. Rows: SOM and RF in SOM for each training session.

Table 11
Precision results with SOM from different training sessions using a directed graph for document representation (%).
Columns: training sessions, query type, no. of retrieved documents. Rows: SOM and RF in SOM for each training session.
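The validation protocol described above (10 validation sets of 1000 documents each, with no overlap) can be sketched as follows. This is a sketch under stated assumptions: the text does not say how the sets were drawn, so the seeded shuffle is an assumption, and the names `disjoint_validation_sets` and `summarize_precision` and the document identifiers are hypothetical.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def disjoint_validation_sets(
    doc_ids: Sequence[str], n_sets: int, set_size: int, seed: int = 0
) -> List[List[str]]:
    """Split doc_ids into n_sets non-overlapping sets of set_size documents.

    Assumption: sets are drawn by a seeded random shuffle; the original
    sampling procedure is not specified in the text.
    """
    if n_sets * set_size > len(doc_ids):
        raise ValueError("not enough documents for the requested sets")
    shuffled = list(doc_ids)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[k * set_size:(k + 1) * set_size] for k in range(n_sets)]


def summarize_precision(per_set_precision: Sequence[float]) -> Tuple[float, float]:
    """Mean and population standard deviation of precision across sets.

    A small standard deviation is one simple way to quantify 'stability'.
    """
    return statistics.mean(per_set_precision), statistics.pstdev(per_set_precision)


if __name__ == "__main__":
    docs = [f"doc{i}" for i in range(10000)]  # hypothetical identifiers
    sets = disjoint_validation_sets(docs, n_sets=10, set_size=1000)
    print(len(sets), "sets of", len(sets[0]), "documents")
```

Reporting the spread across sets alongside the mean makes a statement such as "similar results are obtained from the 10 validation sets" checkable with a single number.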

Fig. 9. Effect of weight parameter C: (a) for undirected graph and (b) for directed graph.

Parameter study

This section studies the effect of the balance weight parameter C. Based on the assumption that using combined multiple features delivers superior performance, there must exist an optimal weight C to balance the effect of the term-frequency feature and the term-connection-frequency feature. We used the average precision measure to evaluate the effect of different values of C in Eq. (8). The average precision measure is defined as follows:

\[
\mathrm{avg}_C = \frac{\sum_{i=1}^{N_{\mathrm{max\,ret}}} \mathrm{precision}(i)}{N_{\mathrm{max\,ret}}}, \tag{2}
\]

where $N_{\mathrm{max\,ret}}$ is the maximum number of retrieved documents (in our experiments, $N_{\mathrm{max\,ret}} = 360$) and precision(i) denotes the average precision when retrieving i documents. We include the average precision results for C = 0.00, 0.05, 0.10, ... in Fig. 9. According to our experiments, there exists an optimal C that best combines the multiple features. The optimal value of C, around 0.7 in this study, depends on the data set. Users can specify the relative emphasis between the term-frequency feature and the term-connection-frequency feature by choosing an appropriate value of C according to the nature of the documents.

7. Conclusion

A new document representation using multiple features, including term frequency and vectorized graph connectionists, is proposed. This representation turns a complex graph matching process into a fixed-length vector that contains more semantic information, and it can serve as a unified feature extraction framework for various document mining tasks. Different feature extraction methods are extensively examined in this study. We then develop an SOM-based retrieval system at the application level of the new document representation. Experimental results show that vectorized multiple features extracted from different graphs of document representation enhance the retrieval accuracy. SOM is used to speed up the retrieval process and also improves the accuracy of retrieval in our experiments. With the support of relevance feedback, the SOM-based system consistently further enhances the retrieval accuracy. It is suggested that, when dealing with a large dataset like those in our applications, the SOM-based retrieval system saves much computation cost and can be a practical tool for real-time applications.
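For the parameter study, the average precision measure and the search for a good balance weight C can be sketched as below. This is an illustrative sketch, not the authors' implementation: `average_precision_measure` and `sweep_balance_weight` are hypothetical names, the candidate grid from 0 to 1 in steps of 0.05 is an assumption, and the toy scoring function stands in for a real retrieval run that would use C to combine the two features.

```python
from typing import Callable, Dict, Iterable, Tuple


def average_precision_measure(
    precision_at: Callable[[int], float], n_max_ret: int
) -> float:
    """avg_C = (sum of precision(i) for i = 1..n_max_ret) / n_max_ret."""
    if n_max_ret <= 0:
        raise ValueError("n_max_ret must be positive")
    return sum(precision_at(i) for i in range(1, n_max_ret + 1)) / n_max_ret


def sweep_balance_weight(
    candidates: Iterable[float], evaluate: Callable[[float], float]
) -> Tuple[float, Dict[float, float]]:
    """Evaluate avg_C for each candidate C and return the best C and all scores."""
    scores = {c: evaluate(c) for c in candidates}
    best_c = max(scores, key=scores.get)
    return best_c, scores


if __name__ == "__main__":
    # Assumed grid: 0.00, 0.05, ..., 1.00.
    grid = [round(0.05 * k, 2) for k in range(21)]

    # Hypothetical stand-in for a retrieval experiment: a curve peaking at 0.7.
    def toy_avg_precision(c: float) -> float:
        return 1.0 - (c - 0.7) ** 2

    best_c, scores = sweep_balance_weight(grid, toy_avg_precision)
    print("best C on the toy curve:", best_c)
```

In a real experiment, `evaluate` would run the retrieval system with the given C, compute precision(i) for every i up to the maximum number of retrieved documents (360 in this study), and return `average_precision_measure` over those values.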

Text mining: bag of words representation and beyond it

Text mining: bag of words representation and beyond it Text mining: bg of words representtion nd beyond it Jsmink Dobš Fculty of Orgniztion nd Informtics University of Zgreb 1 Outline Definition of text mining Vector spce model or Bg of words representtion

More information

Before We Begin. Introduction to Spatial Domain Filtering. Introduction to Digital Image Processing. Overview (1): Administrative Details (1):

Before We Begin. Introduction to Spatial Domain Filtering. Introduction to Digital Image Processing. Overview (1): Administrative Details (1): Overview (): Before We Begin Administrtive detils Review some questions to consider Winter 2006 Imge Enhncement in the Sptil Domin: Bsics of Sptil Filtering, Smoothing Sptil Filters, Order Sttistics Filters

More information

Lecture 10 Evolutionary Computation: Evolution strategies and genetic programming

Lecture 10 Evolutionary Computation: Evolution strategies and genetic programming Lecture 10 Evolutionry Computtion: Evolution strtegies nd genetic progrmming Evolution strtegies Genetic progrmming Summry Negnevitsky, Person Eduction, 2011 1 Evolution Strtegies Another pproch to simulting

More information

Slides for Data Mining by I. H. Witten and E. Frank

Slides for Data Mining by I. H. Witten and E. Frank Slides for Dt Mining y I. H. Witten nd E. Frnk Simplicity first Simple lgorithms often work very well! There re mny kinds of simple structure, eg: One ttriute does ll the work All ttriutes contriute eqully

More information

Complete Coverage Path Planning of Mobile Robot Based on Dynamic Programming Algorithm Peng Zhou, Zhong-min Wang, Zhen-nan Li, Yang Li

Complete Coverage Path Planning of Mobile Robot Based on Dynamic Programming Algorithm Peng Zhou, Zhong-min Wang, Zhen-nan Li, Yang Li 2nd Interntionl Conference on Electronic & Mechnicl Engineering nd Informtion Technology (EMEIT-212) Complete Coverge Pth Plnning of Mobile Robot Bsed on Dynmic Progrmming Algorithm Peng Zhou, Zhong-min

More information

Spectral Analysis of MCDF Operations in Image Processing

Spectral Analysis of MCDF Operations in Image Processing Spectrl Anlysis of MCDF Opertions in Imge Processing ZHIQIANG MA 1,2 WANWU GUO 3 1 School of Computer Science, Northest Norml University Chngchun, Jilin, Chin 2 Deprtment of Computer Science, JilinUniversity

More information

2 Computing all Intersections of a Set of Segments Line Segment Intersection

2 Computing all Intersections of a Set of Segments Line Segment Intersection 15-451/651: Design & Anlysis of Algorithms Novemer 14, 2016 Lecture #21 Sweep-Line nd Segment Intersection lst chnged: Novemer 8, 2017 1 Preliminries The sweep-line prdigm is very powerful lgorithmic design

More information

Presentation Martin Randers

Presentation Martin Randers Presenttion Mrtin Rnders Outline Introduction Algorithms Implementtion nd experiments Memory consumption Summry Introduction Introduction Evolution of species cn e modelled in trees Trees consist of nodes

More information

P(r)dr = probability of generating a random number in the interval dr near r. For this probability idea to make sense we must have

P(r)dr = probability of generating a random number in the interval dr near r. For this probability idea to make sense we must have Rndom Numers nd Monte Crlo Methods Rndom Numer Methods The integrtion methods discussed so fr ll re sed upon mking polynomil pproximtions to the integrnd. Another clss of numericl methods relies upon using

More information

A New Learning Algorithm for the MAXQ Hierarchical Reinforcement Learning Method

A New Learning Algorithm for the MAXQ Hierarchical Reinforcement Learning Method A New Lerning Algorithm for the MAXQ Hierrchicl Reinforcement Lerning Method Frzneh Mirzzdeh 1, Bbk Behsz 2, nd Hmid Beigy 1 1 Deprtment of Computer Engineering, Shrif University of Technology, Tehrn,

More information

A Comparison of the Discretization Approach for CST and Discretization Approach for VDM

A Comparison of the Discretization Approach for CST and Discretization Approach for VDM Interntionl Journl of Innovtive Reserch in Advnced Engineering (IJIRAE) Volume1 Issue1 (Mrch 2014) A Comprison of the Discretiztion Approch for CST nd Discretiztion Approch for VDM Omr A. A. Shib Fculty

More information

GENERATING ORTHOIMAGES FOR CLOSE-RANGE OBJECTS BY AUTOMATICALLY DETECTING BREAKLINES

GENERATING ORTHOIMAGES FOR CLOSE-RANGE OBJECTS BY AUTOMATICALLY DETECTING BREAKLINES GENEATING OTHOIMAGES FO CLOSE-ANGE OBJECTS BY AUTOMATICALLY DETECTING BEAKLINES Efstrtios Stylinidis 1, Lzros Sechidis 1, Petros Ptis 1, Spiros Sptls 2 Aristotle University of Thessloniki 1 Deprtment of

More information

Algorithm Design (5) Text Search

Algorithm Design (5) Text Search Algorithm Design (5) Text Serch Tkshi Chikym School of Engineering The University of Tokyo Text Serch Find sustring tht mtches the given key string in text dt of lrge mount Key string: chr x[m] Text Dt:

More information

II. THE ALGORITHM. A. Depth Map Processing

II. THE ALGORITHM. A. Depth Map Processing Lerning Plnr Geometric Scene Context Using Stereo Vision Pul G. Bumstrck, Bryn D. Brudevold, nd Pul D. Reynolds {pbumstrck,brynb,pulr2}@stnford.edu CS229 Finl Project Report December 15, 2006 Abstrct A

More information

Statistical classification of spatial relationships among mathematical symbols

Statistical classification of spatial relationships among mathematical symbols 2009 10th Interntionl Conference on Document Anlysis nd Recognition Sttisticl clssifiction of sptil reltionships mong mthemticl symbols Wl Aly, Seiichi Uchid Deprtment of Intelligent Systems, Kyushu University

More information

CS321 Languages and Compiler Design I. Winter 2012 Lecture 5

CS321 Languages and Compiler Design I. Winter 2012 Lecture 5 CS321 Lnguges nd Compiler Design I Winter 2012 Lecture 5 1 FINITE AUTOMATA A non-deterministic finite utomton (NFA) consists of: An input lphet Σ, e.g. Σ =,. A set of sttes S, e.g. S = {1, 3, 5, 7, 11,

More information

Fig.25: the Role of LEX

Fig.25: the Role of LEX The Lnguge for Specifying Lexicl Anlyzer We shll now study how to uild lexicl nlyzer from specifiction of tokens in the form of list of regulr expressions The discussion centers round the design of n existing

More information

CS143 Handout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexical Analysis

CS143 Handout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexical Analysis CS143 Hndout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexicl Anlysis In this first written ssignment, you'll get the chnce to ply round with the vrious constructions tht come up when doing lexicl

More information

L. Yaroslavsky. Fundamentals of Digital Image Processing. Course

L. Yaroslavsky. Fundamentals of Digital Image Processing. Course L. Yroslvsky. Fundmentls of Digitl Imge Processing. Course 0555.330 Lecture. Imge enhncement.. Imge enhncement s n imge processing tsk. Clssifiction of imge enhncement methods Imge enhncement is processing

More information

What are suffix trees?

What are suffix trees? Suffix Trees 1 Wht re suffix trees? Allow lgorithm designers to store very lrge mount of informtion out strings while still keeping within liner spce Allow users to serch for new strings in the originl

More information

Mobile IP route optimization method for a carrier-scale IP network

Mobile IP route optimization method for a carrier-scale IP network Moile IP route optimiztion method for crrier-scle IP network Tkeshi Ihr, Hiroyuki Ohnishi, nd Ysushi Tkgi NTT Network Service Systems Lortories 3-9-11 Midori-cho, Musshino-shi, Tokyo 180-8585, Jpn Phone:

More information

COMP 423 lecture 11 Jan. 28, 2008

COMP 423 lecture 11 Jan. 28, 2008 COMP 423 lecture 11 Jn. 28, 2008 Up to now, we hve looked t how some symols in n lphet occur more frequently thn others nd how we cn sve its y using code such tht the codewords for more frequently occuring

More information

Approximation of Two-Dimensional Rectangle Packing

Approximation of Two-Dimensional Rectangle Packing pproximtion of Two-imensionl Rectngle Pcking Pinhong hen, Yn hen, Mudit Goel, Freddy Mng S70 Project Report, Spring 1999. My 18, 1999 1 Introduction 1-d in pcking nd -d in pcking re clssic NP-complete

More information

If you are at the university, either physically or via the VPN, you can download the chapters of this book as PDFs.

If you are at the university, either physically or via the VPN, you can download the chapters of this book as PDFs. Lecture 5 Wlks, Trils, Pths nd Connectedness Reding: Some of the mteril in this lecture comes from Section 1.2 of Dieter Jungnickel (2008), Grphs, Networks nd Algorithms, 3rd edition, which is ville online

More information

Inference of node replacement graph grammars

Inference of node replacement graph grammars Glley Proof 22/6/27; :6 File: id293.tex; BOKCTP/Hin p. Intelligent Dt Anlysis (27) 24 IOS Press Inference of node replcement grph grmmrs Jcek P. Kukluk, Lwrence B. Holder nd Dine J. Cook Deprtment of Computer

More information

Distributed Systems Principles and Paradigms

Distributed Systems Principles and Paradigms Distriuted Systems Principles nd Prdigms Chpter 11 (version April 7, 2008) Mrten vn Steen Vrije Universiteit Amsterdm, Fculty of Science Dept. Mthemtics nd Computer Science Room R4.20. Tel: (020) 598 7784

More information

documents 1. Introduction

documents 1. Introduction www.ijcsi.org 4 Efficient structurl similrity computtion etween XML documents Ali Aïtelhdj Computer Science Deprtment, Fculty of Electricl Engineering nd Computer Science Mouloud Mmmeri University of Tizi-Ouzou

More information

Neurocomputing. Kernel sparse representation based classification. Jun Yin a,n, Zhonghua Liu a, Zhong Jin a, Wankou Yang b. abstract.

Neurocomputing. Kernel sparse representation based classification. Jun Yin a,n, Zhonghua Liu a, Zhong Jin a, Wankou Yang b. abstract. Neurocomputing 77 (2012) 120 128 Contents lists ville t SciVerse ScienceDirect Neurocomputing journl homepge: www.elsevier.com/locte/neucom Kernel sprse representtion sed clssifiction Jun Yin,n, Zhonghu

More information

Cone Cluster Labeling for Support Vector Clustering

Cone Cluster Labeling for Support Vector Clustering Cone Cluster Lbeling for Support Vector Clustering Sei-Hyung Lee Deprtment of Computer Science University of Msschusetts Lowell MA 1854, U.S.A. slee@cs.uml.edu Kren M. Dniels Deprtment of Computer Science

More information

Engineer To Engineer Note

Engineer To Engineer Note Engineer To Engineer Note EE-186 Technicl Notes on using Anlog Devices' DSP components nd development tools Contct our technicl support by phone: (800) ANALOG-D or e-mil: dsp.support@nlog.com Or visit

More information

Agilent Mass Hunter Software

Agilent Mass Hunter Software Agilent Mss Hunter Softwre Quick Strt Guide Use this guide to get strted with the Mss Hunter softwre. Wht is Mss Hunter Softwre? Mss Hunter is n integrl prt of Agilent TOF softwre (version A.02.00). Mss

More information

In the last lecture, we discussed how valid tokens may be specified by regular expressions.

In the last lecture, we discussed how valid tokens may be specified by regular expressions. LECTURE 5 Scnning SYNTAX ANALYSIS We know from our previous lectures tht the process of verifying the syntx of the progrm is performed in two stges: Scnning: Identifying nd verifying tokens in progrm.

More information

Outline. Two combinatorial optimization problems in machine learning. Talk objectives. Grammar induction. DFA induction.

Outline. Two combinatorial optimization problems in machine learning. Talk objectives. Grammar induction. DFA induction. Outline Two comintoril optimiztion prolems in mchine lerning Pierre.Dupont@uclouvin.e 1 Feture selection ICTEAM Institute Université ctholique de Louvin Belgium My 1, 011 P. Dupont (UCL Mchine Lerning

More information

Chapter 2 Sensitivity Analysis: Differential Calculus of Models

Chapter 2 Sensitivity Analysis: Differential Calculus of Models Chpter 2 Sensitivity Anlysis: Differentil Clculus of Models Abstrct Models in remote sensing nd in science nd engineering, in generl re, essentilly, functions of discrete model input prmeters, nd/or functionls

More information

Efficient Regular Expression Grouping Algorithm Based on Label Propagation Xi Chena, Shuqiao Chenb and Ming Maoc

Efficient Regular Expression Grouping Algorithm Based on Label Propagation Xi Chena, Shuqiao Chenb and Ming Maoc 4th Ntionl Conference on Electricl, Electronics nd Computer Engineering (NCEECE 2015) Efficient Regulr Expression Grouping Algorithm Bsed on Lbel Propgtion Xi Chen, Shuqio Chenb nd Ming Moc Ntionl Digitl

More information

A Tautology Checker loosely related to Stålmarck s Algorithm by Martin Richards

A Tautology Checker loosely related to Stålmarck s Algorithm by Martin Richards A Tutology Checker loosely relted to Stålmrck s Algorithm y Mrtin Richrds mr@cl.cm.c.uk http://www.cl.cm.c.uk/users/mr/ University Computer Lortory New Museum Site Pemroke Street Cmridge, CB2 3QG Mrtin

More information

Efficient K-NN Search in Polyphonic Music Databases Using a Lower Bounding Mechanism

Efficient K-NN Search in Polyphonic Music Databases Using a Lower Bounding Mechanism Efficient K-NN Serch in Polyphonic Music Dtses Using Lower Bounding Mechnism Ning-Hn Liu Deprtment of Computer Science Ntionl Tsing Hu University Hsinchu,Tiwn 300, R.O.C 886-3-575679 nhliou@yhoo.com.tw

More information

Slicer Method Comparison Using Open-source 3D Printer

Slicer Method Comparison Using Open-source 3D Printer IOP Conference Series: Erth nd Environmentl Science PAPER OPEN ACCESS Slicer Method Comprison Using Open-source 3D Printer To cite this rticle: M K A Mohd Ariffin et l 2018 IOP Conf. Ser.: Erth Environ.

More information

Vulnerability Analysis of Electric Power Communication Network. Yucong Wu

Vulnerability Analysis of Electric Power Communication Network. Yucong Wu 2nd Interntionl Conference on Advnces in Mechnicl Engineering nd Industril Informtics (AMEII 2016 Vulnerbility Anlysis of Electric Power Communiction Network Yucong Wu Deprtment of Telecommunictions Engineering,

More information

From Dependencies to Evaluation Strategies

From Dependencies to Evaluation Strategies From Dependencies to Evlution Strtegies Possile strtegies: 1 let the user define the evlution order 2 utomtic strtegy sed on the dependencies: use locl dependencies to determine which ttriutes to compute

More information

SAPPER: Subgraph Indexing and Approximate Matching in Large Graphs

SAPPER: Subgraph Indexing and Approximate Matching in Large Graphs SAPPER: Sugrph Indexing nd Approximte Mtching in Lrge Grphs Shijie Zhng, Jiong Yng, Wei Jin EECS Dept., Cse Western Reserve University, {shijie.zhng, jiong.yng, wei.jin}@cse.edu ABSTRACT With the emergence

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementry Figure y (m) x (m) prllel perpendiculr Distnce (m) Bird Stndrd devition for distnce (m) c 6 prllel perpendiculr 4 doi:.8/nture99 SUPPLEMENTARY FIGURE Confirmtion tht movement within the flock

More information

Parallel Square and Cube Computations

Parallel Square and Cube Computations Prllel Squre nd Cube Computtions Albert A. Liddicot nd Michel J. Flynn Computer Systems Lbortory, Deprtment of Electricl Engineering Stnford University Gtes Building 5 Serr Mll, Stnford, CA 945, USA liddicot@stnford.edu

More information

Approximation by NURBS with free knots

Approximation by NURBS with free knots pproximtion by NURBS with free knots M Rndrinrivony G Brunnett echnicl University of Chemnitz Fculty of Computer Science Computer Grphics nd Visuliztion Strße der Ntionen 6 97 Chemnitz Germny Emil: mhrvo@informtiktu-chemnitzde

More information

Lily Yen and Mogens Hansen

Lily Yen and Mogens Hansen SKOLID / SKOLID No. 8 Lily Yen nd Mogens Hnsen Skolid hs joined Mthemticl Myhem which is eing reformtted s stnd-lone mthemtics journl for high school students. Solutions to prolems tht ppered in the lst

More information






