Bootstrapping Structured Page Segmentation


Huanfeng Ma and David Doermann
Laboratory for Language and Media Processing
Institute for Advanced Computer Studies (UMIACS)
University of Maryland, College Park, MD
{hfma,

ABSTRACT

In this paper, we present an approach to the bootstrapped learning of a page segmentation model. The idea evolved from attempts to segment dictionaries, which often have a consistent page structure, and is extended to the segmentation of more general structured documents. In cases of highly regular structure, the layout can be learned from examples of only a few pages. The system is first trained using a small number of samples, and a larger test set is processed based on the training result. After corrections are made to a selected subset of the test set, these corrected samples are combined with the original training samples to generate bootstrap samples. The newly created samples are used to retrain the system, refine the learned features, and resegment the test samples. This procedure is applied iteratively until the learned parameters are stable. Using this approach, we do not need to provide a large training set initially, and by bootstrapping, the results can be refined step by step. We have applied this segmentation to many structured documents such as dictionaries, phone books, and spoken language transcripts, and obtained satisfactory segmentation performance.

Keywords: Bootstrap, Document Segmentation, OCR

1. INTRODUCTION AND RELATED WORK

Although we can obtain a lot of information online, much information is still available only in the form of printed documents. Many of these documents have a repeated structure at the physical and semantic levels, and the layout is often based on the function of entries. Figure 1 shows typical examples of structured documents. In a bilingual dictionary, the content is usually structured into translation entries; in a phone book, the content is structured into a person's name and his or her personal information. A restaurant menu may have a hierarchical structure, i.e. the content is first structured into food types and then further into menu dishes. The analysis of the structure can help humans extract and organize information. Recognizing this consistency can also help drive automated document analysis systems.

Figure 1. Structured Document Examples: (a) English-French Dictionary; (b) Phone Book; (c) Advertisement; (d) Restaurant Menu

The process of document layout structure analysis is often divided into two tasks: physical segmentation and logical analysis. Physical segmentation usually divides a page into zones with specific physical characteristics. Logical analysis labels each extracted zone with a specific functional or logical label. The structural complexity of different documents makes it difficult to design a generic document analysis tool that can be applied to all documents. Furthermore, since the logical analysis is based on the physical segmentation result, the performance of physical segmentation is crucial for the understanding of document images and dominates overall performance.

In this paper, we present a segmentation approach that combines physical and logical segmentation and can be used to segment structured pages by learning the physical and semantic features that characterize the functionality of unique entries. A bootstrap technique is applied to the generation of training data to improve the accuracy of training and segmentation.

Before describing our page segmentation approach in detail, however, we provide a brief survey of related work. Liang, Phillips and Haralick [1] present a probability-based text-line identification and segmentation approach. Their approach consists of two phases: offline statistical training and online text-line segmentation. In the online phase, an iterative, relaxation-like method is applied to find an optimal partition of source entities by improving a conditional probability. Kopec and Chou [11] apply a stochastic approach to build Markov source models for text-line segmentation under the assumption that a symbol template is given and the zones (or text columns) have been extracted. Under the assumption that the physical layout structures of document images can be modeled by a stochastic regular grammar, Kanungo and Mao [5] use a generative stochastic document model to model a Chinese-English dictionary page; a weighted finite state automaton that models the projection profile at each level of the document physical layout tree is used to segment the dictionary page on all levels. S. Lee and D. Ryu [6] propose a parameter-free method to segment document images with various font sizes, text-line spacings and document layout structures.
There are also some rule-based segmentation methods, which perform the segmentation based on rules that are either set up manually by the user [7] or learned automatically by training [9, 10].

2. PAGE SEGMENTATION

In document analysis, pages are first segmented into different levels of entities based on physical features. With journal articles, for example, the page can be represented with a hierarchical structure of zones, text-lines, words, and characters. The segmentation is performed based on physical features such as spacing, relative position and text-line attributes. After the physical segmentation result is obtained, logical analysis is usually applied to the highest level, the zones. For journal articles, zones can be classified as title, author, abstract, body, and references.

The structured page segmentation problem we present in this paper differs from traditional page segmentation problems in that the document is assumed to have repeating entries with similar structures, such as an entry in a dictionary, phone book, table of contents, or bibliography. Furthermore, these logical entries may occur in a single physical zone. In some sense, this problem is a combination of physical and logical segmentation because (1) pages are first physically segmented into physical zones; (2) one physical zone can be further functionally segmented into multiple entries based on their functional characteristics; and (3) extracted entries are classified into different logical types. We often find that for these types of documents, the document is designed so that different functional properties of the zones allow the reader to distinguish between them. The functional characteristics of an entry differ between document types but are often consistent within a single document. For entry classification of dictionaries, one entry may extend across columns or zones. Other entries may need to be ignored (i.e. classified as noise) because they are not of interest for logical labeling (for example, a page number, header or footer).
In a dictionary, our functional segmentation essentially inserts a new level (entry) between the zone and text-line levels of the typical hierarchical representation. Thus the segmentation is based not only on the structural features of the page, but also on the structural features of the entry. Another novelty of our segmentation approach is the application of a bootstrap technique for the generation of new training samples. Bootstrapping helps to make the segmentation adaptive and improves the segmentation performance.

In our segmentation, we start with OCR results that include text size, font, face, text-line and text-zone information. The goal of the segmentation is to segment each zone into multiple entries with similar features (the features are defined in the next subsection) or to organize multiple text-lines into one entry. The page segmentation is illustrated in Figure 2, and each iteration consists of the following three steps:

1) Feature Extraction: The segmentation system is automatically trained using a small set of labeled samples, and features of entries are extracted.
2) Segmentation: Pages are segmented based on the extracted features.
3) Correction and Bootstrapping: The segmented results are fed back to the user, who can make corrections to a small subset of the results with errors. Based on the corrected segmentation results, bootstrap samples are generated and used to retrain the system.

To warrant training, we concentrate only on documents with a significant volume or number of pages. Correction and training require an operator who knows the structure of the document.

Figure 2. Diagram of the page segmentation approach (blocks: Training Samples, Prior Knowledge, Feature Extraction, Feature Space, Test Document, Physical & Functional Decomposition, Segmentation, Selected Results, User's Correction, Correction & Bootstrapping, Final Results)

2.1 Feature Extraction

Based on a study of different types of structured documents, the following parameters have been shown to be useful and can be extracted and applied. Examples are shown in Figure 3.

Special symbols: Special symbols such as punctuation, numbers, and other non-alphabetic symbols are often used to start a new entry, to end an entry, or to mark the continuation of a text line.
Word font, face and size: Word font, face and size (especially the features of the first word in each entry) are often important entry features. In a dictionary page, for example, the first word of each entry (typically the headword) can be bold, all capitals, in a different font, or larger than the rest of the entry.
Word patterns: Words often form distinguishable patterns which can be used to describe the consistency of the entry structure.
Symbol patterns: Combined with other symbols or regular characters, special symbols can form consistent patterns that represent the beginning or ending of an entry.
Line structures: The indent, spacing, length, and height of text lines in an entry can all be captured in the line structures to represent the features of an entry.
Other features: Other features can also be used to segment the entries, such as the spacing between adjacent entries, the position of text, script type, word spacing, character case and so on.

Figure 3. Feature Example (annotated features: Special Symbol, Word Style, Symbol Pattern, Word Style Pattern, Line Structure)
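As an illustration of how a few of the entry features listed above might be turned into binary indicators, here is a minimal sketch. The `Word` and `TextLine` types, the feature names, and the set of ending symbols are hypothetical choices made for this example; they are not taken from the system described in the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Word:
    text: str
    bold: bool = False
    size: float = 10.0  # font size in points (assumed unit)


@dataclass
class TextLine:
    words: List[Word]
    indent: float = 0.0  # left offset relative to the zone's left edge


def entry_features(lines: List[TextLine], body_size: float = 10.0) -> Dict[str, bool]:
    """Return hypothetical binary feature indicators for one candidate entry."""
    first_word = lines[0].words[0]
    last_word = lines[-1].words[-1]
    later_indents = [ln.indent for ln in lines[1:]]
    return {
        # word font, face and size of the first word (the headword)
        "first_word_bold": first_word.bold,
        "first_word_all_caps": first_word.text.isupper(),
        "first_word_larger": first_word.size > body_size,
        # special symbol closing the entry
        "ends_with_symbol": last_word.text[-1:] in {".", ";", ":"},
        # line structure: first line starts further left than all later lines
        "hanging_indent": bool(later_indents)
        and all(lines[0].indent < i for i in later_indents),
    }


entry = [
    TextLine([Word("abandon", bold=True), Word("v.")], indent=0.0),
    TextLine([Word("to"), Word("leave.")], indent=12.0),
]
features = entry_features(entry)
```

Each indicator corresponds to one feature whose occurrence can later be counted over the training entries.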

During the training (feature extraction) phase, a Bayesian framework is used to assign and update the probabilities of extracted features. Based on the estimated probabilities, each extracted feature is assigned a weight that can be used to compute the entry score from all extracted features. The detailed procedure is as follows:

(1) Count the occurrences of the different features in the training samples.
(2) Compute the feature occurrence rate as the feature probability. Suppose there are N training entries in total and K extracted features. Then for feature $i$ ($1 \le i \le K$), the probability can be computed as
$$p_i = \frac{K_i}{N},$$
where $K_i$ is the number of occurrences of feature $i$.
(3) Assign feature weights based on the computed probabilities as follows:
$$w_i = \frac{p_i}{A}, \quad \text{where } A = \sum_{i=1}^{K} p_i,$$
so that the weights sum to 1.
(4) Consider the extracted features as forming a feature space. Each entry is projected into this space and a voting score is computed as follows:
$$FV = \sum_{i=1}^{K} w_i S_i,$$
where $S_i = 1$ if feature $i$ occurs, otherwise $S_i = 0$.
(5) Obtain the minimum, maximum, and average voting scores of the entries. These values are used as thresholds in the segmentation stage.

Different types of entries may have different feature occurrences, so this procedure is run for each type of entry (refer to Section 2.3 for a discussion of entry types). In the feature extraction stage, we scale the weights by a constant factor to facilitate computation. From Figure 4, we can see that the line structure (negative indent of the first text-line) has the heaviest weight, which means it is the most important feature in this document.

2.2 Segmentation

The segmentation is an iterative procedure which maximizes the feature voting score of an entry in the feature space. Based on the features extracted in the feature extraction phase, a document can, in principle, be segmented into entries by searching for the beginning and ending text-lines of an entry. This search operation is a threshold-based, iterative, relaxation-like procedure, and the threshold can be estimated from the training set.
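Steps (1) to (5) of the training phase in Section 2.1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each training entry is represented simply as the set of feature names that occur in it, and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def train_weights(entries: List[Set[str]]) -> Dict[str, float]:
    """Steps (1)-(3): count features, convert to probabilities, normalize to weights."""
    n = len(entries)  # N training entries (assumed non-empty)
    counts: Dict[str, int] = {}
    for present in entries:
        for feature in present:
            counts[feature] = counts.get(feature, 0) + 1   # K_i
    probs = {f: c / n for f, c in counts.items()}          # p_i = K_i / N
    total = sum(probs.values())                            # A = sum of p_i
    return {f: p / total for f, p in probs.items()}        # w_i = p_i / A


def voting_score(weights: Dict[str, float], present: Set[str]) -> float:
    """Step (4): FV = sum of w_i * S_i, with S_i = 1 when feature i occurs."""
    return sum(w for f, w in weights.items() if f in present)


def score_thresholds(weights: Dict[str, float],
                     entries: List[Set[str]]) -> Tuple[float, float, float]:
    """Step (5): minimum, maximum and average voting score over the training entries."""
    scores = [voting_score(weights, e) for e in entries]
    return min(scores), max(scores), sum(scores) / len(scores)


training = [
    {"indent", "bold"},
    {"indent"},
    {"indent", "bold", "dot"},
    {"indent", "dot"},
]
weights = train_weights(training)
low, high, mean = score_thresholds(weights, training)
```

In this toy training set, the feature that occurs in every entry receives the largest weight, which mirrors the observation that the most consistent feature dominates the score.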
Considering that there are a relatively small number of text lines on one page, this search can be done by brute force. The approach is iterative and the training set can be generated by bootstrapping, so initial segmentation results can be refined step by step. The segmentation procedure is as follows:

1) Search for candidates for the first text line in one zone by feature matching. This operation is equivalent to determining whether the first line in a zone is the beginning of a new entry or a continuation of an entry in the previous zone or previous page.
2) Search for the end of an entry. This operation is replaced by searching for the beginning of the next entry, because the beginning of a new entry marks the end of the previous entry.
3) Remove the extracted entries and iterate until all new entries are identified.

Once we obtain the initial segmentation results, and before going to the next step, the results are traversed and, if necessary, two simple operations (splitting and merging) are applied. The details can be found in previous work [4].

Figure 4. Extracted features and assigned weights of selected entries in the document of Figure 3 (features charted: First Word Style, Word Style Pattern, Ending Symbol, Symbol Pattern, Word Symbol Pattern, Line Structure)
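The search for entry beginnings described in the segmentation procedure can be illustrated with a simplified single-pass sketch. It assumes that each text line is a plain string, that a hypothetical `start_features` function reports which entry-start features a line shows, and that a line whose voting score reaches the threshold begins a new entry. The iterative refinement and the splitting and merging operations are not modeled here.

```python
from typing import Callable, Dict, List, Set


def segment_zone(lines: List[str],
                 start_features: Callable[[str], Set[str]],
                 weights: Dict[str, float],
                 threshold: float) -> List[List[str]]:
    """Group the text lines of one zone into entries by detecting entry beginnings."""
    entries: List[List[str]] = []
    for line in lines:
        present = start_features(line)
        score = sum(w for f, w in weights.items() if f in present)
        if score >= threshold or not entries:
            # Either a new entry begins here, or this is the first line of the
            # zone (possibly the continuation of an entry from the previous zone).
            entries.append([line])
        else:
            # The beginning of the next entry has not been found yet,
            # so the line still belongs to the current entry.
            entries[-1].append(line)
    return entries


def demo_start_features(line: str) -> Set[str]:
    """Hypothetical detector: unindented line and all-capital headword."""
    present: Set[str] = set()
    if not line.startswith(" "):
        present.add("no_indent")
    tokens = line.split()
    if tokens and tokens[0].isupper():
        present.add("caps_headword")
    return present


zone = [
    "  continued text from the previous zone",
    "ABLE adj. capable;",
    "  having power",
    "ABOUT prep. around",
]
result = segment_zone(zone, demo_start_features,
                      {"no_indent": 0.6, "caps_headword": 0.4}, threshold=0.8)
```

Here the first group holds the continuation line, and each headword line opens a new group.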

2.3 Correction and Bootstrapping

Because of the complexity of many structured documents, it is difficult to determine the optimal value of some parameters. We attempt to learn as much as possible about the features of the given training set. One possible way to do this is to generate a new training set from the original set and selected new segmentation results. This is the so-called bootstrap technique, first proposed by Efron [3] in 1979. The newly generated training samples are called bootstrap samples. The bootstrap samples can be generated from the original training samples, from the new segmentation results, or from a combination of both. Since the original training sample set is usually small, we always generate bootstrap samples from the combined set of original training samples and selected segmentation results.

Before the segmentation results are combined with the original training samples to generate bootstrap samples, the operator corrects the original segmentation results by performing one or more of the following operations:

Splitting: split one segmented entry into two or more individual entries;
Merging: merge two or more adjacent entries into one single entry;
Resizing: change the size of a segmented entry;
Moving: change the bounding box position of a segmented entry;
Removing: remove a segmented entry;
Relabeling: change the type label of an entry.

In their paper, Hamamoto et al. [2] analyze four different procedures to generate bootstrap samples. We apply two of the four procedures in our approach, which differ only in the computation of weights. First, let
$$X_N = \{x_1, x_2, \ldots, x_N\}$$
be a set extracted from the set of original training samples and newly selected segmentation results for one entry type, where $x_i$ ($i = 1, 2, \ldots, N$) are the feature vectors, with each vector element being the probability of the specific feature entity. We generate a bootstrap sample set
$$X_N^B = \{x_1^b, x_2^b, \ldots, x_N^b\}$$
with size N from the original set, so one of the following two procedures can be applied to generate the desired bootstrap set.

Procedure 1:
1) Select one sample entry with feature vector $x_r$ from $X_N$;
2) Find the k closest samples $x_{r1}, x_{r2}, \ldots, x_{rk}$ to $x_r$ in the feature space;
3) Compute a bootstrap sample
$$x^b = \sum_{j=1}^{k} w_j x_{rj},$$
where $w_j$ is a weight given by
$$w_j = \frac{c_j}{\sum_{l=1}^{k} c_l},$$
$c_j$ is chosen from a uniform distribution on $[0, 1]$, and $\sum_{j=1}^{k} w_j = 1$;
4) Repeat until all N samples are selected.

Procedure 2:
1) Select one sample entry with feature vector $x_r$ from $X_N$;
2) Find the k closest samples $x_{r1}, x_{r2}, \ldots, x_{rk}$ to $x_r$ in the feature space;
3) Compute one bootstrap sample
$$x^b = \frac{1}{k} \sum_{j=1}^{k} x_{rj};$$
4) Repeat until all N samples are selected.

In the first step of both procedures, the sample entries are chosen such that no entry is selected more than once. The generated bootstrap samples are linear combinations of the source training samples. The difference is that the first procedure combines samples using random weights, while the second generates bootstrap samples by computing an unweighted mean.

We assume the document has a consistent functional structure. Some of the features that occur in one entry type may never appear in another. For example, the line indent feature will not appear in a single-line entry, while the ending special symbol may not appear in an entry that continues on the next page, so we generate the bootstrap samples for each predefined entry type. These entry types could be:

Regular entry: A complete multi-line entry that starts and ends on the same page;
Continuation entry: An entry that is the continuation of an entry from the previous page;
Un-terminated entry: An entry that does not end on the current page and continues on the next page;
Open entry: An entry that is the continuation of an entry from the previous page and does not end on the current page;
Single-line entry: A regular entry that contains only one text line.

2.4 Post-Processing

Figure 5. Segmentation Errors Caused by OCR Errors: (a) Zone output of OCR; (b) Wrong segmentation entry.

The entry segmentation result depends heavily on the zone segmentation results. In other words, if the zone segmentation result is incorrect, it is impossible to obtain correct entry segmentation results from it without adjustment (see Figure 5). The task of the post-processing stage is therefore to correct the entry segmentation errors caused by incorrect zone segmentation and to make the segmentation approach more adaptive. The post-processing procedure is illustrated briefly by the flowchart in Figure 6. The statistical information used includes the average width of an entry, the average text-line height of an entry, the average regular text-line width, the word spacing within an entry, the relative position of the entries of interest, and so on. The main operation in the post-processing stage is to browse the words in each entry that differs significantly from the statistical information obtained from the training samples, and to reorganize those words into text-lines and then into entries. Figure 7 shows the segmentation result after post-processing.

Figure 6. Flowchart of Post-Processing

Figure 7. Corrected Result after Post-Processing
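The two bootstrap procedures of Section 2.3 can be sketched in a few lines. This is a hedged illustration, not the authors' code: it assumes Euclidean distance in the feature space, it assumes that a sample counts as its own nearest neighbour (the text does not say whether it is included), and the function name `bootstrap_set` is hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = Sequence[float]


def bootstrap_set(samples: List[Vector], k: int, weighted: bool,
                  rng: Optional[random.Random] = None) -> List[List[float]]:
    """Generate N bootstrap samples from N feature vectors.

    weighted=True follows Procedure 1 (random normalized weights);
    weighted=False follows Procedure 2 (unweighted mean of the k neighbours).
    """
    rng = rng or random.Random()
    result: List[List[float]] = []
    for x_r in samples:  # every sample is selected exactly once
        # Assumption: Euclidean distance, and x_r itself is among its k neighbours.
        neighbours = sorted(samples, key=lambda x: math.dist(x, x_r))[:k]
        if weighted:
            # 1 - random() lies in (0, 1], so the normalizing sum is never zero.
            c = [1.0 - rng.random() for _ in neighbours]
            total = sum(c)
            w = [c_j / total for c_j in c]
        else:
            w = [1.0 / len(neighbours)] * len(neighbours)
        dims = len(x_r)
        result.append([sum(w_j * x[d] for w_j, x in zip(w, neighbours))
                       for d in range(dims)])
    return result


points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]]
mean_samples = bootstrap_set(points, k=2, weighted=False)
random_samples = bootstrap_set(points, k=2, weighted=True, rng=random.Random(0))
```

Because each bootstrap sample is a convex combination of nearby training vectors, it stays inside the region spanned by those neighbours, which smooths the training set without moving it far from the observed data.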

3. EXPERIMENT RESULTS AND PERFORMANCE EVALUATION

We have applied the presented approach to the segmentation of three categories of structured documents: (1) dictionaries; (2) voice transcriptions; and (3) phone books. For the first two categories, we provide the results and an evaluation, while for the last category we only show the segmentation results.

3.1 Dictionary Segmentation Results

The segmentation approach was applied to four different dictionaries with different structural features: a French-English dictionary (63 pages), an English-French dictionary (657 pages), a Turkish-English dictionary (99 pages), and an English-Turkish dictionary (52 pages). The French-English and the English-French pages are taken from the same bilingual dictionary, so they have the same features. Figures 8-10 show the segmentation results of these four dictionaries.

Figure 8. English-French Dictionary Segmentation Results: (a) Word; (b) Text-line; (c) Entry (page number is noise).

Figure 9. Turkish-English Dictionary Segmentation Result (with many single-line entries)

Figure 10. English-Turkish Dictionary Segmentation Result (different entry features)

Figure 11. Progressive Performance Improvement Based on Bootstrapping (4 dictionaries): (a) Accuracy Rate Improvement; (b) Initial Features & Weights; (c) Features & Weights after Bootstrapping

Figure 11 shows the performance improvement for dictionary segmentation based on bootstrapping. The evaluation comes from the statistical information of 5 pages of each dictionary. The initial segmentation was based on four training entries (one training entry for each entry type described in Section 2.3). The iterations following the initial segmentation are based on adding different numbers of training entries, which are used to generate bootstrap samples. The chart in Figure 11(a) shows that the segmentation can be refined step by step by applying the bootstrap technique. Figures 11(b) and 11(c) show the extracted features and assigned weights in the initial step and after bootstrapping. It can be seen that the weights changed after bootstrapping. New features may also be extracted in the bootstrapping step. For example, when the LRSpace (line-entry spacing difference) feature exists in a document, providing only nonadjacent entries is not sufficient to extract this feature, but bootstrapping adds new training entries, which makes it possible to extract it.

The final performance evaluation is shown in Table 1. Obtaining ground truth on such a large data set is very time-consuming, so the evaluation is based only on the available ground truth for these dictionaries. In Table 1, the Correct Entries and Incorrect Entries are two complementary parts of the evaluation results, so the four error percentages in Incorrect Entries and the percentage in Correct Entries sum to 100%. The False Alarm error measures the impact of noise on the segmentation result, and the Mislabeled Entries error measures the labeling of correctly segmented entries. From Table 1, we can see that the presented segmentation algorithm works well: the lowest percentage of correct segmentation is higher than 96%, and the best result is even higher than 99%. Figures 12 and 13 show examples of all the errors listed in Table 1 (except the missed entry error, which is easy to understand). Among the incorrect entry errors, the overlapped error is less serious than the other three error types because it is usually caused by (1) a character or symbol distributed over two text-lines, or (2) text-line spacing that is too small (Figure 12(a)). In Figure 13(a), the false alarm occurs due to noise. The images shown in Figure 13(b) and (c) are two parts of a single entry that is divided across two adjacent pages. The entry type in Figure 13(b) should be un-terminated, but it was labeled normal in the segmentation result. This type of error is usually caused by a special symbol that is used to terminate an entry ('.' in this case).

Table 1. Result Evaluation of Four Dictionaries
Documents (columns): English-French; French-English; English-Turkish; Turkish-English
Rows: Page No; Total Entries; Correct Entries; Incorrect Entries (Missed, Overlapped, Merged, Split); False Alarm; Mislabeled Entries
Correct Entries: 939 (96.%); 2372 (97.9%); 349 (99.26%); 2627 (98.98%)
Missed: (.5%) (.4%) (.4%)
Overlapped, Merged: 22 (%) 528 (2.62%) 39 (.6%) 6 (.25%) 6 (.7%) 4 (.%) 7 (.26%)
Split: 53 (.26%); 5 (.2%); 6 (.45%); 9 (.72%)
False Alarm: 43 (.2%); 6 (.25%); 8 (.23%); 2 (.8%)
Mislabeled Entries: 6 (.8%); 2 (.49%); (.3%); (.38%)

Figure 12. Incorrect Entry Errors: (a) Overlapped; (b) Merged; (c) Split

Figure 13. False Alarm and Mislabeling Errors: (a) False Alarm; (b), (c) Mislabeling Error
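The accounting behind the complementary Correct Entries and Incorrect Entries columns can be checked with a small calculation. The counts below are made-up illustrative numbers, not values from Table 1, and the function name is hypothetical. The sketch assumes that every ground-truth entry falls into exactly one of the five categories.

```python
from typing import Dict


def segmentation_rates(correct: int, missed: int, overlapped: int,
                       merged: int, split: int) -> Dict[str, float]:
    """Express each outcome as a percentage of all ground-truth entries."""
    counts = {
        "correct": correct,
        "missed": missed,
        "overlapped": overlapped,
        "merged": merged,
        "split": split,
    }
    total = sum(counts.values())
    return {name: 100.0 * c / total for name, c in counts.items()}


# Illustrative counts only (hypothetical, 1000 ground-truth entries).
rates = segmentation_rates(correct=960, missed=5, overlapped=10, merged=15, split=10)
```

With these counts the correct rate is 96.0% and the four error rates add up to the remaining 4.0%.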

3.2 APOLLO 15 Voice Transcription Segmentation Results

For the dictionary parsing problem, the motivation is obvious: we wish to segment the dictionaries into entries that can be tagged and used as lexical resources. Typed transcriptions of audio content provide a related challenge. We are currently integrating audio, video and photographs from the Apollo 15 mission into an audio retrieval interface. The audio and scanned images of typed transcripts are from the Lunar Module (LM), the Command Module (CM) and mission control. Our goal is to synchronize references to these transcription images with the audio as it is played. To do this, our first task was to segment the document images into spoken units and label the times, sources and spoken text regions. While this text is not complicated, unparameterized segmentations will not be as accurate as a modeled segmentation. Figure 14 shows the segmentation results of the transcription.

The transcription contains 5 different parts (around 34 pages in total), and each part has different structural features. Table 2 shows the evaluation results of the segmentation, based only on the available ground truth; the structure of Table 2 is exactly the same as that of Table 1. Compared with the four dictionaries, these transcription documents have relatively simple structures and more obvious structural features, so the segmentation results are more accurate than the dictionary results, with the lowest percentage higher than 98% and the highest 99.87%. However, due to physical noise and logical noise (entries of no interest), the false alarm and mislabeling errors are significantly higher than in the dictionary results: the highest false alarm error rate reaches 8.37% (.25% in the dictionary results), and the highest mislabeling rate reaches 2.75% (.49% in the dictionary results).

Figure 14. Segmentation Results of Voice Transcription ((a) and (b) have different features).

Table 2. Result Evaluation of Transcriptions
Columns: Document; Page No; Total Entries; Correct Entries; Incorrect Entries (Missed, Overlapped, Merged, Split); False Alarm; Mislabeled Entries
AS15_CM: (99.2%) (.5%) 2 (.57%) 2 (.9%) 2 (.9%) 77 (3.65%) 5 (.23%)
AS15_LM: (98.7%) 35 (.78%) 3 (.5%) 73 (3.7%) (.48%)
AS15_PAO: 23 4 (99.2%) 6 (.53%) 3 (.27%) 94 (8.37%) 7 (.38%)
AS15_PAC: (99.22%) 4 (.26%) 8 (.52%) 8 (5.9%) 45 (2.75%)
AS15_TEC: (99.87%) 2 (.3%) 9 (.59%)

Besides the dictionaries and transcriptions, we also applied this segmentation approach to several pages of a phone book to test its robustness; the result is shown in Figure 15. The last text-line is ignored as noise (a part of no interest).

Figure 15. Segmentation of Phone Book

We use ScanSoft's Developer Kit 2 (SDK2) to obtain the zone segmentation results. Since our approach starts with the OCR results, the segmentation performance depends significantly on the OCR output; once the OCR output has errors, the segmentation result is poor even if post-processing is applied to it. The fact that SDK2 only supports the recognition of Roman and Latin characters makes our segmentation results for documents containing characters of other scripts (such as Arabic or Hebrew) even worse. Part of our future work is therefore to make our segmentation approach independent of OCR results and to relax the restriction that the current approach only works on Roman and Latin languages.

4. SUMMARY AND FUTURE WORK

In this paper, we present an approach to page segmentation that uses a bootstrapping technique to learn a segmentation model. The segmentation system is first trained using a small set of samples. After the operator makes corrections to a selected set of newly generated segmentation results, these corrected results are combined with the original training set to generate a set of bootstrap samples, which are used to retrain the system. Starting with OCR results, this approach can be applied to the segmentation of any structured document whose structure can be learned from training. We applied this approach to many structured documents, such as dictionaries and voice transcripts, and obtained satisfactory results. The experimental results show that the bootstrap technique can improve segmentation performance even with a small set of training samples.

Many structured documents contain pictures, figures, tables, forms and other content that differs from regular word content. This makes the segmentation more difficult because these parts usually do not have consistent structures, although they may have consistent position and size. Another part of our future work is to extend our approach to documents with these elements. Currently we concentrate only on black-and-white documents. Since many documents are in color and different colors often represent different functional entries, we are extending our work to color documents. In color document analysis, color can be added to the feature space as a new feature.
ACKNOWLEDGEMENTS

This research is supported by the DARPA TIDES project under grant; the authors are grateful for the support.

REFERENCES

1. J. Liang, I.T. Phillips, R.M. Haralick, An Optimization Methodology for Document Structure Extraction on Latin Character Documents, IEEE Trans. Pattern Analysis & Machine Intelligence, vol. 23:7, July 2001.
2. Y. Hamamoto, S. Uchimura, S. Tomita, A Bootstrap Technique for Nearest Neighbor Classifier Design, IEEE Trans. Pattern Analysis & Machine Intelligence, vol. 19:1, 73-79, January 1997.
3. B. Efron, Bootstrap Methods: Another Look at the Jackknife, Annals of Statistics, vol. 7, 1-26, 1979.
4. D. Doermann, H. Ma, B. Karagol-Ayan, D. W. Oard, Translation Lexicon Acquisition from Bilingual Dictionaries, Proc. SPIE Conf. on Document Recognition and Retrieval, 37-48, San Jose, CA, January.
5. T. Kanungo, S. Mao, Stochastic Language Model for Analyzing Document Physical Layout, Proc. SPIE Conf. on Document Recognition and Retrieval, San Jose, CA, January.
6. S. Lee, D. Ryu, Parameter-Free Geometric Document Layout Analysis, IEEE Trans. Pattern Analysis & Machine Intelligence, vol. 23:11, November 2001.
7. S. Mao, T. Kanungo, Stochastic Language Models for Automatic Acquisition of Lexicons from Printed Bilingual Dictionaries, DLIA 2001 Advance Program, Seattle, WA, Sep. 2001.
8. R.M. Haralick, Document Image Understanding: Geometric and Logical Layout, Proc. Int. Conf. on Computer Vision and Pattern Recognition, Seattle, WA, 1994.
9. L. Robadey, O. Hitz, R. Ingold, A Pattern-Based Method for Document Structure Recognition, DLIA 2001 Advance Program, Seattle, WA, Sep. 2001.
10. D. Malerba, F. Esposito, Learning Rules for Layout Analysis Correction, DLIA 2001 Advance Program, Seattle, WA, Sep. 2001.
11. G. E. Kopec, P. A. Chou, Document Image Decoding Using Markov Source Models, IEEE Trans. Pattern Analysis & Machine Intelligence, vol. 16:6, 602-617, June 1994.
12. K. Urwin, Langenscheidt's Standard French Dictionary, Germany, 1988.


Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Face Detection with Deep Learning

Face Detection with Deep Learning Face Detecton wth Deep Learnng Yu Shen Yus122@ucsd.edu A13227146 Kuan-We Chen kuc010@ucsd.edu A99045121 Yzhou Hao y3hao@ucsd.edu A98017773 Mn Hsuan Wu mhwu@ucsd.edu A92424998 Abstract The project here

More information

Hierarchical clustering for gene expression data analysis

Hierarchical clustering for gene expression data analysis Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally

More information

A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION

A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION 1 THE PUBLISHING HOUSE PROCEEDINGS OF THE ROMANIAN ACADEMY, Seres A, OF THE ROMANIAN ACADEMY Volume 4, Number 2/2003, pp.000-000 A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION Tudor BARBU Insttute

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Supervsed vs. Unsupervsed Learnng Up to now we consdered supervsed learnng scenaro, where we are gven 1. samples 1,, n 2. class labels for all samples 1,, n Ths s also

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

An Improved Image Segmentation Algorithm Based on the Otsu Method

An Improved Image Segmentation Algorithm Based on the Otsu Method 3th ACIS Internatonal Conference on Software Engneerng, Artfcal Intellgence, Networkng arallel/dstrbuted Computng An Improved Image Segmentaton Algorthm Based on the Otsu Method Mengxng Huang, enjao Yu,

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval

Description of NTU Approach to NTCIR3 Multilingual Information Retrieval Proceedngs of the Thrd NTCIR Workshop Descrpton of NTU Approach to NTCIR3 Multlngual Informaton Retreval Wen-Cheng Ln and Hsn-Hs Chen Department of Computer Scence and Informaton Engneerng Natonal Tawan

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

MOTION PANORAMA CONSTRUCTION FROM STREAMING VIDEO FOR POWER- CONSTRAINED MOBILE MULTIMEDIA ENVIRONMENTS XUNYU PAN

MOTION PANORAMA CONSTRUCTION FROM STREAMING VIDEO FOR POWER- CONSTRAINED MOBILE MULTIMEDIA ENVIRONMENTS XUNYU PAN MOTION PANORAMA CONSTRUCTION FROM STREAMING VIDEO FOR POWER- CONSTRAINED MOBILE MULTIMEDIA ENVIRONMENTS by XUNYU PAN (Under the Drecton of Suchendra M. Bhandarkar) ABSTRACT In modern tmes, more and more

More information

Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions

Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions Sortng Revew Introducton to Algorthms Qucksort CSE 680 Prof. Roger Crawfs Inserton Sort T(n) = Θ(n 2 ) In-place Merge Sort T(n) = Θ(n lg(n)) Not n-place Selecton Sort (from homework) T(n) = Θ(n 2 ) In-place

More information

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like:

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like: Self-Organzng Maps (SOM) Turgay İBRİKÇİ, PhD. Outlne Introducton Structures of SOM SOM Archtecture Neghborhoods SOM Algorthm Examples Summary 1 2 Unsupervsed Hebban Learnng US Hebban Learnng, Cntd 3 A

More information

Deep Classification in Large-scale Text Hierarchies

Deep Classification in Large-scale Text Hierarchies Deep Classfcaton n Large-scale Text Herarches Gu-Rong Xue Dkan Xng Qang Yang 2 Yong Yu Dept. of Computer Scence and Engneerng Shangha Jao-Tong Unversty {grxue, dkxng, yyu}@apex.sjtu.edu.cn 2 Hong Kong

More information

A Background Subtraction for a Vision-based User Interface *

A Background Subtraction for a Vision-based User Interface * A Background Subtracton for a Vson-based User Interface * Dongpyo Hong and Woontack Woo KJIST U-VR Lab. {dhon wwoo}@kjst.ac.kr Abstract In ths paper, we propose a robust and effcent background subtracton

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

Classifying Acoustic Transient Signals Using Artificial Intelligence

Classifying Acoustic Transient Signals Using Artificial Intelligence Classfyng Acoustc Transent Sgnals Usng Artfcal Intellgence Steve Sutton, Unversty of North Carolna At Wlmngton (suttons@charter.net) Greg Huff, Unversty of North Carolna At Wlmngton (jgh7476@uncwl.edu)

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

A Gradient Difference based Technique for Video Text Detection

A Gradient Difference based Technique for Video Text Detection A Gradent Dfference based Technque for Vdeo Text Detecton Palaahnakote Shvakumara, Trung Quy Phan and Chew Lm Tan School of Computng, Natonal Unversty of Sngapore {shva, phanquyt, tancl }@comp.nus.edu.sg

More information

A Novel Adaptive Descriptor Algorithm for Ternary Pattern Textures

A Novel Adaptive Descriptor Algorithm for Ternary Pattern Textures A Novel Adaptve Descrptor Algorthm for Ternary Pattern Textures Fahuan Hu 1,2, Guopng Lu 1 *, Zengwen Dong 1 1.School of Mechancal & Electrcal Engneerng, Nanchang Unversty, Nanchang, 330031, Chna; 2. School

More information

A Gradient Difference based Technique for Video Text Detection

A Gradient Difference based Technique for Video Text Detection 2009 10th Internatonal Conference on Document Analyss and Recognton A Gradent Dfference based Technque for Vdeo Text Detecton Palaahnakote Shvakumara, Trung Quy Phan and Chew Lm Tan School of Computng,

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science EECS 730 Introducton to Bonformatcs Sequence Algnment Luke Huan Electrcal Engneerng and Computer Scence http://people.eecs.ku.edu/~huan/ HMM Π s a set of states Transton Probabltes a kl Pr( l 1 k Probablty

More information

Identifying Table Boundaries in Digital Documents via Sparse Line Detection

Identifying Table Boundaries in Digital Documents via Sparse Line Detection Identfyng Table Boundares n Dgtal Documents va Sparse Lne Detecton Yng Lu, Prasenjt Mtra, C. Lee Gles College of Informaton Scences and Technology The Pennsylvana State Unversty Unversty Park, PA, USA,

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

Accurate Overlay Text Extraction for Digital Video Analysis

Accurate Overlay Text Extraction for Digital Video Analysis Accurate Overlay Text Extracton for Dgtal Vdeo Analyss Dongqng Zhang, and Shh-Fu Chang Electrcal Engneerng Department, Columba Unversty, New York, NY 10027. (Emal: dqzhang, sfchang@ee.columba.edu) Abstract

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

Object-Based Techniques for Image Retrieval

Object-Based Techniques for Image Retrieval 54 Zhang, Gao, & Luo Chapter VII Object-Based Technques for Image Retreval Y. J. Zhang, Tsnghua Unversty, Chna Y. Y. Gao, Tsnghua Unversty, Chna Y. Luo, Tsnghua Unversty, Chna ABSTRACT To overcome the

More information

Online Detection and Classification of Moving Objects Using Progressively Improving Detectors

Online Detection and Classification of Moving Objects Using Progressively Improving Detectors Onlne Detecton and Classfcaton of Movng Objects Usng Progressvely Improvng Detectors Omar Javed Saad Al Mubarak Shah Computer Vson Lab School of Computer Scence Unversty of Central Florda Orlando, FL 32816

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

An Approach to Real-Time Recognition of Chinese Handwritten Sentences

An Approach to Real-Time Recognition of Chinese Handwritten Sentences An Approach to Real-Tme Recognton of Chnese Handwrtten Sentences Da-Han Wang, Cheng-Ln Lu Natonal Laboratory of Pattern Recognton, Insttute of Automaton of Chnese Academy of Scences, Bejng 100190, P.R.

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

CSCI 5417 Information Retrieval Systems Jim Martin!

CSCI 5417 Information Retrieval Systems Jim Martin! CSCI 5417 Informaton Retreval Systems Jm Martn! Lecture 11 9/29/2011 Today 9/29 Classfcaton Naïve Bayes classfcaton Ungram LM 1 Where we are... Bascs of ad hoc retreval Indexng Term weghtng/scorng Cosne

More information

UB at GeoCLEF Department of Geography Abstract

UB at GeoCLEF Department of Geography   Abstract UB at GeoCLEF 2006 Mguel E. Ruz (1), Stuart Shapro (2), June Abbas (1), Slva B. Southwck (1) and Davd Mark (3) State Unversty of New York at Buffalo (1) Department of Lbrary and Informaton Studes (2) Department

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

CS 534: Computer Vision Model Fitting

CS 534: Computer Vision Model Fitting CS 534: Computer Vson Model Fttng Sprng 004 Ahmed Elgammal Dept of Computer Scence CS 534 Model Fttng - 1 Outlnes Model fttng s mportant Least-squares fttng Maxmum lkelhood estmaton MAP estmaton Robust

More information

A User Selection Method in Advertising System

A User Selection Method in Advertising System Int. J. Communcatons, etwork and System Scences, 2010, 3, 54-58 do:10.4236/jcns.2010.31007 Publshed Onlne January 2010 (http://www.scrp.org/journal/jcns/). A User Selecton Method n Advertsng System Shy

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION SHI-LIANG SUN, HONG-LEI SHI Department of Computer Scence and Technology, East Chna Normal Unversty 500 Dongchuan Road, Shangha 200241, P. R. Chna E-MAIL: slsun@cs.ecnu.edu.cn,

More information

A Deflected Grid-based Algorithm for Clustering Analysis

A Deflected Grid-based Algorithm for Clustering Analysis A Deflected Grd-based Algorthm for Clusterng Analyss NANCY P. LIN, CHUNG-I CHANG, HAO-EN CHUEH, HUNG-JEN CHEN, WEI-HUA HAO Department of Computer Scence and Informaton Engneerng Tamkang Unversty 5 Yng-chuan

More information

Combining Multiple Resources, Evidence and Criteria for Genomic Information Retrieval

Combining Multiple Resources, Evidence and Criteria for Genomic Information Retrieval Combnng Multple Resources, Evdence and Crtera for Genomc Informaton Retreval Luo S 1, Je Lu 2 and Jame Callan 2 1 Department of Computer Scence, Purdue Unversty, West Lafayette, IN 47907, USA ls@cs.purdue.edu

More information

An Image Fusion Approach Based on Segmentation Region

An Image Fusion Approach Based on Segmentation Region Rong Wang, L-Qun Gao, Shu Yang, Yu-Hua Cha, and Yan-Chun Lu An Image Fuson Approach Based On Segmentaton Regon An Image Fuson Approach Based on Segmentaton Regon Rong Wang, L-Qun Gao, Shu Yang 3, Yu-Hua

More information

Semantic Image Retrieval Using Region Based Inverted File

Semantic Image Retrieval Using Region Based Inverted File Semantc Image Retreval Usng Regon Based Inverted Fle Dengsheng Zhang, Md Monrul Islam, Guoun Lu and Jn Hou 2 Gppsland School of Informaton Technology, Monash Unversty Churchll, VIC 3842, Australa E-mal:

More information

Efficient Video Coding with R-D Constrained Quadtree Segmentation

Efficient Video Coding with R-D Constrained Quadtree Segmentation Publshed on Pcture Codng Symposum 1999, March 1999 Effcent Vdeo Codng wth R-D Constraned Quadtree Segmentaton Cha-Wen Ln Computer and Communcaton Research Labs Industral Technology Research Insttute Hsnchu,

More information

PRÉSENTATIONS DE PROJETS

PRÉSENTATIONS DE PROJETS PRÉSENTATIONS DE PROJETS Rex Onlne (V. Atanasu) What s Rex? Rex s an onlne browser for collectons of wrtten documents [1]. Asde ths core functon t has however many other applcatons that make t nterestng

More information

Real-Time View Recognition and Event Detection for Sports Video

Real-Time View Recognition and Event Detection for Sports Video Real-Tme Vew Recognton and Event Detecton for Sports Vdeo Authors: D Zhong and Shh-Fu Chang {dzhong, sfchang@ee.columba.edu} Department of Electrcal Engneerng, Columba Unversty For specal ssue on Multmeda

More information

Fast Feature Value Searching for Face Detection

Fast Feature Value Searching for Face Detection Vol., No. 2 Computer and Informaton Scence Fast Feature Value Searchng for Face Detecton Yunyang Yan Department of Computer Engneerng Huayn Insttute of Technology Hua an 22300, Chna E-mal: areyyyke@63.com

More information

Data Mining: Model Evaluation

Data Mining: Model Evaluation Data Mnng: Model Evaluaton Aprl 16, 2013 1 Issues: Evaluatng Classfcaton Methods Accurac classfer accurac: predctng class label predctor accurac: guessng value of predcted attrbutes Speed tme to construct

More information

ICDAR2005 Page Segmentation Competition

ICDAR2005 Page Segmentation Competition ICDAR2005 Page Segmentaton Competton A. Antonacopoulos 1, B. Gatos 2 and D. Brdson 1 1 Pattern Recognton and Image Analyss (PRImA) Research Lab School of Computng, Scence and Engneerng, Unversty of Salford,

More information

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines

Keywords - Wep page classification; bag of words model; topic model; hierarchical classification; Support Vector Machines (IJCSIS) Internatonal Journal of Computer Scence and Informaton Securty, Herarchcal Web Page Classfcaton Based on a Topc Model and Neghborng Pages Integraton Wongkot Srura Phayung Meesad Choochart Haruechayasak

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15 CS434a/541a: Pattern Recognton Prof. Olga Veksler Lecture 15 Today New Topc: Unsupervsed Learnng Supervsed vs. unsupervsed learnng Unsupervsed learnng Net Tme: parametrc unsupervsed learnng Today: nonparametrc

More information

A new segmentation algorithm for medical volume image based on K-means clustering

A new segmentation algorithm for medical volume image based on K-means clustering Avalable onlne www.jocpr.com Journal of Chemcal and harmaceutcal Research, 2013, 5(12):113-117 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCRC5 A new segmentaton algorthm for medcal volume mage based

More information

Decision Strategies for Rating Objects in Knowledge-Shared Research Networks

Decision Strategies for Rating Objects in Knowledge-Shared Research Networks Decson Strateges for Ratng Objects n Knowledge-Shared Research etwors ALEXADRA GRACHAROVA *, HAS-JOACHM ER **, HASSA OUR ELD ** OM SUUROE ***, HARR ARAKSE *** * nsttute of Control and System Research,

More information

Feature Reduction and Selection

Feature Reduction and Selection Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components

More information

A Statistical Model Selection Strategy Applied to Neural Networks

A Statistical Model Selection Strategy Applied to Neural Networks A Statstcal Model Selecton Strategy Appled to Neural Networks Joaquín Pzarro Elsa Guerrero Pedro L. Galndo joaqun.pzarro@uca.es elsa.guerrero@uca.es pedro.galndo@uca.es Dpto Lenguajes y Sstemas Informátcos

More information

Modeling Waveform Shapes with Random Effects Segmental Hidden Markov Models

Modeling Waveform Shapes with Random Effects Segmental Hidden Markov Models Modelng Waveform Shapes wth Random Effects Segmental Hdden Markov Models Seyoung Km, Padhrac Smyth Department of Computer Scence Unversty of Calforna, Irvne CA 9697-345 {sykm,smyth}@cs.uc.edu Abstract

More information

Improving Web Image Search using Meta Re-rankers

Improving Web Image Search using Meta Re-rankers VOLUME-1, ISSUE-V (Aug-Sep 2013) IS NOW AVAILABLE AT: www.dcst.com Improvng Web Image Search usng Meta Re-rankers B.Kavtha 1, N. Suata 2 1 Department of Computer Scence and Engneerng, Chtanya Bharath Insttute

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

MATHEMATICS FORM ONE SCHEME OF WORK 2004

MATHEMATICS FORM ONE SCHEME OF WORK 2004 MATHEMATICS FORM ONE SCHEME OF WORK 2004 WEEK TOPICS/SUBTOPICS LEARNING OBJECTIVES LEARNING OUTCOMES VALUES CREATIVE & CRITICAL THINKING 1 WHOLE NUMBER Students wll be able to: GENERICS 1 1.1 Concept of

More information

Face Recognition University at Buffalo CSE666 Lecture Slides Resources:

Face Recognition University at Buffalo CSE666 Lecture Slides Resources: Face Recognton Unversty at Buffalo CSE666 Lecture Sldes Resources: http://www.face-rec.org/algorthms/ Overvew of face recognton algorthms Correlaton - Pxel based correspondence between two face mages Structural

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

Multi-stable Perception. Necker Cube

Multi-stable Perception. Necker Cube Mult-stable Percepton Necker Cube Spnnng dancer lluson, Nobuuk Kaahara Fttng and Algnment Computer Vson Szelsk 6.1 James Has Acknowledgment: Man sldes from Derek Hoem, Lana Lazebnk, and Grauman&Lebe 2008

More information

Overview. Basic Setup [9] Motivation and Tasks. Modularization 2008/2/20 IMPROVED COVERAGE CONTROL USING ONLY LOCAL INFORMATION

Overview. Basic Setup [9] Motivation and Tasks. Modularization 2008/2/20 IMPROVED COVERAGE CONTROL USING ONLY LOCAL INFORMATION Overvew 2 IMPROVED COVERAGE CONTROL USING ONLY LOCAL INFORMATION Introducton Mult- Smulator MASIM Theoretcal Work and Smulaton Results Concluson Jay Wagenpfel, Adran Trachte Motvaton and Tasks Basc Setup

More information

Dynamic Camera Assignment and Handoff

Dynamic Camera Assignment and Handoff 12 Dynamc Camera Assgnment and Handoff Br Bhanu and Ymng L 12.1 Introducton...338 12.2 Techncal Approach...339 12.2.1 Motvaton and Problem Formulaton...339 12.2.2 Game Theoretc Framework...339 12.2.2.1

More information

Detection of an Object by using Principal Component Analysis

Detection of an Object by using Principal Component Analysis Detecton of an Object by usng Prncpal Component Analyss 1. G. Nagaven, 2. Dr. T. Sreenvasulu Reddy 1. M.Tech, Department of EEE, SVUCE, Trupath, Inda. 2. Assoc. Professor, Department of ECE, SVUCE, Trupath,

More information

Pictures at an Exhibition

Pictures at an Exhibition 1 Pctures at an Exhbton Stephane Kwan and Karen Zhu Department of Electrcal Engneerng Stanford Unversty, Stanford, CA 9405 Emal: {skwan1, kyzhu}@stanford.edu Abstract An mage processng algorthm s desgned

More information

Fast Computation of Shortest Path for Visiting Segments in the Plane

Fast Computation of Shortest Path for Visiting Segments in the Plane Send Orders for Reprnts to reprnts@benthamscence.ae 4 The Open Cybernetcs & Systemcs Journal, 04, 8, 4-9 Open Access Fast Computaton of Shortest Path for Vstng Segments n the Plane Ljuan Wang,, Bo Jang

More information

K-means and Hierarchical Clustering

K-means and Hierarchical Clustering Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n gvng your own lectures. Feel free to use these sldes verbatm, or to modfy them to ft your

More information

MOTION BLUR ESTIMATION AT CORNERS

MOTION BLUR ESTIMATION AT CORNERS Gacomo Boracch and Vncenzo Caglot Dpartmento d Elettronca e Informazone, Poltecnco d Mlano, Va Ponzo, 34/5-20133 MILANO boracch@elet.polm.t, caglot@elet.polm.t Keywords: Abstract: Pont Spread Functon Parameter

More information

Machine Learning. Topic 6: Clustering

Machine Learning. Topic 6: Clustering Machne Learnng Topc 6: lusterng lusterng Groupng data nto (hopefully useful) sets. Thngs on the left Thngs on the rght Applcatons of lusterng Hypothess Generaton lusters mght suggest natural groups. Hypothess

More information

A Robust Method for Estimating the Fundamental Matrix

A Robust Method for Estimating the Fundamental Matrix Proc. VIIth Dgtal Image Computng: Technques and Applcatons, Sun C., Talbot H., Ourseln S. and Adraansen T. (Eds.), 0- Dec. 003, Sydney A Robust Method for Estmatng the Fundamental Matrx C.L. Feng and Y.S.

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Decson surface s a hyperplane (lne n 2D) n feature space (smlar to the Perceptron) Arguably, the most mportant recent dscovery n machne learnng In a nutshell: map the data to a predetermned

More information

A Clustering Algorithm for Chinese Adjectives and Nouns 1

A Clustering Algorithm for Chinese Adjectives and Nouns 1 Clusterng lgorthm for Chnese dectves and ouns Yang Wen, Chunfa Yuan, Changnng Huang 2 State Key aboratory of Intellgent Technology and System Deptartment of Computer Scence & Technology, Tsnghua Unversty,

More information